K-Means Clustering & Its Use Cases

Abhishek Dwibedy
4 min read · Jul 19, 2021

The K-Means algorithm is one of the oldest and most commonly used clustering algorithms. It is a great starting point for newcomers to machine learning, given how simple it is to implement. In this post, we will review the origins of the algorithm and its typical usage scenarios.

The history

The term “k-means” was first used by James MacQueen in 1967 in his paper “Some Methods for Classification and Analysis of Multivariate Observations”. The standard algorithm had been proposed earlier, in 1957, by Stuart Lloyd at Bell Labs as a technique for pulse-code modulation, although his work was not published as a journal article until 1982. E. W. Forgy published essentially the same method in 1965, which is why it is sometimes known as the Lloyd-Forgy method.

What is K-Means?

Clustering is the task of dividing a population of data points into groups such that points in the same group are more similar to each other than to points in other groups. In simple words, the aim is to group items with similar traits into clusters. The goal of the K-Means algorithm is to find groups in the data, with the number of groups represented by the variable k. The algorithm works iteratively to assign each data point to one of k groups based on the features that are provided. For example, with k=2, the algorithm identifies two clusters in the source dataset.

Running K-Means on a dataset produces two outputs:

  • k centroids: one centroid for each of the k clusters identified in the dataset.
  • Labels for the complete dataset: each data point is assigned to exactly one of the clusters.

How does K-Means work?
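
The standard iteration (often called Lloyd's algorithm) alternates two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The sketch below is a minimal pure-Python illustration, not a production implementation. The function name `kmeans`, the random choice of initial centroids, the `max_iters` and `seed` parameters, and the toy `points` are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Minimal sketch of Lloyd's iteration; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization (an assumption of this sketch): k distinct data points
    # chosen at random serve as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)

    for _ in range(max_iters):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels

        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]

    return centroids, labels


# Toy 2-D data (hypothetical): two visibly separate groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On this toy data the first three points end up with one label and the last three with the other; which group gets label 0 depends on the initialization. The two return values correspond to the outputs listed earlier: the k centroids and a label for every data point.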

Where can I apply K-Means?

K-Means is typically applied to data that is numeric and continuous and has a relatively small number of dimensions. Think of a scenario in which you want to form groups of similar items from an unorganized collection; K-Means is well suited to such scenarios.

Use cases of K-Means clustering in the security domain

  • Intrusion Detection — Data mining techniques are well suited to intrusion detection. The basic K-Means algorithm, however, struggles with high-dimensional data, can get stuck in a locally optimal solution, and cannot determine the value of K by itself. An improved K-Means approach addresses these problems in stages. First, the PCA algorithm is used to reduce the dimensionality of the dataset. Next, outlier detection is used to eliminate the outliers that would strongly influence the final clustering result. Then, the initial cluster centers are selected based on distance to avoid a locally optimal solution. Finally, K- is used.
  • Crime Analysis — Crime analysis is defined as the set of analytical processes that provide relevant information about crime patterns and trend correlations, to assist personnel in planning the deployment of resources for the detection and suppression of criminal activities. The main objectives of crime analysis include:
  1. Extraction of crime patterns by analysis of available crime and criminal data.
  2. Detection of crime based on the spatial distribution of existing data, and anticipation of the crime rate using different data mining techniques.
  3. Crime detection.
  • Call Detail Record Clustering — K-Means is a popular unsupervised clustering algorithm used to find patterns in data. Here, K-Means is applied to “total activity” and “activity hours” to find usage patterns with respect to the activity hours. With this clustering, you can identify the clusters that generate the most traffic on the telecom network, measured by total activity.
  • Credit Card Fraud Detection — As credit cards have become a common way to make purchases, credit card fraud has also increased. In one proposed model, a credit card number is validated using a combination of the Luhn algorithm and the K-Means algorithm: the Luhn algorithm is applied if a credit card number is not accepted by the K-Means algorithm, and K-Means is then enhanced by adding epochs. The main aim is to strengthen the security of credit and debit cards using the K-Means clustering algorithm. In addition, the proposed model should collect a detailed user profile and security questions, and should provide a sound approach to the verification and validation of credit cards.
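
The credit card use case above mentions the Luhn algorithm without describing it. As a rough illustration, the standard Luhn checksum can be sketched as follows. This covers only the checksum itself; how the proposed model combines it with K-Means is not specified here, so that part is left out. The function name `luhn_valid` and the sample numbers are assumptions chosen for the example (both are widely used test values, not real cards).

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    # Allow the usual visual separators, then insist on plain ASCII digits.
    cleaned = card_number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk right to left. The rightmost digit is the check digit; every
    # second digit to its left is doubled, and 9 is subtracted from any
    # doubled value above 9 (equivalent to summing its two digits).
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

The checksum only catches mistyped or malformed numbers. It says nothing about whether a transaction is fraudulent, which is where a clustering step over transaction or user-profile features would come in.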

Thank you for reading,

I hope you liked it…
