K-MEANS CLUSTERING IN SECURITY DOMAIN

Atharva Jagtap
Aug 10, 2021

The K-means algorithm is one of the most widely used clustering algorithms. It uses an explicit distance measure to partition a data set into clusters.

K-means is a clustering method, and clustering falls under unsupervised machine learning.

To understand “What is K-means clustering?” and “How does the K-means algorithm work?”, let’s first understand “What is unsupervised learning?”

Unsupervised Learning :-

Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. The machine's task is to group unsorted information according to similarities, patterns, and differences, without any prior training on labeled data.

It allows the model to work on its own to discover patterns and information that was previously undetected. It mainly deals with unlabelled data.

A clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior. Clustering algorithms were introduced to solve this kind of problem, and K-means is one of them.

So, what is K-means clustering?

K-means Algorithm :-

The K-means algorithm is a partition-based clustering method and is very popular for its simplicity. K-means places k centroids and assigns every data point to its nearest centroid, while keeping the clusters as compact as possible, that is, minimizing the sum of squared distances between each point and the centroid of its cluster.

“K” is the number of clusters you choose, which is also the number of centroids needed.

The “means” in K-means refers to averaging the data points in each cluster to find that cluster's centroid.
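For reference, the two ideas above (compact clusters and centroids as averages) are usually written as the following objective. The notation is ours, not the original post's: $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is its centroid.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

For fixed assignments, choosing each $\mu_j$ as the mean of its cluster minimizes $J$, which is why the centroids are computed by averaging.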

Now, how does the K-means algorithm work?

  1. K points are placed into the data space to serve as the initial centroids (for example, k randomly chosen data points).
  2. Each object or data point is assigned to the closest centroid.
  3. After all objects are assigned, each of the k centroids is recalculated as the mean of the points assigned to it.
  4. Steps 2 and 3 are repeated until the centroids no longer move (or move less than a small tolerance).
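The four steps above can be sketched in Python with NumPy. This is a minimal illustration under assumptions of ours: the initial centroids are k randomly chosen data points, distance is squared Euclidean, and a cluster that ends up empty simply keeps its previous centroid. The function `kmeans` and its parameters are hypothetical names, not a library API.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal K-means sketch (hypothetical helper). Returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids (an assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of the points assigned to it.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        # Step 4: stop once the centroids (almost) stop moving.
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift <= tol:
            break
    # Final assignment so the labels match the returned centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

# Usage on a tiny, made-up 2-D data set with two obvious groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
centroids, labels = kmeans(X, k=2)
print(centroids)
print(labels)
```

Because the result depends on the starting centroids, practical implementations usually run K-means several times from different starts and keep the run with the lowest sum of squared distances.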

Applications of K-means :-

  • Customer Profiling
  • Market segmentation
  • Computer vision
  • Geo-statistics
  • Astronomy
  • Document clustering
  • Identifying crime-prone areas

Use-cases of K-means :-

1. Automating Clustering of IT Alerts :-

Large enterprise infrastructure components such as networks, storage, and databases generate large volumes of alert messages. Because these alerts may point to operational issues, they must be manually screened and prioritized for downstream processes. K-means can group similar alerts into clusters, so that analysts can screen and prioritize groups of related alerts instead of individual messages.

2. Crime Document Classification :-

Crime documents can be clustered into multiple categories based on their tags, topics, and content. This is a standard document-clustering problem, and k-means is well suited to it. The documents first need preprocessing that represents each document as a vector, using term frequency to identify the commonly used terms that help group the documents.
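A minimal sketch of this pipeline, assuming scikit-learn is installed: `CountVectorizer` turns each document into a term-frequency vector and `KMeans` groups those vectors. The four short documents are made-up examples, not real case data, and k = 2 is chosen by hand.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents: two about phishing, two about burglary.
docs = [
    "phishing email stole bank credentials",
    "phishing email credentials password",
    "burglary house stolen jewelry",
    "burglary house window broken",
]

# Represent each document as a vector of raw term counts (term frequency).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Group the term-frequency vectors into k = 2 clusters;
# n_init=10 runs K-means from 10 starts and keeps the best result.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

for doc, label in zip(docs, labels):
    print(label, doc)
```

TF-IDF weighting (scikit-learn's `TfidfVectorizer`) is a common alternative to raw counts, since it down-weights terms that appear in almost every document.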

3. Cyber-profiling of Criminals :-

Cyber profiling is the process of collecting data about individuals and groups to identify significant correlations. The idea of cyber profiling is derived from criminal profiles, which give the investigation division information to classify the types of criminals who were at the crime scene. K-means can cluster the collected data so that individuals or groups with similar behavior fall into the same profile.

Thank you for reading my blog.
