Machine Learning

Possibly the simplest way to explain the K-Means algorithm

Clustering is a technique for finding groups of similar items, called clusters, in data. It groups the individuals in a population by similarity, without being driven by a specific purpose. Clustering is often called unsupervised learning, because the data carries no prescribed labels: no class values denoting an a priori grouping of the data instances are given. In this post, let’s discuss the famous centroid-based clustering algorithm, K-means, in the simplest way possible.
Check out the following figures to get started:
Figure 1:

Clustering in K-means

Figure 2:

Move centroid step in the K-means algorithm

To run the K-means algorithm, you first randomly initialize three points (see figures 1 and 2) called the cluster centroids. I have three cluster centroids because I want to group my data into three clusters. K-means is an iterative algorithm that repeats two steps: 1. the cluster assignment step and 2. the move centroid step.
In the cluster assignment step, the algorithm goes through each data point and assigns it to one of the three cluster centroids, depending on which one is closest: the red, the blue, or the green.
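To make the assignment step concrete, here is a minimal sketch in plain Python. The function name assign_clusters, the sample points, and the use of Euclidean distance are my own assumptions for illustration; the index 0, 1, 2 stands in for the red, blue, and green centroids.

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        # Euclidean distance from this point to every centroid (assumed metric)
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical 2-D data and three centroids (0 = red, 1 = blue, 2 = green)
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 9.0)]
centroids = [(1.0, 1.5), (8.5, 9.0), (0.0, 10.0)]

labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 1, 1, 2]
```

If two centroids are exactly the same distance away, this sketch picks the one with the lower index. That is just one possible tie-breaking rule.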
In the move centroid step, K-means moves each centroid to the average of the points in its cluster. In other words, the algorithm calculates the average of all the points assigned to a cluster and moves that cluster’s centroid to that average location.
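The move centroid step can be sketched the same way. The function name move_centroids and the data are hypothetical, and leaving an empty cluster's centroid where it was is an assumption on my part, since the description above does not say what to do in that case.

```python
def move_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: a cluster with no points keeps its old centroid
            new_centroids.append(old)
            continue
        # Average each coordinate separately
        mean = tuple(sum(coord) / len(members) for coord in zip(*members))
        new_centroids.append(mean)
    return new_centroids

# Hypothetical data: two populated clusters and one empty cluster
points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 0.0), (9.0, 7.0), (5.0, 5.0)]

moved = move_centroids(points, labels, centroids)
print(moved)  # [(1.5, 2.0), (9.0, 9.0), (5.0, 5.0)]
```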
This process is repeated until there is no change in the clusters (or possibly until some other stopping condition is met). The initial centroids are chosen randomly, or the user supplies specific starting points.
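Putting the pieces together, the whole loop might look like the sketch below. It is not a reference implementation: the function name kmeans, the init, max_iters, and seed parameters, and the choice to draw random starting centroids from the data points themselves are all assumptions made for illustration. The max_iters cap plays the role of the "other stopping condition".

```python
import math
import random

def kmeans(points, k, init=None, max_iters=100, seed=None):
    """Alternate assignment and move-centroid steps until labels stop changing."""
    rng = random.Random(seed)
    # Starting centroids: user-supplied, or k random data points (assumption)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Cluster assignment step: nearest centroid by Euclidean distance
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no change in the clusters, so stop
        labels = new_labels
        # Move centroid step: mean of each cluster's points
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical data: three small, well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
centroids, labels = kmeans(points, k=3, init=[(0, 0), (10, 10), (20, 0)])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The example passes explicit starting points so the result is reproducible. Leaving init out and passing a seed instead gives the random start, but then the final clusters can depend on which points happen to be drawn.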
Now, check out figures 3 and 4 below. They show K-means being run on 90 data points (with k = 3). The data does not have well-defined clusters as in the previous examples. Figure 3 shows the initial data points before clustering, and figure 4 shows the result after 16 iterations. The three lines in figure 4 show the path from each centroid’s initial location to its final location.
Figure 3:
Data points in the K-means algorithm
Figure 4:
K-means algorithm: a simple explanation
K-means is usually run many times, starting with different random centroids each time. The results can be compared by examining the clusters or by a numeric measure such as the clustering’s distortion, which is the sum of the squared distances between each data point and its corresponding centroid. When distortion is the measure, the clustering with the lowest distortion value can be chosen as the best clustering.
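As a small illustration of comparing runs by distortion, the sketch below scores two hand-written clusterings of the same four points. The names distortion, run_a, and run_b are hypothetical, and the two results are made up to stand in for runs that started from different random centroids. Both happen to be stable configurations, so K-means could end up in either one.

```python
def distortion(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[label]))
        for p, label in zip(points, labels)
    )

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Two hypothetical results, as if from runs with different random starts
runs = {
    "run_a": {"centroids": [(0.0, 1.0), (10.0, 1.0)], "labels": [0, 0, 1, 1]},
    "run_b": {"centroids": [(5.0, 0.0), (5.0, 2.0)], "labels": [0, 1, 0, 1]},
}

scores = {
    name: distortion(points, r["centroids"], r["labels"])
    for name, r in runs.items()
}
best = min(scores, key=scores.get)  # lowest distortion wins

print(scores)  # {'run_a': 4.0, 'run_b': 100.0}
print(best)    # run_a
```

Here run_a splits the points into a left pair and a right pair, while run_b splits them into a bottom pair and a top pair, which leaves every point 5 units from its centroid.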
To choose an appropriate value for K, just run the experiment using different values of K and see which ones generate good results. Since K-means is used for exploratory data mining, you must examine the clustering results anyway to determine which clusters make sense. The value of K can be decreased if some clusters are too small, and increased if the clusters are too broad.
For a more objective measure, you can experiment with increasing values of K and graph various metrics (indices) of the quality of the resulting clusterings. Wikipedia’s page on determining the number of clusters in a data set describes various methods.
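One simple version of this experiment is to record the best distortion found for each K and look at where the curve flattens. The sketch below does that with a compact, self-contained K-means. The helper names, the number of restarts, the seed, and the data are all assumptions for illustration, and distortion is only one of many possible quality metrics.

```python
import math
import random

def kmeans(points, k, rng, max_iters=100):
    """Compact K-means; starts from k random data points (assumption)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

def distortion(points, centroids, labels):
    """Sum of squared distances from each point to its centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[label]))
        for p, label in zip(points, labels)
    )

def best_distortion(points, k, restarts=10, seed=0):
    """Lowest distortion over several random restarts for one value of K."""
    rng = random.Random(seed)
    return min(distortion(points, *kmeans(points, k, rng)) for _ in range(restarts))

# Hypothetical data: three pairs of points spread along the x-axis
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]

curve = {k: best_distortion(points, k) for k in range(1, len(points) + 1)}
for k, d in curve.items():
    print(k, d)
```

Distortion keeps shrinking as K grows and reaches zero when every point is its own cluster, so the lowest value alone cannot pick K. What you look for on the graph is the K after which the improvement becomes small.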
