Machine Learning

K-Nearest Neighbors and curse of dimensionality in python Scikit-Learn

What is K nearest neighbors(KNN)? 

KNN is one of the simplest machine learning algorithm and it is a lazy algorithm, as it doesn’t run computations on a data set until you give it a new data point you are trying to test.

In this tutorial, I will not only show you how to implement k-Nearest Neighbors in Python (SciKit-Learn), but also I will investigate the influence of higher dimensional spaces on  the classification.

The implementation will be specific for a classification problem and will be demonstrated using the digits data set.

How K Nearest Neighbors Work?

Lets say you have several apples and oranges and you have an unclassified fruit. If K value is 3, the algorithm looks at the 3 nearest neighbors of the unknown fruit and classify the unknown fruit as orange(as there are two oranges and one apple).

If K is 5,  the algorithm looks at the 5 nearest neighbors and classify the unknown fruit as apple( 3 apples and 2 oranges).

If K is 5,  the algorithm looks at the 5 nearest neighbors and classify the unknown fruit as apple( 3 apples and 2 oranges).

KNN classifies an unknown item based on the concept of  majority votes.  Each neighbor can either be given an equal weight or the vote can be based on the distance. The similarity measure is dependent on the type of data, for real-valued data, the Euclidean distance can be used; For other types of data such as categorical or binary data, hamming distance can be used. Since there is a minimum training involved, there is a high computational cost associated with testing a new data. I recommend you to read Saravanan’s blog to know more about KNN.

Analyzing Digits Data Set

First I import all the required Python libraries  to my Ipython Notebook.

Ipython notebook to analyze digits data set

Seaborn is a Python library for making attractive statistical graphs, it is built on top of matplotlib. sklearn.datasets is used to import default data sets present in scikit-learn.  sklearn.cross validation is used to perform cross validation on your data set, and sklearn.grid_search is used to select the best parameter K. If you don’t know what is meant by parameter selection and cross validation, please watch week 6 videos of coursera’s machine learning course. I will explain aboutsklearn.decomposition and sklearn.metrics later in this post.

sk learn data set

I then load the digits data set and store these data and target values in X and Y variables. My X value has 1797 rows and 64 columns, and Y value has 1797 rows and one column.  You can print digits.DESCR to know more  about this data set.

Train-test split and mean normalization

Train and test mean normalization

I Split the data set into train and test set, in which I use 33% of the samples as my test data. I then mean normalize X_train and X_test.

Projection Of Principal components 

I create a scatter plot of the projections to the first two Prinicpal components.

SVD in python

First two prinicipal component

You can see here that I use sklearn.decomposition.TruncatedSVD  function to reduce the number of components.  It performs linear dimensionality reduction very similar to PCA, but operates directly on sample vectors, instead of on covariance matrix.

Cross-Validation To Estimate The Optimal-Value For K

I am going to do a ten-fold cross-validation to estimate the best K value. Apart from estimating the best K value, I am also interested in the influence of the number of dimensions I project the data down. This means that I am going to optimize K for different dimensional projections of the data.

compute test function

compute test function

Implementation of K nearest

implementation of k nearest in python

You don’t have to panic by seeing the above-mentioned code, I will explain the code line by line.  In our implementation of knearest section, I set different values for K( from 1 to 20).  Then I put these K values into a dictionary because GridSearchCvaccepts parameter values only as a dictionary.

nearest negibors

In the next line of this code, I call my nearest neighbors classifier from scikit-learn,knearest = sklearn.neighbors.KNeighborsClassifier().

Don’t get confused as I introduced Iris data set here. In this section, I am going to explain what GridSearchCV does use Iris data set.

First I will load iris data set and then perform a train test split.

iris data set

X_train, X_test, Y_train, Y_test = sklearn.cross_validation.train_test_split(X, Y, test_size = 0.33, random_state = 42)

If you are really curious to know about random_state, read this stack over flow thread  here.

Then I fit nearest neighbors to my dataset.

Kneighbors classifier

clf = sklearn.grid_search.GridSearchCV(knn, parameters, cv =10),  here I pass my nearest neighbors classifier, parameters and cross-validation value to  GridSearchCV.  Even if you don’t understand what cross-validation or what GridSeachCV does, don’t worry about it, it just selects the best parameter K for you.  This is all you have to know about GridSearchCv.

best params

You can see GridSearchCv does all the hard work for you and returns the best k parameter.

Explaining The Effect Of Dimensions In KNearest Neighbors

Ok(!) Let me continue to explain the code of my digits dataset,

First I create two empty lists and a list containing numbers from 1 to 10.

accuracy =[]

params =[]

no_of_dimensions = [1,2,3,4,5,6,7,8,9, 10]

I then loop over my no_of_dimensions  using a for loop(for d in no_of_dimensions).

Then I call TruncatedSVD from Scikit-Learn:

svd =sklearn.decomposition.TruncatedSVD(n_components =d)

Then I fit svd to my training data(X_train) and apply transform method to my test data.

if d<64:

        X_fit = svd.fit_transform(X_train)

        X_fit_atest = svd.transform(X_test)

Now I fit my classifier to the truncated X_fit and Y_train. When you are fitting your classifier to your data set, remember  to use X_fit instead of X_train.

clf.fit(X_fit, Y_train)

Understanding Accuracy Scores

In this line of code “accuracy.append(compute_test(x_test = X_fit_atest, y_test = Y_test, clf = clf, cv =10))”  I  compute the accuracy score for every dimension using compute_test function. In compute test function, sklearn.cross_validation.KFold gives  the indices to do a 10 fold cross validation split. I then calculate the accuracy score for X_fit_atest andY_test.  

Conclusion

The accuracy gets better as the dimensions increase. I have enough data points that the curse of dimensionality does not harm my predictions here and the additional dimensions add to the class separability.

Hope this post has given a good idea of how k nearest neighbors operate, and how dimensions of the data affect your classification accuracy.

102 Comments
  1. Raymondwob 8 months ago
    Reply

    Top Cryptocurrencies To Invest In 2018-2019: http://www.vkvi.net/bestinvestcryptobitcoin27724

  2. JamesOvalp 8 months ago
    Reply

    Where to invest $ 3000 once and receive every month from $ 55000: http://valeriemace.co.uk/milliondollars92562

  3. Raymondwob 8 months ago
    Reply
  4. g 6 months ago
    Reply

    WOW just what I was looking for. Came here by searching for g

  5. BryantWeisK 6 months ago
    Reply

    If you invested $1,000 in bitcoin in 2011, now you have $4 million: http://www.vkvi.net/investmining63528

  6. JamesOvalp 6 months ago
    Reply

    Wenn Sie im Jahr 2011 1.000 USD in Bitcoin investiert haben, haben Sie jetzt 4 Millionen USD: http://corta.co/bestinvest19690

  7. gamefly free trial 5 months ago
    Reply

    I was more than happy to uncover this website. I need to to thank you for your time just for this fantastic read!!
    I definitely savored every part of it and i also have you bookmarked to see new information on your site.

  8. gamefly free trial 5 months ago
    Reply

    Appreciate the recommendation. Let me try it out.

  9. LowellBiC 5 months ago
    Reply

    Wie man € 10.000 pro Tag SCHNELL macht: https://clck.ru/GR2DG

  10. JamesOvalp 5 months ago
    Reply

    $ 10000 pro Tag Bitcoins auf dem Markt für binäre Optionen handeln: http://tinyurl.com/yxq78ut5

  11. LowellBiC 5 months ago
    Reply

    Erwachsenendatierung bei 35 Jahren alt: http://tinyurl.com/y3r6ru57

  12. Marvinpop 5 months ago
    Reply

    5 Popular Investment Apps In Australia: http://sneetsodenit.tk/txcq

  13. What is the best way to invest $10,000 for Australians http://voitabliasi.tk/jhyl4 5 months ago
    Reply

    Bitcoin Investment Australia: http://supwildsynchhyrd.tk/oxzm6

  14. JamesCix 5 months ago
    Reply

    What is the best way to invest $10,000 for Australians: http://finostmipi.tk/0m0n

  15. JamesCix 5 months ago
    Reply

    Bitcoin Investment Deutschland: http://xurl.es/veh8s

  16. Marvinpop 5 months ago
    Reply

    Natural Stress Solutions CBD Lip Balm: http://www.abcagency.se/qwe73745

  17. WilliamMeave 5 months ago
    Reply

    Such dir ein Madchen fur die Nacht in deiner Stadt: http://xurl.es/txp5m

  18. EduardoAvabs 5 months ago
    Reply

    Trouvez-vous une fille pour la nuit dans votre ville: http://xurl.es/2qjgs

  19. EduardoAvabs 5 months ago
    Reply

    Trouvez-vous une fille pour la nuit dans votre ville: http://xurl.es/2qjgs

  20. DylanLes 5 months ago
    Reply

    If you invested $1,000 in bitcoin in 2011, now you have $4 million: http://cutt.us/3UFg9vJ

  21. WilliamMeave 5 months ago
    Reply

    Wie man in bitcoins $ 5000 investiert – erzielt eine Rendite von bis zu 2000%: http://cutt.us/xJPxEylzV

  22. WilliamMeave 5 months ago
    Reply

    Wie man in bitcoins $ 5000 investiert – erzielt eine Rendite von bis zu 2000%: http://cutt.us/xJPxEylzV

  23. LowellBiC 5 months ago
    Reply

    Wenn Sie im Jahr 2011 1.000 USD in Bitcoin investiert haben, haben Sie jetzt 4 Millionen USD: https://is.gd/WWJSxB

  24. LowellBiC 5 months ago
    Reply

    Investoi kannabiksen NZ: hen: http://v.ht/G28tdh

  25. DylanLes 5 months ago
    Reply

    Wie man in Cannabis investiert – die 3 besten Marihuana-Aktien fГјr 2019: https://hideuri.com/qvrX46

  26. DylanLes 5 months ago
    Reply

    Cannabis investiert in London: http://v.ht/5HHz0

  27. GeorgeKak 4 months ago
    Reply

    Заберите свои 104114 честно заработанных рублей: https://s.coop/22p77?&dqlag=F34fKOCcfIx

  28. GeorgeKak 4 months ago
    Reply

    Заберите свои 104114 честно заработанных рублей: https://s.coop/22p77?&dqlag=F34fKOCcfIx

  29. Louismup 4 months ago
    Reply

    Заберите Ваши 127583 рублей: https://hideuri.com/qb1Ykr?hCh5KLbMbn

  30. Aarongag 4 months ago
    Reply

    Получите свои 143283 честно заработанных рублей: http://v.ht/7Lcj6D?&hsinh=d9rN9k

  31. GeorgeKak 4 months ago
    Reply

    Заберите Ваши 128310 рублей: https://hideuri.com/xyVZy1?FSHKuZ

  32. Louismup 4 months ago
    Reply

    Возьмите Ваши 137673 рублей: http://merky.de/tnfncr?&qkymk=az9rmtd1Iw

  33. RichardFam 4 months ago
    Reply

    If you invested $1,000 in bitcoin in 2011, now you have $4 million: http://bit.do/eYLhf?pnCL2XS9

  34. RichardFam 4 months ago
    Reply

    If you invested $1,000 in bitcoin in 2011, now you have $4 million: http://bit.do/eYLhf?pnCL2XS9

  35. Randhax 4 months ago
    Reply

    Dapoxetina Ou Paroxetina Mail Order Isotretinoin Mastercard

  36. LowellBiC 4 months ago
    Reply

    Forex + Bitcoin = $ 7000 per week: http://dserpenretask.tk/m68e

  37. RichardPiT 4 months ago
    Reply

    Make Money 10000$ Per Day With Bitcoin: http://viconringbos.tk/hkm81

  38. LowellBiC 4 months ago
    Reply

    Get $1500 – $6000 per DAY: http://tingmacpostro.tk/16u6

  39. Marvinpop 4 months ago
    Reply

    If you invested $1,000 in bitcoin in 2011, now you have $4 million: http://snifkaawindpoo.tk/3lr8

  40. Marvinpop 4 months ago
    Reply

    Get $1500 – $6000 per DAY: http://boggrickblacun.tk/jiq4

  41. LowellBiC 4 months ago
    Reply

    Invest in Bitcoin and earn from $ 3000 per day: https://hideuri.com/a7LWW1?&fkiry=bH0N6Op7

  42. RichardPiT 4 months ago
    Reply

    $15,000 a month (30mins “work” lol): http://metanowa.tk/8v9lk

  43. RichardFam 4 months ago
    Reply

    Invest in Bitcoin and earn from $ 3000 per day: https://hideuri.com/KezM9v?pt2BLXU5borT

  44. Marvinpop 4 months ago
    Reply

    Invest in Bitcoin and earn from $ 3000 per day: http://v.ht/2scMxu9?S5yvse0

  45. RichardFam 4 months ago
    Reply

    Earnings on the Bitcoin course from $ 2500 per day: https://hideuri.com/xp4DeN?&cwqii=4yVZA

  46. JamesOvalp 4 months ago
    Reply

    Paid Surveys: Earn $30,000 Or More Per Week: http://chaifuldedy.tk/vacs6

  47. natalielise 4 months ago
    Reply

    Saved as a favorite, I like your blog! pof natalielise

  48. Aarongag 4 months ago
    Reply

    Paid Surveys: Earn $30,000 Or More Per Week: http://v.ht/d1pK3S?&nygmn=9nkEFqLsoanySl

  49. Marvinpop 4 months ago
    Reply

    п»їWhat’s the easiest way to earn $30000 a month: http://v.ht/adA0v?m7YukAQpASjnv6

  50. LowellBiC 3 months ago
    Reply

    Find yourself a girl for the night in your city: http://olspynesmen.tk/fma59?tXiMU5lyNCY

  51. Marvinpop 3 months ago
    Reply

    $200 for 10 mins “work?”: http://enouflacu.tk/2o5bc?&jvufr=7NxSnnamiRC

  52. Marvinpop 3 months ago
    Reply

    Binary options + cryptocurrency = $ 7000 per week: http://steerwarita.tk/ngbv?&tgdhr=1vTIV

  53. Marvinpop 3 months ago
    Reply

    Get $1500 – $6000 per DAY: http://paddreatpolan.tk/1f0y4?lrJMUW0LCT

  54. Marvinpop 3 months ago
    Reply

    Invest in Bitcoin and earn from $ 3000 per day: http://plaserperni.tk/3kdto?gSTTGJchwF

  55. RichardPiT 3 months ago
    Reply

    Where to invest $ 3000 once and receive every month from $ 55000: http://besttilcfindti.tk/0d2x?&gcajo=3QQhDmhGQgC

  56. RichardPiT 3 months ago
    Reply

    Get $1500 – $6000 per DAY: http://laytimenbe.tk/qehs?&auaae=aoVfEt1MK

  57. Robertgrava 3 months ago
    Reply

    If you invested $1,000 in bitcoin in 2011, now you have $4 million: http://ackratpupo.tk/lwdc?&cwawm=EjQRh2

  58. RichardFam 3 months ago
    Reply

    Earn Free Bitcoin 0.2 BTC Per day: http://gerensusi.tk/sx27?&bdskp=yGjct0d

  59. DarrellAcare 3 months ago
    Reply

    LAZY way for $200 in 20 mins: http://alecolmix.tk/f3k0v?rCSRCY7jFMd6

  60. Robertgrava 3 months ago
    Reply

    $200 for 10 mins “work?”: http://felancichin.cf/z9pwi?rBZ1soUStp

  61. PeterCIG 3 months ago
    Reply

    Find yourself a girl for the night in your city: http://cuistifabxi.tk/3zgdv?w1fWSrjT4

  62. PeterCIG 3 months ago
    Reply

    Find yourself a girl for the night in your city: http://geouremeti.tk/7w4i?&luoua=YNcgUHLGQ

  63. Robertgrava 3 months ago
    Reply

    Binary options + cryptocurrency = $ 7000 per week: http://estaisugi.gq/tr1n8?&jyoqi=LQRdg

  64. Timothygoofs 3 months ago
    Reply

    Invest in Bitcoin and earn from $ 3000 per day: http://tantconspleased.tk/mf0o?d0cO9c2thufP

  65. Robertgrava 3 months ago
    Reply

    Paid Surveys: Earn $30,000 Or More Per Week: http://go-4.net/fiRM?&tbcws=EMkI24ZZWdY

  66. Robertgrava 3 months ago
    Reply

    Paid Surveys: Earn $30,000 Or More Per Week: https://hec.su/k31U?&ewpkp=602niRtUnn5cmI

  67. JamesOvalp 3 months ago
    Reply

    Forex + cryptocurrency = $ 9000 per week: http://v.ht/sHzWf?&tgymd=X9iW3tE

  68. JamesOvalp 3 months ago
    Reply

    Get $1000 – $6000 A Day: http://gmy.su/:epkO?8CjqUyouk

  69. DannyLielo 3 months ago
    Reply

    Such dir ein Madchen fur die Nacht in deiner Stadt: http://v.ht/AxWmaI?zhYWnJktM8

  70. Timothygoofs 3 months ago
    Reply

    10 meilleurs sites de rencontres en RF 2019: https://hec.su/lgHq?&cwdcu=BtkDenk

  71. JamesOvalp 2 months ago
    Reply

    Top 5 sites de rencontre occasionnels IE 2019: https://hec.su/laGY?R5XuhdxTZ5v0sk

  72. JamesOvalp 2 months ago
    Reply

    5 meilleures applications de rencontres occasionnelles: http://v.ht/IsQJvc?&tgiwr=FG3HQ8DbRIP3

  73. PeterCIG 2 months ago
    Reply

    $200 for 10 mins “work?”: http://go-4.net/fksl?&aijpe=G2wLTfo9h

  74. Robertgrava 2 months ago
    Reply

    Binary options + Bitcoin = $ 5000 per week: http://v.ht/tHNjzy?&guphi=MkZw2

  75. DarrellAcare 2 months ago
    Reply

    Paid Surveys: Earn $30,000 Or More Per Week: http://go-4.net/fLw2?HPE0hlnfvtx

  76. DarrellAcare 2 months ago
    Reply

    Forex + Bitcoin = $ 7000 per week: http://surpatapppo.cf/pulun?&zppen=8DewcMMoHPxC

  77. Antonioscelm 2 months ago
    Reply

    $15,000 a month (30mins “work” lol): http://v.ht/mxSK4k?&djfvz=DpKHvp

  78. JamesOvalp 2 months ago
    Reply

    Become a bitcoin millionaire. Get from $ 2500 per day: http://v.ht/epOoVw?&cnwen=xdDWyaEvk4I

  79. PeterCIG 2 months ago
    Reply

    Binary options + cryptocurrency = $ 7000 per week: http://v.ht/0lycq?cvDOd

  80. Antonioscelm 2 months ago
    Reply

    Bitcoin Investment Deutschland: https://hec.su/lVLe?&tchgl=2ZlBKBgsuY

  81. Antonioscelm 2 months ago
    Reply

    Wie man € 10.000 pro Tag SCHNELL macht: http://soedwindecka.tk/ca9k?&cnkwb=mHNzOuN5z2

  82. Keithanabs 2 months ago
    Reply

    Forex + Bitcoin = $ 7000 per week: https://hec.su/nvjh?fnNFeINiN

  83. RonaldVed 2 months ago
    Reply

    Paid Surveys: Earn $30,000 Or More Per Week: http://gmy.su/:aaMT?&bztia=ztGEnJX

  84. RonaldVed 2 months ago
    Reply

    Binary options + Bitcoin = $ 5000 per week: https://hec.su/nBWF?vhfmzQ2o0bH5

  85. Keithanabs 2 months ago
    Reply

    Binary options + cryptocurrency = $ 7000 per week: https://hec.su/nC4p?IusEK

  86. Antonioscelm 2 months ago
    Reply

    Buy Essays UK – Cheap & Safe Online Writing Service: https://vk.cc/9OTGak?id=buyessayonline1118xv

  87. PeterCIG 2 months ago
    Reply

    UK Essay Writing Service You Can Rely On: https://vk.cc/9OTGak?id=buyessayonline112737

  88. Antonioscelm 2 months ago
    Reply

    Buy Essay Online At Writing Service from Canada: https://vk.cc/9OTGak?id=buyessayonline1115c5

  89. DarrellAcare 2 months ago
    Reply

    Buy Essay Online, Essay Writing Service, Write My Essay: https://vk.cc/9OTGak?id=buyessayonline1132ia

  90. Antonioscelm 2 months ago
    Reply

    Buy Essay Online | Purchase Essays From Experts: https://vk.cc/9OTGak?id=buyessayonline1112sa

  91. Antonioscelm 2 months ago
    Reply

    Just Buy Essay Online | 100% Safe and Client-Oriented Writing Service: https://vk.cc/9OTGak?id=buyessayonline1118jc

  92. Antonioscelm 2 months ago
    Reply

    Buy Essay Online At Writing Service from UK: https://vk.cc/9OTGak?id=buyessayonline111a29

  93. PeterCIG 2 months ago
    Reply

    Buy an Essay Online for Cheap 24/7 | 100% Original: https://vk.cc/9OTGak?id=buyessayonline112qm1

  94. DarrellAcare 2 months ago
    Reply

    Buy Essay Online – Get Top Grade With Perfect Essay From Us: https://vk.cc/9OTGak?id=buyessayonline1134ie

  95. Robertgrava 2 months ago
    Reply

    USA Essay Writing Service You Can Rely On: https://vk.cc/9OTGak?id=buyessayonline111tdw

  96. PeterCIG 2 months ago
    Reply

    Buy an Essay — Cheap Academic Writing Service Online 24/7: https://vk.cc/9OTGak?id=buyessayonline112gfo

  97. Paid Surveys: Earn $6437 Or More Weekly: http://v.ht/bx3rVfP?oKjctbOyee5v 1 month ago
    Reply

    Invest $ 1413 and get $ 65391 every month: http://rosnaheartkat.tk/k4eh?hz8lIbdi

  98. Forex + Cryptocurrency = $ 7371 per week: http://www.lookweb.it/earnonebtc27923?h=1283 1 month ago
    Reply

    Paid Surveys: Earn $5428 Or More Per Week: https://hideuri.com/aVyOm4?wvTbDxWb

  99. Get yours $914326,98: https://cutt.us/m4FcMcap0 3 weeks ago
    Reply

    You have balance left $914326,98: https://cutt.us/OzziGY

  100. What's the most convenient way to earn $64671 a month: http://lovebyt.es/m4g7 2 weeks ago
    Reply

    Binary options + Cryptocurrency = $ 6293 per week: https://jtbtigers.com/wpgo

  101. ShawnSic 1 week ago
    Reply
  102. WalkerSoake 1 week ago
    Reply

    Binary options + Cryptocurrency = $ 7387 per week: https://ecuadortenisclub.com/earnonebitcoinperday188907

Leave a Comment

Your email address will not be published.

You may also like

Pin It on Pinterest