**What Is K Nearest Neighbors (KNN)?**

*KNN is one of the simplest machine learning algorithms. It is also a lazy algorithm: it doesn’t run computations on the data set until you give it a new data point to test.*

In this tutorial, I will not only show you how to implement k-Nearest Neighbors in Python (scikit-learn), but also investigate the influence of higher-dimensional spaces on the classification.

The implementation will be specific for a classification problem and will be demonstrated using the digits data set.

**How Does K Nearest Neighbors Work?**

Let’s say you have several apples and oranges, plus one unclassified fruit. If the K value is 3, the algorithm looks at the 3 nearest neighbors of the unknown fruit and classifies it as an orange (as there are two oranges and one apple among the neighbors).

If K is 5, the algorithm looks at the 5 nearest neighbors and classifies the unknown fruit as an apple (3 apples and 2 oranges).

KNN classifies an unknown item by majority vote among its neighbors. Each neighbor can be given an equal weight, or the vote can be weighted by distance. The similarity measure depends on the type of data: for real-valued data, the Euclidean distance can be used; for other types of data, such as categorical or binary data, the Hamming distance can be used. Since there is minimal training involved, there is a high computational cost associated with testing a new data point. I recommend reading Saravanan’s blog to learn more about KNN.
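The fruit example and the majority vote can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not the scikit-learn implementation used later; the point coordinates are made up so that the query has two oranges and one apple as its 3 nearest neighbors.

```python
from collections import Counter
import math

def euclidean(a, b):
    # Euclidean distance, the usual choice for real-valued features
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    # Sort training points by distance to the query and keep the k closest
    neighbors = sorted(zip(train_points, train_labels),
                       key=lambda pl: euclidean(pl[0], query))[:k]
    # Majority vote among the k nearest labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

points = [(1, 1), (2, 1), (5, 5), (6, 5), (6, 6)]
labels = ["orange", "orange", "apple", "apple", "apple"]
print(knn_predict(points, labels, query=(1.5, 1.2), k=3))  # -> orange
print(knn_predict(points, labels, query=(1.5, 1.2), k=5))  # -> apple
```

Notice how the prediction flips between K = 3 and K = 5, exactly as in the fruit example above.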

**Analyzing Digits Data Set**

First, I import all the required Python libraries into my IPython Notebook.

*Seaborn* is a Python library for making attractive statistical graphs; it is built on top of matplotlib. *sklearn.datasets* is used to import the default data sets present in scikit-learn. *sklearn.cross_validation* is used to perform cross-validation on your data set, and *sklearn.grid_search* is used to select the best parameter K (note that in recent scikit-learn versions, both modules have been merged into *sklearn.model_selection*). If you don’t know what parameter selection and cross-validation mean, please watch the week 6 videos of Coursera’s machine learning course. I will explain *sklearn.decomposition* and *sklearn.metrics* later in this post.

I then load the digits data set and store the data and target values in the *X* and *Y* variables. My *X* has 1797 rows and 64 columns, and *Y* has 1797 rows and one column. You can run *print(digits.DESCR)* to learn more about this data set.
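Loading the data set takes only a couple of lines; the shapes printed below match the row and column counts just described (each row is an 8×8 digit image flattened into 64 pixel features):

```python
from sklearn.datasets import load_digits

digits = load_digits()
X, Y = digits.data, digits.target
print(X.shape)  # (1797, 64): 8x8 pixel images flattened to 64 features
print(Y.shape)  # (1797,): one digit label (0-9) per image
```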

**Train-test split and mean normalization**

I split the data set into train and test sets, using 33% of the samples as my test data. I then mean-normalize *X_train* and *X_test*.
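A minimal sketch of this step, using the current *sklearn.model_selection* module (the post’s *sklearn.cross_validation* has since been renamed). The key detail is that the mean is computed from the training data only and then applied to both sets:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, Y_train, Y_test = train_test_split(
    digits.data, digits.target, test_size=0.33, random_state=42)

# Mean-normalize using the training mean only, then apply it to the test set too
mean = X_train.mean(axis=0)
X_train = X_train - mean
X_test = X_test - mean
```

Using the training mean for the test set avoids leaking information from the test data into the preprocessing.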

**Projection Of Principal Components**

I create a scatter plot of the projections onto the first two principal components.

You can see here that I use the *sklearn.decomposition.TruncatedSVD* function to reduce the number of components. It performs linear dimensionality reduction very similar to PCA, but it operates directly on the sample vectors instead of on the covariance matrix.
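A short sketch of the projection step; the resulting two-column array is what gets scattered (colored by digit label) in the plot:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import TruncatedSVD

digits = load_digits()
# TruncatedSVD works on the sample vectors directly, without centering the data
svd = TruncatedSVD(n_components=2)
X_2d = svd.fit_transform(digits.data)
print(X_2d.shape)  # (1797, 2): one 2-D point per digit, ready for a scatter plot
```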

**Cross-Validation To Estimate The Optimal-Value For K**

I am going to do a ten-fold cross-validation to estimate the best K value. Apart from estimating the best K value, I am also interested in the influence of the number of dimensions I project the data down. This means that I am going to optimize K for different dimensional projections of the data.

*compute test function*

*Implementation of K nearest*

You don’t have to panic at the above-mentioned code; I will explain it line by line. In our *implementation of k nearest* section, I set different values for K (from 1 to 20). I then put these K values into a dictionary, because *GridSearchCV* accepts parameter values only as a dictionary.

In the next line of the code, I call the nearest neighbors classifier from scikit-learn: *knearest = sklearn.neighbors.KNeighborsClassifier()*.
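These two steps can be sketched with the current scikit-learn API (where *GridSearchCV* lives in *sklearn.model_selection* rather than the older *sklearn.grid_search*):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Candidate K values from 1 to 20, wrapped in a dict keyed by the parameter name
parameters = {"n_neighbors": list(range(1, 21))}

knearest = KNeighborsClassifier()
clf = GridSearchCV(knearest, parameters, cv=10)
```

Nothing is fitted yet; *clf* just bundles the classifier, the candidate K values, and the cross-validation setting.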

Don’t get confused that I introduce the Iris data set here. In this section, I am going to explain what *GridSearchCV* does using the Iris data set.

First, I load the Iris data set and then perform a train-test split.

*X_train, X_test, Y_train, Y_test = sklearn.cross_validation.train_test_split(X, Y, test_size = 0.33, random_state = 42)*

If you are really curious about *random_state*, read this Stack Overflow thread.

Then I fit the nearest neighbors classifier to my data set.

*clf = sklearn.grid_search.GridSearchCV(knn, parameters, cv=10)*: here I pass my nearest neighbors classifier, the parameters, and the cross-validation value to *GridSearchCV*. Even if you don’t understand what cross-validation is or what *GridSearchCV* does, don’t worry about it; it just selects the best parameter K for you. That is all you have to know about *GridSearchCV*.

You can see that *GridSearchCV* does all the hard work for you and returns the best K parameter.
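Putting the Iris walkthrough together (again using the current *sklearn.model_selection* module in place of the older *sklearn.cross_validation* and *sklearn.grid_search*):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

# Grid-search K from 1 to 20 with 10-fold cross-validation on the training set
parameters = {"n_neighbors": list(range(1, 21))}
clf = GridSearchCV(KNeighborsClassifier(), parameters, cv=10)
clf.fit(X_train, Y_train)

print(clf.best_params_)  # the K value with the best cross-validation score
```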

**Explaining The Effect Of Dimensions In KNearest Neighbors**

OK! Let me continue explaining the code for my digits data set.

First I create two empty lists and a list containing numbers from 1 to 10.

*accuracy = []*

*params = []*

*no_of_dimensions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]*

I then loop over *no_of_dimensions* using a for loop (*for d in no_of_dimensions*).

Then I call *TruncatedSVD* from Scikit-Learn:

*svd = sklearn.decomposition.TruncatedSVD(n_components=d)*

Then I fit the SVD to my training data (*X_train*) and apply its transform method to my test data.

*if d < 64:*

*    X_fit = svd.fit_transform(X_train)*

*    X_fit_atest = svd.transform(X_test)*

Now I fit my classifier to the truncated *X_fit* and *Y_train*. When you fit your classifier to the data set, remember to use *X_fit* instead of *X_train*.

*clf.fit(X_fit, Y_train)*
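The loop above can be sketched end to end as follows. To keep the sketch quick I shorten the ranges (two projection dimensions instead of ten, K from 1 to 10, 5 folds); the structure is the same, and scoring on the held-out test set stands in for the post’s *compute_test* helper:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, Y_train, Y_test = train_test_split(
    digits.data, digits.target, test_size=0.33, random_state=42)

parameters = {"n_neighbors": list(range(1, 11))}
accuracy, params = [], []

for d in [2, 8]:  # shortened from range(1, 11) for speed
    svd = TruncatedSVD(n_components=d)
    X_fit = svd.fit_transform(X_train)    # fit the projection on training data only
    X_fit_atest = svd.transform(X_test)   # reuse the same projection on test data

    clf = GridSearchCV(KNeighborsClassifier(), parameters, cv=5)
    clf.fit(X_fit, Y_train)               # fit on the truncated X_fit, not X_train

    params.append(clf.best_params_["n_neighbors"])
    accuracy.append(clf.score(X_fit_atest, Y_test))
```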

**Understanding Accuracy Scores**

In the line of code *accuracy.append(compute_test(x_test = X_fit_atest, y_test = Y_test, clf = clf, cv = 10))*, I compute the accuracy score for every dimension using the *compute_test* function. In *compute_test*, *sklearn.cross_validation.KFold* gives the indices to do a 10-fold cross-validation split. I then calculate the accuracy score for *X_fit_atest* and *Y_test*.
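Based on that description, *compute_test* can be sketched as below (using the current *sklearn.model_selection.KFold*, which replaced *sklearn.cross_validation.KFold*). This is my reconstruction of the helper, not the post’s exact code: it scores the fitted classifier on each of the *cv* folds of the test data and averages the results.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

def compute_test(x_test, y_test, clf, cv=10):
    # Average the fitted classifier's accuracy over cv folds of the test data
    scores = []
    for _, idx in KFold(n_splits=cv).split(x_test):
        scores.append(clf.score(x_test[idx], y_test[idx]))
    return np.mean(scores)

digits = load_digits()
X_train, X_test, Y_train, Y_test = train_test_split(
    digits.data, digits.target, test_size=0.33, random_state=42)
clf = KNeighborsClassifier().fit(X_train, Y_train)
acc = compute_test(X_test, Y_test, clf, cv=10)
print(acc)  # mean accuracy across the 10 folds
```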

**Conclusion**

The accuracy gets better as the number of dimensions increases. I have enough data points that the curse of dimensionality does not harm my predictions here, and the additional dimensions add to the class separability.

I hope this post has given you a good idea of how k nearest neighbors operates, and how the dimensionality of the data affects your classification accuracy.
