
How to Run Linear Regression in Python scikit-learn

You know that linear regression is a popular technique, and you may well have seen its mathematical equation. But do you know how to implement linear regression in Python? If so, don’t read this post, because it is all about implementing linear regression in Python. There are several ways to do that: you can use numpy, scipy, statsmodels or scikit-learn. In this post I am going to use scikit-learn.


Scikit-learn is a powerful Python module for machine learning. It contains functions for regression, classification, clustering, model selection and dimensionality reduction. Today, I will explore the sklearn.linear_model module, which contains “methods intended for regression in which the target value is expected to be a linear combination of the input variables”.

In this post, I will use the Boston Housing data set, which contains information about housing values in the suburbs of Boston. This data set was originally taken from the StatLib library, which is maintained at Carnegie Mellon University, and is now available on the UCI Machine Learning Repository. The UCI Machine Learning Repository contains many interesting data sets, and I encourage you to go through it.

So come on, let's have fun with linear regression.

Exploring Boston Housing Data Set

The first step is to import the required Python libraries into the IPython Notebook.

This data set ships with the sklearn Python module in versions before 1.2 (load_boston was removed in scikit-learn 1.2), so I will access it using scikit-learn. I am going to import the Boston data set into the IPython notebook and store it in a variable called boston.


The object boston is dictionary-like, so you can explore its keys and the shape of its data.

I am going to print the feature names of the Boston data set.

I will look at the description of this data set to learn more about it. The data set has 506 instances (rows) and 13 attributes or parameters (columns). The goal of this exercise is to predict the housing prices in the Boston region using the given features.
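The exploration steps above can be sketched as follows. Because load_boston is no longer available in scikit-learn 1.2 and later, this sketch assumes the bundled diabetes data set (load_diabetes) as a stand-in. It exposes the same dictionary-like interface, but its shape and feature names differ from the Boston data.

```python
from sklearn.datasets import load_diabetes

# Stand-in for the Boston data (assumption): load_diabetes returns the
# same kind of dictionary-like object that load_boston used to return.
data = load_diabetes()

print(data.keys())          # available keys, e.g. 'data', 'target', 'DESCR'
print(data.data.shape)      # (rows, columns) of the feature matrix
print(data.feature_names)   # names of the feature columns
print(data.DESCR[:300])     # beginning of the data set description
```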


I am going to convert the data into a pandas data frame called bos.

At first the column names are just numbers, so I am going to replace those numbers with the feature names.

boston.target contains the housing prices.

I am going to add these target prices to the bos data frame.
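A minimal sketch of the three data frame steps above (convert, rename columns, add the target). It again assumes the bundled diabetes data set as a stand-in for the Boston data; the column name PRICE is kept only to mirror this post.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

data = load_diabetes()             # stand-in for the Boston data (assumption)

bos = pd.DataFrame(data.data)      # columns are 0, 1, 2, ... at this point
bos.columns = data.feature_names   # replace the numbers with feature names
bos["PRICE"] = data.target         # add the target values as a new column

print(bos.head())
```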


Scikit Learn

In this section I am going to fit a linear regression model and predict the Boston housing prices. I will use the least squares method as the way to estimate the coefficients.

Y = Boston housing price (also called “target” data in Python)

X = all the other features (or independent variables)

First, I am going to import linear regression from the scikit-learn module. Then I am going to drop the price column, as I want only the parameters as my X values. I am going to store the linear regression object in a variable called lm.

If you want to look inside the linear regression object, you can do so by typing LinearRegression. and then pressing the <tab> key. This will give a list of the functions available inside the linear regression object.


Important functions to keep in mind while fitting a linear regression model are:

lm.fit() -> Fits a linear model

lm.predict() -> Predicts Y using the linear model with estimated coefficients

lm.score() -> Returns the coefficient of determination (R^2), a measure of how well observed outcomes are replicated by the model, as the proportion of total variation of outcomes explained by the model

You can also explore the functions inside the lm object by typing lm. and pressing <tab>.

lm.coef_ gives the estimated coefficients and lm.intercept_ gives the estimated intercept.


Fitting a Linear Model

I am going to use all 13 parameters to fit a linear regression model. Two other parameters that you can pass to the linear regression object are fit_intercept and normalize (normalize was removed in scikit-learn 1.2).

In [20]: lm.fit(X, bos.PRICE)

Out[20]: LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
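The setup and fit described above can be sketched end to end as follows. This is an illustration under assumptions, not the exact notebook code: it uses the bundled diabetes data set as a stand-in for the Boston data and relies on the default fit_intercept=True.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()                      # stand-in data (assumption)
bos = pd.DataFrame(data.data, columns=data.feature_names)
bos["PRICE"] = data.target                  # target column, named as in the post

X = bos.drop("PRICE", axis=1)               # keep only the features
lm = LinearRegression()                     # fit_intercept=True by default
lm.fit(X, bos.PRICE)                        # ordinary least squares fit

print(lm.score(X, bos.PRICE))               # R^2 on the data used for fitting
```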

I am going to print the intercept and the number of coefficients.

I then construct a data frame that contains the features and their estimated coefficients.
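One way to inspect the fitted model is sketched below. It assumes the bundled diabetes data set as a stand-in for the Boston data, and the names coef_table, features and estimatedCoefficients are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()                      # stand-in data (assumption)
X = pd.DataFrame(data.data, columns=data.feature_names)
lm = LinearRegression().fit(X, data.target)

print("Estimated intercept:", lm.intercept_)
print("Number of coefficients:", len(lm.coef_))

# Pair each feature name with its estimated coefficient.
coef_table = pd.DataFrame(
    {"features": X.columns, "estimatedCoefficients": lm.coef_}
)
print(coef_table)
```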

As you can see from the data frame, the coefficient of RM (the average number of rooms per dwelling) suggests a strong positive relationship between RM and prices. Let's draw a scatter plot of the true housing prices against the true RM.

[Figure: Relationship between RM and Price]

As you can see, there is a positive correlation between RM and housing prices.
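The visual impression from a scatter plot can also be checked numerically with the Pearson correlation coefficient. The sketch below assumes the bundled diabetes data set as a stand-in, with its bmi column playing the role that RM plays in the Boston data.

```python
import numpy as np
from sklearn.datasets import load_diabetes

data = load_diabetes()                             # stand-in data (assumption)
col = list(data.feature_names).index("bmi")        # bmi stands in for RM
feature = data.data[:, col]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the correlation between the feature and the target.
r = np.corrcoef(feature, data.target)[0, 1]
print(r)
```

A value above zero indicates a positive linear relationship; values closer to 1 indicate a stronger one.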

Predicting Prices

I am going to calculate the predicted prices (Ŷ_i) using lm.predict. Then I display the first 5 predicted housing prices.
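A short sketch of the prediction step, assuming the bundled diabetes data set as a stand-in for the Boston data:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()                      # stand-in data (assumption)
X, y = data.data, data.target

lm = LinearRegression().fit(X, y)
predicted = lm.predict(X)                   # one prediction per row of X

print(predicted[0:5])                       # first 5 predicted values
```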

Then I draw a scatter plot to compare the true prices and the predicted prices.

[Figure: Prices vs predicted prices]

Notice that the prediction error grows as the housing prices increase.

Let's calculate the mean squared error.

But if you fit a linear regression on just one feature, the error will be much higher. Let's take the feature ‘PTRATIO’ and calculate the mean squared error.


The mean squared error has increased. So this shows that a single feature is not a good predictor of housing prices.
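The comparison between the full model and a single-feature model can be sketched like this. It assumes the bundled diabetes data set as a stand-in, with its third column (bmi) taking the place of PTRATIO. Note that scikit-learn expects a two-dimensional feature array even when there is only one feature.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()                      # stand-in data (assumption)
X, y = data.data, data.target

# Model using all features.
lm = LinearRegression().fit(X, y)
mse_full = np.mean((y - lm.predict(X)) ** 2)

# Model using one feature; X[:, [2]] keeps the array two-dimensional.
X_one = X[:, [2]]
lm_one = LinearRegression().fit(X_one, y)
mse_one = np.mean((y - lm_one.predict(X_one)) ** 2)

print("MSE, all features:", mse_full)
print("MSE, one feature: ", mse_one)
```

When both models are evaluated on the data they were fitted to, the single-feature error cannot be lower than the full-model error, because the full model contains the single-feature model as a special case.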

Training and validation data sets

In practice you won't fit linear regression on the entire data set. You will have to split the data into training and test data sets, so that you train your model on the training data and see how well it performs on the test data.

How not to do train-test split:

You can create training and test data sets manually, but this is not the right way to do it, because you may be training your model on less expensive houses and testing it on expensive houses.

How to do train-test split:

You have to divide your data set randomly. Scikit-learn provides a function called train_test_split to do this.
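The two approaches can be contrasted in a short sketch. It assumes the bundled diabetes data set as a stand-in for the Boston data; the 50-row cut-off, test_size=0.33 and random_state=5 are illustrative values. In current scikit-learn versions train_test_split lives in sklearn.model_selection.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

data = load_diabetes()                      # stand-in data (assumption)
X, Y = data.data, data.target

# Manual split: the last 50 rows become the test set. If the rows are
# ordered in any way, train and test sets will not be comparable.
X_train_manual, X_test_manual = X[:-50], X[-50:]
Y_train_manual, Y_test_manual = Y[:-50], Y[-50:]

# Random split: rows are shuffled before being divided.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=5
)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```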

I am going to build a linear regression model using my training and test data sets.
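A minimal sketch of this step: the model is fitted on the training data only and then used to predict both sets. It assumes the bundled diabetes data set as a stand-in, and the split parameters are illustrative.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = load_diabetes()                      # stand-in data (assumption)
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.33, random_state=5
)

lm = LinearRegression()
lm.fit(X_train, Y_train)                    # the test data is never used for fitting

pred_train = lm.predict(X_train)            # predictions on data the model has seen
pred_test = lm.predict(X_test)              # predictions on held-out data
```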

Then I calculate the mean squared error for the training and test data.

print("Fit a model X_train, and calculate MSE with Y_train:", np.mean((Y_train - lm.predict(X_train)) ** 2))

print("Fit a model X_train, and calculate MSE with X_test, Y_test:", np.mean((Y_test - lm.predict(X_test)) ** 2))

Fit a model X_train, and calculate MSE with Y_train: 19.5467584735
Fit a model X_train, and calculate MSE with X_test, Y_test: 28.5413672756

Residual Plots

Residual plots are a good way to visualize the errors of your model. If you have done a good job, the residuals should be randomly scattered around the zero line. If you see structure in the residuals, your model is not capturing something. Maybe there is an interaction between two variables that you are not considering, or maybe you are measuring time-dependent data. If you see some structure, you should go back to your model and check whether you are doing a good job with your parameters.

[Figure: Residual plot]
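A residual plot along these lines can be sketched with matplotlib. This is an illustration under assumptions: it uses the bundled diabetes data set as a stand-in, illustrative split parameters, and a hypothetical output file name. The Agg backend is selected only so the script runs without a display; in a notebook you would skip that line and call plt.show() instead of plt.savefig().

```python
import matplotlib
matplotlib.use("Agg")                       # non-interactive backend (assumption)
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = load_diabetes()                      # stand-in data (assumption)
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.33, random_state=5
)
lm = LinearRegression().fit(X_train, Y_train)

pred_train = lm.predict(X_train)
pred_test = lm.predict(X_test)
# Residuals as predicted minus observed; the sign convention is a choice.
resid_train = pred_train - Y_train
resid_test = pred_test - Y_test

plt.scatter(pred_train, resid_train, c="b", s=40, alpha=0.5, label="training data")
plt.scatter(pred_test, resid_test, c="g", s=40, label="test data")
plt.axhline(y=0, color="k")                 # the zero line
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.title("Residual plot using training (blue) and test (green) data")
plt.legend()
plt.savefig("residual_plot.png")            # hypothetical file name
```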


To recap what I have done so far:

  1. I explored the Boston data set and then renamed its columns.
  2. I explored the Boston data set using .DESCR; my goal was to predict the housing prices using the given features.
  3. I used scikit-learn to fit linear regression to the entire data set and calculated the mean squared error.
  4. I made a train-test split and calculated the mean squared error for my training data and test data.
  5. I then plotted the residuals for my training and test data sets.