Machine Learning

Three mistakes to avoid if you are new to Machine Learning

09th Oct `14, 04:26 PM in Machine Learning

Machine learning is a field of computer science where algorithms improve their performance at a certain task as…

BDMS
Guest Contributor
 

Machine learning is a field of computer science where algorithms improve their performance at a certain task as more data are observed.To do so, algorithms select a hypothesis that best explains the data at hand with the hope that the hypothesis would generalize to future (unseen) data. Take the left panel in the figure in the header, the crosses denote the observed data projected in a two-dimensional space — in this case house prices and their corresponding size in square meters. The blue line is the algorithm’s best hypothesis to explain the observed data. It states “there is a linear relationship between the price and size of a house. As the house’s size increases, so does its price in linear increments.” Now using this hypothesis, I can predict the price of an unseen datapoint based on its size. As the dimensions of the data increase, the hypotheses that explain the data become more complex.However, given that we are using a finite sample of observations to learn our hypothesis, finding an adequate hypothesis that generalizes to unseen data is nontrivial. There are three major pitfalls one can fall into that will prevent you from having a generalizable model and hence the conclusions of your hypothesis will be in doubt.

Occam’s Razor

Occam’s razor is a principle attributed to William of Occam a 14th century philosopher. Occam’s razor advocates for choosing the simplest hypothesis that explains your data, yet no simpler. While this notion is simple and elegant, it is often misunderstood to mean that we must select the simplest hypothesis possible regardless of performance.

Read More
MORE FROM BIG DATA MADE SIMPLE