Machine learning is a field of computer science where algorithms improve their performance at a certain task as more data are observed.To do so, algorithms select a hypothesis that best explains the data at hand with the hope that the hypothesis would generalize to future (unseen) data. Take the left panel in the figure in the header, the crosses denote the observed data projected in a two-dimensional space — in this case house prices and their corresponding size in square meters. The blue line is the algorithm’s best hypothesis to explain the observed data. It states “there is a linear relationship between the price and size of a house. As the house’s size increases, so does its price in linear increments.” Now using this hypothesis, I can predict the price of an unseen datapoint based on its size. As the dimensions of the data increase, the hypotheses that explain the data become more complex.However, given that we are using a finite sample of observations to learn our hypothesis, finding an adequate hypothesis that generalizes to unseen data is nontrivial. There are three major pitfalls one can fall into that will prevent you from having a generalizable model and hence the conclusions of your hypothesis will be in doubt.