8 common pitfalls that can ruin your prediction

31st May ’18, 11:01 AM in Data Science

Brandon Stanley, Contributor

Do you remember that feeling when you plan everything very precisely, but something unexpected happens and ruins your plans? It’s always an awkward situation, and it can also be a costly mistake if it affects your business.

Poor estimates are not uncommon in big data. According to research, more than 80% of companies are trying to be data-driven, but only a third say they do it successfully. The huge volumes of information that keep piling up can be a genuine riddle for many business analysts.

In this article, I will briefly explain 8 common pitfalls that can ruin your predictions.

1. Lacking a Business Case

Big data analysis can draw meaningful conclusions out of seemingly unrelated information, but you still need a concrete business case to make use of the results. This is the only way to make big data truly applicable. For instance, simply analyzing brand awareness on social media is not a business case.

Instead, you need to use big data to improve brand image by setting clear parameters such as direct and indirect influence, geolocation, engagement, etc. Once you detect followers’ behavioral patterns, you can adjust your social media strategy to increase brand awareness.

2. Poor Data Quality

The outcome of big data analysis depends on the quality of the information you feed into it. This is particularly true of unstructured and semi-structured data, because they need to be pre-processed before analysis. Business intelligence managers at Rushmyessay UK explained that you should filter textual information through language correction libraries to polish the content. The quality of image and video data is largely determined at the source, so you have less room to improve it afterwards. Either way, you always need quality data to generate accurate results.
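To make the pre-processing step more concrete, here is a minimal sketch of a text clean-up pass that uses only the Python standard library. It is not a language correction library; it only normalizes characters and whitespace and drops empty or duplicate records. The function name `clean_text_records` and the sample records are hypothetical.

```python
import re
import unicodedata


def clean_text_records(records):
    """Normalize raw text records and drop empty or duplicate entries."""
    cleaned = []
    seen = set()
    for raw in records:
        if raw is None:
            continue  # missing value, nothing to clean
        # Fold compatibility characters (e.g. non-breaking spaces) into plain ones.
        text = unicodedata.normalize("NFKC", raw)
        # Collapse runs of whitespace and trim the ends.
        text = re.sub(r"\s+", " ", text).strip()
        key = text.casefold()  # case-insensitive duplicate check
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


# Hypothetical customer comments, made up for illustration.
sample = ["  Great   product! ", "great product!", "", None, "Delivery was\u00a0late"]
print(clean_text_records(sample))
```

A real pipeline would likely add spelling correction and language detection on top of this, but even a basic pass like this removes noise that would otherwise distort counts and averages.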

3. Ignoring the Data Lifecycle

Timing plays a key role in comparative analytics, but many predictions go terribly wrong because they don’t take the data lifecycle into account. Let’s say you started importing a product in April 2017, so there are no sell-in figures for the first quarter of that year. If your import prediction for Q1 2018 equals zero, you’ve made a big mistake.

The missing quarter only means you should add more indicators to the research to come up with a more accurate estimate. For example, you could compare this product’s sellout with similar items you already had in your portfolio. Such data lifecycle awareness will lead you to a completely different outcome.
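The idea of borrowing information from similar items can be sketched as follows. This is only one possible approach, under the assumption that the new product follows the same seasonal pattern as the reference items. All figures and names (`new_product`, `similar_items`, `estimate_missing_quarter`) are made up for illustration.

```python
from statistics import mean

# Hypothetical 2017 sell-in units per quarter. None means the product
# was not yet in the portfolio during that quarter.
new_product = {"Q1": None, "Q2": 900, "Q3": 1200, "Q4": 1500}
similar_items = [
    {"Q1": 800, "Q2": 1000, "Q3": 1300, "Q4": 1700},
    {"Q1": 400, "Q2": 500, "Q3": 700, "Q4": 800},
]


def estimate_missing_quarter(product, references, quarter):
    """Estimate a missing quarter from the seasonal pattern of similar items."""
    observed = [q for q, units in product.items() if units is not None]
    # For each reference item: size of the missing quarter relative to
    # the quarters that were actually observed for the new product.
    ratios = [ref[quarter] / sum(ref[q] for q in observed) for ref in references]
    return mean(ratios) * sum(product[q] for q in observed)


naive_q1 = new_product["Q1"] or 0  # treats "no data" as "no demand"
adjusted_q1 = estimate_missing_quarter(new_product, similar_items, "Q1")

print(f"Naive Q1 baseline:    {naive_q1}")
print(f"Adjusted Q1 baseline: {adjusted_q1:.0f}")
```

The naive baseline stays at zero, while the adjusted one gives you a non-zero starting point for the Q1 2018 forecast.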

4. False Aggregations

When creating complex forecasts, you will often need to account for the individual events that make up a larger phenomenon. Some analysts overlook this and collapse those events into a single figure, which is the wrong way to analyze multilevel processes. If the first phase of an event is likely to occur in February and the last in October, that does not mean the process will end in June. There is no meaningful in-between result, so avoid this kind of false aggregation.

5. Overfitting

While some companies create forecasts based on low-quality data, others make the mistake of overfitting. They add various highly specific indicators to the formula, but still expect to obtain a useful general prediction.

To put it simply, a good prediction would be to say that the Cleveland Cavaliers win if LeBron James scores more than 40 points. On the other hand, overfitting happens if you claim that the Cavs always win when:

- James scores 41-43 points
- The number of spectators is over 16 thousand
- The opponent ranks 3rd in the Western Conference
- The number of fouls does not drop below 31
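The difference between the two rules shows up once you test them on games they have not seen. The sketch below uses synthetic, made-up game records (not real NBA results) and leaves out the opponent ranking for brevity. Both rules match the past games perfectly, but only the simple one holds up on new games.

```python
# Synthetic records for illustration only: (points, spectators, fouls, won)
past_games = [
    (42, 17000, 33, True),
    (41, 18500, 35, True),
    (43, 20000, 31, True),
    (38, 19000, 30, False),
    (35, 16500, 32, False),
    (40, 17500, 34, False),
]
new_games = [
    (46, 19500, 27, True),
    (44, 15500, 30, True),
    (39, 20500, 34, False),
    (47, 14000, 36, True),
    (42, 18000, 32, False),
]


def simple_rule(points, spectators, fouls):
    """General rule: a win is predicted when the star scores more than 40."""
    return points > 40


def overfitted_rule(points, spectators, fouls):
    """Highly specific rule tailored to the past games."""
    return 41 <= points <= 43 and spectators > 16000 and fouls >= 31


def accuracy(rule, games):
    """Share of games where the rule's prediction matches the actual result."""
    hits = sum(rule(p, s, f) == won for p, s, f, won in games)
    return hits / len(games)


for name, rule in [("simple", simple_rule), ("overfitted", overfitted_rule)]:
    print(name, accuracy(rule, past_games), accuracy(rule, new_games))
```

Checking every model against held-out data like this is the simplest way to notice that the extra conditions describe the past rather than predict the future.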

6. Forecasting What You Cannot Measure

Big data works with huge volumes of information, but that doesn’t mean you can extrapolate everything using the same formula. On the contrary, you can only create forecasts based on measurable indicators. If you have to design daily transportation and delivery plans, the setup is completely different from the one for weekly predictions. If you rely on weekly projections instead, your day-to-day planning will probably end up chaotic. For a related example, read about the odds of foretelling rains and why monsoon prediction is hard.
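Here is a small sketch of why a weekly figure alone is not enough for daily planning. It assumes you have measured deliveries per weekday in the past. The history, the weekly forecast of 700, and the function names are all hypothetical.

```python
# Hypothetical deliveries per weekday (Mon..Sun) over two past weeks.
history = [
    [120, 100, 100, 110, 170, 60, 40],
    [130, 110, 90, 100, 190, 50, 30],
]
weekly_forecast = 700  # assumed output of a weekly model


def even_split(total, days=7):
    """Daily plan derived from the weekly figure alone."""
    return [total / days] * days


def weekday_profile_split(total, past_weeks):
    """Daily plan that uses the measured share of each weekday."""
    day_totals = [sum(day) for day in zip(*past_weeks)]
    grand_total = sum(day_totals)
    return [total * day_total / grand_total for day_total in day_totals]


naive_plan = even_split(weekly_forecast)
measured_plan = weekday_profile_split(weekly_forecast, history)

for day, naive, measured in zip(
    ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], naive_plan, measured_plan
):
    print(f"{day}: even split {naive:.0f}, weekday profile {measured:.0f}")
```

Both plans add up to the same weekly total, but the even split would leave you short of capacity on the busiest day and overstaffed on the weekend.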

7. Underestimating Data Complexity

Data pile up in different formats, leaving many analysts confused and unprepared. For instance, you might want to analyze the social impact of your brand on Twitter and LinkedIn in particular. The two platforms are very different in nature: tweets are capped at 280 characters (140 until late 2017), while LinkedIn posts are usually much longer and more descriptive. Each data set demands a different combination of processing steps, so you must adapt your pipeline to obtain comparable results for both networks.

8. Not Measuring Big Data Efficiency

Big data is not perfect, and you need to measure its efficiency. First, measurement helps you understand the accuracy of your prediction model. Second, your business keeps changing, so you will have to adapt your big data techniques at some point. Third, if your forecasts turn out to be either very inaccurate or suspiciously precise, something is probably wrong with the model, so you should find and fix the error.
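One simple way to track forecast accuracy is the mean absolute percentage error (MAPE). The sketch below also flags results that look too good or too bad. The thresholds of 1% and 30% are arbitrary assumptions that you would need to tune for your own business, and the sample figures are made up.

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent, skipping periods where the actual value is zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("need at least one non-zero actual value")
    return 100 * sum(errors) / len(errors)


def review_forecast(actual, predicted, too_good=1.0, too_bad=30.0):
    """Return the MAPE and a verdict based on assumed thresholds."""
    mape = mean_absolute_percentage_error(actual, predicted)
    if mape < too_good:
        return mape, "suspiciously precise: check for leaked or duplicated data"
    if mape > too_bad:
        return mape, "too inaccurate: revisit the inputs and the model"
    return mape, "acceptable"


# Hypothetical monthly sales versus what the model predicted.
actual_sales = [100, 200, 400]
forecast_sales = [110, 180, 400]

mape, verdict = review_forecast(actual_sales, forecast_sales)
print(f"MAPE: {mape:.2f}% ({verdict})")
```

Running a check like this after every forecasting cycle gives you a history of model accuracy, which makes it easier to notice when the business has changed and the model needs an update.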


Big data has the potential to give a fresh boost to your business, but it can also damage it if you make false predictions. Many data analysts make mistakes while designing plans and projections, so you need to be aware of the most frequent ones.

In this post, we showed you 8 common pitfalls that can ruin your predictions. Have you faced any of these problems already? Do you know other examples of false big data estimates? Share your experiences in the comments and we’ll be glad to discuss them!