Four cognitive biases that affect big data analysis

Data collection methods have evolved dramatically, especially with the ability to collect big data. Logging information quantitatively for statistical analysis removes much of the human error from collection and improves the validity of the data, which implies that whatever the data is used for will be more reliable. However, cognitive biases can still creep into the analysis of the data, which can call into question the usefulness of the recommendations drawn from it. What are the most common ones, and how do we start tackling them?

Confirmation Bias

Confirmation bias is the tendency to set out to prove a hypothesis and therefore to lean heavily on data that supports it. It skews results because the analysed data doesn’t represent the full picture of the scenario. For example, an analyst may want to show that Twitter users were most engaged with a TV show while it was on air, and may neglect the fact that the greater cumulative engagement occurred in the days afterwards, once viewers had had a chance to digest the episode. The resulting recommendations could lead companies to release show-related online material at the wrong time. One of the most galling examples of confirmation bias occurred around the 2016 US Presidential election, where polling was framed around an expected Clinton win and evidence that pointed the other way was ignored.
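
As a minimal sketch of the TV show example, the snippet below compares the busiest single hour with the cumulative engagement in each window. The tweet counts and the names `tweets_by_hour` and `engagement_by_window` are invented for illustration; they are not real figures.

```python
# Hypothetical tweet counts about one episode, keyed by hours since the
# broadcast started. All numbers are made up for illustration.
tweets_by_hour = {0: 900, 1: 400, 6: 350, 24: 600, 48: 500, 72: 300}


def engagement_by_window(counts, on_air_hours=1):
    """Sum engagement into an on-air window and everything afterwards."""
    totals = {"on_air": 0, "after": 0}
    for hour, count in counts.items():
        window = "on_air" if hour < on_air_hours else "after"
        totals[window] += count
    return totals


totals = engagement_by_window(tweets_by_hour)

# The single busiest hour supports the "on air" hypothesis...
peak_hour = max(tweets_by_hour, key=tweets_by_hour.get)

# ...but the cumulative totals tell a different story.
print("Peak hour:", peak_hour)
print("Totals:", totals)
```

Looking only at the peak hour would confirm the hypothesis, while the cumulative totals show that most of the engagement came later. Checking the figure that could disprove the hypothesis is the point of the exercise.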

Availability Bias

The availability heuristic is one of a number of phenomena that affect everyday decision-making without most people realising it. Essentially, availability bias is the tendency to make decisions based only on the information that is most readily available. For example, an analysis may find that visitors spend time reading a website’s blog, and the business may use this finding to justify developing the blog in order to win sales or returning customers. However, relying on that single finding can cause other factors to be neglected. For instance, the blog could attract plenty of readers but create very little engagement, in which case developing the blog alone would produce no conversions. Value Walk’s article on behavioural finance helping stock market investors includes the image below, which outlines the availability heuristic in simple terms and shows how the perspective needs to shift to take into account all the information available. In this example, the blog would be the small yellow circle and the rest of the website would be the larger blue circle.
[Image: cognitive biases]
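
One way to guard against this is to put the readily available metric next to the one that actually matters. The sketch below assumes hypothetical page-view and conversion counts for three site sections; the section names and numbers are invented for illustration.

```python
# Hypothetical per-section statistics. All numbers are made up for illustration.
sections = {
    "blog": {"views": 5000, "conversions": 10},
    "pricing": {"views": 800, "conversions": 64},
    "product": {"views": 1200, "conversions": 48},
}


def conversion_rates(stats):
    """Return conversions divided by views for each section."""
    return {name: s["conversions"] / s["views"] for name, s in stats.items()}


rates = conversion_rates(sections)

# The metric that is easiest to see: raw traffic.
most_viewed = max(sections, key=lambda name: sections[name]["views"])

# The metric the decision should rest on: conversions per view.
best_converting = max(rates, key=rates.get)

print("Most viewed:", most_viewed)
print("Best converting:", best_converting)
```

With these invented numbers the blog is the most viewed section but converts at the lowest rate, so a decision based on traffic alone would put the effort in the wrong place.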

Selection Bias

Selection bias occurs when the sample the data was collected from is unrepresentative of the population as a whole. Imagine a console game studio has collected data on how long players spend on a game and begins to use this in its game development. The data only covers existing users, and doesn’t take into account factors that might convert a non-user into a fan of the game. For example, a survey of Xbox gamers overestimated the prevalence of the “red ring of death” console fault, because those who had experienced it were more likely to complete the survey.
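
The console fault example can be worked through with simple arithmetic. The sketch below assumes a hypothetical true fault rate and hypothetical response rates for affected and unaffected owners; none of these figures come from the survey mentioned above.

```python
def surveyed_fault_rate(true_rate, respond_if_faulty, respond_if_ok):
    """Expected share of survey respondents who report the fault.

    true_rate:         share of all owners who experienced the fault
    respond_if_faulty: probability that an affected owner answers the survey
    respond_if_ok:     probability that an unaffected owner answers the survey
    """
    faulty_responses = true_rate * respond_if_faulty
    ok_responses = (1 - true_rate) * respond_if_ok
    return faulty_responses / (faulty_responses + ok_responses)


# Assumed figures: 10% of consoles fail, affected owners are five times
# more likely to respond than unaffected owners.
biased = surveyed_fault_rate(0.10, respond_if_faulty=0.50, respond_if_ok=0.10)

# If both groups respond at the same rate, the survey recovers the true rate.
unbiased = surveyed_fault_rate(0.10, respond_if_faulty=0.30, respond_if_ok=0.30)

print(f"Survey estimate with unequal response rates: {biased:.1%}")
print(f"Survey estimate with equal response rates:   {unbiased:.1%}")
```

Under these assumptions the survey would report a fault rate of roughly 36% against a true rate of 10%, purely because of who chose to answer.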

Confounding Variables

One of the most dangerous biases arises when an apparent correlation between two variables is actually driven, in whole or in part, by an overlooked confounding variable. A confounding variable is one whose effect cannot be separated from that of the variables being studied. For example, an analysis may find that a commercial for a children’s theme park, aired during prime time on a children’s channel that is broadcasting a show about the theme park itself, leads to website check-ins. As the data scientist cannot state empirically whether it is the commercial or the TV show that leads to the higher rate of check-ins, the result is affected by a confounding variable. Design the data collection so that the relationship between two variables can be measured without being influenced by anything external.
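
A quick check on the theme park example is to ask whether the data contains any time slot in which one factor appears without the other. The sketch below uses invented observations and hypothetical helper names (`can_separate`, `mean_checkins_by_condition`), and it assumes a simple setting where comparing group averages is meaningful.

```python
from collections import defaultdict

# Hypothetical time slots: (commercial_aired, show_aired, website_checkins).
# All numbers are made up for illustration.
observations = [
    (True, True, 120),
    (True, True, 135),
    (False, False, 40),
    (False, False, 45),
]


def can_separate(rows):
    """True if at least one slot has one factor present without the other."""
    combos = {(commercial, show) for commercial, show, _ in rows}
    return (True, False) in combos or (False, True) in combos


def mean_checkins_by_condition(rows):
    """Average check-ins for each (commercial, show) combination."""
    grouped = defaultdict(list)
    for commercial, show, checkins in rows:
        grouped[(commercial, show)].append(checkins)
    return {combo: sum(v) / len(v) for combo, v in grouped.items()}


print(can_separate(observations))  # commercial and show always co-occur
print(mean_checkins_by_condition(observations))

# Adding slots where only one factor is present makes a comparison possible.
extended = observations + [(True, False, 60), (False, True, 100)]
print(can_separate(extended))
print(mean_checkins_by_condition(extended))
```

In the original observations the commercial and the show always appear together, so the uplift in check-ins cannot be attributed to either one. Scheduling the commercial in slots without the show, or the reverse, is one way to collect data that can tell them apart.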

Data collection can be time-consuming and unruly, but doing it well can pay dividends for a business, especially given the impact of big data. Mitigating these biases helps ensure that the resulting statistical recommendations have a low margin of error.
