Data quality: the ugly duckling of big data?

TEKsystems (a subsidiary of Allegis Group, a private talent management firm) conducted a big data survey in 2013 that revealed that 60 percent of IT leaders believed their organizations lacked accountability for data quality, and that more than 50 percent questioned the validity of their own data.
A big data cloud vendor told me in early 2014 that it routinely ran its product on client-uploaded data that was less than clean. To upload data to the cloud for analytics, the client is shown a screen of available data fields, highlights the fields to be analyzed, and presses an Upload button. Data from the input file is then matched against the selected fields, and within minutes the client receives a full set of analytics, complete with summary charts and drill-down capabilities. The client may not get everything requested, however: invariably, some of the selected fields cannot be filled from the data furnished. That gap is symptomatic of data that is not “clean enough” to fully populate the fields an analytics query requests.
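The field-matching step described above can be sketched in a few lines. This is a hypothetical illustration, not the vendor's actual product logic: all field names and the `match_fields` helper are invented for the example, and blank or missing values stand in for "unclean" data.

```python
# Hypothetical sketch: match uploaded records against requested
# analytics fields and report which fields the data cannot fill.
# Field names and the helper are illustrative, not a vendor API.

def match_fields(records, requested_fields):
    """Return (populated, unfillable) field lists for the upload."""
    populated = set()
    for record in records:
        for field in requested_fields:
            value = record.get(field)
            if value not in (None, ""):  # treat blanks as missing/unclean
                populated.add(field)
    unfillable = [f for f in requested_fields if f not in populated]
    return sorted(populated), unfillable

# Two uploaded records with gaps, four fields the client selected.
records = [
    {"customer_id": "C1", "region": "EMEA", "revenue": ""},
    {"customer_id": "C2", "region": "", "revenue": ""},
]
requested = ["customer_id", "region", "revenue", "churn_score"]

filled, missing = match_fields(records, requested)
print("populated:", filled)    # → ['customer_id', 'region']
print("unfillable:", missing)  # → ['revenue', 'churn_score']
```

In this toy run, "revenue" was selected but never populated and "churn_score" never appeared in the upload at all, so both come back unfillable, mirroring the situation the vendor described.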
