The field of data analytics is constantly evolving. The idea of data playing a central role across the modern organization would have seemed far-fetched just a few years ago. In this article, I will provide a quick rundown of the current state of data and suggest that the next big development will be the proliferation of streaming data.
A Brief History of Business Data
Prior to the digital age, the term ‘business data’ primarily referred to financial records. Indeed, before there were data analysts, there were accountants and finance professionals who would pore over profit-and-loss statements, create forecasts and crunch the numbers to produce reports for managers, board members or shareholders.
The explosive growth of the world wide web and digital systems around the turn of the millennium created a new wealth of business data. Software tools were being embedded in diverse business processes: customer relationship management (CRM), helpdesk, enterprise resource planning (ERP) and web analytics, to name a few. These tools generated data, which could then be processed and explored in order to extract insights and improve those processes.
Along with the growth in data came a broad host of tools, techniques, and proficiencies for data analysis. These were mostly centered around relational databases – large stores of structured data that can be queried using SQL. From these core technologies emerged hundreds of specialized tools for storing, processing and visualizing data.
In the past decade or so, business data has once again been evolving and changing shape. While the sources we mentioned above have become a critical part of modern organizations’ decision-making, they are only part of the picture. More and more resources are now being devoted to streaming data. Let’s examine this concept and explain its centrality to the future of data and analytics.
From Tables to Streams
While digital transformation has introduced a diverse set of new data sources, each with its own quirks and challenges, there is a common theme connecting them: for the most part, business data is tabular and structured.
Whether it’s financial statements, salesperson performance stats or support ticket resolution times, business data would typically be generated by a handful of sources and easily represented in spreadsheets and relational databases. As data accumulates, rows, columns, and tables are added to represent historical information or additional business processes.
An example of tabular data. Source: Wikipedia
Streaming data, on the other hand, follows a different set of rules. As the name implies, it is generated by a stream of events that occur continuously. While each of these events is small in size, they are created constantly and quickly accumulate into a massive amount of data. Furthermore, this data is not tabular: it is unstructured, or semi-structured at best, typically landing in thousands or millions of JSON files which might look something like this:
Example of a JSON file, via Wikipedia
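As a rough illustration of why such events do not map neatly onto rows and columns, here is a minimal Python sketch. The events and all of their field names (`event`, `user_id`, `properties`, and so on) are invented for this example and do not come from any particular system.

```python
import json

# Three hypothetical events, one JSON object per line.
# All field names and values are illustrative assumptions.
RAW_EVENTS = """\
{"event": "page_view", "user_id": 17, "ts": "2019-03-01T12:00:00Z", "properties": {"url": "/pricing"}}
{"event": "click", "user_id": 17, "ts": "2019-03-01T12:00:04Z", "properties": {"button": "signup", "position": {"x": 310, "y": 88}}}
{"event": "sensor_reading", "device_id": "pump-4", "ts": "2019-03-01T12:00:05Z", "temperature_c": 71.5}
"""


def parse_events(raw: str) -> list[dict]:
    """Parse newline-delimited JSON into a list of dictionaries."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def top_level_keys(events: list[dict]) -> list[set[str]]:
    """Return the set of top-level keys for each event."""
    return [set(event) for event in events]


events = parse_events(RAW_EVENTS)
for event, keys in zip(events, top_level_keys(events)):
    # Each event can carry a different set of fields, some of them nested.
    print(event["event"], sorted(keys))
```

Each event carries its own set of fields, and some of those fields are nested objects, so there is no single fixed list of columns that fits every record.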
Growth in Streaming Data driven by SaaS, Connected Devices and Machine Learning
A few years ago, very few organizations were working with streaming data, but that number is on the rise:
Source: Google Trends
Three large-scale technological trends are driving increased interest in streaming data:
Software and SaaS
Marc Andreessen’s 2011 Wall Street Journal essay famously proclaimed that “software is eating the world”, and that trend has by no means reversed. Today’s software is no longer limited to Silicon Valley – enterprises in every industry, from retail to banking, are developing software tools and applications to improve internal processes or provide better service to their customers.
With software comes continuous streaming data: server logs, clickstreams, and granular usage statistics. And in an age where every large company is also a software company, each of them will eventually amass a large volume of streaming data.
IoT and connected devices
While some of the hype around the internet of things has died down, the technology itself has achieved mainstream adoption in industries such as transport, energy, and manufacturing. Modern machinery, power plants, and infrastructure are packed with sensors that produce an endless stream of data.
Enterprises in these verticals are only now beginning to truly create value from machine data, but as measurement and analysis tools grow in sophistication, this trend is likely to expand significantly.
Artificial intelligence and machine learning
Many applications of neural networks, deep learning and predictive decision-making algorithms rely on processing event data at large scale, identifying trends and outliers among thousands or millions of similar data events.
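To make the idea of spotting outliers in a stream concrete, here is a minimal Python sketch. It is not a neural network or any specific product's method; it swaps in a much simpler technique, a running z-score computed with Welford's online algorithm, so that each reading is processed once as it arrives. The readings, the threshold and the warm-up length are all assumptions chosen for illustration.

```python
import math


class RunningStats:
    """Running mean and variance via Welford's online algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def find_outliers(values, threshold=3.0, warmup=5):
    """Yield (index, value) for readings far from the running mean.

    Each reading is compared against the statistics of the readings seen
    before it, and is then folded into those statistics.
    """
    stats = RunningStats()
    for index, value in enumerate(values):
        if stats.count >= warmup and stats.std > 0:
            if abs(value - stats.mean) > threshold * stats.std:
                yield index, value
        stats.update(value)


# Hypothetical temperature readings with one obvious spike.
readings = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 95.0, 70.1]
outliers = list(find_outliers(readings))
print(outliers)  # [(6, 95.0)]
```

Because the statistics are updated incrementally, the sketch never needs to hold the full history in memory, which is the property that matters when events arrive continuously.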
While here too there is plenty of hype to go around, few experts would disagree that these technologies are going to play a major part in both industry and science in the coming decade. As AI and ML enter the mainstream, we are likely to see increasing demand for tools and skilled personnel for capturing, processing and structuring streaming data (hence the oft-cited data scientist shortage).
Future-proofing your Data Operations
The trends we just described are likely to either continue their current growth trajectory or significantly accelerate – which is why forward-thinking organizations should already be incorporating streaming data into their analytics strategy. This is a long-term process that doesn’t begin or end with purchasing another set of technologies. Streaming data presents a unique set of challenges, and the traditional ways of analyzing structured data – SQL, databases and business analytics tools – often can’t be used without extensive infrastructure work.
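A small Python sketch can hint at where that infrastructure work comes from. It flattens a few nested JSON events into rows and loads them into an in-memory SQLite table so they can be queried with SQL. The events, the field names and the `events` table are hypothetical, and a real pipeline would also have to deal with schema changes, volume and late-arriving data, none of which is shown here.

```python
import json
import sqlite3

# Hypothetical events; field names are illustrative assumptions.
RAW_EVENTS = [
    '{"event": "click", "user_id": 17, "properties": {"button": "signup"}}',
    '{"event": "click", "user_id": 42, "properties": {"button": "signup"}}',
    '{"event": "page_view", "user_id": 17, "properties": {"url": "/pricing"}}',
]


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dictionaries into one level, joining keys with underscores."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


rows = [flatten(json.loads(line)) for line in RAW_EVENTS]

# The table schema has to be decided up front; fields an event lacks become NULL.
COLUMNS = ["event", "user_id", "properties_button", "properties_url"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events "
    "(event TEXT, user_id INTEGER, properties_button TEXT, properties_url TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [tuple(row.get(col) for col in COLUMNS) for row in rows],
)

# Once the events are tabular, ordinary SQL works again.
counts_per_event = conn.execute(
    "SELECT event, COUNT(*) FROM events GROUP BY event ORDER BY event"
).fetchall()
print(counts_per_event)  # [('click', 2), ('page_view', 1)]
```

Even in this toy case, every field needs a hand-picked column, and any new field in the stream means revisiting the schema, which gives a sense of the effort involved at real scale.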
However, planning is usually better than improvising. Enterprises that want to ensure their data and analytics systems will continue to provide value in three, five or ten years should already be preparing for a future in which streaming data plays a major role.