Nancy Kopp-Hensley is the director of technical marketing for IBM Analytics. Nancy has more than 20 years’ experience working in the data business in many capacities, from development and product management to sales and marketing.
You have been in the data warehousing field for more than two decades. How has data warehousing evolved over the past 20 years?
Oh wow, where do I begin? When I started in warehousing, we were still selling clients on why they should even have one. It was a lot of work and expense to consolidate data and to get people to share data for the greater good of the company. Warehousing started to show some serious value in the lines of business, for both marketing and finance, and soon clients understood why data-driven decisions were better, smarter decisions. It was funny: when we were no longer faced with having to sell the value of a warehouse, we were faced with what I refer to as “data mart madness”. Suddenly there were marts popping up all over the business to tackle very specific issues: customer value, retention and recruitment, risk, fraud, and of course compliance. Now we were faced with a different problem: consolidation!
For years we preached the value of the “single source of truth”, the Enterprise Data Warehouse: bring all your data here, it’s good for the enterprise. In theory that was all well and good, but in actual practice the EDW was plagued with issues: funding, sponsorship, stewardship, governance and, worst of all, the killer of the EDW, agility. Let’s face it, you load all your data into one place and keep it well managed, governed and safe, but it’s also under lock and key. As the business became more and more dependent on analytics, it demanded better time to market on the analytics it deemed critical. Guess what happened? Yes, we were back to data marts popping up ALL OVER, but this time they were appliances or on the Cloud, and they required less admin and less dependence on IT. Well, here we go again. The fact is, if you look at the patterns of data warehousing over the years, you can clearly see how both the requirements and the demand for more agility, new data types and self service pushed the industry into uncomfortable places.
When “big data” hit, there was the elephant in the room and everyone asked: will Hadoop replace the data warehouse? While some of us knew that would not happen, there was enough of a pause in the industry for all of the major vendors to refresh their warehouse solution strategy. This is where the Logical Warehouse started to come into play, because the warehouse no longer meant one structure but became an ecosystem of sorts to support new data types and new analytics with new deployment methods like appliances and cloud.
Today, if you ask me, I see the warehouse as consisting of the core DW, Hadoop and Cloud, along with all the data integration and governance.
‘Logical data warehouse’ is the phrase used to describe a new approach to data management. What is it? How does it differ from the traditional approach?
There are many differences between the traditional EDW world and the LDW world. First, the EDW is no longer one big monolithic structure. We are no longer focused on consolidating and managing the data in one place, because in today’s world data is streaming in, it’s multistructured and it needs to be much more accessible. The traditional warehouse was not designed for all those things. It was designed as a single repository for reporting and analysis. It held trusted data, but it wasn’t very accessible, it was highly complex to manage, and it was far from agile the way the cloud is today.
Also, in the old traditional world we were very focused on keeping transactional and analytic systems separate. We would replicate the transactional data within the warehouse for analytics, but we would never consider disturbing the performance of the systems that run our business with analytics. Nooooo way! If we needed more real-time info, we moved that workload to an ODS (operational data store) and replicated the data closer to real time, then ran the analytics on top of the ODS. Today clients need real-time information, and we have new capabilities to run analytics against the transactional data without replicating it, moving it or causing an issue with performance.
Additionally, the Logical Warehouse means all the data is no longer in one place: it can be in Hadoop, in the transactional systems and in the warehouse. It can also be streaming. The key in the LDW is the integration and virtualization of the data across it. Making that more consumable is critical. That’s why products that can help make NoSQL look like SQL (IBM BigSQL, for example) have become so popular. Clients want to leverage all the data containers within the LDW, but they may be challenged with the skillset to do that.
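[Editor's note: the idea of putting one SQL face on several data containers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how IBM BigSQL or any other product works. The sample data, table names and function names are all hypothetical, and for simplicity the sketch copies the JSON documents into an in-memory SQLite database, whereas a real virtualization layer would aim to query the data where it lives.]

```python
import json
import sqlite3

# Hypothetical sample data. The tuples stand in for rows in the core warehouse;
# the JSON strings stand in for documents held in a NoSQL or Hadoop-style store.
WAREHOUSE_ROWS = [(1, "Acme", "retail"), (2, "Globex", "finance")]
CLICKSTREAM_DOCS = [
    '{"customer_id": 1, "event": "view", "page": "/pricing"}',
    '{"customer_id": 1, "event": "click", "page": "/signup"}',
    '{"customer_id": 2, "event": "view", "page": "/home"}',
]


def build_logical_view(conn):
    """Expose both containers as SQL tables on a single connection."""
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, segment TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", WAREHOUSE_ROWS)

    # Flatten each JSON document into a row so it can be queried with SQL.
    # Simplification: this copies the data, which a real LDW tries to avoid.
    conn.execute("CREATE TABLE events (customer_id INTEGER, event TEXT, page TEXT)")
    for raw in CLICKSTREAM_DOCS:
        doc = json.loads(raw)
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (doc["customer_id"], doc["event"], doc["page"]),
        )


def events_per_customer(conn):
    """Join warehouse rows with document-derived rows using plain SQL."""
    query = """
        SELECT c.name, COUNT(e.event) AS event_count
        FROM customers AS c
        LEFT JOIN events AS e ON e.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_logical_view(connection)
    for name, event_count in events_per_customer(connection):
        print(name, event_count)
```

[The point of the sketch is the skillset issue raised above: an analyst who knows only SQL can join warehouse data with document data without learning a second query language.]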
Tell us about some success/failure stories in data warehousing.
Well, I don’t hear about too many failures these days, but back in the day, when clients were focused on consolidation (pulling the data marts back in), there were some epically expensive failures. Why? Because there is no clear and measurable value in consolidation if the data becomes less accessible. The whole reason we built warehouses was to gain intelligence, not to hoard data! We also saw many failures when clients didn’t have clear business questions they were trying to answer. They built it and, well, the users never came; they built their own instead.
Today clients are in what I call “modernization mode”: they are adding multistructured capabilities with Hadoop and streaming data, and they are building reservoirs and lakes to land the data and make it more accessible. They are leveraging in-memory and Spark to speed analytics, and cloud to make data more accessible to the business (self service). Their challenges are more in understanding how to use the different technologies for the workloads they have. I have seen clients overinvest and then pull back on Hadoop, but it wasn’t really a failure of the technology, more a lack of understanding of how best to leverage it. It used to be so simple: get hardware, pick a database and build out the warehouse. Now you have all types of containers, integration challenges, and hybrid cloud.
We’ve seen some clients have great success in leveraging Hadoop with the warehouse to increase their analytic capabilities, and there are some great success stories with Cloud and how it can truly help provide self service and agility. There is a strong movement towards self service in general, which is showing up not just in deployment but all the way through the acquisition of data itself in the reservoir and the tooling. This is the future, and it will be fueled by a combination of technologies, traditional and open source. Clients will use Spark as the turbocharger for in-database analytics, NoSQL will easily feed the warehouse from the applications, and transactional data will be available for analysis in real time without performance degradation. Not all clients are there today, but we see many doing a lot of that now.
Analytics is today’s emerging system of real-time user engagement and interaction. How do you respond to the current need for quicker and more accurate insights?
Self service is the future. It’s what’s driving the changes all the way from how the data is acquired to the tools that leverage it so that data is much more democratized and usable by everyone in the enterprise. Making sure you architect with this as a key requirement will help clients engage more users and respond to the business faster.
“Big Data” plays a crucial role in today’s modern Data Warehousing practices. What are the new challenges and promises?
I think I addressed some of that above, but I no longer believe in “Big Data”. I think it’s about how we deal with ALL data, because for many clients the transactional data is the biggest thing they have. Big data, which most associate with Hadoop, is now integral to the data warehouse, since the data warehouse is now much more logical.
Are the current Analytics technologies equipped to confront the exponential growth, availability and use of information in the data-rich landscape of tomorrow? What does the future look like in the coming decade?
Well, being from IBM and knowing where we are going with natural language and our focus on self service, I look forward to the day when we can simply ask the warehouse a question. I can subscribe to data and check it out of the data library, and I have easy-to-use visualization tools that help me understand the data, find relevance, share my results across my cohorts and tell a story, all without the need for IT or a business analyst.