I read an article last week by @gigabarb on Fortune.com, “Hadoop Adoption Hurt by Hype.” I thought it was interesting because…while everyone believes it’s a valuable solution…there is little consensus on WHEN it will catch fire. I mean…it’s pretty complex.
Big Data solutions are all the rage, at least in the media. There are gobs of info on it, tons of press and marketing messages screaming for analytics. Just watch SportsCenter…analytics in action.
The question is…what kind of analytics does your customer need? Foundational Business Intelligence? Real-Time Operational Performance Management? Analytics on sales data, operations, customer service…marketing? What is the pain point, and how can you drill down to the business priority? What is important to the business…and what solution can help them achieve their number 1, 2, and 3 objectives?
I shared this with my Analytics Sales Leader, Richard Novorro, and this was his response:
There’s an aphorism that’s been around for a while, and it really applies here: “The future is already here! It’s just unevenly distributed.” (Hold that thought and we’ll tie it back in a moment.)
If I were speaking with a prospect who told me that Hadoop adoption is simply not a priority, here’s what I might say:
While Gartner’s assessment of the Hadoop adoption rate tells us that it’s slower than expected, there is no question that Hadoop will become a priority. It’s just a question of when.
We all have a natural tendency to shy away from things that are new, different, or unfamiliar. But consider that Hadoop was created for a purpose.
Given the exponential growth in the amount of data, traditional relational databases have been stretched to their limit. No amount of partitioning, sharding, indexing, or in-memory techniques can fix this problem. (As Scotty said to Captain Kirk on many a Star Trek episode: “Captain, I can’t change the laws of physics.”)
Even columnar databases, while in many cases faster than traditional row-oriented relational databases, still struggle with compute resource requirements.
Enter Hadoop. It’s common knowledge that it can store lots of data and crunch it as well. But consider a few salient points about the Hadoop architecture (HDFS for storage, YARN for processing):
Storage costs are about ¼ the cost of traditional data storage.
The massively parallel grid computing capabilities that YARN brought with Hadoop 2.0 have a pervasive and cascading impact on development teams and on the associated resource and programming requirements.
Because of these two points, we now see the emergence of Hadoop sandboxes, or Data Lakes, and the beginning of a new era in data modeling. Vast volumes of volatile data can now be stored cheaply and accessed rapidly via grid computing (10 guys painting a room in one hour rather than one guy painting it in 10 hours). All of this happens without first building a metadata layer, the most time-consuming part of any data warehousing implementation.
With no upfront metadata layer, analysts can model data dynamically, on the fly, using an ELT (extract, load, transform) paradigm rather than the traditional, highly labor-intensive ETL (extract, transform, load) model, in which a data warehouse is built through many trial-and-error iterations of different metadata configurations.
All of this allows for far more longitudinal analytics, because storing 10 years of data rather than 3 is no problem.
Use cases abound. Forget about finding a needle in a haystack: with machine learning, you can find needles across haystacks.
The ability to surface trends and patterns unobservable by humans means that the depth of analytic insight improves by an order of magnitude.
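To make the ELT, model-on-the-fly idea a bit more concrete, here is a toy sketch in plain Python. It is an illustration under our own assumptions, not Hadoop code: the records, field names, and the functions `load_raw` and `total_amount_by` are all hypothetical. Raw records are loaded untouched (extract and load), and each “model” is just a grouping rule applied at query time (transform), so the same raw data can be sliced by region, by year, or by anything else without redesigning a schema first.

```python
import json
from collections import defaultdict

# Hypothetical raw records, loaded as-is: no upfront schema, and records
# are not required to share the same fields (the last one has no "amount").
RAW_LINES = [
    '{"ts": "2014-03-01", "region": "east", "amount": "120.50"}',
    '{"ts": "2014-03-02", "region": "west", "amount": "80"}',
    '{"ts": "2015-01-15", "region": "east", "amount": "42.25", "channel": "web"}',
    '{"ts": "2015-02-20", "region": "west"}',
]


def load_raw(lines):
    """Extract + load: parse each record but impose no schema or typing."""
    return [json.loads(line) for line in lines]


def total_amount_by(records, key_fn):
    """Transform at query time (schema-on-read).

    The caller decides which field to group on and how to derive it;
    "amount" is converted to a number here, at read time, not at load time.
    Records lacking the needed fields are skipped instead of rejected.
    """
    totals = defaultdict(float)
    for rec in records:
        if "amount" not in rec:
            continue
        key = key_fn(rec)
        if key is not None:
            totals[key] += float(rec["amount"])
    return dict(totals)


raw = load_raw(RAW_LINES)

# Two different "models" over the same raw data, defined on the fly.
by_region = total_amount_by(raw, lambda r: r.get("region"))
by_year = total_amount_by(raw, lambda r: r["ts"][:4])

print(by_region)  # {'east': 162.75, 'west': 80.0}
print(by_year)    # {'2014': 200.5, '2015': 42.25}
```

Adding a third view, say by sales channel, is one more `total_amount_by(raw, lambda r: r.get("channel"))` call rather than another round of warehouse redesign. In an actual Hadoop deployment the same schema-on-read pattern is typically expressed through a query engine reading raw files from HDFS; this sketch only shows the shape of the idea.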
So back to the main question: Why will Hadoop become a priority?
The answer: because the depth and quality of the data, and the flexibility with which it can be analyzed, mean that no matter which analytic engine ingests it, the resulting analytics will be far more insightful and far more proactive, and will have a deeper, more pervasive impact on every company’s bottom line.
In short, if you don’t think Hadoop should be a priority, don’t worry about it. But I promise you one thing: your competitors will, and you’ll be left at a disadvantage, wondering why your market share is eroding.