Revolutionary. That pretty much describes the era of data analysis in which we live. Businesses grapple with huge quantities and varieties of data on one hand, and expectations for ever-faster analysis on the other. The vendor community is responding by providing highly distributed architectures and new levels of memory and processing power. Upstarts also exploit the open-source licensing model, which is not new but is increasingly accepted and even sought out by data-management professionals.
Apache Hadoop, a nine-year-old open-source data-processing platform first used by Internet giants including Yahoo and Facebook, leads the big-data revolution. Cloudera introduced commercial support for enterprises in 2008, and MapR and Hortonworks piled on in 2009 and 2011, respectively. Among data-management incumbents, IBM and EMC spinout Pivotal have each introduced their own Hadoop distribution. Microsoft and Teradata offer complementary software and first-line support for Hortonworks’ platform. Oracle resells and supports Cloudera, while HP, SAP, and others act more like Switzerland, working with multiple Hadoop software providers.