IBM loves Big Data. The bigger it gets, the more servers, storage, and services Big Blue would like to sell you (a lot more, please). But the volumes involved have already grown so big that IBM’s own researchers struggle to get a handle on them.
Last year, for example, IBM fellow Laura Haas asked one of her colleagues at the company’s Almaden research center in Silicon Valley why he wasn’t using bigger data sets. Because, he replied, “it takes 80% of my time just to prep the data I have.” Haas realized that the more IBM’s research agenda was consumed by analytics, the more time and energy its experts would spend wrestling with expanding data sets, slowing the pace of discovery.
The obvious thing was to hand the volumes in question over to dedicated data scientists, but removing researchers from the loop would only make things worse. Plus, it seemed to cut against the grain of Big Data, whose value isn’t governed by Moore’s Law or Kryder’s Law, that is, by the steady expansion of processing power and storage capacity or the falling cost of sensors.
Rather, it’s more a function of Metcalfe’s Law, which states that the value of a network is proportional to the square of the number of connected devices; the value is in the rapidly multiplying connections, not the nodes. The same is true of IBM’s people. Instead of sidelining its researchers, how could IBM bring more eyes, and different ones, to opaque data sets being crunched in the cloud?
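The arithmetic behind Metcalfe’s Law is easy to see: the number of possible pairwise links among n nodes is n(n − 1)/2, which grows with the square of n. A minimal sketch (the node counts here are illustrative, not from the article):

```python
# Illustrative sketch of Metcalfe's Law: the number of possible
# pairwise connections among n nodes is n choose 2 = n * (n - 1) / 2,
# which grows quadratically with n.
def pairwise_connections(n: int) -> int:
    """Possible links among n connected devices."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, "nodes ->", pairwise_connections(n), "possible connections")
```

Multiplying the nodes by ten multiplies the possible connections by roughly a hundred, which is the sense in which the value sits in the links rather than the devices.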