Most discussions of how to organize Big Data focus on repository frameworks, specifically Hadoop clusters and MapReduce. This technology-focused view often overlooks the most important question: “What are you planning to do with the data you’re collecting?”
Because the answer differs for every organization, there is no one-size-fits-all solution. Success lies in recognizing the different types of Big Data sources, using the proper mining technology to find the treasure within each type, and then integrating and presenting the resulting insights in line with your own goals, so that your organization can make more effective steering decisions.
A Taxonomy of Big Data sources and technologies
To start, let’s define two buckets for organizing your Big Data: the sources of the data, and the technologies used to mine those sources.
Here are the Top 10 Big Data source types and the corresponding mining techniques that might be applied to find your gold nuggets.
1. Social network profiles—Tapping user profiles from Facebook, LinkedIn, Yahoo, Google, and special-interest social or travel sites to cull individuals’ profiles and demographic information, and extending that to capture their (hopefully like-minded) networks. (This requires a fairly straightforward API integration for importing pre-defined fields and values; for example, a social network API integration that gathers every B2B marketer on Twitter.)
2. Social influencers—Editor, analyst, and subject-matter expert blog comments, user forums, Twitter & Facebook “likes,” Yelp-style catalog and review sites, and other review-centric sites like Apple’s App Store, Amazon, ZDNet, etc. (Accessing this data requires Natural Language Processing and/or text-based search capability to evaluate the positive/negative nature of words and phrases, derive meaning, index, and write the results.)
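As a rough illustration of the text-analysis step described for social influencers, the Python sketch below scores review text against small positive and negative word lists and builds a simple keyword index for text-based search. The word lists, the function names (tokenize, score_review, build_index), and the sample reviews are all made up for illustration. A production system would likely rely on a full NLP library and a far larger lexicon.

```python
import re

# Hypothetical, hand-picked word lists (assumption: a real lexicon is far larger).
POSITIVE = {"great", "love", "excellent", "fast", "reliable"}
NEGATIVE = {"bad", "slow", "broken", "terrible", "useless"}
NEGATIONS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_review(text):
    """Return a crude sentiment score: +1 per positive word, -1 per negative word.

    A sentiment word directly preceded by a negation ("not great") is flipped.
    """
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1
        elif tok in NEGATIVE:
            polarity = -1
        else:
            continue
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    return score


def build_index(reviews):
    """Map each word to the set of review ids that contain it."""
    index = {}
    for review_id, text in reviews.items():
        for tok in set(tokenize(text)):
            index.setdefault(tok, set()).add(review_id)
    return index


# Made-up sample data standing in for text pulled from a review site.
reviews = {
    "r1": "I love this app, it is fast and reliable",
    "r2": "Not great, the sync is slow and broken",
}
scores = {review_id: score_review(text) for review_id, text in reviews.items()}
index = build_index(reviews)
```

Here scores holds one number per review (positive values suggest favorable text, negative values unfavorable), and index["slow"] returns the ids of the reviews that mention that word. Word counting of this kind misses sarcasm, context, and misspellings, which is why the text points to Natural Language Processing for this source type.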