Big data has become a relevant topic in many companies this year. Although there is no standard definition of the term "big data", Hadoop is the de facto standard for processing big data. Almost all big software vendors, such as IBM, Oracle, SAP, and even Microsoft, use it. However, once you have decided to use Hadoop, the first question is how to start and which product to choose for your big data processes. Several alternatives exist for installing a version of Hadoop and implementing big data processes. This article discusses these alternatives and recommends when to use which one.
Alternatives for Hadoop Platforms
The following picture shows the different alternatives for Hadoop platforms. You can install just the Apache release, choose one of several distributions from different vendors, or use a big data suite. It is important to understand that every distribution contains Apache Hadoop, and that almost every big data suite contains or uses a distribution.