
Putting Spark to Use: Fast In-Memory Computing for Your Big Data Applications

22nd Nov '13, 01:11 PM in Sectors


BDMS
Guest Contributor
 

Our thanks to Databricks, the company behind Apache Spark (incubating), for providing the guest post below. Cloudera and Databricks recently announced that Cloudera will distribute and support Spark in CDH. Look for more posts describing Spark internals and Spark + CDH use cases in the near future.

Apache Hadoop has revolutionized big data processing, enabling users to store and process huge amounts of data at very low cost. MapReduce has proven to be an ideal platform for implementing complex batch applications as diverse as sifting through system logs, running ETL, computing web indexes, and powering personal recommendation systems. However, its reliance on persistent storage to provide fault tolerance and its one-pass computation model make MapReduce a poor fit for low-latency applications and for iterative computations such as machine learning and graph algorithms.
