Call it the Hadoop Swiss Army knife of cluster computing frameworks. The Apache Software Foundation just rolled out Apache Spark v1.0, which it’s calling a “super-fast, open-source, large-scale data processing and advanced analytics engine.”
That’s a mouthful, indeed, but why has the technology been dubbed a Hadoop Swiss Army knife? Because Spark lets developers write apps in Java, Scala or Python using a built-in set of more than 80 high-level operators. Apache claims Spark can run programs up to 100 times faster than Apache Hadoop MapReduce when operating in memory.
“Apache Spark is an important big data technology in delivering a high-performance analytics solution for the IT industry and satisfying the fast-growing customer demand,” said Michael Greene, vice president and general manager of System Technologies and Optimization at Intel.
Who Does This Target?
Apache Spark is aimed at teams that need machine learning, interactive queries, and stream processing. Spark is fully compatible with Hadoop’s Distributed File System, HBase, Cassandra, and any Hadoop storage system, so existing data is immediately available in Spark. Spark also promises out-of-the-box support for SQL queries, streaming data, and complex analytics, including machine learning and graph algorithms.