Hadoop

Hadoop Glossary: 20 most important terms

This is a list of the most important Hadoop terms you need to know and understand before diving into the Hadoop ecosystem.

Most important Hadoop terms

Apache or Apache Software Foundation (ASF): A non-profit software foundation set up to support open source software projects. Apache projects are released under the Apache License, which provides legal protection to the volunteers who work on Apache products.

Apache Hadoop: An open source platform that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. The platform is particularly suited to large volumes of unstructured data such as Facebook comments and Twitter tweets, email and instant messages, and security and application logs.

Apache Spark: An open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley. It can read from and write to the Hadoop Distributed File System and is typically much faster than MapReduce, especially for iterative workloads that benefit from keeping data in memory. It provides high-level APIs in Scala, Python and Java.

Flume: A service for collecting, aggregating, and moving large amounts of log and event data into Hadoop.

Hadoop Common: Usually only referred to by programmers, Hadoop Common is a common utilities library that contains code to support some of the other modules within the Hadoop ecosystem. When Hive and HBase want to access HDFS, for example, they do so using JARs (Java archives), which are libraries of Java code stored in Hadoop Common.

HBase: An open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable, as described in the paper “Bigtable: A Distributed Storage System for Structured Data”.

HDFS: An acronym for “Hadoop Distributed File System”, which breaks large files into smaller data blocks that are replicated and distributed across a cluster of commodity hardware for fault tolerance and faster parallel processing.
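
To make the block-and-replica idea concrete, here is a minimal sketch in Python. It is not HDFS code: the function names are hypothetical, the 128 MB block size and replication factor of 3 are assumed as commonly used defaults, and the round-robin placement is a simplification rather than the placement policy HDFS actually uses.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign every block to `replication` distinct nodes (toy round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# A 300 MB file becomes two full blocks and one partial block.
blocks = split_into_blocks(300 * MB)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])

for block_id, holders in placement.items():
    print(f"block {block_id} ({blocks[block_id] // MB} MB) -> {holders}")
```

Because each block lives on several machines, losing one node does not lose the data, and different nodes can work on different blocks of the same file at the same time.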

Hive: A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. It allows you to query data using a SQL-like language called HiveQL (HQL).

HiveQL (HQL): A SQL-like query language for Hive. HiveQL queries are translated into MapReduce jobs that run against data stored in HDFS.

HUE: A browser-based desktop interface for interacting with Hadoop.

Impala: An SQL query engine with massively parallel processing (MPP) power, running natively on the Apache Hadoop framework. It shares the same flexible file system (HDFS), metadata, resource management and security frameworks as used by other Hadoop ecosystem components.

JobTracker: The service within Hadoop that distributes MapReduce tasks to specific nodes in the cluster. In Hadoop 2, this role is taken over by YARN.

MapReduce: A software framework for easily writing applications that process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. Hadoop acts as a platform for executing MapReduce.
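
The programming model is easiest to see on the classic word-count example. The sketch below runs in a single Python process and only imitates the three stages (map, shuffle, reduce); it does not use Hadoop's API, and all function names are illustrative assumptions. On a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

The application author writes only the map and reduce functions; splitting the input, moving data between stages, and retrying failed tasks are the framework's job.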

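NameNode: The core of HDFS. The NameNode maintains a record of all files stored on the Hadoop cluster, including which blocks make up each file and which DataNodes hold those blocks. It stores this metadata only, not the file contents.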
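
The NameNode's record-keeping role can be pictured as a pair of lookup tables. The class below is a toy model with hypothetical names, not the real NameNode interface; it assumes every block of a file is stored on the same list of DataNodes to keep the example short.

```python
class ToyNameNode:
    """Toy model of NameNode bookkeeping: metadata only, no file contents."""

    def __init__(self):
        self.files = {}            # file path -> list of block ids
        self.block_locations = {}  # block id -> list of DataNode names
        self._next_block_id = 0

    def add_file(self, path, num_blocks, datanodes):
        """Register a file, recording where each of its blocks is stored."""
        block_ids = []
        for _ in range(num_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            self.block_locations[block_id] = list(datanodes)
            block_ids.append(block_id)
        self.files[path] = block_ids

    def locate(self, path):
        """Tell a client which DataNodes to contact for each block of a file."""
        return [(b, self.block_locations[b]) for b in self.files[path]]


namenode = ToyNameNode()
namenode.add_file("/logs/app.log", num_blocks=2, datanodes=["node1", "node2", "node3"])
print(namenode.locate("/logs/app.log"))
```

A client asks the NameNode where a file's blocks are, then reads the blocks directly from the DataNodes.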

Oozie: A workflow engine for Hadoop.

Pig: A platform for creating MapReduce programs used within Hadoop. Programs are written in a high-level language called Pig Latin.

Sqoop: A tool designed to transfer data between Hadoop and relational databases.

Whirr: A set of libraries for running cloud services. It’s ideal for running temporary Hadoop clusters to carry out a proof of concept, or to run a few one-time jobs.

YARN: A resource manager for Hadoop 2. YARN is short for “Yet Another Resource Negotiator”.

ZooKeeper: A centralized coordination service that allows Hadoop administrators to track and coordinate distributed applications.
