A poem on Crayon’s Data Factory

04th Aug ’16, 01:03 PM in Analytics

Abhishek Singh, Contributor

For years, big data has been one of the hottest buzzwords across all industries.

Big data is the term used to describe the process of analyzing complex data sets to discover information that can help make better decisions or find certain patterns that were previously unknown.

At Crayon, we have a framework that cleans, transforms, and processes big data using technologies like Amazon’s servers, HDFS, Hive, Pig, and Spark. This framework is what we call the Data Factory.
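
As a rough illustration of what "cleans, transforms, and processes" can mean in practice, here is a minimal plain-Python sketch of a three-stage pipeline. It is not Crayon's actual code: the record fields (`url`, `text`), the function names, the sample records, and the word-count step are all assumptions made up for this example. A real Data Factory would presumably run such stages on Pig, Hive, or Spark rather than in a single process.

```python
import json
import re
from collections import Counter

# Hypothetical raw records, standing in for crawled pages serialized as JSON.
RAW_RECORDS = [
    '{"url": "http://example.com/a", "text": "  Big   Data is BIG  "}',
    '{"url": "http://example.com/b", "text": ""}',
    'not valid json',
]


def clean(lines):
    """Parse JSON lines, dropping malformed records and records with no text."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip valid JSON that is not an object
        text = str(record.get("text", "")).strip()
        if text:
            yield {"url": record.get("url"), "text": text}


def transform(records):
    """Normalize text: collapse runs of whitespace and lowercase it."""
    for record in records:
        normalized = re.sub(r"\s+", " ", record["text"]).lower()
        yield {**record, "text": normalized}


def process(records):
    """Count word occurrences across all cleaned, normalized records."""
    counts = Counter()
    for record in records:
        counts.update(record["text"].split())
    return counts


def run_pipeline(lines):
    """Chain the three stages: clean -> transform -> process."""
    return process(transform(clean(lines)))


if __name__ == "__main__":
    print(run_pipeline(RAW_RECORDS))
```

Each stage is a generator that consumes the previous one, so records stream through one at a time. That loosely mirrors how the stages of a batch pipeline hand data to each other, though only as an analogy.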

The Data Factory

Crawling on the web pages

Stripping the data down

Packing them in Avro or Json fabric

Taking them to Hadoop’s town

Diving in Amazon’s server waters

Each has a different name

Boiling them to required temperatures

Setting up the factory to start the game

Now starts the journey of data

To take a shape; to get some life

They first get addresses in the config

Then get eaten by pig

Pig then grunts and says aloud

“Data are now clean; have some proud”

Then data get some new clothes

For the tough journey they strive

They get the home of attractive rows and columns

We call it in general; the hive

Adding to it some more data

From other sources; manually curated

Cleaning data once more and making them shine

Getting data into shape – long awaited!

Then a lot of Queries are asked

We enrich the data; adorn them with N-grams

All our data then get abode in one place

Thus we drive the iterations, creating a new database

Finally, we bid adieu to data, and present them a new gown

We walk with them to the corner of hive’s town

Thus, we welcome, greet and solve data’s each mystery

And they call us with love, “The data factory”
