
HCatalog: providing interoperability across data processing tools such as Pig, MapReduce, and Hive?

Which is the best processing tool: MapReduce (Java), Pig, Hive, HBase, Storm, or Spark? And which is the most promising one to invest time in to become an expert?

4 Answer(s)


Hi Edgar,

Pig, Hive, HBase, Storm, and Spark are tools that run on top of Hadoop, each serving a different purpose:

Pig - handles data flows, mostly for ETL-style processing
Hive - SQL-like querying, but with high latency
HBase - the distributed database for Hadoop
Storm - a real-time distributed computation system
Spark - a newer technology that promises faster distributed processing than MapReduce

To start with, you need a working knowledge of Pig, Hive, HBase, and MapReduce.

Hope this helps.



Comparing Pig and Hive


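In Pig

-- Load the 'nyse' table through HCatalog
a = LOAD 'nyse' USING org.apache.hcatalog.pig.HCatLoader();
-- Keep only the IBM rows
b = FILTER a BY stock_symbol == 'IBM';
-- Put all remaining rows into a single group so AVG sees them together
c = GROUP b ALL;
d = FOREACH c GENERATE AVG(b.stock_volume);
DUMP d;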

In SQL (Hive)

SELECT AVG(stock_volume) FROM nyse WHERE stock_symbol = 'IBM';
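For anyone wondering what both queries boil down to underneath, here is a rough sketch in plain Python of the same filter-then-average logic written in a map/reduce style. It is only an illustration, not how Pig or Hive actually compile the job: the sample rows and the names `rows`, `map_phase`, `reduce_phase`, and `average_ibm_volume` are all made up for this example, and it runs locally without Hadoop.

```python
from functools import reduce

# Hypothetical sample rows standing in for the 'nyse' table.
rows = [
    {"stock_symbol": "IBM", "stock_volume": 100},
    {"stock_symbol": "IBM", "stock_volume": 300},
    {"stock_symbol": "AAPL", "stock_volume": 500},
]


def map_phase(row):
    """Emit (key, (volume, 1)) for IBM rows only; this plays the role of FILTER."""
    if row["stock_symbol"] == "IBM":
        # A single constant key mirrors Pig's GROUP ... ALL.
        yield ("all", (row["stock_volume"], 1))


def reduce_phase(pairs):
    """Sum the volumes and the counts, then divide to get the average."""
    total, count = reduce(
        lambda a, b: (a[0] + b[0], a[1] + b[1]), pairs, (0, 0)
    )
    return total / count if count else None


def average_ibm_volume(rows):
    """Run the map step over every row, then reduce the emitted values."""
    pairs = [value for row in rows for _key, value in map_phase(row)]
    return reduce_phase(pairs)


print(average_ibm_volume(rows))
```

The point of the comparison is the amount of plumbing: the Pig script and the Hive query express the filter and the average directly, while the map/reduce version has to spell out what gets emitted and how it is combined.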



Hi Edgar,
It is cool to see you moving ahead of the course content. Way to go :)

We will talk about Hive next in our course on Monday :).

A small correction: it is not SQL (Hive), it is Hive Query Language (HQL). It is like SQL, but not exactly SQL.

Also, you will notice that the core of Hadoop is the Java APIs that we talked about. As Hadoop kept growing in popularity, people saw the need to open it up to different audiences and built tools that made it easier for a larger audience to leverage Hadoop.

So you can see the evolution that I keep talking about in class:

Core Hadoop API in Java ---> Pig scripts ---> Hive
Manual or script-based upload of files ---> Flume and Sqoop for batch upload of semi-structured and structured data, respectively

Then, to work with large data sets at the scale that is possible with Hadoop, you will see HBase as the NoSQL solution in Hadoop.

While all of these worked fine with Hadoop 1, Hadoop 2 is a whole level higher, with a major re-architecture (YARN) under which Hadoop 1's batch-only MapReduce mechanism is no longer the only way to process data.

With the Hadoop 2 re-architecture, Hadoop reaches a new level of possibilities, with real-time data processing, event-based systems, etc.



But for us to appreciate all of this and differentiate ourselves from many others, it would be great to spend time understanding the basics and move on from there.

In really difficult situations, the person who understands the basics and the core of how things are built and work stands out in the hour of need. So all the time you spend on the basics will be time well invested.

