Hadoop Project - Choosing the best SQL-on-Hadoop Engine

In this project, we will take a look at four SQL-on-Hadoop engines: Hive, Phoenix, Impala, and Presto.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 102+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Mohamed Yusef Ahmed

Software Developer at Taske

Recently I became interested in Hadoop as I think it's a great platform for storing and analyzing large structured and unstructured data sets. The experts did a great job not only explaining the...

Dhiraj Tandon

Solution Architect-Cyber Security at ColorTokens

My interaction was very short but left a positive impression. I enrolled and asked for a refund since I could not find the time. What happened next: they initiated the refund immediately. Their...

What will you learn

Apache Phoenix, how it works and how to install it.
Presto, how it works and how to install it.
Impala and how to use it.
Using SQL-on-Hadoop engines in a data processing framework like Spark.
Comparing the performance of these engines.
A look at other SQL-on-Hadoop engines out there.
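
As a rough illustration of what comparing engine performance can look like, here is a minimal sketch of a timing harness. It is not the project's own benchmark. It assumes each engine is wrapped in a hypothetical "runner" function that takes a SQL string and executes it through whatever client connection you use for that engine. The names time_query and compare_engines are made up for this example, and the same query text may need small dialect changes per engine.

```python
import statistics
import time
from typing import Callable, Dict, List

# A "runner" is any function that takes a SQL string and executes it.
# Hypothetical: in practice it would wrap a client connection to one engine.
Runner = Callable[[str], object]


def time_query(run_query: Runner, sql: str, repeats: int = 3) -> List[float]:
    """Execute `sql` `repeats` times and return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)
        timings.append(time.perf_counter() - start)
    return timings


def compare_engines(engines: Dict[str, Runner], sql: str, repeats: int = 3) -> Dict[str, float]:
    """Return the median run time in seconds for each engine name."""
    return {
        name: statistics.median(time_query(run_query, sql, repeats))
        for name, run_query in engines.items()
    }


if __name__ == "__main__":
    # Stand-in runners that only sleep; replace them with real clients.
    demo_engines = {
        "engine_a": lambda sql: time.sleep(0.02),
        "engine_b": lambda sql: time.sleep(0.005),
    }
    medians = compare_engines(demo_engines, "SELECT 1", repeats=3)
    for name, seconds in sorted(medians.items(), key=lambda item: item[1]):
        print(f"{name}: {seconds:.4f}s")
```

The median is used rather than the mean so that a single slow first run (cold caches, connection setup) does not dominate the result. Wall-clock time from the client side also includes network and result-fetch overhead, so treat the numbers as a relative comparison only.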

Project Description

The hype around SQL-on-Hadoop has died down, and people now want more from these engines: real-time queries, support for various file formats, support for user-defined functions, and support for a variety of client connections.

In this Hackerday, we will take a look at four SQL-on-Hadoop engines: Hive, Phoenix, Impala, and Presto. While Hive should behave largely as expected, we want to see what it would take to adopt the other SQL-on-Hadoop engines in our big data infrastructure.

After this Hackerday session, you should be able to make an informed choice among these engines and extend them to your data processing infrastructure.

Similar Projects

Bitcoin Mining on AWS - Learn how to use AWS Cloud for building a data pipeline and analysing bitcoin data.

In this big data project, we will embark on real-time data collection and aggregation from a simulated real-time system using Spark Streaming.

In this Hive project, you will denormalize JSON data and create Hive scripts using the ORC file format.

Curriculum For This Mini Project