Hadoop Project - Choosing the best SQL-on-Hadoop Engine

In this project, we will take a look at four different SQL-on-Hadoop engines - Hive, Phoenix, Impala, and Presto.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Swati Patra

Systems Advisor , IBM

I have 11 years of experience and work with IBM. My domain is Travel, Hospitality and Banking - sectors that process lots of data. The way the projects were set up and the mentors' explanation was...

Arvind Sodhi

VP - Data Architect, CDO at Deutsche Bank

I have extensive experience in data management and data processing. Over the past few years I saw the data management technology transition into the Big Data ecosystem and I needed to follow suit. I...

What will you learn

Apache Phoenix: how it works and how to install it.
Presto: how it works and how to install it.
Impala: how it works and how to use it.
Using SQL-on-Hadoop engines from a data processing framework like Spark (see the sketch after this list).
Comparing the performance of these engines.
Surveying other SQL-on-Hadoop engines out there.
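
As a taste of the Spark item above, here is a minimal sketch of querying a Hive-managed table from PySpark. It assumes Spark was built with Hive support and can reach the Hive metastore; the table name web_logs and the aggregation are hypothetical placeholders.

from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark SQL read tables registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("sql-on-hadoop-demo")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical table: swap in one that exists in your metastore.
daily_counts = spark.sql("""
    SELECT dt, COUNT(*) AS events
    FROM web_logs
    GROUP BY dt
    ORDER BY dt
""")

daily_counts.show()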

Project Description

The hype around SQL-on-Hadoop has died down, and people now want more from these engines: real-time queries, support for various file formats, user-defined functions, and a wide range of client connectivity options.

In this Hackerday, we will take a look at four different SQL-on-Hadoop engines - Hive, Phoenix, Impala, and Presto. While we roughly know what to expect from Hive, we want to see what it takes to adopt the other SQL-on-Hadoop engines in our big data infrastructure.
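
To make "client connectivity" concrete: each of the four engines ships a Python DB-API client, so one script can issue the same query to all of them. The sketch below is illustrative only - it assumes default ports on a local single-node cluster (HiveServer2 on 10000, Phoenix Query Server on 8765, Impala on 21050, Presto on 8080) and a hypothetical table web_logs; note that Phoenix reads HBase tables, while the other three typically share the Hive metastore.

from pyhive import hive                              # pip install pyhive
import phoenixdb                                     # pip install phoenixdb
from impala.dbapi import connect as impala_connect   # pip install impyla
import prestodb                                      # pip install presto-python-client

QUERY = "SELECT COUNT(*) FROM web_logs"  # hypothetical table name

# Default ports for a single-node test cluster; adjust for your setup.
connections = {
    "hive":    hive.connect(host="localhost", port=10000),
    "phoenix": phoenixdb.connect("http://localhost:8765/", autocommit=True),
    "impala":  impala_connect(host="localhost", port=21050),
    "presto":  prestodb.dbapi.connect(host="localhost", port=8080,
                                      user="hackerday", catalog="hive",
                                      schema="default"),
}

for name, conn in connections.items():
    cur = conn.cursor()
    cur.execute(QUERY)
    print(name, cur.fetchone())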

After this Hackerday session, you should be able to make an informed choice among these engines and extend them into your own data processing infrastructure.
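
One naive way to begin that comparison is to wrap each client call in a wall-clock timer. The sketch below reuses the connections dict and QUERY from the previous example; single-query latency ignores caching, concurrency, and warm-up effects, so treat it as a starting point rather than a benchmark.

import time

def time_query(conn, sql, runs=3):
    """Run sql several times on conn and return per-run wall-clock seconds."""
    timings = []
    for _ in range(runs):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()  # force the engine to materialize the full result
        timings.append(time.perf_counter() - start)
    return timings

for name, conn in connections.items():
    print(name, [round(t, 2) for t in time_query(conn, QUERY)])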

Similar Projects

In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

The goal of this IoT project is to build an argument for a generalized streaming architecture for reactive data ingestion based on a microservice architecture.

Analyze clickstream data of a website using Hadoop Hive to increase sales by optimizing every aspect of the customer experience on the website from the first mouse click to the last.

Curriculum For This Mini Project