Hadoop Project for Beginners - SQL Analytics with Hive

In this Hadoop project, you will learn about the features in Hive that allow us to perform analytical queries over large datasets.

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your Linkedin/Github profiles.

What will you learn

Roadmap of the project
Understanding serialization and deserialization and how they work
Setting up the environment in Cloudera Manager
Downloading and understanding the dataset
Understanding the schema of the dataset
Moving the data from MySQL to HDFS
Data ingestion/transformation using Sqoop, Spark, and Hive
Creating and executing a Sqoop job
Using append to speed up loading the data into HDFS
Creating your Hive table and troubleshooting it
Using Parquet and XPath to access the schema
Writing aggregate and SELECT queries using UDAFs
Hive versus MySQL database
Rollup and Cube in the context of grouping sets aggregation, and windowing functions
Query optimizations in Hive
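
To make the difference between Rollup and Cube concrete, here is a minimal sketch in plain Python (not HiveQL) of which grouping sets each one expands to, and how a measure is summed per set. The column names, the sample rows, and the helper functions are hypothetical illustrations, not part of the project's code; a rolled-up column is shown as None, mirroring the NULL that SQL engines return for it.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(columns):
    """Grouping sets for ROLLUP: every prefix of the column list, ending with the grand total ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Grouping sets for CUBE: every subset of the column list, ending with the grand total ()."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def aggregate(rows, columns, grouping_sets, measure):
    """Sum `measure` once per grouping set; columns outside the set become None."""
    totals = defaultdict(float)
    for row in rows:
        for grouping_set in grouping_sets:
            key = tuple(row[c] if c in grouping_set else None for c in columns)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical sample data, for illustration only.
sample_rows = [
    {"country": "US", "city": "NYC", "sales": 10},
    {"country": "US", "city": "LA", "sales": 5},
    {"country": "FR", "city": "Paris", "sales": 7},
]
sample_columns = ["country", "city"]

if __name__ == "__main__":
    print("ROLLUP sets:", rollup_sets(sample_columns))
    print("CUBE sets:  ", cube_sets(sample_columns))
    print(aggregate(sample_rows, sample_columns, rollup_sets(sample_columns), "sales"))
```

With n columns, Rollup yields n + 1 grouping sets (a hierarchy of subtotals), while Cube yields 2^n (every combination), which is why Cube output grows much faster on wide groupings.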

Project Description

In this Hive project, we want to take a deeper dive into some of the analytical features in Hive. SQL is still very dominant and will remain so for the foreseeable future. Most big data tools have been adapted to allow users to interact with them using the familiar SQL language, because of the years of knowledge and skill that have gone into training, acceptance, tooling, standards development and re-engineering. In many cases, using these SQL features to access data answers a lot of analytical questions without our ever needing to resort to machine learning, BI or data mining.

In this big data project, we look at the features in Hive that allow us to perform analytical queries over large datasets.

We will be using the AdventureWorks dataset stored in a MySQL database. Therefore, we will need to ingest and transform the data before we proceed to analytics.

Curriculum For This Mini Project

Cloning the dataset
Understanding the dataset
Load the data
Query the data
Create a Sqoop job
Executing the Sqoop job
Why is append used?
Build Hive tables on top of the data
Troubleshooting the Hive table
Using Parquet and XPath
Select statement
Use case based aggregations
Q&A - the problem statement
Q&A - Hive versus MySQL database
Enhancing aggregate functions
Grouping sets
Rollup versus Cube
Windowing analytic functions
Properties of windowing analytic functions
Solving an example - finding percentages
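
As a rough illustration of the kind of percentage calculation a windowing function makes easy, the sketch below emulates in plain Python what an expression such as 100 * sales / SUM(sales) OVER (PARTITION BY category) computes: each row keeps its own detail while also seeing its partition's total. This assumes the percentage example is a share-of-total problem; the column names, sample rows, and helper function are hypothetical and not taken from the project.

```python
from collections import defaultdict


def percent_of_partition(rows, partition_key, measure):
    """Add a 'pct' field: each row's share (in percent) of its partition's total."""
    # First pass: total per partition, like SUM(measure) OVER (PARTITION BY partition_key).
    totals = defaultdict(float)
    for row in rows:
        totals[row[partition_key]] += row[measure]

    # Second pass: every input row is preserved, unlike a GROUP BY which collapses rows.
    result = []
    for row in rows:
        total = totals[row[partition_key]]
        pct = round(100.0 * row[measure] / total, 2) if total else None
        result.append({**row, "pct": pct})
    return result


# Hypothetical sample data, for illustration only.
sample_sales = [
    {"category": "Bikes", "product": "A", "sales": 30},
    {"category": "Bikes", "product": "B", "sales": 70},
    {"category": "Helmets", "product": "C", "sales": 50},
]

if __name__ == "__main__":
    for row in percent_of_partition(sample_sales, "category", "sales"):
        print(row)
```

The two passes mirror the key property of windowing functions: the aggregate is computed over the partition, but the result is attached to every row rather than replacing the rows.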