Hive Project - Denormalize JSON Data and Analyse It with Hive Scripts

In this Hive project, you will denormalize JSON data and write Hive scripts that use the ORC file format.

What will you learn

Setting up your own virtual environment on a VirtualBox VM
Setting up a Hadoop distribution using Cloudera
Understanding JSON data and creating your own JSON data
Creating a database schema for the JSON data
Writing queries in the Hive editor
Understanding multiple input formats in MapReduce
Creating a new target table to copy the data into
Creating the necessary Java scripts
Understanding what denormalization means in the context of big data, and where it is used
Writing commands in Java for fetching data
Pre-processing the data using Java
Handling exceptions and errors in Java
Creating a query to populate and filter the data
Using MongoDB to optimize the schema
Understanding geographical distribution in the context of database distribution
Using grouping to remove duplicates
Analyzing log files in Hive and saving the final data file

Project Description

We have a JSON dump (extract) containing several kinds of records related to FSM (Field Service Management). The records include:

  • Vehicle info
  • Crew info
  • Work orders
  • Work order transactions for one month

We need to denormalize this JSON data and analyse it using Hive scripts.
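
To make the idea of denormalization concrete, here is a minimal sketch in Python. It is only an illustration, not the project's own code (the project itself uses Java and Hive). The sample document and every field name in it (vehicle_id, crew_id, order_id, txn_id and so on) are hypothetical, since the real schema of the FSM extract is not shown here. The sketch joins each work order transaction with its vehicle and crew details, producing one flat row per transaction.

```python
import json

# Hypothetical FSM extract. All field names are illustrative assumptions,
# not the real schema of the project's JSON dump.
RAW = """
{
  "vehicles": [{"vehicle_id": "V1", "type": "van"}],
  "crews": [{"crew_id": "C1", "lead": "A. Rao"}],
  "work_orders": [
    {"order_id": "W1", "vehicle_id": "V1", "crew_id": "C1",
     "transactions": [{"txn_id": "T1", "status": "OPEN"},
                      {"txn_id": "T2", "status": "CLOSED"}]}
  ]
}
"""


def denormalize(doc):
    """Return one flat dict per work order transaction."""
    # Index the lookup entities by their keys so each join is a dict lookup.
    vehicles = {v["vehicle_id"]: v for v in doc["vehicles"]}
    crews = {c["crew_id"]: c for c in doc["crews"]}

    rows = []
    for order in doc["work_orders"]:
        vehicle = vehicles.get(order["vehicle_id"], {})
        crew = crews.get(order["crew_id"], {})
        for txn in order.get("transactions", []):
            # Copy the parent and lookup attributes onto every child row.
            rows.append({
                "order_id": order["order_id"],
                "txn_id": txn["txn_id"],
                "status": txn["status"],
                "vehicle_id": order["vehicle_id"],
                "vehicle_type": vehicle.get("type"),
                "crew_id": order["crew_id"],
                "crew_lead": crew.get("lead"),
            })
    return rows


rows = denormalize(json.loads(RAW))
for row in rows:
    # One JSON object per line, with no nesting left in the record.
    print(json.dumps(row))
```

Each printed line is a self-contained record with no nesting. Flat, one-record-per-line data of this kind is generally straightforward to load into a Hive staging table and then copy into an ORC-backed table for analysis.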

Curriculum For This Mini Project

18-Jun-2016
05h 21m