Hive Project: Denormalize JSON Data and Analyse It with Hive Scripts

In this Hive project, you will denormalize JSON data and write Hive scripts that use the ORC file format.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What You Will Learn

Setting up your own virtual environment on a VirtualBox VM
Setting up a Hadoop distribution using Cloudera
Understanding JSON data and creating your own JSON data
Creating a database schema for the JSON data
Writing queries in the Hive editor
Understanding multiple input formats in MapReduce
Creating a new table to copy the data into
Creating the necessary Java scripts
Understanding denormalization in the context of big data and its uses
Writing commands in Java for fetching data
Pre-processing the data using Java
Handling exceptions and errors in Java
Creating a query to populate and filter the data
Using MongoDB to optimize the schema
Understanding geographical distribution in the context of database distribution
Using grouping to remove duplicates
Analyzing log files in Hive and saving the final data file
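
One item from the list above, using grouping to remove duplicates, can be illustrated outside Hive. The Python sketch below mimics what grouping on all key columns does: rows that share the same key collapse into a single row. The field names (`order_id`, `status`) and the sample rows are hypothetical and not taken from the project dataset.

```python
from collections import OrderedDict


def dedupe_by_group(rows, key_fields):
    """Group rows by the given fields and keep the first row of each group."""
    groups = OrderedDict()
    for row in rows:
        # The grouping key is the tuple of values for the chosen fields.
        key = tuple(row.get(field) for field in key_fields)
        # setdefault stores the row only if the key has not been seen yet.
        groups.setdefault(key, row)
    return list(groups.values())


# Hypothetical work-order rows; the first two are exact duplicates.
rows = [
    {"order_id": "W1", "status": "closed"},
    {"order_id": "W1", "status": "closed"},
    {"order_id": "W2", "status": "open"},
]

unique_rows = dedupe_by_group(rows, ["order_id", "status"])
for row in unique_rows:
    print(row)
```

Keeping the first row per group is a design choice for this sketch; a real script might instead keep the most recent row or aggregate the group.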

Project Description

We have a JSON dump (extract) containing details related to Field Service Management (FSM). These details include:

  • Vehicle info
  • Crew info
  • Work orders
  • Work order transactions over a month

We need to denormalize the JSON data and analyse it using Hive scripts.
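
As a rough illustration of what denormalizing such a dump can involve, the Python sketch below joins vehicle and crew details onto each work order to produce flat rows. The structure of the dump, the field names (`vehicle_id`, `crew_id`, `order_id` and so on), and the sample values are all assumptions made for illustration; the actual schema of the FSM extract is not given in this description.

```python
import json

# Hypothetical sample dump; the real FSM extract's schema is not shown here.
SAMPLE_DUMP = """
{
  "vehicles": [{"vehicle_id": "V1", "type": "van"}],
  "crews": [{"crew_id": "C1", "lead": "A. Kumar"}],
  "work_orders": [
    {"order_id": "W1", "vehicle_id": "V1", "crew_id": "C1", "status": "closed"},
    {"order_id": "W2", "vehicle_id": "V9", "crew_id": "C1", "status": "open"}
  ]
}
"""


def denormalize(dump):
    """Flatten work orders by copying vehicle and crew details into each row."""
    # Build lookup tables keyed by id so each join is a dictionary access.
    vehicles = {v["vehicle_id"]: v for v in dump.get("vehicles", [])}
    crews = {c["crew_id"]: c for c in dump.get("crews", [])}

    rows = []
    for order in dump.get("work_orders", []):
        # A missing reference yields an empty dict, so its fields become None.
        vehicle = vehicles.get(order.get("vehicle_id"), {})
        crew = crews.get(order.get("crew_id"), {})
        rows.append({
            "order_id": order["order_id"],
            "status": order.get("status"),
            "vehicle_id": order.get("vehicle_id"),
            "vehicle_type": vehicle.get("type"),
            "crew_id": order.get("crew_id"),
            "crew_lead": crew.get("lead"),
        })
    return rows


flat_rows = denormalize(json.loads(SAMPLE_DUMP))
for row in flat_rows:
    # One JSON object per line, a shape that is convenient to load into a table.
    print(json.dumps(row))
```

Each output row repeats the vehicle and crew attributes, which trades storage for simpler queries that need no joins. This is the usual motivation for denormalizing before analysis.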

Curriculum For This Mini Project

18-Jun-2016
05h 21m