1-844-696-6465 (US)        +91 77600 44484        help@dezyre.com

Yelp Data Processing using Spark and Hive Part 2

In this spark project, we will continue building the data warehouse from the previous project Yelp Data Processing Using Spark And Hive Part 1 and will do further data processing to develop diverse data products.
What are the prerequisites for this project?

What will you learn

  • Data normalization and denormalization
  • Handling snapshots and incremental data loads
  • Introducting the time dimension
  • Analysis of data products
  • Package our spark application as an executable using sbt and running them

Project Description

In the previous Spark Project on the same subject- Yelp Data Processing Using Spark And Hive Part 1, we began the development of Yelp dataset into domains that can easily be understood and consumed. Amongst other things we did

  • Various ways to ingest data using spark
  • Data transformation using Spark
  • Various ways of integrating spark and hive
  • Denormalize dataset into the hive tables thereby creating multiple datasets
  • Discuss how to handle snapshots and incremental data loads

In this Spark project, we are going to continue building the data warehouse. The purpose in this big data project is to do further data processing to deliver different kinds of the data products.



Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the android platform. My native language is java but no one has stopped me so far from learning and using angular and node.js. Data and data analysis is thrilling and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS see more...

What is Hackerday?

Stay updated in technology trends by working on projects

Live online coding sessions led by industry experts

Build 2-4 projects a month each lasting 6 hours designed to teach you advanced concepts

Code in groups and connect with your community