
Yelp Data Processing using Spark and Hive Part 2

In this project, we are going to continue building the data warehouse and do further data processing to deliver different kinds of data products.

What you will learn

  • Data normalization and denormalization
  • Handling snapshots and incremental data loads
  • Introducing the time dimension
  • Analysis of data products
  • Packaging our Spark application as an executable with sbt and running it

What you will get

  • Access to the recording of the complete project
  • Access to all project materials, such as data files and solution files

  • Installation of the Cloudera QuickStart VM

Project Description

In our previous hackerday on this subject, we began organizing the Yelp dataset into domains that can be easily understood and consumed. Among other things, we covered:

  • Various ways to ingest data using Spark
  • Data transformation using Spark
  • Various ways of integrating Spark and Hive
  • Denormalizing the dataset into Hive tables, thereby creating multiple datasets
  • How to handle snapshots and incremental data loads

In the same vein, we are going to continue building out the data warehouse. This time, the goal is to process the data further to deliver different kinds of data products.



Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but so far no one has stopped me from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS…