Yelp Data Processing using Spark and Hive Part 2

In this project, we are going to continue building the data warehouse and do further data processing to deliver different kinds of data products.

What will you learn

  • Data normalization and denormalization
  • Handling snapshots and incremental data loads
  • Introducing the time dimension
  • Analysis of data products
  • Packaging our Spark application as an executable using sbt and running it

Project Description

In our last Hackerday on the same subject, we began developing the Yelp dataset into domains that can be easily understood and consumed. Among other things, we covered:

  • Various ways to ingest data using Spark
  • Data transformation using Spark
  • Various ways of integrating Spark and Hive
  • Denormalizing the dataset into Hive tables, thereby creating multiple datasets
  • How to handle snapshots and incremental data loads

In the same vein, we are going to continue building out our data warehouse. The purpose this time is to do further data processing to deliver different kinds of data products.



Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

What is Hackerday?

Stay updated on technology trends by working on projects

Live online coding sessions led by industry experts

Build 2-4 projects a month, each lasting 6 hours and designed to teach you advanced concepts

Code in groups and connect with your community