
Yelp Data Processing using Spark and Hive Part 2

In this project, we are going to continue building the data warehouse and do further data processing to deliver different kinds of data products.
Event Date
20th May 2017, 07:30pm - 10:00pm PST
21st May 2017, 07:30pm - 10:00pm PST
What are the prerequisites for this project?
  • Students are expected to have a fair knowledge of Big Data and Hadoop, particularly HDFS, Pig, Hive and Impala.
  • An installation of the Cloudera QuickStart VM.
  • Since we will be doing the development in the QuickStart VM, the Scala SDK must be installed there as well. Instructions on how to set up a Scala SDK and runtime can be found at https://www.youtube.com/watch?v=SFJsuo2XISs&t=151s
  • This project assumes a good knowledge of Hadoop. If you do not have it, we recommend taking the Big Data and Hadoop course first.

What you will learn

  • Data normalization and denormalization
  • Handling snapshots and incremental data loads
  • Introducing the time dimension
  • Analysis of data products
  • Packaging our Spark application as an executable using sbt and running it
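Two of the topics above, incremental loads and the time dimension, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the project's actual code: it uses plain Scala collections instead of Spark DataFrames so it runs without a cluster, and the `UserRow` record, its fields and the `merge`/`dateKey` helpers are hypothetical. An incoming batch is merged into the current snapshot by key so that the most recent row per `userId` wins, and each snapshot date is turned into an integer key of the form yyyyMMdd, a common (assumed) convention for a date dimension.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical user record tagged with the date it was extracted (its snapshot date).
final case class UserRow(userId: String, reviewCount: Int, snapshotDate: LocalDate)

object IncrementalLoad {
  // Integer key for an assumed date dimension, e.g. 2017-05-20 -> 20170520.
  def dateKey(d: LocalDate): Int =
    d.format(DateTimeFormatter.BASIC_ISO_DATE).toInt

  // Merge an incremental batch into the current snapshot.
  // Per userId the row with the latest snapshotDate is kept; on a tie the
  // delta row wins because it comes after the current row in (current ++ delta).
  // userIds that only appear in the delta are appended.
  def merge(current: Seq[UserRow], delta: Seq[UserRow]): Seq[UserRow] =
    (current ++ delta)
      .groupBy(_.userId)
      .map { case (_, rows) =>
        rows.reduce((a, b) => if (b.snapshotDate.isBefore(a.snapshotDate)) a else b)
      }
      .toSeq
      .sortBy(_.userId) // deterministic order for printing

  def main(args: Array[String]): Unit = {
    val day1 = LocalDate.of(2017, 5, 20)
    val day2 = LocalDate.of(2017, 5, 21)
    val current = Seq(UserRow("u1", 10, day1), UserRow("u2", 4, day1))
    val delta   = Seq(UserRow("u2", 6, day2), UserRow("u3", 1, day2))
    merge(current, delta).foreach { r =>
      println(s"${r.userId} ${r.reviewCount} ${dateKey(r.snapshotDate)}")
    }
  }
}
```

Running `main` keeps `u1` unchanged, replaces `u2` with the 21 May row and appends `u3`. In Spark, a comparable result is often obtained with a window function such as `row_number()` partitioned by the key and ordered by the snapshot date descending; that is one possible way to port the sketch, not something the project prescribes.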

Project Description

In our last Hackerday on the same subject, we began developing the Yelp dataset into domains that can easily be understood and consumed. Among other things, we covered:

  • Various ways to ingest data using Spark
  • Data transformation using Spark
  • Various ways of integrating Spark and Hive
  • Denormalizing the dataset into Hive tables, thereby creating multiple datasets
  • How to handle snapshots and incremental data loads
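The denormalization step can be illustrated in miniature. This is a hedged sketch that uses plain Scala collections rather than Spark and Hive, and the simplified `Business`, `Review` and `ReviewFact` records are hypothetical stand-ins for Yelp's business and review data (the real dataset carries many more fields). Each review is joined to its business on the business id and flattened into one wide row, which is the shape a denormalized Hive table would hold.

```scala
// Hypothetical, simplified Yelp records keyed by business id.
final case class Business(businessId: String, name: String, city: String)
final case class Review(reviewId: String, businessId: String, stars: Int)

// One flat, denormalized row per review, with business attributes copied in.
final case class ReviewFact(reviewId: String, stars: Int, businessName: String, city: String)

object Denormalize {
  // Inner join on businessId: a review whose business is unknown is dropped,
  // which mirrors the behaviour of an inner join in Spark SQL.
  def denormalize(businesses: Seq[Business], reviews: Seq[Review]): Seq[ReviewFact] = {
    val byId: Map[String, Business] = businesses.map(b => b.businessId -> b).toMap
    reviews.flatMap { r =>
      byId.get(r.businessId).map(b => ReviewFact(r.reviewId, r.stars, b.name, b.city))
    }
  }

  def main(args: Array[String]): Unit = {
    val businesses = Seq(
      Business("b1", "Cafe Uno", "Phoenix"),
      Business("b2", "Taco Dos", "Las Vegas"))
    val reviews = Seq(
      Review("r1", "b1", 5),
      Review("r2", "b2", 3),
      Review("r3", "b9", 4)) // b9 has no matching business
    denormalize(businesses, reviews).foreach(println)
  }
}
```

With Spark, a comparable inner join could be written as `reviews.join(businesses, Seq("business_id"))` and persisted with `write.saveAsTable(...)` on a Hive-enabled `SparkSession`; the column and table names there are assumptions to adapt to your own schema.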

In the same vein, we are going to continue building our data warehouse. This time, the goal is to do further data processing to deliver different kinds of data products.

Instructors

 
Michael

Senior Developer at Entelect
Cloudera Certified Spark and Hadoop Developer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

What is Hackerday?

Stay up to date with technology trends by working on projects

Live online coding sessions led by industry experts

Build 2-4 projects a month, each lasting 6 hours and designed to teach you advanced concepts

Code in groups and connect with your community