
Yelp Data Processing Using Spark And Hive Part 1

In this big data project, we continue from a previous Hive project, "Data engineering on Yelp Datasets using Hadoop tools", and do the entire data processing using Spark.

What will you learn

  • Processing data with Spark
  • Normalizing and denormalizing datasets into Hive tables
  • Various ways of integrating Hive and Spark
  • Working with complex data structures in Hive through Spark
  • Exporting some of the processed datasets to an RDBMS

What will you get

  • Access to a recording of the complete project
  • Access to all material related to the project, such as data files and solution files

Prerequisites

  • Students are expected to have a fair knowledge of big data and Hadoop, particularly HDFS, Pig, Hive and Impala.
  • An installation of the Cloudera QuickStart VM.

Project Description

Data engineering is the science of acquiring, aggregating or collecting, processing and storing data, either in batch or in real time, as well as providing a variety of means of serving that data to other users, who may include data scientists. It applies software engineering practices to big data.

In this big data project for beginners, we continue from a previous Hive project, "Data engineering on Yelp Datasets using Hadoop tools", where we applied some data engineering principles to the Yelp dataset in the areas of processing, storage and retrieval. As in that session, we will not cover data ingestion, since we are already downloading the data from the Yelp challenge website. Unlike that session, we will do the entire data processing using Spark.

Instructors

 
Michael

Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

Curriculum For This Mini Project

 
  • Introduction to the Yelp dataset (00:02:28)
  • Objectives of this project (00:03:43)
  • Introduction to the JSON schema (00:09:39)
  • Agenda (00:00:16)
  • Read the data and transform to Hive parquet table (00:06:58)
  • Ingest JSON data using Spark (00:11:35)
  • Write to HDFS (00:09:39)
  • Integrate Hive with Spark (00:33:27)
  • Understanding Normalizing and Denormalizing (00:16:25)
  • Normalizing and Denormalizing datasets into Hive tables (00:39:38)
  • Transform the table and write in a single line (00:08:51)
  • Query to find users with more followers than their friends (00:05:40)
  • Error troubleshooting (00:01:48)
  • Initial import of data (00:16:31)
  • Exploring various data structures (00:19:16)
  • Exploring arrays (00:17:16)
  • Designing the analysis (00:17:49)