Yelp Data Processing Using Spark And Hive Part 1

In this project, we continue from a previous hackerday session, "Data engineering on Yelp Datasets using Hadoop tools", and focus on doing the entire data processing with Spark.

What will you learn

  • Doing data processing using Spark
  • Normalizing and denormalizing datasets into Hive tables
  • Various ways of integrating Hive and Spark
  • Working with various complex data structures in Hive through Spark
  • Exporting some of the processed datasets to an RDBMS

What will you get

  • Access to a recording of the complete project
  • Access to all material related to the project, such as data files and solution files

Prerequisites

  • Students are expected to have a fair knowledge of Big Data and Hadoop, particularly HDFS, Pig, Hive and Impala.
  • An installation of the Cloudera QuickStart VM.

Project Description

Data engineering is the science of acquiring, aggregating or collecting, processing and storing data, either in batch or in real time, as well as providing a variety of means of serving that data to other users, who could include data scientists. It involves applying software engineering practices to big data.

In this hackerday project, we will continue from a previous hackerday session, "Data engineering on Yelp Datasets using Hadoop tools", where we applied some data engineering principles to the Yelp dataset in the areas of processing, storage and retrieval. As in that session, we will not cover data ingestion, since we are already downloading the data from the Yelp challenge website. Unlike that session, however, we will focus on doing the entire data processing using Spark.

Instructors

 
Michael

Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...