Spark Project: Analysis and Visualization of the Yelp Dataset

The goal of this Spark project is to analyze business reviews from the Yelp dataset, ingest the final output of the data processing into Elasticsearch, and use the visualization tool in the ELK stack (Kibana) to build various kinds of ad-hoc reports from the data.


What you will learn

  • Understanding the roadmap of the project

  • Downloading and installing the Yelp dataset

  • Understanding Elasticsearch, and downloading and installing it for analytics

  • Installing Kibana to visualize data stored in Elasticsearch

  • Ingesting data from a relational database using Sqoop

  • Understanding Postman, a tool for building and sending API requests

  • The roles of Spark and Elasticsearch in the stack

  • Ingesting data from the relational database directly into Spark

  • Integrating JDBC with Spark to connect to the database and execute queries

  • Exploring the dataset using Hue

  • Loading a Parquet file

  • Processing relational data in Spark

  • Mapping the data

  • Creating UDFs with DataFrames

  • Understanding and working with the data types supported by Elasticsearch

  • Ingesting processed data into Elasticsearch

  • Visualizing the user sign-up trend with histograms in Kibana

  • Loading and denormalizing business table data
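To make the Elasticsearch data-type topic above more concrete, here is a minimal sketch of an index mapping for Yelp user documents. The field names follow the public Yelp dataset's user file, but the chosen types, the date format, and the function name yelp_user_mapping are illustrative assumptions rather than the project's actual mapping. It also assumes Elasticsearch 7 or later, where mappings no longer carry a type name.

```python
import json


def yelp_user_mapping():
    """Hypothetical mapping for a Yelp user index (assumed fields and types)."""
    return {
        "mappings": {
            "properties": {
                "user_id": {"type": "keyword"},       # exact-match ID, not analyzed
                "name": {"type": "text"},             # analyzed for full-text search
                "review_count": {"type": "integer"},  # numeric, usable in histograms
                "average_stars": {"type": "float"},
                "yelping_since": {                    # sign-up timestamp
                    "type": "date",
                    "format": "yyyy-MM-dd HH:mm:ss",  # assumed timestamp layout
                },
            }
        }
    }


if __name__ == "__main__":
    # This JSON body would be sent as the request body of PUT /<index-name>.
    print(json.dumps(yelp_user_mapping(), indent=2))
```

Choosing keyword for IDs and integer or date for numeric and time fields is what lets Kibana aggregate on them later. A text field is tokenized for search and is not suitable for bucketing.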

Project Description

Most businesses seek reviews of their goods and services in one way or another. Reviews are one of the most basic ways for a business to improve its efficiency and, in turn, its bottom line. Collecting reviews is not the only challenge, however: the ability to extract and visualize analytics from review data is critical to business success.

In this Apache Spark project, we will use the Yelp review dataset to analyze businesses and reviews over time. Along the way, we may spot gaps in service delivery or see how businesses thrive in different scenarios.

Beyond processing this data, we will ingest the final output of our data processing into Elasticsearch and use Kibana, the visualization tool in the ELK stack, to build various kinds of ad-hoc reports from the data.
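Whichever client or connector performs the ingestion, Elasticsearch's _bulk endpoint accepts newline-delimited JSON: one action line followed by one document line per record, with the whole body ending in a newline. The sketch below builds such a body from plain Python dictionaries. The function name to_bulk_ndjson, the index name yelp_users, and the sample records are hypothetical and only illustrate the format.

```python
import json


def to_bulk_ndjson(docs, index, id_field):
    """Build a _bulk request body: an action line plus a source line per document."""
    lines = []
    for doc in docs:
        # Action line: index the next document into `index` under a chosen _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # Every line, including the last, must be newline-terminated.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical sample records; real input would be the processed Spark output.
    sample = [
        {"user_id": "u1", "review_count": 3},
        {"user_id": "u2", "review_count": 120},
    ]
    # Body for POST /_bulk with Content-Type: application/x-ndjson.
    print(to_bulk_ndjson(sample, "yelp_users", "user_id"), end="")
```

Using the source record's ID as _id makes re-running the ingestion overwrite documents instead of duplicating them.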

Curriculum for This Mini Project

  Project Overview
  Download and Install Yelp Dataset
  Visualization Questions
  What Is Elasticsearch?
  Download and Install Elasticsearch
  Download and Install Kibana for Data Visualization with Elasticsearch
  Query to Load Data
  Overview of Postman
  Purpose of Spark and Elasticsearch in the Stack
  Integration of JDBC Source with Spark
  Using Sqoop for Data Migration: Importing the Business Table
  Why Do We Create a Password File?
  When to Use Sqoop
  When to Use Spark with JDBC
  Explore the Loaded Data Using Hue
  Data Analysis for Business Use Cases
  Load a Parquet File
  Create Mappings for the Data and Work with DataFrames to Create UDFs
  Recap of the Previous Session
  Working with Different Data Types Supported by Elasticsearch
  Creating Yelp User Mappings and Schema
  Ingesting Processed Data into Elasticsearch
  Preview of Kibana
  Create a Histogram of Users by Review Count (Yelp User Sign-Up Trend)
  Load and Denormalize Business Table Data (Data Modelling)
  Explore Data and Visualize Review Analytics Using Kibana
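A Kibana histogram of users by review count relies on Elasticsearch's histogram aggregation. With no offset, that aggregation places each value in the bucket whose key is floor(value / interval) * interval. The sketch below reproduces that bucketing locally so the expected chart shape can be checked. The function name histogram and the sample values are hypothetical. Unlike the aggregation's default behavior, this sketch reports only non-empty buckets.

```python
import math
from collections import Counter


def histogram(values, interval):
    """Count values per fixed-width bucket, keyed by each bucket's lower bound."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Same key rounding as the histogram aggregation with offset 0.
    counts = Counter(math.floor(v / interval) * interval for v in values)
    # Return buckets in ascending key order; empty buckets are omitted.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical review counts for a handful of users.
    print(histogram([0, 3, 12, 49, 50, 51, 120], 50))  # {0: 4, 50: 2, 100: 1}
```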