Spark Project-Analysis and Visualization on Yelp Dataset

The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. The visualisation tool in the ELK stack is then used to build various kinds of ad-hoc reports from the data.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Understanding the roadmap of the project
Downloading and installing the Yelp dataset
Understanding Elasticsearch, then downloading and installing it for analytics
Installing Kibana to visualize data stored in Elasticsearch
Ingesting data from a relational database using Sqoop
Understanding Postman as an API tool for big data
Purpose of Spark and Elasticsearch in the stack
Ingesting data from the relational database directly into Spark
Integrating JDBC with Spark to connect to the database and execute queries
Exploring the dataset using HUE
How to load a Parquet file
Processing relational data in Spark
How to Map data
Creating UDFs by using the datasets
Understanding the different data types supported by Elasticsearch and working with them
Ingesting processed data into Elasticsearch
Visualizing user signup trend by creating histograms in Kibana
Loading and Denormalizing business table data
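
Two of the steps listed above, reading a relational table into Spark over JDBC and deriving a column with a UDF, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes a MySQL source (the page does not name the database), PySpark as the API, and a `yelping_since` column holding the user's sign-up date. The function names `jdbc_options`, `signup_year`, and `load_users_with_signup_year` are hypothetical.

```python
def jdbc_options(host, database, table, user, password, port=3306):
    """Build the option map for Spark's JDBC data source.

    Assumption: the relational database is MySQL; swap the URL scheme and
    driver class for another database.
    """
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def signup_year(yelping_since):
    """Return the leading four-digit year of a date-like value, or None.

    Works for strings such as '2012-03' or '2012-03-01 10:00:00' and for
    date objects, because str() of a date starts with the year.
    """
    try:
        return int(str(yelping_since)[:4])
    except ValueError:
        return None


def load_users_with_signup_year(spark, options):
    """Read the table over JDBC and add a signup_year column via a UDF."""
    # Imported lazily so the helpers above can be used without PySpark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import IntegerType

    users = spark.read.format("jdbc").options(**options).load()
    signup_year_udf = udf(signup_year, IntegerType())
    return users.withColumn("signup_year", signup_year_udf(col("yelping_since")))
```

With an active `SparkSession` and the JDBC driver on the classpath, `load_users_with_signup_year(spark, jdbc_options("localhost", "yelp", "user", "me", "secret"))` would return a DataFrame with the extra column; the host, database, and credentials shown here are placeholders.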

Project Description

Most businesses seek reviews of their goods and services one way or another. Reviews are one of the most basic ways for a business to improve its efficiency and, in turn, its bottom line. Collecting reviews is only part of the problem: the ability to extract and visualize analytics from review data is critical to business success.

In this Apache Spark project, we will use the Yelp review dataset to analyze businesses and reviews over a period of time. We may spot potential gaps in service delivery or see how businesses thrive in different scenarios.

Beyond processing this data, we will ingest the final output of our data processing in Elasticsearch and use the visualization tool in the ELK stack to visualize various kinds of ad-hoc reports from the data.
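
As a rough sketch of what the ingestion step involves, the snippet below defines an index mapping for Yelp users and builds a request body in the newline-delimited JSON format expected by Elasticsearch's `_bulk` endpoint. It uses only the Python standard library. The index name `yelp_users`, the chosen field types, and the helper `to_bulk_body` are assumptions for illustration, and the typeless mapping shape assumes Elasticsearch 7 or later; the project itself may write from Spark through a connector instead.

```python
import json

# Hypothetical mapping for a Yelp user index (field names follow the public
# Yelp dataset; the types are an assumption).
USER_MAPPING = {
    "mappings": {
        "properties": {
            "user_id": {"type": "keyword"},
            "name": {"type": "text"},
            "review_count": {"type": "integer"},
            "yelping_since": {"type": "date", "format": "yyyy-MM-dd"},
        }
    }
}


def to_bulk_body(index, docs, id_field):
    """Serialize docs as a bulk request: one action line, then one source line.

    Every line, including the last, is terminated by a newline, as the
    bulk API requires.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)
```

The resulting string can be sent as a POST to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header, for example from Postman, after creating the index with `USER_MAPPING` as the request body.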
 

Similar Projects

In this project, we will use complex scenarios to help Spark developers deal better with the issues that arise in the real world.

In this NoSQL project, we will use two NoSQL databases (HBase and MongoDB) to store Yelp business attributes and learn how to retrieve this data for processing or querying.

In this project, we will evaluate and demonstrate how to handle unstructured data using Spark.

Curriculum For This Mini Project

Project Overview (09m)
Download and Install Yelp Dataset (16m)
Visualisation Questions (05m)
What is Elastic Search? (05m)
Download and Install Elastic Search (12m)
Download and Install Kibana for Data Visualisation using Elastic Search (08m)
Query to Load Data (06m)
Overview of Postman (01m)
Purpose of Spark and Elastic Search in the Stack (07m)
Integration of JDBC Source with Spark (08m)
Using Sqoop for Data Migration - Importing Business Table (08m)
Why do we create a Password file? (05m)
When to use Sqoop (00m)
When to use Spark to JDBC (00m)
Explore the loaded data using Hue (02m)
Data Analysis for Business Use Cases (02m)
Load a Parquet File (08m)
Create Mapping for Data and Working with Dataframes to Create UDFs (27m)
Recap of the Previous Session (03m)
Working with different Datatypes supported by Elastic Search (07m)
Creating Yelp User Mappings and Schema (03m)
Ingesting processed data into Elasticsearch (08m)
Preview to Kibana (04m)
Create a Histogram of people with different review count (Yelp User Sign Up Trend) (06m)
Load and Denormalize Business Table Data (Data Modelling) (54m)
Explore data and visualize review analytics using Kibana (41m)
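
The last few sessions centre on two ideas: denormalizing business data into the records that get indexed, and bucketing a numeric field into a histogram. The plain-Python sketch below illustrates both on small in-memory lists. It is not the project's Spark code; the function names `denormalize` and `histogram` are hypothetical, the field names assume the public Yelp dataset layout, and the bucketing rule (round each value down to a multiple of the interval) mirrors how a fixed-interval histogram is usually computed.

```python
from collections import Counter


def denormalize(reviews, businesses):
    """Copy selected business attributes onto each review (an inner join).

    Reviews whose business_id has no matching business are dropped.
    """
    by_id = {b["business_id"]: b for b in businesses}
    flat = []
    for review in reviews:
        business = by_id.get(review["business_id"])
        if business is None:
            continue
        flat.append({
            **review,
            "business_name": business["name"],
            "business_city": business["city"],
        })
    return flat


def histogram(values, interval):
    """Count values per bucket, keyed by the bucket's lower bound."""
    counts = Counter((v // interval) * interval for v in values)
    return dict(sorted(counts.items()))
```

For example, `histogram([1, 5, 12, 19, 20], 10)` groups users' review counts into buckets starting at 0, 10, and 20. Flattening business attributes onto each review trades storage for simpler queries, since a document store such as Elasticsearch does not join indices at query time the way a relational database joins tables.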