Spark Project-Analysis and Visualization on Yelp Dataset

The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. It also uses the visualization tool in the ELK stack to build various kinds of ad-hoc reports from the data.

What will you learn

Understanding the roadmap of the project
Downloading and installing the Yelp dataset
Understanding Elasticsearch, and downloading and installing it for analytics
Installing Kibana to visualize data stored in Elasticsearch
Ingesting data from a relational database using Sqoop
Understanding Postman as an API client
Understanding the role of Spark and Elasticsearch in the stack
Ingesting data from the relational database directly into Spark
Integrating JDBC with Spark to connect to the database and run queries
Exploring the dataset using Hue
Loading a Parquet file
Processing relational data in Spark
Mapping data
Creating UDFs while working with DataFrames
Understanding and working with the data types supported by Elasticsearch
Ingesting processed data into Elasticsearch
Visualizing the user signup trend by creating histograms in Kibana
Loading and denormalizing the business table data
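
As a rough illustration of the "data types supported by Elasticsearch" item above, the sketch below writes out a possible index mapping for Yelp user records as a plain Python dictionary. The field names and the chosen types are assumptions based on the public Yelp user data, not the mapping used in the project, and the typeless `mappings.properties` layout assumes a recent Elasticsearch version.

```python
import json

# Hypothetical mapping for a Yelp user index. Field names and types are
# assumptions and should be checked against the actual dataset.
yelp_user_mapping = {
    "mappings": {
        "properties": {
            "user_id": {"type": "keyword"},        # exact-match identifier
            "name": {"type": "text"},              # analyzed full text
            "review_count": {"type": "integer"},   # whole-number counter
            "average_stars": {"type": "float"},    # fractional rating
            # A "format" setting may be needed, depending on how the
            # dataset version writes this date.
            "yelping_since": {"type": "date"},
        }
    }
}


def field_types(mapping):
    """Return {field name: Elasticsearch type} for a typeless mapping body."""
    props = mapping["mappings"]["properties"]
    return {name: spec["type"] for name, spec in props.items()}


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(yelp_user_mapping, indent=2))
```

Choosing `keyword` for identifiers and `text` for free-form names keeps exact-match filters and full-text search separate, which tends to matter once the data is explored in Kibana.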

Project Description

Most businesses seek reviews of their goods and services one way or another. Reviews are one of the most basic ways for a business to improve its efficiency and, in turn, its bottom line. Collecting reviews is not the only challenge: the ability to extract and visualize analytics from review data is critical to business success.

In this Apache Spark project, we will use the Yelp review dataset to analyze businesses and reviews over a period of time. We may spot potential gaps in service delivery or see how businesses thrive in different scenarios.

Beyond processing this data, we will ingest the final output into Elasticsearch and use the visualization tool in the ELK stack to build various kinds of ad-hoc reports from the data.
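
To make the ingestion step more concrete, here is a minimal sketch of the request body that Elasticsearch's `_bulk` endpoint expects: one action line followed by one document line, newline-delimited. The project itself may write from Spark through a connector instead, so this only shows the wire format. The index name `yelp_business`, the helper `to_bulk_ndjson`, and the sample records are all hypothetical.

```python
import json


def to_bulk_ndjson(docs, index_name, id_field):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: index this document under the given index and id.
        action = {"index": {"_index": index_name, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Made-up sample records; field names are assumptions about the processed output.
sample = [
    {"business_id": "b1", "name": "Cafe A", "city": "Phoenix", "stars": 4.5},
    {"business_id": "b2", "name": "Diner B", "city": "Las Vegas", "stars": 3.0},
]

payload = to_bulk_ndjson(sample, "yelp_business", "business_id")

if __name__ == "__main__":
    print(payload, end="")
```

A body like this could be sent from a tool such as Postman as a POST to `/_bulk` with the `Content-Type: application/x-ndjson` header.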

Curriculum For This Mini Project

Project Overview
Download and Install Yelp Dataset
Visualisation Questions
What is Elasticsearch?
Download and Install Elasticsearch
Download and Install Kibana for Data Visualisation using Elasticsearch
Query to Load Data
Overview of Postman
Purpose of Spark and Elasticsearch in the Stack
Integration of JDBC Source with Spark
Using Sqoop for Data Migration: Importing Business Table
Why do we create a Password file?
When to use Sqoop
When to use Spark with JDBC
Explore the loaded data using Hue
Data Analysis for Business Use Cases
Load a Parquet File
Create Mapping for Data and Working with Dataframes to Create UDFs
Recap of the Previous Session
Working with different Datatypes supported by Elasticsearch
Creating Yelp User Mappings and Schema
Ingesting processed data into Elasticsearch
Preview to Kibana
Create a Histogram of users by review count (Yelp User Sign Up Trend)
Load and Denormalize Business Table Data (Data Modelling)
Explore data and visualize review analytics using Kibana
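
The histogram step in the curriculum above can be sketched outside Kibana as well. The snippet below groups users into fixed-width buckets by review count, using the same rule as an Elasticsearch histogram aggregation with no offset: the bucket key is `floor(value / interval) * interval`. The function name and the sample review counts are made up for illustration.

```python
import math
from collections import Counter


def histogram(values, interval):
    """Count values per fixed-width bucket, keyed by the bucket's lower bound."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    counts = Counter(math.floor(v / interval) * interval for v in values)
    # Sort by bucket key so the result reads like a chart axis.
    return dict(sorted(counts.items()))


# Made-up review counts for seven users.
review_counts = [3, 12, 47, 51, 98, 100, 250]
buckets = histogram(review_counts, 50)

if __name__ == "__main__":
    print(buckets)  # {0: 3, 50: 2, 100: 1, 250: 1}
```

Unlike this sketch, Kibana can also show empty buckets between populated ones, which is a display choice rather than a difference in how values are assigned to buckets.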