Spark Project-Analysis and Visualization on Yelp Dataset

The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. It also uses Kibana, the visualization tool in the ELK stack, to build various kinds of ad-hoc reports from the data.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

  • Understanding the roadmap of the project

  • Downloading and installing the Yelp dataset

  • Understanding Elasticsearch, then downloading and installing it for analytics

  • Installing Kibana to visualize data stored in Elasticsearch

  • Ingesting data from a relational database using Sqoop

  • Understanding Postman as an API client in a big data workflow

  • The purpose of Spark and Elasticsearch in the stack

  • Ingesting data from the relational database directly into Spark

  • Integrating JDBC with Spark to connect to the database and execute queries

  • Exploring the dataset using HUE

  • How to load a Parquet file

  • Processing relational data in Spark

  • How to Map data

  • Creating UDFs while working with DataFrames

  • Understanding the different data types supported by Elasticsearch and working with them

  • Ingesting processed data into Elasticsearch

  • Visualizing user signup trend by creating histograms in Kibana

  • Loading and Denormalizing business table data

Project Description

Most businesses seek reviews of their goods and services in one way or another. Reviews are one of the most basic ways for a business to improve its efficiency and, in turn, its bottom line. Collecting reviews is only part of the problem; the ability to extract and visualize analytics from review data is critical to business success.

In this Apache Spark project, we will use the Yelp review dataset to analyze businesses and reviews over a period of time. We may spot potential gaps in service delivery or see how businesses thrive in different scenarios.

Beyond processing this data, we will ingest the final output of our data processing in Elasticsearch and use the visualization tool in the ELK stack to visualize various kinds of ad-hoc reports from the data.
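As a rough illustration of the ingestion step, the sketch below builds a request body in the newline-delimited JSON format accepted by the Elasticsearch `_bulk` endpoint. The index name `yelp_business`, the field names, the helper `to_bulk_body`, and the sample records are hypothetical stand-ins, not taken from the project, and the project itself may ingest data through a Spark connector rather than hand-built requests.

```python
import json


def to_bulk_body(records, index_name, id_field):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each record becomes two lines: an action line naming the target
    index and document id, followed by the document source itself.
    """
    lines = []
    for record in records:
        # Action line: which index and document id to write to.
        action = {"index": {"_index": index_name, "_id": record[id_field]}}
        lines.append(json.dumps(action))
        # Source line: the document to be indexed.
        lines.append(json.dumps(record))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical processed output; the real project data will differ.
sample_records = [
    {"business_id": "b1", "name": "Cafe One", "stars": 4.5, "review_count": 120},
    {"business_id": "b2", "name": "Diner Two", "stars": 3.0, "review_count": 45},
]

bulk_body = to_bulk_body(sample_records, "yelp_business", "business_id")
print(bulk_body)
```

A body like this could be sent as a POST request to `/_bulk` with the `Content-Type: application/x-ndjson` header, for example from Postman, to check that the documents show up in Kibana.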
 

Similar Projects

NoSQL Project on Yelp Dataset using HBase and MongoDB
In this NoSQL project, we will use two NoSQL databases (HBase and MongoDB) to store Yelp business attributes and learn how to retrieve this data for processing or querying.
Big Data Project on Processing Unstructured Data using Spark
In this project, we will evaluate and demonstrate how to handle unstructured data using Spark.
Spark Project-Measuring US Non-Farm Payroll Forex Impact
In this Spark project, we will measure how much NFP has triggered moves in past markets.
Predicting Flight Delays using Apache Spark and Kylin
In this project, we will build and query an OLAP cube for flight delays on the Hadoop platform.

Curriculum For This Mini Project

 
  • Project Overview (09m)
  • Download and Install Yelp Dataset (16m)
  • Visualisation Questions (05m)
  • What is Elasticsearch? (05m)
  • Download and Install Elasticsearch (12m)
  • Download and Install Kibana for Data Visualisation using Elasticsearch (08m)
  • Query to Load Data (06m)
  • Overview of Postman (01m)
  • Purpose of Spark and Elasticsearch in the Stack (07m)
  • Integration of JDBC Source with Spark (08m)
  • Using Sqoop for Data Migration: Importing Business Table (08m)
  • Why do we create a Password File? (05m)
  • When to use Sqoop (00m)
  • When to use Spark to JDBC (00m)
  • Explore the Loaded Data using Hue (02m)
  • Data Analysis for Business Use Cases (02m)
  • Load a Parquet File (08m)
  • Create Mapping for Data and Working with DataFrames to Create UDFs (27m)
  • Recap of the Previous Session (03m)
  • Working with Different Data Types Supported by Elasticsearch (07m)
  • Creating Yelp User Mappings and Schema (03m)
  • Ingesting Processed Data into Elasticsearch (08m)
  • Preview of Kibana (04m)
  • Create a Histogram of Users by Review Count (Yelp User Sign Up Trend) (06m)
  • Load and Denormalize Business Table Data (Data Modelling) (54m)
  • Explore Data and Visualize Review Analytics using Kibana (41m)