Analysing Big Data with Twitter Sentiments using Spark Streaming

In this big data Spark project, we will perform Twitter sentiment analysis on incoming streaming data using Spark Streaming.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Understanding the project and how to use Spark Streaming
Understanding the basics of NLP, sentiment analysis, and its application
Visualizing the complete Architecture of the system
Introduction to Spark MLlib
How the message broker and datastore work
Installing Redis and using it as a broker
Creating a virtual environment in the Cloudera VM
Starting the Redis server and establishing a connection with Redis
Why we need a datastore and selecting HBase as the datastore
Spark Streaming
Using Flume as a source
Initializing the Spark Streaming context and setting up the system
Creating a stream for filtering data
Training a text classification model using Spark MLlib
Using the Stanford NLP library for labeling the data
Classifying real-time Twitter streams using Spark MLlib
Processing logic for real-time sentiment analysis
Integration of Spark Streaming and MLlib
Displaying live stream results on a desktop dashboard

Project Description

In Dezyre's Hadoop hands-on training course, we complete two projects that require streaming data from Twitter in real time. Most of these Hadoop projects reflect a production scenario in which the collected data is analyzed in batch mode and the results are then presented to end users.
But what if the decision that depends on the streamed data is time sensitive? Then we must analyze the data in motion, as it streams in, and present the results while the streaming is still taking place.
One use of such a system is analyzing public response to an event in real time, such as a political speech, a sports game, or an economic news release. People with access to quality real-time data can then position themselves to profit in such circumstances.
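The idea of analyzing data in motion can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's code: the word lists, the `label` function, and the sample tweets are hypothetical stand-ins for the Stanford NLP labeling and Spark MLlib classification covered in the project, and a simple generator stands in for Spark Streaming's micro-batches. What it shows is that a result is produced for each small batch as soon as that batch arrives, rather than after all the data has been collected.

```python
from collections import Counter
from itertools import islice

# Hypothetical toy lexicon (an assumption for illustration only).
POSITIVE = {"great", "good", "love", "win"}
NEGATIVE = {"bad", "terrible", "hate", "lose"}


def label(tweet):
    """Return 'positive', 'negative', or 'neutral' from simple word counts."""
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def micro_batches(stream, batch_size):
    """Group an iterator of tweets into fixed-size batches, in arrival order."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def analyze(stream, batch_size=2):
    """Yield one sentiment tally per batch, as soon as that batch is complete."""
    for batch in micro_batches(stream, batch_size):
        yield Counter(label(tweet) for tweet in batch)


# Made-up sample tweets standing in for a live Twitter stream.
tweets = [
    "Great speech tonight",
    "I hate this game",
    "Markets open at nine",
    "Love the win",
]

results = list(analyze(tweets))
for tally in results:
    print(dict(tally))
```

In a real deployment the batches would be defined by a time interval rather than a fixed count, and each tally would be pushed to a dashboard instead of printed.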

Similar Projects

In this Spark project, we are going to bring processing to the speed layer of the lambda architecture which opens up capabilities to monitor application real time performance, measure real time comfort with applications and real time alert in case of security

In this big data project, we will embark on real-time data collection and aggregation from a simulated real-time system using Spark Streaming.

In this Hackerday, we will go through the basics of statistics and see how Spark enables us to perform statistical operations, such as descriptive and inferential statistics, over very large datasets.

Curriculum For This Mini Project

What's challenging about this Spark Project?
02m
Agenda for the Project
02m
What is Sentiment Analysis?
08m
Use Cases of Sentiment Analysis
13m
End-to-End System Design
05m
Advantages of using Scala over Python for System Design
03m
Technology used for Broker Message and Datastore
04m
Learn to install Redis
01m
Open Redis through Cloudera Quickstart VM
03m
Go to the location where Redis is installed
02m
Start the Redis Server
00m
Establish connection to Redis
01m
Why choose Redis as a Broker?
00m
HDFS as the Datastore
02m
Why choose HDFS as the Datastore?
00m
Why do you need a Datastore?
02m
What is Spark Streaming?
03m
Source for Spark Streaming
02m
Where can Flume be used as Source?
03m
Learn to install and set up Scala on the Cloudera Quickstart VM
02m
Download the Jars Needed for Twitter Spark Streaming
04m
Learn about Spark Streaming Context
01m
Initialize Spark Streaming Context and Set System Properties
00m
Set System Properties
01m
Create Stream
01m
See Trending Tweets from "CNN"
01m
Create a stream that searches based on "CNN"
01m
Understanding the Dashboard of the System
02m
Start Up Hadoop Instances and Run Spark Streaming
02m
Start Spark and Run the Code in Command Line
01m
Start the Streaming Context
00m
Understand the Processing Logic from the Source Code
04m
Output Streaming
03m
Classify Tweets
13m
Dashboard Visualization for Sentiment Analysis
01m
Recap of the Spark Streaming System Architecture
05m
Agenda for the Session
01m
Understanding Spark Ecosystem Components
03m
What is Spark MLlib?
03m
Machine Learning Process
04m
Spark Pipeline API Concepts
09m
Discussion on Classifying and Labelling Tweets
01m
Which is better for Sentiment Analysis: Supervised or Unsupervised Learning?
01m
Live Twitter Sentiment Analysis Workflow
04m
Spark Streaming Code Walkthrough
09m
Redis as Message Broker
04m
Using the Stanford NLP library to label the sentiment of a Tweet
08m
Processing Logic for Streaming Sentiments in Real-Time
11m
End-to-End Integration of the Application
08m
Dashboard Visualization of Real-Time Twitter Sentiments
01m
Significance of getting data in real-time for Machine Learning Module
05m
Refining Spark Streaming System Architecture
05m
Importance of a Datastore in the Architecture
01m
Understanding different types of streaming
04m
Stateless and Stateful Streaming
04m
MLlib and Its Use Cases
15m
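
The difference between stateless and stateful streaming, listed in the curriculum above, can be illustrated with a minimal plain-Python sketch. It is an assumption-laden illustration rather than the course's implementation: `stateless_counts`, `RunningCounts`, and the sample batches are hypothetical names and data. A stateless operation looks only at the current batch, while a stateful one carries totals from one batch to the next, which is roughly the role that operations such as `updateStateByKey` play in Spark's DStream API.

```python
from collections import Counter


def stateless_counts(batch):
    """Count labels using only the current batch; nothing is remembered."""
    return Counter(batch)


class RunningCounts:
    """Stateful counterpart: totals carry over from one batch to the next."""

    def __init__(self):
        self.totals = Counter()

    def update(self, batch):
        self.totals.update(batch)
        # Return a copy so callers cannot mutate the internal state.
        return Counter(self.totals)


# Hypothetical batches of already-classified tweets.
batches = [
    ["positive", "negative", "positive"],
    ["negative", "negative"],
]

state = RunningCounts()
per_batch = [stateless_counts(b) for b in batches]
running = [state.update(b) for b in batches]

for batch_view, running_view in zip(per_batch, running):
    print("this batch:", dict(batch_view), "| so far:", dict(running_view))
```

The stateless view suits a dashboard that shows only the most recent interval, while the running totals suit one that shows sentiment since the stream started. Keeping state also means it must be stored and recovered somewhere, which is one reason a datastore appears in the architecture.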