Real-Time Streaming of Twitter Sentiments AWS EC2 NiFi

Learn to 1) perform Twitter sentiment analysis using Spark Streaming, NiFi and Kafka, and 2) build an interactive data visualization for the analysis using Python Plotly and Dash.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.


Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.


Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love


Mike Vogt

Information Architect at Bank of America

I have had a very positive experience. The platform is very rich in resources, and the expert was thoroughly knowledgeable on the subject matter - real world hands-on experience. I wish I had this...


Prasanna Lakshmi T

Advisory System Analyst at IBM

Initially, I was unaware of how this would cater to my career needs. But then I stumbled upon the reviews given on the website, went through many of them, and found them all positive. I would...

What will you learn

Understanding the project and how to use AWS EC2 Instance
Understanding the basics of Containers, sentiment analysis, and their application
Visualizing the complete Architecture of the system
Introduction to Docker
Usage of docker-compose to start all the tools
Exploring the dataset and bucketizing it for labelling
Training the model and saving it
Installing NiFi and using it for data ingestion
Installing Kafka and using it to create topics (see the sketch after this list)
Publishing tweets using NiFi
Integration of NiFi and Kafka
Installing Spark and using it for data processing
Integration of Kafka and Spark
Extracting schema from the stream of tweets
Reading data from Kafka
Analyzing sentiments in tweets in Spark
Integration of Spark and MongoDB
Continuously loading data in MongoDB for aggregated results
Integrating MongoDB with Plotly and Dash
Displaying live stream results using Python Plotly and Dash
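
For the Kafka-related steps above, creating a topic and checking the tweets that NiFi publishes can be sketched in Python. The snippet below uses the kafka-python client and assumes a broker on localhost:9092 and a topic named "tweets"; both are placeholders for whatever the actual environment uses, not the project's exact setup.

    # Minimal sketch: create the Kafka topic that NiFi will publish tweets into,
    # then read a few raw messages back to confirm the NiFi -> Kafka integration.
    # Assumptions: kafka-python installed, broker on localhost:9092, topic "tweets".
    from kafka import KafkaConsumer
    from kafka.admin import KafkaAdminClient, NewTopic
    from kafka.errors import TopicAlreadyExistsError

    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
    try:
        admin.create_topics([NewTopic(name="tweets", num_partitions=1, replication_factor=1)])
    except TopicAlreadyExistsError:
        pass  # topic was already created, e.g. by a previous run

    consumer = KafkaConsumer("tweets",
                             bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=10000)
    for i, message in enumerate(consumer):
        print(message.value[:120])  # first bytes of each tweet JSON published by NiFi
        if i >= 4:
            break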

Project Description

What is Twitter Sentiment?

Twitter sentiment refers to the analysis of the sentiments expressed in tweets posted by users on the social media platform Twitter. In most projects, the tweets are parsed and their sentiment is classified. Analyzing the sentiment of Twitter users is valuable to companies whose products depend on social media trends, user opinion, and the future outlook of the online community.

 

Data Pipeline:

A data pipeline is a system for moving data from one system to another. The data may or may not be transformed, and it may be processed in real time (streaming) rather than in batches. The data pipeline covers everything from extracting or capturing data with various tools, storing the raw data, cleaning and validating it, and transforming it into a query-worthy format, to visualizing KPIs and orchestrating the whole process.

 

What is the Agenda of the project?

The agenda of the project is real-time streaming of Twitter sentiments with a visualization web app. We first launch an EC2 instance on AWS and install Docker on it, along with tools such as Apache Spark, Apache NiFi, Apache Kafka, JupyterLab, MongoDB, Plotly, and Dash. Then a supervised classification model is built: the data is explored and bucketized, stratified sampling is applied, the dataset is split, features are extracted by tokenizing, removing stop words, and applying TF-IDF, a pipeline is created, and the model is trained, evaluated with a binary classification evaluator, and saved. This is followed by extraction using Apache NiFi and Apache Kafka, then transformation and load using Spark and MongoDB, and finally visualization using Python Plotly and Dash with graph and table app callbacks.
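
As a rough illustration of the model-building step, the sketch below assumes PySpark is available; the tiny in-memory sample, the column names "text" and "label", and the save path "sentiment_model" are placeholders standing in for the project's bucketized review data, not its exact code.

    # Minimal sketch of the supervised classification step, assuming PySpark.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, StopWordsRemover, HashingTF, IDF
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("sentiment-model").getOrCreate()

    # Assumption: a tiny in-memory sample stands in for the bucketized review data,
    # which carries a text column and a 0/1 sentiment label.
    df = spark.createDataFrame(
        [("great product, loved it", 1.0), ("terrible, would not buy again", 0.0)] * 50,
        ["text", "label"],
    )
    train, test = df.randomSplit([0.8, 0.2], seed=42)

    # Feature extraction: tokenize, remove stop words, TF-IDF, then logistic regression.
    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="words"),
        StopWordsRemover(inputCol="words", outputCol="filtered"),
        HashingTF(inputCol="filtered", outputCol="raw_features"),
        IDF(inputCol="raw_features", outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol="label"),
    ])
    model = pipeline.fit(train)

    # Evaluate with a binary classification evaluator (area under ROC), then save the model.
    evaluator = BinaryClassificationEvaluator(labelCol="label")
    print("AUC:", evaluator.evaluate(model.transform(test)))
    model.write().overwrite().save("sentiment_model")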

 

Usage of Dataset:

Here we are going to use Twitter sentiments data in the following ways:

- Extraction: During the extraction process, NiFi processors and connections are set up, and a Twitter app is created in a Twitter developer account. The data is streamed from the Twitter API using NiFi, after which topics are created and the tweets are published to Apache Kafka.

- Transformation and Load: During the transformation and load process, a schema is extracted from the stream of tweets, the data is read from Apache Kafka as a streaming DataFrame, the Twitter data is extracted and cleansed, and the sentiments in the tweets are analyzed. The results are then written to MongoDB for visualization in Dash; a condensed sketch of this step follows below.
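
The sketch below is only an approximation of this transform-and-load step. It assumes a topic named "tweets" on a broker at localhost:9092, a model saved as "sentiment_model", MongoDB on localhost:27017, and the Kafka and MongoDB Spark connector packages supplied at submit time; all of these names are placeholders rather than the project's exact configuration.

    # Sketch: read tweets from Kafka as a streaming DataFrame, score them with the
    # saved sentiment model, and continuously append the results to MongoDB.
    # Assumption: the Kafka and MongoDB Spark connector packages are provided
    # (e.g. via spark-submit --packages).
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StringType
    from pyspark.ml import PipelineModel

    spark = SparkSession.builder.appName("tweet-sentiment-stream").getOrCreate()

    # Minimal schema extracted from the tweet JSON; real tweets carry many more fields.
    schema = StructType().add("text", StringType()).add("created_at", StringType())

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "tweets")
           .load())

    tweets = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("t"))
              .select("t.*"))

    # Load the pipeline saved during model training and score each incoming tweet.
    model = PipelineModel.load("sentiment_model")
    scored = model.transform(tweets).select("text", "created_at", "prediction")

    def write_to_mongo(batch_df, batch_id):
        # Each micro-batch is appended to the collection read by the Dash app.
        # Option names follow MongoDB Spark connector 10.x; adjust for other versions.
        (batch_df.write
         .format("mongodb")
         .mode("append")
         .option("connection.uri", "mongodb://localhost:27017")
         .option("database", "twitter")
         .option("collection", "sentiments")
         .save())

    query = scored.writeStream.foreachBatch(write_to_mongo).start()
    query.awaitTermination()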

 

Data Analysis:

  • From the given website, data containing the review text, product rating, and review summary is downloaded. The data is bucketized to label the features, followed by partitioning of the data into a homogeneous sample.

  • The dataset is split in appropriate ratios, followed by feature extraction using tokenization and TF-IDF, and classification using logistic regression.

  • A data pipeline is created to train the model and evaluate it with a binary classification evaluator, followed by saving the classified model.

  • The extraction process is done using NiFi and Kafka: data is streamed from the Twitter API using NiFi, and topics are created and tweets published using Kafka.

  • In the transformation and load process, a schema is extracted from the Twitter streams and the data is read from Kafka as a streaming DataFrame.

  • The Twitter data is extracted and cleansed, followed by sentiment analysis of the tweets.

  • Finally, the continuous data is loaded into MongoDB and visualized using scatter-graph and table definitions in Python Plotly and Dash; a rough sketch of the Dash app follows this list.
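
The visualization layer can be sketched roughly as follows. It assumes pymongo, Plotly, and Dash are installed, that the streaming job writes to a "twitter.sentiments" collection, and that each document carries a "prediction" field; these names are placeholders, not the project's exact schema.

    # Rough sketch: a small Dash app that polls MongoDB and redraws a live sentiment chart.
    # Assumptions: collection "twitter.sentiments" written by the streaming job,
    # documents with a "prediction" field (0.0 = negative, 1.0 = positive).
    from dash import Dash, dcc, html, Input, Output
    import plotly.graph_objects as go
    from pymongo import MongoClient

    collection = MongoClient("mongodb://localhost:27017")["twitter"]["sentiments"]

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Graph(id="sentiment-graph"),
        dcc.Interval(id="tick", interval=5000, n_intervals=0),  # poll every 5 seconds
    ])

    @app.callback(Output("sentiment-graph", "figure"), Input("tick", "n_intervals"))
    def update_graph(_):
        # Aggregate tweet counts per predicted sentiment on every tick.
        counts = list(collection.aggregate([
            {"$group": {"_id": "$prediction", "count": {"$sum": 1}}}
        ]))
        labels = ["negative" if c["_id"] == 0.0 else "positive" for c in counts]
        values = [c["count"] for c in counts]
        return go.Figure(data=[go.Bar(x=labels, y=values)])

    if __name__ == "__main__":
        app.run(debug=True)  # use app.run_server(...) on Dash versions before 2.7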

Similar Projects

In this Spark project, we bring processing to the speed layer of the Lambda architecture, which opens up capabilities to monitor application performance in real time, measure real-time comfort with applications, and raise real-time alerts in case of security issues.

In this Spark Streaming project, we build the backend of an IT job-ad website by streaming data from Twitter for analysis in Spark.

In this big data project, we will embark on real-time data collection and aggregation from a simulated real-time system using Spark Streaming.

Curriculum For This Mini Project

Agenda and Architecture
05m
Environment Setup Part 1
04m
Environment Setup Part 2
11m
Classification model creation
01m
Dataset exploration and Bucketizing
02m
Stratified sampling and Dataset splitting
01m
Feature extraction and Pipeline creation
02m
Model training and Evaluation
02m
Saving model and Evaluation
03m
Defining NiFi and its extraction process
01m
Twitter app creation
01m
Setting up NiFi
02m
Defining Kafka and its extraction process
01m
Topic and publishing messages
02m
Schema extraction in transform and load
01m
Reading data in transform and load
00m
Extraction and Cleansing in transform and load
01m
Sentiment analysis and Writing in transform and load
01m
Introduction to Dash
00m
Code explanation in visualization
02m
Code walkthrough and running notebooks
10m