Work with Streaming Data using Twitter API to Build a JobPortal

In this Spark Streaming project, we are going to build the backend of an IT job ad website by streaming data from Twitter for analysis in Spark.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Dhiraj Tandon

Solution Architect-Cyber Security at ColorTokens

My Interaction was very short but left a positive impression. I enrolled and asked for a refund since I could not find the time. What happened next: They initiated Refund immediately. Their...

Swati Patra

Systems Advisor, IBM

I have 11 years of experience and work with IBM. My domain is Travel, Hospitality and Banking - both sectors process lots of data. The way the projects were set up and the mentors' explanation was...

What will you learn

Understanding the architecture and data flow of the project
Setting up a virtual environment on a Cloudera VM (VMware)
Basics of real-time ingestion and how to get the output
Java XMPP servers
Streaming Twitter data using a Flume agent
Integrating Flume with Spark Streaming to process Twitter events
Downloading the necessary files from GitHub
Collecting and visualizing the data
Performing basic data preprocessing using Spark
How to timestamp real-time data
Integrating Kafka for complex event alerts
Writing queries in Hue (Hive) to create tables
Integrating Spark with online databases
How to extract the data schema of an Avro file
Coordinating the data processing pipeline with Oozie
Loading and storing the final data in HBase

Project Description

In this Spark project, we are going to build a business: one similar to an IT job ad site. The job portal will stream data from Twitter to locate recently published IT jobs, process them, and make them available through a simple search API. To complete the circle, we will also build notification features for users who subscribe to job ad notifications.

On completion of this big data project, the job portal will list every IT job tweeted and give users an apply-early advantage.
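
The flow described above (filter tweets for IT jobs, make them searchable, notify subscribers) can be sketched in plain Python. This is only an illustrative, in-memory sketch, not the project's Flume/Spark/HBase implementation: the keyword sets, the `JobIndex` class, and its method names are all assumptions made for the example.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical keyword lists; the project does not state its matching rules.
JOB_TERMS = {"hiring", "job", "jobs", "vacancy", "opening"}
IT_TERMS = {"developer", "engineer", "python", "java", "devops", "spark", "hadoop"}


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def is_it_job(text):
    """A tweet counts as an IT job ad if it has a job term and an IT term."""
    tokens = set(tokenize(text))
    return bool(tokens & JOB_TERMS) and bool(tokens & IT_TERMS)


class JobIndex:
    """In-memory stand-in for the job store, search API, and notifier."""

    def __init__(self):
        self.jobs = []                          # job records in arrival order
        self.by_token = defaultdict(set)        # token -> ids of matching jobs
        self.subscriptions = defaultdict(set)   # keyword -> subscribed users

    def subscribe(self, user, keyword):
        self.subscriptions[keyword.lower()].add(user)

    def ingest(self, text, received_at=None):
        """Index the tweet if it looks like an IT job; return users to notify."""
        if not is_it_job(text):
            return set()
        received_at = received_at or datetime.now(timezone.utc)
        job_id = len(self.jobs)
        self.jobs.append({"id": job_id, "text": text, "received_at": received_at})
        to_notify = set()
        for token in set(tokenize(text)):
            self.by_token[token].add(job_id)
            to_notify |= self.subscriptions.get(token, set())
        return to_notify

    def search(self, query):
        """Return jobs containing every query token, most recently ingested first."""
        tokens = tokenize(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.by_token.get(t, set()) for t in tokens))
        return [self.jobs[i] for i in sorted(ids, reverse=True)]
```

For example, after `idx.subscribe("alice", "python")`, calling `idx.ingest("We're hiring a Python developer in Berlin")` returns the set of users to notify, and `idx.search("python developer")` returns the stored job record. In the actual project these roles would be played by the streaming pipeline and the backing store rather than Python dictionaries.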

Similar Projects

In this big data project, we will see how data ingestion and loading are done with the Kafka Connect APIs, while transformation is done with the Kafka Streams API.

The goal of this Apache Kafka project is to process log entries from applications in real time, using Kafka as the streaming architecture in a microservice setting.

Spark Project - Discuss real-time monitoring of taxis in a city. The real-time data stream will be simulated using Flume, and ingestion will be done using Spark Streaming.

Curriculum For This Mini Project

02h 33m
02h 34m