Work with Streaming Data using Twitter API to Build a JobPortal

In this Spark Streaming project, we build the backend of an IT job ad website by streaming data from Twitter and analyzing it in Spark.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Camille St. Omer

Artificial Intelligence Researcher, Quora 'Most Viewed Writer' in 'Data Mining'

I came to the platform with no experience and now I am knowledgeable in Machine Learning with Python. No easy thing I must say, the sessions are challenging and go to the depths. I looked at graduate...

Nathan Elbert

Senior Data Scientist at Tiger Analytics

This was great. The use of Jupyter was great. Prior to learning Python I was a self-taught SQL user with advanced skills. I hold a Bachelor's in Finance and have 5 years of business experience. I...

What will you learn

Understanding the architecture and data flow of the project
Setting up a virtual environment on the Cloudera VM (VMware)
Basics of real-time ingestion and how to get the output
Java XMPP servers
Streaming Twitter data using a Flume agent
Integrating Flume with Spark Streaming to process Twitter events
Downloading the necessary files from GitHub
Collecting and visualizing the data
Performing basic data preprocessing using Spark
How to timestamp real-time data
Integrating Kafka for complex event alerts
Writing queries in Hue (Hive) to create tables
Integrating Spark with online databases
How to extract the data schema of an Avro file
Coordinating the data processing pipeline with Oozie
Loading and storing the final data in HBase
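
To illustrate the timestamping item in the list above, here is a minimal sketch in plain Python rather than Spark. It assumes each incoming event is a dictionary; the function name `stamp_event` and the field name `ingested_at` are hypothetical and are not taken from the project code.

```python
from datetime import datetime, timezone


def stamp_event(event, now=None):
    """Return a copy of the event with an ISO-8601 UTC ingestion timestamp.

    `now` can be passed in explicitly, which keeps the function easy to test;
    when omitted, the current UTC time is used.
    """
    now = now or datetime.now(timezone.utc)
    stamped = dict(event)  # copy so the original event is left untouched
    stamped["ingested_at"] = now.isoformat()  # hypothetical field name
    return stamped
```

Stamping events at ingestion time, in UTC, is one common choice because it keeps later ordering and windowing independent of the producer's clock and time zone.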

Project Description

In this Spark project, we are going to build a business. Yes, a business: one similar to an IT job ad site. The job portal will stream data from Twitter to locate recently published IT jobs, process them, and make them available through a simple search API. To complete the circle, we will also build a notification feature for users who subscribe to job ad alerts.

On completion of this big data project, we will have a job portal that lists every IT job tweeted and gives its users an apply-early advantage.
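
The flow described above (locate IT job tweets, make them searchable, notify subscribers) can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's implementation: the keyword list, the hiring pattern, and the names `is_it_job`, `search`, and `subscribers_to_notify` are all hypothetical, and the real project runs this kind of logic on Spark.

```python
import re

# Hypothetical rules: the project's actual matching logic is not given in the text.
IT_KEYWORDS = {"python", "java", "spark", "hadoop", "devops"}
HIRING_PATTERN = re.compile(r"\b(hiring|job|vacancy|opening)\b", re.IGNORECASE)


def tokens(text):
    """Split text into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def is_it_job(text):
    """A tweet counts as an IT job ad if it has a hiring word and an IT keyword."""
    return bool(HIRING_PATTERN.search(text)) and bool(tokens(text) & IT_KEYWORDS)


def search(jobs, keyword):
    """Very simple search: return the job tweets containing the keyword as a token."""
    kw = keyword.lower()
    return [job for job in jobs if kw in tokens(job)]


def subscribers_to_notify(text, subscriptions):
    """Return the users whose subscribed keywords appear in the tweet, sorted by name."""
    words = tokens(text)
    return sorted(user for user, keywords in subscriptions.items() if words & keywords)
```

For example, given a list of tweet texts, `[t for t in tweets if is_it_job(t)]` keeps the job ads, `search(jobs, "java")` narrows them to one skill, and `subscribers_to_notify(tweet, {"alice": {"spark"}})` lists who should be alerted about a given tweet.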

Similar Projects

The goal of this IoT project is to build an argument for a generalized streaming architecture for reactive data ingestion based on a microservice architecture.

In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

The goal of this Apache Kafka project is to process application log entries in real time, using Kafka as the streaming layer of a microservice-style architecture.

Curriculum For This Mini Project

02h 33m
02h 34m