Work with Streaming Data using Twitter API to Build a Job Portal

In this Spark Streaming project, we are going to build the backend of an IT job ad website by streaming data from Twitter for analysis in Spark.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 102+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Shailesh Kurdekar

Solutions Architect at Capital One

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...

Nathan Elbert

Senior Data Scientist at Tiger Analytics

This was great. The use of Jupyter was great. Prior to learning Python I was a self taught SQL user with advanced skills. I hold a Bachelors in Finance and have 5 years of business experience.. I...

What will you learn

Understanding the architecture and data flow of the project
Setting up a virtual environment on the Cloudera VM in VMware
Basics of real-time ingestion and how to retrieve the output
Java XMPP servers
Streaming Twitter data using a Flume agent
Integrating Flume with Spark Streaming for processing Twitter events
Downloading the necessary files from GitHub
Collecting and visualizing the data
Performing basic data preprocessing using Spark
How to timestamp real-time data
Integrating Kafka for complex event alerts
Writing Hive queries in Hue to create tables
Integrating Spark with online databases
How to extract the data schema of an Avro file
Coordinating the data processing pipeline with Oozie
Loading and storing the final data in HBase
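
The preprocessing and timestamping steps listed above can be illustrated with a small record-level function, the kind of logic a Spark Streaming job might map over each incoming tweet. This is a plain-Python sketch, not the project's code: the function name `preprocess_tweet` and the `id`/`text` input keys are assumptions, and the stamp records ingestion time in UTC rather than the tweet's own creation time.

```python
import re
from datetime import datetime, timezone

HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def preprocess_tweet(tweet, now=None):
    """Clean one raw tweet record and stamp it with its ingestion time in UTC.

    `tweet` is assumed to be a dict with hypothetical "id" and "text" keys.
    """
    text = tweet.get("text", "")
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    urls = URL_RE.findall(text)
    # Drop URLs, collapse runs of whitespace, and lowercase for keyword matching.
    clean_text = " ".join(URL_RE.sub(" ", text).split()).lower()
    # Use the supplied time if given (handy for tests), else the current UTC time.
    stamp = now if now is not None else datetime.now(timezone.utc)
    return {
        "id": tweet.get("id"),
        "clean_text": clean_text,
        "hashtags": hashtags,
        "urls": urls,
        "ingested_at": stamp.isoformat(),
    }
```

Passing a fixed `now` makes the output deterministic, which is useful when testing; in a live stream you would leave it as `None`.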

Project Description

In this Spark project, we are going to build a business. Yes, a business similar to an IT job ad site. This job portal will stream data from Twitter to locate recently published IT jobs, process them, and make them available via a simple search API. To complete the circle, we will also build notification features for users who subscribe to job ad notifications.
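
The portal described above boils down to three operations: deciding whether a tweet is an IT job ad, making job ads searchable, and matching new ads against subscribers. The following plain-Python sketch shows them under loose assumptions: the keyword lists, the `is_it_job` heuristic, the `JobIndex` class, and `subscribers_to_notify` are all hypothetical illustrations, not the project's implementation, which performs these steps with Spark Streaming, Kafka alerts, and HBase storage.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; a real portal would tune these or train a classifier.
HIRING_TERMS = {"hiring", "job", "jobs", "vacancy", "opening", "career"}
TECH_TERMS = {"java", "python", "scala", "spark", "hadoop", "sql", "devops",
              "developer", "engineer"}

TOKEN_RE = re.compile(r"[a-z0-9+#]+")


def tokens(text):
    """Lowercase word tokens; a leading '#' is dropped so '#Java' matches 'java'."""
    return {t.lstrip("#") for t in TOKEN_RE.findall(text.lower())} - {""}


def is_it_job(text):
    """Heuristic: treat a tweet as an IT job ad if it has a hiring term AND a tech term."""
    words = tokens(text)
    return bool(words & HIRING_TERMS) and bool(words & TECH_TERMS)


class JobIndex:
    """In-memory inverted index standing in for the portal's search backend."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> ids of job tweets containing it

    def add(self, job_id, text):
        for word in tokens(text):
            self._postings[word].add(job_id)

    def search(self, query):
        """Return the sorted ids of job tweets that contain every query term."""
        terms = tokens(query)
        if not terms:
            return []
        return sorted(set.intersection(*(self._postings.get(t, set()) for t in terms)))


def subscribers_to_notify(text, subscriptions):
    """Return users whose subscribed keywords appear in a new job tweet."""
    words = tokens(text)
    return sorted(user for user, keywords in subscriptions.items()
                  if words & {k.lower() for k in keywords})
```

For example, after indexing "We are hiring a #Java developer in Austin" under id 1, `search("java")` returns `[1]`. An inverted index avoids scanning every stored tweet on each query, which matters once the stream has accumulated many job ads.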

On completion of this big data project, we will have a job portal that lists every IT job tweeted and gives users an apply-early advantage.

Similar Projects

In this big data project, we will see how data ingestion and loading are done with the Kafka Connect APIs, while transformation is done with the Kafka Streams API.

The goal of this IoT project is to build an argument for generalized streaming architecture for reactive data ingestion based on a microservice architecture. 

This Elasticsearch example deploys the AWS ELK stack to analyse streaming event data. Tools used include Nifi, PySpark, Elasticsearch, Logstash and Kibana for visualisation.

Curriculum For This Mini Project

9-Dec-2016: 02h 33m
10-Dec-2016: 02h 34m