Work with Streaming Data using Twitter API to Build a JobPortal

In this Spark Streaming project, we are going to build the backend of an IT job ad website by streaming data from Twitter for analysis in Spark.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Understanding the architecture and dataflow of the project
Setting up a virtual environment on a Cloudera VM (VMware)
Basics of real-time ingestion and how to get the output
Java XMPP servers
Streaming Twitter data using a Flume agent
Integrating Flume with Spark Streaming for processing Twitter events
Downloading the necessary files from GitHub
Collecting and visualizing the data
Performing basic data preprocessing using Spark
How to timestamp real-time data
Integrating Kafka for complex event alerts
Writing queries in Hue (Hive) to create tables
Integrating Spark with online databases
How to extract the data schema of an Avro file
Coordinating the data processing pipeline with Oozie
Loading and storing the final data in HBase
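
To make the preprocessing and timestamping topics above more concrete, here is a minimal sketch in plain Python of what such a step might look like for a single tweet. It is not the project's actual Spark code: the function name preprocess_tweet, the output field names, and the choice to strip URLs are assumptions made for illustration. In the project itself this logic would presumably run inside a Spark Streaming transformation.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical helpers for illustration; not taken from the project code.
URL_PATTERN = re.compile(r"https?://\S+")
WHITESPACE = re.compile(r"\s+")


def preprocess_tweet(raw_json, received_at=None):
    """Parse one raw tweet JSON string and return a cleaned, time-stamped record."""
    tweet = json.loads(raw_json)
    text = tweet.get("text", "")

    # Keep the links separately, then remove them from the searchable text.
    urls = URL_PATTERN.findall(text)
    cleaned = WHITESPACE.sub(" ", URL_PATTERN.sub("", text)).strip()

    # Stamp the record with the time it was received (UTC) unless one is given.
    if received_at is None:
        received_at = datetime.now(timezone.utc)

    return {
        "id": tweet.get("id"),
        "text": cleaned,
        "urls": urls,
        "received_at": received_at.isoformat(),
    }
```

Passing received_at explicitly makes the function deterministic, which helps when replaying stored events or writing tests.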

Project Description

In this Spark project, we are going to build a business. Yes, a business that is similar to an IT job ad site. This job portal will stream data from Twitter to locate recently published IT jobs, process them, and make them available via a simple search API. To complete the circle, we will also build notification features for users who subscribe to job ad notifications.

On completion of this big data project, we will have a job portal that lists every IT job tweeted and gives users an apply-early advantage.
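
The flow described above (filter job tweets, make them searchable, notify subscribers) can be sketched in a few lines of plain Python. This is only an in-memory illustration under stated assumptions: the class JobIndex, the keyword sets, and the matching rule (a tweet counts as an IT job ad if it contains at least one IT keyword and one hiring word) are hypothetical and stand in for the Spark, Kafka, and HBase components the project actually uses.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; a real portal would need far richer rules.
IT_KEYWORDS = {"java", "python", "spark", "hadoop", "developer", "engineer"}
HIRING_WORDS = {"hiring", "job", "jobs", "vacancy", "opening"}


def tokenize(text):
    """Lowercase the text and split it into a set of simple word tokens."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


class JobIndex:
    """In-memory stand-in for the job store, search API and notifier."""

    def __init__(self):
        self.jobs = {}                          # tweet id -> tweet text
        self.index = defaultdict(set)           # token -> tweet ids
        self.subscriptions = defaultdict(set)   # keyword -> user names

    def subscribe(self, user, keyword):
        self.subscriptions[keyword.lower()].add(user)

    def add_tweet(self, tweet_id, text):
        """Index the tweet if it looks like an IT job ad; return users to notify."""
        tokens = tokenize(text)
        if not (tokens & IT_KEYWORDS and tokens & HIRING_WORDS):
            return set()
        self.jobs[tweet_id] = text
        notify = set()
        for token in tokens:
            self.index[token].add(tweet_id)
            notify |= self.subscriptions.get(token, set())
        return notify

    def search(self, query):
        """Return the ids of jobs that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(matches)
```

A caller would subscribe users first, feed each incoming tweet to add_tweet, send notifications to the returned users, and serve search requests from search.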

Similar Projects

In this Spark project, we are going to bring processing to the speed layer of the lambda architecture, which opens up capabilities to monitor application real-time performance, measure real-time comfort with applications and real-time alert in case of security

In this big data project, we will embark on real-time data collection and aggregation from a simulated real-time system using Spark Streaming.

The goal of this Apache Kafka project is to process log entries from applications in real time using Kafka for the streaming architecture in a microservice sense.

Curriculum For This Mini Project

9-Dec-2016: 02h 33m
10-Dec-2016: 02h 34m