Making real-time decisions on incoming data using Flume and Kafka

Hadoop Projects for Beginners - Learn data ingestion from a source using Apache Flume and Kafka to make real-time decisions on incoming data.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Setting up a virtual environment
Downloading and installing Apache Maven
Identifying and downloading compatible Hadoop, Eclipse and Kafka versions
Troubleshooting VMware
Understanding data ingestion, data preprocessing and data assimilation
Ingesting data with Flume
How the data flows between the exchange house and HDFS
Reducing latency in the data workflow using messaging servers
Understanding Memory Channel, JDBC Channel, Kafka Channel, etc.
Creating a dummy Java app for visualizing the real-time processing using Kafka Channel
Building a Flume multiplexing channel selector
Setting up a Kafka Channel
Using the interceptor subcomponent to collect and manipulate incoming data
Using Flume to read a file and inject it into a Memory Channel
Implementing an Avro Sink to transfer data to the required destination
Writing a Kafka client that persists data to MySQL
Creating checkpoints to prevent data loss when a system crashes
Saving the final output in HDFS

Project Description

Data is everywhere and is constantly being generated around us. Using big data tools, it is possible to ingest, process and make decisions based on data at high speed.

This big data project for beginners demonstrates how to use Apache Flume to ingest trading data from a source. While the default data flow is to archive all data to HDFS, Flume is also configured to channel some preconfigured symbols or trading pairs of interest to another processing server using Kafka. All the processed instructions are stored in a relational database (MySQL).
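The routing described above can be pictured with a small, dependency-free Java sketch. It is not Flume's API and not the project's actual code: the class `SymbolRouter`, the channel names `hdfs-archive` and `kafka-decisions`, the `symbol` header, and the comma-separated record layout are all assumptions made for illustration. The sketch mimics two steps: an interceptor-like step that copies the trading symbol into an event header, and a selector-like step that always picks the archive channel and adds the Kafka channel for symbols of interest.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only: plain Java, no Flume or Kafka classes involved.
class SymbolRouter {
    // Hypothetical channel names, chosen for this sketch.
    public static final String HDFS_CHANNEL = "hdfs-archive";
    public static final String KAFKA_CHANNEL = "kafka-decisions";

    private final Set<String> watchedSymbols;

    public SymbolRouter(Set<String> watchedSymbols) {
        this.watchedSymbols = watchedSymbols;
    }

    // Interceptor-like step: assumes a record such as "BTCUSD,42000.5,3"
    // and stores its first field, upper-cased, under the "symbol" header.
    public static Map<String, String> extractHeaders(String line) {
        Map<String, String> headers = new HashMap<>();
        String[] fields = line.split(",");
        if (fields.length > 0 && !fields[0].trim().isEmpty()) {
            headers.put("symbol", fields[0].trim().toUpperCase());
        }
        return headers;
    }

    // Selector-like step: every event is archived; watched symbols are
    // additionally sent to the channel that feeds the processing server.
    public List<String> channelsFor(Map<String, String> headers) {
        List<String> channels = new ArrayList<>();
        channels.add(HDFS_CHANNEL);
        String symbol = headers.get("symbol");
        if (symbol != null && watchedSymbols.contains(symbol)) {
            channels.add(KAFKA_CHANNEL);
        }
        return channels;
    }

    public static void main(String[] args) {
        SymbolRouter router = new SymbolRouter(Set.of("BTCUSD", "ETHUSD"));
        String[] lines = {"BTCUSD,42000.5,3", "XRPUSD,0.52,1000"};
        for (String line : lines) {
            System.out.println(line + " -> " + router.channelsFor(extractHeaders(line)));
        }
    }
}
```

In a real Flume agent this decision would normally live in the agent configuration (an interceptor that sets a header plus a multiplexing channel selector keyed on that header) rather than in hand-written routing code; the sketch only makes the branching explicit.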

We will use the following tools in this Flume Kafka project:

Similar Projects

In this project, we will show how to build an ETL pipeline on streaming datasets using Kafka.

In this big data project, we will see how data ingestion and loading are done with the Kafka Connect APIs, while transformation is done with the Kafka Streams API.

Spark Project - Discuss real-time monitoring of taxis in a city. The real-time data streaming will be simulated using Flume. The ingestion will be done using Spark Streaming.

Curriculum For This Mini Project

05h 14m