Making real-time decisions on incoming data using Flume and Kafka

Hadoop Projects for Beginners - Learn data ingestion from a source using Apache Flume and Kafka to make real-time decisions on incoming data.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Shailesh Kurdekar

Solutions Architect at Capital One

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...

James Peebles

Data Analytics Leader, IQVIA

This is one of the best of investments you can make with regards to career progression and growth in technological knowledge. I was pointed in this direction by a mentor in the IT world who I highly...

What will you learn

Setting up a virtual environment
Downloading and installing Apache Maven
Identifying and downloading compatible Hadoop, Eclipse and Kafka versions
Troubleshooting VMware
Understanding data ingestion, data preprocessing and data assimilation
Ingesting data with Flume
How the data flows between the exchange house and HDFS
Reducing latency in the data workflow using messaging servers
Understanding Memory Channel, JDBC Channel, Kafka Channel, etc.
Creating a dummy Java app for visualizing the real-time processing using Kafka Channel
Building a Flume multiplexing channel selector
Setting up a Kafka Channel
Using the interceptor subcomponent to collect and manipulate incoming data
Using Flume to read a file and inject it into a Memory Channel
Implementing an Avro Sink to transfer data to the required destination
Writing a Kafka client that persists data to MySQL
Creating checkpoints to prevent data loss when a system crashes
Saving the final output in HDFS

Project Description

Data is everywhere and is constantly being generated around us. With big data tools, it is possible to ingest and process that data, and to make decisions based on it, at high speed.

This big data project for beginners demonstrates how to use Apache Flume to ingest trading data from a source. While the default data flow is to archive all data to HDFS, Flume is also configured to channel some preconfigured symbols or trading pairs of interest to another processing server using Kafka. All the processed instructions are stored in a relational database (MySQL).
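To make the data flow above concrete, here is a minimal sketch of a Flume agent configuration that archives every event to HDFS and additionally copies events for selected symbols into a Kafka Channel. It is an illustration under stated assumptions, not the project's actual configuration: the agent name (`agent`), the spool directory, the HDFS path, the Kafka broker address, the topic name (`trades-of-interest`), the symbols (`BTCUSD`, `ETHUSD`), and the assumption that each record is a comma-separated line beginning with the symbol are all hypothetical.

```properties
# Hypothetical agent with one source, two channels and one sink.
agent.sources = tradeSrc
agent.channels = memCh kafkaCh
agent.sinks = hdfsSink

# Source: read files dropped into a local directory (path is an assumption).
agent.sources.tradeSrc.type = spooldir
agent.sources.tradeSrc.spoolDir = /var/data/trades
agent.sources.tradeSrc.channels = memCh kafkaCh

# Interceptor: copy the leading symbol of each line into a "symbol" header.
# Assumes records look like "BTCUSD,<price>,<volume>,...".
agent.sources.tradeSrc.interceptors = i1
agent.sources.tradeSrc.interceptors.i1.type = regex_extractor
agent.sources.tradeSrc.interceptors.i1.regex = ^([A-Z]+),
agent.sources.tradeSrc.interceptors.i1.serializers = s1
agent.sources.tradeSrc.interceptors.i1.serializers.s1.name = symbol

# Multiplexing selector: symbols of interest go to both channels,
# everything else goes only to the channel that feeds HDFS.
agent.sources.tradeSrc.selector.type = multiplexing
agent.sources.tradeSrc.selector.header = symbol
agent.sources.tradeSrc.selector.mapping.BTCUSD = memCh kafkaCh
agent.sources.tradeSrc.selector.mapping.ETHUSD = memCh kafkaCh
agent.sources.tradeSrc.selector.default = memCh

# Memory Channel feeding the HDFS archive.
agent.channels.memCh.type = memory
agent.channels.memCh.capacity = 10000
agent.channels.memCh.transactionCapacity = 1000

# Kafka Channel: events land in a Kafka topic for a downstream consumer.
agent.channels.kafkaCh.type = org.apache.flume.channel.kafka.KafkaChannel
agent.channels.kafkaCh.kafka.bootstrap.servers = localhost:9092
agent.channels.kafkaCh.kafka.topic = trades-of-interest
# Write the raw event body so a plain Kafka consumer can read it.
agent.channels.kafkaCh.parseAsFlumeEvent = false

# HDFS sink: archive all events, partitioned by day.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memCh
agent.sinks.hdfsSink.hdfs.path = hdfs://localhost:8020/flume/trades/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
```

In this sketch no Flume sink is attached to the Kafka Channel. The assumption is that a separate Kafka client reads the topic and writes its results to MySQL, which matches the flow described above but may differ from how the project wires it.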

We will use the following tools in this Flume Kafka project:

Similar Projects

In this Hadoop project, we continue the series on data engineering by discussing and implementing various ways to solve the Hadoop small file problem.

In this Spark Streaming project, we are going to build the backend of an IT job ad website by streaming data from Twitter for analysis in Spark.

The goal of this IoT project is to build an argument for a generalized streaming architecture for reactive data ingestion based on a microservice architecture.

Curriculum For This Mini Project

05h 14m