Making real time decision on incoming data using Flume and Kafka

Hadoop Projects for Beginners - Learn data ingestion from a source using Apache Flume and Kafka to make real-time decisions on incoming data.


Each project comes with 2-5 hours of micro-videos explaining the solution.


Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.


Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love


Ray Han

Tech Leader | Stanford / Yale University

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell, Oracle, and Arthur Andersen (Accenture) in the US. I have taken Big Data and Hadoop, NoSQL, Spark, Hadoop...


Prasanna Lakshmi T

Advisory System Analyst at IBM

Initially, I was unaware of how this would cater to my career needs, but then I stumbled upon the reviews on the website. I went through many of them and found them all positive. I would...

What will you learn

Setting up a virtual environment
Downloading and installing Apache Maven
Identifying and downloading compatible Hadoop, Eclipse and Kafka versions
Troubleshooting VMware
Understanding data ingestion, data preprocessing and data assimilation
Ingesting data with Flume
How data flows between the exchange house and HDFS
Reducing latency in the data workflow using messaging servers
Understanding the Memory Channel, JDBC Channel, Kafka Channel, etc.
Creating a dummy Java app to visualize real-time processing using the Kafka Channel
Building a Flume multiplexing channel selector
Setting up a Kafka Channel
Using a channel interceptor subcomponent to collect and manipulate incoming data
Using Flume to read files and inject them into the memory channel
Implementing an Avro Sink to transfer data to the required destination
Writing a Kafka client that persists data to MySQL
Creating checkpoints to prevent data loss if the system crashes
Saving the final output in HDFS
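Several of the steps above (the multiplexing channel selector, the Kafka Channel and the Avro Sink) come together in a single Flume agent configuration. The fragment below is a minimal sketch, not the project's actual config: the agent, channel and sink names, the spooling directory, the watched symbol, the Kafka topic, and the host/port values are all assumptions for illustration.

```properties
# Hypothetical Flume agent: route events whose "symbol" header matches a
# watched trading pair to a Kafka channel; archive everything else to HDFS.
agent.sources  = trades
agent.channels = memCh kafkaCh
agent.sinks    = hdfsSink avroSink

# Spooling-directory source feeding both channels via a multiplexing selector
agent.sources.trades.type = spooldir
agent.sources.trades.spoolDir = /data/incoming
agent.sources.trades.channels = memCh kafkaCh
agent.sources.trades.selector.type = multiplexing
agent.sources.trades.selector.header = symbol
agent.sources.trades.selector.mapping.BTCUSD = kafkaCh
agent.sources.trades.selector.default = memCh

# Memory channel for the default archive path
agent.channels.memCh.type = memory

# Kafka channel for the symbols of interest
agent.channels.kafkaCh.type = org.apache.flume.channel.kafka.KafkaChannel
agent.channels.kafkaCh.kafka.bootstrap.servers = localhost:9092
agent.channels.kafkaCh.kafka.topic = trades-of-interest

# HDFS sink archives all default-path data
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memCh
agent.sinks.hdfsSink.hdfs.path = hdfs://localhost:8020/trades/archive

# Avro sink forwards the watched events to the processing server
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.channel = kafkaCh
agent.sinks.avroSink.hostname = processing-host
agent.sinks.avroSink.port = 4141
```

Events lacking a matching `symbol` header fall through to the memory channel and end up in HDFS, which matches the "archive everything by default" design described in the project.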

Project Description

Data is everywhere and is constantly being generated around us. Using big data tools, it is possible to ingest, process, and make decisions based on data at high speed.

This big data project for beginners demonstrates how to use Apache Flume to ingest trading data from a source. While the default data flow is to archive all data to HDFS, Flume is also configured to channel some preconfigured symbols or trading pairs of interest to another processing server using Kafka. All the processed instructions are stored in a relational database (MySQL).
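The routing decision described above hinges on inspecting each event's trading symbol. The sketch below is a hypothetical, self-contained illustration of that logic (the `SymbolRouter` class, the CSV field layout, and the watched-symbol set are all assumptions, not code from the project); in the real pipeline this decision is made by Flume's multiplexing channel selector on an event header.

```java
import java.util.Set;

public class SymbolRouter {
    // Assumed watch list of trading pairs of interest
    private static final Set<String> WATCHED = Set.of("BTCUSD", "ETHUSD");

    // Assumed record layout: "symbol,price,timestamp".
    // Returns "kafka" for watched symbols, "hdfs" (the archive path) otherwise.
    public static String route(String csvRecord) {
        String symbol = csvRecord.split(",")[0].trim();
        return WATCHED.contains(symbol) ? "kafka" : "hdfs";
    }

    public static void main(String[] args) {
        System.out.println(route("BTCUSD,42000.5,2021-01-01T00:00:00Z"));  // kafka
        System.out.println(route("DOGEUSD,0.05,2021-01-01T00:00:00Z"));    // hdfs
    }
}
```

Pushing only the watched symbols through Kafka keeps the low-latency path small, while the bulk of the stream goes straight to cheap HDFS storage.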

We will use the following tools in this Flume Kafka project:

Similar Projects

In this Spark streaming project, we are going to build the backend of an IT job ad website by streaming data from Twitter for analysis in Spark.

In this Hadoop project, we continue the data engineering series by discussing and implementing various ways to solve the Hadoop small file problem.

In this big data project, we will see how data ingestion and loading are done with the Kafka Connect API, while transformation is done with the Kafka Streams API.

Curriculum For This Mini Project

05h 14m