Streaming ETL in Kafka with KSQL using NYC TLC Data


In this project, we will show how to build an ETL pipeline on streaming datasets using Kafka.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love


Shailesh Kurdekar

Solutions Architect at Capital One

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...

Camille St. Omer

Artificial Intelligence Researcher, Quora 'Most Viewed Writer' in 'Data Mining'

I came to the platform with no experience and now I am knowledgeable in Machine Learning with Python. No easy thing I must say, the sessions are challenging and go to the depths. I looked at graduate...

What you will learn

Kafka connectors (source and sink)
Deploying an HBase sink connector
ETL with a Kafka streaming application
Joining two separate streams of data
Apache Kafka vs. Confluent Kafka
Introduction to KSQL
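Joining two streams is the least intuitive item on this list. As a hedged illustration only, the plain-Python sketch below mimics a windowed stream-stream join without using Kafka or KSQL: each incoming trip or fare event is buffered until a partner with the same key arrives, and buffered events older than the window are evicted. The join key (medallion, hack license, pickup time), the payload fields, and the window length are assumptions, and events are assumed to arrive in event-time order.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical join key: (medallion, hack_license, pickup_datetime).
Key = Tuple[str, str, str]
# Each event: (side, key, event_time, payload), where side is "trip" or "fare".
Event = Tuple[str, Key, datetime, dict]


def windowed_join(events: Iterable[Event], window: timedelta) -> List[dict]:
    """Join trip and fare events that share a key and occur within `window`.

    Events are assumed to arrive in event-time order.
    """
    pending: Dict[str, Dict[Key, Tuple[datetime, dict]]] = {"trip": {}, "fare": {}}
    joined: List[dict] = []
    for side, key, ts, payload in events:
        # Evict buffered events (on both sides) that are now outside the window.
        for buffer in pending.values():
            for old_key in [k for k, (t, _) in buffer.items() if ts - t > window]:
                del buffer[old_key]
        other = "fare" if side == "trip" else "trip"
        match = pending[other].pop(key, None)
        if match is not None:
            # A partner arrived in time: emit one merged record.
            joined.append({**match[1], **payload})
        else:
            # No partner yet: buffer this event until one arrives or it expires.
            pending[side][key] = (ts, payload)
    return joined
```

In KSQL itself, a stream-stream join is declared in SQL with a `WITHIN` clause rather than hand-written; the explicit buffering here is only meant to make the mechanics visible.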

Project Description

In this Hackerday, we will demonstrate how to build an ETL pipeline on streaming datasets using Kafka. We will use the trips and fares datasets from the New York City Taxi and Limousine Commission (TLC) to show how to ingest data in real time, join it with other streaming datasets, and store the results in a database.
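The extract, transform, and load stages described above can be sketched in plain Python. This is a minimal sketch under loud assumptions: small inline CSV samples stand in for the Kafka topics, an in-memory SQLite table (via the standard `sqlite3` module) stands in for the project's target database, and the column names are hypothetical, only loosely modeled on the TLC trip and fare files. For brevity, the fares side is held in a dict rather than streamed.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical sample data; column names are assumptions, not the TLC schema.
TRIPS_CSV = """medallion,hack_license,pickup_datetime,trip_distance
M1,H1,2013-01-01 08:00:00,2.5
M2,H2,2013-01-01 08:05:00,bad
"""
FARES_CSV = """medallion,hack_license,pickup_datetime,total_amount
M1,H1,2013-01-01 08:00:00,12.0
M2,H2,2013-01-01 08:05:00,7.5
"""


def extract(text: str) -> Iterator[Dict[str, str]]:
    """Extract: yield one record at a time, as a stream consumer would."""
    yield from csv.DictReader(io.StringIO(text))


def record_key(rec: Dict[str, str]) -> Tuple[str, str, str]:
    """Hypothetical join key shared by a trip and its fare."""
    return (rec["medallion"], rec["hack_license"], rec["pickup_datetime"])


def transform(trips: Iterable[Dict[str, str]],
              fares: Iterable[Dict[str, str]]) -> Iterator[tuple]:
    """Transform: join each trip with its fare and cast fields to numbers."""
    fares_by_key = {record_key(f): f for f in fares}
    for trip in trips:
        fare = fares_by_key.get(record_key(trip))
        if fare is None:
            continue  # no matching fare; a real stream join would buffer it
        try:
            yield (trip["medallion"], trip["hack_license"], trip["pickup_datetime"],
                   float(trip["trip_distance"]), float(fare["total_amount"]))
        except ValueError:
            continue  # drop rows with malformed numeric fields


def load(rows: Iterable[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the joined rows into a table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS trips_fares (
        medallion TEXT, hack_license TEXT, pickup_datetime TEXT,
        trip_distance REAL, total_amount REAL)""")
    conn.executemany("INSERT INTO trips_fares VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(TRIPS_CSV), extract(FARES_CSV)), conn)
```

Because each stage consumes and produces an iterator, records flow through one at a time, which is the same shape a consumer-transform-sink pipeline takes; in the actual project, the source and sink would be Kafka connectors rather than strings and SQLite.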

We will answer questions such as how much each driver earns every hour and which drivers are near a given location at any point in time, among others.
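These two questions correspond to two common streaming patterns: a one-hour tumbling-window sum, and a "latest value per key" table filtered by distance. The plain-Python sketch below shows both under stated assumptions: each fare event is a (driver ID, timestamp, amount) tuple, each position event is a (driver ID, timestamp, latitude, longitude) tuple, events arrive in time order, and distance uses the haversine formula on a spherical Earth. The function names and event shapes are illustrative, not the project's actual schema or queries.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Assumed event shapes (illustrative, not the project's schema):
#   fare event:     (driver_id, event_time, total_amount)
#   position event: (driver_id, event_time, latitude, longitude)
FareEvent = Tuple[str, datetime, float]
PositionEvent = Tuple[str, datetime, float, float]


def hourly_income(fares: Iterable[FareEvent]) -> Dict[Tuple[str, datetime], float]:
    """Sum fares per (driver, hour): a one-hour tumbling window."""
    totals: Dict[Tuple[str, datetime], float] = defaultdict(float)
    for driver, ts, amount in fares:
        # Truncate the timestamp to the start of its hour to pick the window.
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        totals[(driver, window_start)] += amount
    return dict(totals)


def latest_positions(events: Iterable[PositionEvent]) -> Dict[str, Tuple[float, float]]:
    """Keep only the most recent position per driver (events in time order)."""
    positions: Dict[str, Tuple[float, float]] = {}
    for driver, _ts, lat, lon in events:
        positions[driver] = (lat, lon)
    return positions


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def drivers_near(positions: Dict[str, Tuple[float, float]],
                 lat: float, lon: float, radius_km: float) -> List[str]:
    """Return the sorted IDs of drivers within radius_km of (lat, lon)."""
    return sorted(d for d, (plat, plon) in positions.items()
                  if haversine_km(lat, lon, plat, plon) <= radius_km)
```

A scan over every driver is fine for a sketch; at city scale, a spatial index or geohash bucketing would usually replace the linear filter.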
 

Similar Projects

In this big data project, we'll work through a real-world scenario using the Cortana Intelligence Suite tools, including the Microsoft Azure Portal, PowerShell, and Visual Studio.

Hive Project: Understand the various types of slowly changing dimensions (SCDs) and implement them in Hadoop Hive and Spark.

In this project, we will evaluate and demonstrate how to handle unstructured data using Spark.

Curriculum For This Mini Project