Analyze a streaming log file by integrating Kafka and Kylin

In this project, we are going to analyze a streaming log file dataset by integrating Kafka and Kylin.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.


What will you learn

Understand how Kylin works over streaming datasets
Integrate Kafka with Kylin
Build a Kylin cube over a Kafka streaming source
Trigger a Kylin cube build using the REST API (see the sketch after this list)
Write OLAP queries over the streaming dataset
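
The cube-build step in the list above can be driven entirely over HTTP. The sketch below, written in Python with the requests library, assumes a Kylin instance on localhost:7070, the default ADMIN/KYLIN login, and a hypothetical cube named web_log_cube; the exact build endpoint differs between Kylin releases (some expose /rebuild, and streaming cubes use offset-based builds), so treat the path and payload as illustrative rather than definitive.

```python
# Minimal sketch: triggering a Kylin cube build over REST.
# Assumptions: Kylin at localhost:7070, default ADMIN/KYLIN credentials,
# and a hypothetical cube named "web_log_cube".
import requests
from requests.auth import HTTPBasicAuth

KYLIN = "http://localhost:7070/kylin/api"
CUBE = "web_log_cube"  # hypothetical cube name

session = requests.Session()
session.auth = HTTPBasicAuth("ADMIN", "KYLIN")  # default credentials; change in production

# Authenticate once so Kylin issues a session cookie for subsequent calls.
session.post(f"{KYLIN}/user/authentication")

# Trigger a build. Some Kylin releases expose this as /rebuild, and streaming
# cubes are built from Kafka offsets rather than time ranges; check the REST
# API docs for your version before relying on this exact call.
payload = {"buildType": "BUILD", "startTime": 0, "endTime": 0}
resp = session.put(f"{KYLIN}/cubes/{CUBE}/build", json=payload)
resp.raise_for_status()

# Kylin responds with the job instance it created; its status can be polled later.
print(resp.json())
```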

Project Description

In our last Hackerday, we demonstrated how OLAP analysis with real-time queries can be achieved using Apache Kylin.
In this Hackerday, we will look at it from another angle: a streaming dataset.

First of all, Apache Kylin (kylin.apache.org) is a distributed analytics engine that provides a SQL interface and multidimensional analysis (OLAP) on large datasets using MapReduce or Spark. This means we can answer classical MDX-style questions on the Hadoop platform with a decent amount of latency. Apache Kylin has a strong record of delivering sub-second responses to analytical and aggregation queries.
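
To make the "SQL interface" part concrete, here is a minimal sketch of an aggregation query sent to Kylin's REST query endpoint using Python's requests library. The project name, table, and column names are assumptions made up for illustration, and the default ADMIN/KYLIN login is assumed; Kylin also exposes the same SQL through JDBC/ODBC drivers.

```python
# Minimal sketch: an OLAP-style aggregation query against Kylin's SQL endpoint.
# Assumptions: Kylin at localhost:7070, default ADMIN/KYLIN credentials,
# hypothetical project "streaming_logs" and table WEB_LOGS.
import requests
from requests.auth import HTTPBasicAuth

query = {
    "sql": "SELECT COUNTRY, COUNT(*) AS HITS "
           "FROM WEB_LOGS GROUP BY COUNTRY ORDER BY HITS DESC",
    "project": "streaming_logs",
    "offset": 0,
    "limit": 100,
    "acceptPartial": False,
}

resp = requests.post(
    "http://localhost:7070/kylin/api/query",
    json=query,
    auth=HTTPBasicAuth("ADMIN", "KYLIN"),
)
resp.raise_for_status()

# "results" holds the row data in Kylin's JSON response.
for row in resp.json()["results"]:
    print(row)
```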

This time, we are doing the same over a streaming dataset. Our dataset will be a simulated log-file stream produced with Kafka, and we intend to build a Kylin cube over that stream. By the end of the class, we will be able to write analytical queries while Kylin receives new data from the Kafka topic.
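
The simulated log stream can be produced with a small Kafka producer. The sketch below uses the kafka-python package and assumes a broker on localhost:9092, a hypothetical topic named web_logs, and made-up log fields; the real project may use a different topic name, message schema, or producer tool.

```python
# Minimal sketch: simulating an access-log stream into Kafka with kafka-python.
# Assumptions: broker at localhost:9092 and a hypothetical topic "web_logs".
import json
import random
import time
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

PAGES = ["/home", "/products", "/checkout", "/search"]
STATUS = [200, 200, 200, 404, 500]

# Emit one synthetic log record every half second; stop with Ctrl+C.
# A Kylin streaming table defined over this topic would map these JSON
# fields to dimensions and measures.
while True:
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "page": random.choice(PAGES),
        "status": random.choice(STATUS),
        "bytes": random.randint(200, 5000),
    }
    producer.send("web_logs", event)
    producer.flush()
    time.sleep(0.5)
```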
 

Similar Projects

In this big data project, we will see how data ingestion and loading are done with the Kafka Connect APIs, while transformation is done with the Kafka Streams API.

The goal of this IoT project is to build an argument for a generalized streaming architecture for reactive data ingestion based on a microservice architecture.

In this Spark Streaming project, we are going to build the backend of an IT job-ad website by streaming data from Twitter for analysis in Spark.

Curriculum For This Mini Project

7-Apr-2018: 02h 29m
8-Apr-2018: 02h 41m