
Analyze a streaming log file by integrating Kafka and Kylin

In this project, we are going to analyze a streaming log file dataset by integrating Kafka and Kylin.


What will you learn

  • Understand how Kylin works over streaming datasets
  • Integrate Kafka with Kylin
  • Build a Kylin cube
  • Build a Kylin cube using the REST API
  • Write OLAP queries over the streaming dataset
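
As a rough illustration of the REST API item above, the sketch below prepares the HTTP request that asks Kylin to build a segment of a streaming cube. It assumes the `build2` endpoint and offset-based request body described in Kylin's streaming documentation; the host, port 7070, the default `ADMIN`/`KYLIN` credentials and the cube name `log_streaming_cube` are placeholders rather than values taken from this project.

```python
import base64
import json
import urllib.request


def make_build_request(cube_name,
                       base_url="http://localhost:7070/kylin/api",  # assumed host/port
                       user="ADMIN", password="KYLIN"):             # assumed credentials
    """Prepare (but do not send) a PUT request that triggers a streaming cube build."""
    body = {
        "sourceOffsetStart": 0,
        # Largest signed 64-bit value: build up to the latest Kafka offset.
        "sourceOffsetEnd": 9223372036854775807,
        "buildType": "BUILD",
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/cubes/{cube_name}/build2",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json;charset=utf-8",
        },
    )


if __name__ == "__main__":
    # "log_streaming_cube" is a hypothetical cube name.
    req = make_build_request("log_streaming_cube")
    print(req.get_method(), req.full_url)
    # Against a running Kylin instance, the request could be sent with:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

The request is only constructed here, so the snippet runs without a Kylin server; sending it is left as a commented-out step.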

What will you get

  • Access to recording of the complete project
  • Access to all material related to the project, such as data files and solution files


Prerequisites

  • A fair knowledge of Hadoop, Hive, HBase, and MapReduce
  • A fair knowledge of data warehousing and OLAP design
  • A working Hadoop distribution sandbox (e.g., Cloudera QuickStart VM or Hortonworks HDP sandbox)

Project Description

In our last Hackerday, we demonstrated how OLAP analysis with real-time queries can be achieved using Apache Kylin.
In this Hackerday, we will look at it from another angle: streaming datasets.

First of all, Apache Kylin (kylin.apache.org) is a distributed analytics engine that provides a SQL interface and multidimensional analysis (OLAP) over large datasets, building its cubes with MapReduce or Spark. This means that we can answer classical MDX-style questions on the Hadoop platform with low latency. Apache Kylin has been reported to answer analytical and aggregation queries with sub-second response times.
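
To make the SQL interface concrete, here is a minimal sketch of the JSON body that could be sent to Kylin's query endpoint (a POST to `/kylin/api/query`). The table `access_log`, its columns, and the project name `streaming_logs` are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical aggregation over a log cube: hits per HTTP status code.
SQL = """
SELECT status_code, COUNT(*) AS hits
FROM access_log
GROUP BY status_code
ORDER BY hits DESC
"""


def make_query_payload(sql, project, limit=100):
    """Build the request body for Kylin's SQL query endpoint."""
    return {
        "sql": sql.strip(),
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
        "project": project,  # assumed project name, supplied by the caller
    }


if __name__ == "__main__":
    body = make_query_payload(SQL, project="streaming_logs")
    print(json.dumps(body, indent=2))
```

The aggregation is answered from the precomputed cube rather than by scanning raw rows, which is where the low latency comes from.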

This time, we are doing the same over a streaming dataset. Our dataset will be a simulated stream of a log file, delivered through Kafka, and we intend to build a Kylin cube over it. At the end of the class, we will be able to write analytical queries while Kylin receives newer data from the Kafka topic.
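
One way to simulate such a log stream is sketched below. It generates JSON-friendly web log events with a millisecond timestamp field and, optionally, publishes them to a Kafka topic. The field names, the topic `web_logs`, and the broker address are assumptions made for this sketch, and the Kafka step relies on the third-party `kafka-python` package; none of this is the project's actual data file or code.

```python
import json
import random
import time

PATHS = ["/", "/login", "/cart", "/checkout", "/search"]
METHODS = ["GET", "GET", "GET", "POST"]
STATUSES = [200, 200, 200, 301, 404, 500]


def make_log_event(rng, ts_ms):
    """Return one synthetic web log record as a plain dict."""
    return {
        "ts": ts_ms,  # event time in epoch milliseconds
        "ip": ".".join(str(rng.randint(1, 254)) for _ in range(4)),
        "method": rng.choice(METHODS),
        "path": rng.choice(PATHS),
        "status": rng.choice(STATUSES),
        "bytes": rng.randint(200, 5000),
    }


def simulate(n, seed=0, start_ms=None, step_ms=1000):
    """Yield n events spaced step_ms apart, starting at start_ms (default: now)."""
    rng = random.Random(seed)
    if start_ms is None:
        start_ms = int(time.time() * 1000)
    for i in range(n):
        yield make_log_event(rng, start_ms + i * step_ms)


def send_to_kafka(events, topic="web_logs", bootstrap_servers="localhost:9092"):
    """Publish events as JSON messages; requires `pip install kafka-python`."""
    from kafka import KafkaProducer  # imported lazily so the simulator runs without Kafka

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for event in events:
        producer.send(topic, event)
    producer.flush()


if __name__ == "__main__":
    for event in simulate(3):
        print(json.dumps(event))
    # With a broker running: send_to_kafka(simulate(1000))
```

Each message carries its own timestamp, which a streaming cube can use as its time dimension when segmenting incoming data.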



Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

Curriculum For This Mini Project

02h 29m
02h 41m