In our last Hackerday, we demonstrated how OLAP analysis with real-time queries can be achieved using Apache Kylin.
In this Hackerday, we will look at it from another angle: streaming datasets.
First of all, Apache Kylin (kylin.apache.org) is a distributed analytics engine that provides a SQL interface and multidimensional analysis (OLAP) on large datasets using MapReduce or Spark. This means that we can answer classical MDX questions on the Hadoop platform with low latency. Apache Kylin has a strong record of returning results for analytical or aggregation queries in under a second.
This time, we are doing the same over a streaming dataset. Our dataset will be a log file replayed as a simulated stream through Kafka, and we intend to build a Kylin cube over that stream. By the end of the class, we will be able to write analytical queries while Kylin receives newer data from the Kafka topic.
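
To make the idea of a simulated log stream concrete, here is a minimal sketch in Python of what the generator side could look like. It is not the setup used in the class. The field names (ts, path, status, bytes), the sample values, and the functions make_log_event and simulate_stream are all hypothetical, and the sketch assumes each message is a JSON record carrying an event timestamp. To keep it runnable without a broker, the send argument is any callable that accepts bytes. In a real run it would wrap the send call of whichever Kafka producer client you use.

```python
import json
import random
import time

# Hypothetical field values, chosen only for illustration.
STATUS_CODES = [200, 301, 404, 500]
PATHS = ["/", "/login", "/cart", "/checkout"]


def make_log_event(rng, ts_ms):
    """Build one simulated web-log record as a plain dict."""
    return {
        "ts": ts_ms,  # event time in epoch milliseconds
        "path": rng.choice(PATHS),
        "status": rng.choice(STATUS_CODES),
        "bytes": rng.randint(200, 5000),
    }


def simulate_stream(n, send, seed=0, start_ms=None, step_ms=1000):
    """Generate n events and pass each one, JSON-encoded as UTF-8 bytes, to send()."""
    rng = random.Random(seed)
    if start_ms is None:
        start_ms = int(time.time() * 1000)
    for i in range(n):
        event = make_log_event(rng, start_ms + i * step_ms)
        send(json.dumps(event).encode("utf-8"))


if __name__ == "__main__":
    # Stand-in for a Kafka producer: print each message instead of sending it.
    simulate_stream(5, lambda msg: print(msg.decode("utf-8")))
```

Passing the sink in as a callable means the same generator can feed a console, a file, or a Kafka topic without changes, which is convenient when testing the pipeline before a broker is available.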