Implementing OLAP on Hadoop using Apache Kylin

In this big data project, we will design an OLAP cube using the AdventureWorks database. The deliverable for this session is to design a cube, build and implement it using Kylin, query the cube, and even connect familiar tools (like Excel) to our new cube.

What you will learn

What Apache Kylin is and how it works
Installing Apache Kylin in our Quickstart VM
Designing a star schema on our AdventureWorks database
Implementing our star schema in Kylin
Writing aggregate queries against a Kylin cube
Connecting a visualization tool
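
The star schema and aggregate query topics in the list above can be sketched in a few lines. The example below is only an illustration: it uses Python's built-in SQLite as a stand-in for Kylin, and the table and column names (`fact_sales`, `dim_product`, `dim_date`, `sales_amount`) are hypothetical simplifications, not the real AdventureWorks schema. The shape of the query (a fact table joined to its dimensions, grouped by dimension attributes, with a SUM over a measure) is the kind of aggregate query one would typically run against a cube.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            sales_amount REAL
        );
    """)
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Bikes"), (2, "Accessories")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(20170101, 2017), (20180101, 2018)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 20170101, 1000.0), (1, 20180101, 1500.0),
                      (1, 20180101, 500.0), (2, 20170101, 50.0),
                      (2, 20180101, 75.0)])


def sales_by_category_and_year(conn):
    """Aggregate the sales measure by two dimension attributes."""
    query = """
        SELECT p.category, d.calendar_year, SUM(f.sales_amount) AS total_sales
        FROM fact_sales f
        JOIN dim_product p ON f.product_key = p.product_key
        JOIN dim_date d ON f.date_key = d.date_key
        GROUP BY p.category, d.calendar_year
        ORDER BY p.category, d.calendar_year
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_category_and_year(conn)
for row in rows:
    print(row)
```

In the project itself the tables live in Hadoop and the query is answered by Kylin rather than SQLite; the sketch only shows the schema shape and query pattern.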

Project Description

Apache Kylin: Implementing OLAP on the Hadoop platform

Performing OLAP on the Hadoop big data platform has been a burden for a while, primarily because of high query latency. Open source projects such as Impala, Presto and even Apache HAWQ have tried to fix the problem with an MPP-style query execution architecture, but on even larger datasets the performance of query aggregation, which is key to OLAP queries, is still far from desirable.

Apache Kylin (kylin.apache.org) is a distributed analytics engine that provides a SQL interface and multidimensional analysis (OLAP) on large datasets using MapReduce or Spark. This means that we can answer classical MDX questions on the Hadoop platform with acceptable latency.
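
To give a feel for why a cube can answer aggregate questions quickly, here is a minimal in-memory sketch of the general idea of precomputation: aggregate a measure for every combination of dimensions ahead of time, so that a query becomes a lookup instead of a scan. This is a toy illustration under stated assumptions, not Kylin's actual build process or storage format; the function names `build_cube` and `query`, and the sample rows, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of the dimensions.

    Returns a dict keyed by a tuple of dimension names (in declared order);
    each value maps a tuple of dimension values to the aggregated measure.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY as a lookup; group_by must follow the declared dimension order."""
    return cube[tuple(group_by)]


# Made-up sample data, for illustration only.
rows = [
    {"category": "Bikes", "year": 2017, "sales": 1000.0},
    {"category": "Bikes", "year": 2018, "sales": 1500.0},
    {"category": "Accessories", "year": 2017, "sales": 50.0},
    {"category": "Accessories", "year": 2018, "sales": 75.0},
]
cube = build_cube(rows, ("category", "year"), "sales")

print(query(cube, ["year"]))      # totals per year
print(query(cube, ["category"]))  # totals per category
print(query(cube, []))            # grand total
```

The trade-off is visible even at this scale: with n dimensions there are 2^n combinations to precompute, so build time and storage grow quickly in exchange for fast lookups at query time.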

In this big data project, we will design an OLAP cube using the AdventureWorks dataset. The deliverable for this Hadoop project will be to design a cube, build and implement it using Kylin, query the cube, and even connect familiar tools (like Excel) to our new cube.

Similar Projects

In this big data project, we will talk about Apache Zeppelin. We will write code, write notes, build charts and share all in one single data analytics environment using Hive, Spark and Pig.

In this project, we will show how to build an ETL pipeline on streaming datasets using Kafka.

In this big data project, we will continue from a previous Hive project, "Data engineering on Yelp Datasets using Hadoop tools", and do the entire data processing using Spark.

Curriculum For This Mini Project

3-Mar-2018: 02h 46m
4-Mar-2018: 02h 28m