Predicting Flight Delays using Apache Spark and Kylin

Predicting Flight Delays using Apache Spark and Kylin

In this project, we will be building and querying an OLAP Cube for Flight Delays on the Hadoop platform.
explanation image

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

ipython image

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

project experience

Project Experience

Add project experience to your Linkedin/Github profiles.

Customer Love

Read All Reviews
profile image

Nathan Elbert linkedin profile url

Senior Data Scientist at Tiger Analytics

This was great. The use of Jupyter was great. Prior to learning Python I was a self taught SQL user with advanced skills. I hold a Bachelors in Finance and have 5 years of business experience.. I... Read More

profile image

Mike Vogt linkedin profile url

Information Architect at Bank of America

I have had a very positive experience. The platform is very rich in resources, and the expert was thoroughly knowledgeable on the subject matter - real world hands-on experience. I wish I had this... Read More

What will you learn

Discuss the installation of Apache Kylin in a Hortonworks sandbox.
Design star schema on our flight dataset
Implementing our star schema in Kylin
Building and merging Kyline segments incrementally.
Building Cubes using Kylin Restful API
How to execute Cubes using Spark Engine

Project Description

In previous Hackerday sessions, we have introduced how to bring OLAP to extremely large datasets in Apache Kylin. For those who don't know what Kylin is, Kylin (kylin.apache.org) is a Distributed Analytics Engine that provides SQL interface and multidimensional analysis (OLAP) on the large dataset using MapReduce or Spark. This means that I can answer classical aggregate queries in the Hadoop platform with a low latency over billions of records.

In this Hackerday, we will be performing an OLAP cube design using the flight on-time dataset. Since we have previously introduced Kylin, this Hackerday session will look at more involved features like incremental build, performance tuning or consideration tips, we will discuss the Spark engine as well as how to build different types of model.

Similar Projects

PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

In this NoSQL project, we will use two NoSQL databases(HBase and MongoDB) to store Yelp business attributes and learn how to retrieve this data for processing or query.

Hive Project- Understand the various types of SCDs and implement these slowly changing dimesnsion in Hadoop Hive and Spark.

Curriculum For This Mini Project