
Predicting Flight Delays using Apache Spark and Kylin

In this project, we will be building and querying an OLAP Cube for Flight Delays on the Hadoop platform.
What are the prerequisites for this project?
  • A fair knowledge of Hadoop, Hive, HBase and MapReduce
  • A fair knowledge of data warehousing and OLAP design
  • A working Hadoop distribution sandbox (e.g., Cloudera QuickStart VM or Hortonworks HDP sandbox)

What will you learn?

  • Installing Apache Kylin in a Hortonworks sandbox
  • Designing a star schema for the flight dataset
  • Implementing the star schema in Kylin
  • Building and merging Kylin segments incrementally
  • Building cubes using the Kylin RESTful API
  • Running cube builds on the Spark engine
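
As a rough preview of building cubes through the REST API, the sketch below prepares (but does not send) a cube build request. It assumes the `PUT /kylin/api/cubes/{cubeName}/build` endpoint and the `startTime` / `endTime` / `buildType` body fields described in the Kylin REST API documentation; check them against your Kylin version. The host, cube name and credentials are hypothetical placeholders, not values from this project.

```python
import base64
import json
import urllib.request
from datetime import datetime, timezone


def to_epoch_millis(day: str) -> int:
    """Convert a 'YYYY-MM-DD' date (midnight UTC) to epoch milliseconds."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def make_build_request(base_url, cube_name, start_day, end_day,
                       user, password, build_type="BUILD"):
    """Prepare a cube build request without sending it.

    Assumption: endpoint path and body fields follow the Kylin REST API docs.
    """
    payload = {
        "startTime": to_epoch_millis(start_day),
        "endTime": to_epoch_millis(end_day),
        "buildType": build_type,  # documented values: BUILD, MERGE, REFRESH
    }
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/cubes/{cube_name}/build",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical host, cube name and credentials, for illustration only.
request = make_build_request(
    "http://localhost:7070", "flights_cube",
    "2015-01-01", "2015-02-01", "ADMIN", "KYLIN",
)
# To actually submit it against a running Kylin instance:
# urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before anything reaches the cluster.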

Project Description

In previous Hackerday sessions, we introduced how to bring OLAP to extremely large datasets with Apache Kylin. For those who don't know what Kylin is, Kylin (kylin.apache.org) is a distributed analytics engine that provides a SQL interface and multidimensional analysis (OLAP) on large datasets using MapReduce or Spark. This means that we can answer classical aggregate queries on the Hadoop platform with low latency over billions of records.
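
To make "classical aggregate queries" concrete, here is a minimal sketch of the kind of star-schema query an OLAP cube is meant to answer quickly. It uses Python's built-in `sqlite3` as a stand-in for Kylin's SQL interface; the table names, column names and rows are made up for illustration and are not the project's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny star schema: one fact table referencing one dimension table.
# All names and rows are hypothetical.
conn.executescript("""
    CREATE TABLE dim_carrier (
        carrier_id INTEGER PRIMARY KEY,
        carrier_name TEXT
    );
    CREATE TABLE fact_flights (
        flight_id INTEGER PRIMARY KEY,
        carrier_id INTEGER REFERENCES dim_carrier(carrier_id),
        dep_delay_min INTEGER
    );
    INSERT INTO dim_carrier VALUES (1, 'Carrier A'), (2, 'Carrier B');
    INSERT INTO fact_flights VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")

# Aggregate a measure (departure delay) grouped by a dimension (carrier).
rows = conn.execute("""
    SELECT c.carrier_name,
           AVG(f.dep_delay_min) AS avg_delay_min,
           COUNT(*) AS flights
    FROM fact_flights AS f
    JOIN dim_carrier AS c ON c.carrier_id = f.carrier_id
    GROUP BY c.carrier_name
    ORDER BY c.carrier_name
""").fetchall()

for name, avg_delay, flights in rows:
    print(name, avg_delay, flights)
```

A cube precomputes aggregates like these for chosen combinations of dimensions, so that at query time the engine can read a stored result instead of scanning the fact table.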

In this Hackerday, we will design an OLAP cube using the flight on-time dataset. Since we have previously introduced Kylin, this session will look at more involved features such as incremental builds and performance tuning tips. We will also discuss the Spark engine and how to build different types of models.
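
As a small illustration of what an incremental build involves, the sketch below computes contiguous monthly time ranges in epoch milliseconds, one per segment, and the single range that would cover them after a merge. It assumes segments are described by a start (inclusive) and end (exclusive) timestamp; the function names are hypothetical helpers, not part of Kylin.

```python
from datetime import datetime, timezone


def month_start_ms(year: int, month: int) -> int:
    """Epoch milliseconds for the first day of the month at midnight UTC."""
    return int(datetime(year, month, 1, tzinfo=timezone.utc).timestamp() * 1000)


def monthly_segments(year: int, month: int, count: int):
    """Return `count` consecutive (start_ms, end_ms) monthly ranges.

    Each range ends exactly where the next one starts, so there are
    no gaps or overlaps between segments.
    """
    segments = []
    for _ in range(count):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        segments.append((month_start_ms(year, month),
                         month_start_ms(next_year, next_month)))
        year, month = next_year, next_month
    return segments


def merged_range(segments):
    """Range covered by merging a list of contiguous segments."""
    return segments[0][0], segments[-1][1]


segments = monthly_segments(2015, 1, 3)  # January to March 2015
for start_ms, end_ms in segments:
    print(start_ms, end_ms)
print(merged_range(segments))
```

Building one range at a time keeps each build small, and merging contiguous ranges later reduces the number of segments a query has to touch.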

Instructors

 
Michael

Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

What is Hackerday?

Stay up to date with technology trends by working on projects

Live online coding sessions led by industry experts

Build 2-4 projects a month, each lasting 6 hours and designed to teach you advanced concepts

Code in groups and connect with your community