Spark Project - Airline Dataset Analysis using Spark MLlib

In this Hackerday, we will go through the basics of statistics and see how Spark lets us perform statistical operations, both descriptive and inferential, over very large datasets.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Introduction to Spark MLlib
MLlib Data Structures
Descriptive statistics
Inferential statistics
Data Sampling
Introduction to Machine Learning algorithms with Spark MLlib
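
To make the three statistics topics in this list concrete, here is a small plain-Python sketch. It is not Spark code and is not taken from the project materials: the carrier codes, the delay values, and the function names `describe`, `welch_t`, and `stratified_sample` are all made up for illustration. It only shows what each term means on a tiny sample.

```python
import math
import random
import statistics

# Hypothetical arrival delays in minutes, keyed by a made-up carrier code.
delays = {
    "AA": [5.0, 12.0, -3.0, 20.0, 7.0, 0.0],
    "DL": [15.0, 22.0, 9.0, 30.0, 18.0, 11.0],
}


def describe(values):
    """Descriptive statistics: summarize the sample itself."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def welch_t(a, b):
    """Inferential statistics: Welch's t statistic for a difference in means."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def stratified_sample(groups, fraction, seed=42):
    """Data sampling: draw the same fraction from every group."""
    rng = random.Random(seed)
    return {
        key: rng.sample(values, max(1, round(len(values) * fraction)))
        for key, values in groups.items()
    }


summary = {carrier: describe(values) for carrier, values in delays.items()}
t_stat = welch_t(delays["AA"], delays["DL"])
sample = stratified_sample(delays, fraction=0.5)
```

The descriptive step only reports what the data looks like. The t statistic is a first step toward asking whether the two carriers really differ, and the stratified sample keeps every carrier represented when working with a subset.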

Project Description

According to Wikipedia, statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation, and presentation. It is about building, from collected data, a model that enables humans to describe, analyze, and infer events happening around them. Statistics is itself a conduit to the fields of machine learning and AI.


No knowledge of statistics is assumed in this session. Every concept will be discussed from the ground up and put into practice on the airline on-time performance dataset. We will conclude the session by introducing a number of the machine learning algorithms available in MLlib.
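
As a rough illustration of why summary statistics scale to very large datasets, the sketch below computes count, mean, and variance separately on chunks of data and then merges the partial results. This is the general idea of a mergeable summary (Welford's online update plus a pairwise merge), written in plain Python. It is not Spark's implementation, and the `Summary` class, the `summarize` helper, and the delay values are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        """Fold one value into the summary (Welford's online update)."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other):
        """Combine two partial summaries without revisiting the raw data."""
        if other.count == 0:
            return self
        total = self.count + other.count
        delta = other.mean - self.mean
        # Update m2 first: it needs the counts from before the merge.
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.mean += delta * other.count / total
        self.count = total
        return self

    @property
    def variance(self):
        """Sample variance; undefined (NaN) for fewer than two values."""
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


def summarize(values):
    """Build a summary for one chunk of data."""
    return reduce(lambda acc, x: acc.add(x), values, Summary())


# Hypothetical delays in minutes, split into three chunks ("partitions").
partitions = [[5.0, 12.0, -3.0], [20.0, 7.0], [0.0]]
partials = [summarize(chunk) for chunk in partitions]
total = reduce(lambda left, right: left.merge(right), partials, Summary())
```

Because each chunk is reduced to three numbers before merging, the work on each chunk is independent of the others, and only the small partial summaries need to be brought together at the end.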
 

Curriculum For This Mini Project