Spark Project - Airline Dataset Analysis using Spark MLlib

Spark Project - Airline Dataset Analysis using Spark MLlib

In this Hackerday, we will go through the basis of statistics and see how Spark enables us to perform statistical operations like descriptive and inferential statistics over the very large dataset.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your Linkedin/Github profiles.

Customer Love

Read All Reviews

James Peebles

Data Analytics Leader, IQVIA

This is one of the best of investments you can make with regards to career progression and growth in technological knowledge. I was pointed in this direction by a mentor in the IT world who I highly... Read More

Dhiraj Tandon

Solution Architect-Cyber Security at ColorTokens

My Interaction was very short but left a positive impression. I enrolled and asked for a refund since I could not find the time. What happened next: They initiated Refund immediately. Their... Read More

What will you learn

Introduction to Spark MLlib
MLlib Data Structures
Descriptive statistics
Inferential statistics
Data Sampling
Introduction to Machine Learning algorithms with Spark MLlib

Project Description

According to Wikipedia, Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation. It is about building from collected data, a model that can enable humans to describe, analyze and infer event happening around. Statistics is in itself a conduit to the field of Machine Learning and AI.

In this Hackerday, we will go through the basis of statistics and see how Spark enables us to perform statistical operations like descriptive and inferential statistics over the very large dataset.


No knowledge of statistics is assumed in this session. Every concept will be discussed ground up and put to practice on the airline on-time performance dataset. We will conclude the session by introducing a number of machine learning algorithms available in MLlib.
 

Similar Projects

In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.

In this big data spark project, we will do Twitter sentiment analysis using spark streaming on the incoming streaming data.

In this project, we will use complex scenarios to make Spark developers better to deal with the issues that come in the real world.

Curriculum For This Mini Project