Spark Project - Airline Dataset Analysis using Spark MLlib

In this Hackerday, we will go through the basics of statistics and see how Spark enables us to perform statistical operations, such as descriptive and inferential statistics, over very large datasets.

What are the prerequisites for this project?

  • Students are expected to have a fair knowledge of Big Data and Hadoop, particularly Hive and Spark.
  • A working installation of the Cloudera QuickStart VM.
  • Since we will be doing the development in the Quickstart VM, it is essential to have the Scala SDK installed there as well.
  • No knowledge of statistics is assumed.

What will you learn

  • Introduction to Spark MLlib
  • MLlib Data Structures
  • Descriptive statistics
  • Inferential statistics
  • Data Sampling
  • Introduction to Machine Learning algorithms with Spark MLlib

Project Description

According to Wikipedia, statistics is a branch of mathematics dealing with the collection, organization, analysis, interpretation, and presentation of data. It is about building, from collected data, a model that enables humans to describe, analyze, and draw inferences about the events happening around them. Statistics is in itself a conduit to the fields of machine learning and AI.

No knowledge of statistics is assumed in this session. Every concept will be built from the ground up and put into practice on the airline on-time performance dataset. We will conclude the session by introducing a number of the machine learning algorithms available in MLlib.

Instructors

 
Michael

Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis, and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

What is Hackerday?

Stay up to date with technology trends by working on projects

Live online coding sessions led by industry experts

Build 2-4 projects a month, each lasting 6 hours and designed to teach you advanced concepts

Code in groups and connect with your community