Each project comes with 2-5 hours of micro-videos explaining the solution.
Get access to 50+ solved projects with IPython notebooks and datasets.
Add project experience to your LinkedIn/GitHub profiles.
According to Wikipedia, statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation. It is about building, from collected data, a model that enables humans to describe, analyze and draw inferences about the events happening around them. Statistics is in itself a conduit to the fields of Machine Learning and AI.
In this Hackerday, we will go through the basics of statistics and see how Spark enables us to perform statistical operations, both descriptive and inferential, over very large datasets.
No knowledge of statistics is assumed in this session. Every concept will be discussed from the ground up and put into practice on the airline on-time performance dataset. We will conclude the session by introducing a number of the machine learning algorithms available in MLlib.
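To make the distinction between descriptive and inferential statistics concrete, here is a minimal sketch in plain Python (standard library only, not Spark). The carrier codes and delay values are made up for illustration and are not taken from the airline on-time performance dataset; `describe` and `welch_t` are hypothetical helper names. The first function summarizes a sample (descriptive), the second computes Welch's t statistic to compare two group means (inferential).

```python
import math
import statistics

# Hypothetical arrival delays in minutes for two made-up carriers.
# These numbers are illustrative only, not real airline data.
delays = {
    "AA": [5.0, 12.0, -3.0, 20.0, 8.0, 15.0],
    "BB": [30.0, 25.0, 40.0, 18.0, 35.0, 28.0],
}


def describe(values):
    """Descriptive statistics: summarize the sample itself."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def welch_t(a, b):
    """Inferential statistics: Welch's t statistic for a difference in means."""
    se_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    se_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_a + se_b)


summary = {carrier: describe(values) for carrier, values in delays.items()}
t_stat = welch_t(delays["AA"], delays["BB"])

for carrier, stats in summary.items():
    print(carrier, stats)
print("Welch t statistic (AA vs BB):", t_stat)
```

A t statistic far from zero suggests the two carriers' mean delays differ by more than sampling noise would explain. On a Spark DataFrame, comparable per-column summaries are available through `DataFrame.describe()`, with the computation distributed across the cluster.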
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.
The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of data processing into Elasticsearch. We will also use the visualization tool in the ELK stack to build various kinds of ad-hoc reports from the data.
In this project, we will be building and querying an OLAP Cube for Flight Delays on the Hadoop platform.