Each project comes with 2-5 hours of micro-videos explaining the solution.

Get access to 50+ solved projects with IPython notebooks and datasets.

Add project experience to your LinkedIn/GitHub profiles.

I have worked for more than 15 years in Java and J2EE and have recently developed an interest in Big Data technologies and Machine learning due to a big need at my workspace. I was referred here by a...

This was great. The use of Jupyter was great. Prior to learning Python I was a self-taught SQL user with advanced skills. I hold a Bachelor's in Finance and have 5 years of business experience. I...

Overview of the project, its motive and expected output

What is PySpark

Spark as a Big Data cluster computing framework

Installing Anaconda and Spark

Interaction with Spark Shell using Python API

Understanding Transformation and Actions using Spark

Establishing Spark Environment and creating a handshake function between Python and Spark

What is a Resilient Distributed Dataset (RDD) and performing RDD operations

Creating RDD partitions and Instances

Performing Basic Descriptive Statistics using PySpark

Performing Basic Statistical Tests in PySpark

Understanding Linear Relation and calculating Correlation

Performing the Chi-Squared test for non-linear relations

Importing the necessary libraries for implementing a model on data points

Using map and lambda functions to read a dataset

Applying the Logistic Regression model for training and making final predictions
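As a rough illustration of the map, lambda, and correlation topics listed above, here is a minimal sketch in plain Python rather than PySpark. It assumes only the standard library: the built-in `map` and `functools.reduce` stand in for the map and reduce steps an RDD would run across partitions. The function name `pearson` and the toy data are hypothetical, chosen for illustration, and are not taken from the project itself.

```python
from functools import reduce
from math import sqrt


def pearson(pairs):
    """Pearson correlation of (x, y) pairs using one map and one reduce."""
    # Map: turn each pair into the partial sums the formula needs:
    # (count, sum x, sum y, sum x^2, sum y^2, sum x*y)
    mapped = map(
        lambda p: (1, p[0], p[1], p[0] * p[0], p[1] * p[1], p[0] * p[1]),
        pairs,
    )
    # Reduce: element-wise addition, which is associative and commutative,
    # so the same combiner could be applied partition by partition.
    n, sx, sy, sxx, syy, sxy = reduce(
        lambda a, b: tuple(u + v for u, v in zip(a, b)), mapped
    )
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: y is exactly 2x + 1, a perfectly linear relation
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
r = pearson(data)
print(r)
```

A coefficient close to 1 or -1 indicates a strong linear relation, while a value near 0 indicates little linear relation. The sketch does not guard against constant columns, where the variance term is zero.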

This series of PySpark projects looks at installing Apache Spark on a cluster and explores data analysis tasks using PySpark for various big data and data science applications.

This PySpark video tutorial explains, with multiple examples, the transformations and actions that can be performed using PySpark.

In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture, to accurately identify plant species using different benchmark classification techniques.

In this data science project, we will look at a few examples where we can apply various time series forecasting techniques.

In this project, we will be building and querying an OLAP Cube for Flight Delays on the Hadoop platform.

Overview of Project (00m)

What is PySpark (01m)

Install PySpark (05m)

Handshake between Python and Spark (12m)

RDD - Resilient Distributed Dataset (03m)

RDD operations (07m)

Basic Statistics using PySpark (03m)

Recap (02m)

Basic Statistical Test (06m)

Calculate Correlation (02m)

Chi Squared Test (03m)

Implement Machine Learning (09m)

Logistic Regression Model (11m)