PySpark Tutorial - Learn to use Apache Spark with Python

PySpark Project: get a handle on using Python with Apache Spark through this hands-on data processing tutorial.
Videos
Each project comes with 2-5 hours of micro-videos explaining the solution.
Code & Dataset
Get access to 50+ solved projects with IPython notebooks and datasets.
Project Experience
Add project experience to your LinkedIn/GitHub profiles.

What you will learn

  • What is PySpark?
  • Installing and Configuring PySpark
  • Basic interaction with the Spark shell using the Python API (PySpark)
  • Transformations and actions in PySpark
  • Building a logistic regression machine learning model with PySpark

Project Description

This series of PySpark projects covers installing Apache Spark on a cluster and explores data analysis tasks using PySpark for big data and data science applications.

This PySpark video tutorial explains, with multiple examples, the transformations and actions that can be performed using PySpark.
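To make the distinction between transformations and actions concrete, here is a minimal sketch in plain Python. It is not Spark and is not taken from the course material: `LazyDataset` is a hypothetical toy class that only mimics the behaviour of the real RDD methods `map`, `filter`, `collect`, `count`, and `reduce`. Transformations record a step and return immediately; actions run the recorded steps and return an ordinary Python value.

```python
import functools


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of PySpark)."""

    def __init__(self, source, steps=()):
        self._source = list(source)
        self._steps = tuple(steps)  # pending transformations, not yet executed

    # Transformations: return a new dataset and do no work.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    # Helper that replays every recorded step over the source data.
    def _run(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: execute the pipeline and return a plain Python value.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def reduce(self, fn):
        return functools.reduce(fn, self._run())


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
even_squares = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = list(calls)  # still empty: no action has run yet

result = even_squares.collect()                   # action: runs the pipeline
total = even_squares.reduce(lambda a, b: a + b)   # action: runs it again
size = even_squares.count()                       # action: runs it a third time
```

Each action replays the whole pipeline from the source, which is why `square` is called again for every action. An uncached RDD in Spark is recomputed in a similar way, although the real engine distributes the work across a cluster, which this sketch does not attempt to model.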

Curriculum For This Mini Project

  • Overview of Project (00m)
  • What is PySpark (01m)
  • Install PySpark (05m)
  • Handshake between Python and Spark (12m)
  • RDD - Resilient Distributed Dataset (03m)
  • RDD operations (07m)
  • Basic Statistics using PySpark (03m)
  • Recap (02m)
  • Basic Statistical Test (06m)
  • Calculate Correlation (02m)
  • Chi Squared Test (03m)
  • Implement Machine Learning (09m)
  • Logistic Regression Model (11m)