Explain the pivot function and stack function in PySpark in Databricks

This recipe explains the pivot function and the stack function in PySpark in Databricks.

Recipe Objective - Explain the pivot() function and stack() function in PySpark in Databricks

In PySpark, the pivot() function is used to rotate, or transpose, data from one column into multiple DataFrame columns; the reverse operation, unpivot, rotates those column values back into rows. pivot() is an aggregation in which the distinct values of one of the grouping columns are transposed into individual columns. A PySpark DataFrame is a distributed collection of data organized into named columns, conceptually equivalent to a table in a relational database or a data frame in Python or R. DataFrames in PySpark can be constructed from a wide array of sources, such as structured data files, tables in Apache Hive, external databases, or existing Resilient Distributed Datasets. PySpark SQL does not have a built-in unpivot function, so the stack() function is used instead.


System Requirements

  • Python (version 3.0)
  • Apache Spark (version 3.1.1)

This recipe explains what the pivot() and stack() functions are and demonstrates their usage in PySpark.

Implementing the Pivot() function and Stack() function in Databricks in PySpark

# Importing packages
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

SparkSession and expr are imported into the environment so that the pivot() and stack() functions can be used in PySpark.


# Implementing the Pivot() function and Stack() function in Databricks in PySpark
spark = SparkSession.builder.appName('Pivot()Stack() PySpark').getOrCreate()
sample_data = [("Banana",1000,"USA"), ("Carrots",1500,"USA"), ("Beans",1600,"USA"), \
("Orange",2000,"USA"),("Orange",2000,"USA"),("Banana",400,"China"), \
("Carrots",1200,"China"),("Beans",1500,"China"),("Orange",4000,"China"), \
("Banana",2000,"Canada"),("Carrots",2000,"Canada"),("Beans",2000,"Mexico")]
sample_columns= ["Product","Amount","Country"]
dataframe = spark.createDataFrame(data = sample_data, schema = sample_columns)
dataframe.printSchema()
dataframe.show(truncate=False)
# Using the pivot() function
pivot_DataFrame = dataframe.groupBy("Product").pivot("Country").sum("Amount")
pivot_DataFrame.printSchema()
pivot_DataFrame.show(truncate=False)
pivot_DataFrame2 = dataframe.groupBy("Product","Country") \
.sum("Amount") \
.groupBy("Product") \
.pivot("Country") \
.sum("sum(Amount)")
pivot_DataFrame2.printSchema()
pivot_DataFrame2.show(truncate=False)
# Using stack() function to unpivot (only the Canada, China and Mexico columns)
unpivot_Expr = "stack(3, 'Canada', Canada, 'China', China, 'Mexico', Mexico) as (Country,Total)"
unpivot_DataFrame = pivot_DataFrame2.select("Product", expr(unpivot_Expr)) \
.where("Total is not null")
unpivot_DataFrame.show(truncate=False)


The SparkSession is created, and "dataframe" is defined using sample_data and sample_columns. "pivot_DataFrame" computes the total Amount of each Product exported to each Country: it groups by Product, pivots by Country, and sums Amount. This transposes the countries from DataFrame rows into columns; a product with no amount for a country gets null by default. "pivot_DataFrame2" produces the same result in two phases, pre-aggregating by Product and Country before pivoting, which is generally more efficient. Finally, the stack() function performs the unpivot operation, converting the pivoted country columns back into rows and filtering out the nulls that the pivot introduced.
