Explain dense_rank and percent_rank window functions in PySpark

This tutorial gives an overview of the dense_rank and percent_rank window functions in PySpark on Databricks, explains the difference between the two functions, and shows how to implement them in Python.

Recipe Objective - Explain the dense_rank and percent_rank window functions in PySpark in Databricks

The dense_rank() and percent_rank() functions in PySpark are popular in day-to-day operations because they make otherwise difficult tasks easy. The dense_rank() window function returns the rank of each row within a window partition without any gaps; it is similar to the rank() function, the only difference being that rank() leaves gaps in the ranking when there are ties. The percent_rank() window function returns the relative rank of each row, expressed as a percentage, within the specified window.
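
To make the difference concrete, here is a minimal sketch (separate from the recipe's own code, with illustrative data, column names and application name) that applies rank(), dense_rank() and percent_rank() over the same ordered window on a toy DataFrame containing one tied value.

# Minimal sketch: rank() vs dense_rank() vs percent_rank() over a column with a tie
# (illustrative data and names, assuming a local SparkSession can be created)
from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql.functions import rank, dense_rank, percent_rank

spark = SparkSession.builder.appName("rank_functions_demo").getOrCreate()

scores = spark.createDataFrame(
    [("a", 10), ("b", 20), ("c", 20), ("d", 30)],
    ["id", "score"])

# Ordering the whole DataFrame as a single window (no partitioning) keeps the tie visible
window_spec = Window.orderBy("score")

scores.select(
    "id",
    "score",
    rank().over(window_spec).alias("rank"),                  # 1, 2, 2, 4 -> gap after the tie
    dense_rank().over(window_spec).alias("dense_rank"),      # 1, 2, 2, 3 -> no gap
    percent_rank().over(window_spec).alias("percent_rank")   # (rank - 1) / (row count - 1)
).show()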


This recipe explains what the dense_rank and percent_rank window functions are and how to use them in PySpark.

Implementing the dense_rank and percent_rank window functions in Databricks in PySpark

# Importing packages
from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql.functions import dense_rank, percent_rank

The SparkSession, Window, dense_rank and percent_rank imports are brought into the environment to demonstrate the dense_rank and percent_rank window functions in PySpark.

# Implementing the dense_rank and percent_rank window functions in Databricks in PySpark
spark = SparkSession.builder.appName('Spark dense_rank() percent_rank()').getOrCreate()

# Sample employee data: (employee_name, department, salary)
Sample_data = [("Ram", "Technology", 4000),
               ("Shyam", "Technology", 5600),
               ("Veer", "Technology", 5100),
               ("Renu", "Accounts", 4000),
               ("Ram", "Technology", 4000),
               ("Vijay", "Accounts", 4300),
               ("Shivani", "Accounts", 4900),
               ("Amit", "Sales", 4000),
               ("Anupam", "Sales", 3000),
               ("Anas", "Technology", 5100)]
Sample_columns = ["employee_name", "department", "salary"]

dataframe = spark.createDataFrame(data=Sample_data, schema=Sample_columns)
dataframe.printSchema()
dataframe.show(truncate=False)

# Window specification: partition by department, order rows within each partition by salary
Window_Spec = Window.partitionBy("department").orderBy("salary")

# Using the dense_rank() function
dataframe.withColumn("dense_rank", dense_rank().over(Window_Spec)) \
    .show()

# Using the percent_rank() function
dataframe.withColumn("percent_rank", percent_rank().over(Window_Spec)) \
    .show()

The "dataframe" value is created in which the Sample_data and Sample_columns are defined. The dense_rank() function returns the result with the rank of rows within a window partition that is the "Window_Spec" without any gaps. The percent_rank() function in PySpark is defined as returning the rank of rows in the percentage format "Window_Spec."

