Explain the withColumnRenamed function in PySpark in Databricks

This recipe explains what the withColumnRenamed() function does in PySpark in Databricks.

Recipe Objective - Explain the withColumnRenamed() function in PySpark in Databricks

In PySpark, the withColumnRenamed() function is widely used to rename one or more columns of a DataFrame. Because DataFrames are immutable collections, a column cannot be renamed or updated in place; instead, withColumnRenamed() returns a new DataFrame with the updated column names. Resilient Distributed Datasets (RDDs) are the fundamental data structure of Apache Spark, developed by the Apache Software Foundation. An RDD is an immutable distributed collection of objects in which each dataset is divided into logical partitions that may be computed on different nodes of the cluster. The RDD concept was introduced in 2011. The Dataset is a strongly typed data structure in Spark SQL that maps to a relational schema. It represents structured queries with encoders and is an extension of the DataFrame API, providing both type safety and an object-oriented programming interface. The Dataset concept was introduced in 2015.
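As a minimal sketch of this behavior (the DataFrame and column names here are hypothetical examples, not part of the recipe below), renaming a column leaves the original DataFrame untouched and only the returned DataFrame carries the new name:

# A minimal sketch, assuming a local SparkSession; the data and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('withColumnRenamedSketch').getOrCreate()
df = spark.createDataFrame([(1, "1994-06-02"), (2, "2002-07-21")], ["id", "dob"])

# withColumnRenamed() returns a new DataFrame; the original DataFrame is unchanged
renamed_df = df.withColumnRenamed("dob", "date_of_birth")

print(df.columns)          # ['id', 'dob'] -- original column names remain
print(renamed_df.columns)  # ['id', 'date_of_birth'] -- only the returned DataFrame is renamed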


System Requirements

  • Python (version 3.0)
  • Apache Spark (version 3.1.1)

This recipe explains what the withColumnRenamed() function is and demonstrates its usage in PySpark.

Implementing the withColumnRenamed() function in Databricks in PySpark

# Importing packages
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

The SparkSession, StructType, StructField, StringType, IntegerType and all SQL functions are imported into the environment so that the withColumnRenamed() function can be used in PySpark.

# Implementing the withColumnRenamed() function in Databricks in PySpark
spark = SparkSession.builder.appName('withColumnRenamed() PySpark').getOrCreate()
sample_dataDataframe = [(('Ram', '', 'Aggarwal'), '1994-06-02', 'M', 4000),
                        (('Shyam', 'Gupta', ''), '2002-07-21', 'M', 5000),
                        (('Amit', '', 'Jain'), '1988-07-02', 'M', 5000),
                        (('Pooja', 'Rahul', 'Kumar'), '1977-09-02', 'F', 5000),
                        (('Sunita', 'Kumari', 'Kapoor'), '1990-04-18', 'F', -2)
                       ]
sample_schema = StructType([
    StructField('name', StructType([
        StructField('firstname', StringType(), True),
        StructField('middlename', StringType(), True),
        StructField('lastname', StringType(), True)
    ])),
    StructField('dob', StringType(), True),
    StructField('gender', StringType(), True),
    StructField('salary', IntegerType(), True)
])
dataframe = spark.createDataFrame(data = sample_dataDataframe, schema = sample_schema)
dataframe.printSchema()
# Using the withColumnRenamed() function on a single column
dataframe.withColumnRenamed("dob", "Date_Of_Birth").printSchema()
# Using the withColumnRenamed() function on multiple columns
dataframe2 = dataframe.withColumnRenamed("dob", "Date_Of_Birth") \
    .withColumnRenamed("salary", "salaryAmount")
dataframe2.printSchema()

The SparkSession is created, and "sample_dataDataframe" and "sample_schema" are defined. The DataFrame "dataframe" is built from sample_dataDataframe and sample_schema. The withColumnRenamed() function returns a new DataFrame and does not modify the current one; here it renames the column "dob" to "Date_Of_Birth" on the PySpark DataFrame. The DataFrame "dataframe2" is then created by applying withColumnRenamed() to both the "dob" and "salary" columns.
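When many columns need new names, one convenient pattern (a minimal sketch; the rename_map dictionary below is a hypothetical example built on the recipe's "dataframe", not part of the original code) is to loop over a mapping of old names to new names and apply withColumnRenamed() repeatedly. Note that withColumnRenamed() is a no-op when the given column name does not exist in the schema, so a stray key in the mapping does not raise an error.

# A minimal sketch: renaming several columns from a mapping of old names to new names.
# The mapping below is a hypothetical example; it reuses the recipe's "dataframe".
rename_map = {"dob": "Date_Of_Birth", "salary": "salaryAmount"}
renamed_dataframe = dataframe
for old_name, new_name in rename_map.items():
    renamed_dataframe = renamed_dataframe.withColumnRenamed(old_name, new_name)
renamed_dataframe.printSchema()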


Relevant Projects

Streaming Data Pipeline using Spark, HBase and Phoenix
Build a Real-Time Streaming Data Pipeline for an application that monitors oil wells using Apache Spark, HBase and Apache Phoenix.

Build a real-time Streaming Data Pipeline using Flink and Kinesis
In this big data project on AWS, you will learn how to run an Apache Flink Python application for a real-time streaming platform using Amazon Kinesis.

Python and MongoDB Project for Beginners with Source Code-Part 1
In this Python and MongoDB Project, you will learn to do data analysis using PyMongo on a MongoDB Atlas Cluster.

Deploy an Application to Kubernetes in Google Cloud using GKE
In this Kubernetes Big Data Project, you will automate and deploy an application using Docker, Google Kubernetes Engine (GKE), and Google Cloud Functions.

Build a Spark Streaming Pipeline with Synapse and CosmosDB
In this Spark Streaming project, you will learn to build a robust and scalable spark streaming pipeline using Azure Synapse Analytics and Azure Cosmos DB and also gain expertise in window functions, joins, and logic apps for comprehensive real-time data analysis and processing.

AWS Project for Batch Processing with PySpark on AWS EMR
In this AWS Project, you will learn how to perform batch processing on Wikipedia data with PySpark on AWS EMR.

Build a big data pipeline with AWS Quicksight, Druid, and Hive
Use the dataset on aviation for analytics to simulate a complex real-world big data pipeline based on messaging with AWS Quicksight, Druid, NiFi, Kafka, and Hive.

Hive Mini Project to Build a Data Warehouse for e-Commerce
In this Hive project, you will design a data warehouse for an e-commerce application to perform Hive analytics on Sales and Customer Demographics data using big data tools such as Sqoop, Spark, and HDFS.

Build Classification and Clustering Models with PySpark and MLlib
In this PySpark Project, you will learn to implement pyspark classification and clustering model examples using Spark MLlib.

Create a Data Pipeline based on Messaging Using PySpark and Hive
In this PySpark project, you will simulate a complex real-world data pipeline based on messaging. This project is deployed using the following tech stack - NiFi, PySpark, Hive, HDFS, Kafka, Airflow, Tableau and AWS QuickSight.