Explain conversion of DataFrame columns to MapType in PySpark

This recipe gives a detailed overview of how the create_map() function in Apache Spark is used to convert DataFrame columns to MapType in PySpark on Databricks, and demonstrates the function with an example in Python.

Recipe Objective - Explain the conversion of DataFrame columns to MapType in PySpark in Databricks

The create_map() function in Apache Spark is commonly used to convert selected DataFrame columns (or all of them) to MapType, which is similar to a Python dictionary (dict) object. The function takes a list of columns grouped as key-value pairs (key1, value1, key2, value2, key3, value3, …) and returns a MapType column. create_map() is a PySpark SQL function imported from "pyspark.sql.functions".
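As a quick sketch of the call pattern (the DataFrame df and its emp_id and emp_name columns here are hypothetical, not part of the recipe), keys are typically passed as lit() literals and values as col() references, alternating key, value, key, value:

# Minimal sketch of create_map() usage (hypothetical df with emp_id and emp_name columns)
from pyspark.sql.functions import create_map, lit, col

df_with_map = df.withColumn("props", create_map(
    lit("emp_id"), col("emp_id"),       # key "emp_id" -> value from the emp_id column
    lit("emp_name"), col("emp_name")    # key "emp_name" -> value from the emp_name column
))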

System Requirements

  • Python (3.0 version)
  • Apache Spark (3.1.1 version)

This recipe explains the create_map() function and how to use it in PySpark.

Implementing the conversion of DataFrame columns to MapType in Databricks in PySpark

# Importing package
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import col, lit, create_map

The SparkSession, StructType, StructField, StringType, IntegerType, col, lit, and create_map imports are brought into the environment to perform the conversion of DataFrame columns to MapType in PySpark.

# Implementing the conversion of DataFrame columns to MapType in Databricks in PySpark
spark = SparkSession.builder.appName('PySpark create_map()').getOrCreate()

Sample_data = [("38874", "Technology", 5000, "IND"),
               ("42105", "Technology", 6000, "BHU"),
               ("46987", "Finance", 4900, "IND"),
               ("35412", "Entertainment", 3500, "ISR"),
               ("36987", "Finance", 5500, "IND")]

Sample_schema = StructType([
    StructField('id', StringType(), True),
    StructField('dept', StringType(), True),
    StructField('salary', IntegerType(), True),
    StructField('location', StringType(), True)
])

dataframe = spark.createDataFrame(data=Sample_data, schema=Sample_schema)
dataframe.printSchema()
dataframe.show(truncate=False)

# Convert the salary and location columns to a single MapType column
dataframe = dataframe.withColumn("PropertiesOnMap", create_map(
    lit("salary"), col("salary"),
    lit("location"), col("location")
)).drop("salary", "location")

dataframe.printSchema()
dataframe.show(truncate=False)

The "dataframe" value is created in which the Sample_data and Sample_schema are defined. The create_map() PySpark SQL function returns the converted DataFrame columns salary and location to the MapType.

