Explain the conversion of DataFrame columns to MapType in PySpark

This recipe gives a detailed overview of how the create_map() function in Apache Spark is used to convert DataFrame columns to MapType in PySpark on Databricks, and demonstrates the function with an implementation example in Python.

Recipe Objective - Explain the conversion of DataFrame columns to MapType in PySpark in Databricks

The create_map() function in Apache Spark is commonly used to convert selected DataFrame columns, or all of them, to MapType, which is similar to a Python dictionary (dict). The function takes as input a list of columns grouped into key-value pairs (key1, value1, key2, value2, key3, value3, ...) and returns a MapType column. create_map() is a PySpark SQL function imported from "pyspark.sql.functions".
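As a brief sketch of the call pattern (the DataFrame df and the columns "a" and "b" below are illustrative placeholders, not part of this recipe's example), keys are typically passed as lit() literals and values as col() references:

from pyspark.sql.functions import create_map, lit, col

# Hypothetical usage: pack the existing columns "a" and "b" into a single
# MapType column keyed by their own column names.
df_with_map = df.withColumn("props", create_map(lit("a"), col("a"),
                                                lit("b"), col("b")))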

System Requirements

  • Python (version 3.0)
  • Apache Spark (version 3.1.1)

This recipe explains the create_map() function and shows how to use it in PySpark.

Implementing the conversion of DataFrame columns to MapType in Databricks in PySpark

# Importing package
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import col, lit, create_map

The SparkSession, StructType, StructField, StringType, IntegerType, col, lit, and create_map imports bring into the environment everything needed to convert DataFrame columns to MapType in PySpark.

# Implementing the conversion of DataFrame columns to MapType in Databricks in PySpark
spark = SparkSession.builder.appName('PySpark create_map()').getOrCreate()

Sample_data = [("38874", "Technology", 5000, "IND"),
               ("42105", "Technology", 6000, "BHU"),
               ("46987", "Finance", 4900, "IND"),
               ("35412", "Entertainment", 3500, "ISR"),
               ("36987", "Finance", 5500, "IND")]

Sample_schema = StructType([
    StructField('id', StringType(), True),
    StructField('dept', StringType(), True),
    StructField('salary', IntegerType(), True),
    StructField('location', StringType(), True)
])

dataframe = spark.createDataFrame(data=Sample_data, schema=Sample_schema)
dataframe.printSchema()
dataframe.show(truncate=False)

# Convert the salary and location columns to a single MapType column
dataframe = dataframe.withColumn("PropertiesOnMap", create_map(
    lit("salary"), col("salary"),
    lit("location"), col("location")
)).drop("salary", "location")
dataframe.printSchema()
dataframe.show(truncate=False)
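
Note that create_map() requires all the map values to share a single type. Because salary is an IntegerType while location is a StringType, Spark widens both values to string, so the PropertiesOnMap column comes out as map<string,string>. The second printSchema() call should therefore print roughly the following (exact nullability flags may vary):

root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- PropertiesOnMap: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)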

The "dataframe" value is created in which the Sample_data and Sample_schema are defined. The create_map() PySpark SQL function returns the converted DataFrame columns salary and location to the MapType.
