Explain map_values() and map_keys() functions in PySpark in Databricks

This recipe explains what the map_values() and map_keys() functions do in PySpark in Databricks

Recipe Objective - Explain the map_values() and map_keys() functions in PySpark in Databricks

The PySpark MapType (also called map type) in Apache Spark is a data type used to represent a Python dictionary (dict), that is, to store key-value pairs. A MapType object comprises three fields: keyType (a DataType), valueType (a DataType) and valueContainsNull (a BooleanType). MapType extends the DataType class, the superclass of all types in PySpark, and takes two mandatory arguments, the key type and the value type (both of type DataType), plus one optional boolean argument, valueContainsNull. The map_values() function returns all the values of a map column, and the map_keys() function returns all of its keys.
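As a quick sketch of the constructor described above (the variable name here is only illustrative), the key type and value type are passed as the two mandatory arguments and valueContainsNull as the optional third:

from pyspark.sql.types import StringType, MapType

# keyType = StringType, valueType = StringType, valueContainsNull = True (illustrative variable name)
map_column_type = MapType(StringType(), StringType(), valueContainsNull=True)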


System Requirements

  • Python (version 3.0)
  • Apache Spark (version 3.1.1)

This recipe explains the PySpark MapType and the map_values() and map_keys() functions, and shows how to use them in PySpark.

Implementing the map_values() and map_keys() functions in Databricks in PySpark

# Importing packages
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, MapType
from pyspark.sql.functions import map_values, map_keys

The SparkSession, StructField, StructType, StringType, MapType, map_values and map_keys classes and functions are imported into the environment so that the map_values() and map_keys() functions can be used in PySpark.

# Implementing the map_values() and map_keys() functions in Databricks in PySpark
spark = SparkSession.builder.appName('PySpark map_values() and map_keys()').getOrCreate()
Sample_schema = StructType([
    StructField('name', StringType(), True),
    StructField('properties', MapType(StringType(), StringType()), True)
])
Sample_dataDictionary = [
    ('Ram', {'hair': 'brown', 'eye': 'brown'}),
    ('Shyam', {'hair': 'black', 'eye': 'black'}),
    ('Raman', {'hair': 'orange', 'eye': 'black'}),
    ('Sonu', {'hair': 'red', 'eye': None}),
    ('Vinay', {'hair': 'black', 'eye': ''})
]
dataframe = spark.createDataFrame(data=Sample_dataDictionary, schema=Sample_schema)
dataframe.printSchema()
dataframe.show(truncate=False)
# Using map_values() function
dataframe.select(dataframe.name, map_values(dataframe.properties)).show()
# Using map_keys() function
dataframe.select(dataframe.name, map_keys(dataframe.properties)).show()

The "dataframe" is created from the Sample_dataDictionary data and the Sample_schema. The map_values() PySpark function returns the map values of the "properties" column for every row of the dataframe, and the map_keys() PySpark function returns the corresponding map keys.
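Beyond returning all keys or all values at once, individual entries of a map column can also be pulled out by key with getItem(), and explode() turns each key-value pair into its own row. The following is a minimal sketch assuming the "dataframe" created above; the "hair" key and the column alias come from the sample data and are only illustrative:

# Selecting a single map value by key, then exploding the map into key-value rows
from pyspark.sql.functions import explode
dataframe.select(dataframe.name, dataframe.properties.getItem("hair").alias("hair")).show()
dataframe.select(dataframe.name, explode(dataframe.properties)).show()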

