Explain the map_values() and map_keys() functions in PySpark in Databricks

This recipe explains the map_values() and map_keys() functions in PySpark in Databricks

Recipe Objective - Explain the map_values() and map_keys() functions in PySpark in Databricks

The PySpark MapType (map type) in Apache Spark is a data type used to represent a Python dictionary (dict), i.e., a collection of key-value pairs. A MapType object comprises three fields: keyType (a DataType), valueType (a DataType), and valueContainsNull (a BooleanType). It extends the DataType class, the superclass of all types in PySpark, and takes two mandatory arguments, the key type and the value type (both of type DataType), plus one optional boolean argument, valueContainsNull. The map_values() function returns all the values of a map column, and the map_keys() function returns all of its keys.


System Requirements

  • Python (3.6 or later)
  • Apache Spark (3.1.1 version)

This recipe explains the PySpark MapType and the map_values() and map_keys() functions, and shows how to use them in PySpark.

Implementing the map_values() and map_keys() functions in Databricks in PySpark

# Importing packages
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, MapType
from pyspark.sql.functions import map_values, map_keys

The SparkSession, StructField, StructType, StringType, MapType, map_values and map_keys packages are imported into the environment to perform the map_values() and map_keys() functions in PySpark.

# Implementing the map_values() and map_keys() functions in Databricks in PySpark
spark = SparkSession.builder.appName('PySpark map_values() and map_keys()').getOrCreate()
Sample_schema = StructType([
    StructField('name', StringType(), True),
    StructField('properties', MapType(StringType(), StringType()), True)
])
Sample_dataDictionary = [
    ('Ram', {'hair': 'brown', 'eye': 'brown'}),
    ('Shyam', {'hair': 'black', 'eye': 'black'}),
    ('Raman', {'hair': 'orange', 'eye': 'black'}),
    ('Sonu', {'hair': 'red', 'eye': None}),
    ('Vinay', {'hair': 'black', 'eye': ''})
]
dataframe = spark.createDataFrame(data = Sample_dataDictionary, schema = Sample_schema)
dataframe.printSchema()
dataframe.show(truncate=False)
# Using map_values() function
dataframe.select(dataframe.name, map_values(dataframe.properties)).show()
# Using map_keys() function
dataframe.select(dataframe.name, map_keys(dataframe.properties)).show()

The "dataframe" value is created in which the Sample_dataDictionary and Sample_schema are defined. Using the map_values() PySpark function returns the map values of all the dataframe properties present in the dataframe. The map_keys() PySpark function returns the map keys of all the dataframe properties current in the dataframe.

