Explain the Append SaveMode in Spark and demonstrate it

This recipe helps you understand the Append SaveMode in Spark and demonstrates it. In Spark, the Append save mode appends the contents of a DataFrame to the data that already exists at the target location.

Recipe Objective - Explain the Append SaveMode method in Spark and demonstrate it with an example.

Apache Spark provides several save modes that control how a DataFrame is written to a data source. With the Append save mode, if data already exists at the given location, the contents of the DataFrame are appended to the existing data rather than replacing it. Thus, if the data or table already exists, the new rows are added alongside it. Unlike Overwrite, Append never deletes existing data, but repeated runs can write duplicate records, so it should still be used with care. By default, Spark uses the ErrorIfExists save mode: it will not append to an existing output directory on Amazon AWS S3 storage, HDFS, or any other file system. So when you try to write DataFrame contents such as JSON, CSV, Avro, Parquet, or ORC to an existing directory without specifying a save mode, Spark returns a runtime error.
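The behavior above can be sketched with a minimal local example. The SparkSession setup, sample data, and `/tmp` output path below are assumptions for illustration, not part of the recipe:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Minimal local sketch; the app name, data, and output path are illustrative.
val spark = SparkSession.builder()
  .appName("SaveModeDemo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, "alpha"), (2, "beta")).toDF("id", "name")
val path = "/tmp/savemode-demo"

df.write.csv(path)                    // first write succeeds (directory is created)

// A second write with the default mode (ErrorIfExists) fails at runtime
// because the directory already exists:
// df.write.csv(path)                 // would throw an AnalysisException

// With Append, the new rows are written alongside the existing files:
df.write.mode(SaveMode.Append).csv(path)
```

The mode can also be given as a string, `df.write.mode("append")`, which is equivalent to `SaveMode.Append`.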



This recipe explains what the Append save mode is, describes its usefulness, and demonstrates it using an example.

Implementing Append savemode in Databricks

// Importing Packages
import org.apache.spark.sql.SaveMode


The Spark SQL SaveMode object is imported into the environment so the Append save mode can be used while writing.

// Writing the DataFrame using Append SaveMode
dataframe.write.mode(SaveMode.Append).csv("/home/desktop/folder")


The mode() function is used while writing the DataFrame in Spark. Here, the DataFrame is saved using the Append save mode, and the path of the target folder is specified along with the .csv file format.
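To see that appending adds rows rather than replacing them, the write can be repeated and the result read back. This is an illustrative sketch, with a hypothetical path and sample data in place of the recipe's `dataframe`:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Verifying append behavior; all names and paths here are illustrative.
val spark = SparkSession.builder()
  .appName("AppendCheck")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
val path = "/tmp/append-check"

df.write.mode(SaveMode.Append).csv(path)   // first write: 2 rows on disk
df.write.mode(SaveMode.Append).csv(path)   // second write: rows are added, not replaced

val total = spark.read.csv(path).count()   // both writes are present, so 4 rows
```

Because both writes land in the same directory, re-running an Append job without deduplication will keep accumulating copies of the same records.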

Further options such as partitionBy(), format(), and saveAsTable() can be chained while writing the file in Spark. These functions add extra capabilities when writing and saving the DataFrame.
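A sketch of chaining such options is shown below; the column name, table name, and output path are hypothetical, and a sample DataFrame stands in for the recipe's `dataframe`:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("WriterOptions")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val dataframe = Seq((2023, "a"), (2024, "b")).toDF("year", "value")

// format() picks the output format explicitly; partitionBy() creates
// one sub-directory per distinct value of the given column.
dataframe.write
  .mode(SaveMode.Append)
  .format("parquet")
  .partitionBy("year")
  .save("/tmp/writer-options-demo")

// saveAsTable() appends into a managed table registered in the metastore.
dataframe.write.mode(SaveMode.Append).saveAsTable("events")
```

With partitionBy("year"), Spark lays the output out as `year=2023/`, `year=2024/`, and later Append writes add files into the matching partition directories.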

