Read and write a DataFrame into a text file in Apache Spark

This recipe helps you read and write data as a DataFrame in text file format in Apache Spark. The DataFrame in Apache Spark is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R or Python, but offers richer optimizations.

Recipe Objective - How to read and write data as a DataFrame into a text file format in Apache Spark?

The DataFrame in Apache Spark is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R or Python, but offers richer optimizations. DataFrames can be constructed from a wide array of sources: structured data files, tables in Hive, external databases, or existing Resilient Distributed Datasets (RDDs). A text file is a computer file structured as a sequence of lines of electronic text, stored as data within a computer file system; "text file" refers to a type of container, whereas "plain text" refers to a type of content. Apache Spark provides several ways to read .txt files: the sparkContext.textFile() and sparkContext.wholeTextFiles() methods read into a Resilient Distributed Dataset (RDD), and the spark.read.text() and spark.read.textFile() methods read into a DataFrame or Dataset from a local or HDFS file.
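For reference, here is a minimal sketch of these four read APIs, assuming an existing SparkSession named spark and a sample file at /FileStore/tables/textfile.txt (the wildcard path passed to wholeTextFiles is purely illustrative):

// Reading a text file with the RDD and the DataFrame/Dataset APIs
import org.apache.spark.sql.{DataFrame, Dataset}

// RDD APIs: one record per line, or one (filename, content) pair per file
val linesRdd = spark.sparkContext.textFile("/FileStore/tables/textfile.txt")
val filesRdd = spark.sparkContext.wholeTextFiles("/FileStore/tables/*.txt")

// DataFrame/Dataset APIs: a single string column named "value"
val linesDf: DataFrame = spark.read.text("/FileStore/tables/textfile.txt")
val linesDs: Dataset[String] = spark.read.textFile("/FileStore/tables/textfile.txt")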

System Requirements

  • Scala (2.12 version)
  • Apache Spark (3.1.1 version)

This recipe explains the Spark DataFrame and the various options available while reading and writing data as a DataFrame into a text file.

Implementing a Spark Text File in Databricks

nullValue: The nullValue option specifies a string that should be treated as null while reading. For example, if a date column containing the value "2000-01-01" should be read as missing, set that string as the nullValue and the DataFrame will hold null instead.

dateFormat: The dateFormat option sets the format used to parse input DateType and TimestampType columns. In Spark 3.x the patterns follow the java.time datetime patterns (earlier Spark versions used java.text.SimpleDateFormat).
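As a hedged illustration only (these two options belong to the CSV and JSON readers rather than to the plain text reader, and the file path and columns below are hypothetical), a read that uses both options could look like this:

// Illustrative sketch: nullValue and dateFormat on a CSV read (hypothetical file)
val ordersDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("nullValue", "2000-01-01")   // treat this exact string as null
  .option("dateFormat", "yyyy-MM-dd")  // pattern used to parse DateType columns
  .csv("/FileStore/tables/orders.csv")
ordersDf.printSchema()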

// Importing Packages
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import spark.implicits._


The Spark SQL and implicits packages are imported to read and write data as a DataFrame into text file format.

// Implementing Text File
object TextFile {

  def main(args: Array[String]): Unit = {

    val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("Spark Text File")
      .getOrCreate()

    // Provides the Encoders used by the map transformation below
    // (required when running outside the Databricks notebook)
    import spark.implicits._

    // Reading the text file returns a DataFrame with a single column named "value"
    val dataframe: DataFrame = spark.read.text("/FileStore/tables/textfile.txt")
    dataframe.printSchema()
    dataframe.show(false)

    // Converting each record into columns by splitting on commas
    // using a map transformation
    val dataframe2 = dataframe.map(f => {
      val element = f.getString(0).split(",")
      (element(0), element(1))
    })
    dataframe2.printSchema()
    dataframe2.show(false)

    // Writing of the text file: the text sink supports only a single string column,
    // so the columns are concatenated back; the output goes to a new directory
    // because Spark cannot write to the same path it is reading from
    dataframe2.map(row => row._1 + "," + row._2)
      .write
      .text("/FileStore/tables/textfile_output")
  }
}


A TextFile object is created in which the Spark session is initiated. The dataframe value reads textfile.txt using the spark.read.text("path") function. The dataframe2 value converts the records (a single column named "value") into separate columns by splitting each line on commas with a map transformation. Finally, because the text sink accepts only a single string column, the columns are concatenated back into one string and written out using the dataframe.write.text("path") function.
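The same split can also be expressed without a map, using the built-in split and concat_ws column functions. The following is only a sketch; the column names and the output directory are assumptions rather than part of the original recipe:

// Alternative split using built-in column functions (sketch)
import org.apache.spark.sql.functions.{split, concat_ws, col}

val columnsDf = dataframe
  .withColumn("parts", split(col("value"), ","))
  .select(col("parts").getItem(0).as("first_column"),
          col("parts").getItem(1).as("second_column"))
columnsDf.show(false)

// The text sink accepts a single string column, so join the columns back before writing
columnsDf
  .select(concat_ws(",", col("first_column"), col("second_column")).as("value"))
  .write
  .mode("overwrite")
  .text("/FileStore/tables/textfile_columns_output")   // hypothetical output directory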

