Read older versions of the data using time travel in Databricks

This recipe helps you read older versions of data using time travel in Databricks. The Delta Lake table, defined as the Delta table, is both a batch table and a streaming source and sink.

Recipe Objective - How to read older versions of data using time travel in Databricks?

The Delta Lake table, defined as the Delta table, is both a batch table and a streaming source and sink. Streaming data ingest, batch historic backfill, and interactive queries all work out of the box. Delta Lake provides the ability to specify a schema and enforce it, which helps ensure that data types are correct and required columns are present, and prevents bad data from causing corruption in the Delta table. Delta can write batch and streaming data into the same table, allowing a simpler architecture and quicker ingestion from data to query result. Delta can also infer the schema of input data, which reduces the effort required to manage schema changes. Previous snapshots of a Delta table can be queried using time travel, so older versions of the data remain easily accessible. Time travel takes advantage of the Delta Lake transaction log to access data that is no longer in the current version of the table.
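
Besides selecting a snapshot by version number, Delta Lake also allows selecting one by timestamp. Below is a minimal sketch of that variant, assuming a Delta table already exists at "/delta/events"; the path and the timestamp value are only illustrations.

// Sketch: reading a past snapshot by timestamp instead of by version number
import org.apache.spark.sql.SparkSession

object DeltaTimeTravelByTimestamp extends App {
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("Delta Time Travel By Timestamp")
    .getOrCreate()
  // "timestampAsOf" returns the snapshot that was current at the given time
  val snapshot = spark.read.format("delta")
    .option("timestampAsOf", "2021-01-01 00:00:00") // illustrative timestamp
    .load("/delta/events")
  snapshot.show()
}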

System Requirements

  • Scala (2.12 version)
  • Apache Spark (3.1.1 version)

This recipe explains Delta Lake and how to read older versions of data using time travel in Spark.

Implementing reading of older versions of data in a Delta table

// Importing packages
import org.apache.spark.sql.{SaveMode, SparkSession}
import io.delta.implicits._

The Spark SQL SaveMode and SparkSession classes and the Delta implicits package are imported into the environment to read older versions of data using time travel in Databricks.

// Implementing reading of older versions of data in a Delta table
object DeltaTableOlderVersions extends App {
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("Spark Read Older Version Delta table")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")
  // Reading older versions of data of the Delta table
  val read_older = spark.read.format("delta")
    .option("versionAsOf", 0)
    .load("/delta/events")
  read_older.show()
}

The DeltaTableOlderVersions object is created, and a Spark session is initiated within it. The Delta table at the path "/delta/events" is loaded using spark.read.format("delta"), and the value "read_older" reads the table using time travel. The "versionAsOf" option selects which snapshot is queried: version 0 displays the first set of data written to the table, while version 1 would display the data as of the next commit. The table's commit history can be inspected to find the available versions, as sketched below.
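
The Delta Lake DeltaTable API exposes the table's commit history, which is useful for finding which versions exist before querying one. The following is a minimal sketch, assuming the same table at "/delta/events", the Spark session created above, and that the table has at least two commits.

// Listing the available versions of the Delta table
import io.delta.tables.DeltaTable

val deltaTable = DeltaTable.forPath(spark, "/delta/events")
// Each history row carries the version number, commit timestamp and operation
deltaTable.history().select("version", "timestamp", "operation").show()

// Reading version 1 returns the table as it looked after the second commit
val read_v1 = spark.read.format("delta").option("versionAsOf", 1).load("/delta/events")
read_v1.show()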

