How to Convert PySpark String to Timestamp Type in Databricks?

This recipe shows you how to convert a PySpark string to a Timestamp in Databricks. | ProjectPro

Recipe Objective - How to Convert String to Timestamp in PySpark?

The to_timestamp() function in Apache PySpark is widely used to convert a String to a Timestamp (i.e., TimestampType). The default timestamp format is "yyyy-MM-dd HH:mm:ss.SSS", and if the input does not match this form, the function returns null. The syntax is to_timestamp(s: Column, format: String), where the first argument is the DataFrame column containing the timestamp string. The optional second argument is a format string that describes the layout of the input, which lets strings in other forms be cast to the Timestamp type in PySpark.

System Requirements

  • Python (version 3.0 or above)

  • Apache Spark (version 3.1.1 or above)

Check out this recipe to understand the conversion from string to timestamp in Spark SQL. This recipe provides a step-by-step guide on how to convert a string to a timestamp in PySpark, covering essential concepts such as the to_timestamp() function and casting using the cast() function.

Before diving into the conversion process, ensure you have the necessary environment set up. This includes having Python (version 3.0 or above) and Apache Spark (version 3.1.1 or above) installed. 

Implementing the to_timestamp() function in Databricks in PySpark

# Importing packages
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

Importing packages for PySpark string to Timestamp

The SparkSession and the required functions are imported into the environment to perform the conversion of a PySpark string to a timestamp.

Initialize a Spark session using the SparkSession.builder method. This is the starting point for creating a DataFrame and performing PySpark operations.

# Implementing String to Timestamp in Databricks in PySpark
spark = SparkSession.builder \
          .appName('PySpark String to Timestamp') \
          .getOrCreate()

Create a sample DataFrame with a string column representing timestamps. This will be the basis for the subsequent conversion operations.

dataframe = spark.createDataFrame(
        data = [("1","2021-08-26 11:30:21.000")],
        schema=["id","input_timestamp"])
dataframe.printSchema()


Utilize the to_timestamp() function to convert the string column (input_timestamp) to a timestamp column (timestamp) within the DataFrame.

# Converting String to Timestamp
dataframe.withColumn("timestamp", to_timestamp("input_timestamp")) \
  .show(truncate=False)

You can also chain the cast() function after to_timestamp(): here the string is first parsed into a timestamp and then cast back to a string column, which is useful when a downstream sink expects string data.

# Using Cast to convert String to Timestamp
dataframe.withColumn('timestamp', \
         to_timestamp('input_timestamp').cast('string')) \
  .show(truncate=False)


In the code above, the "dataframe" value is created with the sample data, and the to_timestamp() function converts the string column to a Timestamp (TimestampType) in PySpark. The chained cast() call then converts the parsed timestamp back to a string. For inputs that are not in the default format, pass an explicit format string to to_timestamp() so the string is correctly parsed before any further casting.

Practice more Databricks Operations with ProjectPro! 

Converting strings to timestamp types is crucial for effective data processing in Databricks. However, theoretical knowledge alone is insufficient. To truly solidify your skills, it's essential to gain practical experience through real-world projects. ProjectPro offers a diverse repository of solved projects in data science and big data. By immersing yourself in hands-on exercises provided by ProjectPro, you can enhance your proficiency in Databricks operations and ensure a seamless transition from theory to practical application.

