How to Convert PySpark String to Timestamp Type in Databricks?

This recipe shows how to convert a PySpark string to a timestamp in Databricks. | ProjectPro

Recipe Objective - How to Convert String to Timestamp in PySpark?

The to_timestamp() function in Apache PySpark is popularly used to convert a String to the Timestamp (i.e., TimestampType). The default format of the timestamp is "yyyy-MM-dd HH:mm:ss.SSS", and if the input is not in the specified form, the function returns Null. The syntax of the function is "to_timestamp(s: Column, fmt: String)", where the first argument is the DataFrame column holding the timestamp string. The second, optional argument is a format string that defines the layout of the input and allows casting a string in any form to the Timestamp type in PySpark.

System Requirements

  • Python (version 3.6 or above)

  • Apache Spark (version 3.1.1)

This recipe provides a step-by-step guide to converting a string to a timestamp in PySpark, covering essential concepts such as the to_timestamp() function and casting with the cast() function.

Before diving into the conversion process, ensure you have the necessary environment set up. This includes having Python (version 3.6 or above) and Apache Spark (version 3.1.1 or above) installed.

Implementing the to_timestamp() function in Databricks in PySpark

# Importing packages
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp


The SparkSession and the required functions are imported into the environment to perform the conversion of a PySpark string to a timestamp.

Initialize a Spark session using the SparkSession.builder method. This is the starting point for creating a DataFrame and performing PySpark operations.

# Implementing String to Timestamp in Databricks in PySpark
spark = SparkSession.builder \
          .appName('PySpark String to Timestamp') \
          .getOrCreate()

Create a sample DataFrame with a string column representing timestamps. This will be the basis for the subsequent conversion operations.

dataframe = spark.createDataFrame(
        data=[("1", "2021-08-26 11:30:21.000")],
        schema=["id", "input_timestamp"])
dataframe.printSchema()


Utilize the to_timestamp() function to convert the string column (input_timestamp) to a timestamp column (timestamp) within the DataFrame.

# Converting String to Timestamp
dataframe.withColumn("timestamp", to_timestamp("input_timestamp")) \
  .show(truncate=False)

You can also chain the cast() function after to_timestamp() to convert the resulting timestamp column back to a string, which is useful when downstream systems expect string output.

# Using Cast to convert String to Timestamp
dataframe.withColumn('timestamp', \
         to_timestamp('input_timestamp').cast('string')) \
  .show(truncate=False)


The "dataframe" value is created with the sample data defined above, and the to_timestamp() function converts the String to a Timestamp (TimestampType) in PySpark. When the input string is not in the default format, pass an explicit format pattern as the second argument of to_timestamp(); the cast() call then converts the resulting timestamp column to the desired type.

Practice more Databricks Operations with ProjectPro! 

Converting strings to timestamp types is crucial for effective data processing in Databricks. However, theoretical knowledge alone is insufficient. To truly solidify your skills, it's essential to gain practical experience through real-world projects. ProjectPro offers a diverse repository of solved projects in data science and big data. By immersing yourself in hands-on exercises provided by ProjectPro, you can enhance your proficiency in Databricks operations and ensure a seamless transition from theory to practical application.

