HANDS-ON-LAB

Data analysis with Azure Synapse

Problem Statement

Objective: Data Analysis and Transformation with Azure Synapse Analytics

This hands-on lab, Data Analysis and Transformation with Azure Synapse Analytics, walks through analyzing the MovieLens dataset end to end: staging the raw files in Azure Blob storage, exposing them as Synapse external tables, joining the datasets with a Synapse pipeline and Dataflow, loading the result into a Delta table, and analyzing that table from a Python notebook.

 

Tasks

  1. Download the MovieLens dataset and upload it to an Azure Blob storage account, within a container and appropriate folders.

  2. Create Synapse external tables for each dataset in the MovieLens zip archive, naming the tables appropriately.

  3. Write SQL statements on the external tables to confirm dataset conformity and entity relationships within Synapse.

  4. Create a Synapse pipeline with a Synapse Dataflow, utilizing low code/no code features, to join the Movies, Ratings, Links, and Tags datasets and load the resultant dataset into a Delta table exposed as an external table in Synapse.

  5. Use a Python notebook to analyze the Delta table, retrieving the highest-rated movies carrying the tags "funny" and "beautiful".

  6. Add the saved Python notebook into the previously created pipeline.

  7. Use Azure Data Studio and Azure Databricks to repeat the exercise, building the transformations using Delta Live Tables.
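The join-and-analyze logic in tasks 4 and 5 can be sketched locally in plain Python. The toy rows below are made up, and in Synapse this work would run as a Dataflow or a PySpark notebook against the Delta table; this stand-in only illustrates the shape of the computation (join ratings and tags per movie, then rank by average rating):

```python
# Toy, local illustration of the join + analysis in tasks 4-5.
# The in-memory rows below are made up; in Synapse this logic would run
# as a Dataflow or PySpark notebook against the Delta table.
from collections import defaultdict

movies = {1: "Toy Story", 2: "Jumanji", 3: "Heat"}

ratings = [  # (movie_id, rating)
    (1, 5.0), (1, 4.0), (2, 3.0), (2, 2.0), (3, 4.5), (3, 5.0),
]

tags = [  # (movie_id, tag)
    (1, "funny"), (1, "beautiful"), (2, "funny"), (3, "beautiful"),
]

def top_movies_with_tags(wanted_tags, limit=5):
    """Average rating per movie, restricted to movies carrying all wanted tags."""
    sums = defaultdict(lambda: [0.0, 0])
    for movie_id, rating in ratings:
        sums[movie_id][0] += rating
        sums[movie_id][1] += 1

    tag_sets = defaultdict(set)
    for movie_id, tag in tags:
        tag_sets[movie_id].add(tag)

    qualified = [
        (movies[mid], total / count)
        for mid, (total, count) in sums.items()
        if wanted_tags <= tag_sets[mid]
    ]
    return sorted(qualified, key=lambda pair: pair[1], reverse=True)[:limit]

print(top_movies_with_tags({"funny", "beautiful"}))  # [('Toy Story', 4.5)]
```

With the sample rows, only Toy Story carries both tags, so it is the sole result; relaxing the tag filter to just "funny" would also surface Jumanji.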

Transform your data analysis capabilities with Azure Synapse Analytics. Start the lab now and discover the power of cloud-based data engineering and analytics.

Learnings

  • Gain practical experience in working with Azure Synapse Analytics for data analysis and transformation tasks.

  • Understand the process of uploading and managing datasets in Azure Blob storage and creating Synapse external tables.

  • Learn how to verify dataset conformity and establish entity relationships using SQL statements in Synapse.

  • Explore the capabilities of Synapse pipelines and Dataflows for loading and transforming data into Delta Tables.

  • Develop skills in using Python Notebooks for data analysis on Delta tables within Synapse.

  • Discover the use of Delta Live Tables in Azure Data Studio and Azure Databricks for efficient data transformations.

FAQs

Q1. How can I upload and manage datasets in Azure Blob storage for data analysis?

You can upload and manage datasets in Azure Blob storage by creating a container and appropriate folders, then using tools such as Azure Storage Explorer or the Azure Portal to upload and organize the data.

 

Q2. What are the advantages of using Delta Tables in Azure Synapse Analytics?

Delta Tables in Azure Synapse Analytics provide ACID transactions, schema enforcement, time travel, and optimized performance for big data workloads, making them ideal for data transformations and analysis.
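The "time travel" feature mentioned above can be pictured as a log of immutable table versions, where every write commits a new snapshot and old snapshots stay readable. The toy class below mimics that idea in plain Python; it is an illustration of the concept, not the Delta Lake API:

```python
# Toy sketch of the "time travel" idea behind Delta tables: every write
# appends a new immutable version, and older versions remain readable.
# This is a conceptual illustration only, not the Delta Lake API.
class VersionedTable:
    def __init__(self):
        self._versions = []  # list of snapshots: version 0, 1, 2, ...

    def write(self, rows):
        """Commit a new snapshot; previous versions are left untouched."""
        self._versions.append(list(rows))
        return len(self._versions) - 1  # version number just committed

    def read(self, version=None):
        """Read the latest snapshot, or an older one by version number."""
        return self._versions[-1 if version is None else version]

table = VersionedTable()
table.write([{"movieId": 1, "title": "Toy Story"}])
table.write([{"movieId": 1, "title": "Toy Story"},
             {"movieId": 2, "title": "Jumanji"}])

print(len(table.read()))   # latest version: 2 rows
print(len(table.read(0)))  # "time travel" back to version 0: 1 row
```

In real Delta tables the version log also backs ACID guarantees and auditing, since each committed version records exactly what the table contained at that point.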

 

Q3. How can I leverage Python Notebooks in Azure Synapse Analytics for data analysis?

In Azure Synapse Analytics, you can use Python Notebooks to perform data analysis on Delta tables. You can write and execute Python code to manipulate, transform, visualize, and derive insights from the data stored in Delta Tables.
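As a hedged, local stand-in for the kind of aggregation a Synapse notebook would run (where you would typically read the Delta table into a Spark DataFrame first), the snippet below computes a per-movie average rating from inline CSV text using only the standard library; the data values are made up:

```python
# Local stand-in for an aggregation you might run in a Synapse Python
# notebook. In Synapse the rows would come from the Delta table; here we
# parse a small made-up CSV with the standard library instead.
import csv
import io
from statistics import mean

ratings_csv = """movieId,rating
1,5.0
1,4.0
2,3.0
"""

rows = list(csv.DictReader(io.StringIO(ratings_csv)))

# Group ratings by movie, then average each group.
by_movie = {}
for row in rows:
    by_movie.setdefault(row["movieId"], []).append(float(row["rating"]))

averages = {movie_id: mean(vals) for movie_id, vals in by_movie.items()}
print(averages)  # {'1': 4.5, '2': 3.0}
```

The same group-and-aggregate pattern carries over directly to a notebook cell operating on the Delta table's rows at full scale.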