Explain the features of Amazon Redshift

In this recipe, we will learn about Amazon Redshift and its features.

Recipe Objective - Explain the features of Amazon Redshift

Amazon Redshift is a widely used data warehouse product that forms part of the larger cloud-computing platform Amazon Web Services, "red" being an allusion to Oracle, whose corporate colour is red and which is informally referred to as "Big Red". Amazon Redshift is built on technology from the massively parallel processing (MPP) data warehouse company ParAccel, later acquired by Actian, for handling large-scale data sets and database migrations. Amazon Redshift differs from Amazon's other hosted database offering, Amazon RDS, in its ability to handle analytic workloads on big data sets stored on the column-oriented DBMS principle. Further, Amazon Redshift allows up to 16 petabytes of data on a cluster, compared with Amazon RDS's maximum database size of 16 TB.

Amazon Redshift is based on an older version of PostgreSQL (version 8.0.2), and Redshift has made changes to that version. An initial preview beta was released in November 2012, and a full release was made available on February 15, 2013. The service can handle connections from most other applications using ODBC and JDBC. Amazon Redshift has the largest number of Cloud data warehouse deployments, with more than 6,500 deployments, as per the Cloud Data Warehouse report published by Forrester in Q4 2018.

Amazon Redshift uses parallel processing and compression to decrease command execution time, which allows Redshift to perform operations on billions of rows at once. This also makes Redshift useful for storing and analyzing large quantities of data from logs or live feeds through a source such as Amazon Kinesis Data Firehose.

Benefits of Amazon Redshift

  • End-to-end encryption: Amazon Redshift can be set up with SSL to secure data in transit and hardware-accelerated AES-256 encryption for data at rest. If the user chooses to enable encryption of data at rest, all data written to disk is encrypted, as are any backups, and Amazon Redshift takes care of key management by default.

  • Network isolation: Amazon Redshift lets users configure firewall rules to control network access to their data warehouse cluster. Users can also run Amazon Redshift inside Amazon Virtual Private Cloud (VPC) to isolate the data warehouse cluster in their virtual network and connect it to their existing IT infrastructure using an industry-standard encrypted IPsec VPN.

  • Audit and compliance: Amazon Redshift integrates with AWS CloudTrail to enable users to audit all Redshift API calls, and it logs all SQL operations, including connection attempts, queries, and changes to the users' data warehouse. Users can access these logs using SQL queries against system tables, or save the logs to a secure location in Amazon S3. Amazon Redshift is compliant with SOC 1, SOC 2, SOC 3, and PCI DSS Level 1 requirements.
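As a sketch of the audit capability described above, a query against a Redshift system table might look like the following; the table and column names follow Redshift's system-table conventions, and the result depends on your cluster's activity:

```sql
-- Inspect recent connection attempts recorded in the
-- STL_CONNECTION_LOG system table (requires superuser or
-- appropriate system-table access).
SELECT event, recordtime, remotehost, username
FROM stl_connection_log
ORDER BY recordtime DESC
LIMIT 20;
```

Similar queries against other STL/SVL system tables surface query history and data-definition changes.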

System Requirements

  • Any Operating System (Mac, Windows, Linux)

This recipe explains Amazon Redshift and its features.

Features of Amazon Redshift

    • It enables Partner console integration

Amazon Redshift accelerates data onboarding and the creation of valuable business insights in minutes by integrating with select Partner solutions in the Amazon Redshift console. With these solutions, users can bring data from applications such as Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, and Marketo into their Redshift data warehouse in an efficient and streamlined way. Amazon Redshift also lets users join these disparate datasets and analyze them together to produce actionable insights.

    • It enables Data Sharing

Amazon Redshift data sharing allows users to extend the ease of use, performance, and cost benefits of Amazon Redshift in a single cluster to multi-cluster deployments while being able to share data. Data sharing enables instant, granular, and fast data access across Redshift clusters without the need to copy or move data. Also, data sharing provides live access to data, so users always see the most current and consistent information as it's updated in the data warehouse.
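A minimal data-sharing sketch using Redshift's DATASHARE DDL might look like this; the share, schema, table, and namespace identifiers are illustrative placeholders:

```sql
-- On the producer cluster: create a share and add objects to it.
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.daily_sales;

-- Grant the consumer cluster's namespace access to the share.
GRANT USAGE ON DATASHARE sales_share TO NAMESPACE 'consumer-namespace-guid';

-- On the consumer cluster: surface the share as a database and query it live.
CREATE DATABASE sales_db FROM DATASHARE sales_share
  OF NAMESPACE 'producer-namespace-guid';

SELECT COUNT(*) FROM sales_db.public.daily_sales;
```

Because the consumer reads the producer's data in place, no copy or unload/load pipeline is needed.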

    • It provides Redshift ML

Amazon Redshift ML makes it easy for data analysts, data scientists, BI professionals, and developers to create, train, and deploy Amazon SageMaker models using SQL. Using Redshift ML, users can use SQL statements to create and train Amazon SageMaker models on the data in Amazon Redshift and then use those models for predictions such as churn detection, financial forecasting, personalization, and risk scoring directly in their queries and reports.
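A hedged sketch of the Redshift ML workflow for churn detection follows; the table, columns, IAM role ARN, and S3 bucket are placeholders you would replace with your own:

```sql
-- Train a SageMaker model from SQL. Redshift ML exports the training
-- data to the given S3 bucket and runs SageMaker Autopilot behind the scenes.
CREATE MODEL customer_churn_model
FROM (SELECT age, monthly_charges, tenure, churned
      FROM customer_activity)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');

-- Once training completes, the generated function is callable in queries:
SELECT predict_churn(age, monthly_charges, tenure) AS churn_risk
FROM customer_activity;
```

The generated prediction function takes the feature columns as arguments, so scoring happens directly in the warehouse without moving data out.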

    • It provides efficient storage and high-performance query processing

Amazon Redshift delivers fast query performance on datasets ranging in size from gigabytes to petabytes. Columnar storage, data compression, and zone maps reduce the amount of I/O needed to perform queries. Along with industry-standard encodings such as LZO and Zstandard, Amazon Redshift also offers a purpose-built compression encoding, AZ64, for numeric and date/time types to provide both storage savings and optimized query performance.
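The encodings mentioned above are assigned per column at table-creation time; a sketch with an illustrative table follows (column encodings here are assumptions chosen to match the types: AZ64 for numeric and date/time columns, LZO and Zstandard for character data):

```sql
CREATE TABLE web_events (
    event_id    BIGINT        ENCODE az64,
    event_time  TIMESTAMP     ENCODE az64,
    user_id     INTEGER       ENCODE az64,
    page_url    VARCHAR(256)  ENCODE lzo,
    referrer    VARCHAR(256)  ENCODE zstd
);

-- Redshift can also recommend encodings for an existing, populated table:
ANALYZE COMPRESSION web_events;
```

If no encoding is specified, Redshift applies automatic compression defaults, so explicit ENCODE clauses are an optimization rather than a requirement.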

    • It provides Limitless concurrency

Amazon Redshift provides consistently fast performance even with thousands of concurrent queries, whether they query data in the user's Redshift data warehouse or directly in the Amazon S3 data lake. Amazon Redshift Concurrency Scaling supports virtually unlimited concurrent users and concurrent queries with consistent service levels by adding transient capacity in seconds as concurrency increases.

    • It provides Materialized views

Amazon Redshift materialized views allow users to achieve significantly faster query performance for iterative or predictable analytical workloads such as dashboarding, queries from Business Intelligence (BI) tools, and extract, load, transform (ELT) data processing jobs. Users can use materialized views to easily store and manage the pre-computed results of a SELECT statement that may reference one or more tables, including external tables. Subsequent queries referencing the materialized views can then run much faster by reusing the precomputed results. Amazon Redshift can efficiently maintain the materialized views incrementally to continue to provide the low-latency performance benefits.
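A minimal materialized-view sketch for a dashboard workload might look like this; the view, table, and column names are illustrative:

```sql
-- Precompute a daily revenue rollup; AUTO REFRESH lets Redshift
-- maintain it incrementally as the base table changes.
CREATE MATERIALIZED VIEW daily_revenue_mv
AUTO REFRESH YES
AS
SELECT order_date, SUM(amount) AS total_revenue
FROM orders
GROUP BY order_date;

-- Dashboards read the precomputed results instead of scanning orders:
SELECT * FROM daily_revenue_mv
WHERE order_date >= '2023-01-01';

-- Incremental maintenance can also be triggered explicitly:
REFRESH MATERIALIZED VIEW daily_revenue_mv;
```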

    • It uses Result caching

Amazon Redshift uses result caching to deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that run repeat queries experience a significant performance boost. When a query runs, Amazon Redshift searches the cache for a cached result from a prior run; if a cached result is found and the underlying data has not changed, the cached result is returned immediately instead of re-running the query.
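Result caching is on by default, but it can be toggled per session, and cache hits can be observed through a system view, as in this sketch:

```sql
-- Disable result caching for the current session, e.g. when benchmarking:
SET enable_result_cache_for_session TO off;

-- In SVL_QLOG, the source_query column identifies the earlier query
-- whose cached result was reused (it is NULL for non-cached runs):
SELECT query, source_query, substring AS query_text
FROM svl_qlog
ORDER BY starttime DESC
LIMIT 10;
```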

    • It provides predictable cost, even with unpredictable workloads

Amazon Redshift allows users to scale with minimal cost impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day. These free credits are sufficient for the concurrency needs of 97% of customers, giving users predictable month-to-month costs even during periods of fluctuating analytical demand.

