What are the consistency models for modern DBs offered by AWS

This recipe explains the consistency models for modern DBs offered by AWS.

What are the consistency models for modern DBs offered by AWS?

Database consistency is defined by a set of rules that every data point in the database system must satisfy in order to be read and accepted. If data that violates these predefined rules enters the database, the dataset becomes inconsistent. Consistency is achieved by establishing rules: any transaction written to the database may change affected data only in the ways allowed by the constraints, triggers, variables, cascades, and other rules defined by the database's developers.

Assume you work for the National Transportation Safety Institute (NTSI). You have been assigned the task of compiling a database of new California driver's licenses. California's population has exploded in the last ten years, necessitating a new alphanumeric format for all first-time driver's license holders. Your team has determined that the new set value in your database for a California driver's license is: 1 alphabetic character + 7 numeric characters. This rule is now mandatory for all entries. An entry with the string "C08846024" would result in an error. Why? Because the entered value is 1 alpha + 8 numeric, which is inconsistent data.
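A rule like this is typically enforced before the data ever reaches the table. As a rough, hypothetical illustration (the function and pattern below are plain Python, not part of any AWS service), a regular-expression check can reject values that violate the 1 alphabetic + 7 numeric format:

```python
import re

# Hypothetical validation rule: exactly 1 alphabetic character followed by 7 digits.
LICENSE_PATTERN = re.compile(r"^[A-Za-z][0-9]{7}$")

def is_consistent_license(value: str) -> bool:
    """Return True only if the value satisfies the 1 Alpha + 7 Numeric rule."""
    return LICENSE_PATTERN.fullmatch(value) is not None

print(is_consistent_license("C0884602"))   # True  -> 1 alpha + 7 numeric
print(is_consistent_license("C08846024"))  # False -> 1 alpha + 8 numeric, inconsistent data
```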


Consistency also implies that any data changes to a single object in one table must be reflected in all other tables where that object appears. Continuing with the driver's license example, if the new driver's home address changes, that change must be reflected in all tables where that prior address existed. If one table has the old address and the others have the new address, this is an example of data inconsistency.
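A common way to avoid this kind of inconsistency is to keep the address in a single table and reference it everywhere else, so an update happens in exactly one place. The sketch below uses Python's built-in sqlite3 module; the table and column names are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The address lives in one table only; drivers reference it by id.
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT)")
conn.execute("CREATE TABLE drivers (license TEXT PRIMARY KEY, "
             "address_id INTEGER REFERENCES addresses(id))")

conn.execute("INSERT INTO addresses VALUES (1, '12 Old Street')")
conn.execute("INSERT INTO drivers VALUES ('C0884602', 1)")

# Updating the single source of truth changes what every reader sees.
conn.execute("UPDATE addresses SET street = '99 New Avenue' WHERE id = 1")

row = conn.execute("SELECT d.license, a.street FROM drivers d "
                   "JOIN addresses a ON a.id = d.address_id").fetchone()
print(row)  # ('C0884602', '99 New Avenue')
```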

Data-Centric Consistency Models

Andrew Tanenbaum and Maarten van Steen, two experts in this field, define a consistency model as a contract between the software (processes) and the memory implementation (the data store): if the software follows certain rules, the memory promises to work correctly. Because it is difficult to define which write was the last one in a system without a global clock, constraints are instead placed on the values that a read operation is allowed to return.

Client-Centric Consistency Models

The emphasis in a client-centric consistency model is on how data is perceived by the clients. If data replication has not yet completed, different clients may see different data. Because faster data access is the primary goal, we may choose a less strict consistency model, such as eventual consistency.

Eventual Consistency

In this approach, the system guarantees that if no new updates are made to a given piece of data, all reads of that item will eventually return the most recently written value. The replica that accepts an update sends update messages to all other replicas. In the meantime, different replicas may return different values when queried, but all replicas will eventually receive the update and become consistent. This model suits applications with hundreds of thousands of concurrent reads and writes per second, such as Twitter updates, Instagram photo uploads, Facebook status pages, and messaging systems, where reading the very latest value is not a primary concern.
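To make the idea concrete, the toy simulation below (plain Python, not an AWS API) shows replicas receiving an update asynchronously: reads can return stale values until propagation finishes, after which every replica agrees.

```python
# Toy model of eventual consistency: one replica accepts the write,
# the others receive the update message later.
replicas = [{"status": "old"}, {"status": "old"}, {"status": "old"}]

def write(value):
    replicas[0]["status"] = value          # the update lands on one replica first

def propagate():
    for replica in replicas[1:]:           # update messages reach the other replicas later
        replica["status"] = replicas[0]["status"]

write("new")
print([r["status"] for r in replicas])     # ['new', 'old', 'old'] -> replicas disagree
propagate()
print([r["status"] for r in replicas])     # ['new', 'new', 'new'] -> eventually consistent
```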

Read-Your-Writes Consistency

RYW (Read-Your-Writes) consistency is achieved when the system guarantees that any attempt by a client to read a record after that client has updated it will return the updated value. A typical RDBMS provides read-your-writes consistency.

Read-after-Write Consistency

RAW consistency is stricter than eventual consistency: all clients see a newly inserted data item or record immediately. Keep in mind that it applies only to new data; this model says nothing about updates or deletions.

Amazon S3 Consistency Models

In all regions, Amazon S3 historically provided read-after-write consistency for PUTs of new objects in your S3 bucket, and eventual consistency for overwrite PUTs and DELETEs. As a result, if you added a new object to your bucket, both you and your clients could read it immediately; if you overwrote an object, it could take some time for its replicas to be updated, which is why the eventual consistency model applied. (Since December 2020, Amazon S3 delivers strong read-after-write consistency for all PUT and DELETE requests, so the distinctions below describe the earlier model.) Amazon S3 ensures high availability by replicating data across multiple servers and Availability Zones, so when a record is added, updated, or deleted, data integrity must still be maintained. The scenarios for these cases are as follows (a minimal boto3 sketch follows the list):

• A new PUT request is submitted. A GET for the object immediately afterwards returns it, because the read-after-write consistency model applies to new objects; however, the object may not appear in a LIST of the bucket until the change has propagated to all servers and AZs.

• An UPDATE (overwrite PUT) request is submitted. Because the eventual consistency model is used for overwrites, a query for the object may return an outdated value.

• A DELETE request is issued. Due to the use of the eventual consistency model for DELETES, a query to list or read the object may return the deleted object.
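The boto3 sketch below walks through the same three cases; the bucket name and key are placeholders, and the comments describe the consistency behavior explained above rather than anything the API itself reports:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket", "drivers/C0884602.json"  # placeholder names

# New PUT: a GET for this key immediately afterwards returns the object
# (read-after-write), although a LIST may not show it right away.
s3.put_object(Bucket=bucket, Key=key, Body=b'{"license": "C0884602"}')
print(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

# Overwrite PUT: under the eventual consistency model, a read issued shortly
# after this call could still return the previous version.
s3.put_object(Bucket=bucket, Key=key, Body=b'{"license": "C0884602", "renewed": true}')

# DELETE: a list or read issued immediately afterwards may still see the
# object until the deletion has propagated to all replicas.
s3.delete_object(Bucket=bucket, Key=key)
```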

Amazon DynamoDB Consistency Models

Amazon DynamoDB is a popular NoSQL service provided by AWS. NoSQL storage is designed to be distributed: Amazon DynamoDB stores three geographically distributed replicas of each table to ensure high availability and data durability. Because writes propagate to these replicas asynchronously, a DynamoDB table read operation (GetItem, BatchGetItem, Query, or Scan) is an eventually consistent read by default. However, when you need the most recent data, you can request a strongly consistent read. Note that a strongly consistent read consumes twice as many read capacity units as an eventually consistent read. In general, eventually consistent reads are recommended because DynamoDB's change propagation is very fast (DynamoDB uses SSDs for low latency) and you will usually get the same result at half the price of a strongly consistent read.
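Read consistency in DynamoDB is chosen per request. A minimal boto3 sketch (the table name and key are placeholders) contrasting the default eventually consistent read with a strongly consistent one:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("DriversLicenses")  # placeholder table name

# Default read: eventually consistent, half the read-capacity cost.
item = table.get_item(Key={"license": "C0884602"})

# Strongly consistent read: returns the most recently committed value,
# but consumes twice as many read capacity units.
item_strong = table.get_item(Key={"license": "C0884602"}, ConsistentRead=True)

print(item.get("Item"), item_strong.get("Item"))
```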

