Data Consistency

2023-05-23

Eventual and strong consistency are consistency models: design strategies that describe when data written to a store becomes visible to reads. They sit at opposite ends of a spectrum, and each comes with its own set of benefits and drawbacks.


What is a consistency model?

Consistency models describe how our data is persisted and replicated. You might be wondering: if I send data to my database so that it can be persisted and queried later, what’s there to talk about?

A lot of this boils down to the type of system we are trying to build, which in turn depends on the constraints our system needs to work within.

Consistency models provide a common language for framing the different solutions that can satisfy these constraints.

Strong consistency

Let’s imagine we are building a system in which we really care about consistency.

Strong consistency guarantees that any read request sent to the database returns the data from the latest write.

In other words, if we put data into our database and then query it, we should see that new data.

Seems logical enough, right? Naturally, we’d expect this by default from our data storage solutions.

But like anything in life, there’s a trade-off to be made.

To be sure that our new data is available, we have to wait for our data storage to ‘catch up’ (for example, for a write to reach every replica). During this period, the data is locked for reads.

This ‘catch up’ time increases overall latency, because reads spend time blocked, waiting until the consistency model makes the data available to us.

Figure: strong consistency increases read latency

In this scenario, our read request would be blocked until the new data had propagated through. But we trade this increase in wait time for being certain that the data we get back is fresh.
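To make the blocking concrete, here is a minimal Python sketch. It assumes a toy store in which accepted writes sit in a pending queue until they are applied; the class `LaggingStore` and its methods are hypothetical and do not model any particular database. A strongly consistent read drains the queue before answering, and the returned counter stands in for the time spent waiting.

```python
from collections import deque


class LaggingStore:
    """Toy store: writes are accepted into a queue and applied later.

    The pending queue is an illustrative stand-in for replication lag.
    """

    def __init__(self):
        self._applied = {}        # data visible to readers
        self._pending = deque()   # writes accepted but not yet applied

    def write(self, key, value):
        self._pending.append((key, value))

    def strong_read(self, key):
        """Wait (here: apply every pending write) before answering."""
        waited = 0
        while self._pending:
            k, v = self._pending.popleft()
            self._applied[k] = v
            waited += 1           # each step stands in for time spent blocked
        return self._applied.get(key), waited


store = LaggingStore()
store.write("balance", 100)
store.write("balance", 80)
print(store.strong_read("balance"))  # (80, 2): fresh value, after waiting 2 steps
```

The read always returns the latest write, but only after paying for every write still in flight.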

Eventual consistency

At the other end of the spectrum there is the eventual consistency model.

Eventual consistency guarantees that updates will be applied. Eventually. More precisely, if no new writes arrive, reads will sooner or later return the latest value. In the meantime, we are free to send our read requests whenever we want.

But this paradigm provides no guarantees that the data we receive is up-to-date.

Figure: eventual consistency can return stale data

Think of this as the inverse of the strong consistency model.

In this case, our read request would be free to hit our database without restrictions. But we trade this reduced wait time for having to accept stale reads.

We can query for our data whenever we like. It will always be there for us. You could say that with eventual consistency, our data is highly available. And we can do this without having to pay the locking penalty that we would otherwise get with strong consistency.
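A minimal Python sketch of the same idea, under the assumption of a toy store with a queue of not-yet-applied writes; `EventuallyConsistentStore` and `background_sync` are hypothetical names. Reads never wait, so a read issued before the background sync returns stale (here, missing) data, and a read issued afterwards sees the converged value.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy store where reads never wait for pending writes."""

    def __init__(self):
        self._applied = {}        # data visible to readers
        self._pending = deque()   # writes accepted but not yet applied

    def write(self, key, value):
        self._pending.append((key, value))

    def read(self, key):
        # Served immediately from whatever has been applied so far: may be stale.
        return self._applied.get(key)

    def background_sync(self):
        # Stands in for propagation that happens "eventually", off the read path.
        while self._pending:
            key, value = self._pending.popleft()
            self._applied[key] = value


store = EventuallyConsistentStore()
store.write("balance", 100)
print(store.read("balance"))  # None: the write has not been applied yet
store.background_sync()
print(store.read("balance"))  # 100: the store has converged
```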

Strong eventual consistency

Strong eventual consistency is typically achieved with conflict-free replicated data types (CRDTs).

With these data types, the order in which writes are applied does not matter: any two replicas that have received the same set of writes end up in the same state, so their reads are consistent.

A typical example is an integer counter. If one update increments the counter and another decrements it, the order in which they are applied is insignificant; the final value is the same either way.
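The counter example can be sketched as a minimal operation-based counter in Python. It assumes every operation is delivered to every replica exactly once, which is the delivery guarantee op-based CRDTs depend on; `OpCounter` and the `("inc", n)` / `("dec", n)` operation format are hypothetical. Two replicas apply the same operations in different orders and still agree.

```python
import random


class OpCounter:
    """Operation-based counter: increments and decrements commute.

    Assumes each operation reaches each replica exactly once.
    """

    def __init__(self):
        self.value = 0

    def apply(self, op):
        kind, amount = op
        if kind == "inc":
            self.value += amount
        elif kind == "dec":
            self.value -= amount
        else:
            raise ValueError(f"unknown op: {kind}")


ops = [("inc", 5), ("dec", 2), ("inc", 1), ("dec", 3)]

replica_a, replica_b = OpCounter(), OpCounter()
for op in ops:
    replica_a.apply(op)

shuffled = ops[:]
random.shuffle(shuffled)  # replica B receives the same ops in another order
for op in shuffled:
    replica_b.apply(op)

print(replica_a.value, replica_b.value)  # 1 1
```

Because addition commutes, no coordination is needed to agree on the final value; the only requirement is that nothing is lost or applied twice.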

Consistency in distributed systems

In distributed systems, we often replicate databases across multiple nodes for scalability and availability, primarily to handle larger amounts of traffic and to ensure that data can be retrieved quickly and at will.

This raises the added problem of how writes and updates are applied across our various database nodes.

When we have multiple nodes, we need to replicate our data across those nodes. Note that there is a whole world of problems associated with write-conflict resolution, and the intention of this post is not to address them!

But when we maintain our data across multiple nodes, we have decisions to make about how our data is applied and queried.

This will often be driven by how our software is being used and what our users expect from it.

CAP Theorem

The CAP theorem states that a distributed system can provide at most two of the following three characteristics:

Consistency - Every read receives the most recent write, so all nodes respond to clients with the same data, regardless of which node handles the request.
Availability - Every request received by a non-failing node gets a response, even if other nodes are down.
Partition tolerance - The system keeps responding to requests even when messages between nodes are dropped or delayed.

In distributed systems, we have to expect nodes to fail and messages between them to be lost. So it stands to reason that partition tolerance becomes a necessity.

So when a partition occurs, we are often left with the choice between consistency and availability.
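The choice can be sketched as a single replica that has lost contact with its peers. This Python sketch is a simplification under stated assumptions: the `Node` class, its `mode` flag and the `partitioned` attribute are hypothetical, and outside a partition the local copy is assumed to be kept current by replication. A "CP" node refuses to answer rather than risk stale data; an "AP" node answers from its local copy.

```python
class Node:
    """Hypothetical replica that must pick a behaviour when peers are unreachable."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.local = {}           # this node's copy of the data
        self.partitioned = False  # True when the node cannot reach its peers

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this copy is current: give up availability.
            raise RuntimeError("unavailable: cannot reach peers")
        # AP (or no partition): answer from the local copy, which may be stale.
        return self.local.get(key)


cp, ap = Node("CP"), Node("AP")
for node in (cp, ap):
    node.local["stock"] = 10
    node.partitioned = True

print(ap.read("stock"))  # 10, possibly stale
try:
    cp.read("stock")
except RuntimeError as err:
    print(err)           # unavailable: cannot reach peers
```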

Strong consistency in distributed systems

If we were to apply a strong consistency model, we would require all nodes to agree on a piece of data before that data can be returned.

This can introduce significant latency to our system, but it means every node in our system has the same view of the data.

In CAP theorem terms, we take consistency and partition tolerance, whilst sacrificing availability.

Eventual consistency in distributed systems

If we were to apply an eventual consistency model, we would only require that all nodes eventually converge on a given piece of data.

So our nodes can temporarily have different views of the data, and we would not need them to agree before that data is returned to clients.

This can reduce latency in our system when compared to the strong consistency model. But it means there will be periods of time in which data across our nodes is in varying states.
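How do diverged nodes get back in step? One simple rule, sketched here in Python, is last-writer-wins (LWW) applied during a periodic state exchange (often called anti-entropy). This is only one of many possible approaches, and conflict resolution in general is beyond this post. The sketch assumes each write carries a logical timestamp plus the writing node's id as a tie-breaker; `Versioned`, `lww_merge` and `anti_entropy` are hypothetical names. Note that LWW silently discards the losing write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    timestamp: int  # logical timestamp assigned by the writer (assumption)
    node_id: str    # tie-breaker so every node picks the same winner
    value: object


def lww_merge(a, b):
    """Last-writer-wins: keep the version with the larger (timestamp, node_id)."""
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


def anti_entropy(replica_x, replica_y):
    """Exchange state so both replicas hold the same winner for every key."""
    for key in set(replica_x) | set(replica_y):
        if key not in replica_x:
            winner = replica_y[key]
        elif key not in replica_y:
            winner = replica_x[key]
        else:
            winner = lww_merge(replica_x[key], replica_y[key])
        replica_x[key] = winner
        replica_y[key] = winner


# The two nodes accepted different writes while out of contact.
node_1 = {"email": Versioned(1, "n1", "old@example.com")}
node_2 = {"email": Versioned(2, "n2", "new@example.com"),
          "theme": Versioned(1, "n2", "dark")}

print(node_1 == node_2)       # False: the nodes currently disagree
anti_entropy(node_1, node_2)
print(node_1 == node_2)       # True: both have converged
print(node_1["email"].value)  # new@example.com
```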

Once again in CAP theorem terms, we take availability and partition tolerance, whilst sacrificing consistency.

Application development

For most of us, when we develop at the application level, it is more natural and convenient to assume strong consistency.

In other words, if we write or update a record in our database, then any subsequent read for that record will return that new data. This is generally how we write logic against our databases.

Using an eventual consistency model would force us to rethink how we reason about and interact with our data storage.

This, of course, imposes additional cognitive load on the application developer, and will most likely increase the risk of bugs.
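A typical bug of this kind is failing to read your own writes. The Python sketch below assumes a hypothetical leader/follower pair with asynchronous replication, where reads go to the follower by default; `ReplicatedStore`, `rename_user` and the `read_own_writes` flag are illustrative names, not a real API. Reading from the lagging follower right after a write returns the old value, so the application has to know to route that read to the leader instead.

```python
class ReplicatedStore:
    """Hypothetical leader/follower pair with asynchronous replication."""

    def __init__(self):
        self.leader = {}
        self.follower = {}
        self._replication_log = []  # writes not yet shipped to the follower

    def write(self, key, value):
        self.leader[key] = value
        self._replication_log.append((key, value))

    def replicate(self):
        for key, value in self._replication_log:
            self.follower[key] = value
        self._replication_log.clear()

    def read(self, key, from_leader=False):
        # Reads hit the follower by default to spread load (an assumption).
        return (self.leader if from_leader else self.follower).get(key)


def rename_user(store, user_id, new_name, read_own_writes):
    store.write(user_id, new_name)
    # Under eventual consistency, the caller must decide where this read goes.
    return store.read(user_id, from_leader=read_own_writes)


store = ReplicatedStore()
store.write("u1", "Alice")
store.replicate()
print(rename_user(store, "u1", "Ada", read_own_writes=False))  # Alice: stale
print(rename_user(store, "u1", "Ada", read_own_writes=True))   # Ada
```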

Ideally, we would like to be in a position where this is abstracted away from the application and towards the data storage solution instead.

So it can be said that anything other than strong consistency will bring with it the sort of baggage that application developers may not be too thankful for!

Summary

Eventual and strong consistency offer distinct sets of characteristics, which may be useful to you depending on the context of the system you are building.

In general terms, eventual consistency is useful for scenarios in which having the latest data is not critical. Strong consistency is easier to reason about and can lead to less brittle software when compared to eventual consistency.