Most Systems Get Consistency Wrong

DynamoDB Shows How to Do It at Scale

May 31, 2025

In the wild, most systems get consistency wrong. Teams build a backend that works great until two components try to write to the same record at the same time.

Then everything breaks.

You’ve seen it:

Double charges in payment systems.
Lost edits in collaborative tools.
Inventory systems that sell stock that’s no longer there.

These aren’t edge cases. They’re the reality of modern software and will break your business if you ignore them.

What’s at Stake?

If you don't handle consistency properly:

You lose data,
You break trust,
And you create a system that’s impossible to debug.

But consistency is hard, especially at scale. You're working with multiple nodes and services, often across regions. Distributed writes mean several systems may try to update the same data at the same time. Throw in unreliable networks, retries, and partial failures, and suddenly, maintaining a single source of truth becomes a serious challenge.

Despite the complexity, there are three ways to get consistency right:

1. Write-Ahead Logging (WAL)

Used by: PostgreSQL, MySQL, etc.

How it works:

Every change is appended to a durable log before it is applied to the actual data. If the system crashes in the middle, you replay the log. That gets you back to a consistent state.

✅ Durable and safe.
❌ Doesn’t handle concurrent write conflicts.
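To make the log-then-apply order concrete, here is a minimal sketch in Python. It is a toy key-value store, not how PostgreSQL or MySQL implement their logs; the class name `TinyWalStore`, the JSON-lines log format, and the `demo_wal` helper are all assumptions made for illustration.

```python
import json
import os
import tempfile


class TinyWalStore:
    """Toy key-value store that logs each write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild in-memory state from whatever reached the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-append
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply the change to the live state.
        self.data[key] = value


def demo_wal():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyWalStore(path)
        store.put("order-1", "paid")
        store.put("order-2", "pending")
        # Simulate a crash: drop the in-memory state, keep only the log file.
        del store
        recovered = TinyWalStore(path)
        return recovered.data
```

Note what the sketch does not do: nothing stops two writers from appending conflicting values for the same key, which is the limitation listed above.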

2. Locking (Pessimistic Concurrency)

Used by: Traditional RDBMS

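How it works:

A writer acquires a lock on the record before changing it. Other writers wait until that lock is released.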

✅ Prevents conflicting updates.

❌ Doesn’t scale. Blocks fast paths. Poor fit for distributed systems.
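A minimal single-process sketch of pessimistic locking, using Python's `threading.Lock` as a stand-in for a database row lock. The `LockedInventory` class and `demo_locking` helper are hypothetical names; a real RDBMS would take the lock inside a transaction rather than in application memory.

```python
import threading


class LockedInventory:
    """Stock counter guarded by a single lock."""

    def __init__(self, stock):
        self.stock = stock
        self._lock = threading.Lock()

    def reserve(self, quantity):
        # Writers queue up here; only one holds the lock at a time.
        with self._lock:
            if self.stock >= quantity:
                self.stock -= quantity
                return True
            return False


def demo_locking(buyers=50, initial_stock=10):
    inventory = LockedInventory(initial_stock)
    results = []
    results_lock = threading.Lock()

    def buy():
        ok = inventory.reserve(1)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Remaining stock, and how many buyers actually got an item.
    return inventory.stock, results.count(True)
```

The check and the decrement happen under the same lock, so stock never goes negative. The cost is that every buyer waits in line, even when there is no real conflict.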

3. Data Versioning (Optimistic Concurrency)

Used by: DynamoDB, Cassandra, Event Sourcing

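How it works:

Each record carries a version number. A writer reads the record along with its version, then writes back only if the stored version still matches. A successful write increments the version.

No locks. Just version checks.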

✅ Scales well. High throughput.
❌ You must handle retries and merge conflicts.
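The version check can be sketched with a small in-memory store. `VersionedStore`, `VersionConflict`, and `demo_versioning` are illustrative names, not part of any library, and the sketch is single-threaded; a real datastore has to perform the compare and the write as one atomic step.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    def __init__(self):
        self._items = {}  # key -> (version, value)

    def get(self, key):
        # Missing keys read as version 0 with no value.
        return self._items.get(key, (0, None))

    def put(self, key, value, expected_version):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{current_version}"
            )
        # The write succeeds and bumps the version by one.
        self._items[key] = (current_version + 1, value)
        return current_version + 1


def demo_versioning():
    store = VersionedStore()
    store.put("doc", "draft", expected_version=0)

    # Two writers read the same version before either writes.
    version_a, _ = store.get("doc")
    version_b, _ = store.get("doc")

    store.put("doc", "edit from A", expected_version=version_a)
    try:
        store.put("doc", "edit from B", expected_version=version_b)
    except VersionConflict:
        return store.get("doc"), "B rejected"
    return store.get("doc"), "B accepted"
```

Writer B is not blocked at any point. It simply learns at write time that its view was stale, and it is then up to B to re-read and retry.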

Case Study: How DynamoDB Gets Consistency Right

Let’s say you're building a collaborative diagramming platform where teams co-edit architecture diagrams in real time.

Two users, A and B, open the same diagram and both read it at version 1. User A moves a component and writes the change to the database, which bumps the item to version 2. At the same time, User B renames that component and also tries to write, still thinking the item is at version 1.

Without coordination, whichever writer hits the database last will silently overwrite the other, resulting in either the position change or the rename being lost.

DynamoDB avoids this with Conditional Writes:

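ConditionExpression: a clause in DynamoDB that ensures an update only occurs if a specified condition (like a matching version) is true, for example "version = :expectedVersion". If the condition is false, DynamoDB rejects the write with a ConditionalCheckFailedException, so User B's stale write fails instead of silently overwriting User A's change.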
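As a rough sketch, the request for such a write could be assembled like this in Python. The function only builds the keyword arguments for an UpdateItem call and does not contact AWS. The table name `diagrams`, the key `diagram_id`, and the attribute `component_name` are assumptions for illustration; `Key`, `UpdateExpression`, `ConditionExpression`, `ExpressionAttributeNames`, and `ExpressionAttributeValues` are the standard UpdateItem parameters.

```python
def build_versioned_update(diagram_id, new_name, expected_version):
    """Build keyword arguments for a version-checked UpdateItem request."""
    return {
        "Key": {"diagram_id": diagram_id},  # hypothetical key schema
        # Write the new value and bump the version in the same request.
        "UpdateExpression": "SET #name = :name, #v = :next",
        # The write is applied only if the stored version is unchanged.
        "ConditionExpression": "#v = :expected",
        # Placeholders keep attribute names clear of reserved words.
        "ExpressionAttributeNames": {
            "#name": "component_name",  # hypothetical attribute
            "#v": "version",
        },
        "ExpressionAttributeValues": {
            ":name": new_name,
            ":expected": expected_version,
            ":next": expected_version + 1,
        },
    }


# Hypothetical usage with boto3 (not executed here):
#   table = boto3.resource("dynamodb").Table("diagrams")
#   table.update_item(**build_versioned_update("d-42", "API Gateway", 1))
# A stale expected_version surfaces as a botocore ClientError whose
# error code is "ConditionalCheckFailedException".
```

Because the condition and the update travel in one request, the check and the write are applied together on the server. The client never needs a separate "lock" round trip.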

Key DynamoDB Features Used

ConditionExpression: Enforces that writes only succeed if the item’s current version matches the client’s expected version.
Atomic Counters: The version attribute acts as an atomic counter, ensuring sequential updates.
Optimistic Locking: Clients assume no conflicts by default but handle them gracefully when they occur.

Why This Works

Prevents Silent Overwrites: Only one writer succeeds per version, eliminating "last write wins" chaos.
Merge Flexibility: Clients can implement domain-specific merge logic (e.g., Operational Transform for diagrams).
Scalability: No centralized locking, making it suitable for globally distributed teams.
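Here is one way the client-side retry and merge could look for the diagram scenario, sketched in Python against an in-memory stand-in for the table. `DiagramTable`, `save_with_retry`, and `demo_merge` are hypothetical names. The merge is a simple field-level overlay that works because A and B touch different fields; it is not Operational Transform, and edits to the same field would need real domain logic.

```python
import copy


class Conflict(Exception):
    """Raised when a conditional write sees a stale version."""


class DiagramTable:
    """In-memory stand-in for a table with conditional writes."""

    def __init__(self, item):
        self._item = dict(item, version=1)

    def read(self):
        return copy.deepcopy(self._item)

    def conditional_write(self, item, expected_version):
        if self._item["version"] != expected_version:
            raise Conflict()
        self._item = dict(item, version=expected_version + 1)


def save_with_retry(table, stale_item, change, max_attempts=3):
    """Overlay `change` on the item, re-reading and retrying on conflict."""
    base = stale_item
    for _ in range(max_attempts):
        candidate = {**base, **change}
        try:
            table.conditional_write(candidate, expected_version=base["version"])
            return table.read()
        except Conflict:
            # Someone else won this version: pick up their fields and retry.
            base = table.read()
    raise Conflict("gave up after retries")


def demo_merge():
    table = DiagramTable({"name": "Cache", "x": 0, "y": 0})
    seen_by_a = table.read()
    seen_by_b = table.read()
    save_with_retry(table, seen_by_a, {"x": 120, "y": 40})  # A moves it
    return save_with_retry(table, seen_by_b, {"name": "Redis Cache"})  # B renames
```

User B's first attempt is rejected, the retry re-reads version 2 with A's new position, and the final item keeps both the move and the rename.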

Trade-offs in Practice

Tools & Frameworks

Write-Ahead Logging: Native in PostgreSQL, MySQL, and SQLite
Locking: Handled via transaction isolation levels (e.g., SERIALIZABLE) in most RDBMS
Versioning:
- AWS SDKs for DynamoDB support Conditional Writes out of the box
- Event Sourcing frameworks like Axon (Java), Eventuate (Java), or Marten (C#) enable version-based control

Wrapping up

There are only 3 ways to ensure consistency: WAL, Locking, and Versioning.
DynamoDB scales by choosing versioning and pushing retry logic to clients.
Versioning is ideal for distributed, high-throughput systems, but comes with trade-offs.
If you're building a system that involves concurrent writes, don't rely on duct tape and blind hope. Rely on versions.
Consistency isn’t magic; it’s engineering.

Most teams overengineer consistency or overlook it until it breaks. DynamoDB shows that with the right model, you can keep both scale and safety.

Your Move: How does your system manage consistency today? Locking? Logging? Or versioning?

Shoot me a reply or share your approach; I’m curious.

System Design Classroom is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Articles I enjoyed this week

Thank you for reading System Design Classroom. If you like this post, share it with your friends!

Petar Ivanov

Amazing read, Raul! Love the mentioned trade-offs. It's important to know the pros and cons of each approach and choose the one that best suits your current needs.

And thank you for mentioning one of my latest articles!

Saurabh Dashora

Solid post brother.

Consistency is often overlooked unless problems arise in the most critical times. Great tips!

Also, thanks for the mention, Raul!
