

  • System Design
  • CAP Theorem
  • Distributed Systems

System Design Simplified: The Trade-Off Triangle You Must Master

Tuesday, May 13, 2025 at 9:58:48 AM GMT+8


When I first started learning system design, it felt like solving a puzzle with no clear answer. Every architecture decision seemed to open new problems. Should I use this database? Should I prioritize performance or reliability?

What finally helped me break through the noise was understanding trade-offs — especially through something called the CAP Theorem.

Once I got this concept, system design stopped feeling like guesswork and started making sense.

What Is the CAP Theorem?

CAP stands for:

- Consistency: Every read reflects the most recent write.

- Availability: Every request to a working node gets a non-error response, though it may not reflect the most recent write.

- Partition Tolerance: The system continues to operate despite network failures or delays between nodes.

The theorem, conjectured by Eric Brewer and later formally proved by Seth Gilbert and Nancy Lynch, states that in the presence of a network partition, a distributed system must choose between consistency and availability: you can’t guarantee all three at once.

The Core Trade-Off: You Can Only Pick Two (When Things Go Wrong)

Let’s imagine a network partition occurs — a common scenario in distributed systems (think of a temporary server outage or a flaky connection between data centers). In that moment, your system faces a decision:

1. Do you ensure all users see accurate, up-to-date data? (Consistency)

2. Or do you keep serving responses, even if they might be a bit stale? (Availability)

You can’t have both if you want to survive a partition. So you have to pick your trade-off based on what matters more in your use case.
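To make the choice concrete, here is a minimal Python sketch of two replicas caught in a partition. Everything in it (`TwoReplicaStore`, the `"CP"`/`"AP"` modes, the status strings) is a hypothetical toy, not any real database’s API. In CP mode the store rejects requests it cannot serve safely; in AP mode it keeps answering and may return stale data.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data held by a node (hypothetical toy model)."""
    name: str
    data: dict = field(default_factory=dict)


class TwoReplicaStore:
    """Two replicas that replicate synchronously while the network is healthy.

    `mode` is "CP" or "AP"; the names and behaviour are assumptions of this sketch.
    """

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # flip to True to simulate a network partition

    def write(self, replica: Replica, key: str, value: str) -> str:
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            replica.data[key] = value
            other.data[key] = value  # healthy network: copy to the peer immediately
            return "ok"
        if self.mode == "CP":
            # Cannot reach the peer, so refuse rather than let the copies diverge.
            return "error: partition, write rejected"
        # AP: accept locally; the peer catches up once the partition heals.
        replica.data[key] = value
        return "ok (local only)"

    def read(self, replica: Replica, key: str) -> str:
        if self.partitioned and self.mode == "CP":
            # This toy CP mode also refuses reads it cannot confirm are current.
            return "error: partition, read rejected"
        return replica.data.get(key, "<missing>")


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(store.a, "balance", "100")
    store.partitioned = True
    print(mode, store.write(store.a, "balance", "50"), "|", store.read(store.b, "balance"))
# CP error: partition, write rejected | error: partition, read rejected
# AP ok (local only) | 100
```

The CP store sacrifices responses to avoid divergence, while the AP store answers from replica `b` with the old balance of 100 because the new write never reached it.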

Deep Dive into Trade-Offs with Real Examples

Let’s explore the trade-offs with practical applications so they stick.

1. CP: Consistency + Partition Tolerance (but lose Availability)

You choose accuracy over uptime. If the system detects a partition, some parts may refuse to respond until they can ensure up-to-date data.

Good for: Systems where correctness matters more than speed.

Examples:

- Banking Systems (e.g., traditional SQL databases, HBase): You can’t risk showing a balance that’s wrong. If there's a partition, it's better to delay the transaction than give someone incorrect account info.

- Reservation Systems (e.g., booking platforms): You don’t want to sell the same hotel room to two people.

Trade-off in action: The system might block some reads/writes until it can confirm data integrity.
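CP behaviour like this is often built on majority quorums. The sketch below is a hypothetical reservation store (`Node` and `book_room` are illustrative names, not a real library). A booking commits only when a strict majority of replicas is reachable, so the minority side of a partition refuses requests instead of risking a double booking.

```python
class Node:
    """One replica of a hypothetical reservation store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.bookings: dict[str, str] = {}  # room -> guest
        self.reachable = True  # set False to simulate being cut off by a partition


def book_room(nodes: list[Node], room: str, guest: str) -> bool:
    """Commit a booking only if a strict majority of replicas is reachable."""
    reachable = [n for n in nodes if n.reachable]
    if len(reachable) <= len(nodes) // 2:
        # Minority side of a partition: refuse (lose availability) to stay consistent.
        return False
    # Any two majorities overlap, so a prior booking is visible on at least one reachable node.
    if any(room in n.bookings for n in reachable):
        return False
    for n in reachable:
        n.bookings[room] = guest
    return True


nodes = [Node("n1"), Node("n2"), Node("n3")]
print(book_room(nodes, "101", "alice"))  # True: all three replicas reachable
nodes[1].reachable = False
nodes[2].reachable = False
print(book_room(nodes, "102", "bob"))    # False: only n1 reachable, no majority
nodes[2].reachable = True
print(book_room(nodes, "101", "carol"))  # False: room 101 is already booked
```

The cost is visible in the second call: a perfectly valid request for an empty room is turned away simply because too few replicas can be reached.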

2. AP: Availability + Partition Tolerance (but lose Consistency)

You prioritize keeping the system responsive — even if that means some data may be out of sync temporarily.

Good for: Systems where eventual consistency is acceptable.

Examples:

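- Amazon DynamoDB: Inspired by Amazon’s internal Dynamo store, which was built so the shopping cart keeps working even during high traffic or partial outages. If an item is added on one node and not yet replicated, the system still responds and syncs later. DynamoDB reads are eventually consistent by default, with strongly consistent reads available on request.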

- DNS Systems: When a server is unreachable, others take over, possibly returning slightly outdated records.

- Social Media Feeds: You don’t need to see the absolute latest tweet or like in real time. You just want the feed to load quickly.

Trade-off in action: You might see an old version of data for a few seconds or minutes, but you’ll always get a response.
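On the AP side, the system accepts writes on both sides of a partition and reconciles them afterwards. This minimal sketch assumes a cart is just a set of item names and reconciles by taking the union of all replicas. `CartReplica` and `reconcile` are hypothetical names, and the union rule is a simplification, similar in spirit to the cart merge described in Amazon’s Dynamo paper but not DynamoDB’s actual mechanism.

```python
class CartReplica:
    """One replica of a shopping cart (hypothetical toy model)."""

    def __init__(self) -> None:
        self.items: set[str] = set()

    def add(self, item: str) -> str:
        # AP behaviour: always accept the write locally, even if peers are unreachable.
        self.items.add(item)
        return "ok"


def reconcile(replicas: list[CartReplica]) -> None:
    """After the partition heals, converge every replica to the union of all items."""
    merged: set[str] = set()
    for r in replicas:
        merged |= r.items
    for r in replicas:
        r.items = set(merged)


east, west = CartReplica(), CartReplica()
east.add("book")  # accepted on one side of the partition
west.add("lamp")  # also accepted on the other side
print(sorted(west.items))  # ['lamp'] (stale: "book" has not arrived yet)
reconcile([east, west])
print(sorted(west.items))  # ['book', 'lamp']
```

A union merge never loses an added item, but it can bring back an item that one side removed, which is one reason real systems track versions instead of merging blindly.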

3. CA: Consistency + Availability (but lose Partition Tolerance)

Technically, this isn’t achievable once a system spans a network. The only way to guarantee both consistency and availability is to assume the network never fails, which is unrealistic.

Good for: Single-node systems or tightly coupled components within the same data center.

Examples:

- Relational Databases like PostgreSQL or MySQL (in standalone mode): Perfectly consistent and available as long as there's no network partition — but once you go distributed, this trade-off falls apart.

Designing Systems with CAP in Mind

Here’s where it gets interesting: CAP is not about picking one model and sticking with it forever.

Modern architectures often blend approaches or allow for configurable trade-offs.

Examples:

- MongoDB allows you to choose between strong and eventual consistency, depending on your write concern, read concern, and read preference settings.

- Cassandra (AP) sacrifices consistency by default but can be tuned with QUORUM reads and writes to lean toward CP when needed, because any read quorum and write quorum then share at least one replica.

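- Google Spanner (CP) uses TrueTime, a clock service backed by GPS receivers and atomic clocks, to provide strongly consistent transactions, and Google’s private network makes partitions rare enough that it appears highly available in practice. This is an edge case made possible by unique infrastructure.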
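The Cassandra tuning rests on a simple overlap rule: with N replicas, a read that contacts R of them and a write acknowledged by W of them are guaranteed to share a replica whenever R + W > N. With a replication factor of 3, QUORUM means 2 replicas, and 2 + 2 > 3. The hypothetical helper below checks the rule by brute force; it ignores details such as timestamps and hinted handoff that real systems also rely on.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every r-node read quorum share a node with every w-node write quorum?"""
    nodes = range(n)
    return all(
        set(read_q) & set(write_q)  # a non-empty intersection is truthy
        for read_q in combinations(nodes, r)
        for write_q in combinations(nodes, w)
    )


for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(f"N=3 R={r} W={w}: overlap={quorums_always_overlap(3, r, w)}, R+W>N={r + w > 3}")
# N=3 R=1 W=1: overlap=False, R+W>N=False
# N=3 R=2 W=2: overlap=True, R+W>N=True
# N=3 R=1 W=3: overlap=True, R+W>N=True
```

The (1, 1) case corresponds to reading and writing at Cassandra’s ONE level, where a read can miss the latest write entirely; (2, 2) is QUORUM on both sides with three replicas.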

How to Decide What to Prioritize?

Ask yourself:

1. Is it worse for the user to see slightly outdated data or to see an error?

2. Does your app need real-time accuracy or can it catch up later?

3. What kind of failure is more acceptable — a slow service or an inconsistent one?

Summary: CAP Is About Making Informed Trade-Offs

System design stopped being “hard” when I stopped chasing perfection and started asking better questions:

What matters more — correctness, uptime, or fault tolerance?

CAP Theorem doesn’t give you a rulebook, but it gives you a mental model. It reminds you that distributed systems are built on choices. And when you understand those choices, you're no longer guessing; you're designing.

