Core Algorithms

Quorum-Based Reads and Writes: The N, R, W Model for Distributed Consistency

Master the quorum formula R + W greater than N for strong consistency. Understand Dynamo and Cassandra tuning, tradeoffs between latency and consistency, and sloppy quorum.

quorumconsistencyN-R-WDynamoCassandrasloppy-quorumhinted-handoff

Quorum-Based Reads and Writes

During a Black Friday incident, my team's read-after-write consistency guarantee collapsed. A customer would add an item to their cart, see it in the response, then refresh the page and find the cart empty. The root cause was subtle: we had N=3 replicas, W=1 for fast writes, and R=1 for fast reads. The write went to one replica, but the read hit a different replica that hadn't received the replication yet. The customer saw a phantom empty cart.

This is the fundamental tradeoff in distributed storage: you can't have fast writes, fast reads, and strong consistency all at once. The quorum model gives you a dial to control this tradeoff.

The Fundamental Tradeoff

In a distributed system with N replicas of each piece of data, you face a choice on every read and every write:

StrategyWrite LatencyRead LatencyConsistency Risk
Write to all, read from anySlow (must reach N)FastCan read stale data if write not yet replicated
Write to any, read from allFastSlow (must reach N)Read always gets latest from whoever wrote it
Write to some, read from someBalancedBalancedDepends on the overlap

The quorum model turns this tradeoff into a formula: R + W greater than N. If the sum of the read replica count (R) and write replica count (W) exceeds the total replica count (N), then every read will include at least one node that has the latest write.


The Quorum Model

Defining N, R, and W

ParameterMeaningTypical ValuesEffect
NTotal number of replicas that store the data3 (most common), 5 (higher durability)Higher N = more durability, higher storage cost, more write amplification
WNumber of replicas that must acknowledge a writeN/2 + 1 (quorum), N (all), 1 (any)Higher W = stronger write consistency, slower writes
RNumber of replicas that must respond to a readN/2 + 1 (quorum), N (all), 1 (any)Higher R = stronger read consistency, slower reads

The Core Formula: R + W greater than N

When R + W is greater than N, the read set and write set are guaranteed to intersect on at least one node. This means:

text
Given:
- N replicas
- W replicas acknowledged the latest write
- R replicas are consulted on the read

If R + W > N:
  The intersection of the R read replicas and the W write replicas
  contains at least one node with the latest data.
  => Strong consistency (no stale reads).

Let's verify this with examples:

Walk-Through: N=3, W=2, R=2 (Standard Quorum)

StepActionNode ANode BNode C
1Write "X=1"X=1 (ack)X=1 (ack)X=1 (no ack needed)
2Read "X"X=1X=1X=0 (stale)
3Read returnsBoth A and B have latest: X=1

The read consulted A and B, both of which have the latest value. Even though C is stale, it doesn't matter because the read quorum (R=2) didn't need C.

Walk-Through: N=3, W=1, R=1 (No Guarantee)

StepActionNode ANode BNode C
1Write "X=1"X=1 (ack)X=0X=0
2Read "X" from random nodeNode B returns X=0 (stale!)

The read happened to hit node B, which never received the write. Without quorum intersection, stale reads are possible.

Quorum Read Flow Diagram


Consistency Levels in Practice (Cassandra)

Apache Cassandra implements the N, R, W model with named consistency levels. This allows each operation to choose its own tradeoff.

Cassandra Consistency Levels for N=3

LevelR or WBehaviorLatencyConsistency
ONE1Fastest, ack from any replica~5msEventually consistent (stale possible)
TWO2Wait for 2 of 3 replicas~10msQuorum intersection with W=2
QUORUMceil(N/2)+1 = 2Wait for majority~10msStrong with W=QUORUM
THREE3Wait for all replicas~15msStrongest, lowest availability
ALL3Same as THREE~15msStrongest, lowest availability
LOCAL_QUORUM2 (same DC)Quorum in local datacenter~5msStrong within DC
EACH_QUORUM2 per DCQuorum in each datacenter~20msStrong across DCs
SERIAL3Uses Paxos for linearizability~25msLinearizable
LOCAL_SERIAL2 (same DC)Linearizable within local DC~15msLinearizable within DC

Consistency Combinations for N=3

Write LevelRead LevelR + WGuaranteeUse Case
ONEONE1 + 1 = 2 (less than 3)EventualLogging, analytics
ONEALL1 + 3 = 4 (greater than 3)StrongFast writes, high read cost
ALLONE3 + 1 = 4 (greater than 3)StrongFast reads, high write cost
QUORUMQUORUM2 + 2 = 4 (greater than 3)StrongBalanced: most common
ALLALL3 + 3 = 6 (greater than 3)StrongAuditor, needs latest

The most common production configuration for Cassandra is N=3 with QUORUM reads and QUORUM writes. This provides strong consistency with reasonable latency. Use ONE only for non-critical data where a small window of inconsistency is acceptable, such as analytics or logging.


Dynamo's Approach

Amazon's Dynamo paper (2007) defined the modern quorum-based approach to distributed storage. While Cassandra is often called "Dynamo-inspired," there are important differences in how they handle quorum.

Dynamo's Typical Configuration

ParameterDynamo DefaultPurpose
N3Three replicas per key
R2Read from two replicas
W2Write to two replicas
Hinted handoffEnabledHandle temporary node failures
Sloppy quorumEnabledAccept writes on any N nodes during partition

How Dynamo Processes a Write

Read-Time Reconciliation

Dynamo's read path is unusual: instead of returning the latest value from a single node, it returns ALL versions of a key that are not dominated by any other version (siblings):

text
Client reads key K with R=2:
  
  Node A returns: K=42, clock=[A=3, B=2, C=1]
  Node B returns: K=99, clock=[A=2, B=3, C=1]
  
  Coordinator compares clocks:
    [A=3, B=2, C=1] vs [A=2, B=3, C=1]
    Neither dominates the other (A: 3 > 2, B: 2 < 3)
    => These updates are CONCURRENT
    => Return BOTH versions to client
  
  Client must reconcile: e.g., merge values or pick one
  Client writes merged result with merged clock

This read-time reconciliation is what makes Dynamo's quorum model "eventually consistent with no data loss." Even if concurrent writes produce conflicting values, the system surfaces both to the client (or application logic) rather than silently dropping one.


Tradeoff Analysis

R=1, W=N: Read-Optimized

PropertyValue
Read latencyFastest (single replica)
Write latencySlowest (must reach all replicas)
Read consistencyGuaranteed latest (write is on all nodes)
Write availabilityLow (all replicas must be up)
Best forRead-heavy workloads, configuration data

R=N, W=1: Write-Optimized

PropertyValue
Write latencyFastest (single replica)
Read latencySlowest (must reach all replicas)
Read consistencyGuaranteed latest (reads from all, finds the writer)
Write availabilityHigh (any single replica can accept)
Best forWrite-heavy workloads, append-only logs

R=W=N/2+1: Balanced (Standard Quorum)

PropertyValue
Read latencyModerate (majority)
Write latencyModerate (majority)
Read consistencyGuaranteed latest (quorum intersection)
AvailabilityGood (tolerates N/2 - 1 failures)
Best forGeneral purpose, most applications

The Availability Tradeoff

ConfigurationTolerates Down NodesLatency
N=3, W=3, R=10 nodes downReads fast, writes slow
N=3, W=2, R=21 node downBalanced
N=3, W=1, R=32 nodes downWrites fast, reads slow
N=5, W=3, R=32 nodes downBalanced, higher durability
N=5, W=5, R=10 nodes downReads fast, writes slow
N=5, W=1, R=54 nodes downWrites fast, reads slow

Sloppy Quorum

Standard quorum requires writes and reads to reach specific replicas (the "preference list"). During a network partition, some of those replicas may be unreachable. Sloppy quorum relaxes this: the coordinator accepts writes from the first N healthy nodes in the preference list.

How Sloppy Quorum Works

AspectStrict QuorumSloppy Quorum
Write targetsFirst N replicas in preference listFirst N reachable replicas
Availability during partitionMay reject writesAccepts writes on healthy nodes
ConsistencyStrong (if R+W > N)Weaker (data on unexpected nodes)
RestorationAutomatic when nodes recoverRequires hinted handoff

Hinted Handoff

Hinted handoff is the mechanism that makes sloppy quorum eventually consistent:

StepAction
1Write to target replica C fails (node down)
2Coordinator writes to next healthy node D
3Coordinator stores a "hint" on D: this data belongs to C
4When C comes back online, D delivers the hinted write
5D removes the hint
⚠️

Sloppy quorum means your data may not be on the expected nodes during a partition. If you need to guarantee read-your-writes consistency across partitions, do NOT use sloppy quorum. Use strict quorum with R + W > N. The tradeoff is lower availability during partitions.


When Quorum Breaks: Network Partitions

Quorum works in theory, but network partitions create edge cases:

The Write-Read Gap Scenario

text
Time 0: Client writes X=5 with W=2 (acknowledged by A and B)
Time 1: Network partition separates {A, B} from {C}
Time 2: Client writes X=10 with W=2 (only {B, C} reachable)
         B has X=5 and X=10
         C only has X=10
         A only has X=5
Time 3: Client reads X with R=2
         If read hits {A, C}: A has X=5, C has X=10
         Neither dominates! => CONFLICT
ScenarioR + W > N?Result
Write to A,B; Read from A,BYesConsistent (both see X=10)
Write to A,B; Read from A,CYesNeed vector clocks to detect conflict
Write to B,C; Read from A,CYesNeed vector clocks to detect conflict
Write to A,B; Read from C,D (wrong partition)NoStale read possible

This is where vector clocks (from the previous tutorial) become essential. Quorum guarantees that R and W sets intersect, but intersection doesn't guarantee the latest value on the intersected node—an old version might coexist with a new one on the same node. Vector clocks resolve this by tracking causal history.

The CAP Connection

SystemNRWSloppy?CAP Profile
Dynamo (default)322YesAP (availability on partition)
Cassandra (QUORUM)322OptionalConfigurable
Cassandra (ALL)333NoCP (consistency on partition)
Spanner3+3+3+NoCP (external consistency)

Real-World Tuning Guidance

Selecting N

NUse CaseToleratesStorage Cost
1Non-critical, no durability needed0 failures1x
3Most applications1 failure (with quorum 2+2)3x
5Financial, critical data2 failures (with quorum 3+3)5x
7Extreme durability3 failures7x

Selecting R and W

Your PriorityRWNTradeoff
Fast reads1N3Slow writes, all replicas must be up
Fast writesN13Slow reads
BalancedN/2+1N/2+13Good all-around
Highest read consistencyNN/2+13Read always latest
Highest write consistencyN/2+1N3Write always authoritative

Production Patterns

PatternConfigurationWhy
User-facing readsR=QUORUM, W=QUORUMUsers notice stale data
Write-heavy ingestionR=ONE, W=LOCAL_QUORUMMinimize write latency
Analytics reportingR=ONE, W=ONEStale data is acceptable
Cross-datacenterR=LOCAL_QUORUM, W=EACH_QUORUMFast local reads, safe writes
Financial transactionsR=ALL, W=ALLNo compromise on consistency

Start with N=3, R=QUORUM, W=QUORUM. This is the most well-understood configuration and the hardest to get wrong. Only deviate after measuring your actual consistency and latency requirements. The vast majority of applications don't need to optimize beyond this configuration.


Key Takeaways

  • The quorum formula R + W > N guarantees read-write intersection for strong consistency
  • N defines durability (more replicas = higher fault tolerance but more write amplification)
  • R defines read consistency: R=1 is fastest, R=N is slowest but most consistent
  • W defines write consistency: W=1 is fastest, W=N is slowest but ensures all replicas are updated
  • The balanced configuration (N=3, R=W=2) is the most common production choice
  • Sloppy quorum trades strict consistency for availability during network partitions by writing to any N healthy nodes
  • Hinted handoff restores consistency after sloppy quorum by delivering writes to their intended replicas when they recover
  • R=1, W=N gives fast reads with guaranteed consistency (all replicas have the latest write)
  • R=N, W=1 gives fast writes but slow reads and weaker consistency guarantees
  • Vector clocks are essential for resolving conflicts when quorum reads discover concurrent updates across replicas
  • In production, choose your N, R, W based on your specific latency, consistency, and availability requirements—there is no one-size-fits-all configuration
  • The CAP theorem means you must choose: strict quorum (CP) or sloppy quorum (AP) during network partitions

This concludes the Core Algorithms phase. You now understand the foundational algorithms that power distributed systems: consistent hashing, vector clocks, cache eviction, distributed ID generation, and quorum-based reads and writes. These algorithms appear in every major distributed data store, caching system, and ID generation service you'll encounter.