Distributed Caching & CDNs
Design multi-layer cache hierarchies from browser to database. Learn CDN strategies, cache coherence, and geo-distributed caching patterns used by the world's largest systems.
Every high-traffic system runs on caches. Without caching, even the most powerful database would crumble under the weight of a million users hitting it directly. As a staff engineer who has built systems serving billions of requests, I can tell you that caching is not an optimization—it's a fundamental architectural decision you make on day one.
Let me teach you how to think about caching systematically, from the browser to the database.
The Cache Hierarchy
Real systems don't have one cache—they have layers. Each layer serves a different purpose and has different characteristics.
Each layer trades off:
- Proximity to user (lower latency, higher hit rate)
- Storage capacity (more proximity = less capacity)
- Control granularity (who manages invalidation)
Layer 1: Browser Caching
The browser is your first line of defense. When a user revisits your site, zero network requests should be needed for static assets.
Cache-Control Headers
The HTTP caching model is controlled by the Cache-Control header:
| Directive | Meaning |
|---|---|
| public | Can be cached by any cache (CDN, proxy, browser) |
| private | Only the browser can cache (contains user-specific data) |
| no-cache | Must revalidate with server before using cached copy |
| no-store | Never cache (sensitive data) |
| max-age=N | Cache for N seconds |
| s-maxage=N | CDN-specific max-age (browsers ignore it) |
| immutable | Content won't change—never revalidate |
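These directives combine into per-asset policies. Here's a minimal sketch of choosing Cache-Control values by content type; the asset categories, TTLs, and the fingerprinted-filename convention are illustrative choices, not prescriptions:

```python
# A minimal sketch of mapping request paths to Cache-Control values.
# The categories and TTLs below are illustrative assumptions.

def cache_control_for(path: str, user_specific: bool = False) -> str:
    """Return a Cache-Control header value for a given request path."""
    if user_specific:
        # Per-user responses must never be shared by CDNs or proxies.
        return "private, no-store"
    if path.endswith((".js", ".css", ".woff2")) and ".v" in path:
        # Fingerprinted assets (e.g. app.v3f9a1.js) never change at a URL:
        # cache for a year and skip revalidation entirely.
        return "public, max-age=31536000, immutable"
    if path.endswith((".png", ".jpg", ".svg")):
        # Images: one-day browser TTL, one-week CDN TTL via s-maxage.
        return "public, max-age=86400, s-maxage=604800"
    # HTML: always revalidate so deploys show up immediately.
    return "no-cache"

print(cache_control_for("/static/app.v3f9a1.js"))
# public, max-age=31536000, immutable
```

The key design move is the `immutable` branch: if you put a content hash in the filename, the URL itself changes on every deploy, so the old URL can be cached forever.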
ETag and Last-Modified
For dynamic content, use conditional requests. The server includes an ETag or Last-Modified timestamp in the response. On subsequent requests, the browser sends If-None-Match or If-Modified-Since. If the resource hasn't changed, the server responds with 304 Not Modified, saving bandwidth.
| Approach | When to Use |
|---|---|
| ETag | Strong validator, changes when content changes |
| Last-Modified | Coarser, based on modification time |
| 304 Not Modified | Server confirms resource unchanged |
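The revalidation handshake above can be sketched server-side in a few lines. This assumes a content-hash ETag (one common choice for a strong validator):

```python
import hashlib
from typing import Optional

def handle_request(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body) using a strong ETag validator."""
    # A strong ETag derived from content: it changes iff the bytes change.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    if if_none_match == etag:
        # Client's cached copy is current: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

# First request: full 200 response carrying an ETag.
status, headers, _ = handle_request(b"<html>hello</html>", None)
# Revalidation: the browser echoes the ETag in If-None-Match.
status2, _, body2 = handle_request(b"<html>hello</html>", headers["ETag"])
print(status, status2, len(body2))  # 200 304 0
```

The bandwidth saving is the empty body on the 304: the server still does the work of producing (or hashing) the content, but the response is a few hundred bytes instead of the full payload.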
Service Workers
For offline-first applications and advanced caching strategies, service workers act as a programmable proxy between the browser and the network.
| Pattern | Description |
|---|---|
| Cache First | Check cache before network |
| Network First | Try network, fall back to cache |
| Stale While Revalidate | Serve stale cache, update in background |
| Cache Only | Only serve from cache |
Layer 2: CDN (Content Delivery Network)
CDNs are geographically distributed servers that cache content close to users. When a user in Tokyo requests a file, they get it from a Tokyo PoP (Point of Presence), not your Oregon data center.
Push vs Pull CDN
| Strategy | How it works | Best for |
|---|---|---|
| Pull CDN (most common) | CDN fetches from origin on first request (lazy) | Dynamic/static content, easy setup |
| Push CDN | You upload content to CDN proactively | Live streaming, predictable traffic, exact control |
CDN Cache Invalidation
The hardest part of CDNs: how do you invalidate stale content?
Three invalidation strategies:
- URL-based purge: Purge specific URLs
- Tag-based purge (Surrogate Keys): Tag content and purge by tag
- TTL-based: Set appropriate TTLs and wait (simplest, slowest)
Cache Warming
For high-traffic events, don't let cache misses hit your origin. Warm caches proactively by fetching critical paths before users request them.
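A warming pass is usually just a concurrent crawl of your hottest paths before the event starts. Here's a sketch; the path list is illustrative, and `fetch` is whatever issues the request in your stack (an HTTP client in production, a stub below):

```python
from concurrent.futures import ThreadPoolExecutor

def warm_cache(fetch, paths, concurrency: int = 8) -> dict:
    """Pre-fetch critical paths so the first real users hit a warm cache.

    `fetch` maps a path to an HTTP status code; we assume nothing else
    about it, so this works with any client (or a test stub).
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = dict(zip(paths, pool.map(fetch, paths)))
    failures = {p: s for p, s in results.items() if s != 200}
    if failures:
        # Surface warm-up failures before the traffic spike, not during it.
        print(f"warm-up failures: {failures}")
    return results

# Illustrative critical paths; replace with your own hot URLs.
critical = ["/", "/home", "/api/trending", "/static/app.js"]
statuses = warm_cache(lambda path: 200, critical)  # stub fetcher
```

One detail that matters in practice: issue the warming requests through the same edge the users will hit (the CDN hostname, not the origin), or you warm the wrong cache.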
Layer 3: Application Cache (Redis/Memcached)
This is where you have the most control. Application caches store computed results, session data, and frequently-accessed database records.
Cache-Aside Pattern (Lazy Loading)
The most common pattern. Check cache first, fall back to database:
| Step | Action |
|---|---|
| 1 | Check cache for key |
| 2 | If hit, return cached value |
| 3 | If miss, query database |
| 4 | Store result in cache |
| 5 | Return result |
Cache-aside vs Read-through: Cache-aside is explicit (your code manages caching). Read-through hides caching behind the cache layer (cache fetches from DB on miss). Cache-aside gives you more control; read-through is simpler but less flexible.
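The five steps above fit in a few lines. In this sketch a plain dict with TTLs stands in for Redis; swap in a real client with the same get/set semantics for production:

```python
import time

class CacheAside:
    """Cache-aside (lazy loading) with TTLs, backed by an in-memory dict.

    The dict is a stand-in for Redis/Memcached — the pattern is identical
    with a real client.
    """

    def __init__(self, db_query, ttl_seconds: float = 300.0):
        self._cache = {}           # key -> (value, expires_at)
        self._db_query = db_query  # fallback on a miss
        self._ttl = ttl_seconds

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                       # steps 1-2: hit
        value = self._db_query(key)               # step 3: miss, query DB
        self._cache[key] = (value, time.monotonic() + self._ttl)  # step 4
        return value                              # step 5

    def invalidate(self, key):
        # Cache-aside write path: update the DB first, then drop the key.
        self._cache.pop(key, None)

calls = []
# .append returns None, so the `or` yields the row; we count DB hits.
cache = CacheAside(db_query=lambda k: calls.append(k) or f"row:{k}")
cache.get("user:1"); cache.get("user:1")
print(len(calls))  # 1 — the second read never touched the database
```

Note that the application code owns every step: that's what makes this cache-aside rather than read-through, where steps 3-4 would live inside the cache layer.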
Write Patterns
| Pattern | Description | When to Use |
|---|---|---|
| Write-Through | Update cache and database together on every write | Read-heavy data that must stay consistent |
| Write-Behind | Write to cache, acknowledge immediately, persist async | Data loss acceptable (logs, analytics) |
| Cache-Aside | Update DB, invalidate cache | General purpose |
Be careful with write-behind: If the cache dies before the DB write, you lose data. Use this only when data loss is acceptable.
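To make that risk concrete, here's a write-behind sketch: the write is acknowledged as soon as it lands in memory, and a background thread drains a queue into the durable store. Anything still in the queue at crash time is gone:

```python
import queue
import threading

class WriteBehindCache:
    """Write-behind sketch: acknowledge after the cache write, persist async.

    A crash before the flusher drains the queue loses the queued writes —
    fine for logs and analytics, never for money.
    """

    def __init__(self, persist):
        self.cache = {}
        self._pending = queue.Queue()
        self._persist = persist  # e.g. a batched INSERT in production
        threading.Thread(target=self._flusher, daemon=True).start()

    def set(self, key, value):
        self.cache[key] = value          # fast path: in-memory write
        self._pending.put((key, value))  # durable write happens later
        return True                      # acknowledged before persistence!

    def _flusher(self):
        while True:
            key, value = self._pending.get()
            self._persist(key, value)
            self._pending.task_done()

    def flush(self):
        self._pending.join()  # block until async writes land (handy in tests)

db = {}
wb = WriteBehindCache(persist=lambda k, v: db.__setitem__(k, v))
wb.set("evt:1", "click")
wb.flush()
print(db)  # {'evt:1': 'click'}
```

A production version would also batch the queue drain, which is where write-behind earns its keep: ten thousand counter increments can collapse into one database write.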
Cache Stampede (Thundering Herd)
When a popular cache entry expires, thousands of requests hit the database simultaneously.
Prevention strategies:
| Strategy | How It Works |
|---|---|
| Probabilistic early expiration | Refresh before TTL expires based on probability |
| Distributed lock | Only one request refreshes, others wait |
| Stale-while-revalidate | Serve stale data while refreshing in background |
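Probabilistic early expiration deserves a sketch because it's the least intuitive of the three. Each reader independently flips a biased coin whose odds of "refresh now" rise as expiry approaches, so a handful of requests refresh early and nobody stampedes. This follows the "XFetch" formulation (Vattani et al.); `delta` is the cost of a recompute:

```python
import math
import random
import time
from typing import Optional

def should_refresh_early(expires_at: float, delta: float,
                         beta: float = 1.0,
                         now: Optional[float] = None) -> bool:
    """Probabilistic early expiration ("XFetch").

    `delta` is how long a recompute takes; `beta` > 1 refreshes more
    eagerly. No locks, no coordination — the randomness does the work.
    """
    now = time.monotonic() if now is None else now
    # -log(U) is an Exponential(1) sample; scale it by the recompute cost.
    return now - delta * beta * math.log(random.random()) >= expires_at

# Far from expiry almost nobody refreshes; near expiry most readers do.
random.seed(42)
far = sum(should_refresh_early(expires_at=100.0, delta=1.0, now=10.0)
          for _ in range(10_000))
near = sum(should_refresh_early(expires_at=100.0, delta=1.0, now=99.5)
          for _ in range(10_000))
print(far < near)  # True
```

The distributed-lock approach gives a harder guarantee (exactly one refresher) at the cost of lock traffic and a failure mode when the lock holder dies; the probabilistic approach is the usual default at high request rates.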
Distributed Cache Architectures
As traffic grows, a single cache instance won't cut it. You need distributed caching.
Redis Cluster
Redis Cluster shards data automatically across nodes:
Key features:
- Automatic sharding (16384 hash slots)
- Replication for high availability
- Read scaling with replicas
- Automatic failover
Consistent Hashing for Cache Nodes
When adding/removing cache nodes, consistent hashing minimizes key remapping:
Virtual nodes ensure even load distribution when nodes have different capacities or when nodes join and leave. With many virtual nodes per physical node (150 is a common choice), adding one node to a ring of N remaps only ~1/(N+1) of the keys, and the load those keys carried is drawn evenly from all existing nodes rather than from a single neighbor.
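Here's a compact ring with virtual nodes — a common textbook sketch, not any particular library's implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes: int = 150):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # md5 for uniform spread; this is placement, not security.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        # Each physical node owns many points on the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        # The first vnode clockwise from the key's position owns it.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-d")
moved = sum(before[k] != ring.lookup(k) for k in keys)
# Going from 3 to 4 nodes should remap roughly a quarter of the keys.
print(f"remapped {moved / len(keys):.0%} of keys")
```

Contrast this with naive `hash(key) % N` sharding, where adding a fourth node remaps roughly three quarters of the keys and effectively cold-starts the whole cache tier.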
Cache Coherence
In distributed caches, keeping multiple replicas consistent is hard. You have several strategies:
Write Strategies
| Strategy | How It Works | Trade-off |
|---|---|---|
| Write-Through All | Write to all replicas simultaneously | Strong consistency, high latency |
| Write-Invalidate | Write to primary, invalidate replicas | Lower write latency, replicas repopulate on read |
| Eventual Consistency | Accept temporary inconsistency | Highest performance, may serve stale data |
Read Strategies
| Strategy | How It Works | Trade-off |
|---|---|---|
| Read from Primary | Always read from primary | Consistent, higher latency |
| Read from Replica | Always read from nearest replica | Lower latency, may return stale data |
| Read-Repair | Read from replica, repair if stale | Background consistency |
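Read-repair is easiest to see in code. In this sketch values carry a version number (a timestamp or vector clock in a real system), plain dicts stand in for the primary and a replica, and the repair happens inline — production systems usually do it asynchronously or on a sample of reads:

```python
def read_with_repair(key, primary: dict, replica: dict):
    """Read-repair sketch: serve from the replica, fix it if it's stale.

    Values are (version, data) pairs. A real system would compare
    versions asynchronously instead of consulting the primary per read.
    """
    local = replica.get(key)
    authoritative = primary.get(key)
    if authoritative is None:
        return local
    if local is None or local[0] < authoritative[0]:
        replica[key] = authoritative  # repair the stale replica in place
        return authoritative
    return local

primary = {"user:1": (3, "alice@new.example")}
replica = {"user:1": (2, "alice@old.example")}
value = read_with_repair("user:1", primary, replica)
print(value, replica["user:1"])  # both now at version 3
```

The appeal of read-repair is that it concentrates consistency work on the keys that are actually read: cold keys are never repaired, and never need to be.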
Geo-Distributed Caching
Global systems need caches in multiple regions. The challenge: keeping them coherent while minimizing latency.
Active-Active Caching
Read from local cache, write to nearest region, replicate asynchronously to other regions.
| Region | Read Latency | Write Strategy |
|---|---|---|
| US East | ~5ms | Write local + async to others |
| EU West | ~80ms | Write local + async to others |
| APAC | ~150ms | Write local + async to others |
Edge-Side Caching with Vary
Use Vary headers to cache different versions of the same URL:
| Header | Creates Cache Per |
|---|---|
| Accept-Language | Language version |
| Accept-Encoding | Compression type |
| Cookie | User-specific (use carefully!) |
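Conceptually, Vary just folds the named request headers into the cache key. Here's a sketch of that key construction (the `|`-joined format is an arbitrary choice for illustration):

```python
def cache_key(url: str, request_headers: dict, vary: list) -> str:
    """Build a CDN-style cache key: the URL plus each header named in Vary.

    Header names are case-insensitive in HTTP, so normalize them —
    otherwise "accept-language" and "Accept-Language" split the cache.
    """
    headers = {k.lower(): v for k, v in request_headers.items()}
    parts = [url] + [f"{h.lower()}={headers.get(h.lower(), '')}"
                     for h in vary]
    return "|".join(parts)

en = cache_key("/home", {"Accept-Language": "en-US"},
               vary=["Accept-Language"])
fr = cache_key("/home", {"Accept-Language": "fr-FR"},
               vary=["Accept-Language"])
print(en != fr)  # True — each language gets its own cached copy
```

This also makes the danger of `Vary: Cookie` obvious: if every user carries a unique cookie, every user gets a unique cache key, and your hit rate collapses to zero.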
Cache Monitoring & Metrics
You can't optimize what you don't measure.
Hit Rate Targets
| Cache Layer | Target Hit Rate | Typical Size |
|---|---|---|
| Browser | 30-50% of requests | User's device |
| CDN | 90%+ for static | TB per PoP |
| Application | 80-95% for dynamic | 10-100 GB |
| Database | 90%+ (buffer pool) | GB (memory) |
Key Metrics to Track
| Metric | What It Tells You |
|---|---|
| Hit rate | Cache effectiveness |
| Miss rate | How often you hit origin |
| Latency | Cache response time |
| Eviction rate | Memory pressure |
| TTL distribution | How fresh your cached data is |
If your cache hit rate drops below 80%, you either have a cache sizing problem (not enough memory) or a cache key design problem (keys too granular or too coarse). Profile your access patterns.
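Tracking that threshold takes only a pair of counters. Here's a minimal recorder you can wire into whatever metrics system you use; the 80% target mirrors the rule of thumb above:

```python
class CacheStats:
    """Track hits/misses and flag when hit rate falls below a target."""

    def __init__(self, target: float = 0.80):
        self.hits = 0
        self.misses = 0
        self.target = target

    def record(self, hit: bool):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def healthy(self) -> bool:
        return self.hit_rate >= self.target

stats = CacheStats()
for outcome in [True] * 70 + [False] * 30:  # simulate a 70% hit rate
    stats.record(outcome)
print(stats.hit_rate, stats.healthy())  # 0.7 False
```

In production you'd emit these counters per cache layer and per key prefix — an aggregate 90% hit rate can easily hide one key family missing 100% of the time.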
Summary
Caching is not a single decision—it's a layered architecture:
The patterns you choose—cache-aside vs write-through, single-node vs cluster, eventual vs strong consistency—depend on your consistency requirements and latency constraints. Start simple, add complexity only when you have measurements showing you need it.
In the next tutorial, we'll explore Distributed Data Stores—the systems that store your data at massive scale while maintaining consistency guarantees.