Distributed Caching & CDNs
Design multi-layer cache hierarchies from browser to database. Learn CDN strategies, cache coherence, and geo-distributed caching patterns used by the world's largest systems.
Every high-traffic system runs on caches. Without caching, even the most powerful database would crumble under the weight of a million users hitting it directly. As a staff engineer who has built systems serving billions of requests, I can tell you that caching is not an optimization—it's a fundamental architectural decision you make on day one.
Let me teach you how to think about caching systematically, from the browser to the database.
The Cache Hierarchy
Real systems don't have one cache—they have layers. Each layer serves a different purpose and has different characteristics.
Each layer trades off:
- Proximity to user (lower latency, higher hit rate)
- Storage capacity (more proximity = less capacity)
- Control granularity (who manages invalidation)
Layer 1: Browser Caching
The browser is your first line of defense. When a user revisits your site, zero network requests should be needed for static assets.
Cache-Control Headers
The HTTP caching model is controlled by the Cache-Control header:
| Directive | Meaning |
|---|---|
| public | Can be cached by any cache (CDN, proxy, browser) |
| private | Only the browser can cache (contains user-specific data) |
| no-cache | Must revalidate with server before using cached copy |
| no-store | Never cache (sensitive data) |
| max-age=N | Cache for N seconds |
| s-maxage=N | CDN-specific max-age (browsers ignore it) |
| immutable | Content won't change—never revalidate |
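These directives combine into per-asset policies. Here's a minimal sketch of choosing Cache-Control values by content type; the asset categories, TTLs, and the fingerprinted-filename convention are illustrative choices, not prescriptions:

```python
# A minimal sketch of mapping request paths to Cache-Control values.
# The categories and TTLs below are illustrative assumptions.

def cache_control_for(path: str, user_specific: bool = False) -> str:
    """Return a Cache-Control header value for a given request path."""
    if user_specific:
        # Per-user responses must never be shared by CDNs or proxies.
        return "private, no-store"
    if path.endswith((".js", ".css", ".woff2")) and ".v" in path:
        # Fingerprinted assets (e.g. app.v3f9a1.js) never change at a URL:
        # cache for a year and skip revalidation entirely.
        return "public, max-age=31536000, immutable"
    if path.endswith((".png", ".jpg", ".svg")):
        # Images: one-day browser TTL, one-week CDN TTL via s-maxage.
        return "public, max-age=86400, s-maxage=604800"
    # HTML: always revalidate so deploys show up immediately.
    return "no-cache"

print(cache_control_for("/static/app.v3f9a1.js"))
# public, max-age=31536000, immutable
```

The key design move is the `immutable` branch: if you put a content hash in the filename, the URL itself changes on every deploy, so the old URL can be cached forever.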
ETag and Last-Modified
For dynamic content, use conditional requests. The server includes an ETag or Last-Modified timestamp in the response. On subsequent requests, the browser sends If-None-Match or If-Modified-Since. If the resource hasn't changed, the server responds with 304 Not Modified, saving bandwidth.
| Approach | When to Use |
|---|---|
| ETag | Strong validator, changes when content changes |
| Last-Modified | Coarser, based on modification time |
| 304 Not Modified | Server confirms resource unchanged |
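The revalidation handshake above can be sketched server-side in a few lines. This assumes a content-hash ETag (one common choice for a strong validator):

```python
import hashlib
from typing import Optional

def handle_request(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body) using a strong ETag validator."""
    # A strong ETag derived from content: it changes iff the bytes change.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    if if_none_match == etag:
        # Client's cached copy is current: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

# First request: full 200 response carrying an ETag.
status, headers, _ = handle_request(b"<html>hello</html>", None)
# Revalidation: the browser echoes the ETag in If-None-Match.
status2, _, body2 = handle_request(b"<html>hello</html>", headers["ETag"])
print(status, status2, len(body2))  # 200 304 0
```

The bandwidth saving is the empty body on the 304: the server still does the work of producing (or hashing) the content, but the response is a few hundred bytes instead of the full payload.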
Service Workers
For offline-first applications and advanced caching strategies, service workers act as a programmable proxy between the browser and the network.
| Pattern | Description |
|---|---|
| Cache First | Check cache before network |
| Network First | Try network, fall back to cache |
| Stale While Revalidate | Serve stale cache, update in background |
| Cache Only | Only serve from cache |
Layer 2: CDN (Content Delivery Network)
CDNs are geographically distributed servers that cache content close to users. When a user in Tokyo requests a file, they get it from a Tokyo PoP (Point of Presence), not your Oregon data center.
Push vs Pull CDN
| Strategy | How it works | Best for |
|---|---|---|
| Pull CDN (most common) | CDN fetches from origin on first request (lazy) | Dynamic/static content, easy setup |
| Push CDN | You upload content to CDN proactively | Live streaming, predictable traffic, exact control |
CDN Cache Invalidation
The hardest part of CDNs: how do you invalidate stale content?
Three invalidation strategies:
- URL-based purge: Purge specific URLs
- Tag-based purge (Surrogate Keys): Tag content and purge by tag
- TTL-based: Set appropriate TTLs and wait (simplest, slowest)
Cache Warming
For high-traffic events, don't let cache misses hit your origin. Warm caches proactively by fetching critical paths before users request them.
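A warming pass is usually just a concurrent crawl of your hottest paths before the event starts. Here's a sketch; the path list is illustrative, and `fetch` is whatever issues the request in your stack (an HTTP client in production, a stub below):

```python
from concurrent.futures import ThreadPoolExecutor

def warm_cache(fetch, paths, concurrency: int = 8) -> dict:
    """Pre-fetch critical paths so the first real users hit a warm cache.

    `fetch` maps a path to an HTTP status code; we assume nothing else
    about it, so this works with any client (or a test stub).
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = dict(zip(paths, pool.map(fetch, paths)))
    failures = {p: s for p, s in results.items() if s != 200}
    if failures:
        # Surface warm-up failures before the traffic spike, not during it.
        print(f"warm-up failures: {failures}")
    return results

# Illustrative critical paths; replace with your own hot URLs.
critical = ["/", "/home", "/api/trending", "/static/app.js"]
statuses = warm_cache(lambda path: 200, critical)  # stub fetcher
```

One detail that matters in practice: issue the warming requests through the same edge the users will hit (the CDN hostname, not the origin), or you warm the wrong cache.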
Layer 3: Application Cache (Redis/Memcached)
This is where you have the most control. Application caches store computed results, session data, and frequently-accessed database records.
Cache-Aside Pattern (Lazy Loading)
The most common pattern. Check cache first, fall back to database:
| Step | Action |
|---|---|
| 1 | Check cache for key |
| 2 | If hit, return cached value |
| 3 | If miss, query database |
| 4 | Store result in cache |
| 5 | Return result |
Cache-aside vs Read-through: Cache-aside is explicit (your code manages caching). Read-through hides caching behind the cache layer (cache fetches from DB on miss). Cache-aside gives you more control; read-through is simpler but less flexible.
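The five steps above fit in a few lines. In this sketch a plain dict with TTLs stands in for Redis; swap in a real client with the same get/set semantics for production:

```python
import time

class CacheAside:
    """Cache-aside (lazy loading) with TTLs, backed by an in-memory dict.

    The dict is a stand-in for Redis/Memcached — the pattern is identical
    with a real client.
    """

    def __init__(self, db_query, ttl_seconds: float = 300.0):
        self._cache = {}           # key -> (value, expires_at)
        self._db_query = db_query  # fallback on a miss
        self._ttl = ttl_seconds

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                       # steps 1-2: hit
        value = self._db_query(key)               # step 3: miss, query DB
        self._cache[key] = (value, time.monotonic() + self._ttl)  # step 4
        return value                              # step 5

    def invalidate(self, key):
        # Cache-aside write path: update the DB first, then drop the key.
        self._cache.pop(key, None)

calls = []
# .append returns None, so the `or` yields the row; we count DB hits.
cache = CacheAside(db_query=lambda k: calls.append(k) or f"row:{k}")
cache.get("user:1"); cache.get("user:1")
print(len(calls))  # 1 — the second read never touched the database
```

Note that the application code owns every step: that's what makes this cache-aside rather than read-through, where steps 3-4 would live inside the cache layer.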
Write Patterns
| Pattern | Description | When to Use |
|---|---|---|
| Write-Through | Update cache and database together on every write | Read-heavy data that must stay consistent |
| Write-Behind | Write to cache, acknowledge immediately, persist async | Data loss acceptable (logs, analytics) |
| Cache-Aside | Update DB, invalidate cache | General purpose |
Be careful with write-behind: If the cache dies before the DB write, you lose data. Use this only when data loss is acceptable.
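To make that risk concrete, here's a write-behind sketch: the write is acknowledged as soon as it lands in memory, and a background thread drains a queue into the durable store. Anything still in the queue at crash time is gone:

```python
import queue
import threading

class WriteBehindCache:
    """Write-behind sketch: acknowledge after the cache write, persist async.

    A crash before the flusher drains the queue loses the queued writes —
    fine for logs and analytics, never for money.
    """

    def __init__(self, persist):
        self.cache = {}
        self._pending = queue.Queue()
        self._persist = persist  # e.g. a batched INSERT in production
        threading.Thread(target=self._flusher, daemon=True).start()

    def set(self, key, value):
        self.cache[key] = value          # fast path: in-memory write
        self._pending.put((key, value))  # durable write happens later
        return True                      # acknowledged before persistence!

    def _flusher(self):
        while True:
            key, value = self._pending.get()
            self._persist(key, value)
            self._pending.task_done()

    def flush(self):
        self._pending.join()  # block until async writes land (handy in tests)

db = {}
wb = WriteBehindCache(persist=lambda k, v: db.__setitem__(k, v))
wb.set("evt:1", "click")
wb.flush()
print(db)  # {'evt:1': 'click'}
```

A production version would also batch the queue drain, which is where write-behind earns its keep: ten thousand counter increments can collapse into one database write.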
Cache Stampede (Thundering Herd)
When a popular cache entry expires, thousands of requests hit the database simultaneously.
Prevention strategies:
| Strategy | How It Works |
|---|---|
| Probabilistic early expiration | Refresh before TTL expires based on probability |
| Distributed lock | Only one request refreshes, others wait |
| Stale-while-revalidate | Serve stale data while refreshing in background |
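Probabilistic early expiration deserves a sketch because it's the least intuitive of the three. Each reader independently flips a biased coin whose odds of "refresh now" rise as expiry approaches, so a handful of requests refresh early and nobody stampedes. This follows the "XFetch" formulation (Vattani et al.); `delta` is the cost of a recompute:

```python
import math
import random
import time
from typing import Optional

def should_refresh_early(expires_at: float, delta: float,
                         beta: float = 1.0,
                         now: Optional[float] = None) -> bool:
    """Probabilistic early expiration ("XFetch").

    `delta` is how long a recompute takes; `beta` > 1 refreshes more
    eagerly. No locks, no coordination — the randomness does the work.
    """
    now = time.monotonic() if now is None else now
    # -log(U) is an Exponential(1) sample; scale it by the recompute cost.
    return now - delta * beta * math.log(random.random()) >= expires_at

# Far from expiry almost nobody refreshes; near expiry most readers do.
random.seed(42)
far = sum(should_refresh_early(expires_at=100.0, delta=1.0, now=10.0)
          for _ in range(10_000))
near = sum(should_refresh_early(expires_at=100.0, delta=1.0, now=99.5)
          for _ in range(10_000))
print(far < near)  # True
```

The distributed-lock approach gives a harder guarantee (exactly one refresher) at the cost of lock traffic and a failure mode when the lock holder dies; the probabilistic approach is the usual default at high request rates.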
Distributed Cache Architectures
As traffic grows, a single cache instance won't cut it. You need distributed caching.
Redis Cluster
Redis Cluster shards data automatically across nodes:
Key features:
- Automatic sharding (16384 hash slots)
- Replication for high availability
- Read scaling with replicas
- Automatic failover
Consistent Hashing for Cache Nodes
When adding/removing cache nodes, consistent hashing minimizes key remapping:
Virtual nodes ensure even load distribution when nodes have different capacities or when nodes join and leave. With many virtual nodes per physical node (150 is a common choice), adding one node to a ring of N remaps only ~1/(N+1) of the keys, and the load those keys carried is drawn evenly from all existing nodes rather than from a single neighbor.
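Here's a compact ring with virtual nodes — a common textbook sketch, not any particular library's implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes: int = 150):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # md5 for uniform spread; this is placement, not security.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        # Each physical node owns many points on the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        # The first vnode clockwise from the key's position owns it.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-d")
moved = sum(before[k] != ring.lookup(k) for k in keys)
# Going from 3 to 4 nodes should remap roughly a quarter of the keys.
print(f"remapped {moved / len(keys):.0%} of keys")
```

Contrast this with naive `hash(key) % N` sharding, where adding a fourth node remaps roughly three quarters of the keys and effectively cold-starts the whole cache tier.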
Cache Coherence
In distributed caches, keeping multiple replicas consistent is hard. You have several strategies:
Write Strategies
| Strategy | How It Works | Trade-off |
|---|---|---|
| Write-Through All | Write to all replicas simultaneously | Strong consistency, high latency |
| Write-Invalidate | Write to primary, invalidate replicas | Lower write latency, replicas repopulate on read |
| Eventual Consistency | Accept temporary inconsistency | Highest performance, may serve stale data |
Read Strategies
| Strategy | How It Works | Trade-off |
|---|---|---|
| Read from Primary | Always read from primary | Consistent, higher latency |
| Read from Replica | Always read from nearest replica | Lower latency, may return stale data |
| Read-Repair | Read from replica, repair if stale | Background consistency |
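Read-repair is easiest to see in code. In this sketch values carry a version number (a timestamp or vector clock in a real system), plain dicts stand in for the primary and a replica, and the repair happens inline — production systems usually do it asynchronously or on a sample of reads:

```python
def read_with_repair(key, primary: dict, replica: dict):
    """Read-repair sketch: serve from the replica, fix it if it's stale.

    Values are (version, data) pairs. A real system would compare
    versions asynchronously instead of consulting the primary per read.
    """
    local = replica.get(key)
    authoritative = primary.get(key)
    if authoritative is None:
        return local
    if local is None or local[0] < authoritative[0]:
        replica[key] = authoritative  # repair the stale replica in place
        return authoritative
    return local

primary = {"user:1": (3, "alice@new.example")}
replica = {"user:1": (2, "alice@old.example")}
value = read_with_repair("user:1", primary, replica)
print(value, replica["user:1"])  # both now at version 3
```

The appeal of read-repair is that it concentrates consistency work on the keys that are actually read: cold keys are never repaired, and never need to be.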
Geo-Distributed Caching
Global systems need caches in multiple regions. The challenge: keeping them coherent while minimizing latency.
Active-Active Caching
Read from local cache, write to nearest region, replicate asynchronously to other regions.
| Region | Read Latency | Write Strategy |
|---|---|---|
| US East | ~5ms | Write local + async to others |
| EU West | ~80ms | Write local + async to others |
| APAC | ~150ms | Write local + async to others |
Edge-Side Caching with Vary
Use Vary headers to cache different versions of the same URL:
| Header | Creates Cache Per |
|---|---|
| Accept-Language | Language version |
| Accept-Encoding | Compression type |
| Cookie | User-specific (use carefully!) |
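Conceptually, Vary just folds the named request headers into the cache key. Here's a sketch of that key construction (the `|`-joined format is an arbitrary choice for illustration):

```python
def cache_key(url: str, request_headers: dict, vary: list) -> str:
    """Build a CDN-style cache key: the URL plus each header named in Vary.

    Header names are case-insensitive in HTTP, so normalize them —
    otherwise "accept-language" and "Accept-Language" split the cache.
    """
    headers = {k.lower(): v for k, v in request_headers.items()}
    parts = [url] + [f"{h.lower()}={headers.get(h.lower(), '')}"
                     for h in vary]
    return "|".join(parts)

en = cache_key("/home", {"Accept-Language": "en-US"},
               vary=["Accept-Language"])
fr = cache_key("/home", {"Accept-Language": "fr-FR"},
               vary=["Accept-Language"])
print(en != fr)  # True — each language gets its own cached copy
```

This also makes the danger of `Vary: Cookie` obvious: if every user carries a unique cookie, every user gets a unique cache key, and your hit rate collapses to zero.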
Cache Monitoring & Metrics
You can't optimize what you don't measure.
Hit Rate Targets
| Cache Layer | Target Hit Rate | Typical Size |
|---|---|---|
| Browser | 30-50% of requests | User's device |
| CDN | 90%+ for static | TB per PoP |
| Application | 80-95% for dynamic | 10-100 GB |
| Database | 90%+ (buffer pool) | GB (memory) |
Key Metrics to Track
| Metric | What It Tells You |
|---|---|
| Hit rate | Cache effectiveness |
| Miss rate | How often you hit origin |
| Latency | Cache response time |
| Eviction rate | Memory pressure |
| TTL distribution | How fresh your cached data is |
If your cache hit rate drops below 80%, you either have a cache sizing problem (not enough memory) or a cache key design problem (keys too granular or too coarse). Profile your access patterns.
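Tracking that threshold takes only a pair of counters. Here's a minimal recorder you can wire into whatever metrics system you use; the 80% target mirrors the rule of thumb above:

```python
class CacheStats:
    """Track hits/misses and flag when hit rate falls below a target."""

    def __init__(self, target: float = 0.80):
        self.hits = 0
        self.misses = 0
        self.target = target

    def record(self, hit: bool):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def healthy(self) -> bool:
        return self.hit_rate >= self.target

stats = CacheStats()
for outcome in [True] * 70 + [False] * 30:  # simulate a 70% hit rate
    stats.record(outcome)
print(stats.hit_rate, stats.healthy())  # 0.7 False
```

In production you'd emit these counters per cache layer and per key prefix — an aggregate 90% hit rate can easily hide one key family missing 100% of the time.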
Summary
Caching is not a single decision—it's a layered architecture:
The patterns you choose—cache-aside vs write-through, single-node vs cluster, eventual vs strong consistency—depend on your consistency requirements and latency constraints. Start simple, add complexity only when you have measurements showing you need it.
In the next tutorial, we'll explore Distributed Data Stores—the systems that store your data at massive scale while maintaining consistency guarantees.