
Distributed Caching & CDNs

Design multi-layer cache hierarchies from browser to database. Learn CDN strategies, cache coherence, and geo-distributed caching patterns used by the world's largest systems.



Every high-traffic system runs on caches. Without caching, even the most powerful database would crumble under the weight of a million users hitting it directly. As a staff engineer who has built systems serving billions of requests, I can tell you that caching is not an optimization—it's a fundamental architectural decision you make on day one.

Let me teach you how to think about caching systematically, from the browser to the database.

The Cache Hierarchy

Real systems don't have one cache—they have layers. Each layer serves a different purpose and has different characteristics.

Each layer trades off:

  • Proximity to the user (closer means lower latency)
  • Storage capacity (more proximity = less capacity)
  • Control granularity (who manages invalidation)

Layer 1: Browser Caching

The browser is your first line of defense. When a user revisits your site, zero network requests should be needed for static assets.

Cache-Control Headers

The HTTP caching model is controlled by the Cache-Control header:

| Directive | Meaning |
| --- | --- |
| `public` | Can be cached by any cache (CDN, proxy, browser) |
| `private` | Only the browser can cache (contains user-specific data) |
| `no-cache` | Must revalidate with the server before using the cached copy |
| `no-store` | Never cache (sensitive data) |
| `max-age=N` | Cache for N seconds |
| `s-maxage=N` | CDN-specific max-age (browsers ignore it) |
| `immutable` | Content won't change; never revalidate |
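
To make the directives concrete, here is a small Python sketch that picks a Cache-Control value per asset class. The path prefixes and TTLs are illustrative assumptions, not a standard; adjust them to your own routing.

```python
# Sketch: choosing a Cache-Control value per asset class.
# The path prefixes and TTLs below are hypothetical examples.

def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for a given request path."""
    if path.startswith("/static/") and "." in path:
        # Fingerprinted assets (e.g. app.3f2a1b.js) never change in place:
        # cache for a year and skip revalidation entirely.
        return "public, max-age=31536000, immutable"
    if path.startswith("/api/me"):
        # User-specific data: only the browser may cache it, and it
        # must revalidate before every reuse.
        return "private, no-cache"
    if path.startswith("/api/"):
        # Shared API data: let the CDN hold it longer than browsers do.
        return "public, max-age=60, s-maxage=300"
    # Default for HTML pages: revalidate on every request.
    return "no-cache"
```

Note the split between `max-age` and `s-maxage` on the shared API route: browsers keep the response for a minute, while the CDN can serve it for five.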

ETag and Last-Modified

For dynamic content, use conditional requests. The server includes an ETag or Last-Modified timestamp in the response. On subsequent requests, the browser sends If-None-Match or If-Modified-Since. If the resource hasn't changed, the server responds with 304 Not Modified, saving bandwidth.

| Approach | When to Use |
| --- | --- |
| ETag | Strong validator; changes whenever the content changes |
| Last-Modified | Coarser validator, based on modification time |
| 304 Not Modified | Server's confirmation that the resource is unchanged |
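
The conditional-request flow can be sketched in a few lines of Python. The ETag scheme here (a truncated SHA-256 of the body) is one reasonable choice, not a requirement of the protocol.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_get(body: bytes, if_none_match=None):
    """Return (status, body), honoring the If-None-Match request header."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""   # unchanged: empty body, bandwidth saved
    return 200, body      # first request, or content changed: full body
```

A first request gets 200 plus the full body and the ETag; once the browser replays that ETag in If-None-Match, the server answers 304 with no body.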

Service Workers

For offline-first applications and advanced caching strategies, service workers act as a programmable proxy between the browser and the network.

| Pattern | Description |
| --- | --- |
| Cache First | Check the cache before the network |
| Network First | Try the network, fall back to the cache |
| Stale-While-Revalidate | Serve the stale cache, update in the background |
| Cache Only | Only serve from the cache |

Layer 2: CDN (Content Delivery Network)

CDNs are geographically distributed servers that cache content close to users. When a user in Tokyo requests a file, they get it from a Tokyo PoP (Point of Presence), not your Oregon data center.

Push vs Pull CDN

| Strategy | How It Works | Best For |
| --- | --- | --- |
| Pull CDN (most common) | CDN fetches from origin on first request (lazy) | Static and dynamic content, easy setup |
| Push CDN | You upload content to the CDN proactively | Live streaming, predictable traffic, exact control |

CDN Cache Invalidation

The hardest part of CDNs: how do you invalidate stale content?

Three invalidation strategies:

  1. URL-based purge: Purge specific URLs
  2. Tag-based purge (Surrogate Keys): Tag content and purge by tag
  3. TTL-based: Set appropriate TTLs and wait (simplest, slowest)

Cache Warming

For high-traffic events, don't let cache misses hit your origin. Warm caches proactively by fetching critical paths before users request them.
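
A warming job can be as simple as fetching each critical path once so the caches along the way get populated. The path list below is hypothetical, and the `fetch` parameter is injectable so the sketch can run without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical list of critical paths to pre-fetch before a launch.
CRITICAL_PATHS = ["/", "/products/featured", "/api/config"]

def warm_cache(base_url, paths, fetch=None, workers=8):
    """Request each path once so CDN/app caches are populated.

    Returns a list of (path, status) pairs in input order. `fetch`
    is injectable for testing; by default it performs a real HTTP GET.
    """
    if fetch is None:
        fetch = lambda url: urlopen(url).status

    def one(path):
        try:
            return path, fetch(base_url + path)
        except Exception:
            # A failed warm-up is harmless: that path just stays cold.
            return path, None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, paths))
```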


Layer 3: Application Cache (Redis/Memcached)

This is where you have the most control. Application caches store computed results, session data, and frequently-accessed database records.

Cache-Aside Pattern (Lazy Loading)

The most common pattern. Check cache first, fall back to database:

| Step | Action |
| --- | --- |
| 1 | Check the cache for the key |
| 2 | If hit, return the cached value |
| 3 | If miss, query the database |
| 4 | Store the result in the cache |
| 5 | Return the result |
⚠️ Cache-aside vs read-through: Cache-aside is explicit (your code manages caching). Read-through hides caching behind the cache layer (the cache fetches from the DB on a miss). Cache-aside gives you more control; read-through is simpler but less flexible.
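
The five steps above fit in a few lines. This sketch uses an in-process dict with TTLs as a stand-in for Redis; the loader callable stands in for a database query.

```python
import time

class CacheAside:
    """Minimal cache-aside: check cache, fall back to loader, store result."""

    def __init__(self, loader, ttl=60.0):
        self._store = {}       # key -> (value, expires_at); stand-in for Redis
        self._loader = loader  # stand-in for a database query
        self.ttl = ttl
        self.hits = self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]                                       # steps 1-2: hit
        self.misses += 1
        value = self._loader(key)                                 # step 3: query DB
        self._store[key] = (value, time.monotonic() + self.ttl)   # step 4: cache it
        return value                                              # step 5

    def invalidate(self, key):
        """Called after a DB write in the cache-aside write path."""
        self._store.pop(key, None)
```

The first `get` for a key misses and calls the loader; subsequent reads within the TTL are served from memory without touching the database.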

Write Patterns

| Pattern | Description | When to Use |
| --- | --- | --- |
| Write-Through | Update cache and database together, synchronously | Data that is read often soon after being written |
| Write-Behind | Write to cache, acknowledge immediately, persist async | When some data loss is acceptable (logs, analytics) |
| Cache-Aside | Update the DB, invalidate the cache entry | General purpose |
🚨 Be careful with write-behind: if the cache dies before the database write lands, you lose data. Use it only when data loss is acceptable.
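
The cache-aside write path deserves a sketch of its own, because the ordering matters: update the database first, then delete (rather than set) the cache entry. Deleting sidesteps the race where two concurrent writes land in the cache in the wrong order, and the next read simply repopulates the entry. The `db` and `cache` dicts are stand-ins for real stores.

```python
db = {}     # stand-in for the database (source of truth)
cache = {}  # stand-in for Redis/Memcached

def write_user(user_id, data):
    db[user_id] = data        # 1. persist to the source of truth first
    cache.pop(user_id, None)  # 2. invalidate; the next read repopulates

def read_user(user_id):
    if user_id in cache:
        return cache[user_id]
    value = db[user_id]       # miss: fall through to the database
    cache[user_id] = value
    return value
```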

Cache Stampede (Thundering Herd)

When a popular cache entry expires, thousands of requests hit the database simultaneously.

Prevention strategies:

| Strategy | How It Works |
| --- | --- |
| Probabilistic early expiration | Refresh before the TTL expires, with probability rising as expiry nears |
| Distributed lock | Only one request refreshes; the others wait for its result |
| Stale-while-revalidate | Serve stale data while refreshing in the background |
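
The lock strategy is often called request coalescing or single-flight. Here is a single-process sketch using a threading lock; a production version across multiple app servers would use a distributed lock (e.g. Redis `SET NX`) instead, and would need a timeout so a crashed leader cannot strand the waiters.

```python
import threading

class SingleFlight:
    """Collapse concurrent refreshes of the same key into one loader call."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Event, set once the value is ready
        self._results = {}

    def get(self, key):
        with self._lock:
            event = self._inflight.get(key)
            if event is None:                  # we become the refresher
                event = threading.Event()
                self._inflight[key] = event
                leader = True
            else:
                leader = False                 # someone else is refreshing
        if leader:
            try:
                self._results[key] = self._loader(key)
            finally:
                event.set()                    # wake every waiter
                with self._lock:
                    del self._inflight[key]
        else:
            event.wait()                       # wait for the leader's result
        return self._results[key]
```

With a thousand concurrent requests for the same expired key, exactly one of them pays the database round trip.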

Distributed Cache Architectures

As traffic grows, a single cache instance won't cut it. You need distributed caching.

Redis Cluster

Redis Cluster shards data automatically across nodes:

Key features:

  • Automatic sharding (16384 hash slots)
  • Replication for high availability
  • Read scaling with replicas
  • Automatic failover
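
The slot assignment itself is simple: Redis Cluster maps a key to `CRC16(key) mod 16384`, hashing only the `{hash tag}` substring when one is present so related keys can be forced onto the same node. A from-scratch sketch of that documented algorithm (CRC-16/XMODEM):

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM, the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 Redis Cluster hash slots.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so related keys land in the same slot (and thus on the same node).
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

This is why `{user:1000}.following` and `{user:1000}.followers` can be used together in a multi-key operation: both hash on `user:1000` and share a slot.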

Consistent Hashing for Cache Nodes

When adding/removing cache nodes, consistent hashing minimizes key remapping:

Virtual nodes smooth out load distribution when nodes have different capacities or when nodes join and leave. With around 150 virtual nodes per physical node, adding a fourth node to a three-node ring remaps only about 1/4 of keys; naive modulo hashing (hash(key) % N) would remap nearly all of them.
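
A hash ring with virtual nodes takes only a few lines. MD5 is an arbitrary choice of hash function here, and 150 virtual nodes per physical node matches the figure in the text.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=150):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each physical node appears `vnodes` times on the ring.
        for i in range(self.vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str):
        """Return the node owning `key`: first ring point clockwise of its hash."""
        if not self._ring:
            return None
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]
```

The key property: when a node joins, the only keys that move are the ones it takes over; every other key keeps its old owner.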


Cache Coherence

In distributed caches, keeping multiple replicas consistent is hard. You have several strategies:

Write Strategies

| Strategy | How It Works | Trade-off |
| --- | --- | --- |
| Write-Through All | Write to all replicas simultaneously | Strong consistency, high write latency |
| Write-Invalidate | Write to the primary, invalidate replicas | Lower write latency; replicas repopulate on read |
| Eventual Consistency | Accept temporary inconsistency | Highest performance, may serve stale data |

Read Strategies

| Strategy | How It Works | Trade-off |
| --- | --- | --- |
| Read from Primary | Always read from the primary | Consistent, higher latency |
| Read from Replica | Read from the nearest replica | Lower latency, may return stale data |
| Read-Repair | Read from a replica, repair it if stale | Consistency maintained in the background |

Geo-Distributed Caching

Global systems need caches in multiple regions. The challenge: keeping them coherent while minimizing latency.

Active-Active Caching

Read from local cache, write to nearest region, replicate asynchronously to other regions.

| Region | Read Latency | Write Strategy |
| --- | --- | --- |
| US East | ~5 ms | Write local + async replicate to others |
| EU West | ~80 ms | Write local + async replicate to others |
| APAC | ~150 ms | Write local + async replicate to others |

Edge-Side Caching with Vary

Use Vary headers to cache different versions of the same URL:

| Header | Creates a Cache Entry Per |
| --- | --- |
| Accept-Language | Language version |
| Accept-Encoding | Compression type |
| Cookie | User-specific version (use carefully!) |

Cache Monitoring & Metrics

You can't optimize what you don't measure.

Hit Rate Targets

| Cache Layer | Target Hit Rate | Typical Size |
| --- | --- | --- |
| Browser | 30-50% of requests | User's device |
| CDN | 90%+ for static assets | TBs per PoP |
| Application | 80-95% for dynamic data | 10-100 GB |
| Database | 90%+ (buffer pool) | GBs (memory) |

Key Metrics to Track

| Metric | What It Tells You |
| --- | --- |
| Hit rate | Cache effectiveness |
| Miss rate | How often you fall through to origin |
| Latency | Cache response time |
| Eviction rate | Memory pressure |
| TTL distribution | How fresh your data is |

If your cache hit rate drops below 80%, you either have a cache sizing problem (not enough memory) or a cache key design problem (keys too granular or too coarse). Profile your access patterns.
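
A minimal per-layer hit-rate tracker, as a sketch; in practice you would export these counters to your metrics system (Prometheus, StatsD, or similar) rather than compute rates in process.

```python
from collections import Counter

class CacheMetrics:
    """Track per-layer hit/miss counts and derive hit rates (illustrative)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, layer: str, hit: bool) -> None:
        self.counts[(layer, "hit" if hit else "miss")] += 1

    def hit_rate(self, layer: str) -> float:
        hits = self.counts[(layer, "hit")]
        misses = self.counts[(layer, "miss")]
        total = hits + misses
        return hits / total if total else 0.0
```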


Summary

Caching is not a single decision; it's a layered architecture that runs from the browser through the CDN and application cache down to the database buffer pool.

The patterns you choose—cache-aside vs write-through, single-node vs cluster, eventual vs strong consistency—depend on your consistency requirements and latency constraints. Start simple, add complexity only when you have measurements showing you need it.

In the next tutorial, we'll explore Distributed Data Stores—the systems that store your data at massive scale while maintaining consistency guarantees.