Core Algorithms

LRU, LFU, and TinyLFU: Cache Eviction Algorithms Explained Deeply

Master cache eviction: LRU, LFU, LRU-K, and modern TinyLFU used in Caffeine cache. Understand admission policies, window/protected segments, and near-optimal hit rates.

LRULFUTinyLFUCaffeineevictioncacheadmission-policy

LRU, LFU, and TinyLFU

The worst production debugging session I ever had started with a single alert: "Cache hit rate dropped from 95% to 40%." Our Redis cluster was thrashing—evicting keys as fast as we could insert them. The root cause? A misconfigured analytics job was scanning millions of keys that would never be accessed again. LRU eviction, in its infinite wisdom, decided that since those scanned keys were "recently used," they were worth keeping. Our hot data got evicted.

Cache eviction seems simple—when the cache is full, remove something. But the choice of what to remove has a massive impact on hit rate, latency, and infrastructure cost. This tutorial explores the evolution of eviction algorithms from simple LRU to the sophisticated TinyLFU used in Caffeine, the most performant Java cache today.

The Caching Problem

A cache is a fast, finite storage layer that sits in front of a slow, infinite storage layer. When the cache is full and a new item arrives, something must be evicted. The goal is to evict the item least likely to be accessed in the future.

This is inherently a prediction problem. We don't know the future, so we approximate it using past access patterns:

StrategyAssumptionWorks Well ForFails For
LRURecently accessed items will be accessed againRecency-biased workloadsScanning / one-time access patterns
LFUFrequently accessed items will be accessed againStable popularityBursty / trending content
FIFOOld items won't be needed againSequential access patternsMost real-world workloads
RandomNo patternNothingEverything (but at least it's fair)

The Optimal Algorithm: Belady's Algorithm

In 1966, Laszlo Belady proved that the optimal eviction strategy is to evict the item that will be used farthest in the future. This is called Belady's MIN algorithm. It's impossible to implement in practice because it requires knowledge of future accesses, but it serves as a theoretical upper bound for hit rate.

Every practical eviction algorithm is an approximation of Belady's algorithm, using past behavior as a proxy for future behavior.


LRU: Least Recently Used

LRU is the most widely used eviction algorithm. It's simple, performs well for many workloads, and has an efficient O(1) implementation.

How LRU Works

The classic implementation uses a doubly-linked list with a hash map:

Operations:

OperationStepsComplexity
Get(key)Look up in hash map, move node to headO(1)
Put(key, value)If exists: move to head, update value. If new: insert at head, if full: evict tailO(1)
Evict()Remove node at tail, delete from hash mapO(1)

The hash map provides O(1) lookup by key. The linked list maintains access order: most recently accessed at the head, least recently accessed at the tail.

Why LRU Works (and When It Doesn't)

LRU works well when access patterns have temporal locality—items accessed recently are likely to be accessed again soon. This is true for most real-world workloads:

  • User sessions: a user who just logged in will make more requests
  • API responses: a recently fetched resource is likely to be fetched again
  • Database query results: recent queries are often repeated

The scanning problem:

When a workload sequentially accesses many unique keys (a "scan"), LRU performs terribly. Each new key becomes the "most recently used," pushing out truly hot data. After the scan completes, the cache is filled with cold data.

WorkloadCache SizeAccess PatternLRU Hit Rate
Web metadata1000 entries80% of requests hit 200 popular items98%
Analytics scan1000 entriesSequential scan of 10000 unique keys5% (after scan)
Mixed1000 entriesPopular items + occasional scans60-80%

LRU-K: Resisting Scans

LRU-K improves on LRU by tracking the K most recent access times for each item, not just the single most recent. An item is only considered "recently used" after being accessed K times.

VariantStrategyScan ResistanceComplexity
LRU-1 (standard)Single access timestampNoneO(1)
LRU-2Track two most recent accessesGoodO(log N)
LRU-KTrack K most recent accessesExcellent for large KO(log N) for K greater than 2

The problem with LRU-K: it requires maintaining a priority queue, and tracking historical access times adds memory overhead per entry. In practice, LRU-2 is often "good enough" for most scanning-resistant workloads.

LRU-2 insight: Items accessed once (scans) stay in the first queue and get evicted quickly. Items accessed twice (potentially popular) move to the LRU queue where they compete fairly.


LFU: Least Frequently Used

LFU evicts the item with the lowest access frequency. The intuition: if you access an item once and never again, its frequency stays at 1 and it gets evicted first when the cache fills.

How LFU Works

Each cache entry maintains a frequency counter. When the cache is full, the entry with the lowest counter is evicted. If multiple entries have the same frequency, LRU among them (or FIFO).

LFU Weaknesses

Temporal burst unfriendliness:

Consider a news article that goes viral. In the first minute, it gets 10,000 accesses. But the topic is so niche that it never gets accessed again. Meanwhile, a database index that's accessed once every 5 minutes has accumulated a low but steady count over hours. LFU keeps the database index (higher total count) and evicts the breaking news (lower total count but high recent burst).

ItemAccess PatternTotal CountRecent BurstLFU Decision
Breaking news10000 accesses in 1 minute, then 010000HighEvicted! (vs old items with 20000 total)
Database index1 access every 5 minutes for 24 hours288LowKept
User session50 accesses over 2 hours50MediumKept if more than cutoff

Frequency inertia (the "old hot" problem):

An item that was extremely popular yesterday (millions of accesses) has such a high frequency count that it takes hours or days for new popular items to overtake it. The cache is slow to adapt to workload changes.

Imagine a Black Friday sale page that had 1M hits on Friday. By Saturday, users are looking at regular product pages. The sale page's frequency counter is so high that it stays in cache for days, evicting Saturday's actually-hot items.

Memory Overhead for Counters

Cache SizeBits per CounterTotal Memory
1 million entries32 bits (int)4 MB
1 million entries16 bits (short)2 MB
1 million entries8 bits (byte)1 MB
100 million entries32 bits400 MB

Using smaller counters limits the maximum frequency you can track, but also wastes space on items with low counts (most of them). The Pareto principle applies: 80% of items have very low frequencies.


Window-TinyLFU (Caffeine)

Caffeine is a high-performance Java caching library that uses the Window-TinyLFU eviction policy. It's the default cache implementation in Spring Boot and achieves hit rates within 1-2% of the theoretical optimal (Belady's MIN) for most workloads.

Caffeine's real innovation is not the eviction algorithm alone—it's how it combines three distinct mechanisms:

  1. Admission policy: Should this new item be admitted to the cache at all?
  2. Eviction policy: Which item should be removed when the cache is full?
  3. Frequency estimation: How do we track access frequency with minimal memory?

The Three-Segment Architecture

The three segments work together:

SegmentAlgorithmSizePurpose
WindowLRU1% of total cacheAbsorb one-time bursts and scans
ProbationLRU~79% of totalItems that survived admission once
ProtectedLRU~20% of totalItems accessed more than once

How Admission Works

When a new item arrives and the cache is full, Caffeine doesn't automatically admit it. Instead, it asks TinyLFU:

  1. TinyLFU estimates the frequency of the new item
  2. TinyLFU estimates the frequency of the victim (the item about to be evicted)
  3. If the new item's frequency is higher, admit it (evict the victim)
  4. If not, reject the new item (keep the cache unchanged)

This admission policy is the key insight. Most caches blindly accept new items and evict old ones. Caffeine rejects new items that are unlikely to be accessed again, preventing cache pollution.

TinyLFU Frequency Estimation

TinyLFU uses a Count-Min Sketch, a probabilistic data structure that estimates event frequencies with bounded error using minimal memory.

text
Count-Min Sketch
================
Structure: d rows of w counters each
Hash functions: h1, h2, ..., hd (one per row)

Increment(key):
  for each row i:
    counter[i][hi(key)] += 1

Estimate(key):
  return min over all rows of counter[i][hi(key)]
FeatureExact CountingCount-Min Sketch
Memory per 1M items4 MB (32-bit counters)~64 KB (4 rows x 4096 cols)
Accuracy100%99%+ with proper config
Concurrency-safe updatesLock neededLock-free (per-row CAS)
Space for 100M items400 MB~6 MB

The sketch has 4 rows of 4096 4-bit counters. Each counter saturates at 15 (4 bits), which limits the influence of ancient hot items. Four hash functions map each key to one counter per row, and the estimate is the minimum across all rows (this minimizes overcounts from hash collisions).

The Reset Mechanism

TinyLFU doesn't let counters grow forever. It uses an adaptive reset:

  1. After N accesses (the "window"), the sketch is halved: all counters are divided by 2
  2. This exponentially decays the influence of past access patterns
  3. The window size adapts based on workload: shorter windows for bursty workloads, longer for stable ones

This solves LFU's "frequency inertia" problem. Old hot items have their frequency halved every window, allowing the cache to adapt to changes in popularity.

The Victim Duel

When probation is full and a new item arrives from the window, Caffeine stages a "duel" between the new item and the probation victim:

The duel ensures that every entry in the cache has "earned" its place by having a frequency at least as high as the item it displaced.


Comparison: LRU vs LFU vs ARC vs TinyLFU

PropertyLRULFUARCTinyLFU (Caffeine)
Hit rate vs BeladyModerate (~80%)Good (~85%)Very good (~90%)Excellent (~95%+)
Scan resistanceNoneGoodExcellentExcellent
Burst adaptationExcellentPoorGoodExcellent
Memory overheadVery low (2 ptrs/entry)High (counter/entry)ModerateLow (sketch shared)
Concurrent performanceGood (lock on list)Poor (global freq update)ModerateExcellent (lock-free sketch)
Implementation complexityTrivialSimpleModerateComplex
Workload adaptabilityLowLowHighHigh

ARC: Adaptive Replacement Cache

ARC deserves mention as a precursor to TinyLFU. It maintains two lists: recent items (recency) and frequent items (frequency). It dynamically adjusts the balance between the two based on access patterns using a "ghost" entry feedback mechanism.

ARC ListTracksPurpose
T1Recent items (accessed once recently)Capture recency
B1Evicted from T1 (ghosts)Feedback: if B1 hit, increase T1 space
T2Frequent items (accessed twice recently)Capture frequency
B2Evicted from T2 (ghosts)Feedback: if B2 hit, increase T2 space

Ghost entries don't store values—just keys. A hit on a ghost entry signals that the adaptive balance is wrong. ARC adjusts its target sizes based on these signals.

ARC achieves excellent hit rates but at the cost of ghost entry memory and implementation complexity.


Real-World Implementations

SystemAlgorithmNotes
RedisApproximate LRU (eviction pool)Samples N keys, evicts oldest. Configurable sample size. Also supports LFU since 4.0.
MemcachedLRU (with slab allocation)Per-slab LRU. Has faced issues with scanning pollution. LRU crawler for rebalancing.
CaffeineWindow-TinyLFUJava. Default in Spring Boot. Near-optimal hit rates.
Guava CacheLRU + SegmentedPredecessor to Caffeine. Simpler, lower hit rate.
Linux page cacheLRU with two lists (active/inactive)Two-handed clock algorithm. Active list for hot pages, inactive for cold.
Varnish (HTTP cache)LRU with object size awarenessEviction weighted by size: evict many small or one large.
NginxLRU (for proxy cache)Also supports "inactive" timeout eviction.

Redis LFU Mode

Redis 4.0 introduced LFU as an optional eviction policy. It uses a Morris counter (logarithmic counter) instead of a linear one, which uses only a few bits to track frequencies spanning orders of magnitude.

text
Redis LFU Counter:
- 16 bits total (in the existing 24-bit LRU field)
- 8-bit logarithmic counter (max frequency ~200)
- 8-bit log decrement (aging factor / 2)

The counter increases logarithmically:
  p = 1 / (counter * logarithm_factor + 1)
  if rand(0,1) < p: counter += 1

At high counters, increment probability decreases exponentially.

This means a frequently accessed item's counter grows quickly at first, then slows. Old items decay based on the log decrement field, enabling workload adaptation.

S3-FIFO: Simple Yet Effective

In 2023, researchers from Carnegie Mellon and Google published S3-FIFO (Scan-Resistant Segmented FIFO), a remarkably simple eviction algorithm that achieves hit rates comparable to TinyLFU with far less complexity.

S3-FIFO uses three FIFO queues instead of LRU lists:

QueueSizePolicyBehavior
Small10% of cacheFIFONew items enter here. Accessed once? Stay. Evicted? Go to Main
Main90% of cacheFIFOItems accessed in small queue or re-accessed in main queue get a second chance
GhostSize of small queue (metadata only)FIFOTracks recently evicted items from small queue for admission decisions

The key insight: S3-FIFO achieves scan resistance by separating new, unproven items into a small FIFO queue. Items that survive this queue (by being re-accessed) are promoted. Items that never get re-accessed are quickly evicted.

text
S3-FIFO admission flow:

1. New item K inserted into Small FIFO queue
2. If K is accessed while in Small queue:
   - Move K to Main FIFO queue
   - Mark K as "accessed" (one bit)
3. If K is evicted from Small queue without re-access:
   - Insert K's key into Ghost queue
   - Ghost tracks keys of items evicted from Small
4. When cache is full and new item arrives:
   - If item is found in Ghost: admit it (was popular)
   - If not in Ghost: reject it (likely a scan)
5. Items in Main queue get a second chance on first eviction attempt
   (re-insert at head of Main queue if accessed bit is set)

Compared to Window-TinyLFU, S3-FIFO uses no frequency sketch, no probabilistic counters, and no hashing. Just three FIFO queues and one "accessed" bit per entry. Despite this simplicity, it achieves hit rates within 1-3% of Caffeine on standard benchmarks.

AlgorithmHit Rate on Wiki workloadMemory per entryLines of code
LRU52%2 pointers + 1 bit~50
LFU63%1 counter + heap entry~100
ARC72%2 pointers + ghost metadata~300
TinyLFU76%2 pointers + shared sketch~500
S3-FIFO75%2 pointers + 1 access bit~100

For most web application caches, LRU is "good enough." Upgrade to Window-TinyLFU (Caffeine) only if you measure a meaningful hit rate improvement. The complexity of tuning LFU parameters often outweighs benefits for moderate cache sizes (less than 10GB). For large caches (100GB+) with scanning workloads, the investment pays off.


Choosing the Right Eviction Policy

Workload ProfileRecommended PolicyWhy
User session dataLRUStrong recency bias, no scanning
Database query cacheLRU or TinyLFURecency-dominant, occasional scans
CDN / HTTP cacheLRU with size awarenessVariable object sizes
Analytics cacheLFU or TinyLFUStable item popularity
Mixed workloadTinyLFU (Caffeine)Adapts to both recency and frequency
Write-heavy cacheLRU with write cost awarenessLRU is simplest; writes don't affect hit rate
Read-heavy with scansTinyLFUExcellent scan resistance

Practical Tuning Guidelines

ParameterRecommendationEffect
Cache sizeAs large as your working setMost important factor
Window size (TinyLFU)Default (1%)Increase for bursty workloads
Protected ratio20%Increase for stable popularity
Sample count (Redis LRU)5-10Higher = more accurate, more CPU
Reset period (TinyLFU)Adaptive (default)Don't override unless benchmarked
⚠️

One of the most common caching mistakes is setting the cache too small for the working set. If your cache hit rate is below 70%, you may need more memory. No eviction algorithm can save a cache that's 10x too small for the workload. Measure your working set size first, then choose the eviction policy.


Key Takeaways

  • LRU is simple and effective for recency-biased workloads but suffers from scanning pollution where one-time large scans evict hot data
  • LRU-2 and LRU-K improve scan resistance by requiring multiple accesses before considering an item "recent"
  • LFU tracks access frequency but suffers from frequency inertia—old hot items resist eviction long after they've cooled
  • Cumulative frequency accumulates linearly with cache size, making exact LFU expensive for large caches
  • TinyLFU uses a Count-Min Sketch for approximate frequency estimation with minimal memory (99%+ accuracy at ~64KB for millions of items)
  • Window-TinyLFU's admission policy (rejecting low-frequency new items) is the key innovation—it prevents cache pollution at the entry point
  • Caffeine's three-segment architecture (Window, Probation, Protected) handles both bursts and stable popularity
  • The adaptive reset mechanism halves all counters periodically, solving LFU's frequency inertia problem
  • For most workloads, TinyLFU achieves hit rates within 1-2% of theoretical optimal (Belady's algorithm)
  • Measure your working set size before tuning eviction—the right policy can't fix a cache that's too small

In the next tutorial, we will explore distributed ID generation with Twitter's Snowflake algorithm and understand how time-ordered unique IDs are generated across thousands of nodes without coordination.