LRU, LFU, and TinyLFU: Cache Eviction Algorithms Explained Deeply

Master cache eviction: LRU, LFU, LRU-K, and modern TinyLFU used in Caffeine cache. Understand admission policies, window/protected segments, and near-optimal hit rates.

LRULFUTinyLFUCaffeineevictioncacheadmission-policy

LRU, LFU, and TinyLFU

The worst production debugging session I ever had started with a single alert: "Cache hit rate dropped from 95% to 40%." Our Redis cluster was thrashing—evicting keys as fast as we could insert them. The root cause? A misconfigured analytics job was scanning millions of keys that would never be accessed again. LRU eviction, in its infinite wisdom, decided that since those scanned keys were "recently used," they were worth keeping. Our hot data got evicted.

Cache eviction seems simple—when the cache is full, remove something. But the choice of what to remove has a massive impact on hit rate, latency, and infrastructure cost. This tutorial explores the evolution of eviction algorithms from simple LRU to the sophisticated TinyLFU used in Caffeine, the most performant Java cache today.

The Caching Problem

A cache is a fast, finite storage layer that sits in front of a slow, infinite storage layer. When the cache is full and a new item arrives, something must be evicted. The goal is to evict the item least likely to be accessed in the future.

This is inherently a prediction problem. We don't know the future, so we approximate it using past access patterns:

Strategy	Assumption	Works Well For	Fails For
LRU	Recently accessed items will be accessed again	Recency-biased workloads	Scanning / one-time access patterns
LFU	Frequently accessed items will be accessed again	Stable popularity	Bursty / trending content
FIFO	Old items won't be needed again	Sequential access patterns	Most real-world workloads
Random	No pattern	Nothing	Everything (but at least it's fair)

The Optimal Algorithm: Belady's Algorithm

In 1966, Laszlo Belady proved that the optimal eviction strategy is to evict the item that will be used farthest in the future. This is called Belady's MIN algorithm. It's impossible to implement in practice because it requires knowledge of future accesses, but it serves as a theoretical upper bound for hit rate.

Every practical eviction algorithm is an approximation of Belady's algorithm, using past behavior as a proxy for future behavior.

LRU: Least Recently Used

LRU is the most widely used eviction algorithm. It's simple, performs well for many workloads, and has an efficient O(1) implementation.

How LRU Works

The classic implementation uses a doubly-linked list with a hash map:

Operations:

Operation	Steps	Complexity
Get(key)	Look up in hash map, move node to head	O(1)
Put(key, value)	If exists: move to head, update value. If new: insert at head, if full: evict tail	O(1)
Evict()	Remove node at tail, delete from hash map	O(1)

The hash map provides O(1) lookup by key. The linked list maintains access order: most recently accessed at the head, least recently accessed at the tail.

Why LRU Works (and When It Doesn't)

LRU works well when access patterns have temporal locality—items accessed recently are likely to be accessed again soon. This is true for most real-world workloads:

User sessions: a user who just logged in will make more requests
API responses: a recently fetched resource is likely to be fetched again
Database query results: recent queries are often repeated

The scanning problem:

When a workload sequentially accesses many unique keys (a "scan"), LRU performs terribly. Each new key becomes the "most recently used," pushing out truly hot data. After the scan completes, the cache is filled with cold data.

Workload	Cache Size	Access Pattern	LRU Hit Rate
Web metadata	1000 entries	80% of requests hit 200 popular items	98%
Analytics scan	1000 entries	Sequential scan of 10000 unique keys	5% (after scan)
Mixed	1000 entries	Popular items + occasional scans	60-80%

LRU-K: Resisting Scans

LRU-K improves on LRU by tracking the K most recent access times for each item, not just the single most recent. An item is only considered "recently used" after being accessed K times.

Variant	Strategy	Scan Resistance	Complexity
LRU-1 (standard)	Single access timestamp	None	O(1)
LRU-2	Track two most recent accesses	Good	O(log N)
LRU-K	Track K most recent accesses	Excellent for large K	O(log N) for K greater than 2

The problem with LRU-K: it requires maintaining a priority queue, and tracking historical access times adds memory overhead per entry. In practice, LRU-2 is often "good enough" for most scanning-resistant workloads.

LRU-2 insight: Items accessed once (scans) stay in the first queue and get evicted quickly. Items accessed twice (potentially popular) move to the LRU queue where they compete fairly.

LFU: Least Frequently Used

LFU evicts the item with the lowest access frequency. The intuition: if you access an item once and never again, its frequency stays at 1 and it gets evicted first when the cache fills.

How LFU Works

Each cache entry maintains a frequency counter. When the cache is full, the entry with the lowest counter is evicted. If multiple entries have the same frequency, LRU among them (or FIFO).

LFU Weaknesses

Temporal burst unfriendliness:

Consider a news article that goes viral. In the first minute, it gets 10,000 accesses. But the topic is so niche that it never gets accessed again. Meanwhile, a database index that's accessed once every 5 minutes has accumulated a low but steady count over hours. LFU keeps the database index (higher total count) and evicts the breaking news (lower total count but high recent burst).

Item	Access Pattern	Total Count	Recent Burst	LFU Decision
Breaking news	10000 accesses in 1 minute, then 0	10000	High	Evicted! (vs old items with 20000 total)
Database index	1 access every 5 minutes for 24 hours	288	Low	Kept
User session	50 accesses over 2 hours	50	Medium	Kept if more than cutoff

Frequency inertia (the "old hot" problem):

An item that was extremely popular yesterday (millions of accesses) has such a high frequency count that it takes hours or days for new popular items to overtake it. The cache is slow to adapt to workload changes.

Imagine a Black Friday sale page that had 1M hits on Friday. By Saturday, users are looking at regular product pages. The sale page's frequency counter is so high that it stays in cache for days, evicting Saturday's actually-hot items.

Memory Overhead for Counters

Cache Size	Bits per Counter	Total Memory
1 million entries	32 bits (int)	4 MB
1 million entries	16 bits (short)	2 MB
1 million entries	8 bits (byte)	1 MB
100 million entries	32 bits	400 MB

Using smaller counters limits the maximum frequency you can track, but also wastes space on items with low counts (most of them). The Pareto principle applies: 80% of items have very low frequencies.

Window-TinyLFU (Caffeine)

Caffeine is a high-performance Java caching library that uses the Window-TinyLFU eviction policy. It's the default cache implementation in Spring Boot and achieves hit rates within 1-2% of the theoretical optimal (Belady's MIN) for most workloads.

Caffeine's real innovation is not the eviction algorithm alone—it's how it combines three distinct mechanisms:

Admission policy: Should this new item be admitted to the cache at all?
Eviction policy: Which item should be removed when the cache is full?
Frequency estimation: How do we track access frequency with minimal memory?

The Three-Segment Architecture

The three segments work together:

Segment	Algorithm	Size	Purpose
Window	LRU	1% of total cache	Absorb one-time bursts and scans
Probation	LRU	~79% of total	Items that survived admission once
Protected	LRU	~20% of total	Items accessed more than once

How Admission Works

When a new item arrives and the cache is full, Caffeine doesn't automatically admit it. Instead, it asks TinyLFU:

TinyLFU estimates the frequency of the new item
TinyLFU estimates the frequency of the victim (the item about to be evicted)
If the new item's frequency is higher, admit it (evict the victim)
If not, reject the new item (keep the cache unchanged)

This admission policy is the key insight. Most caches blindly accept new items and evict old ones. Caffeine rejects new items that are unlikely to be accessed again, preventing cache pollution.

TinyLFU Frequency Estimation

TinyLFU uses a Count-Min Sketch, a probabilistic data structure that estimates event frequencies with bounded error using minimal memory.

text

Count-Min Sketch
================
Structure: d rows of w counters each
Hash functions: h1, h2, ..., hd (one per row)

Increment(key):
  for each row i:
    counter[i][hi(key)] += 1

Estimate(key):
  return min over all rows of counter[i][hi(key)]

Feature	Exact Counting	Count-Min Sketch
Memory per 1M items	4 MB (32-bit counters)	~64 KB (4 rows x 4096 cols)
Accuracy	100%	99%+ with proper config
Concurrency-safe updates	Lock needed	Lock-free (per-row CAS)
Space for 100M items	400 MB	~6 MB

The sketch has 4 rows of 4096 4-bit counters. Each counter saturates at 15 (4 bits), which limits the influence of ancient hot items. Four hash functions map each key to one counter per row, and the estimate is the minimum across all rows (this minimizes overcounts from hash collisions).

The Reset Mechanism

TinyLFU doesn't let counters grow forever. It uses an adaptive reset:

After N accesses (the "window"), the sketch is halved: all counters are divided by 2
This exponentially decays the influence of past access patterns
The window size adapts based on workload: shorter windows for bursty workloads, longer for stable ones

This solves LFU's "frequency inertia" problem. Old hot items have their frequency halved every window, allowing the cache to adapt to changes in popularity.

The Victim Duel

When probation is full and a new item arrives from the window, Caffeine stages a "duel" between the new item and the probation victim:

The duel ensures that every entry in the cache has "earned" its place by having a frequency at least as high as the item it displaced.

Comparison: LRU vs LFU vs ARC vs TinyLFU

Property	LRU	LFU	ARC	TinyLFU (Caffeine)
Hit rate vs Belady	Moderate (~80%)	Good (~85%)	Very good (~90%)	Excellent (~95%+)
Scan resistance	None	Good	Excellent	Excellent
Burst adaptation	Excellent	Poor	Good	Excellent
Memory overhead	Very low (2 ptrs/entry)	High (counter/entry)	Moderate	Low (sketch shared)
Concurrent performance	Good (lock on list)	Poor (global freq update)	Moderate	Excellent (lock-free sketch)
Implementation complexity	Trivial	Simple	Moderate	Complex
Workload adaptability	Low	Low	High	High

ARC: Adaptive Replacement Cache

ARC deserves mention as a precursor to TinyLFU. It maintains two lists: recent items (recency) and frequent items (frequency). It dynamically adjusts the balance between the two based on access patterns using a "ghost" entry feedback mechanism.

ARC List	Tracks	Purpose
T1	Recent items (accessed once recently)	Capture recency
B1	Evicted from T1 (ghosts)	Feedback: if B1 hit, increase T1 space
T2	Frequent items (accessed twice recently)	Capture frequency
B2	Evicted from T2 (ghosts)	Feedback: if B2 hit, increase T2 space

Ghost entries don't store values—just keys. A hit on a ghost entry signals that the adaptive balance is wrong. ARC adjusts its target sizes based on these signals.

ARC achieves excellent hit rates but at the cost of ghost entry memory and implementation complexity.

Real-World Implementations

System	Algorithm	Notes
Redis	Approximate LRU (eviction pool)	Samples N keys, evicts oldest. Configurable sample size. Also supports LFU since 4.0.
Memcached	LRU (with slab allocation)	Per-slab LRU. Has faced issues with scanning pollution. LRU crawler for rebalancing.
Caffeine	Window-TinyLFU	Java. Default in Spring Boot. Near-optimal hit rates.
Guava Cache	LRU + Segmented	Predecessor to Caffeine. Simpler, lower hit rate.
Linux page cache	LRU with two lists (active/inactive)	Two-handed clock algorithm. Active list for hot pages, inactive for cold.
Varnish (HTTP cache)	LRU with object size awareness	Eviction weighted by size: evict many small or one large.
Nginx	LRU (for proxy cache)	Also supports "inactive" timeout eviction.

Redis LFU Mode

Redis 4.0 introduced LFU as an optional eviction policy. It uses a Morris counter (logarithmic counter) instead of a linear one, which uses only a few bits to track frequencies spanning orders of magnitude.

text

Redis LFU Counter:
- 16 bits total (in the existing 24-bit LRU field)
- 8-bit logarithmic counter (max frequency ~200)
- 8-bit log decrement (aging factor / 2)

The counter increases logarithmically:
  p = 1 / (counter * logarithm_factor + 1)
  if rand(0,1) < p: counter += 1

At high counters, increment probability decreases exponentially.

This means a frequently accessed item's counter grows quickly at first, then slows. Old items decay based on the log decrement field, enabling workload adaptation.

S3-FIFO: Simple Yet Effective

In 2023, researchers from Carnegie Mellon and Google published S3-FIFO (Scan-Resistant Segmented FIFO), a remarkably simple eviction algorithm that achieves hit rates comparable to TinyLFU with far less complexity.

S3-FIFO uses three FIFO queues instead of LRU lists:

Queue	Size	Policy	Behavior
Small	10% of cache	FIFO	New items enter here. Accessed once? Stay. Evicted? Go to Main
Main	90% of cache	FIFO	Items accessed in small queue or re-accessed in main queue get a second chance
Ghost	Size of small queue (metadata only)	FIFO	Tracks recently evicted items from small queue for admission decisions

The key insight: S3-FIFO achieves scan resistance by separating new, unproven items into a small FIFO queue. Items that survive this queue (by being re-accessed) are promoted. Items that never get re-accessed are quickly evicted.

text

S3-FIFO admission flow:

1. New item K inserted into Small FIFO queue
2. If K is accessed while in Small queue:
   - Move K to Main FIFO queue
   - Mark K as "accessed" (one bit)
3. If K is evicted from Small queue without re-access:
   - Insert K's key into Ghost queue
   - Ghost tracks keys of items evicted from Small
4. When cache is full and new item arrives:
   - If item is found in Ghost: admit it (was popular)
   - If not in Ghost: reject it (likely a scan)
5. Items in Main queue get a second chance on first eviction attempt
   (re-insert at head of Main queue if accessed bit is set)

Compared to Window-TinyLFU, S3-FIFO uses no frequency sketch, no probabilistic counters, and no hashing. Just three FIFO queues and one "accessed" bit per entry. Despite this simplicity, it achieves hit rates within 1-3% of Caffeine on standard benchmarks.

Algorithm	Hit Rate on Wiki workload	Memory per entry	Lines of code
LRU	52%	2 pointers + 1 bit	~50
LFU	63%	1 counter + heap entry	~100
ARC	72%	2 pointers + ghost metadata	~300
TinyLFU	76%	2 pointers + shared sketch	~500
S3-FIFO	75%	2 pointers + 1 access bit	~100

✅

For most web application caches, LRU is "good enough." Upgrade to Window-TinyLFU (Caffeine) only if you measure a meaningful hit rate improvement. The complexity of tuning LFU parameters often outweighs benefits for moderate cache sizes (less than 10GB). For large caches (100GB+) with scanning workloads, the investment pays off.

Choosing the Right Eviction Policy

Workload Profile	Recommended Policy	Why
User session data	LRU	Strong recency bias, no scanning
Database query cache	LRU or TinyLFU	Recency-dominant, occasional scans
CDN / HTTP cache	LRU with size awareness	Variable object sizes
Analytics cache	LFU or TinyLFU	Stable item popularity
Mixed workload	TinyLFU (Caffeine)	Adapts to both recency and frequency
Write-heavy cache	LRU with write cost awareness	LRU is simplest; writes don't affect hit rate
Read-heavy with scans	TinyLFU	Excellent scan resistance

Practical Tuning Guidelines

Parameter	Recommendation	Effect
Cache size	As large as your working set	Most important factor
Window size (TinyLFU)	Default (1%)	Increase for bursty workloads
Protected ratio	20%	Increase for stable popularity
Sample count (Redis LRU)	5-10	Higher = more accurate, more CPU
Reset period (TinyLFU)	Adaptive (default)	Don't override unless benchmarked

⚠️

One of the most common caching mistakes is setting the cache too small for the working set. If your cache hit rate is below 70%, you may need more memory. No eviction algorithm can save a cache that's 10x too small for the workload. Measure your working set size first, then choose the eviction policy.

Key Takeaways

LRU is simple and effective for recency-biased workloads but suffers from scanning pollution where one-time large scans evict hot data
LRU-2 and LRU-K improve scan resistance by requiring multiple accesses before considering an item "recent"
LFU tracks access frequency but suffers from frequency inertia—old hot items resist eviction long after they've cooled
Cumulative frequency accumulates linearly with cache size, making exact LFU expensive for large caches
TinyLFU uses a Count-Min Sketch for approximate frequency estimation with minimal memory (99%+ accuracy at ~64KB for millions of items)
Window-TinyLFU's admission policy (rejecting low-frequency new items) is the key innovation—it prevents cache pollution at the entry point
Caffeine's three-segment architecture (Window, Probation, Protected) handles both bursts and stable popularity
The adaptive reset mechanism halves all counters periodically, solving LFU's frequency inertia problem
For most workloads, TinyLFU achieves hit rates within 1-2% of theoretical optimal (Belady's algorithm)
Measure your working set size before tuning eviction—the right policy can't fix a cache that's too small

In the next tutorial, we will explore distributed ID generation with Twitter's Snowflake algorithm and understand how time-ordered unique IDs are generated across thousands of nodes without coordination.

Vector Clocks and Versioning: Causal Ordering in Distributed Systems

Snowflake ID and Distributed ID Generation: Time-Ordered Unique IDs at Scale