LRU, LFU, and TinyLFU: Cache Eviction Algorithms Explained Deeply
Master cache eviction: LRU, LFU, LRU-K, and modern TinyLFU used in Caffeine cache. Understand admission policies, window/protected segments, and near-optimal hit rates.
LRU, LFU, and TinyLFU
The worst production debugging session I ever had started with a single alert: "Cache hit rate dropped from 95% to 40%." Our Redis cluster was thrashing—evicting keys as fast as we could insert them. The root cause? A misconfigured analytics job was scanning millions of keys that would never be accessed again. LRU eviction, in its infinite wisdom, decided that since those scanned keys were "recently used," they were worth keeping. Our hot data got evicted.
Cache eviction seems simple—when the cache is full, remove something. But the choice of what to remove has a massive impact on hit rate, latency, and infrastructure cost. This tutorial explores the evolution of eviction algorithms from simple LRU to the sophisticated TinyLFU used in Caffeine, the most performant Java cache today.
The Caching Problem
A cache is a fast, finite storage layer that sits in front of a slow, infinite storage layer. When the cache is full and a new item arrives, something must be evicted. The goal is to evict the item least likely to be accessed in the future.
This is inherently a prediction problem. We don't know the future, so we approximate it using past access patterns:
| Strategy | Assumption | Works Well For | Fails For |
|---|---|---|---|
| LRU | Recently accessed items will be accessed again | Recency-biased workloads | Scanning / one-time access patterns |
| LFU | Frequently accessed items will be accessed again | Stable popularity | Bursty / trending content |
| FIFO | Old items won't be needed again | Sequential access patterns | Most real-world workloads |
| Random | No pattern | Nothing | Everything (but at least it's fair) |
The Optimal Algorithm: Belady's Algorithm
In 1966, Laszlo Belady proved that the optimal eviction strategy is to evict the item that will be used farthest in the future. This is called Belady's MIN algorithm. It's impossible to implement in practice because it requires knowledge of future accesses, but it serves as a theoretical upper bound for hit rate.
Every practical eviction algorithm is an approximation of Belady's algorithm, using past behavior as a proxy for future behavior.
LRU: Least Recently Used
LRU is the most widely used eviction algorithm. It's simple, performs well for many workloads, and has an efficient O(1) implementation.
How LRU Works
The classic implementation uses a doubly-linked list with a hash map:
Operations:
| Operation | Steps | Complexity |
|---|---|---|
| Get(key) | Look up in hash map, move node to head | O(1) |
| Put(key, value) | If exists: move to head, update value. If new: insert at head, if full: evict tail | O(1) |
| Evict() | Remove node at tail, delete from hash map | O(1) |
The hash map provides O(1) lookup by key. The linked list maintains access order: most recently accessed at the head, least recently accessed at the tail.
Why LRU Works (and When It Doesn't)
LRU works well when access patterns have temporal locality—items accessed recently are likely to be accessed again soon. This is true for most real-world workloads:
- User sessions: a user who just logged in will make more requests
- API responses: a recently fetched resource is likely to be fetched again
- Database query results: recent queries are often repeated
The scanning problem:
When a workload sequentially accesses many unique keys (a "scan"), LRU performs terribly. Each new key becomes the "most recently used," pushing out truly hot data. After the scan completes, the cache is filled with cold data.
| Workload | Cache Size | Access Pattern | LRU Hit Rate |
|---|---|---|---|
| Web metadata | 1000 entries | 80% of requests hit 200 popular items | 98% |
| Analytics scan | 1000 entries | Sequential scan of 10000 unique keys | 5% (after scan) |
| Mixed | 1000 entries | Popular items + occasional scans | 60-80% |
LRU-K: Resisting Scans
LRU-K improves on LRU by tracking the K most recent access times for each item, not just the single most recent. An item is only considered "recently used" after being accessed K times.
| Variant | Strategy | Scan Resistance | Complexity |
|---|---|---|---|
| LRU-1 (standard) | Single access timestamp | None | O(1) |
| LRU-2 | Track two most recent accesses | Good | O(log N) |
| LRU-K | Track K most recent accesses | Excellent for large K | O(log N) for K greater than 2 |
The problem with LRU-K: it requires maintaining a priority queue, and tracking historical access times adds memory overhead per entry. In practice, LRU-2 is often "good enough" for most scanning-resistant workloads.
LRU-2 insight: Items accessed once (scans) stay in the first queue and get evicted quickly. Items accessed twice (potentially popular) move to the LRU queue where they compete fairly.
LFU: Least Frequently Used
LFU evicts the item with the lowest access frequency. The intuition: if you access an item once and never again, its frequency stays at 1 and it gets evicted first when the cache fills.
How LFU Works
Each cache entry maintains a frequency counter. When the cache is full, the entry with the lowest counter is evicted. If multiple entries have the same frequency, LRU among them (or FIFO).
LFU Weaknesses
Temporal burst unfriendliness:
Consider a news article that goes viral. In the first minute, it gets 10,000 accesses. But the topic is so niche that it never gets accessed again. Meanwhile, a database index that's accessed once every 5 minutes has accumulated a low but steady count over hours. LFU keeps the database index (higher total count) and evicts the breaking news (lower total count but high recent burst).
| Item | Access Pattern | Total Count | Recent Burst | LFU Decision |
|---|---|---|---|---|
| Breaking news | 10000 accesses in 1 minute, then 0 | 10000 | High | Evicted! (vs old items with 20000 total) |
| Database index | 1 access every 5 minutes for 24 hours | 288 | Low | Kept |
| User session | 50 accesses over 2 hours | 50 | Medium | Kept if more than cutoff |
Frequency inertia (the "old hot" problem):
An item that was extremely popular yesterday (millions of accesses) has such a high frequency count that it takes hours or days for new popular items to overtake it. The cache is slow to adapt to workload changes.
Imagine a Black Friday sale page that had 1M hits on Friday. By Saturday, users are looking at regular product pages. The sale page's frequency counter is so high that it stays in cache for days, evicting Saturday's actually-hot items.
Memory Overhead for Counters
| Cache Size | Bits per Counter | Total Memory |
|---|---|---|
| 1 million entries | 32 bits (int) | 4 MB |
| 1 million entries | 16 bits (short) | 2 MB |
| 1 million entries | 8 bits (byte) | 1 MB |
| 100 million entries | 32 bits | 400 MB |
Using smaller counters limits the maximum frequency you can track, but also wastes space on items with low counts (most of them). The Pareto principle applies: 80% of items have very low frequencies.
Window-TinyLFU (Caffeine)
Caffeine is a high-performance Java caching library that uses the Window-TinyLFU eviction policy. It's the default cache implementation in Spring Boot and achieves hit rates within 1-2% of the theoretical optimal (Belady's MIN) for most workloads.
Caffeine's real innovation is not the eviction algorithm alone—it's how it combines three distinct mechanisms:
- Admission policy: Should this new item be admitted to the cache at all?
- Eviction policy: Which item should be removed when the cache is full?
- Frequency estimation: How do we track access frequency with minimal memory?
The Three-Segment Architecture
The three segments work together:
| Segment | Algorithm | Size | Purpose |
|---|---|---|---|
| Window | LRU | 1% of total cache | Absorb one-time bursts and scans |
| Probation | LRU | ~79% of total | Items that survived admission once |
| Protected | LRU | ~20% of total | Items accessed more than once |
How Admission Works
When a new item arrives and the cache is full, Caffeine doesn't automatically admit it. Instead, it asks TinyLFU:
- TinyLFU estimates the frequency of the new item
- TinyLFU estimates the frequency of the victim (the item about to be evicted)
- If the new item's frequency is higher, admit it (evict the victim)
- If not, reject the new item (keep the cache unchanged)
This admission policy is the key insight. Most caches blindly accept new items and evict old ones. Caffeine rejects new items that are unlikely to be accessed again, preventing cache pollution.
TinyLFU Frequency Estimation
TinyLFU uses a Count-Min Sketch, a probabilistic data structure that estimates event frequencies with bounded error using minimal memory.
Count-Min Sketch
================
Structure: d rows of w counters each
Hash functions: h1, h2, ..., hd (one per row)
Increment(key):
for each row i:
counter[i][hi(key)] += 1
Estimate(key):
return min over all rows of counter[i][hi(key)]
| Feature | Exact Counting | Count-Min Sketch |
|---|---|---|
| Memory per 1M items | 4 MB (32-bit counters) | ~64 KB (4 rows x 4096 cols) |
| Accuracy | 100% | 99%+ with proper config |
| Concurrency-safe updates | Lock needed | Lock-free (per-row CAS) |
| Space for 100M items | 400 MB | ~6 MB |
The sketch has 4 rows of 4096 4-bit counters. Each counter saturates at 15 (4 bits), which limits the influence of ancient hot items. Four hash functions map each key to one counter per row, and the estimate is the minimum across all rows (this minimizes overcounts from hash collisions).
The Reset Mechanism
TinyLFU doesn't let counters grow forever. It uses an adaptive reset:
- After N accesses (the "window"), the sketch is halved: all counters are divided by 2
- This exponentially decays the influence of past access patterns
- The window size adapts based on workload: shorter windows for bursty workloads, longer for stable ones
This solves LFU's "frequency inertia" problem. Old hot items have their frequency halved every window, allowing the cache to adapt to changes in popularity.
The Victim Duel
When probation is full and a new item arrives from the window, Caffeine stages a "duel" between the new item and the probation victim:
The duel ensures that every entry in the cache has "earned" its place by having a frequency at least as high as the item it displaced.
Comparison: LRU vs LFU vs ARC vs TinyLFU
| Property | LRU | LFU | ARC | TinyLFU (Caffeine) |
|---|---|---|---|---|
| Hit rate vs Belady | Moderate (~80%) | Good (~85%) | Very good (~90%) | Excellent (~95%+) |
| Scan resistance | None | Good | Excellent | Excellent |
| Burst adaptation | Excellent | Poor | Good | Excellent |
| Memory overhead | Very low (2 ptrs/entry) | High (counter/entry) | Moderate | Low (sketch shared) |
| Concurrent performance | Good (lock on list) | Poor (global freq update) | Moderate | Excellent (lock-free sketch) |
| Implementation complexity | Trivial | Simple | Moderate | Complex |
| Workload adaptability | Low | Low | High | High |
ARC: Adaptive Replacement Cache
ARC deserves mention as a precursor to TinyLFU. It maintains two lists: recent items (recency) and frequent items (frequency). It dynamically adjusts the balance between the two based on access patterns using a "ghost" entry feedback mechanism.
| ARC List | Tracks | Purpose |
|---|---|---|
| T1 | Recent items (accessed once recently) | Capture recency |
| B1 | Evicted from T1 (ghosts) | Feedback: if B1 hit, increase T1 space |
| T2 | Frequent items (accessed twice recently) | Capture frequency |
| B2 | Evicted from T2 (ghosts) | Feedback: if B2 hit, increase T2 space |
Ghost entries don't store values—just keys. A hit on a ghost entry signals that the adaptive balance is wrong. ARC adjusts its target sizes based on these signals.
ARC achieves excellent hit rates but at the cost of ghost entry memory and implementation complexity.
Real-World Implementations
| System | Algorithm | Notes |
|---|---|---|
| Redis | Approximate LRU (eviction pool) | Samples N keys, evicts oldest. Configurable sample size. Also supports LFU since 4.0. |
| Memcached | LRU (with slab allocation) | Per-slab LRU. Has faced issues with scanning pollution. LRU crawler for rebalancing. |
| Caffeine | Window-TinyLFU | Java. Default in Spring Boot. Near-optimal hit rates. |
| Guava Cache | LRU + Segmented | Predecessor to Caffeine. Simpler, lower hit rate. |
| Linux page cache | LRU with two lists (active/inactive) | Two-handed clock algorithm. Active list for hot pages, inactive for cold. |
| Varnish (HTTP cache) | LRU with object size awareness | Eviction weighted by size: evict many small or one large. |
| Nginx | LRU (for proxy cache) | Also supports "inactive" timeout eviction. |
Redis LFU Mode
Redis 4.0 introduced LFU as an optional eviction policy. It uses a Morris counter (logarithmic counter) instead of a linear one, which uses only a few bits to track frequencies spanning orders of magnitude.
Redis LFU Counter:
- 16 bits total (in the existing 24-bit LRU field)
- 8-bit logarithmic counter (max frequency ~200)
- 8-bit log decrement (aging factor / 2)
The counter increases logarithmically:
p = 1 / (counter * logarithm_factor + 1)
if rand(0,1) < p: counter += 1
At high counters, increment probability decreases exponentially.
This means a frequently accessed item's counter grows quickly at first, then slows. Old items decay based on the log decrement field, enabling workload adaptation.
S3-FIFO: Simple Yet Effective
In 2023, researchers from Carnegie Mellon and Google published S3-FIFO (Scan-Resistant Segmented FIFO), a remarkably simple eviction algorithm that achieves hit rates comparable to TinyLFU with far less complexity.
S3-FIFO uses three FIFO queues instead of LRU lists:
| Queue | Size | Policy | Behavior |
|---|---|---|---|
| Small | 10% of cache | FIFO | New items enter here. Accessed once? Stay. Evicted? Go to Main |
| Main | 90% of cache | FIFO | Items accessed in small queue or re-accessed in main queue get a second chance |
| Ghost | Size of small queue (metadata only) | FIFO | Tracks recently evicted items from small queue for admission decisions |
The key insight: S3-FIFO achieves scan resistance by separating new, unproven items into a small FIFO queue. Items that survive this queue (by being re-accessed) are promoted. Items that never get re-accessed are quickly evicted.
S3-FIFO admission flow:
1. New item K inserted into Small FIFO queue
2. If K is accessed while in Small queue:
- Move K to Main FIFO queue
- Mark K as "accessed" (one bit)
3. If K is evicted from Small queue without re-access:
- Insert K's key into Ghost queue
- Ghost tracks keys of items evicted from Small
4. When cache is full and new item arrives:
- If item is found in Ghost: admit it (was popular)
- If not in Ghost: reject it (likely a scan)
5. Items in Main queue get a second chance on first eviction attempt
(re-insert at head of Main queue if accessed bit is set)
Compared to Window-TinyLFU, S3-FIFO uses no frequency sketch, no probabilistic counters, and no hashing. Just three FIFO queues and one "accessed" bit per entry. Despite this simplicity, it achieves hit rates within 1-3% of Caffeine on standard benchmarks.
| Algorithm | Hit Rate on Wiki workload | Memory per entry | Lines of code |
|---|---|---|---|
| LRU | 52% | 2 pointers + 1 bit | ~50 |
| LFU | 63% | 1 counter + heap entry | ~100 |
| ARC | 72% | 2 pointers + ghost metadata | ~300 |
| TinyLFU | 76% | 2 pointers + shared sketch | ~500 |
| S3-FIFO | 75% | 2 pointers + 1 access bit | ~100 |
For most web application caches, LRU is "good enough." Upgrade to Window-TinyLFU (Caffeine) only if you measure a meaningful hit rate improvement. The complexity of tuning LFU parameters often outweighs benefits for moderate cache sizes (less than 10GB). For large caches (100GB+) with scanning workloads, the investment pays off.
Choosing the Right Eviction Policy
| Workload Profile | Recommended Policy | Why |
|---|---|---|
| User session data | LRU | Strong recency bias, no scanning |
| Database query cache | LRU or TinyLFU | Recency-dominant, occasional scans |
| CDN / HTTP cache | LRU with size awareness | Variable object sizes |
| Analytics cache | LFU or TinyLFU | Stable item popularity |
| Mixed workload | TinyLFU (Caffeine) | Adapts to both recency and frequency |
| Write-heavy cache | LRU with write cost awareness | LRU is simplest; writes don't affect hit rate |
| Read-heavy with scans | TinyLFU | Excellent scan resistance |
Practical Tuning Guidelines
| Parameter | Recommendation | Effect |
|---|---|---|
| Cache size | As large as your working set | Most important factor |
| Window size (TinyLFU) | Default (1%) | Increase for bursty workloads |
| Protected ratio | 20% | Increase for stable popularity |
| Sample count (Redis LRU) | 5-10 | Higher = more accurate, more CPU |
| Reset period (TinyLFU) | Adaptive (default) | Don't override unless benchmarked |
One of the most common caching mistakes is setting the cache too small for the working set. If your cache hit rate is below 70%, you may need more memory. No eviction algorithm can save a cache that's 10x too small for the workload. Measure your working set size first, then choose the eviction policy.
Key Takeaways
- LRU is simple and effective for recency-biased workloads but suffers from scanning pollution where one-time large scans evict hot data
- LRU-2 and LRU-K improve scan resistance by requiring multiple accesses before considering an item "recent"
- LFU tracks access frequency but suffers from frequency inertia—old hot items resist eviction long after they've cooled
- Cumulative frequency accumulates linearly with cache size, making exact LFU expensive for large caches
- TinyLFU uses a Count-Min Sketch for approximate frequency estimation with minimal memory (99%+ accuracy at ~64KB for millions of items)
- Window-TinyLFU's admission policy (rejecting low-frequency new items) is the key innovation—it prevents cache pollution at the entry point
- Caffeine's three-segment architecture (Window, Probation, Protected) handles both bursts and stable popularity
- The adaptive reset mechanism halves all counters periodically, solving LFU's frequency inertia problem
- For most workloads, TinyLFU achieves hit rates within 1-2% of theoretical optimal (Belady's algorithm)
- Measure your working set size before tuning eviction—the right policy can't fix a cache that's too small
In the next tutorial, we will explore distributed ID generation with Twitter's Snowflake algorithm and understand how time-ordered unique IDs are generated across thousands of nodes without coordination.