Snowflake ID and Distributed ID Generation: Time-Ordered Unique IDs at Scale

Understand Twitter's Snowflake algorithm for time-ordered unique IDs across nodes. Compare UUID v4, UUID v7, and FLAKE. Learn about clock skew, worker ID assignment, and sequence handling.


Snowflake ID and Distributed ID Generation

Two weeks before launch, our database started throwing primary key constraint violations. The root cause was elegant in its stupidity: we used auto-increment IDs, and our write path had scaled to two application servers. Both servers computed the same "next ID" at nearly the same time. The fix involved switching to a central ID server, which promptly became the bottleneck during peak load.

Distributed ID generation looks trivial until you actually need it at scale. Every node must generate unique IDs without talking to any other node. IDs should ideally be time-ordered for efficient B-tree index insertion. And the whole system must work at millions of IDs per second with no single point of failure.

Why Distributed ID Generation Is Hard

The requirements seem contradictory:

Requirement	Why It's Hard
Globally unique	No two nodes, even under network partition, should generate the same ID
Time-ordered	IDs should sort by creation time for efficient database indexing
High throughput	Must handle millions of IDs per second across all nodes
No coordination	Nodes should generate IDs without talking to each other (no single point of failure)
Compact	IDs should fit in 64 bits for efficient storage and indexing
Human-readable	Ideally, IDs give some information about when they were created

Auto-increment fails the throughput and no-coordination requirements (every ID comes from one central counter). Random IDs fail the time-ordering requirement. UUIDs fail the compactness requirement (128 bits cause index bloat). This is where Snowflake comes in.

Twitter Snowflake

In 2010, Twitter released Snowflake—a system for generating unique, time-ordered, 64-bit IDs at scale. It's arguably the most influential distributed ID algorithm, inspiring dozens of variants across the industry.

The 64-Bit Layout

Field	Size	Description	Range
Sign	1 bit	Always 0 (keeps the ID a positive signed integer)	0
Timestamp	41 bits	Milliseconds since custom epoch	0 to 2^41 - 1 ms (~69 years)
Worker ID	10 bits	Datacenter (5) + Machine (5)	0-1023 (1024 workers)
Sequence	12 bits	Per-worker counter within each millisecond	0-4095 (4096 IDs per ms)
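To make the layout concrete, here is a minimal Python sketch that packs the three fields into one integer and splits it back apart. The function names `pack` and `unpack` are illustrative, and the timestamp argument is assumed to be already relative to the custom epoch.

```python
# Field widths from the 64-bit layout: 1 sign + 41 timestamp + 10 worker + 12 sequence.
TIMESTAMP_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 10, 12
WORKER_SHIFT = SEQUENCE_BITS                   # worker ID starts at bit 12
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS  # timestamp starts at bit 22


def pack(timestamp_ms, worker_id, sequence):
    """Combine epoch-relative timestamp, worker ID, and sequence into one ID."""
    if not 0 <= timestamp_ms < 1 << TIMESTAMP_BITS:
        raise ValueError("timestamp out of 41-bit range")
    if not 0 <= worker_id < 1 << WORKER_BITS:
        raise ValueError("worker_id out of 10-bit range")
    if not 0 <= sequence < 1 << SEQUENCE_BITS:
        raise ValueError("sequence out of 12-bit range")
    return (timestamp_ms << TIMESTAMP_SHIFT) | (worker_id << WORKER_SHIFT) | sequence


def unpack(snowflake_id):
    """Split an ID back into (timestamp_ms, worker_id, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )
```

Because the sign bit stays 0, the largest possible ID (all other fields at their maximum) is exactly 2^63 - 1, so it still fits a signed 64-bit column.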

Timestamp: The Custom Epoch

Snowflake uses a custom epoch (not Unix epoch) to maximize the available timestamp range:

Epoch	Start Date	Max Date (41-bit timestamp, ~69.7 years from start)
Unix epoch	1970-01-01	2039-09-07 (already too close)
Snowflake epoch	2010-11-04 (chosen by Twitter)	~2080
Discord epoch	2015-01-01	~2084 with 41 bits (Discord actually uses 42 bits, reaching ~2154)
Instagram epoch	2011-08-24	~2081

By choosing a custom epoch closer to the service launch date, you extend the usable lifetime of the ID space. This is why every Snowflake variant tends to choose its own epoch.
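The end dates in the table come from adding 2^41 - 1 milliseconds to each epoch. A short Python sketch of that arithmetic; the helper `id_space_end` is hypothetical, while the Twitter (1288834974657 ms) and Discord (2015-01-01) epoch values are the ones those services published.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TWITTER_EPOCH = UNIX_EPOCH + timedelta(milliseconds=1288834974657)  # 2010-11-04
DISCORD_EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)


def id_space_end(epoch, timestamp_bits=41, tick_ms=1):
    """Last instant a timestamp field of `timestamp_bits` ticks can represent."""
    max_ticks = (1 << timestamp_bits) - 1
    return epoch + timedelta(milliseconds=max_ticks * tick_ms)


print(id_space_end(UNIX_EPOCH).date())        # 2039-09-07
print(id_space_end(TWITTER_EPOCH).year)       # 2080
print(id_space_end(DISCORD_EPOCH, 42).year)   # 2154
```

Passing 42 bits for Discord shows what one extra bit buys: the range doubles to roughly 139 years.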

Worker ID: 1024 Unique Identifiers

The 10-bit worker ID is typically split:

Bits	Field	Purpose	Values
5 bits	Datacenter ID	Identifies the physical data center	0-31
5 bits	Machine ID	Identifies the machine within the datacenter	0-31

Total: 32 datacenters x 32 machines = 1024 unique workers.
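Packing the two 5-bit halves into the 10-bit worker ID is a shift and an OR. A sketch, assuming (as in Twitter's original code) that the datacenter ID occupies the high 5 bits; the function names are illustrative.

```python
def make_worker_id(datacenter_id, machine_id):
    """Pack a 5-bit datacenter ID and a 5-bit machine ID into a 10-bit worker ID."""
    if not (0 <= datacenter_id < 32 and 0 <= machine_id < 32):
        raise ValueError("datacenter_id and machine_id must each be in 0-31")
    return (datacenter_id << 5) | machine_id


def split_worker_id(worker_id):
    """Inverse of make_worker_id: return (datacenter_id, machine_id)."""
    return worker_id >> 5, worker_id & 0x1F
```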

If your system has fewer machines, you can use the unused worker ID space for other purposes. Instagram, for example, uses 13 bits for a logical shard ID and 10 bits for a per-shard sequence number.

Sequence: Handling Concurrent Requests Within a Millisecond

The 12-bit sequence counter resets to 0 at each new millisecond and increments for each ID generated within the same millisecond.

Maximum throughput per worker: 4096 IDs/ms = ~4 million IDs/second per worker.

Sequence overflow: If a worker uses all 4096 sequence values (0-4095) within a single millisecond (extremely unlikely for a single worker), the generator waits for the next millisecond.
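The counter logic can be isolated as a pure function. This is a sketch with a hypothetical helper, `next_sequence`, that assumes the clock never moves backward (skew handling is a separate concern). It returns `None` when the millisecond's 4096 values are used up, leaving the caller to wait for the next millisecond.

```python
SEQUENCE_MASK = 0xFFF  # 12 bits: values 0-4095


def next_sequence(last_ms, sequence, now_ms):
    """Return (ms, sequence) for the next ID, or None if this millisecond is exhausted.

    Assumes now_ms >= last_ms.
    """
    if now_ms > last_ms:
        return now_ms, 0                   # new millisecond: reset the counter
    nxt = (sequence + 1) & SEQUENCE_MASK
    if nxt == 0:
        return None                        # 4096 IDs already issued this millisecond
    return now_ms, nxt
```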

Clock Skew Handling

Clock skew is the most dangerous operational issue with Snowflake. If a machine's clock jumps backward (due to NTP adjustment or manual intervention), the ID generator might produce non-monotonic IDs or duplicate IDs.

Strategies for Dealing with Clock Skew

Strategy	Description	Pros	Cons
Fail fast	Throw exception if clock goes backward	Simple, safe	Service disruption
Wait	Sleep until clock catches up	No disruption for small skew	Impractical for large skew
Use previous timestamp	If clock goes backward, reuse last known timestamp	No disruption	Reduces sequence space
Push to reserved bits	Set a flag bit indicating clock was adjusted	Transparency	Uses ID space
NTP hardening	Use PTP or GPS clocks, disable clock jumps	Removes most of the risk	Infrastructure complexity
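As an illustration of the "use previous timestamp" row, here is a Python sketch (the class `ReusePreviousTimestamp` is hypothetical) that never lets the timestamp decrease. When the sequence runs out under the reused timestamp, this variant borrows the next millisecond instead of waiting; that borrowing is one possible design choice, not something the table prescribes.

```python
class ReusePreviousTimestamp:
    """Sketch of the 'use previous timestamp' strategy: the timestamp never goes backward."""

    def __init__(self):
        self.last_ms = -1
        self.sequence = 0

    def next_fields(self, now_ms):
        """Return (timestamp_ms, sequence) for the next ID given the current clock reading."""
        ts = max(now_ms, self.last_ms)     # ignore a clock that moved backward
        if ts == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF
            if self.sequence == 0:
                ts += 1                    # sequence exhausted: borrow the next millisecond
        else:
            self.sequence = 0
        self.last_ms = ts
        return ts, self.sequence
```

The trade-off is visible in the code: while the clock is behind, every ID competes for one millisecond's 4096 sequence values, and borrowed milliseconds push timestamps ahead of the wall clock. Because the state lives only in memory, a restart before the clock catches up could reissue timestamps.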

Twitter's Approach

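Twitter's original implementation actually took the fail-fast route: when the clock moved backward, it logged an error and rejected requests until the clock caught up. The loop below keeps the rest of that logic but substitutes the "wait" strategy, which suits small backward jumps:

```text
state (kept between calls, guarded by a lock):
  last_timestamp = -1
  sequence = 0

generate():
  current_timestamp = now_ms() - custom_epoch

  while current_timestamp < last_timestamp:
    # clock moved backward: wait until it catches up, then re-read it
    sleep(last_timestamp - current_timestamp milliseconds)
    current_timestamp = now_ms() - custom_epoch

  if current_timestamp == last_timestamp:
    sequence = (sequence + 1) % 4096
    if sequence == 0:
      # all 4096 values used this millisecond: spin until the next one
      while current_timestamp <= last_timestamp:
        current_timestamp = now_ms() - custom_epoch
  else:
    sequence = 0

  last_timestamp = current_timestamp
  return (current_timestamp << 22) | (worker_id << 12) | sequence
```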
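For a runnable counterpart, here is a minimal single-process Python sketch of the same loop. It follows the production recommendations below: it waits out a backward jump of up to `max_backward_ms` (1 second by default) and fails fast beyond that. The class name, the injectable `clock` parameter (so tests can fake time), and the threshold are illustrative assumptions; the epoch is Twitter's published value.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04T01:42:54.657Z


class SnowflakeGenerator:
    """Snowflake-style generator: 41-bit ms timestamp, 10-bit worker ID, 12-bit sequence."""

    def __init__(self, worker_id, epoch_ms=TWITTER_EPOCH_MS, clock=None, max_backward_ms=1000):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits (0-1023)")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        # Wall clock in Unix milliseconds; hypothetical hook so tests can inject fake time.
        self.clock = clock or (lambda: time.time_ns() // 1_000_000)
        self.max_backward_ms = max_backward_ms
        self.last_ts = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def _now(self):
        return self.clock() - self.epoch_ms

    def next_id(self):
        with self.lock:
            ts = self._now()
            if ts < self.last_ts:
                behind = self.last_ts - ts
                if behind > self.max_backward_ms:
                    # Large backward jump: fail fast instead of risking duplicates.
                    raise RuntimeError(f"clock moved backward by {behind} ms")
                while ts < self.last_ts:
                    # Small backward jump: sleep until the clock catches up.
                    time.sleep((self.last_ts - ts) / 1000)
                    ts = self._now()
            if ts == self.last_ts:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4096 sequence values used this millisecond: spin to the next one.
                    while ts <= self.last_ts:
                        ts = self._now()
            else:
                self.sequence = 0
            if not 0 <= ts < 1 << 41:
                raise RuntimeError("timestamp outside the 41-bit range")
            self.last_ts = ts
            return (ts << 22) | (self.worker_id << 12) | self.sequence
```

The lock serializes callers, so sleeping during a backward jump stalls every thread using that generator; that is the price of the "wait" strategy.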

Production Recommendations

Scenario	Recommendation
Small backward jump (less than 1 second)	Wait for the clock to catch up and log a warning
Large backward jump (more than 1 second)	Fail fast, alert an operator, reject requests
ntpd started with `-g`	Avoid on ID-generating nodes (it lets ntpd make one arbitrarily large correction, typically at startup)
Cloud environments	Sync to the provider's internal time service (e.g., Amazon Time Sync Service, or the metadata server on Google Compute Engine) using slewing (gradual correction)

⚠️

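Never let the clock be stepped backward on machines running Snowflake ID generators. One-shot tools such as ntpdate, and ntpd started with -g, can correct a large offset in a single jump, and ntpd by default steps any offset larger than 128 ms (its -x option raises that threshold to 600 s so corrections are slewed instead). A backward step of even a few seconds can produce duplicate IDs, for example when a generator restarts and no longer remembers its last timestamp, and those duplicates persist in your data. Use slew mode, which adjusts the clock gradually.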

UUID Comparison

UUIDs (Universally Unique Identifiers) are the most common alternative to Snowflake-style IDs. The choice between them depends on your ordering, storage, and indexing requirements.

UUID v4: Random

UUID v4 generates 122 random bits (6 bits are fixed for version/variant):

Property	UUID v4	Snowflake
Size	128 bits (16 bytes)	64 bits (8 bytes)
Ordering	Random (no temporal order)	Time-ordered
B-tree index fragmentation	Severe (random inserts)	Minimal (sequential inserts)
Uniqueness guarantee	Probabilistic: 2^122 random values, collisions astronomically unlikely	Deterministic, as long as worker IDs are unique and clocks never run backward
Clock dependency	None	Requires a monotonic clock on each worker

The index fragmentation problem with UUID v4:

When you insert random UUIDs into a B-tree index, each insert goes to a random page, causing:

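Many more page splits: sequential inserts only ever split the rightmost leaf page, while random inserts split pages throughout the index
Index bloat: a split leaves both halves partly full, so under random inserts leaf pages settle at roughly 69% full on average (the classic ln 2 result), versus close to the engine's fill factor (often 90% or more) for sequential inserts
Higher cache miss rate: any insert can touch any page, so the working set of index pages grows with the whole index instead of staying at its right edge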

For a table with 100 million rows indexed by UUID v4, you might see database sizes 2-3x larger than with a sequential ID, with insert throughput reduced by 50% or more.
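To see why random keys bloat an index, here is a toy Python simulation of B-tree leaf pages only (internal pages, fill-factor settings, and storage-engine details are deliberately ignored). The assumptions are hypothetical: pages hold 64 keys, a full page splits in half, and an append to the rightmost page starts a fresh page instead, similar to the rightmost-split special case many engines use.

```python
import bisect
import random


def average_leaf_fill(keys, capacity=64):
    """Insert keys into simulated leaf pages and return the mean page fill ratio."""
    pages = [[]]     # leaf pages, each a sorted list of keys
    separators = []  # separators[j] = smallest key of pages[j + 1], used to route inserts
    for key in keys:
        i = bisect.bisect_right(separators, key)
        page = pages[i]
        bisect.insort(page, key)
        if len(page) > capacity:
            if i == len(pages) - 1 and page[-1] == key:
                # Append at the right edge: keep the left page full, start a new page.
                left, right = page[:-1], page[-1:]
            else:
                # Ordinary split: each half ends up about half full.
                mid = len(page) // 2
                left, right = page[:mid], page[mid:]
            pages[i:i + 1] = [left, right]
            separators.insert(i, right[0])
    return sum(len(p) for p in pages) / (len(pages) * capacity)


sequential = average_leaf_fill(range(20_000))
shuffled = average_leaf_fill(random.Random(42).sample(range(10_000_000), 20_000))
print(f"sequential: {sequential:.1%}  random: {shuffled:.1%}")
```

With these parameters sequential keys leave almost every page full, while random keys leave pages noticeably emptier, so the same data needs more pages.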

UUID v7: Time-Ordered

UUID v7 (RFC 9562, published in 2024) addresses the ordering problem by embedding a Unix timestamp in the first 48 bits:

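Property	UUID v7	Snowflake
Size	128 bits	64 bits
Timestamp	Unix epoch (ms), 48 bits	Custom epoch (ms), 41 bits
Ordering	Time-ordered at millisecond granularity; monotonic within a millisecond only if the generator uses one of RFC 9562's monotonicity techniques (such as a counter in the random bits)	Strictly monotonic per worker
Uniqueness	Random remainder (74 bits)	Worker ID + sequence (22 bits)
No coordination needed	Yes (relies on random bits)	Not per ID, but each worker needs a unique worker ID assigned up front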

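PostgreSQL 18 adds a built-in uuidv7() function. On earlier PostgreSQL versions, and on MySQL (whose built-in UUID() produces version 1 UUIDs), v7 values are usually generated in the application or through extensions. UUID v7 is becoming a popular default for new applications that want time-ordered IDs without the operational complexity of Snowflake worker ID assignment.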
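The RFC 9562 layout is simple enough to build by hand. Here is a Python sketch using a hand-written helper, `uuid7`, on top of the standard `uuid.UUID` class: 48 bits of Unix milliseconds, the 4-bit version, 12 random bits, the 2-bit variant, then 62 more random bits. It implements none of the RFC's optional monotonicity techniques, so two IDs from the same millisecond sort randomly relative to each other.

```python
import os
import time
import uuid


def uuid7(unix_ms=None):
    """Build an RFC 9562 version-7 UUID from a Unix-ms timestamp plus random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if not 0 <= unix_ms < 1 << 48:
        raise ValueError("timestamp must fit in 48 bits")
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 of them used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = unix_ms << 80                         # unix_ts_ms: bits 127-80
    value |= 0x7 << 76                            # version 7: bits 79-76
    value |= rand_a << 64                         # rand_a: bits 75-64
    value |= 0b10 << 62                           # variant 10: bits 63-62
    value |= rand_b                               # rand_b: bits 61-0
    return uuid.UUID(int=value)
```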

Other Approaches

FLAKE / Flake IDs

FLAKE is a family of Snowflake-like algorithms that use slightly different bit allocations:

Algorithm	Timestamp bits (tick)	Worker bits	Sequence bits	Total bits	IDs/sec per worker
Snowflake	41 (1 ms)	10	12	64	4,096,000
Sonyflake	39 (10 ms)	16	8	64	25,600
Instagram	41 (1 ms)	13 (logical shard)	10	64	1,024,000 (per shard)
Discord Snowflake	42 (1 ms)	10	12	64	4,096,000
Boundary Flake	64 (1 ms)	48	16	128	65,536,000
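The per-worker throughput column, and each scheme's lifetime, follow directly from the bit allocation. A Python sketch that recomputes them; `flake_capacity` is a hypothetical helper, and years are counted as 365.25 days.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def flake_capacity(timestamp_bits, tick_ms, worker_bits, sequence_bits):
    """Return (IDs/sec per worker, number of workers, years of timestamp range)."""
    ids_per_sec = (1 << sequence_bits) * 1000 // tick_ms
    workers = 1 << worker_bits
    years = (1 << timestamp_bits) * tick_ms / 1000 / SECONDS_PER_YEAR
    return ids_per_sec, workers, years


for name, bits in {
    "Snowflake": (41, 1, 10, 12),
    "Sonyflake": (39, 10, 16, 8),
    "Instagram": (41, 1, 13, 10),
    "Discord": (42, 1, 10, 12),
}.items():
    per_sec, workers, years = flake_capacity(*bits)
    print(f"{name:10} {per_sec:>10,} IDs/s/worker {workers:>6,} workers ~{years:.1f} years")
```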

Sonyflake

Sonyflake (by Sony) uses a 10-millisecond timestamp resolution instead of 1 millisecond. This reduces clock resolution requirements but also reduces throughput:

```text
Sonyflake 64-bit layout:
- 1 bit: unused
- 39 bits: timestamp (10 ms units, ~174 years from custom epoch)
- 8 bits: sequence number (256 per 10 ms per worker)
- 16 bits: machine ID (65,536 workers)

Max throughput: 256 IDs per 10 ms = 25,600 IDs/second per worker
```

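The 16-bit machine ID means Sonyflake supports up to 65,536 workers, at the cost of lower per-worker throughput. By default, Sonyflake derives the machine ID from the lower 16 bits of the host's private IP address, which avoids a separate assignment step as long as those bits are unique across hosts. Good for systems with many small nodes.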

KSUID

KSUID (K-Sortable Unique IDentifier) takes a different approach: 20 bytes (160 bits) in binary, made of a 32-bit timestamp in seconds plus a 128-bit random payload, usually written as a 27-character base62 string:

```text
KSUID layout (20 bytes, 160 bits):
- 4 bytes: timestamp (seconds since the KSUID epoch, 2014-05-13)
- 16 bytes: random payload (128 bits)

String encoding: 27 base62 characters
```

KSUIDs are designed to be sortable by creation time (second-level granularity) while having a massive random space. They're larger than Snowflake IDs but require zero coordination between nodes. No worker ID assignment needed.
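A sketch of KSUID construction in Python, assuming the segmentio/ksuid conventions: an epoch offset of 1,400,000,000 Unix seconds, big-endian bytes, and the base62 alphabet 0-9, A-Z, a-z, whose ASCII order makes string order match numeric order. The `ksuid` function is illustrative, not that library's API.

```python
import os
import time

KSUID_EPOCH = 1_400_000_000  # Unix seconds (2014-05-13T16:53:20Z)
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def ksuid(unix_seconds=None, payload=None):
    """Build a KSUID string: 4-byte timestamp + 16-byte payload, base62-encoded."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    if payload is None:
        payload = os.urandom(16)
    if len(payload) != 16:
        raise ValueError("payload must be 16 bytes")
    # to_bytes raises OverflowError for times before the epoch or beyond 32 bits.
    raw = (unix_seconds - KSUID_EPOCH).to_bytes(4, "big") + payload  # 20 bytes
    n = int.from_bytes(raw, "big")
    digits = []
    for _ in range(27):  # 62**27 > 2**160, so 27 digits always suffice; pads with "0"
        n, r = divmod(n, 62)
        digits.append(BASE62[r])
    return "".join(reversed(digits))
```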

ULID

ULID (Universally Unique Lexicographically Sortable Identifier) is another 128-bit format:

```text
ULID layout (26 characters in Crockford Base32):
- 10 characters: timestamp (milliseconds, 48 bits)
- 16 characters: random (80 bits)

Total: 128 bits, ~1.21e24 unique IDs per millisecond
```

ULIDs are encoded as 26-character Crockford Base32 strings (case-insensitive, canonically uppercase, no special characters) and sort correctly as strings. Since they are 128 bits, they can replace UUIDs in most schemas while adding ordering guarantees.
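A ULID is a 128-bit integer (48-bit millisecond timestamp followed by 80 random bits) written 5 bits at a time in Crockford Base32. A minimal Python sketch; the `ulid` helper is hypothetical and skips the spec's optional monotonic mode for IDs created in the same millisecond.

```python
import os
import time

CROCKFORD_BASE32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # excludes I, L, O, U


def ulid(unix_ms=None, randomness=None):
    """Encode a 48-bit ms timestamp and 80 random bits as a 26-character ULID string."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = int.from_bytes(os.urandom(10), "big")
    if not (0 <= unix_ms < 1 << 48 and 0 <= randomness < 1 << 80):
        raise ValueError("timestamp must fit in 48 bits and randomness in 80 bits")
    value = (unix_ms << 80) | randomness
    # 26 characters x 5 bits = 130 bits; the top 2 bits are always zero.
    return "".join(CROCKFORD_BASE32[(value >> shift) & 0x1F] for shift in range(125, -1, -5))
```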

Comparison: Snowflake vs UUID v4 vs UUID v7 vs ULID

Property	Snowflake	UUID v4	UUID v7	ULID
Size	64 bits (8 bytes)	128 bits (16 bytes)	128 bits (16 bytes)	128 bits (16 bytes)
Ordering	Millisecond-order	Random	Millisecond-order	Millisecond-order
Coordinated?	Worker ID needed	No	No	No
Clock dependent?	Yes (must be monotonic)	No	Yes (tolerant of skew)	Yes
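Generation cost (rough, implementation-dependent)	Very fast (~50ns)	Fast (~100ns)	Fast (~100ns)	Fast (~150ns)
B-tree insert cost	Low (sequential)	High (random)	Low (sequential)	Low (sequential)
Collision risk	None if worker IDs are unique and clocks are monotonic	Negligible (122 random bits)	Extremely low (74 random bits per millisecond)	Extremely low (80 random bits per millisecond)
String form	Decimal (up to 19 digits)	Hex (32 digits, 36 chars with hyphens)	Hex (32 digits, 36 chars with hyphens)	Base32 (26 chars)
Standard	De facto (Twitter)	RFC 9562 (originally RFC 4122)	RFC 9562	Community spec (no RFC)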

Operational Concerns

NTP Dependency

Every Snowflake variant depends on accurate system clocks. Here's what can go wrong:

Scenario	Effect on IDs	Detection
Clock jumps forward	Gap in timestamps, IDs are monotonic but have gaps	Track timestamp delta vs wall clock
Clock jumps backward	Potential duplicate IDs if timestamp overlaps	Track last_timestamp, compare
Clock freezes (stalled)	All IDs get the same timestamp; the sequence exhausts and the generator blocks waiting for a millisecond that never arrives	Watchdog timer for clock progress
Clock is set far in the future	Exhausts ID space early	Validate against NTP time on startup
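The "track timestamp delta vs wall clock" idea can be sketched by comparing how far the wall clock moved against a monotonic clock over the same interval. A Python sketch; `classify_clock`, `ClockWatch`, and the 50 ms tolerance are hypothetical choices. On Linux the monotonic clock is never stepped, though NTP slewing can still adjust its rate slightly.

```python
import time


def classify_clock(wall_delta_ms, mono_delta_ms, tolerance_ms=50.0):
    """Compare wall-clock progress with monotonic-clock progress over one interval."""
    if wall_delta_ms < 0:
        return "jumped_backward"   # risk of duplicate timestamps
    drift = wall_delta_ms - mono_delta_ms
    if drift > tolerance_ms:
        return "jumped_forward"    # gap in IDs, possible early exhaustion
    if drift < -tolerance_ms:
        return "stalled_or_slow"   # wall clock not keeping up with elapsed time
    return "ok"


class ClockWatch:
    """Remembers the last readings so check() can be called periodically (e.g. a watchdog)."""

    def __init__(self):
        self.wall = time.time_ns()
        self.mono = time.monotonic_ns()

    def check(self, tolerance_ms=50.0):
        wall, mono = time.time_ns(), time.monotonic_ns()
        result = classify_clock((wall - self.wall) / 1e6, (mono - self.mono) / 1e6, tolerance_ms)
        self.wall, self.mono = wall, mono
        return result
```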

Worker ID Assignment

For Snowflake-style IDs, each worker needs a unique ID. Common approaches:

Approach	Mechanism	Complexity	Reliability
Static configuration	File-based config per machine	Very low	High (manual)
ZooKeeper / etcd	Sequential ephemeral znodes (ZooKeeper) or leased keys (etcd)	Medium	High
Database sequence	SELECT nextval('worker_id_seq') at startup	Low	Medium (DB dependency)
Hostname hash	Hash of hostname	Very low	Low (collision risk)
Cloud instance metadata	AWS instance ID, GCP instance name	Low	Medium (cloud-specific)

Recommendation: For most deployments, use static configuration with a configuration management system (Ansible, Chef, Kubernetes ConfigMap). ZooKeeper-based assignment is overkill unless you frequently spin up and tear down nodes.
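The "Low (collision risk)" rating for hostname hashing is the birthday problem: hashing n hosts into 1,024 worker IDs collides far sooner than intuition suggests. A Python sketch, assuming a uniformly distributed hash; `collision_probability` is a hypothetical helper.

```python
def collision_probability(workers, id_space=1024):
    """Probability that at least two of `workers` uniformly hashed IDs share a slot."""
    if workers > id_space:
        return 1.0
    p_unique = 1.0
    for k in range(workers):
        p_unique *= (id_space - k) / id_space  # k-th host avoids the k slots already taken
    return 1.0 - p_unique
```

With just 40 hosts, the chance that at least two share a worker ID is roughly 54%.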

Sequence Overflow Handling

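When the sequence counter overflows within a single millisecond, the generator waits for the next millisecond before issuing more IDs.

In practice, sustaining 4096 IDs per worker per millisecond means about 4 million IDs per second on a single machine. Most systems will never hit this limit, and a short burst that does exhaust a millisecond only costs a sub-millisecond wait. If you need more per-worker throughput, consider:

Moving bits from the worker ID field to the sequence (each bit moved doubles per-worker throughput and halves the number of possible workers)
Batching ID generation requests to cut per-call overhead (this does not raise the 4096-per-millisecond ceiling)

Coarser ticks, such as Sonyflake's 10 ms, do not raise throughput on their own: with the same sequence width they lower the number of IDs issued per second.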

Key Takeaways

Snowflake's 64-bit ID layout (1 sign + 41 timestamp + 10 worker + 12 sequence) provides compact, time-ordered, unique IDs with no coordination between workers
A custom epoch starts the 41-bit timestamp's ~69-year window near the service's launch date instead of in 1970
Clock skew is the #1 operational risk—never allow NTP to jump clocks backward on ID-generating nodes; use slew mode only
Sequence overflow handling (wait for next millisecond) is robust for all but the most extreme throughput scenarios
Worker ID assignment is the main operational burden; static configuration via config management is sufficient for most deployments
UUID v4 causes severe B-tree index fragmentation compared to time-ordered IDs (potentially 2-3x larger storage and insert throughput cut by half or more)
UUID v7 (RFC 9562) solves the ordering problem with a 48-bit Unix timestamp prefix, making it a strong alternative without worker ID management
ULID and KSUID offer zero-coordination time-ordered IDs at the cost of larger size (128+ bits)
Choose UUID v7 for simplicity (no worker ID, no clock monotonicity enforcement) and Snowflake for compactness (64-bit, better index efficiency)
NTP hardening with slew mode is mandatory for any production Snowflake deployment

In the next tutorial, we will explore quorum-based reads and writes and the N, R, W model for distributed consistency in systems like Dynamo and Cassandra.
