Snowflake ID and Distributed ID Generation: Time-Ordered Unique IDs at Scale
Understand Twitter's Snowflake algorithm for time-ordered unique IDs across nodes. Compare UUID v4, UUID v7, and FLAKE. Learn about clock skew, worker ID assignment, and sequence handling.
Snowflake ID and Distributed ID Generation
Two weeks before launch, our database started throwing primary key constraint violations. The root cause was elegant in its stupidity: we used auto-increment IDs, and our write path had scaled to two application servers. Both servers grabbed the same "next ID" from the database sequence at nearly the same time. The fix involved switching to a central ID server, which promptly became the bottleneck during peak load.
Distributed ID generation looks trivial until you actually need it at scale. Every node must generate unique IDs without talking to any other node. IDs should ideally be time-ordered for efficient B-tree index insertion. And the whole system must work at millions of IDs per second with no single point of failure.
Why Distributed ID Generation Is Hard
The requirements seem contradictory:
| Requirement | Why It's Hard |
|---|---|
| Globally unique | No two nodes, even under network partition, should generate the same ID |
| Time-ordered | IDs should sort by creation time for efficient database indexing |
| High throughput | Must handle millions of IDs per second across all nodes |
| No coordination | Nodes should generate IDs without talking to each other (no single point of failure) |
| Compact | IDs should fit in 64 bits for efficient storage and indexing |
| Human-readable | Ideally, IDs give some information about when they were created |
Auto-increment fails requirement 3 (centralized bottleneck). Random IDs fail requirement 2 (no ordering). UUIDs fail requirement 5 (128 bits causes index bloat). This is where Snowflake comes in.
Twitter Snowflake
In 2010, Twitter released Snowflake—a system for generating unique, time-ordered, 64-bit IDs at scale. It's arguably the most influential distributed ID algorithm, inspiring dozens of variants across the industry.
The 64-Bit Layout
| Field | Size | Description | Maximum Value |
|---|---|---|---|
| Sign | 1 bit | Always 0 (positive integer) | N/A |
| Timestamp | 41 bits | Milliseconds since custom epoch | 2^41 - 1 = ~69 years |
| Worker ID | 10 bits | Datacenter (5) + Machine (5) | 1024 combinations |
| Sequence | 12 bits | Auto-increment per ms per worker | 4096 per ms |
Timestamp: The Custom Epoch
Snowflake uses a custom epoch (not Unix epoch) to maximize the available timestamp range:
| Epoch | Start Date | Max Date (69 years from start) |
|---|---|---|
| Unix epoch | 1970-01-01 | 2039-?? (already too close) |
| Snowflake epoch | 2010-11-04 (Twitter chose) | ~2079 |
| Discord epoch | 2015-01-01 | ~2084 |
| Instagram epoch | 2011-12-01 | ~2080 |
By choosing a custom epoch closer to the service launch date, you extend the usable lifetime of the ID space. This is why every Snowflake variant tends to choose its own epoch.
Worker ID: 1024 Unique Identifiers
The 10-bit worker ID is typically split:
| Bits | Field | Purpose | Values |
|---|---|---|---|
| 5 bits | Datacenter ID | Identifies the physical data center | 0-31 |
| 5 bits | Machine ID | Identifies the machine within the datacenter | 0-31 |
Total: 32 datacenters x 32 machines = 1024 unique workers.
If your system has fewer machines, you can use the unused worker ID space for other purposes. Instagram, for example, repurposed bits for shard ID + logical timestamp.
Sequence: Handling Concurrent Requests Within a Millisecond
The 12-bit sequence counter resets to 0 at each new millisecond and increments for each ID generated within the same millisecond:
Maximum throughput per worker: 4096 IDs/ms = ~4 million IDs/second per worker.
Sequence overflow: If the sequence reaches 4096 within a single millisecond (extremely unlikely for a single worker), the generator waits for the next millisecond.
Clock Skew Handling
Clock skew is the most dangerous operational issue with Snowflake. If a machine's clock jumps backward (due to NTP adjustment or manual intervention), the ID generator might produce non-monotonic IDs or duplicate IDs.
Strategies for Dealing with Clock Skew
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| Fail fast | Throw exception if clock goes backward | Simple, safe | Service disruption |
| Wait | Sleep until clock catches up | No disruption for small skew | Impractical for large skew |
| Use previous timestamp | If clock goes backward, reuse last known timestamp | No disruption | Reduces sequence space |
| Push to reserved bits | Set a flag bit indicating clock was adjusted | Transparency | Uses ID space |
| NTP hardening | Use PTP or GPS clocks, disable clock jumps | Eliminates the problem | Infrastructure complexity |
Twitter's Approach
Twitter's original implementation used the "wait" strategy:
generate():
last_timestamp = read last generated timestamp
current_timestamp = current time in milliseconds
if current_timestamp < last_timestamp:
wait(last_timestamp - current_timestamp)
if current_timestamp == last_timestamp:
sequence = (sequence + 1) % 4096
if sequence == 0:
wait until next millisecond
else:
sequence = 0
last_timestamp = current_timestamp
return (current_timestamp << 22) | (worker_id << 12) | sequence
Production Recommendations
| Scenario | Recommendation |
|---|---|
| Small clock drift (less than 1 second) | Wait and log a warning |
| Large clock drift (more than 1 second) | Fail fast, alert operator, reject requests |
NTP configured with -g flag | NEVER use this flag on ID-generating nodes (it allows large jumps) |
| Cloud environments | Use instance metadata service or NTP with slewing (gradual correction) |
Never use NTP's "step" mode (ntpdate or ntpd -g) on machines running Snowflake ID generators. A clock jump of even a few seconds can cause duplicate IDs that persist indefinitely. Always use "slew" mode, which adjusts the clock gradually over time.
UUID Comparison
UUIDs (Universally Unique Identifiers) are the most common alternative to Snowflake-style IDs. The choice between them depends on your ordering, storage, and indexing requirements.
UUID v4: Random
UUID v4 generates 122 random bits (6 bits are fixed for version/variant):
| Property | UUID v4 | Snowflake |
|---|---|---|
| Size | 128 bits (16 bytes) | 64 bits (8 bytes) |
| Ordering | Random (no temporal order) | Time-ordered |
| B-tree index fragmentation | Severe (random inserts) | Minimal (sequential inserts) |
| Collision probability | 2^122 (astronomically low) | 2^64 (space divided per worker) |
| Clock dependency | None | Requires synchronized clocks |
The index fragmentation problem with UUID v4:
When you insert random UUIDs into a B-tree index, each insert goes to a random page, causing:
- Excessive page splits (up to 10x more than sequential inserts)
- Index bloat (B-tree fill factor drops from ~67% to ~50%)
- Cache miss rate increases (the working set of index pages grows)
For a table with 100 million rows indexed by UUID v4, you might see database sizes 2-3x larger than with a sequential ID, with insert throughput reduced by 50% or more.
UUID v7: Time-Ordered
UUID v7 (RFC 9562, published in 2024) addresses the ordering problem by embedding a Unix timestamp in the first 48 bits:
| Property | UUID v7 | Snowflake |
|---|---|---|
| Size | 128 bits | 64 bits |
| Timestamp | Unix epoch (ms), 48 bits | Custom epoch (ms), 41 bits |
| Ordering | Monotonic per millisecond | Monotonic per millisecond |
| Uniqueness | Random remainder (74 bits) | Worker ID + sequence (22 bits) |
| No coordination needed | Yes (random is truly random) | Yes (worker ID must be assigned) |
MySQL (8.0+) and PostgreSQL (with uuid-ossp extension) now support UUID v7 generation. It's becoming the default choice for new applications that want time-ordered IDs without the operational complexity of Snowflake worker ID assignment.
Other Approaches
FLAKE / Flake IDs
FLAKE is a family of Snowflake-like algorithms that use slightly different bit allocations:
| Algorithm | Timestamp | Worker | Sequence | Total Bits | IDs/sec per worker |
|---|---|---|---|---|---|
| Snowflake | 41 ms | 10 | 12 | 64 | 4,096,000 |
| Sonyflake | 39 (10ms) | 16 | 8 | 64 | 256,000 |
| Instagram Flake | 41 ms | 13 (shard) | 10 | 64 | 1,024,000 |
| Discord Snowflake | 42 ms | 10 | 12 | 64 | 4,096,000 |
| Boundary Flake | 64 ms | 48 | 16 | 128 | 65,536,000 |
Sonyflake
Sonyflake (by Sony) uses a 10-millisecond timestamp resolution instead of 1 millisecond. This reduces clock resolution requirements but also reduces throughput:
Sonyflake 64-bit layout:
- 1 bit: unused
- 39 bits: timestamp (10ms units, ~174 years from custom epoch)
- 8 bits: sequence number (256 per 10ms per worker)
- 16 bits: machine ID (65536 workers)
Max throughput: 256 IDs per 10ms = 25,600 IDs/second per worker
The 16-bit machine ID means Sonyflake supports up to 65,536 workers without coordination, at the cost of lower per-worker throughput. Good for systems with many small nodes.
KSUID
KSUID (K-Sortable Unique IDentifier) takes a different approach: 27 bytes total, with 32 bits of seconds + 128 bits of random payload:
KSUID layout (27 bytes, 216 bits):
- 4 bytes: timestamp (seconds since epoch)
- 2 bytes: padding/version
- 20 bytes: random payload (160 bits)
KSUIDs are designed to be sortable by creation time (second-level granularity) while having a massive random space. They're larger than Snowflake IDs but require zero coordination between nodes. No worker ID assignment needed.
ULID
ULID (Universally Unique Lexicographically Sortable Identifier) is another 128-bit format:
ULID layout (26 characters in Crockford Base32):
- 10 characters: timestamp (milliseconds, 48 bits)
- 16 characters: random (80 bits)
Total: 128 bits, ~1.21e24 unique IDs per millisecond
ULIDs are encoded as 26-character strings (lowercase, no special chars) and sort correctly as strings. They're a drop-in replacement for UUIDs with ordering guarantees.
Comparison: Snowflake vs UUID v4 vs UUID v7 vs ULID
| Property | Snowflake | UUID v4 | UUID v7 | ULID |
|---|---|---|---|---|
| Size | 64 bits (8 bytes) | 128 bits (16 bytes) | 128 bits (16 bytes) | 128 bits (16 bytes) |
| Ordering | Millisecond-order | Random | Millisecond-order | Millisecond-order |
| Coordinated? | Worker ID needed | No | No | No |
| Clock dependent? | Yes (must be monotonic) | No | Yes (tolerant of skew) | Yes |
| ID generation speed | Very fast (~50ns) | Fast (~100ns) | Fast (~100ns) | Fast (~150ns) |
| B-tree insert cost | Low (sequential) | High (random) | Low (sequential) | Low (sequential) |
| Collision risk | Zero within worker | 2^122 (negligible) | 2^74 (extremely low) | 2^80 (extremely low) |
| Human-readable | Decimal (19 digits) | Hex (32 chars) | Hex (32 chars) | Base32 (26 chars) |
| Standard | De facto (Twitter) | RFC 4122 | RFC 9562 | Draft (no RFC) |
Operational Concerns
NTP Dependency
Every Snowflake variant depends on accurate system clocks. Here's what can go wrong:
| Scenario | Effect on IDs | Detection |
|---|---|---|
| Clock jumps forward | Gap in timestamps, IDs are monotonic but have gaps | Track timestamp delta vs wall clock |
| Clock jumps backward | Potential duplicate IDs if timestamp overlaps | Track last_timestamp, compare |
| Clock freezes (stalled) | All IDs get same timestamp, sequence space may overflow | Watchdog timer for clock progress |
| Clock is set far in the future | Exhausts ID space early | Validate against NTP time on startup |
Worker ID Assignment
For Snowflake-style IDs, each worker needs a unique ID. Common approaches:
| Approach | Mechanism | Complexity | Reliability |
|---|---|---|---|
| Static configuration | File-based config per machine | Very low | High (manual) |
| ZooKeeper / etcd | Sequential ephemeral nodes | Medium | High |
| Database sequence | SELECT nextval from worker_id_seq | Low | Medium (DB dependency) |
| Hostname hash | Hash of hostname | Very low | Low (collision risk) |
| Cloud instance metadata | AWS instance ID, GCP instance name | Low | Medium (cloud-specific) |
Recommendation: For most deployments, use static configuration with a configuration management system (Ansible, Chef, Kubernetes ConfigMap). ZooKeeper-based assignment is overkill unless you frequently spin up and tear down nodes.
Sequence Overflow Handling
When the sequence counter overflows within a single millisecond:
In practice, reaching 4096 IDs per worker per millisecond requires an average of 4 million requests per second on a single machine. Most systems will never hit this limit. If you do, consider:
- Reducing timestamp granularity (use 10ms ticks like Sonyflake, gives more IDs per tick)
- Adding more worker ID bits to the sequence space
- Batching ID generation requests
Key Takeaways
- Snowflake's 64-bit ID layout (1 sign + 41 timestamp + 10 worker + 12 sequence) provides compact, time-ordered, unique IDs with no coordination between workers
- The custom epoch extends usable ID space by ~69 years from the chosen start date
- Clock skew is the #1 operational risk—never allow NTP to jump clocks backward on ID-generating nodes; use slew mode only
- Sequence overflow handling (wait for next millisecond) is robust for all but the most extreme throughput scenarios
- Worker ID assignment is the main operational burden; static configuration via config management is sufficient for most deployments
- UUID v4 causes severe B-tree index fragmentation (up to 3x storage overhead, 50% reduced insert throughput) compared to time-ordered IDs
- UUID v7 (RFC 9562) solves the ordering problem with a 48-bit Unix timestamp prefix, making it a strong alternative without worker ID management
- ULID and KSUID offer zero-coordination time-ordered IDs at the cost of larger size (128+ bits)
- Choose UUID v7 for simplicity (no worker ID, no clock monotonicity enforcement) and Snowflake for compactness (64-bit, better index efficiency)
- NTP hardening with slew mode is mandatory for any production Snowflake deployment
In the next tutorial, we will explore quorum-based reads and writes and the N, R, W model for distributed consistency in systems like Dynamo and Cassandra.