Core Algorithms

Snowflake ID and Distributed ID Generation: Time-Ordered Unique IDs at Scale

Understand Twitter's Snowflake algorithm for time-ordered unique IDs across nodes. Compare UUID v4, UUID v7, and FLAKE. Learn about clock skew, worker ID assignment, and sequence handling.

Snowflakedistributed-IDUUIDtime-orderedclock-skewworker-ID

Snowflake ID and Distributed ID Generation

Two weeks before launch, our database started throwing primary key constraint violations. The root cause was elegant in its stupidity: we used auto-increment IDs, and our write path had scaled to two application servers. Both servers grabbed the same "next ID" from the database sequence at nearly the same time. The fix involved switching to a central ID server, which promptly became the bottleneck during peak load.

Distributed ID generation looks trivial until you actually need it at scale. Every node must generate unique IDs without talking to any other node. IDs should ideally be time-ordered for efficient B-tree index insertion. And the whole system must work at millions of IDs per second with no single point of failure.

Why Distributed ID Generation Is Hard

The requirements seem contradictory:

RequirementWhy It's Hard
Globally uniqueNo two nodes, even under network partition, should generate the same ID
Time-orderedIDs should sort by creation time for efficient database indexing
High throughputMust handle millions of IDs per second across all nodes
No coordinationNodes should generate IDs without talking to each other (no single point of failure)
CompactIDs should fit in 64 bits for efficient storage and indexing
Human-readableIdeally, IDs give some information about when they were created

Auto-increment fails requirement 3 (centralized bottleneck). Random IDs fail requirement 2 (no ordering). UUIDs fail requirement 5 (128 bits causes index bloat). This is where Snowflake comes in.


Twitter Snowflake

In 2010, Twitter released Snowflake—a system for generating unique, time-ordered, 64-bit IDs at scale. It's arguably the most influential distributed ID algorithm, inspiring dozens of variants across the industry.

The 64-Bit Layout

FieldSizeDescriptionMaximum Value
Sign1 bitAlways 0 (positive integer)N/A
Timestamp41 bitsMilliseconds since custom epoch2^41 - 1 = ~69 years
Worker ID10 bitsDatacenter (5) + Machine (5)1024 combinations
Sequence12 bitsAuto-increment per ms per worker4096 per ms

Timestamp: The Custom Epoch

Snowflake uses a custom epoch (not Unix epoch) to maximize the available timestamp range:

EpochStart DateMax Date (69 years from start)
Unix epoch1970-01-012039-?? (already too close)
Snowflake epoch2010-11-04 (Twitter chose)~2079
Discord epoch2015-01-01~2084
Instagram epoch2011-12-01~2080

By choosing a custom epoch closer to the service launch date, you extend the usable lifetime of the ID space. This is why every Snowflake variant tends to choose its own epoch.

Worker ID: 1024 Unique Identifiers

The 10-bit worker ID is typically split:

BitsFieldPurposeValues
5 bitsDatacenter IDIdentifies the physical data center0-31
5 bitsMachine IDIdentifies the machine within the datacenter0-31

Total: 32 datacenters x 32 machines = 1024 unique workers.

If your system has fewer machines, you can use the unused worker ID space for other purposes. Instagram, for example, repurposed bits for shard ID + logical timestamp.

Sequence: Handling Concurrent Requests Within a Millisecond

The 12-bit sequence counter resets to 0 at each new millisecond and increments for each ID generated within the same millisecond:

Maximum throughput per worker: 4096 IDs/ms = ~4 million IDs/second per worker.

Sequence overflow: If the sequence reaches 4096 within a single millisecond (extremely unlikely for a single worker), the generator waits for the next millisecond.


Clock Skew Handling

Clock skew is the most dangerous operational issue with Snowflake. If a machine's clock jumps backward (due to NTP adjustment or manual intervention), the ID generator might produce non-monotonic IDs or duplicate IDs.

Strategies for Dealing with Clock Skew

StrategyDescriptionProsCons
Fail fastThrow exception if clock goes backwardSimple, safeService disruption
WaitSleep until clock catches upNo disruption for small skewImpractical for large skew
Use previous timestampIf clock goes backward, reuse last known timestampNo disruptionReduces sequence space
Push to reserved bitsSet a flag bit indicating clock was adjustedTransparencyUses ID space
NTP hardeningUse PTP or GPS clocks, disable clock jumpsEliminates the problemInfrastructure complexity

Twitter's Approach

Twitter's original implementation used the "wait" strategy:

text
generate():
  last_timestamp = read last generated timestamp
  current_timestamp = current time in milliseconds
  
  if current_timestamp < last_timestamp:
    wait(last_timestamp - current_timestamp)
  
  if current_timestamp == last_timestamp:
    sequence = (sequence + 1) % 4096
    if sequence == 0:
      wait until next millisecond
  else:
    sequence = 0
    
  last_timestamp = current_timestamp
  return (current_timestamp << 22) | (worker_id << 12) | sequence

Production Recommendations

ScenarioRecommendation
Small clock drift (less than 1 second)Wait and log a warning
Large clock drift (more than 1 second)Fail fast, alert operator, reject requests
NTP configured with -g flagNEVER use this flag on ID-generating nodes (it allows large jumps)
Cloud environmentsUse instance metadata service or NTP with slewing (gradual correction)
⚠️

Never use NTP's "step" mode (ntpdate or ntpd -g) on machines running Snowflake ID generators. A clock jump of even a few seconds can cause duplicate IDs that persist indefinitely. Always use "slew" mode, which adjusts the clock gradually over time.


UUID Comparison

UUIDs (Universally Unique Identifiers) are the most common alternative to Snowflake-style IDs. The choice between them depends on your ordering, storage, and indexing requirements.

UUID v4: Random

UUID v4 generates 122 random bits (6 bits are fixed for version/variant):

PropertyUUID v4Snowflake
Size128 bits (16 bytes)64 bits (8 bytes)
OrderingRandom (no temporal order)Time-ordered
B-tree index fragmentationSevere (random inserts)Minimal (sequential inserts)
Collision probability2^122 (astronomically low)2^64 (space divided per worker)
Clock dependencyNoneRequires synchronized clocks

The index fragmentation problem with UUID v4:

When you insert random UUIDs into a B-tree index, each insert goes to a random page, causing:

  • Excessive page splits (up to 10x more than sequential inserts)
  • Index bloat (B-tree fill factor drops from ~67% to ~50%)
  • Cache miss rate increases (the working set of index pages grows)

For a table with 100 million rows indexed by UUID v4, you might see database sizes 2-3x larger than with a sequential ID, with insert throughput reduced by 50% or more.

UUID v7: Time-Ordered

UUID v7 (RFC 9562, published in 2024) addresses the ordering problem by embedding a Unix timestamp in the first 48 bits:

PropertyUUID v7Snowflake
Size128 bits64 bits
TimestampUnix epoch (ms), 48 bitsCustom epoch (ms), 41 bits
OrderingMonotonic per millisecondMonotonic per millisecond
UniquenessRandom remainder (74 bits)Worker ID + sequence (22 bits)
No coordination neededYes (random is truly random)Yes (worker ID must be assigned)

MySQL (8.0+) and PostgreSQL (with uuid-ossp extension) now support UUID v7 generation. It's becoming the default choice for new applications that want time-ordered IDs without the operational complexity of Snowflake worker ID assignment.


Other Approaches

FLAKE / Flake IDs

FLAKE is a family of Snowflake-like algorithms that use slightly different bit allocations:

AlgorithmTimestampWorkerSequenceTotal BitsIDs/sec per worker
Snowflake41 ms1012644,096,000
Sonyflake39 (10ms)16864256,000
Instagram Flake41 ms13 (shard)10641,024,000
Discord Snowflake42 ms1012644,096,000
Boundary Flake64 ms481612865,536,000

Sonyflake

Sonyflake (by Sony) uses a 10-millisecond timestamp resolution instead of 1 millisecond. This reduces clock resolution requirements but also reduces throughput:

text
Sonyflake 64-bit layout:
- 1 bit: unused
- 39 bits: timestamp (10ms units, ~174 years from custom epoch)
- 8 bits: sequence number (256 per 10ms per worker)
- 16 bits: machine ID (65536 workers)

Max throughput: 256 IDs per 10ms = 25,600 IDs/second per worker

The 16-bit machine ID means Sonyflake supports up to 65,536 workers without coordination, at the cost of lower per-worker throughput. Good for systems with many small nodes.

KSUID

KSUID (K-Sortable Unique IDentifier) takes a different approach: 27 bytes total, with 32 bits of seconds + 128 bits of random payload:

text
KSUID layout (27 bytes, 216 bits):
- 4 bytes: timestamp (seconds since epoch)
- 2 bytes: padding/version
- 20 bytes: random payload (160 bits)

KSUIDs are designed to be sortable by creation time (second-level granularity) while having a massive random space. They're larger than Snowflake IDs but require zero coordination between nodes. No worker ID assignment needed.

ULID

ULID (Universally Unique Lexicographically Sortable Identifier) is another 128-bit format:

text
ULID layout (26 characters in Crockford Base32):
- 10 characters: timestamp (milliseconds, 48 bits)
- 16 characters: random (80 bits)

Total: 128 bits, ~1.21e24 unique IDs per millisecond

ULIDs are encoded as 26-character strings (lowercase, no special chars) and sort correctly as strings. They're a drop-in replacement for UUIDs with ordering guarantees.


Comparison: Snowflake vs UUID v4 vs UUID v7 vs ULID

PropertySnowflakeUUID v4UUID v7ULID
Size64 bits (8 bytes)128 bits (16 bytes)128 bits (16 bytes)128 bits (16 bytes)
OrderingMillisecond-orderRandomMillisecond-orderMillisecond-order
Coordinated?Worker ID neededNoNoNo
Clock dependent?Yes (must be monotonic)NoYes (tolerant of skew)Yes
ID generation speedVery fast (~50ns)Fast (~100ns)Fast (~100ns)Fast (~150ns)
B-tree insert costLow (sequential)High (random)Low (sequential)Low (sequential)
Collision riskZero within worker2^122 (negligible)2^74 (extremely low)2^80 (extremely low)
Human-readableDecimal (19 digits)Hex (32 chars)Hex (32 chars)Base32 (26 chars)
StandardDe facto (Twitter)RFC 4122RFC 9562Draft (no RFC)

Operational Concerns

NTP Dependency

Every Snowflake variant depends on accurate system clocks. Here's what can go wrong:

ScenarioEffect on IDsDetection
Clock jumps forwardGap in timestamps, IDs are monotonic but have gapsTrack timestamp delta vs wall clock
Clock jumps backwardPotential duplicate IDs if timestamp overlapsTrack last_timestamp, compare
Clock freezes (stalled)All IDs get same timestamp, sequence space may overflowWatchdog timer for clock progress
Clock is set far in the futureExhausts ID space earlyValidate against NTP time on startup

Worker ID Assignment

For Snowflake-style IDs, each worker needs a unique ID. Common approaches:

ApproachMechanismComplexityReliability
Static configurationFile-based config per machineVery lowHigh (manual)
ZooKeeper / etcdSequential ephemeral nodesMediumHigh
Database sequenceSELECT nextval from worker_id_seqLowMedium (DB dependency)
Hostname hashHash of hostnameVery lowLow (collision risk)
Cloud instance metadataAWS instance ID, GCP instance nameLowMedium (cloud-specific)

Recommendation: For most deployments, use static configuration with a configuration management system (Ansible, Chef, Kubernetes ConfigMap). ZooKeeper-based assignment is overkill unless you frequently spin up and tear down nodes.

Sequence Overflow Handling

When the sequence counter overflows within a single millisecond:

In practice, reaching 4096 IDs per worker per millisecond requires an average of 4 million requests per second on a single machine. Most systems will never hit this limit. If you do, consider:

  1. Reducing timestamp granularity (use 10ms ticks like Sonyflake, gives more IDs per tick)
  2. Adding more worker ID bits to the sequence space
  3. Batching ID generation requests

Key Takeaways

  • Snowflake's 64-bit ID layout (1 sign + 41 timestamp + 10 worker + 12 sequence) provides compact, time-ordered, unique IDs with no coordination between workers
  • The custom epoch extends usable ID space by ~69 years from the chosen start date
  • Clock skew is the #1 operational risk—never allow NTP to jump clocks backward on ID-generating nodes; use slew mode only
  • Sequence overflow handling (wait for next millisecond) is robust for all but the most extreme throughput scenarios
  • Worker ID assignment is the main operational burden; static configuration via config management is sufficient for most deployments
  • UUID v4 causes severe B-tree index fragmentation (up to 3x storage overhead, 50% reduced insert throughput) compared to time-ordered IDs
  • UUID v7 (RFC 9562) solves the ordering problem with a 48-bit Unix timestamp prefix, making it a strong alternative without worker ID management
  • ULID and KSUID offer zero-coordination time-ordered IDs at the cost of larger size (128+ bits)
  • Choose UUID v7 for simplicity (no worker ID, no clock monotonicity enforcement) and Snowflake for compactness (64-bit, better index efficiency)
  • NTP hardening with slew mode is mandatory for any production Snowflake deployment

In the next tutorial, we will explore quorum-based reads and writes and the N, R, W model for distributed consistency in systems like Dynamo and Cassandra.