Latency Simulator

Interactive p99, p95, p50, and average latency calculator. Generate realistic latency distributions, visualize percentiles, and understand the critical difference between average and tail latency in distributed systems.

Latency Distribution Generator
Configure realistic latency parameters and generate random samples. Adjust the sliders to model different system behaviors — from predictable APIs to highly-variable services.

Distribution Model

Default for most real-world latencies — right-skewed, non-negative, multiplicative variance.

Quick Presets

Sets parameters to match real-world system behaviors. Click Generate to create a new random sample.

The typical response time of a normal (non-tail) request. Start with 100 (typical API). Try 30 for cache hit, 500 for slow DB.

Spread of the distribution — higher = more variance in response times. 0.3 = very consistent (CDN). 0.8 = typical API. 1.5 = highly variable (GC-heavy, external calls).

Percentage of requests entering the high-latency tail due to retries, cache misses, GC pauses, cold starts, or downstream slowness. 0% = no tail. 2% = typical. 5%+ = problematic.

Multiplier applied to tail-latency requests relative to normal requests. 2-3× = mild slowdown (e.g. cache miss). = typical slow path. 10×+ = timeout-retry scenario.
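The page does not show how samples are produced. As a rough sketch of how the four parameters above could combine, the Python below assumes a log-normal body whose median equals the typical response time, with a random fraction of requests multiplied by the tail multiplier. The function name, argument names, and the default multiplier of 3.0 are illustrative assumptions, not the simulator's actual implementation.

```python
import math
import random


def generate_latencies(n, median_ms=100.0, sigma=0.8, tail_pct=2.0,
                       tail_multiplier=3.0, seed=None):
    """Draw n latency samples in milliseconds (hypothetical model).

    Assumptions: the body is log-normal with median `median_ms` and
    log-space standard deviation `sigma`; each request independently
    falls into the tail with probability `tail_pct` percent, in which
    case its latency is multiplied by `tail_multiplier`.
    """
    rng = random.Random(seed)
    mu = math.log(median_ms)  # the median of a log-normal is exp(mu)
    samples = []
    for _ in range(n):
        latency = rng.lognormvariate(mu, sigma)
        if rng.random() < tail_pct / 100.0:
            latency *= tail_multiplier  # slow path: retry, cache miss, GC pause, ...
        samples.append(latency)
    return samples


# Example: a "typical API" sample using the values suggested in the text.
sample = generate_latencies(10_000, median_ms=100, sigma=0.8, tail_pct=2.0, seed=42)
```

Passing a seed makes a run reproducible; leaving it as None gives a fresh sample on each call, which is closer to clicking Generate repeatedly.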


Understanding Latency Percentiles

What are latency percentiles? Percentiles divide your latency data into 100 equal groups. p50 (median) is the middle value — half of requests are faster, half are slower. p95 means 95% of requests are at or below this value. p99 means 99% of requests are at or below this value. The higher the percentile, the more it reflects the experience of your slowest users.
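To make the definition concrete, here is a minimal sketch of a percentile calculation using the nearest-rank method. This is one of several common definitions; monitoring tools often interpolate between neighbouring values instead, so their results can differ slightly on small samples. The function name is an illustrative choice.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the samples are at or below it (0 < p <= 100)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # 1 ms, 2 ms, ..., 100 ms
p50 = percentile(latencies_ms, 50)  # 50
p95 = percentile(latencies_ms, 95)  # 95
p99 = percentile(latencies_ms, 99)  # 99
```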

Why is average misleading? Latency distributions are almost always right-skewed and often approximately log-normal. Most requests complete quickly, but a small number take much longer due to GC pauses, network retries, cache misses, or slow dependencies. These outliers pull the average to the right, so it overstates what a typical request experiences while still hiding how slow the worst requests are. A classic example: if 99 requests take 100ms and 1 takes 10s, the average is 199ms, yet 99% of requests completed in 100ms and the one slow request took 50 times the average.
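The arithmetic in that example can be checked directly. The sketch below uses only the numbers from the text; the variable names are illustrative.

```python
import statistics

# 99 requests at 100 ms and a single request at 10 s (10,000 ms).
latencies_ms = [100] * 99 + [10_000]

mean_ms = statistics.mean(latencies_ms)      # (99 * 100 + 10,000) / 100 = 199
median_ms = statistics.median(latencies_ms)  # 100
p99_ms = sorted(latencies_ms)[98]            # nearest-rank p99: 99th of 100 values = 100
max_ms = max(latencies_ms)                   # 10,000
```

In this sample the mean sits at nearly twice the latency that 99 of the 100 requests saw, and at one fiftieth of the slowest request, so it describes neither group well.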

The tail at scale. In distributed systems, the probability of hitting a slow path compounds across service calls. If a request touches 100 microservices each with p99=50ms, the end-to-end p99 is not 50ms. Each call has a 1% chance of exceeding 50ms, so if the calls are independent and the request must wait for all of them, the chance that at least one is slow is 1 - 0.99^100, or about 63%. Google's seminal paper “The Tail at Scale” (Dean & Barroso, 2013) demonstrated that even rare latency spikes in individual components become near-certainty at scale.
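A short sketch of the compounding effect, under two simplifying assumptions that the text does not state: the calls are statistically independent, and the request has to wait for every one of them. The function name is hypothetical.

```python
def prob_any_slow(fanout, per_call_slow_prob=0.01):
    """Probability that at least one of `fanout` independent calls exceeds
    its own p99 (or whichever percentile `per_call_slow_prob` represents)."""
    return 1.0 - (1.0 - per_call_slow_prob) ** fanout


for fanout in (1, 10, 100):
    print(f"fan-out {fanout:>3}: {prob_any_slow(fanout):.1%} of requests hit a slow call")
```

With a fan-out of 100 the result is roughly 63%, so a latency that is a 1-in-100 event for a single service becomes the majority case for the request as a whole. Real services are rarely fully independent, so treat the figure as an illustration rather than a prediction.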

Which metric should you use? For user-facing SLOs, use p99. For internal monitoring and dashboards, use p95 (less noisy). For baseline tracking, use p50. Never rely on average alone. Tools like Datadog, Grafana, and AWS CloudWatch all support percentile-based metrics for latency monitoring. Core Web Vitals (Google) uses p75 for real-user monitoring, a deliberate choice balancing outlier resistance and tail awareness.

p50 (Median)

Best for: baseline tracking, developer dashboards, regression detection.

p95

Best for: monitoring, alerting, capacity planning, trend analysis.

p99

Best for: user-facing SLOs, SLA compliance, tail latency optimization.