Scalability & Performance

Scaling Strategies: Horizontal vs Vertical, Sharding, and Auto-Scaling

Learn how to scale systems to handle millions of users. This guide covers vertical and horizontal scaling, database sharding, caching strategies, and auto-scaling patterns.


Why Scaling Matters

A system that works for 100 users might fail for 100,000. Understanding scaling strategies ensures your system can grow with your user base.

The scalability golden rule: Scale out (horizontal) before scaling up (vertical). Horizontal scaling provides better fault tolerance and cost efficiency at scale.


Vertical vs Horizontal Scaling

Vertical Scaling (Scale Up)

Add more resources to a single machine.

| Aspect | Description |
| --- | --- |
| Pros | Simple; no application changes needed |
| Cons | Hardware limits; single point of failure |
| Cost | Expensive at the high end (big machines) |

Horizontal Scaling (Scale Out)

Add more machines to the pool.

| Aspect | Description |
| --- | --- |
| Pros | Near-unlimited scale; fault tolerance |
| Cons | Complexity (statelessness, data distribution) |
| Cost | Roughly linear with users |
⚠️ Key requirement for horizontal scaling: services must be stateless. Any state (sessions, caches) must be stored externally (Redis, a database).
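To make the stateless requirement concrete, here is a minimal sketch of a request handler that keeps all session state in an external store. The `ExternalSessionStore` class is an in-memory stand-in for something like Redis (an assumption for illustration); the point is that any app instance can serve any request because no state lives in the process.

```python
import json

class ExternalSessionStore:
    """In-memory stand-in for a shared store such as Redis (assumption).
    In production this would be a network service reachable from every
    app instance."""
    def __init__(self):
        self._data = {}

    def get(self, session_id):
        raw = self._data.get(session_id)
        return json.loads(raw) if raw else None

    def set(self, session_id, session):
        self._data[session_id] = json.dumps(session)

store = ExternalSessionStore()

def handle_request(session_id, store):
    """Stateless handler: all session state is read from and written back
    to the external store, never kept in instance memory."""
    session = store.get(session_id) or {"visits": 0}
    session["visits"] += 1
    store.set(session_id, session)
    return session["visits"]
```

Because the handler holds nothing between requests, a load balancer can route consecutive requests from the same user to different instances without breaking the session.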


The Scale Cube

Three Axes of Scaling

| Axis | Strategy | Example |
| --- | --- | --- |
| X | Clone/replicate | Run multiple identical app instances behind a load balancer |
| Y | Split by function | Separate services for users, orders, payments |
| Z | Split by data | Shard users across databases |

Database Scaling

Read Replicas

When to use: When reads >> writes, and eventual consistency is acceptable.
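One way to exploit read replicas is a thin routing layer that sends writes to the primary and reads to a randomly chosen replica. This is a sketch with string placeholders for real connections (an assumption); the naive "starts with SELECT" check and the staleness caveat are the parts that matter.

```python
import random

class RoutingConnection:
    """Route writes to the primary and reads to a replica.
    The connection objects here are illustrative stand-ins (assumption)."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql, params=()):
        # Naive classification: anything starting with SELECT is a read.
        # Replica reads may lag the primary slightly (eventual consistency).
        if sql.lstrip().upper().startswith("SELECT"):
            conn = random.choice(self.replicas)
        else:
            conn = self.primary
        return conn, sql
```

Real ORMs and proxies (e.g. read/write splitting in a connection pooler) do this classification more carefully, but the routing decision is the same.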

Database Sharding

Sharding Strategies

| Strategy | Shard Key | Use Case |
| --- | --- | --- |
| Range-based | User ID ranges (0-1M, 1M-2M) | Sequential access |
| Hash-based | hash(user_id) % num_shards | Even distribution |
| Directory-based | Lookup service | Flexible routing |
| Geo-based | Region/datacenter | Low latency for local users |
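The hash-based row above can be sketched in a few lines. One caveat worth encoding: Python's built-in `hash()` is randomized per process, so a deterministic digest is used instead. The shard count of 4 is illustrative.

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count (assumption)

def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based sharding: a stable hash spreads keys evenly across
    shards. A deterministic digest (here MD5, used non-cryptographically)
    gives the same answer on every app instance."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

The weakness of the modulo approach shows up in the rebalancing challenge below: changing `num_shards` remaps almost every key.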

Challenges with Sharding

  1. Cross-shard queries: Queries spanning multiple shards are expensive
  2. Rebalancing: Adding/removing shards requires data migration
  3. Joins across shards: Denormalize or accept application-level joins
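The rebalancing challenge is the classic motivation for consistent hashing: with `hash(key) % N`, going from N to N+1 shards remaps nearly every key, while a hash ring moves only about 1/N of them. Here is a minimal ring sketch with virtual nodes (the vnode count of 100 is an illustrative choice).

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring (sketch). Each shard is placed on the
    ring many times ("virtual nodes") so keys spread evenly; a key maps
    to the first shard point at or after its own hash, wrapping around."""
    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard)
        for shard in shards:
            for v in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{v}"), shard))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Adding a fourth shard to a three-shard ring relocates roughly a quarter of the keys instead of nearly all of them, which is what makes incremental rebalancing tractable.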

Start with read replicas, not sharding. Most applications can scale to millions of users with read replicas and caching. Only shard when you've exhausted other options.


Caching for Scale

Cache Hierarchy
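A typical hierarchy checks the fastest cache first and falls back tier by tier, e.g. in-process memory, then a distributed cache, then the database. The tier names and promotion policy below are illustrative assumptions, not a specific product's behavior.

```python
class TieredCache:
    """Look up through cache tiers in order, fastest first (sketch)."""
    def __init__(self, tiers, load_from_db):
        self.tiers = tiers            # list of dict-like caches
        self.load_from_db = load_from_db

    def get(self, key):
        for i, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Promote the value into faster tiers for next time.
                for faster in self.tiers[:i]:
                    faster[key] = value
                return value
        # Miss in every tier: load from the source of truth and fill all tiers.
        value = self.load_from_db(key)
        for tier in self.tiers:
            tier[key] = value
        return value
```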

Cache Patterns at Scale

| Pattern | Description | Use Case |
| --- | --- | --- |
| Cache-Aside | App manages cache | General purpose |
| Read-Through | Cache fetches on miss | Simplified app code |
| Write-Through | Write to cache + DB | Consistency priority |
| Write-Behind | Write to cache, async to DB | Write-heavy workloads |
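Cache-aside, the general-purpose pattern from the table, looks like this in application code. The `cache` and `db` parameters are plain dicts standing in for Redis and a database (an assumption for illustration); the 300-second TTL is likewise illustrative.

```python
import time

def get_user(user_id, cache, db, ttl_seconds=300):
    """Cache-aside: check the cache first, fall back to the database on a
    miss, then populate the cache so the next read is a hit."""
    entry = cache.get(user_id)
    if entry is not None and entry["expires"] > time.time():
        return entry["value"]          # cache hit
    value = db[user_id]                # cache miss: read the source of truth
    cache[user_id] = {"value": value, "expires": time.time() + ttl_seconds}
    return value
```

Note the trade-off the test below makes visible: until the TTL expires, the cache can serve a value the database has since changed.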

Auto-Scaling

Automatically adjust capacity based on demand.

Scaling Metrics

| Metric | Trigger Threshold | Action |
| --- | --- | --- |
| CPU | > 70% sustained | Scale out |
| Memory | > 80% | Scale out |
| Request count | Predictable pattern | Scheduled scaling |
| Latency | P99 > threshold | Scale out |
| Queue depth | Growing | Scale out |
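Metrics-based policies usually reduce to target tracking. A sketch modeled on the Kubernetes HPA formula, desired = ceil(current * metric / target), with illustrative min/max bounds:

```python
import math

def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas=2, max_replicas=20):
    """Target-tracking scaling rule, modeled on the Kubernetes HPA
    formula: desired = ceil(current * metric / target). The bounds are
    illustrative assumptions."""
    desired = math.ceil(current_replicas * current_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas at 90% CPU against a 60% target yields 6 replicas; production autoscalers add stabilization windows on top of this to avoid flapping.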

Scale-Up vs Scale-Out

| Aspect | Scale Up (Vertical) | Scale Out (Horizontal) |
| --- | --- | --- |
| Speed | Minutes | Seconds |
| Maximum | Limited by hardware | Virtually unlimited |
| Cost | Non-linear (expensive at the top) | Linear |
| Complexity | Low | Higher |
| Risk | Single point of failure | Better fault tolerance |

CDN and Edge Computing

CDN Caching Strategy

| Content Type | TTL | Strategy |
| --- | --- | --- |
| Static assets | Days | Cache long |
| API responses | Minutes | Short TTL |
| Personalized | None | Don't cache |
| User-generated | Configurable | Balance freshness |
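In practice the table above becomes a set of `Cache-Control` response headers that the CDN honors. The directive strings below use real HTTP cache directives, but the specific TTL values and the category names are illustrative assumptions.

```python
# Illustrative Cache-Control policies matching the table above
# (TTL values are assumptions, not CDN requirements).
CACHE_POLICIES = {
    "static_asset": "public, max-age=86400, immutable",    # cache for a day
    "api_response": "public, max-age=60",                  # short TTL
    "personalized": "private, no-store",                   # never cache at the edge
    "user_generated": "public, max-age=300, stale-while-revalidate=60",
}

def cache_header(content_type: str) -> str:
    """Default to no-store so unknown content is never cached by mistake."""
    return CACHE_POLICIES.get(content_type, "no-store")
```

A common companion trick for static assets: put a content hash in the URL so "cache for days" never serves a stale file after a deploy.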

Real-World Scaling Example: Twitter/X

💡 Timeline serving: Twitter pre-computes and caches timelines in Redis. When you open the app, your feed is served from cache, not computed on demand.
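The pre-computation described above is often called fan-out on write. A toy sketch of the idea (the follower graph, timeline cap, and in-memory structures are all illustrative stand-ins for Twitter's actual Redis-backed system):

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # cap per cached timeline (illustrative assumption)

timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))
followers = {"alice": ["bob", "carol"]}  # sample follower graph (assumption)

def post_tweet(author, tweet_id):
    """Fan-out on write: push the new tweet id onto every follower's
    cached timeline, so reads become a simple cache fetch."""
    for follower in followers.get(author, []):
        timelines[follower].appendleft(tweet_id)

def read_timeline(user, count=50):
    """Reading is cheap: just slice the pre-computed list."""
    return list(timelines[user])[:count]
```

The trade-off: writes get expensive for accounts with millions of followers, which is why real systems mix fan-out on write with fan-out on read for celebrity accounts.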


What to Remember for Interviews

  1. Stateless design: Enable horizontal scaling by storing state externally
  2. Caching first: Before scaling infrastructure, optimize with caching
  3. Read replicas: Simple way to scale reads
  4. Sharding: When you need to scale writes, shard by a good key
  5. Auto-scaling: Respond to demand automatically with metrics-based policies

Practice: Design the scaling strategy for an e-commerce site expecting 10x traffic during Black Friday. What components need scaling? What can stay static? How would you test it?