Scaling Strategies: Horizontal vs Vertical, Sharding, and Auto-Scaling
Learn how to scale systems to handle millions of users. This guide covers vertical and horizontal scaling, database sharding, caching strategies, and auto-scaling patterns.
Why Scaling Matters
A system that works for 100 users might fail for 100,000. Understanding scaling strategies ensures your system can grow with your user base.
The scalability golden rule: Scale out (horizontal) before scaling up (vertical). Horizontal scaling provides better fault tolerance and cost efficiency at scale.
Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up)
Add more resources to a single machine.
| Aspect | Description |
|---|---|
| Pros | Simple, no code changes needed |
| Cons | Hardware limits, single point of failure |
| Cost | Expensive at high end (big machines) |
Horizontal Scaling (Scale Out)
Add more machines to the pool.
| Aspect | Description |
|---|---|
| Pros | Near-unlimited scale, fault tolerance |
| Cons | Complexity (statelessness, data distribution) |
| Cost | Linear cost with users |
Key requirement for horizontal scaling: Services must be stateless. Any state (sessions, cache) must be stored externally (Redis, database).
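The statelessness requirement can be sketched with a tiny session service. The plain dict below is an illustrative stand-in for an external store such as Redis; in production, every app instance would talk to the same shared store, so no instance needs server affinity.

```python
import uuid

# A plain dict stands in for an external store such as Redis.
# In production, every stateless app instance points at the same store.
session_store = {}

def create_session(user_id):
    """Persist session state externally so any instance can serve the user."""
    session_id = str(uuid.uuid4())
    session_store[session_id] = {"user_id": user_id, "cart": []}
    return session_id

def handle_request(session_id):
    # Any instance can look up the session -- no sticky sessions needed.
    session = session_store.get(session_id)
    if session is None:
        raise KeyError("session expired or unknown")
    return session["user_id"]
```

Because the instance holds nothing between requests, a load balancer can route each request to any healthy instance, and instances can be added or removed freely.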
The Scale Cube
Three Axes of Scaling
| Axis | Strategy | Example |
|---|---|---|
| X | Clone/replicate | Run multiple identical app instances |
| Y | Split by function | Separate services for users, orders, payments |
| Z | Split by data | Shard users across databases |
These axes follow the AKF Scale Cube: X is cloning, Y is functional decomposition, Z is data partitioning.
Database Scaling
Read Replicas
When to use: When reads >> writes, and eventual consistency is acceptable.
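Read replicas are usually paired with read/write splitting in the application or a proxy. Below is a minimal sketch, assuming hypothetical connection names (`primary-db`, `replica-1`, `replica-2`) and a naive SELECT check; real routers also handle transactions and replication lag.

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary; spread reads across replicas round-robin."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def for_query(self, sql):
        # Simplistic rule: anything that isn't a SELECT goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
```

Note that a read issued immediately after a write may hit a replica that has not yet applied the change, which is the eventual-consistency caveat mentioned above.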
Database Sharding
Sharding Strategies
| Strategy | Shard Key | Use Case |
|---|---|---|
| Range-based | User ID ranges (0-1M, 1M-2M) | Sequential access |
| Hash-based | hash(user_id) % num_shards | Even distribution |
| Directory-based | Lookup service | Flexible routing |
| Geo-based | Region/datacenter | Low latency for local users |
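The hash-based row can be made concrete with a few lines. One detail worth noting: the shard function must be stable across processes, so a cryptographic digest is used here rather than Python's builtin `hash()`, which is randomized per process.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id):
    """Hash-based sharding: map a user to a shard via a stable hash.

    Uses md5 (any stable hash works) instead of Python's builtin hash(),
    which varies between processes and would break routing.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Every app instance computes the same shard for the same user, so no lookup service is needed, at the cost of painful rebalancing when `NUM_SHARDS` changes.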
Challenges with Sharding
- Cross-shard queries: Queries spanning multiple shards are expensive
- Rebalancing: Adding/removing shards requires data migration
- Joins across shards: Denormalize or accept application-level joins
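One common mitigation for the rebalancing cost is consistent hashing: instead of `hash(key) % N` (which remaps almost every key when N changes), keys and nodes are placed on a ring, so adding or removing a shard moves only roughly 1/N of the keys. A minimal sketch with virtual nodes:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing: adding/removing a shard moves only ~1/N of keys,
    instead of remapping nearly everything as hash(key) % N does."""

    def __init__(self, nodes, vnodes=100):
        # Each node appears `vnodes` times on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        # Walk clockwise to the first node at or after the key's position.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

This is the technique behind stores like DynamoDB and Cassandra; the sketch omits replication and weighted nodes.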
Start with read replicas, not sharding. Most applications can scale to millions of users with read replicas and caching. Only shard when you've exhausted other options.
Caching for Scale
Cache Hierarchy
Cache Patterns at Scale
| Pattern | Description | Use Case |
|---|---|---|
| Cache-Aside | App manages cache | General purpose |
| Read-Through | Cache fetches on miss | Simplified app code |
| Write-Through | Write to cache + DB | Consistency priority |
| Write-Behind | Write to cache, async to DB | Write-heavy workloads |
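Cache-aside, the most common of these patterns, is easy to show in a few lines. The dicts below are illustrative stand-ins for Redis and the database; a real implementation would also set a TTL and handle invalidation on writes.

```python
cache = {}                              # stands in for Redis/Memcached
database = {"user:1": {"name": "Ada"}}  # stands in for the primary DB

def get_user(key):
    """Cache-aside: the app checks the cache first, populating it on a miss."""
    if key in cache:
        return cache[key]           # cache hit: no DB round-trip
    value = database.get(key)       # cache miss: read the source of truth
    if value is not None:
        cache[key] = value          # populate for subsequent reads
    return value
```

The first call for a key pays the database cost; every later call is served from the cache until the entry is evicted or invalidated.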
Auto-Scaling
Automatically adjust capacity based on demand.
Scaling Metrics
| Metric | Typical Trigger | Action |
|---|---|---|
| CPU | > 70% sustained | Add instances |
| Memory | > 80% | Add instances |
| Request count | Predictable pattern | Scheduled scaling |
| Latency | P99 > threshold | Add instances |
| Queue depth | Growing | Add instances |
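A CPU-based policy can be sketched as target tracking: size the fleet so average utilization lands near the target. The formula below is an illustrative assumption (similar in spirit to cloud target-tracking policies), not any provider's exact algorithm.

```python
import math

def desired_instances(current, cpu_utilization, target=0.70):
    """Target tracking (illustrative): scale the fleet proportionally
    so average CPU settles near `target` (e.g. 0.70 = 70%)."""
    if cpu_utilization <= 0:
        return current  # no signal; hold steady
    # If utilization is above target, this grows the fleet; below, it shrinks.
    return max(1, math.ceil(current * cpu_utilization / target))
```

For example, 4 instances running at 90% CPU against a 70% target yields 6 instances; real systems add cooldown periods so the fleet doesn't oscillate.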
Scale-Up vs Scale-Out
| Aspect | Scale Up (Vertical) | Scale Out (Horizontal) |
|---|---|---|
| Speed | Minutes (usually needs a restart) | Seconds to minutes |
| Maximum | Limited by hardware | Virtually unlimited |
| Cost | Non-linear (expensive at top) | Linear |
| Complexity | Low | Higher |
| Risk | Single point of failure | Better fault tolerance |
CDN and Edge Computing
CDN Caching Strategy
| Content Type | TTL | Strategy |
|---|---|---|
| Static assets | Days | Cache long |
| API responses | Minutes | Short TTL |
| Personalized | None | Don't cache |
| User-generated | Configurable | Balance freshness |
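The TTL table maps naturally onto `Cache-Control` headers set at the origin. The values below are assumptions for illustration; tune them per asset class.

```python
def cache_control_for(content_type):
    """Map a content class to a Cache-Control header.

    Values are illustrative assumptions matching the table above:
    static assets cache for days, API responses for minutes,
    personalized content not at all.
    """
    headers = {
        "static": "public, max-age=604800, immutable",  # 7 days
        "api": "public, max-age=60",                    # 1 minute
        "personalized": "private, no-store",            # never cache at the edge
    }
    return headers.get(content_type, "no-cache")
```

The CDN honors these headers, so the origin controls freshness without per-object CDN configuration.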
Real-World Scaling Example: Twitter/X
Timeline serving: Twitter pre-computes and caches timelines in Redis using fan-out-on-write: when someone tweets, the tweet is pushed into each follower's cached timeline. When you open the app, your feed is served from cache, not computed on demand. For accounts with millions of followers, pure fan-out is too expensive, so a hybrid approach merges those authors' tweets in at read time.
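The pre-computed-timeline idea can be sketched in memory. `followers` and `timelines` below are illustrative stand-ins for the social graph and the Redis timeline cache; real fan-out happens asynchronously through queues.

```python
from collections import defaultdict, deque

followers = {"alice": ["bob", "carol"]}             # who follows whom
timelines = defaultdict(lambda: deque(maxlen=800))  # bounded per-user cache

def post_tweet(author, text):
    """Fan-out-on-write: push the tweet into every follower's cached
    timeline at write time, so reads become a simple cache fetch."""
    for follower in followers.get(author, []):
        timelines[follower].appendleft((author, text))

def read_timeline(user):
    # Opening the app is just a read of the precomputed cache.
    return list(timelines[user])
```

The bounded deque mirrors how cached timelines keep only the most recent entries; older tweets fall back to slower storage.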
What to Remember for Interviews
- Stateless design: Enable horizontal scaling by storing state externally
- Caching first: Before scaling infrastructure, optimize with caching
- Read replicas: Simple way to scale reads
- Sharding: When you need to scale writes, shard by a good key
- Auto-scaling: Respond to demand automatically with metrics-based policies
Practice: Design the scaling strategy for an e-commerce site expecting 10x traffic during Black Friday. What components need scaling? What can stay static? How would you test it?