Load Balancing Strategies: L4 vs L7, Algorithms, and Health Checks
Learn how load balancers distribute traffic, the differences between Layer 4 and Layer 7, common algorithms (round robin, least connections, IP hash), and health check mechanisms.
Why Load Balancing Matters
In any distributed system, a single server cannot handle all the traffic. Load balancers are the traffic cops that distribute incoming requests across multiple servers to ensure no single server becomes a bottleneck.
Interview insight: When designing any scalable system, you'll almost always need a load balancer. Be prepared to explain why you chose a particular type and algorithm.
Layer 4 (Transport Layer) vs Layer 7 (Application Layer) Load Balancers
Load balancers operate at different layers of the OSI model, each with distinct capabilities.
Layer 4 Load Balancers
- Operate at the transport layer (TCP/UDP)
- Make routing decisions based on IP addresses and port numbers
- Cannot see the content of the requests (e.g., HTTP headers, URLs, cookies)
- Faster and less resource-intensive because they don't inspect application data
- Examples: AWS Network Load Balancer, HAProxy in TCP mode
Layer 7 Load Balancers
- Operate at the application layer (e.g., HTTP, HTTPS, gRPC)
- Can inspect and make decisions based on HTTP headers, URLs, cookies, and even the request body
- Enable advanced features like SSL termination, content-based routing, and rate limiting
- More flexible but slightly slower due to deep packet inspection
- Examples: AWS Application Load Balancer, NGINX, Envoy
Key trade-off: L4 is faster and simpler; L7 is more intelligent and flexible. Choose L4 for raw TCP traffic (like databases) and L7 for HTTP/WebSocket traffic where you need routing based on content.
Common Load Balancing Algorithms
The algorithm determines how the load balancer selects a backend server for each request.
Round Robin
- Distributes requests sequentially to each server in the pool
- Simple and works well when all servers have similar capacity
- Does not account for server load or response time
Weighted Round Robin
- Assigns weights to servers based on their capacity
- More powerful servers get more requests
- Example: Server A (weight 3) gets 3 requests for every 1 request to Server B (weight 1)
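The two round-robin variants above can be sketched in a few lines. This is a simplified illustration, not any particular load balancer's implementation; the class name and the dict-of-weights input are our own:

```python
import itertools

class WeightedRoundRobin:
    """Expand each server into the pool `weight` times, then cycle.
    With all weights set to 1 this degenerates to plain round robin."""

    def __init__(self, servers):
        # servers: dict mapping server name -> integer weight
        pool = [name for name, w in servers.items() for _ in range(w)]
        self._cycle = itertools.cycle(pool)

    def next_server(self):
        return next(self._cycle)

lb = WeightedRoundRobin({"A": 3, "B": 1})
picks = [lb.next_server() for _ in range(8)]
# Over 8 picks, server A receives 6 requests and server B receives 2
```

Note that this naive expansion sends requests to A in bursts (A, A, A, B, ...). Production implementations typically interleave picks instead, e.g., NGINX's "smooth" weighted round robin, which spreads the weighted selections evenly over time.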
Least Connections
- Sends requests to the server with the fewest active connections
- Better for long-lived connections (like WebSockets or database connections)
- Accounts for varying request processing times
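A minimal sketch of least connections, assuming the load balancer itself tracks in-flight connections per backend (real implementations also handle concurrency and server removal):

```python
class LeastConnections:
    """Track active connections per server and always pick the least busy."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # Pick the server with the fewest in-flight connections
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the connection closes so counts stay accurate
        self.active[server] -= 1

lb = LeastConnections(["A", "B"])
first = lb.acquire()   # both idle, so ties break deterministically
second = lb.acquire()  # goes to the other, now less-busy, server
```

The key difference from round robin is the `release` step: because the balancer knows when connections end, slow requests naturally steer new traffic away from the server handling them.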
Least Response Time
- Sends requests to the server with the fastest average response time
- Requires monitoring response times, adding overhead
- Good for minimizing latency
IP Hash
- Uses the client's IP address to determine which server to send the request to
- Ensures the same client always goes to the same server (session persistence)
- Useful when you can't use sticky cookies (e.g., non-HTTP traffic)
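IP hash reduces to hashing the client address onto the server pool. A sketch, using SHA-256 purely for a stable, well-distributed hash (real balancers use faster non-cryptographic hashes):

```python
import hashlib

def pick_server(client_ip, servers):
    # Stable hash: the same client IP always maps to the same server
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
assert pick_server("203.0.113.7", servers) == pick_server("203.0.113.7", servers)
```

One caveat: with plain `hash % N`, adding or removing a server remaps most clients to different backends, breaking session persistence. Consistent hashing is the usual mitigation when the pool changes frequently.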
Random with Two Choices
- Picks two servers at random and selects the one with the lower load
- Provides good load distribution with less overhead than checking all servers
- Used in load balancers such as NGINX (the `random two` upstream setting) and Envoy
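The core of "power of two choices" is just a sample-and-compare. A sketch, assuming the balancer keeps a per-server load metric such as active connections:

```python
import random

def pick_two_choices(loads):
    """loads: dict mapping server -> current load (e.g., active connections).
    Sample two servers at random and return the less loaded one."""
    a, b = random.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b

loads = {"A": 10, "B": 2, "C": 7}
choice = pick_two_choices(loads)
# The chosen server is never the busier of the two sampled
```

The appeal is that comparing two random servers gets load distribution close to scanning the entire pool (as least connections does) at a fraction of the bookkeeping cost, which matters when many balancer instances share stale load data.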
Sticky sessions: Algorithms like IP hash enable sticky sessions, but they can lead to uneven load distribution if certain clients generate more traffic. Consider using cookie-based stickiness (at L7) for better balance.
Health Checks: Ensuring Traffic Goes Only to Healthy Servers
Load balancers must avoid sending traffic to servers that are down or unhealthy. Health checks are periodic probes that determine a server's fitness.
Types of Health Checks
- TCP check: Can you establish a TCP connection to the server's port? (L4)
- HTTP check: Send an HTTP GET request and expect a 2xx or 3xx response. (L7)
- HTTPS check: Same as HTTP but over TLS.
- Custom check: Run a script or endpoint that returns 200 if the server is healthy (e.g., checks disk space, queue depth, etc.)
Health Check Configuration
- Interval: How often to perform the check (e.g., every 5 seconds)
- Timeout: How long to wait for a response before marking the check as failed
- Unhealthy threshold: Number of consecutive failed checks before marking the server as unhealthy
- Healthy threshold: Number of consecutive successful checks required to mark an unhealthy server as healthy again (prevents flapping)
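The threshold logic above is a small state machine: a server flips to unhealthy only after N consecutive failures, and back to healthy only after M consecutive successes. A sketch (class and field names are ours, not any vendor's API):

```python
class HealthTracker:
    """Mark a server unhealthy after `unhealthy_threshold` consecutive
    failed checks, and healthy again only after `healthy_threshold`
    consecutive successes -- the asymmetry prevents flapping."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive checks pushing toward a state change

    def record(self, success):
        if self.healthy:
            self._streak = self._streak + 1 if not success else 0
            if self._streak >= self.unhealthy_threshold:
                self.healthy = False
                self._streak = 0
        else:
            self._streak = self._streak + 1 if success else 0
            if self._streak >= self.healthy_threshold:
                self.healthy = True
                self._streak = 0
        return self.healthy

t = HealthTracker()
t.record(False); t.record(False)   # still healthy: 2 of 3 failures
t.record(False)                    # third consecutive failure: unhealthy
t.record(True)                     # one success is not enough to recover
t.record(True)                     # second consecutive success: healthy again
```

Note that any success while healthy (or failure while unhealthy) resets the streak, which is exactly why "consecutive" matters in the configuration.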
Best practice: Use an endpoint like /health that checks critical dependencies (database, cache, etc.) and returns 200 only if all are healthy. Avoid overly complex health checks that might fail transiently.
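A minimal /health endpoint along those lines, using only Python's standard library. The dependency checks (`check_database`, `check_cache`) are hypothetical stubs standing in for real probes with short timeouts:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def check_database():
    return True  # stub: e.g., run `SELECT 1` against the primary, short timeout

def check_cache():
    return True  # stub: e.g., PING the cache, short timeout

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        # 200 only if every critical dependency is reachable
        ok = check_database() and check_cache()
        self.send_response(200 if ok else 503)
        self.end_headers()

# To serve health checks on port 8080:
# HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Returning 503 (rather than an error page with 200) matters because most load balancers only look at the status code. Keep the checks fast and tolerant of transient blips, or the balancer may drain a healthy server.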
Putting It All Together: Choosing a Load Balancer
When designing a system, consider:
- Traffic type: Is it HTTP/WebSocket (L7) or raw TCP (L4)?
- Features needed: Do you need SSL termination, URL-based routing, or WAF capabilities? (L7)
- Performance requirements: Do you need the lowest possible latency? (L4 might be better)
- Algorithm suitability: Match the algorithm to your workload (e.g., least connections for long-lived requests, round robin for short-lived HTTP requests).
- Observability: Choose a load balancer that provides good metrics and logging.
Common mistake: Overlooking health checks. A misconfigured health check can lead to all servers being marked unhealthy (too strict) or unhealthy servers receiving traffic (too lax).
What to Remember for Interviews
- L4 vs L7: Know the differences in what they can inspect and their typical use cases.
- Algorithms: Be able to explain round robin, least connections, and IP hash, and when to use each.
- Health checks: Understand why they're critical and how to configure them properly.
- Sticky sessions: Know the trade-offs between IP hash and cookie-based stickiness.
- Cloud vs self-managed: Be familiar with offerings like AWS ALB/NLB, GCP Cloud Load Balancing, and open-source options like HAProxy and NGINX.
Practice: Draw a diagram of a typical web architecture with clients, L7 load balancer, web servers, and a database. Explain how you'd choose the load balancer type, algorithm, and health check for each layer.