Core Building Blocks

Load Balancing Strategies: L4 vs L7, Algorithms, and Health Checks

Learn how load balancers distribute traffic, the differences between Layer 4 and Layer 7, common algorithms (round robin, least connections, IP hash), and health check mechanisms.

18 min read · load balancing · L4 · L7 · algorithms · health checks · system design

Why Load Balancing Matters

In any distributed system, a single server cannot handle all the traffic. Load balancers are the traffic cops that distribute incoming requests across multiple servers to ensure no single server becomes a bottleneck.

Interview insight: When designing any scalable system, you'll almost always need a load balancer. Be prepared to explain why you chose a particular type and algorithm.


Layer 4 (Transport Layer) vs Layer 7 (Application Layer) Load Balancers

Load balancers operate at different layers of the OSI model, each with distinct capabilities.

Layer 4 Load Balancers

  • Operate at the transport layer (TCP/UDP)
  • Make routing decisions based on IP addresses and port numbers
  • Cannot see the content of the requests (e.g., HTTP headers, URLs, cookies)
  • Faster and less resource-intensive because they don't inspect application data
  • Examples: AWS Network Load Balancer, HAProxy in TCP mode

Layer 7 Load Balancers

  • Operate at the application layer (HTTP)
  • Can inspect and make decisions based on HTTP headers, URLs, cookies, and even the request body
  • Enable advanced features like SSL termination, content-based routing, and rate limiting
  • More flexible but somewhat slower, because they must terminate connections and parse application-layer data
  • Examples: AWS Application Load Balancer, NGINX, Envoy

💡 Key trade-off: L4 is faster and simpler; L7 is more intelligent and flexible. Choose L4 for raw TCP traffic (like databases) and L7 for HTTP/WebSocket traffic where you need routing based on content.


Common Load Balancing Algorithms

The algorithm determines how the load balancer selects a backend server for each request.

Round Robin

  • Distributes requests sequentially to each server in the pool
  • Simple and works well when all servers have similar capacity
  • Does not account for server load or response time
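Round robin is simple enough to sketch in a few lines. This is a minimal illustration, assuming a fixed backend pool; the server names are made up:

```python
from itertools import cycle

servers = ["web-1", "web-2", "web-3"]  # hypothetical backend pool
rr = cycle(servers)  # endless iterator over the pool, in order

def pick_server():
    """Return the next server in strict rotation."""
    return next(rr)

# Six requests wrap around the pool exactly twice
assert [pick_server() for _ in range(6)] == ["web-1", "web-2", "web-3"] * 2
```

Note that a real load balancer would also need the rotation to be thread-safe and to skip servers currently marked unhealthy.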

Weighted Round Robin

  • Assigns weights to servers based on their capacity
  • More powerful servers get more requests
  • Example: Server A (weight 3) gets 3 requests for every 1 request to Server B (weight 1)
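The 3:1 split above can be sketched by repeating each server in the rotation according to its weight. A minimal illustration, with hypothetical server names and weights:

```python
from itertools import cycle

# Hypothetical pool: weight = relative share of traffic
weights = {"big-server": 3, "small-server": 1}

# Simplest interleaving: repeat each server `weight` times in the cycle
schedule = cycle([s for s, w in weights.items() for _ in range(w)])

picks = [next(schedule) for _ in range(8)]
# Over 8 requests, big-server gets 6 and small-server gets 2 (a 3:1 ratio)
assert picks.count("big-server") == 6
assert picks.count("small-server") == 2
```

Production implementations (e.g., NGINX's smooth weighted round robin) interleave picks more evenly rather than sending bursts to the same server.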

Least Connections

  • Sends requests to the server with the fewest active connections
  • Better for long-lived connections (like WebSockets or database connections)
  • Accounts for varying request processing times
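Least connections reduces to picking the minimum over a connection-count table. A sketch, assuming the balancer tracks in-flight connections per backend (the counts here are an invented snapshot):

```python
# Active connection counts per backend (hypothetical snapshot)
active = {"web-1": 12, "web-2": 3, "web-3": 7}

def pick_least_connections(conns):
    """Choose the backend with the fewest in-flight connections."""
    return min(conns, key=conns.get)

target = pick_least_connections(active)
assert target == "web-2"   # fewest active connections
active[target] += 1        # the balancer records the new connection
```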

Least Response Time

  • Sends requests to the server with the fastest average response time
  • Requires monitoring response times, adding overhead
  • Good for minimizing latency

IP Hash

  • Uses the client's IP address to determine which server to send the request to
  • Ensures the same client always goes to the same server (session persistence)
  • Useful when you can't use sticky cookies (e.g., non-HTTP traffic)
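The mapping from client IP to server is just a hash modulo the pool size. A minimal sketch with an illustrative pool:

```python
import hashlib

servers = ["web-1", "web-2", "web-3"]  # hypothetical pool

def pick_by_ip(client_ip: str) -> str:
    """Hash the client IP to a stable index into the pool."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# The same client always lands on the same server
assert pick_by_ip("203.0.113.7") == pick_by_ip("203.0.113.7")
```

One caveat: with plain modulo hashing, adding or removing a server remaps most clients. Consistent hashing is the usual fix when the pool changes often.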

Random with Two Choices

  • Picks two servers at random and selects the one with the lower load
  • Provides good load distribution with less overhead than checking all servers
  • Used in load balancers like NGINX (the random method with two choices) and Envoy (least-request)
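The "power of two choices" idea fits in a few lines: sample two backends, keep the lighter one. A sketch with an invented load table:

```python
import random

# Hypothetical load metric per backend (e.g., active connections)
load = {"web-1": 9, "web-2": 2, "web-3": 5, "web-4": 7}

def pick_two_choices(load_map, rng=random):
    """Sample two distinct backends and keep the less-loaded one."""
    a, b = rng.sample(list(load_map), 2)
    return a if load_map[a] <= load_map[b] else b

choice = pick_two_choices(load)
assert choice in load
```

The appeal is that it needs only two load lookups per request, yet avoids the herd behavior of always picking the global minimum.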

⚠️ Sticky sessions: Algorithms like IP hash enable sticky sessions, but they can lead to uneven load distribution if certain clients generate more traffic. Consider using cookie-based stickiness (at L7) for better balance.


Health Checks: Ensuring Traffic Goes Only to Healthy Servers

Load balancers must avoid sending traffic to servers that are down or unhealthy. Health checks are periodic probes that determine a server's fitness.

Types of Health Checks

  • TCP check: Can you establish a TCP connection to the server's port? (L4)
  • HTTP check: Send an HTTP GET request and expect a 2xx or 3xx response. (L7)
  • HTTPS check: Same as HTTP but over TLS.
  • Custom check: Run a script or call a dedicated endpoint that returns 200 only when deeper conditions hold (e.g., disk space, queue depth)

Health Check Configuration

  • Interval: How often to perform the check (e.g., every 5 seconds)
  • Timeout: How long to wait for a response before marking the check as failed
  • Unhealthy threshold: Number of consecutive failed checks before marking the server as unhealthy
  • Healthy threshold: Number of consecutive successful checks required to mark an unhealthy server as healthy again (prevents flapping)
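The threshold logic above amounts to a small state machine: a server flips state only after a streak of consecutive results. A sketch, with illustrative default thresholds not tied to any specific load balancer:

```python
class HealthTracker:
    """Flip a server's health state only after a streak of results.

    Thresholds here are illustrative defaults, not from any product.
    """

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self.fail_streak = 0
        self.pass_streak = 0

    def record(self, check_passed: bool) -> bool:
        """Record one probe result; return the current health state."""
        if check_passed:
            self.pass_streak += 1
            self.fail_streak = 0
            if not self.healthy and self.pass_streak >= self.healthy_threshold:
                self.healthy = True
        else:
            self.fail_streak += 1
            self.pass_streak = 0
            if self.healthy and self.fail_streak >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy

t = HealthTracker()
for _ in range(3):            # three consecutive failures
    t.record(False)
assert not t.healthy          # marked unhealthy at the threshold
t.record(True)                # one success is not enough ...
assert not t.healthy
t.record(True)                # ... two in a row restore it
assert t.healthy
```

Requiring a streak in both directions is what prevents flapping: a single slow probe neither ejects a server nor readmits one.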

Best practice: Use an endpoint like /health that checks critical dependencies (database, cache, etc.) and returns 200 only if all are healthy. Avoid overly complex health checks that might fail transiently.
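The server-side half of that best practice can be sketched framework-free as a function mapping dependency probes to a status code; the dependency names and probes are illustrative:

```python
def health_status(checks):
    """Compute (status_code, results) for a /health endpoint.

    `checks` maps dependency names to zero-argument callables that
    return True when the dependency is reachable.
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False          # a crashing probe counts as down
    status = 200 if all(results.values()) else 503
    return status, results

# All dependencies up -> 200; any failure -> 503
status, _ = health_status({"database": lambda: True, "cache": lambda: True})
assert status == 200
status, _ = health_status({"database": lambda: False, "cache": lambda: True})
assert status == 503
```

In a real service this function would back an HTTP handler, and each probe should be fast and bounded by a timeout so the health endpoint itself never hangs.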


Putting It All Together: Choosing a Load Balancer

When designing a system, consider:

  1. Traffic type: Is it HTTP/WebSocket (L7) or raw TCP (L4)?
  2. Features needed: Do you need SSL termination, URL-based routing, or WAF capabilities? (L7)
  3. Performance requirements: Do you need the lowest possible latency? (L4 might be better)
  4. Algorithm suitability: Match the algorithm to your workload (e.g., least connections for long-lived requests, round robin for short-lived HTTP requests).
  5. Observability: Choose a load balancer that provides good metrics and logging.

🚨 Common mistake: Overlooking health checks. A misconfigured health check can lead to all servers being marked unhealthy (too strict) or unhealthy servers receiving traffic (too lax).


What to Remember for Interviews

  1. L4 vs L7: Know the differences in what they can inspect and their typical use cases.
  2. Algorithms: Be able to explain round robin, least connections, and IP hash, and when to use each.
  3. Health checks: Understand why they're critical and how to configure them properly.
  4. Sticky sessions: Know the trade-offs between IP hash and cookie-based stickiness.
  5. Cloud vs self-managed: Be familiar with offerings like AWS ALB/NLB, GCP Cloud Load Balancing, and open-source options like HAProxy and NGINX.

Practice: Draw a diagram of a typical web architecture with clients, L7 load balancer, web servers, and a database. Explain how you'd choose the load balancer type, algorithm, and health check for each layer.