Service Discovery & Configuration Management in Distributed Systems

Learn how services find each other in dynamic environments, implement client-side vs server-side discovery, and manage configuration across microservices with practical patterns and tools.

26 min read
Tags: service discovery, consul, eureka, etcd, configuration management, microservices, kubernetes

Introduction: The Problem of Finding Services

Picture this: it's 2015, and you're running a microservices architecture. Your Order Service needs to call the Payment Service. Simple, right? You know the Payment Service's IP address... wait, do you?

In a modern distributed system:

  • Services scale up and down based on demand
  • Services get deployed to new machines when old ones fail
  • Services move between data centers
  • Services get new IP addresses on every restart
  • Different environments (dev, staging, prod) have different addresses

So how does Order Service know where Payment Service is? Service discovery.

Why this matters: I once spent three days debugging an outage that turned out to be a service whose IP address had changed after a restart, but nobody had updated the configuration file. Service discovery exists to prevent exactly this kind of silent failure.


What is Service Discovery?

Service discovery is the process by which services find each other in a distributed system. It answers the question: "Where is service X running right now?"

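The Core Components

Three pieces show up in every design discussed below: the service registry, the service instances that register with it, and the clients that query it.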

The Registration Patterns

There are two main approaches to service registration:

  1. Self-Registration (The "Hey, I'm Here!" Pattern): Services register themselves with the service registry.
  2. Third-Party Registration (The "Someone Else Tells Everyone" Pattern): An external system registers services on their behalf.
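
To make the self-registration pattern concrete, here is a minimal sketch. It assumes a hypothetical in-memory `ServiceRegistry` standing in for a real registry such as Consul or Eureka; the names `Instance`, `ServiceRegistry`, and `registered` are made up for illustration and do not come from any client library.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    service: str
    host: str
    port: int


class ServiceRegistry:
    """Hypothetical in-memory registry; a real one is a networked service."""

    def __init__(self):
        self._instances = {}  # service name -> set of Instance

    def register(self, instance):
        self._instances.setdefault(instance.service, set()).add(instance)

    def deregister(self, instance):
        self._instances.get(instance.service, set()).discard(instance)

    def lookup(self, service):
        found = self._instances.get(service, set())
        return sorted(found, key=lambda i: (i.host, i.port))


@contextmanager
def registered(registry, instance):
    """Self-registration: register on startup, deregister on shutdown."""
    registry.register(instance)
    try:
        yield instance
    finally:
        registry.deregister(instance)
```

With third-party registration, the same `register` and `deregister` calls would be made by an external agent watching the service rather than by the service itself.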

Why Self-Registration is Usually Better

In my experience, self-registration is more common because:

| Aspect | Self-Registration | Third-Party Registration |
| --- | --- | --- |
| Coupling | Service knows about registry | Service doesn't know about registry |
| Complexity | Simple (just add code to service) | Complex (need external agent) |
| Failure isolation | Service failure affects registry | Agent failure doesn't affect service |
| Startup latency | Service can fail registration | Depends on agent |

Client-Side vs Server-Side Discovery

There are two fundamental approaches to service discovery, each with trade-offs.

Client-Side Discovery

In client-side discovery, the client (service making the call) is responsible for finding available service instances.

How it works:

  1. Service needs to call another service
  2. Service queries the service registry
  3. Service chooses an instance (simple round-robin, random, or more sophisticated)
  4. Service makes the request
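
The steps above can be sketched in a few lines. This is an illustration under assumptions, not a production client: the registry is a hard-coded dictionary, and `RoundRobinClient` is a hypothetical name.

```python
import itertools


class RoundRobinClient:
    """Client-side discovery: look up instances, then pick one per call."""

    def __init__(self, lookup):
        # `lookup` is any callable mapping a service name to a list of
        # "host:port" strings (here a dict lookup, in practice a registry call).
        self._lookup = lookup
        self._counters = {}  # service name -> itertools.count

    def choose(self, service):
        instances = self._lookup(service)  # step 2: query the registry
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        counter = self._counters.setdefault(service, itertools.count())
        # step 3: simple round-robin over whatever the registry returned
        return instances[next(counter) % len(instances)]


# Hypothetical registry contents, hard-coded for illustration.
REGISTRY = {"payment-service": ["10.0.0.5:8080", "10.0.0.6:8080"]}
client = RoundRobinClient(lambda name: REGISTRY.get(name, []))
```

Each call to `client.choose("payment-service")` returns the next address in turn; the caller then makes the request itself (step 4).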

Pros:

  • No additional network hop (compared to server-side)
  • Client can use sophisticated load balancing
  • Registry doesn't become a bottleneck

Cons:

  • Client must know about service registry
  • Client library needed in every service
  • More coupling between services and discovery mechanism

Server-Side Discovery

In server-side discovery, the client makes a request to a router/gateway, which queries the service registry and forwards the request.

How it works:

  1. Client makes request to a known endpoint (API gateway)
  2. Gateway queries the service registry
  3. Gateway chooses an instance and forwards the request
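
A rough sketch of the same flow from the gateway's side. `Gateway`, `lookup`, and `send` are hypothetical names; `send` stands in for the real HTTP call to the chosen backend.

```python
import random


class Gateway:
    """Server-side discovery: clients know only the gateway."""

    def __init__(self, lookup, send, rng=None):
        self._lookup = lookup  # service name -> list of instance addresses
        self._send = send      # (address, path) -> response body
        self._rng = rng or random.Random()

    def handle(self, service, path):
        instances = self._lookup(service)        # gateway queries the registry
        if not instances:
            return 503, "no healthy instance"
        address = self._rng.choice(instances)    # gateway chooses an instance
        return 200, self._send(address, path)    # gateway forwards the request
```

The client never sees instance addresses, which is what removes the need for a discovery library in every service.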

Pros:

  • Services don't need discovery client library
  • Easier to manage cross-cutting concerns (auth, logging)
  • Single entry point for all clients

Cons:

  • Additional network hop
  • Gateway can become a bottleneck
  • Gateway must be highly available

When to Use Which

| Factor | Client-Side | Server-Side |
| --- | --- | --- |
| Number of services | Many, with complex routing | Fewer, simpler routing |
| Language diversity | Different languages need different clients | Single gateway handles all |
| Routing needs | Sophisticated per-service routing | Standard routing + common policies |
| Operational complexity | Lower (services manage themselves) | Higher (need to manage gateway) |

My experience: For most microservices architectures, I recommend server-side discovery with Kubernetes or an API gateway. It's simpler to manage, provides a natural place for cross-cutting concerns, and decouples services from the discovery mechanism.


Service Registries: The Heart of Service Discovery

What Makes a Good Service Registry?

A service registry needs to handle several challenges:

  1. Service registration: How do services register?
  2. Health checking: How do we know if a service instance is healthy?
  3. Failure detection: How do we detect and remove unhealthy instances?
  4. Distributed operation: How does the registry itself stay available?

| Registry | Developed By | Best For | Key Features |
| --- | --- | --- | --- |
| Consul | HashiCorp | General microservices | DNS interface, health checks, KV store |
| Eureka | Netflix | AWS microservices | Built for cloud, peer-to-peer replication |
| etcd | CoreOS | Kubernetes ecosystem | Distributed key-value store, used by K8s |
| ZooKeeper | Apache | Legacy systems | Mature, proven, but complex |

Consul Architecture

Consul is my go-to for most service discovery use cases. Here's why:

Key features:

  • DNS interface: Services can be discovered via payment-service.service.consul
  • HTTP API: For programmatic discovery
  • Health checks: HTTP, TCP, script, or TTL-based
  • Key-Value store: For configuration data
  • Multi-datacenter: Native support for federated clusters

Service Registration Flow

Discovery methods:

  1. DNS queries: dig payment-service.service.consul
  2. HTTP API: GET /v1/catalog/service/payment-service
  3. Blocking queries: Wait for changes without polling
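
Here is a small sketch of the HTTP side. Note that the catalog endpoint lists instances regardless of health, so the sketch uses Consul's health endpoint (`/v1/health/service/<name>`) with the `passing` filter instead. Passing `index` (taken from the `X-Consul-Index` response header) and `wait` turns the call into a blocking query. The agent address, the helper names, and the sample payload are assumptions; the payload is hand-written and trimmed to the fields used here, not captured from a real agent.

```python
import json
from urllib.parse import urlencode

CONSUL = "http://127.0.0.1:8500"  # assumed address of a local Consul agent


def health_url(service, index=None, wait="30s"):
    """URL for passing instances; with `index` set it is a blocking query."""
    params = {"passing": "true"}
    if index is not None:
        params.update({"index": str(index), "wait": wait})
    return f"{CONSUL}/v1/health/service/{service}?{urlencode(params)}"


def parse_instances(body):
    """Extract (address, port) pairs from a health-endpoint JSON response."""
    result = []
    for entry in json.loads(body):
        svc = entry["Service"]
        # An empty service address means "use the node's address".
        address = svc.get("Address") or entry["Node"]["Address"]
        result.append((address, svc["Port"]))
    return result


# Hand-written sample of the response shape, for illustration only.
SAMPLE = json.dumps([
    {"Node": {"Address": "10.0.0.5"}, "Service": {"Address": "", "Port": 8080}},
    {"Node": {"Address": "10.0.0.6"}, "Service": {"Address": "10.1.0.6", "Port": 8081}},
])
```

In a real client, the body would come from an HTTP GET against `health_url(...)`, and the returned index would be fed into the next call so the request only returns when something changes.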

Health Checking: The Unsung Hero

Service discovery only works if the registry knows which services are healthy. Health checking is how services prove they're alive.

Types of Health Checks

1. Active Health Checks (Registry Checks Service)

Types of active checks:

  • HTTP: Send HTTP GET to health endpoint
  • TCP: Open TCP connection to port
  • Script: Run a script to check health
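
As a sketch of what the registry does for the first two check types, using only the Python standard library. The function names and the one-second default timeout are arbitrary choices for illustration.

```python
import socket
import urllib.error
import urllib.request


def tcp_check(host, port, timeout=1.0):
    """Active TCP check: healthy if a connection opens within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url, timeout=1.0):
    """Active HTTP check: healthy only on a 2xx response to a GET."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False
```

A TCP check only proves the port accepts connections; the HTTP check can exercise the service's own health endpoint, which is usually the more meaningful signal.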

2. Passive Health Checks (Service Reports to Registry)
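
A minimal sketch of this style, assuming a heartbeat with a time-to-live (Consul calls these TTL checks). `HeartbeatRegistry` is a hypothetical in-memory stand-in; the clock is injectable so the behaviour can be tested without sleeping.

```python
import time


class HeartbeatRegistry:
    """Instances report in; anything silent for longer than `ttl` is down."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._last_seen = {}  # instance id -> time of last heartbeat

    def heartbeat(self, instance_id):
        self._last_seen[instance_id] = self._clock()

    def healthy(self):
        now = self._clock()
        return sorted(
            instance_id
            for instance_id, seen in self._last_seen.items()
            if now - seen <= self._ttl
        )
```

The trade-off is that a crashed instance is only noticed after its TTL expires, so the TTL bounds how stale the registry can be.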

3. Cassandra-Style (Gossip Protocol)

Services tell each other, and health info spreads via gossip.
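
To show the idea, here is a toy simulation of push-pull gossip. It is a simplified sketch, not how Cassandra or any specific system implements it: each node keeps a versioned view of instance health, and on every round merges views with one randomly chosen peer. All names are illustrative.

```python
import random


def gossip_round(views, rng):
    """One push-pull round: every node merges state with one random peer.

    `views` maps node -> {instance: (version, status)}; higher version wins.
    Requires at least two nodes.
    """
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = dict(views[node])
        for instance, record in views[peer].items():
            if instance not in merged or record[0] > merged[instance][0]:
                merged[instance] = record
        # Both sides end the exchange with the merged view.
        views[node] = dict(merged)
        views[peer] = dict(merged)


def rounds_to_converge(views, rng, max_rounds=1000):
    """Run rounds until all nodes hold the same view; return the round count."""
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, rng)
        reference = next(iter(views.values()))
        if all(view == reference for view in views.values()):
            return round_number
    raise RuntimeError("did not converge")
```

Because the newest version always wins a merge, a node that learns an instance is down never reverts to the older "up" record, and the news reaches every node without a central registry.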

What Makes a Good Health Check?

A good health check should:

  1. Check what matters: Not just "is the port open" but "can this service do its job"
  2. Be fast: Don't make clients wait for slow checks
  3. Be deterministic: Same state should always pass or fail
  4. Be lightweight: Health checks shouldn't stress the service

Health Check Levels

| Level | Checks | Use When |
| --- | --- | --- |
| L1: Liveness | Process is running | Basic availability |
| L2: Readiness | Can handle requests | Dependencies healthy (DB, cache) |
| L3: Deep | Business operations work | Critical services |
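
The difference between the first two levels is easy to see in code. This is a sketch with hypothetical function names and response shapes; the dependency checks are plain callables so that any real check (a database ping, a cache read) can be plugged in.

```python
def liveness():
    """L1: the process is up enough to answer at all."""
    return {"status": "ok"}


def readiness(dependency_checks):
    """L2: ready only if every dependency check passes.

    `dependency_checks` maps a name to a zero-argument callable returning
    a bool; a check that raises is treated as failing.
    """
    results = {}
    for name, check in dependency_checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = "ok" if all(results.values()) else "unavailable"
    return {"status": status, "dependencies": results}
```

Keeping the two separate matters: a failed readiness check should take the instance out of rotation, while a failed liveness check usually means it should be restarted.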

Handling Unhealthy Instances

Key concepts:

  • Grace period: How long to wait before marking unhealthy
  • Recovery period: How many successful checks before marking healthy again
  • Deregistration delay: Time to wait before removing (prevents flapping)
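
One way to implement the first two concepts is with consecutive-result thresholds. The sketch below makes a simplifying assumption: the grace period is expressed as a number of consecutive failed checks rather than as a duration. Deregistration delay is not modelled. `InstanceHealth` and its parameters are hypothetical names.

```python
class InstanceHealth:
    """Tracks one instance; flips state only after consecutive results."""

    def __init__(self, fail_threshold=3, recover_threshold=2):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict current state

    def record(self, check_passed):
        """Feed in one check result and return the resulting health state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.fail_threshold if self.healthy else self.recover_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```

A single failed check in the middle of a healthy run resets nothing but the streak, which is what keeps one slow response from pulling an instance out of rotation.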

Configuration Management: The Second Half

Service discovery finds services. Configuration management distributes configuration to them. They're often handled by the same tools.

The Configuration Problem

In a microservices architecture, you might have:

  • 50 microservices
  • 3 environments (dev, staging, prod)
  • Multiple teams changing configuration
  • Secrets that shouldn't be in code
  • Configuration that changes at runtime

How do you manage this?

Configuration Patterns

1. Environment Variables (The Simplest Approach)

The simplest approach is to inject environment variables at runtime. Here is how it compares with the alternatives:

| Approach | Pros | Cons |
| --- | --- | --- |
| Env vars | Simple, universal | Hard to manage many variables |
| Config files | Structured, supports hierarchies | Per-service, no centralized updates |
| Configuration server | Central control, dynamic updates, versioning | Another service to manage |

What to store in environment variables:

  • Database connection strings (with credentials supplied separately; see Secrets Management)
  • API endpoints
  • Feature flags
  • Log levels
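
A minimal sketch of loading these values at startup. The variable names (`DATABASE_URL`, `PAYMENT_API_URL`, `FEATURE_NEW_CHECKOUT`, `LOG_LEVEL`) and the `Settings` class are made up for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    payment_api: str
    new_checkout_enabled: bool
    log_level: str


def load_settings(env=os.environ):
    """Read settings from the environment, failing fast on missing values."""
    missing = [k for k in ("DATABASE_URL", "PAYMENT_API_URL") if k not in env]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")
    flag = env.get("FEATURE_NEW_CHECKOUT", "false").lower()
    return Settings(
        database_url=env["DATABASE_URL"],
        payment_api=env["PAYMENT_API_URL"],
        new_checkout_enabled=flag in ("1", "true", "yes"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Failing at startup on a missing variable is deliberate: it surfaces a misconfigured deployment immediately instead of at the first request that needs the value.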

2. Centralized Configuration Server

Pros: Central control, dynamic updates, versioning
Cons: Another service to manage, network dependency
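
To illustrate what "central control" and "versioning" can look like, here is a toy in-memory stand-in for a configuration server. The layering order (global, then environment, then service) is one common design choice, assumed here rather than taken from any particular product; `ConfigServer` is a hypothetical name.

```python
class ConfigServer:
    """Hypothetical in-memory stand-in for a central configuration service."""

    def __init__(self):
        self._layers = {}  # (scope, name) -> dict of settings
        self.version = 0   # bumped on every change

    def put(self, scope, name, values):
        self._layers[(scope, name)] = dict(values)
        self.version += 1

    def resolve(self, service, environment):
        """Merge layers, least specific first: global < environment < service."""
        merged = {}
        for key in (("global", "*"), ("env", environment), ("service", service)):
            merged.update(self._layers.get(key, {}))
        return merged
```

For example, a global `timeout_ms` of 500 can be overridden to 200 for `prod` only, while one service overrides `log_level` without touching anyone else's settings.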

Configuration Refresh
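
One common way to pick up changes without a restart is to watch a version number and re-apply configuration only when it moves. The sketch below assumes a `fetch` callable that returns a `(version, values)` pair; `ConfigWatcher` is a hypothetical name, and a real client would call `poll_once` on a timer or replace it with a long-poll or watch.

```python
class ConfigWatcher:
    """Polls a config source and notifies listeners when the version changes."""

    def __init__(self, fetch):
        self._fetch = fetch  # zero-argument callable -> (version, values)
        self._version = None
        self._listeners = []
        self.values = {}

    def on_change(self, listener):
        self._listeners.append(listener)

    def poll_once(self):
        """Return True if a new version was applied."""
        version, values = self._fetch()
        if version == self._version:
            return False
        self._version, self.values = version, dict(values)
        for listener in self._listeners:
            listener(self.values)
        return True
```

Listeners are where a service reacts, for example by changing its log level or resizing a connection pool.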

Feature Flags: Beyond Configuration

Feature flags take configuration a step further—they control behavior, not just values.

| Flag Type | Purpose |
| --- | --- |
| Release flags | Enable/disable features |
| Experiment flags | A/B testing |
| Ops flags | Kill switches, rate limits |
| Permission flags | User-specific features |
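
A small sketch of how several of these flag types can share one evaluation function. The config keys (`enabled`, `allow_users`, `rollout_percent`) are invented for this example. Hashing the flag name together with the user ID gives each user a stable bucket, so a user does not flip between variants across requests.

```python
import hashlib


def bucket(flag, user_id):
    """Map (flag, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_config, flag, user_id):
    """Evaluate a flag: kill switch, then allow list, then percentage rollout."""
    cfg = flag_config.get(flag)
    if cfg is None or not cfg.get("enabled", False):
        return False  # ops-style kill switch (unknown flags are off)
    if user_id in cfg.get("allow_users", ()):
        return True   # permission-style flag
    # Release or experiment flag: enable for a fixed share of users.
    return bucket(flag, user_id) < cfg.get("rollout_percent", 0)
```

Raising `rollout_percent` from 10 to 50 keeps the original 10% enabled and adds more users, because each user's bucket never changes.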

Secrets Management

Never put secrets in configuration files or environment variables directly!

The principle: Applications should fetch secrets at startup or runtime, never hardcode them.
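
In code, the principle might look like the sketch below. `fetch_secret` is a hypothetical callable standing in for a client of a secrets manager such as Vault, and the path `secret/order-service/db` is invented for the example. The `Secret` wrapper is an extra precaution so the value does not leak through logs.

```python
class Secret:
    """Wraps a secret value so it does not leak through repr() or logging."""

    def __init__(self, value):
        self._value = value

    def reveal(self):
        return self._value

    def __repr__(self):
        return "Secret(****)"


def load_db_password(fetch_secret, path="secret/order-service/db"):
    """Fetch the password at startup from a secrets backend.

    `fetch_secret` is a hypothetical callable (path -> dict); in practice
    it would wrap a real secrets-manager client.
    """
    data = fetch_secret(path)
    if "password" not in data:
        raise KeyError(f"secret at {path!r} has no 'password' field")
    return Secret(data["password"])
```

Only the code that opens the database connection should call `reveal()`; everything else passes the wrapper around.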


Kubernetes Service Discovery: A Case Study

If you're running Kubernetes, service discovery is built in and beautifully simple.

How Kubernetes DNS Works

Service names:

  • payment-service (short name, resolves from pods in the same namespace)
  • payment-service.default (namespace-qualified, resolves from any namespace)
  • payment-service.default.svc.cluster.local (fully qualified)
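
The short forms work because each pod's resolver appends a search path. The sketch below assumes the default cluster domain `cluster.local` and the usual three-entry search path; the helper names are made up, and real clusters may append further node-level search domains.

```python
def service_dns_names(service, namespace="default", cluster_domain="cluster.local"):
    """Return the names a pod can use for a Service, shortest first."""
    return [
        service,                                        # same namespace only
        f"{service}.{namespace}",                       # from any namespace
        f"{service}.{namespace}.svc.{cluster_domain}",  # fully qualified
    ]


def expand_with_search_path(name, pod_namespace, cluster_domain="cluster.local"):
    """Candidate names a pod's resolver tries for a short name, in order."""
    search = [
        f"{pod_namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return [f"{name}.{suffix}" for suffix in search]
```

This is why `payment-service` alone only resolves inside its own namespace: the first search entry uses the calling pod's namespace, not the target's.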

Kubernetes Service Types

| Type | Use Case | How It Works |
| --- | --- | --- |
| ClusterIP | Internal only | Stable IP within cluster |
| NodePort | Simple external access | Exposes on each node's IP |
| LoadBalancer | Cloud provider LB | External LB routes to service |
| ExternalName | CNAME to external service | Maps to external DNS |

DNS Resolution Flow

Endpoints and EndpointSlices

Behind every Service is an Endpoints object that lists the actual pod IPs:

| Component | Purpose |
| --- | --- |
| Service | Stable name + ClusterIP |
| Endpoints | List of pod IPs + ports |
| EndpointSlices | Scaled version for large clusters |
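
Conceptually, the control plane derives these lists by matching the Service's label selector against ready pods. The sketch below models that with plain dictionaries; it is a simplification for illustration, not the Kubernetes API. The default of 100 endpoints per slice matches the Kubernetes default, which is configurable.

```python
def build_endpoints(selector, pods, port):
    """List "ip:port" for ready pods whose labels match every selector entry."""
    return sorted(
        f"{pod['ip']}:{port}"
        for pod in pods
        if pod["ready"]
        and all(pod["labels"].get(k) == v for k, v in selector.items())
    )


def slice_endpoints(endpoints, max_per_slice=100):
    """Split a long endpoint list into fixed-size chunks, as EndpointSlices do."""
    return [
        endpoints[i:i + max_per_slice]
        for i in range(0, len(endpoints), max_per_slice)
    ]
```

Splitting matters at scale: when one pod changes, only the slice containing it has to be rewritten and redistributed, not the whole list.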

Putting It All Together: A Practical Architecture

Here's how I'd set up service discovery and configuration for a typical microservices architecture:

Implementation Checklist

| Component | Tool | Key Config |
| --- | --- | --- |
| Service Registry | Consul | Health checks, DNS |
| Service Mesh | Istio / Linkerd | mTLS, traffic management |
| Configuration | Apollo / Spring Cloud Config | Git-backed, versioned |
| Secrets | HashiCorp Vault | PKI, dynamic secrets |
| Service Discovery (K8s) | CoreDNS | Built-in |

What to Remember for Interviews

  1. Client vs server-side discovery: Know the trade-offs and when to use each.
  2. Health checks: Understand active vs passive and what makes a good health check.
  3. Consul, Eureka, etcd: Be familiar with at least one service registry.
  4. Configuration management: Know patterns for managing config across environments.
  5. Kubernetes DNS: Understand how service discovery works in K8s.
  6. Secrets management: Know that secrets should never be in config files.

Interview tip: When designing any microservices system, always address service discovery. Say "we'll use Consul for service discovery with health checks" or "we'll use Kubernetes DNS for service-to-service communication." This shows operational awareness.


Further Reading

  1. "Service Discovery Patterns" — NGINX
    Excellent overview of discovery patterns.

  2. "Consul Documentation" — HashiCorp
    The definitive guide to Consul.

  3. "Kubernetes DNS" — Kubernetes
    How service discovery works in Kubernetes.

  4. "Managing Microservices with Consul" — HashiCorp Learn
    Hands-on tutorial for Consul.

Final thought: After years of managing service discovery, I've learned that simplicity wins. Kubernetes DNS might not have all the features of Consul, but for most teams, it's more than enough. Start simple, add complexity only when you need it.