Service Discovery
Updated June 8, 2026
In a monolith, your code calls a function. Simple. But in a microservices system, your Order Service needs to call your Inventory Service, and that means one service has to know where the other one is running.
That's the service discovery problem, and it's more subtle than it first appears.
Why You Can't Just Hardcode Addresses
The naive approach: hardcode the IP address of each service in a config file. Inventory Service is at 10.0.1.42:8080. Done.
This breaks immediately in any dynamic environment:
- Containers restart and get new IP addresses
- Auto-scaling adds and removes instances constantly
- Rolling deploys cycle out old instances and bring in new ones
- Failures take instances down and replacements come up on different hosts
In a Kubernetes cluster, a pod's IP address is essentially ephemeral. In an auto-scaling group on AWS, instances can come and go every few minutes. You need a system that tracks where services are running right now — not where they were running when you wrote the config.
That system is called a service registry, and the mechanisms for using it are called service discovery.
Inventory instances register with the Service Registry on startup. The Order Service queries the registry for healthy instances, then calls one directly.
Client-Side Discovery
In client-side discovery, the calling service is responsible for finding the address of the service it wants to call:
- The Order Service queries the service registry: "Where are the Inventory Service instances?"
- The registry returns a list of healthy instances with their addresses
- The Order Service picks one using a load balancing algorithm: round robin, least connections, and so on.
- The Order Service sends the request directly
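The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: the RoundRobinClient class and the lookup callable are hypothetical names, and the lookup stands in for whatever call your registry client actually exposes.

```python
import itertools
from collections import defaultdict
from typing import Callable, DefaultDict, Iterator, List

# Hypothetical registry call: service name -> healthy "host:port" addresses.
RegistryLookup = Callable[[str], List[str]]


class RoundRobinClient:
    """Client-side discovery: ask the registry, then pick an instance locally."""

    def __init__(self, lookup: RegistryLookup) -> None:
        self._lookup = lookup
        # One independent counter per service name.
        self._counters: DefaultDict[str, Iterator[int]] = defaultdict(itertools.count)

    def choose(self, service: str) -> str:
        instances = self._lookup(service)  # query the registry
        if not instances:
            raise LookupError(f"no healthy instances for {service}")
        # Round robin over whatever list the registry returned this time.
        index = next(self._counters[service]) % len(instances)
        return instances[index]
```

The caller would then send its request straight to the address returned by choose(). Swapping round robin for least connections or zone preference only changes the last two lines of that method.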
Pros:
- The client controls load balancing logic — it can make smart decisions (prefer same-availability-zone instances, avoid slow instances)
- One fewer network hop — no proxy in the middle
- Registry is only queried on lookup, not on every request
Cons:
- Every client has to implement the discovery and load balancing logic
- Each client needs an SDK or library for this — and you have to maintain that library in every language your services use
Netflix's Eureka with the Ribbon client-side load balancer is the classic example of client-side discovery.
Server-Side Discovery
In server-side discovery, the client just sends a request to a well-known address (a load balancer or proxy). The load balancer handles the registry lookup and routing:
- The Order Service sends a request to http://inventory-service/
- The load balancer (e.g., an AWS ALB or an Nginx instance) queries the service registry
- The load balancer picks a healthy Inventory Service instance and forwards the request
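To make the division of labor concrete, here is a rough sketch of the routing decision a load balancer makes on the client's behalf. DiscoveryProxy is a hypothetical class, the registry is a plain dictionary standing in for a real one, and random selection is just one possible policy; a real proxy would also forward the request rather than only rewriting the URL.

```python
import random
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit


class DiscoveryProxy:
    """Server-side discovery: the proxy resolves the service name, not the client."""

    def __init__(self, registry: Dict[str, List[str]]) -> None:
        # Stand-in registry: service name -> healthy "host:port" addresses.
        self._registry = registry

    def rewrite(self, url: str) -> str:
        parts = urlsplit(url)
        instances = self._registry.get(parts.hostname or "", [])
        if not instances:
            raise LookupError(f"no healthy instances for {parts.hostname}")
        target = random.choice(instances)  # pick one healthy instance
        # Keep scheme, path, and query; replace only the host with the instance.
        return urlunsplit((parts.scheme, target, parts.path, parts.query, parts.fragment))
```

The client only ever sees the stable name in the original URL; the instance address appears solely inside the proxy.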
Pros:
- Clients don't need any discovery logic — they just call a stable address
- Works for clients in any language with zero code changes
- The discovery/routing logic is centralized and maintained in one place
Cons:
- The load balancer is a potential bottleneck and single point of failure (mitigated by running multiple load balancers)
- One extra network hop on every request
- Less fine-grained control — the client can't make intelligent routing decisions
AWS Elastic Load Balancing, Kubernetes Services, and service mesh proxies (like Envoy in Istio) all implement server-side discovery.
The Service Registry
At the heart of both approaches is the service registry: a distributed key-value store or database that tracks which instances of each service are healthy and running. When an instance starts up, it registers itself. When it shuts down cleanly, it deregisters. When it crashes, it cannot deregister itself, so the registry has to remove it through a failed health check or an expired heartbeat.
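Stripped of replication and health checking, the core of a registry is a small data structure. The sketch below is a single-process, in-memory illustration with hypothetical names (InMemoryRegistry, register, deregister, lookup); real registries such as the ones described next add consensus, persistence, and failure detection on top.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class InMemoryRegistry:
    """Toy registry: service name -> set of instance addresses."""

    def __init__(self) -> None:
        self._instances: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, service: str, address: str) -> None:
        # Called by an instance on startup.
        self._instances[service].add(address)

    def deregister(self, service: str, address: str) -> None:
        # Called on clean shutdown; discard() tolerates unknown addresses.
        self._instances[service].discard(address)

    def lookup(self, service: str) -> List[str]:
        # Sorted copy so callers get a stable order and cannot mutate state.
        return sorted(self._instances.get(service, set()))
```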
Consul (HashiCorp)
One of the most widely used standalone service registries. Consul supports service registration, health checking, and key-value storage. It uses the Raft consensus algorithm to keep the registry consistent across a cluster of Consul servers. It also supports DNS-based service discovery out of the box: services can be discovered at inventory-service.service.consul.
etcd (CoreOS / CNCF)
A distributed key-value store using Raft consensus. Originally built at CoreOS for shared configuration and coordination across clusters of machines, and also widely used as a service registry. Kubernetes itself uses etcd to store all cluster state, including service endpoints.
Apache ZooKeeper
The original distributed coordination service, used heavily in the Hadoop ecosystem. Kafka historically relied on ZooKeeper for broker coordination, though newer releases replace it with the built-in KRaft mode. More complex to operate than Consul or etcd, but very battle-tested.
Netflix Eureka
Built by Netflix specifically for AWS deployments. Designed for resilience over consistency — Eureka prefers to show stale registry data rather than go down. Each client caches the registry locally, so it continues working even if the Eureka server is temporarily unavailable.
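The idea of preferring stale data over no data can be shown with a small wrapper. This is not the Eureka client API; CachingLookup and the fetch callable are hypothetical, and the sketch assumes an unreachable registry surfaces as a ConnectionError.

```python
from typing import Callable, Dict, List


class CachingLookup:
    """Serve the last known instance list when the registry is unreachable."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch  # hypothetical call to the remote registry
        self._cache: Dict[str, List[str]] = {}

    def instances(self, service: str) -> List[str]:
        try:
            # Refresh the local copy whenever the registry answers.
            self._cache[service] = list(self._fetch(service))
        except ConnectionError:
            # Registry is down: fall through to possibly stale cached data.
            pass
        return list(self._cache.get(service, []))
```

The trade-off is visible in the except branch: the caller may be handed an address that no longer exists, so it still needs timeouts and retries of its own.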
Health Checking
A registry that just stores addresses without checking health is dangerous. You might get routed to an instance that's running but stuck in an infinite loop, out of memory, or failing to connect to its database.
Service registries implement health checks in two ways:
- Active health checks: The registry pings each service instance on a regular interval (GET /health). If the instance fails to respond within a timeout, it gets removed from the registry.
- Heartbeat / TTL: Each instance periodically tells the registry "I'm still alive." If the registry doesn't receive a heartbeat within a time window (TTL), it deregisters the instance.
Both approaches have failure modes. Active checks add load. Heartbeat-based checks can leave stale entries if an instance dies between heartbeats. Production systems typically use both.
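The heartbeat/TTL variant is the easier of the two to sketch. The HeartbeatRegistry class below is a hypothetical, single-process illustration; the clock is injectable only so the expiry behavior can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, List, Tuple


class HeartbeatRegistry:
    """Instances stay listed only while their heartbeats keep arriving."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def heartbeat(self, service: str, address: str) -> None:
        # The first heartbeat doubles as registration.
        self._last_seen[(service, address)] = self._clock()

    def healthy(self, service: str) -> List[str]:
        now = self._clock()
        # Deregister every instance whose last heartbeat is older than the TTL.
        expired = [key for key, seen in self._last_seen.items()
                   if now - seen > self._ttl]
        for key in expired:
            del self._last_seen[key]
        return sorted(addr for (svc, addr) in self._last_seen if svc == service)
```

The window between an instance dying and its TTL expiring is exactly the stale-entry risk described above, which is why a shorter TTL trades more heartbeat traffic for faster detection.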
DNS-Based Service Discovery
The simplest form of service discovery: just use DNS. Each service name resolves to a list of IP addresses. Rotate healthy instances into the DNS record; remove unhealthy ones.
This works and requires no client library. The downsides: cached DNS records mean stale addresses can persist for minutes after an instance dies (long TTLs, plus resolvers and clients that cache beyond the TTL, are the main culprits), and DNS doesn't carry health information beyond "the record exists."
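From the caller's side, DNS-based discovery is an ordinary name lookup. The sketch below uses Python's standard socket.getaddrinfo; the resolve_instances helper is a hypothetical name, and the resolver is passed as a parameter only so the function can be tried without a live DNS zone.

```python
import socket
from typing import Callable, List


def resolve_instances(hostname: str, port: int,
                      getaddrinfo: Callable = socket.getaddrinfo) -> List[str]:
    """Return the distinct IP addresses that a service name resolves to."""
    infos = getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Note that the result reflects whatever the resolver has cached, so this function inherits the staleness problem described above.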
In practice, cloud providers make DNS-based discovery very usable: AWS Route 53 supports health-checked records and private hosted zones. AWS Cloud Map lets you register services and query them via DNS or API. It's often the right starting point before reaching for Consul or etcd.
Kubernetes Service Discovery
If you're running on Kubernetes, you get service discovery for free. When you create a Kubernetes Service object, Kubernetes:
- Assigns a stable virtual IP (ClusterIP) to the service
- Creates a DNS entry: inventory-service.default.svc.cluster.local
- Configures kube-proxy (or eBPF rules) on every node to forward traffic to healthy pods
Your Order Service just calls http://inventory-service/ and Kubernetes handles finding pods, health checking, and load balancing. It's server-side discovery implemented by the infrastructure itself.
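The DNS name follows a fixed pattern, which a small helper can make explicit. The function name is hypothetical, and the sketch assumes the common default cluster domain of cluster.local; clusters can be configured with a different one.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service: str, namespace: str = "default", port: int = 80) -> str:
    """URL a pod in any namespace could use to reach the Service."""
    return f"http://{service_dns_name(service, namespace)}:{port}/"
```

The short form http://inventory-service/ works for callers in the same namespace as the Service; the fully qualified name is the one to use across namespaces.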
For more advanced use cases (circuit breaking, retries, mutual TLS between services), service meshes like Istio or Linkerd sit on top of Kubernetes and add those features via sidecar proxies, without any changes to your application code.
Summary
Service discovery solves the problem of services finding each other in a dynamic environment where IP addresses are ephemeral and instance counts change constantly. Client-side discovery puts routing logic in the client. Server-side discovery puts it in a proxy or load balancer. The service registry is the source of truth for what's healthy and where it's running. Consul, etcd, ZooKeeper, and Eureka are the main options. DNS-based discovery is the simplest starting point. On Kubernetes, you get solid service discovery out of the box, with service meshes available when you need more advanced traffic management.