What are Load Balancers?
Updated June 3, 2026
Imagine you open a coffee shop and it becomes wildly popular. One barista can't handle the queue, so you hire five. But if every customer instinctively walks up to barista #1, you've solved nothing. You need someone at the door routing each customer to the shortest queue. That's a load balancer.
In software terms, a load balancer sits between your clients and your backend servers, distributing incoming requests so no single server becomes a bottleneck.
The Problem Load Balancers Solve
Single server bottleneck — all traffic converging on one machine
Without a load balancer, you have two bad options. A single server eventually hits its CPU, memory, or connection limit and either slows to a crawl or falls over entirely. Multiple servers with clients picking one arbitrarily gives you uneven utilization, hot spots, and no way to gracefully remove a failing server.
Load balancers solve both: they distribute traffic evenly and detect when a server is unhealthy, stopping traffic to it automatically. They're also the natural place to terminate SSL, apply rate limits, and add observability — more on that in the API Gateways chapter.
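As a rough illustration of those two jobs (spreading requests and skipping unhealthy servers), here is a minimal round-robin sketch. The class name, method names, and backend labels such as `app-1` are hypothetical, and real load balancers typically offer other algorithms (least connections, weighted rotation) as well.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cursor = 0  # index of the next backend to try

    def mark_unhealthy(self, backend):
        self.healthy.discard(backend)

    def mark_healthy(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once, starting from the cursor.
        for _ in range(len(self.backends)):
            backend = self.backends[self._cursor]
            self._cursor = (self._cursor + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


# Hypothetical backend names, for illustration only.
lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_pass = [lb.pick() for _ in range(3)]

lb.mark_unhealthy("app-2")  # e.g. after failed health checks
second_pass = [lb.pick() for _ in range(4)]
```

Once `app-2` is marked unhealthy, the rotation simply continues over the remaining two servers; no client has to know anything changed.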
Layer 4 vs Layer 7 Load Balancers
This is where most engineers get confused, and it's worth spending a moment here.
The "layer" refers to the OSI model — think of it as the level of the request the load balancer actually understands.
Layer 4 (Transport Layer)
L4 load balancer routing by IP and port — no HTTP parsing
A Layer 4 load balancer works at the TCP/UDP level. It sees IP addresses and ports and nothing more. It has no idea whether the bytes flowing through are HTTP, gRPC, or a database protocol.
What it excels at: extremely fast operation with minimal processing overhead, full protocol agnosticism, and raw throughput for database connections, game servers, and video streaming. What it cannot do: make routing decisions based on URL paths, headers, or cookies. A pure L4 balancer also does not terminate SSL/TLS on behalf of your application; it passes the encrypted stream through untouched, although some products (AWS NLB with TLS listeners, for example) add termination on top.
AWS Network Load Balancer (NLB) is the canonical example. It operates at L4, handles millions of requests per second, and is designed for ultra-low latency.
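To make "sees IP addresses and ports and nothing more" concrete, here is a small sketch of one common L4 approach: hashing the connection's addressing fields so that every packet of a flow lands on the same backend. The backend addresses and the `pick_backend` helper are hypothetical, and this is not a description of how any particular product implements its selection.

```python
import hashlib

# Hypothetical backend pool: (IP, port) pairs.
BACKENDS = [("10.0.0.11", 5432), ("10.0.0.12", 5432), ("10.0.0.13", 5432)]


def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol="tcp"):
    """Choose a backend using only transport-level fields.

    Nothing in the payload is inspected, so the same code works
    whether the bytes are HTTP, gRPC, or a database protocol.
    """
    key = f"{protocol}:{src_ip}:{src_port}:{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(BACKENDS)
    return BACKENDS[index]


# The same connection always maps to the same backend.
chosen = pick_backend("203.0.113.7", 51432, "198.51.100.10", 5432)
chosen_again = pick_backend("203.0.113.7", 51432, "198.51.100.10", 5432)
```

Because the decision depends only on the addressing fields, there is no HTTP parsing cost, which is where the L4 speed advantage comes from.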
Layer 7 (Application Layer)
L7 load balancer routing by HTTP path — /api, /static, /admin
A Layer 7 load balancer understands the actual content of requests: HTTP headers, URLs, cookies, request bodies. This unlocks a lot of power.
You can route /api/* requests to one server fleet and /static/* to another, implement sticky sessions based on a cookie, run A/B tests by sending 5% of traffic to the new version, and terminate SSL so your backend talks plain HTTP internally. The tradeoff is more CPU overhead (it has to parse HTTP) and slightly higher latency than L4.
AWS Application Load Balancer (ALB) is the go-to for HTTP workloads. Nginx and HAProxy configured in HTTP mode are also L7.
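The routing ideas above can be sketched in a few lines. This is an illustrative model only: the pool names (`api-fleet`, `static-fleet`, `web-fleet`), the version labels, and both helper functions are assumptions, not the configuration syntax of ALB, Nginx, or HAProxy.

```python
import random

# Hypothetical routing table: path prefix -> backend pool name.
ROUTES = [
    ("/api/", "api-fleet"),
    ("/static/", "static-fleet"),
]
DEFAULT_POOL = "web-fleet"


def route_by_path(path):
    """Return the pool for the first matching path prefix."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


def pick_version(rng=random, canary_share=0.05):
    """Send roughly canary_share of requests to the new version."""
    return "v2-canary" if rng.random() < canary_share else "v1-stable"


api_pool = route_by_path("/api/users/42")
static_pool = route_by_path("/static/app.css")
fallback_pool = route_by_path("/checkout")
```

Both decisions need the parsed HTTP request (the path) or per-request logic (the traffic split), which is exactly the information an L4 balancer never sees.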
| Feature | Layer 4 | Layer 7 |
|---|---|---|
| Understands URLs/headers | No | Yes |
| SSL termination | Passthrough only | Yes |
| Protocol-agnostic | Yes | No (HTTP/gRPC only) |
| Latency | Lower | Slightly higher |
| Content-based routing | No | Yes |
Hardware vs Software Load Balancers
Hardware load balancers are dedicated physical appliances — think F5 BIG-IP. They were the standard in enterprise data centers for decades: fast and reliable, but expensive ($50k–$200k+), hard to scale horizontally, and configured through a proprietary UI. In 2026, you'd only encounter these in legacy enterprise environments or highly regulated industries where the vendor provides compliance certifications.
Software load balancers run on commodity hardware or in the cloud and are configured via code. They're what most teams use today.
Nginx was originally a web server, but it's exceptional at L7 load balancing. Its event-driven, non-blocking architecture handles tens of thousands of concurrent connections with low memory usage. It's battle-tested at enormous scale — used by Cloudflare, Netflix, and countless others.
HAProxy is arguably the most reliable open-source load balancer ever built. It's been the backbone of high-availability setups at GitHub, Reddit, and Stack Overflow, and is particularly respected for its detailed statistics dashboard and its ability to handle extremely high connection counts.
AWS ALB/NLB are fully managed load balancers requiring zero infrastructure management. ALB handles HTTP(S)/gRPC at L7; NLB handles TCP/UDP at L4. For most teams on AWS, these are the default choice.
Envoy is a newer proxy used heavily in service mesh architectures (Istio runs on it under the hood). It's designed for microservices with first-class support for gRPC, distributed tracing, and dynamic configuration via an API.
Health Checks: How Load Balancers Know Who's Alive
A load balancer constantly pings each backend to determine if it's healthy. If a server fails checks, it gets removed from rotation automatically with no human intervention required.
Passive checks watch real traffic. If a backend returns a stream of 5xx errors or repeatedly times out, the LB marks it unhealthy. Simple to set up, but you only detect failure after real users are already being affected.
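A passive check can be modeled as a counter over real responses. The sketch below is a simplified assumption of how such tracking might look; the class name, the threshold of three consecutive failures, and the backend labels are all hypothetical.

```python
class PassiveHealthTracker:
    """Marks a backend unhealthy after consecutive failed real requests."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.failures = {}      # backend -> current failure streak
        self.unhealthy = set()

    def record(self, backend, status_code=None, timed_out=False):
        """Record the outcome of one real user request."""
        failed = timed_out or (status_code is not None and status_code >= 500)
        if failed:
            self.failures[backend] = self.failures.get(backend, 0) + 1
            if self.failures[backend] >= self.max_consecutive_failures:
                self.unhealthy.add(backend)
        else:
            self.failures[backend] = 0  # a success breaks the streak

    def is_healthy(self, backend):
        return backend not in self.unhealthy


tracker = PassiveHealthTracker()
tracker.record("app-1", status_code=502)
tracker.record("app-1", status_code=503)
healthy_after_two = tracker.is_healthy("app-1")
tracker.record("app-1", timed_out=True)
healthy_after_three = tracker.is_healthy("app-1")
```

Note that three real requests had to fail before the backend was pulled, which is the cost described above. This sketch also has no recovery path: a backend that receives no traffic produces no successes, so bringing it back would need a cooldown timer or an active probe.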
Active checks have the LB periodically send a synthetic request (such as GET /health) to each backend and verify the response. If the server doesn't respond with a 200 within the timeout, it's pulled from rotation before any user traffic hits it.
Most production setups combine both. The active health endpoint should verify that the application is actually functional — not just that the process is alive. A good /health endpoint checks the database connection, cache connectivity, and any other critical dependencies.
Pro tip: Keep health check endpoints lightweight and skip authentication middleware. The LB will be hitting them hundreds of times per minute.
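Here is one way such an endpoint might look, using only the Python standard library. The dependency checks are hypothetical stubs (`check_database`, `check_cache`) that you would replace with cheap real probes, and the port in `serve` is an arbitrary assumption.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_database():
    """Hypothetical stub: replace with a cheap query such as SELECT 1."""
    return True


def check_cache():
    """Hypothetical stub: replace with a cache ping."""
    return True


CHECKS = {"database": check_database, "cache": check_cache}


def health_report(checks=CHECKS):
    """Run every dependency check; any failure or exception means 503."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, results = health_report()
        body = json.dumps(results).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocking; call this from your own entry point."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Returning 503 when any dependency is down lets the load balancer treat "process alive but database unreachable" as unhealthy. Keep each check cheap, since the endpoint is called continuously.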
The key parameters to configure: how often to run the check (interval), how long to wait for a response (timeout), how many consecutive successes are needed to mark a server healthy (healthy threshold), and how many consecutive failures are needed to pull it from rotation (unhealthy threshold). Typical values are a 10s interval with a 5s timeout, a healthy threshold of 2, and an unhealthy threshold of 3.
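The two thresholds form a small state machine, sketched below with the typical values from the text as defaults. The class and field names are hypothetical; `interval_s` and `timeout_s` would drive the probing loop, which is left out here to keep the state logic visible.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckConfig:
    interval_s: float = 10.0      # how often to probe
    timeout_s: float = 5.0        # how long to wait for a response
    healthy_threshold: int = 2    # consecutive successes to mark healthy
    unhealthy_threshold: int = 3  # consecutive failures to mark unhealthy


class BackendHealth:
    """Tracks one backend's state across successive check results."""

    def __init__(self, config, healthy=True):
        self.config = config
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def observe(self, check_passed):
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state
            return self.healthy
        self._streak += 1
        needed = (self.config.healthy_threshold if check_passed
                  else self.config.unhealthy_threshold)
        if self._streak >= needed:
            self.healthy = check_passed  # flip state and start over
            self._streak = 0
        return self.healthy


backend = BackendHealth(HealthCheckConfig())
states = [backend.observe(ok) for ok in (False, False, False, True, True)]
```

With these defaults a failing server is removed roughly 30 seconds after it breaks (three checks, 10 seconds apart), and a single flaky check never flips the state in either direction.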
Real-World Examples
AWS ALB routes HTTP/HTTPS traffic based on path and host rules. A single ALB can forward api.yourapp.com to your API fleet and app.yourapp.com to your frontend, with SSL termination at the LB layer. It integrates natively with AWS Certificate Manager, Auto Scaling groups, and ECS.
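Host-based routing boils down to a lookup on the request's Host header. The sketch below models the idea in plain Python; it is not ALB's rule syntax, and the fleet names and `route_by_host` helper are hypothetical.

```python
# Hypothetical host rules, mirroring the example hostnames above.
HOST_RULES = {
    "api.yourapp.com": "api-fleet",
    "app.yourapp.com": "frontend-fleet",
}


def route_by_host(host_header, default=None):
    """Map a Host header value to a backend fleet.

    Hostnames are case-insensitive and may carry a port suffix, so both
    are normalized first. (IPv6 literals are not handled in this sketch.)
    """
    host = host_header.split(":", 1)[0].strip().lower()
    return HOST_RULES.get(host, default)


api_target = route_by_host("api.yourapp.com")
app_target = route_by_host("APP.yourapp.com:443")
unknown_target = route_by_host("other.example.com", default="default-fleet")
```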
AWS NLB is what you'd use in front of a service that needs to maintain long-lived TCP connections: a database proxy, a WebSocket server, or a gaming backend.
Nginx is commonly used as the entry point in Docker Compose stacks and Kubernetes ingress controllers. It's the workhorse behind many self-hosted setups.
HAProxy powers the TCP load balancing layer at GitHub. Its configuration is more verbose than Nginx's, but its operational characteristics are extremely well understood and documented.
Summary
Load balancers are the traffic cops of distributed systems. They solve the single-server scaling problem by distributing requests across a fleet and automatically removing unhealthy servers from rotation.
L4 load balancers (AWS NLB) work at the TCP level: fast, protocol-agnostic, and simple. L7 load balancers (AWS ALB, Nginx, HAProxy) understand HTTP and offer powerful routing, SSL termination, and sticky sessions. Hardware load balancers are legacy; software load balancers are the modern standard. Health checks keep traffic away from degraded servers without any manual intervention.
Every system that runs more than one backend server needs a load balancer. Everything else in this course — caching, databases, microservices — assumes one is already in place.