What are Load Balancers?

Updated June 3, 2026
Magic Magnets Team
8 min read

Imagine you open a coffee shop and it becomes wildly popular. One barista can't handle the queue, so you hire five. But if every customer instinctively walks up to barista #1, you've solved nothing. You need someone at the door routing each customer to the shortest queue. That's a load balancer.

In software terms, a load balancer sits between your clients and your backend servers, distributing incoming requests so no single server becomes a bottleneck.
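
The "shortest queue" idea from the coffee shop can be sketched as a least-connections picker. This is a minimal illustration under assumed names (the class and the `barista-*` backends are hypothetical), not how any particular product implements it.

```python
class LeastConnectionsBalancer:
    """Sends each new request to the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        # Map each backend name to its current number of in-flight requests.
        self.active = {name: 0 for name in backends}

    def acquire(self):
        # min() returns the first backend with the lowest count, so ties
        # resolve in the order the backends were registered.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when a request finishes so the count reflects the live queue.
        self.active[backend] -= 1


lb = LeastConnectionsBalancer(["barista-1", "barista-2", "barista-3"])
first = [lb.acquire() for _ in range(3)]  # one request lands on each backend
lb.release("barista-2")                   # barista-2 finishes early
next_pick = lb.acquire()                  # so it receives the next request
```

Counting in-flight requests rather than total requests is what makes this behave like "shortest queue": a backend that finishes quickly becomes eligible again sooner.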

The Problem Load Balancers Solve

Figure: Single server bottleneck, with all traffic converging on one machine. When the server hits its CPU or connection limit, every user experiences slowness or errors simultaneously. If it crashes, the entire service goes down. Adding more hardware to the same machine (vertical scaling) only delays the problem.

Without a load balancer, you have two bad options. A single server eventually hits its CPU, memory, or connection limit and either slows to a crawl or falls over entirely. Multiple servers with clients picking one arbitrarily gives you uneven utilization, hot spots, and no way to gracefully remove a failing server.

Load balancers solve both: they distribute traffic evenly and detect when a server is unhealthy, stopping traffic to it automatically. They're also the natural place to terminate SSL, apply rate limits, and add observability — more on that in the API Gateways chapter.
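
Both halves of that job, spreading traffic and skipping unhealthy servers, fit in a few lines. The sketch below assumes simple round-robin selection and hypothetical backend names (`app-1` and so on); real load balancers offer several algorithms and drive the health flags from checks rather than manual calls.

```python
class RoundRobinBalancer:
    """Cycles through backends in order, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the backend to try next

    def mark_unhealthy(self, backend):
        self.healthy.discard(backend)

    def mark_healthy(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Walk the ring at most once, skipping backends that are out of rotation.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
before = [lb.pick() for _ in range(3)]  # app-1, app-2, app-3
lb.mark_unhealthy("app-2")
after = [lb.pick() for _ in range(4)]   # app-2 no longer appears
```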

Layer 4 vs Layer 7 Load Balancers

This is where most engineers get confused, and it's worth spending a moment here.

The "layer" refers to the OSI model — think of it as the level of the request the load balancer actually understands.

Layer 4 (Transport Layer)

Figure: Layer 4 load balancer routing by IP and port, with no HTTP parsing. It sees the destination IP and port but doesn't read the HTTP payload, which makes it extremely fast with minimal overhead. AWS Network Load Balancer (NLB) operates here; it's used for game servers, database proxies, and any workload where raw throughput or protocol-agnostic balancing matters more than HTTP-level routing.

A Layer 4 load balancer works at the TCP/UDP level. It sees IP addresses and ports and nothing more. It has no idea whether the bytes flowing through are HTTP, gRPC, or a database protocol.

What it excels at: extremely fast operation with minimal processing overhead, full protocol agnosticism, and raw throughput for database connections, game servers, and video streaming. What it cannot do: make routing decisions based on URL paths, headers, or cookies. A pure L4 balancer also doesn't terminate SSL on behalf of your application; it passes the encrypted stream through untouched (some products, AWS NLB among them, offer TLS listeners as an extra on top of plain L4 forwarding).
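
To make "IP addresses and ports and nothing more" concrete, here is a sketch of one common approach: hashing the connection's 5-tuple to choose a backend. The function name and the addresses are illustrative assumptions; production L4 balancers typically pair something like this with connection tracking.

```python
import hashlib


def pick_backend_l4(src_ip, src_port, dst_ip, dst_port, protocol, backends):
    """Choose a backend using only transport-level fields (the 5-tuple)."""
    flow = f"{protocol}:{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    # A stable hash keeps every packet of one connection on the same backend.
    digest = hashlib.sha256(flow).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["10.0.1.10:5432", "10.0.1.11:5432", "10.0.1.12:5432"]
a = pick_backend_l4("203.0.113.7", 51234, "198.51.100.1", 5432, "tcp", backends)
b = pick_backend_l4("203.0.113.7", 51234, "198.51.100.1", 5432, "tcp", backends)
# a and b are the same backend: the same flow always hashes the same way.
```

Notice that nothing in the function looks at the payload, which is why the same code works whether the bytes are HTTP, gRPC, or a database protocol.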

AWS Network Load Balancer (NLB) is the canonical example. It operates at L4, handles millions of requests per second, and is designed for ultra-low latency.

Layer 7 (Application Layer)

Figure: Layer 7 load balancer routing by HTTP path (/api, /static, /admin). It reads the full HTTP request before routing, so it can route /api/* to one fleet, serve /static/* from a CDN-backed cache, and enforce authentication on /admin/* routes, all at the LB layer. AWS ALB, Nginx, and HAProxy in HTTP mode all operate here. The trade-off: it must parse every HTTP request, adding roughly 1 ms of overhead compared to L4.

A Layer 7 load balancer understands the actual content of requests: HTTP headers, URLs, cookies, request bodies. This unlocks a lot of power.

You can route /api/* requests to one server fleet and /static/* to another, implement sticky sessions based on a cookie, run A/B tests by sending 5% of traffic to the new version, and terminate SSL so your backend talks plain HTTP internally. The tradeoff is more CPU overhead (it has to parse HTTP) and slightly higher latency than L4.
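
Two of those capabilities, path-based routing and a 5% traffic split, can be sketched as follows. The pool and version names are hypothetical, and the random split is only one way to do it; sticky A/B tests usually hash a user or session ID instead so a given user always sees the same version.

```python
import random

# (path prefix, pool name) pairs, checked in order. Pool names are made up.
ROUTES = [
    ("/api/", "api-fleet"),
    ("/static/", "static-fleet"),
]
DEFAULT_POOL = "web-fleet"


def route_by_path(path):
    """Return the pool for the first matching prefix, else the default pool."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


def pick_version(canary_fraction=0.05, rng=random):
    """Send roughly canary_fraction of requests to the new version."""
    return "v2-canary" if rng.random() < canary_fraction else "v1-stable"


api_pool = route_by_path("/api/users/42")        # "api-fleet"
static_pool = route_by_path("/static/logo.png")  # "static-fleet"
home_pool = route_by_path("/")                   # "web-fleet"

# Over many requests the canary share settles near 5%.
rng = random.Random(0)
canary_share = sum(pick_version(rng=rng) == "v2-canary" for _ in range(10_000)) / 10_000
```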

AWS Application Load Balancer (ALB) is the go-to for HTTP workloads. Nginx and HAProxy configured in HTTP mode are also L7.

Feature                    Layer 4            Layer 7
Understands URLs/headers   No                 Yes
SSL termination            Passthrough only   Yes
Protocol-agnostic          Yes                HTTP/gRPC only
Latency                    Lower              Slightly higher
Content-based routing      No                 Yes


Hardware vs Software Load Balancers

Hardware load balancers are dedicated physical appliances — think F5 BIG-IP. They were the standard in enterprise data centers for decades: fast and reliable, but expensive ($50k–$200k+), hard to scale horizontally, and configured through a proprietary UI. In 2026, you'd only encounter these in legacy enterprise environments or highly regulated industries where the vendor provides compliance certifications.

Software load balancers run on commodity hardware or in the cloud and are configured via code. They're what most teams use today.

Nginx was originally a web server, but it's exceptional at L7 load balancing. Its event-driven, non-blocking architecture handles tens of thousands of concurrent connections with low memory usage. It's battle-tested at enormous scale — used by Cloudflare, Netflix, and countless others.

HAProxy is arguably the most reliable open-source load balancer ever built. It's been the backbone of high-availability setups at GitHub, Reddit, and Stack Overflow, and is particularly respected for its detailed statistics dashboard and its ability to handle extremely high connection counts.

AWS ALB/NLB are fully managed load balancers requiring zero infrastructure management. ALB handles HTTP(S)/gRPC at L7; NLB handles TCP/UDP at L4. For most teams on AWS, these are the default choice.

Envoy is a newer proxy used heavily in service mesh architectures (Istio runs on it under the hood). It's designed for microservices with first-class support for gRPC, distributed tracing, and dynamic configuration via an API.

Health Checks: How Load Balancers Know Who's Alive

A load balancer continuously monitors each backend to determine whether it's healthy. If a server fails its checks, it gets removed from rotation automatically, with no human intervention required.

Passive checks watch real traffic. If a backend returns a stream of 5xx errors or repeatedly times out, the LB marks it unhealthy. Simple to set up, but you only detect failure after real users are already being affected.
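
A passive check can be approximated by tracking recent outcomes of real requests per backend. The sketch below assumes a sliding window and a failure-ratio cutoff; both the class name and the default numbers are illustrative, not taken from any specific load balancer.

```python
from collections import deque


class PassiveHealthTracker:
    """Judges one backend from the outcomes of its most recent real requests."""

    def __init__(self, window=20, max_failure_ratio=0.5):
        self.outcomes = deque(maxlen=window)  # True means the request failed
        self.max_failure_ratio = max_failure_ratio

    def record(self, status_code=None, timed_out=False):
        # A timeout or any 5xx response counts as a failure.
        failed = timed_out or (status_code is not None and status_code >= 500)
        self.outcomes.append(failed)

    def is_healthy(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return True  # not enough evidence yet to condemn the backend
        return sum(self.outcomes) / len(self.outcomes) < self.max_failure_ratio


tracker = PassiveHealthTracker(window=4, max_failure_ratio=0.5)
for code in (200, 200, 200, 200):
    tracker.record(status_code=code)
healthy_before = tracker.is_healthy()  # True: no failures in the window

tracker.record(status_code=503)
tracker.record(timed_out=True)
healthy_after = tracker.is_healthy()   # False: 2 of the last 4 requests failed
```

The two failed requests in the example were real user requests, which is exactly the cost of passive checking described above.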

Active checks have the LB periodically send a synthetic request (such as GET /health) to each backend and verify the response. If the server doesn't respond with a 200 within the timeout, the check fails; after enough consecutive failures the server is pulled from rotation, ideally before much user traffic hits it.
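
An active probe can be sketched with the Python standard library. The `probe` and `run_checks` names and the `opener` parameter are assumptions made for this example (the injectable opener just makes the function easy to test without a network); the scheduling loop that would call `run_checks` on an interval is left out.

```python
import http.client
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if the health URL answers with HTTP 200 in time."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Covers refused connections, DNS failures, timeouts, malformed
        # replies, and 4xx/5xx responses (urlopen raises HTTPError for those).
        return False


def run_checks(backends, timeout=5.0, opener=urllib.request.urlopen):
    """Probe every backend once; return the set that should stay in rotation."""
    return {b for b in backends if probe(f"http://{b}/health", timeout, opener)}
```

A caller would run `run_checks(["10.0.1.10:8080", "10.0.1.11:8080"])` on a timer and hand the result to whatever picks backends; those addresses are placeholders.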

Most production setups combine both. The active health endpoint should verify that the application is actually functional — not just that the process is alive. A good /health endpoint checks the database connection, cache connectivity, and any other critical dependencies.
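
Here is one possible shape for such an endpoint, using only the standard library. The `check_database` and `check_cache` functions are placeholders standing in for real dependency checks, and the handler is a sketch rather than a production server.

```python
import json
from http.server import BaseHTTPRequestHandler


def check_database():
    # Placeholder: a real check might run a trivial query against the database.
    return True


def check_cache():
    # Placeholder: a real check might send a ping to the cache.
    return True


CHECKS = {"database": check_database, "cache": check_cache}


def health_report(checks=CHECKS):
    """Run every dependency check; an exception counts as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, results = health_report()
        body = json.dumps(results).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Returning 503 when any dependency fails lets the load balancer treat "process is up but cannot reach the database" the same as "process is down". To try it locally, pass `HealthHandler` to `http.server.HTTPServer` and call `serve_forever()`.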

Pro tip: Keep health check endpoints lightweight and skip authentication middleware. The LB will be hitting them hundreds of times per minute.

The key parameters to configure: how often to run the check (interval), how long to wait for a response (timeout), how many consecutive successes to mark as healthy (healthy threshold), and how many consecutive failures before pulling a server (unhealthy threshold). Typical values are a 10s interval with a 5s timeout, requiring 2 successes and 3 failures respectively.
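
The thresholds amount to a small state machine per backend. The sketch below uses the typical values above as defaults; the class and function names are hypothetical, and the detection-time estimate assumes checks start on a fixed interval and a failing check takes the full timeout.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    healthy_threshold: int = 2    # consecutive successes needed to rejoin
    unhealthy_threshold: int = 3  # consecutive failures needed to be removed
    healthy: bool = True
    _streak: int = 0              # consecutive results contradicting current state

    def observe(self, check_passed):
        """Feed in one check result and return the resulting health state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state; reset the streak
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


def worst_case_detection_seconds(interval, timeout, unhealthy_threshold):
    """Rough upper bound: N intervals until the Nth failing check starts, plus its timeout."""
    return unhealthy_threshold * interval + timeout


state = HealthState()
history = [state.observe(r) for r in (False, False, False, True, True)]
# history == [True, True, False, False, True]

detection = worst_case_detection_seconds(interval=10, timeout=5, unhealthy_threshold=3)
# detection == 35
```

With the typical values, a server that dies just after passing a check can keep receiving traffic for roughly 35 seconds. Shorter intervals and lower thresholds cut that window at the cost of more probe traffic and more false alarms.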

Real-World Examples

AWS ALB routes HTTP/HTTPS traffic based on path and host rules. A single ALB can forward api.yourapp.com to your API fleet and app.yourapp.com to your frontend, with SSL termination at the LB layer. It integrates natively with AWS Certificate Manager, Auto Scaling groups, and ECS.

AWS NLB is what you'd use in front of a service that needs to maintain long-lived TCP connections: a database proxy, a WebSocket server, or a gaming backend.

Nginx is commonly used as the entry point in Docker Compose stacks and Kubernetes ingress controllers. It's the workhorse behind many self-hosted setups.

HAProxy powers the TCP load balancing layer at GitHub. Its configuration is more verbose than Nginx's, but its operational characteristics are extremely well understood and documented.

Summary

Load balancers are the traffic cops of distributed systems. They solve the single-server scaling problem by distributing requests across a fleet and automatically removing unhealthy servers from rotation.

L4 load balancers (AWS NLB) work at the TCP level: fast, protocol-agnostic, and simple. L7 load balancers (AWS ALB, Nginx, HAProxy) understand HTTP and offer powerful routing, SSL termination, and sticky sessions. Hardware load balancers are legacy; software load balancers are the modern standard. Health checks keep traffic away from degraded servers without any manual intervention.

Every system that runs more than one backend server needs a load balancer. Everything else in this course — caching, databases, microservices — assumes one is already in place.
