Load Balancing Algorithms

Updated June 3, 2026
Magic Magnets Team
9 min read

Knowing that a load balancer distributes traffic is one thing. Knowing how it decides where to send each request is what separates a good system design answer from a great one. The algorithm matters — choose the wrong one and you can create hot spots even with a perfectly configured LB.

Let's go through each algorithm, what it optimizes for, and when you'd actually use it.

Round Robin


Round Robin — each request cycles to the next server in order

The simplest algorithm: requests are distributed to each server in order, cycling through the list.

Server 1 → Server 2 → Server 3 → Server 1 → Server 2 → ...

When it works well: When all your servers are identical in hardware specs and all your requests are roughly equal in cost. If every request does about the same amount of work, round robin distributes load evenly by default.

Where it breaks: When requests have wildly varying costs. Imagine a server that got unlucky and received ten slow database-heavy requests in a row while the others handled ten fast in-memory lookups. Round robin doesn't know — it keeps routing to the overloaded server at the same rate.

Nginx uses round robin by default when an upstream block doesn't specify another method. It's the right choice more often than people give it credit for, as long as your requests and servers are homogeneous.
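The cycling behavior described above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer implements it; the class name and server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(list(servers))

    def pick(self):
        # No load information is consulted: the next server in the cycle wins.
        return next(self._servers)


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
order = [balancer.pick() for _ in range(5)]
print(order)
```

Note that pick() never looks at what the servers are doing, which is exactly why a server stuck on slow requests keeps receiving its full share.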


Weighted Round Robin


Weighted Round Robin — new 16-core server gets 3x more traffic than legacy 4-core server

Same idea as round robin, but each server gets assigned a weight that reflects its capacity.

Server A: weight 3 (16 CPU cores)
Server B: weight 1 (4 CPU cores)

The LB sends 3 requests to Server A for every 1 request to Server B. The weights approximate relative compute capacity; they need not mirror the core ratio exactly (a strictly core-proportional split here would be 4:1).

When to use it: During a rolling upgrade or a mixed fleet migration. Say you're replacing old 4-core servers with new 16-core ones gradually — weighted round robin lets you shift traffic proportionally while the upgrade is in progress without having to maintain identical server specs.

The downside: Weights are static. You set them manually and they don't adapt to real-time load.
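One way to turn static weights into a request sequence is the "smooth" weighted round robin variant, which interleaves servers rather than sending bursts to the heaviest one. The sketch below assumes that variant; the class name and the 3:1 weights are illustrative, taken from the example above.

```python
class SmoothWeightedRoundRobin:
    """Weighted round robin that interleaves picks instead of bursting."""

    def __init__(self, weights):
        # weights: mapping of server name -> static positive integer weight
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty mapping of positive ints")
        self._weights = dict(weights)
        self._current = {name: 0 for name in weights}
        self._total = sum(weights.values())

    def pick(self):
        # Every server earns its weight each round...
        for name, weight in self._weights.items():
            self._current[name] += weight
        # ...the server with the highest running score is chosen...
        chosen = max(self._current, key=self._current.get)
        # ...and pays back the total so the others can catch up.
        self._current[chosen] -= self._total
        return chosen


wrr = SmoothWeightedRoundRobin({"server-a": 3, "server-b": 1})
picks = [wrr.pick() for _ in range(8)]
print(picks)
```

Over any window of four picks, server-a is chosen three times and server-b once. The weights never change at runtime, which is the static limitation noted above.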


Least Connections


Least Connections — new request routed to server with fewest active connections

Instead of cycling in order, the LB tracks how many active connections each server currently has and routes the next request to whichever server has the fewest.

When it works well: Long-lived connections where request duration varies significantly, such as WebSocket connections, file uploads, and video streaming. These can hold a connection open for minutes or even hours. Least connections ensures you're not piling new connections onto a server that's already saturated, even if it's "due" in a round-robin cycle.

Where it breaks: For very short-lived requests (sub-millisecond API calls), connection counts change so fast that the overhead of tracking them can outweigh the benefit. Round robin performs just as well and is cheaper.

HAProxy calls this mode leastconn (configured with "balance leastconn"), and it's a popular choice for services with variable request duration.
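The bookkeeping behind least connections is small: a counter per server, incremented when a connection opens and decremented when it closes. The following is a single-process sketch under that assumption; the class and server names are hypothetical, and a real load balancer would also need to handle concurrency and health checks.

```python
class LeastConnectionsBalancer:
    """Routes each new connection to the server with the fewest open ones."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        # Called when a connection closes.
        if self._active[server] > 0:
            self._active[server] -= 1


lb = LeastConnectionsBalancer(["ws-1", "ws-2", "ws-3"])
first_three = [lb.acquire() for _ in range(3)]  # one connection each
lb.release("ws-2")                              # ws-2's connection closes
next_server = lb.acquire()                      # ws-2 now has the fewest
print(first_three, next_server)
```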

Least Response Time

An extension of least connections: the LB sends traffic to the server with both the fewest active connections and the lowest average response time. This is the most "intelligent" of the simple algorithms — it reacts to actual server performance, not just connection count.

Nginx Plus (the commercial version) supports this as least_time. It's excellent for API fleets where some servers might be slower due to noisy neighbors in a cloud environment.
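To make the idea concrete, here is one possible scoring rule: multiply the number of in-flight requests (plus one) by a moving average of observed latency, and pick the lowest score. This formula, the smoothing factor, and all names are assumptions for illustration; it is not a description of how least_time is implemented in Nginx Plus.

```python
class LeastResponseTimeBalancer:
    """Picks the server with the lowest (in-flight + 1) * average latency."""

    def __init__(self, servers, alpha=0.2):
        if not servers:
            raise ValueError("at least one server is required")
        self._alpha = alpha  # weight given to the newest latency sample
        self._active = {server: 0 for server in servers}
        self._latency = {server: 0.0 for server in servers}  # seconds

    def _score(self, server):
        # Servers with no samples yet score 0, so they get tried first.
        return (self._active[server] + 1) * self._latency[server]

    def acquire(self):
        server = min(self._active, key=self._score)
        self._active[server] += 1
        return server

    def release(self, server, elapsed_seconds):
        if self._active[server] > 0:
            self._active[server] -= 1
        previous = self._latency[server]
        if previous == 0.0:
            self._latency[server] = elapsed_seconds
        else:
            # Exponentially weighted moving average of response time.
            self._latency[server] = (
                (1 - self._alpha) * previous + self._alpha * elapsed_seconds
            )


lb = LeastResponseTimeBalancer(["api-1", "api-2"])
lb.release(lb.acquire(), 0.300)  # api-1 answers slowly
lb.release(lb.acquire(), 0.050)  # api-2 answers quickly
picks = [lb.acquire() for _ in range(3)]
print(picks)
```

In this run the faster server keeps winning until its in-flight count grows enough to offset its latency advantage.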


IP Hash (Sticky Sessions)

The LB hashes the client's IP address and routes them consistently to the same server. Same IP means same server, every time (as long as the server pool doesn't change).

The use case: Session affinity. If your application stores session state in memory (old-school PHP apps, some game servers), a user's requests must go to the same server — otherwise they'd be logged out on every request.

The big problem: IP Hash is fragile. With a simple hash-modulo-server-count scheme, removing a server changes the mapping for most clients, so users are redistributed and their in-memory sessions are lost, effectively logging them out. In addition, corporate users often share a single outbound IP through NAT, which sends a disproportionate share of traffic to one server.
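The fragility is easy to demonstrate. The sketch below assumes the simplest form of IP hashing (hash the address, take it modulo the number of servers) and counts how many clients land on a different server after one of four servers is removed. The function name, server names, and IP range are made up for the example.

```python
import hashlib


def ip_hash_pick(client_ip, servers):
    """Maps a client IP to a server using hash modulo server count."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["app-1", "app-2", "app-3", "app-4"]
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]

before = {ip: ip_hash_pick(ip, servers) for ip in clients}
after = {ip: ip_hash_pick(ip, servers[:-1]) for ip in clients}  # app-4 removed

moved = sum(1 for ip in clients if before[ip] != after[ip])
print(f"{moved / len(clients):.0%} of clients changed servers")
```

Only about a quarter of the clients were on the removed server, yet in expectation roughly three quarters of them move, because changing the modulus reshuffles almost everything.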

The modern alternative: Don't store session state on servers. Put it in Redis or a distributed cache, and then you don't need sticky sessions at all. IP Hash becomes mostly a legacy concern.


Consistent Hashing

A more robust version of IP Hash. Instead of hashing to a fixed array of servers, consistent hashing arranges servers on a virtual ring. Each request key (could be IP, user ID, or any other attribute) is hashed to a point on the ring and routed to the next server clockwise.

The key property: when a server is added or removed, only a fraction of keys are remapped (roughly 1/N of them for N servers), not the entire keyspace. This is critical for distributed caches (for example, Memcached client libraries; Redis Cluster uses a related fixed hash-slot scheme) and is covered in depth in the Caching chapter.
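A minimal ring can be built from a sorted list of hash points and a binary search. The sketch below assumes 100 virtual nodes per server to even out the distribution; that number, the class name, and the server and key names are illustrative choices rather than anything prescribed by the technique.

```python
import bisect
import hashlib


def _point(key):
    """Hashes a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server is placed on the ring many times (virtual nodes).
        self._ring = sorted(
            (_point(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, key):
        # First ring point clockwise from the key, wrapping past the end.
        index = bisect.bisect_right(self._points, _point(key)) % len(self._ring)
        return self._ring[index][1]


servers = ["cache-1", "cache-2", "cache-3", "cache-4"]
keys = [f"user-{i}" for i in range(2000)]

full = HashRing(servers)
smaller = HashRing(servers[:-1])  # cache-4 removed

moved_keys = [k for k in keys if full.pick(k) != smaller.pick(k)]
print(f"{len(moved_keys) / len(keys):.0%} of keys moved")
```

The only keys that move are the ones that lived on the removed server; every other key keeps its assignment.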

Random

Exactly what it sounds like: pick a server at random from the available pool.

This sounds naive, but at large scale it's statistically equivalent to round robin — random selection approaches uniform distribution as request count grows. The advantage is that it's trivially simple and requires zero shared state between LB instances.

Netflix has written publicly about the "Power of Two Choices" technique: pick two servers at random and route to whichever has fewer active connections. The idea predates Netflix and comes from earlier load-balancing research. According to Netflix, it consistently outperforms pure round robin and approaches optimal distribution in practice, and it's used in their production routing layer.
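The selection rule itself fits in a few lines. This sketch assumes the balancer keeps a local count of active connections per server and, for simplicity, never releases them; the function name, server names, and seed are illustrative.

```python
import random


def pick_two_choices(active, rng=random):
    """Samples two distinct servers and returns the less loaded one."""
    names = list(active)
    if len(names) == 1:
        return names[0]
    first, second = rng.sample(names, 2)
    return first if active[first] <= active[second] else second


rng = random.Random(42)  # seeded so the run is repeatable
active = {f"srv-{i}": 0 for i in range(10)}

for _ in range(1000):
    active[pick_two_choices(active, rng)] += 1

spread = max(active.values()) - min(active.values())
print(active, spread)
```

Because each decision compares only two counters, the balancer never needs a global view of the fleet, and the gap between the busiest and idlest server stays small.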


Resource Based (Adaptive)

The most sophisticated option. The LB actively queries each backend server (or reads from a sidecar agent) for real-time CPU, memory, or queue depth metrics, then routes traffic to the least-loaded server by actual resource utilization.

When it shines: Heterogeneous workloads where you can't predict request cost in advance. A machine learning inference endpoint that gets cheap text requests and expensive image requests — resource-based routing adapts in real time.

The tradeoff: It adds operational complexity. You need a way to expose and collect resource metrics from every backend, and the LB itself needs to process this data. This is generally only worth it for large fleets where the efficiency gains are meaningful.

AWS ALB doesn't support this natively. You'd implement it with a custom load balancing layer or a proxy such as Envoy (often deployed as the data plane of a service mesh) with a custom load balancing policy.
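As a rough illustration of metric-driven routing, the sketch below combines CPU, memory, and queue depth into a single load score and picks the lowest. The metric fields, the 0.5/0.2/0.3 weights, the queue cap, and the server names are all assumptions made for the example; a real deployment would tune these and would also need to handle stale or missing reports.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    cpu: float        # utilization, 0.0 to 1.0
    memory: float     # utilization, 0.0 to 1.0
    queue_depth: int  # requests waiting on the backend


def load_score(metrics, max_queue=100):
    """Blends the metrics into one number; lower means less loaded."""
    queue = min(metrics.queue_depth / max_queue, 1.0)
    # Hypothetical weighting: CPU matters most, then queue depth, then memory.
    return 0.5 * metrics.cpu + 0.2 * metrics.memory + 0.3 * queue


def pick_least_loaded(reports):
    """reports: mapping of server name -> latest Metrics from that server."""
    return min(reports, key=lambda name: load_score(reports[name]))


reports = {
    "gpu-1": Metrics(cpu=0.90, memory=0.60, queue_depth=40),
    "gpu-2": Metrics(cpu=0.35, memory=0.70, queue_depth=5),
    "gpu-3": Metrics(cpu=0.50, memory=0.40, queue_depth=20),
}
target = pick_least_loaded(reports)
print(target)
```

Most of the operational cost lies outside this function: collecting the reports reliably and keeping them fresh enough to act on.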


Choosing the Right Algorithm

| Scenario | Recommended Algorithm |
| --- | --- |
| Uniform servers, uniform requests | Round Robin |
| Mixed-capacity server fleet | Weighted Round Robin |
| Long-lived connections (WebSockets, uploads) | Least Connections |
| Performance-sensitive API fleet | Least Response Time |
| Must maintain session affinity | IP Hash (or better: externalize session state) |
| Distributed caching layer | Consistent Hashing |
| Very large, diverse workloads | Resource Based / Power of Two Choices |

Summary

Most production systems run on Round Robin or Least Connections: they're simple, well understood, and work well for the majority of HTTP API workloads. Weighted Round Robin is your friend during migrations. Consistent Hashing is essential for distributed caches. And if you're operating at Netflix scale, the Power of Two Choices is worth a look.

The algorithm you pick should match your specific request pattern. A mismatch — say, using Round Robin for a WebSocket chat server with connections that stay open for hours — can create severe imbalance even with a correctly configured load balancer.
