Latency vs Throughput vs Bandwidth

Updated June 3, 2026

Magic Magnets Team

Three terms come up constantly in system design conversations: latency, throughput, and bandwidth. They're related but distinct, and confusing them leads to the wrong solutions. Let's make each one crystal clear.

The Highway Analogy

Think of a highway:

Bandwidth is the number of lanes, representing the maximum capacity of the road
Throughput is the actual number of cars passing through per hour, representing the actual traffic being handled
Latency is how long it takes a single car to travel from point A to point B

A 10-lane highway has high bandwidth. At 2am with no traffic, throughput is low but latency is also low, meaning your car gets there fast. At rush hour, throughput approaches the bandwidth limit, and latency goes up because everyone is stuck in traffic.

This analogy holds surprisingly well for computer systems.

Bandwidth

Bandwidth is the maximum data transfer rate of a network link, typically measured in Mbps (megabits per second) or Gbps. It's a physical constraint of the medium.

A fiber connection might give you 10 Gbps of bandwidth. That doesn't mean you're always pushing 10 Gbps; instead, it means you can't push more than 10 Gbps no matter what you do.

In system design, bandwidth shows up as a constraint when you're:

Designing replication between database nodes (will the replica keep up with the primary?)
Estimating CDN costs (how much data will we serve per month?)
Planning for video streaming (what bitrate can we sustain per user?)
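
A back-of-the-envelope check for the replication question might look like the sketch below. The write rate, record size, and link speed are made-up illustrative numbers, and `replication_utilization` is a hypothetical helper, not part of any library. Protocol overhead is ignored.

```python
def replication_utilization(writes_per_sec: float,
                            bytes_per_write: float,
                            link_gbps: float) -> float:
    """Fraction of link bandwidth consumed by shipping writes to a replica."""
    bits_per_sec = writes_per_sec * bytes_per_write * 8  # bandwidth is quoted in bits
    link_bits_per_sec = link_gbps * 1e9
    return bits_per_sec / link_bits_per_sec


# Assumed workload: 50,000 writes/s of 2,000 bytes each over a 1 Gbps link.
utilization = replication_utilization(50_000, 2_000, 1.0)
print(f"link utilization: {utilization:.0%}")
```

A result near or above 100% would mean the replica falls behind no matter how fast its disks are, because the link itself is the ceiling.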

Throughput

Throughput is the actual amount of work done per unit of time, such as requests per second, transactions per second, or bytes per second. Unlike bandwidth (which is the ceiling), throughput is what you actually observe.

Throughput is limited by the weakest link in your system. You can have 10 Gbps of network bandwidth, but if your database can only handle 1,000 queries per second, that's your throughput ceiling for any workload where every request has to hit the database.
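
The "weakest link" idea can be sketched as taking the minimum capacity across the stages a request passes through. The stage names and capacities below are invented for illustration, and `throughput_ceiling` is a hypothetical helper.

```python
def throughput_ceiling(stage_capacities: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its capacity in requests per second."""
    name = min(stage_capacities, key=stage_capacities.get)
    return name, stage_capacities[name]


# Assumed capacities (requests per second) for each stage on the request path.
stages = {"load_balancer": 50_000, "app_servers": 6_000, "database": 1_000}
bottleneck, ceiling = throughput_ceiling(stages)
print(f"bottleneck: {bottleneck} at {ceiling} req/s")
```

This assumes every request touches every stage exactly once; real request paths are rarely that uniform, so treat the result as a first estimate.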

Improving throughput usually means:

Horizontal scaling (more workers handling requests in parallel)
Caching (serve more requests without hitting the bottleneck)
Query optimization (each unit of work takes less time)
Batching (amortize fixed overhead across multiple operations)
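
To see why caching raises the ceiling, here is a rough sketch. It assumes the database is the only bottleneck and that cache hits cost nothing, which is a simplification; `max_request_rate` is a hypothetical helper and the numbers are illustrative.

```python
def max_request_rate(db_qps: float, cache_hit_ratio: float) -> float:
    """Request rate the system can sustain if only cache misses reach the database."""
    if not 0 <= cache_hit_ratio < 1:
        raise ValueError("cache_hit_ratio must be in [0, 1)")
    return db_qps / (1 - cache_hit_ratio)


# Assumed: the database sustains 1,000 queries per second.
for hit_ratio in (0.0, 0.5, 0.9):
    rate = max_request_rate(1_000, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: about {rate:,.0f} req/s")
```

Under these assumptions a 90% hit ratio lets the same database sit behind roughly ten times the request rate.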

Figure: Load Balanced Throughput (parallel server instances tripling request throughput under load). Horizontal scaling triples throughput: three servers handle requests in parallel. Per-request latency stays the same; we're doing more work per second. The database is now the bottleneck. Adding more app servers beyond this point won't help until DB capacity increases.

Latency

Latency is the time it takes to complete a single operation, from request to response. It's measured in milliseconds (ms) or microseconds (µs).

Latency has a hard floor: the speed of light. A round trip between New York and London takes about 70ms on a good route, and most of that is signal propagation time; that's just physics. You can't engineer your way past it. Everything else (processing time, queueing delays, serialization) adds on top.
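
As a rough check on that physical floor, the sketch below estimates the round trip from distance alone. The distance (about 5,570 km great-circle) and the fiber speed (about two-thirds of the speed of light in vacuum) are assumed round figures, not measurements.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FRACTION = 2 / 3             # assumed: light in fiber travels at roughly 2/3 c
NY_LONDON_KM = 5_570               # assumed great-circle distance


def round_trip_ms(distance_km: float, speed_km_s: float) -> float:
    """Propagation delay for a there-and-back trip, in milliseconds."""
    return 2 * distance_km / speed_km_s * 1000


vacuum_ms = round_trip_ms(NY_LONDON_KM, SPEED_OF_LIGHT_KM_S)
fiber_ms = round_trip_ms(NY_LONDON_KM, SPEED_OF_LIGHT_KM_S * FIBER_FRACTION)
print(f"vacuum: {vacuum_ms:.1f} ms, fiber: {fiber_ms:.1f} ms")
```

Real cable routes are longer than the great-circle path and pass through routing equipment, which is why observed round trips sit somewhat above this estimate.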

Sources of latency:

Network latency: physical distance + routing hops
Processing latency: CPU time to actually do the work
Queueing latency: time spent waiting when the system is busy
Disk I/O latency: reading from SSDs (µs) vs HDDs (ms) vs network storage (ms-s)
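
Queueing latency is the one that grows fastest as a system gets busy. One common textbook model is the M/M/1 queue, an idealization (random arrivals, a single server) rather than a description of any particular system. Under that assumption, mean time in the system is 1 / (service rate - arrival rate). The service rate below is an invented number.

```python
def mm1_mean_latency_ms(service_rate: float, arrival_rate: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue, in ms.

    Rates are in requests per second.
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals meet or exceed capacity")
    return 1000 / (service_rate - arrival_rate)


# Assumed: one server that can process 1,000 requests per second.
for utilization in (0.5, 0.9, 0.99):
    latency = mm1_mean_latency_ms(1_000, 1_000 * utilization)
    print(f"utilization {utilization:.0%}: mean latency about {latency:.0f} ms")
```

In this model, latency grows slowly at moderate load and then climbs steeply as utilization approaches 100%, which matches the rush-hour picture from the highway analogy.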

Figure: Single Server Latency (request and query round trips showing typical P50 vs outlier P99 latencies). With a single server, each request travels the full path sequentially. P50 (median) latency looks fine at ~50ms. But P99 is 800ms: 1 in 100 requests gets hit by a GC pause, lock wait, or cold cache miss. At 1,000 req/s, that's 10 slow requests every second.

The Latency-Throughput Trade-off

Here's where it gets interesting: latency and throughput often trade off against each other.

The classic example is batching. If you send database writes one at a time, each write has low latency (it completes quickly). But your throughput is limited since you can only do one write at a time, waiting for each to finish.

If instead you batch 100 writes together and send them in one transaction, your throughput skyrockets because you've amortized the overhead. But the first write in the batch now has to wait for 99 others before being committed. Higher throughput, higher latency.
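
The arithmetic behind that trade-off can be sketched with a toy cost model. All numbers are assumptions chosen for illustration: a fixed 5 ms overhead per commit, 0.05 ms of work per write, and writes arriving at 10,000 per second. The model ignores queueing behind a busy writer, and `batch_stats` is a hypothetical helper.

```python
def batch_stats(batch_size: int,
                fixed_overhead_ms: float = 5.0,
                per_write_ms: float = 0.05,
                arrival_rate_per_s: float = 10_000) -> tuple[float, float]:
    """Return (writes per second, latency in ms of the first write in a batch)."""
    commit_ms = fixed_overhead_ms + batch_size * per_write_ms
    throughput = batch_size / commit_ms * 1000  # one writer committing back to back
    # The first write waits for the rest of the batch to arrive, then for the commit.
    fill_wait_ms = (batch_size - 1) / arrival_rate_per_s * 1000
    return throughput, fill_wait_ms + commit_ms


for size in (1, 10, 100):
    throughput, first_write_ms = batch_stats(size)
    print(f"batch={size:>3}: {throughput:>8,.0f} writes/s, "
          f"first write waits {first_write_ms:.2f} ms")
```

With these assumed costs, going from single writes to batches of 100 raises throughput from roughly 200 to 10,000 writes per second, while the first write's latency grows from about 5 ms to about 20 ms.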

This trade-off shows up everywhere:

Kafka batches messages for high throughput at the cost of some latency
TCP's Nagle algorithm buffers small packets to reduce overhead, which is great for throughput but bad for interactive applications
Database connection pooling increases throughput but adds queuing latency under load
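
For the Nagle case, latency-sensitive applications commonly opt out per socket with the standard `TCP_NODELAY` option. A minimal sketch using Python's `socket` module (the socket is never connected here; it only shows how the option is set and read back):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Disable Nagle's algorithm: small writes go out immediately
    # instead of being buffered and coalesced.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
finally:
    sock.close()

print(f"TCP_NODELAY enabled: {nodelay_enabled}")
```

This trades the other way: lower latency for small messages at the cost of more packets on the wire.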

Figure: Batching Trade-off (Write Queue and Batch Writer boosting throughput at the cost of waiting latency). Batching is the throughput-latency trade-off in action. Clients get an instant ack (<1ms), which is great for perceived latency. The batch writer flushes 100 writes every 100ms, amortizing disk/network overhead. Write throughput is 20× higher, but the first write in a batch waits up to 100ms before it's durable. This is exactly the trade-off Kafka and database connection pools make.

P50, P95, P99: Why Averages Lie

Here's a trap almost everyone falls into: using average latency as your performance metric.

Imagine a service where 99% of requests complete in 10ms and 1% take 10 seconds. The average works out to about 110ms, which seems "acceptable." But 1% of your users are experiencing catastrophic slowness. At 1,000 requests per second, that's 10 users per second having a terrible time.
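
The numbers in that scenario are easy to reproduce with Python's standard `statistics` module. The sample below is synthetic, built to match the 99% / 1% split described above.

```python
from statistics import mean, median

# 1,000 requests: 99% take 10 ms, 1% take 10 s (the scenario from the text).
latencies_ms = [10] * 990 + [10_000] * 10

average_ms = mean(latencies_ms)
median_ms = median(latencies_ms)
slow_per_second = 1_000 * 0.01  # at 1,000 requests per second

print(f"average: {average_ms:.1f} ms")
print(f"median:  {median_ms:.1f} ms")
print(f"slow requests per second: {slow_per_second:.0f}")
```

Neither the average nor the median reveals that some requests take a thousand times longer than the typical one.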

This is why the industry uses percentile latencies:

P50 (median): Half of requests are faster than this, half are slower, representing the "typical" experience.
P95: 95% of requests complete faster than this, while 5% are slower.
P99: 99% of requests complete faster than this. Only 1% are slower, but at scale, 1% is a lot of people.
P99.9: 99.9% of requests complete faster than this. This is the far "tail": extreme outliers, often caused by garbage collection pauses, lock contention, or cold cache misses.
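
One simple way to compute these values from raw samples is the nearest-rank method, sketched below. Monitoring systems often use interpolation or histogram approximations instead, so treat this as an illustration rather than how any particular tool does it. The sample data is synthetic and `percentile` is a hypothetical helper.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in ms: mostly fast, with a thin slow tail.
latencies_ms = [10] * 900 + [100] * 85 + [800] * 13 + [5_000] * 2

for p in (50, 95, 99, 99.9):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

On this sample the four percentiles come out as 10, 100, 800, and 5,000 ms: each step further into the tail tells a very different story from the median.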

Rule of thumb: optimize for P99, not P50. Your average user experience doesn't determine your reputation; instead, your worst 1% does.

Why P99 Matters More Than You Think

At scale, percentiles compound. Amazon's research showed that a page with 100 service calls, where each has a 99.9% success rate, has only a 90% chance of all calls succeeding (0.999^100 ≈ 0.90). Similarly, if each call independently has a 1% chance of being slow, a page with 100 calls has about a 63% chance of at least one slow call (1 - 0.99^100 ≈ 0.63), and that figure climbs toward 100% as the number of calls grows.
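
Working through the arithmetic, assuming the calls are independent (a simplification, since real failures and slowdowns are often correlated), with two hypothetical helpers:

```python
def all_succeed(p_success: float, calls: int) -> float:
    """Probability that every one of `calls` independent calls succeeds."""
    return p_success ** calls


def at_least_one_slow(p_slow: float, calls: int) -> float:
    """Probability that at least one of `calls` independent calls is slow."""
    return 1 - (1 - p_slow) ** calls


print(f"100 calls at 99.9% success: {all_succeed(0.999, 100):.1%} all succeed")
for calls in (10, 100, 500):
    chance = at_least_one_slow(0.01, calls)
    print(f"{calls:>3} calls, 1% slow each: {chance:.1%} hit a slow call")
```

With 100 calls the chance of hitting at least one slow call is about 63%, and it passes 99% at around 500 calls, so the more services a page fans out to, the closer every user gets to experiencing the tail.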

This is the long-tail problem of distributed systems. High-percentile latency (P99, P99.9) can affect most users in complex systems even if each individual service looks healthy.

Real Examples

Database reads: A PostgreSQL query might show P50 = 2ms, P99 = 150ms. The gap usually comes from queries that miss an index and fall back to full scans, or from lock waits.

API responses: A typical web API might show P50 = 50ms, P99 = 800ms. The tail is often caused by garbage collection pauses in the JVM/Node.js, cold connections to downstream services, or request queueing during traffic spikes.

Network hops: A CDN edge node 20ms from a user beats an origin server 200ms away at every percentile. Geography matters.

Putting It Together

Metric	Question it answers	Improved by
Bandwidth	What's the maximum data rate possible?	Better hardware, more network links
Throughput	How much work is the system actually doing?	Parallelism, caching, batching
Latency (P50)	What's the typical user experience?	Faster algorithms, caching, proximity
Latency (P99)	What's the worst common experience?	Eliminating outliers: GC tuning, timeouts, retry budgets

Summary

Bandwidth is the capacity ceiling, throughput is what you actually achieve, and latency is how fast individual operations complete. The highway analogy holds: bandwidth is lanes, throughput is cars passing through, latency is travel time. Latency and throughput often trade off; batching increases throughput but raises latency. Most importantly: measure latency with percentiles, not averages. P99 latency is what your worst 1% of users experience, and at scale, 1% is a huge number of people. Optimize for the tail.
