Correlation IDs

Updated June 8, 2026

Magic Magnets Team

A correlation ID is a unique identifier attached to a request at the point it enters your system. Every service that touches that request includes the ID in its logs. When a failure occurs, you search your log aggregator by that ID and get a chronological trace of every event across every service involved.

Without correlation IDs, you know from metrics that errors are spiking. You know from logs that the Payment Service threw a timeout at 10:02 AM. But the Payment Service handles thousands of transactions per minute. You have no way to identify which timeout belongs to which user's failed checkout, or what happened in the other services before the failure.

How it Works

Figure: Correlation ID generated at the gateway and propagated through all downstream services. The API Gateway generates a correlation ID and passes it in headers to every downstream service. Each service logs the ID, enabling a single search query to reconstruct the full request history.

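Generation: the ID is created once, at the system's entry point. An API Gateway or edge router generates a UUID when a request arrives. Common header names are X-Correlation-ID and X-Request-ID. The W3C Trace Context standard defines a traceparent header that plays the same role, although it carries a structured trace ID rather than a bare UUID.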
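The generation step can be sketched in a few lines of Python. This is a minimal illustration, not a real gateway's middleware: ensure_correlation_id is a hypothetical helper, the header name is one of the common choices above, and reusing an ID that a caller already supplied is an assumption you may not want at an untrusted edge.

```python
import uuid
from typing import Dict

# One of the common header names; pick one and use it in every service.
CORRELATION_HEADER = "X-Correlation-ID"


def ensure_correlation_id(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers that always carries a correlation ID.

    Hypothetical helper for illustration only.
    """
    target = CORRELATION_HEADER.lower()
    existing = ""
    result: Dict[str, str] = {}
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() == target:
            existing = value.strip()
        else:
            result[name] = value
    # Assumption: keep an ID supplied by the caller, otherwise mint a UUID.
    result[CORRELATION_HEADER] = existing or str(uuid.uuid4())
    return result
```

Normalising the header name here means downstream services can look it up under one spelling.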

Propagation: every service that calls a downstream service includes the correlation ID in the outgoing request headers. If the Auth Service calls the User Profile Service, it forwards the ID. If the Order Service publishes to a message queue, it includes the ID in the message envelope.
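Propagation comes down to copying the ID into whatever leaves the service. The sketch below shows both cases described above, an outgoing HTTP call and a queue message. The function names and the envelope field names (correlation_id, payload) are assumptions made for illustration, not a standard format.

```python
import json
from typing import Any, Dict, Optional

CORRELATION_HEADER = "X-Correlation-ID"


def outgoing_headers(correlation_id: str,
                     extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build headers for a downstream HTTP call, always forwarding the ID."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = correlation_id
    return headers


def message_envelope(correlation_id: str, payload: Dict[str, Any]) -> str:
    """Wrap a queue message so the consumer can recover the ID.

    The envelope layout is hypothetical; use whatever your broker or
    message schema already provides for metadata.
    """
    return json.dumps({"correlation_id": correlation_id, "payload": payload})
```

A consumer would read correlation_id from the envelope before processing the payload, so its logs join the same timeline as the producer's.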

Logging: every service logs the correlation ID alongside every event it records for that request. Because the ID appears in every log line across every service, a single search query reconstructs the full history.

correlation_id: "req-1234-abcd"

[10:02:01] [API Gateway]        Request received. Path: /checkout
[10:02:01] [Auth Service]       Token validated for user u_99
[10:02:02] [Order Service]      Order ord_55 created in pending state
[10:02:02] [Inventory Service]  Item prod_8 reserved
[10:02:05] [Payment Service]    ERROR: Stripe API timeout after 5000ms
[10:02:05] [Order Service]      Rolling back ord_55

An engineer retrieves this entire timeline with a single query that takes seconds to write. Without the ID, the same investigation could take hours of manual log correlation.
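What the log aggregator does for that query is conceptually simple: filter by ID, then sort by time. The sketch below mimics it over in-memory records. The record shape is an assumption, the req-9999-zzzz entry is invented to stand in for unrelated traffic, and a real aggregator would have its own query language.

```python
from typing import Dict, Iterable, List

# Sample records, deliberately out of order and mixed with another request.
RECORDS = [
    {"timestamp": "10:02:05", "service": "Payment Service",
     "correlation_id": "req-1234-abcd",
     "message": "ERROR: Stripe API timeout after 5000ms"},
    {"timestamp": "10:02:01", "service": "API Gateway",
     "correlation_id": "req-1234-abcd",
     "message": "Request received. Path: /checkout"},
    # Hypothetical unrelated request, included to show the filter at work.
    {"timestamp": "10:02:03", "service": "Payment Service",
     "correlation_id": "req-9999-zzzz",
     "message": "Charge accepted"},
    {"timestamp": "10:02:02", "service": "Order Service",
     "correlation_id": "req-1234-abcd",
     "message": "Order ord_55 created in pending state"},
]


def timeline_for(records: Iterable[Dict[str, str]],
                 correlation_id: str) -> List[Dict[str, str]]:
    """Return the records for one request, ordered by timestamp.

    Assumes fixed-width timestamps that sort correctly as strings.
    """
    matching = [r for r in records if r.get("correlation_id") == correlation_id]
    return sorted(matching, key=lambda r: r["timestamp"])
```

Calling timeline_for(RECORDS, "req-1234-abcd") returns the gateway, order, and payment entries in chronological order and drops the unrelated request.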

Return the ID to the Client

When a 500 error is returned to the client, include the correlation ID in the response body:

{
  "error": "checkout_failed",
  "message": "An error occurred processing your payment.",
  "correlationId": "req-1234-abcd"
}

When a user reports a failure and can copy the correlation ID from the error screen or browser console, your support team can find the exact incident in seconds.
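Building that response is a small amount of code at the point where the error is handled. The sketch below reproduces the body shown above. The function name and the (status, headers, body) return shape are assumptions, since the details depend on your web framework.

```python
import json
from typing import Dict, Tuple


def checkout_error_response(correlation_id: str) -> Tuple[int, Dict[str, str], str]:
    """Build a 500 response whose body carries the correlation ID.

    Hypothetical helper; adapt the return value to your framework.
    """
    body = {
        "error": "checkout_failed",
        "message": "An error occurred processing your payment.",
        "correlationId": correlation_id,
    }
    headers = {"Content-Type": "application/json"}
    return 500, headers, json.dumps(body)
```

The message stays generic on purpose: the ID gives support staff everything they need to find the details internally without exposing them to the client.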

Context Propagation

The main operational challenge is ensuring the ID is forwarded everywhere. If the Order Service omits the header when it calls the Payment Service, the chain breaks: the Payment Service either logs without an ID or generates a new one, and the two halves of the trace are disconnected.

Most platforms and logging libraries provide built-in support for carrying the ID within a service. The MDC (Mapped Diagnostic Context) offered by Java logging libraries such as SLF4J and Logback, and Node.js's AsyncLocalStorage, both let you store the correlation ID in thread-local or async-local storage for the duration of a request, so it is included in every log line without being passed explicitly to every function. OpenTelemetry's context propagation handles this automatically when you use it for distributed tracing, and the trace ID then serves as the correlation ID.
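The same idea can be sketched in Python, where the standard library's contextvars module plays the role that MDC and AsyncLocalStorage play in Java and Node.js. This is an illustration under assumptions: the logger name, log format, and handle_request function are hypothetical, and a real service would set the variable in its request middleware.

```python
import contextvars
import logging

# Holds the ID for the current request; "-" marks logs made outside a request.
_correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = _correlation_id.get()
        return True


def build_logger(stream) -> logging.Logger:
    """Create a logger whose format includes the correlation ID."""
    logger = logging.getLogger("checkout-sketch")  # hypothetical name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(correlation_id)s] %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    return logger


def create_order(logger: logging.Logger) -> None:
    # Business code takes no correlation ID parameter.
    logger.info("Order ord_55 created in pending state")


def handle_request(logger: logging.Logger, correlation_id: str) -> None:
    """Set the ID for the duration of one request, then restore the old value."""
    token = _correlation_id.set(correlation_id)
    try:
        create_order(logger)
    finally:
        _correlation_id.reset(token)
```

Because the value lives in a context variable rather than a global, concurrent requests handled in separate threads or asyncio tasks each see their own ID.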

Relationship to Distributed Tracing

Correlation IDs and distributed traces solve related problems. A correlation ID links all log lines for a request. A distributed trace links all timing spans for a request and reconstructs the call tree.

In practice, if you adopt OpenTelemetry for distributed tracing, the trace ID is your correlation ID. Use it in your structured logs and you get both: log correlation across services and the visual span timeline in your tracing backend.
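If a service receives a W3C traceparent header but does not run a tracing SDK, it can still pull out the trace ID and log it as the correlation ID. The sketch below parses the four-field form of the header (version, trace ID, parent ID, flags) using only the standard library. The function name is hypothetical, and it makes no attempt to handle extra fields that future versions of the format might add.

```python
import re
from typing import Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def trace_id_from_traceparent(value: str) -> Optional[str]:
    """Return the trace ID from a traceparent header, or None if it is invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # Version ff and all-zero IDs are invalid in the W3C format.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id
```

For the header value 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01, the function returns 4bf92f3577b34da6a3ce929d0e0e4736, which is the value to include in each log line.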

For teams not yet running a full distributed tracing stack, correlation IDs alone are a low-cost, high-value starting point. They require no tracing backend, no instrumentation library, and no sampling configuration. Just generate a UUID at the edge, propagate it through headers, and include it in every log line.

Summary

A correlation ID is a UUID generated at the edge and carried through every service call and log line for the duration of a request. It turns isolated log entries across dozens of services into a single searchable timeline for any given request. Include it in error responses to the client so users can hand you the exact identifier when they report a problem. Use AsyncLocalStorage, MDC, or OpenTelemetry context propagation to avoid manually threading it through every function. This is one of the lowest-cost, highest-impact observability changes you can make to a distributed system.
