Saga Pattern
Updated June 3, 2026
The Problem with Distributed Transactions
In a monolith, if a user books a flight and a hotel, both writes can run inside one transaction on a single SQL database. If the hotel booking fails, the database rolls back the flight booking automatically.
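But in a microservices architecture, the Flight Service and the Hotel Service each own a separate database, so a single SQL transaction cannot span both. Older systems used the Two-Phase Commit (2PC) protocol to coordinate the databases, but 2PC holds locks on every participant until all of them have voted and the coordinator has announced the outcome. That adds latency, blocks other work on the locked data, and scales poorly as more services join the transaction.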
So how do companies like Uber and Amazon handle multi-step transactions across dozens of microservices? They use the Saga Pattern.
The Core Concept
A Saga is a sequence of local, independent transactions.
Instead of trying to lock every database at once, the Saga pattern embraces the reality of distributed systems: Let each service do its job and commit its data immediately.
If a later step in the sequence fails, you don't "roll back" the database (because the data was already committed). Instead, you run Compensating Transactions (new transactions that reverse the work of the previous ones).
Analogy: You are booking a vacation.
- You book the flight (Success! Credit card charged).
- You try to book the hotel (Fail! No rooms left).
- You can't just pretend the flight booking never happened. You have to actively call the airline, cancel the flight, and get a refund. That cancellation is a compensating transaction.
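The vacation analogy can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `run_saga` helper and the booking functions are hypothetical names invented here, and the "services" are plain functions standing in for real network calls. Each step pairs a local transaction with the compensation that undoes it; when a step fails, the completed steps are compensated in reverse order.

```python
from typing import Callable, List, Tuple

# Each step pairs a local transaction with the compensation that undoes it.
# (name, action, compensation) -- all names here are illustrative assumptions.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done: {name}")
            completed.append((name, action, compensate))
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            # The earlier steps already committed, so they cannot be rolled
            # back; run their compensating transactions instead.
            for done_name, _, undo in reversed(completed):
                undo()
                log.append(f"compensated: {done_name}")
            return False
    return True


def book_flight() -> None:
    pass  # stand-in: the flight service commits the booking


def cancel_flight() -> None:
    pass  # stand-in: the flight service cancels and refunds


def book_hotel() -> None:
    raise RuntimeError("no rooms left")  # simulated failure


def cancel_hotel() -> None:
    pass  # never runs here, because the hotel booking never committed


log: List[str] = []
ok = run_saga(
    [
        ("book flight", book_flight, cancel_flight),
        ("book hotel", book_hotel, cancel_hotel),
    ],
    log,
)
```

After this runs, `ok` is `False` and `log` records the flight booking, the hotel failure, and the flight compensation, in that order. A real implementation would also need compensations that are safe to retry, which this sketch leaves out.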
Two Ways to Implement Sagas
There are two primary architectures for coordinating a Saga: Choreography and Orchestration.
1. Choreography (Event-Driven)
In choreography, there is no central brain. Microservices just publish and listen to events on a message broker (like Kafka or RabbitMQ).
- The Order Service creates an order and emits an OrderCreated event.
- The Inventory Service hears that event, reserves the item, and emits an InventoryReserved event.
- The Payment Service hears that event, tries to charge the card, but it declines. It emits a PaymentFailed event.
- The Inventory and Order services hear the failure event and run their compensating transactions (restocking the item and marking the order as "Cancelled").
Pros: Highly decoupled, with no central coordinator to act as a single point of failure.
Cons: Can become a tangled "spaghetti" of events that is very hard to debug and monitor as the system grows.
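The event flow above can be sketched with an in-memory event bus. This is a rough illustration under stated assumptions: the `EventBus` class is a hypothetical stand-in for a real broker such as Kafka or RabbitMQ, the handler names are invented, and the payment handler is hard-coded to decline. Note that no service calls another directly; each one only reacts to events.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Tuple[str, dict]] = deque()
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self._queue.append((event_type, payload))

    def drain(self) -> None:
        """Deliver queued events until no handler publishes anything new."""
        while self._queue:
            event_type, payload = self._queue.popleft()
            self.history.append(event_type)
            for handler in self._handlers[event_type]:
                handler(payload)


bus = EventBus()
orders: Dict[str, str] = {}   # Order Service's own data
stock = {"widget": 1}         # Inventory Service's own data


def create_order(order_id: str, item: str) -> None:
    orders[order_id] = "Pending"
    bus.publish("OrderCreated", {"order_id": order_id, "item": item})


def reserve_inventory(event: dict) -> None:
    stock[event["item"]] -= 1
    bus.publish("InventoryReserved", event)


def charge_card(event: dict) -> None:
    # Assumption for the sketch: the card is always declined.
    bus.publish("PaymentFailed", event)


def restock(event: dict) -> None:
    stock[event["item"]] += 1  # compensating transaction for the reservation


def cancel_order(event: dict) -> None:
    orders[event["order_id"]] = "Cancelled"  # compensating transaction


bus.subscribe("OrderCreated", reserve_inventory)
bus.subscribe("InventoryReserved", charge_card)
bus.subscribe("PaymentFailed", restock)
bus.subscribe("PaymentFailed", cancel_order)

create_order("o-1", "widget")
bus.drain()
```

Once the queue drains, the order is marked "Cancelled" and the stock count is back to its starting value. The workflow itself is not written down in any one place; it only emerges from the subscriptions, which is why larger choreographies become hard to trace.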
2. Orchestration (Command-Driven)
In orchestration, there is a central "Manager" (the Orchestrator) that tells the services what to do.
- The Orchestrator tells the Order Service to create the order.
- Then it tells the Inventory Service to reserve the item.
- Then it tells the Payment Service to charge the card.
- When the Payment Service returns an error, the Orchestrator explicitly sends a "Cancel Order" command to the Order Service and a "Restock Item" command to the Inventory Service.
Pros: Very clear workflow. Easy to see the current state of a transaction.
Cons: The Orchestrator becomes a single point of failure, and the workflow's business logic is concentrated in one component that is coupled to every participating service.
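A minimal sketch of the same order flow under orchestration follows. All class and method names are hypothetical, the services are in-process objects rather than remote calls, and the payment service is hard-coded to decline. The point of interest is that the orchestrator holds the whole workflow and records the saga's current state at every step.

```python
from typing import Dict


class PaymentDeclined(Exception):
    pass


class OrderService:
    def __init__(self) -> None:
        self.orders: Dict[str, str] = {}

    def create_order(self, order_id: str) -> None:
        self.orders[order_id] = "Pending"

    def cancel_order(self, order_id: str) -> None:
        self.orders[order_id] = "Cancelled"


class InventoryService:
    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = dict(stock)

    def reserve_item(self, item: str) -> None:
        self.stock[item] -= 1

    def restock_item(self, item: str) -> None:
        self.stock[item] += 1


class PaymentService:
    def charge_card(self, order_id: str) -> None:
        # Assumption for the sketch: the card is always declined.
        raise PaymentDeclined(f"card declined for {order_id}")


class OrderSagaOrchestrator:
    """Central coordinator: issues commands and tracks saga state."""

    def __init__(self, orders: OrderService, inventory: InventoryService,
                 payments: PaymentService) -> None:
        self.orders = orders
        self.inventory = inventory
        self.payments = payments
        self.state: Dict[str, str] = {}  # current saga state per order

    def run(self, order_id: str, item: str) -> str:
        self.state[order_id] = "CREATING_ORDER"
        self.orders.create_order(order_id)
        self.state[order_id] = "RESERVING_INVENTORY"
        self.inventory.reserve_item(item)
        self.state[order_id] = "CHARGING_CARD"
        try:
            self.payments.charge_card(order_id)
        except PaymentDeclined:
            # Explicit compensation commands, issued by the orchestrator.
            self.state[order_id] = "COMPENSATING"
            self.inventory.restock_item(item)
            self.orders.cancel_order(order_id)
            self.state[order_id] = "CANCELLED"
            return self.state[order_id]
        self.state[order_id] = "COMPLETED"
        return self.state[order_id]


order_service = OrderService()
inventory_service = InventoryService({"widget": 1})
orchestrator = OrderSagaOrchestrator(
    order_service, inventory_service, PaymentService()
)
result = orchestrator.run("o-1", "widget")
```

Here `orchestrator.state` answers "where is this transaction right now?" with a single lookup. A real orchestrator would persist that state so it can resume after a crash, which this in-memory sketch does not attempt.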
The Catch: Isolation
Because Sagas commit data immediately, they lack "Isolation" (the 'I' in ACID). If a user looks at their account halfway through a Saga, they might see a flight booked, even if a few seconds later it gets cancelled by a compensating transaction.
Developers have to handle this UI/UX challenge, often by using statuses like "Pending" or "Processing" until the entire Saga completes.
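One way to picture the status approach is a booking record that starts as "Pending" and only becomes "Confirmed" when the whole Saga finishes. The sketch below is illustrative only: `BookingStatus`, `FlightBooking`, and `display_text` are hypothetical names, and the wording shown to the user is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class BookingStatus(Enum):
    PENDING = "Pending"
    CONFIRMED = "Confirmed"
    CANCELLED = "Cancelled"


@dataclass
class FlightBooking:
    booking_id: str
    # The local transaction commits the row immediately, but marks it
    # Pending until the whole Saga has succeeded or been compensated.
    status: BookingStatus = BookingStatus.PENDING


def display_text(booking: FlightBooking) -> str:
    """What the UI shows for a booking at each stage of the Saga."""
    if booking.status is BookingStatus.PENDING:
        return "Processing your booking..."
    if booking.status is BookingStatus.CONFIRMED:
        return "Flight booked"
    return "Booking cancelled"


booking = FlightBooking("f-1")
mid_saga_view = display_text(booking)      # user looks halfway through

booking.status = BookingStatus.CANCELLED   # compensating transaction ran
final_view = display_text(booking)
```

The user who checks halfway through sees a processing message rather than a confirmed flight, so a later cancellation does not contradict anything the UI already promised.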
Summary
- The Saga Pattern manages distributed transactions across microservices without using heavy database locks.
- It relies on sequential local transactions.
- If a step fails, the system executes Compensating Transactions to undo the previous steps.
- It can be implemented via Choreography (decentralized events) or Orchestration (a central manager).
- It trades strong consistency and isolation for high availability and scalability.