Cloud Design Patterns - Saga distributed transactions pattern
The Saga design pattern is a way to manage data consistency across microservices in distributed transaction scenarios.
A saga is a sequence of transactions that updates each service and publishes a message or event to trigger the next transaction step.
If a step fails, the saga executes compensating transactions that counteract the preceding transactions.
Reading material
- https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga
- https://microservices.io/patterns/data/saga.html
Problem
How do you implement a business transaction that spans multiple services?
Solution
The Saga pattern provides transaction management using a sequence of local transactions. A local transaction is the atomic unit of work performed by a saga participant. Each local transaction updates the participant's database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails, the saga executes a series of compensating transactions that undo the changes made by the preceding local transactions.
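The flow described above can be sketched in a few lines of Python. This is a minimal in-process illustration, not a production implementation: it calls each step directly instead of triggering it through a message or event, and the names `SagaStep`, `run_saga`, and the three example steps are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction
    compensation: Callable[[], None]  # undoes the local transaction


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; if one fails, compensate the completed ones in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step is assumed to have rolled back locally,
            # so only the steps that already committed are compensated.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_payment() -> None:
    # Hypothetical failure used to trigger compensation.
    raise RuntimeError("payment declined")


steps = [
    SagaStep("reserve_stock",
             lambda: log.append("stock reserved"),
             lambda: log.append("stock released")),
    SagaStep("charge_payment",
             fail_payment,
             lambda: log.append("payment refunded")),
    SagaStep("ship_order",
             lambda: log.append("order shipped"),
             lambda: log.append("shipment cancelled")),
]

succeeded = run_saga(steps)
```

Here the second step fails, so the saga stops, the stock reservation is released, and the shipping step never runs.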
There are two common saga implementation approaches, choreography and orchestration. Each approach has its own set of challenges and technologies to coordinate the workflow.
Choreography
https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga#choreography
Choreography is a way to coordinate sagas where participants exchange events without a centralized point of control. With choreography, each local transaction publishes domain events that trigger local transactions in other services.
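As a rough sketch of choreography, the example below wires three hypothetical services to a toy in-memory event bus. The `EventBus` class, the event names, and the `credit_limit` rule are assumptions made for illustration; a real system would use a message broker and each handler would live in its own service.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, int]
Handler = Callable[[Event], None]


class EventBus:
    """Toy synchronous, in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []  # event types in publication order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        self.history.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(event)


def wire_services(bus: EventBus, credit_limit: int) -> None:
    """Subscribe each service to the events it reacts to; no central controller."""

    def payment_on_order_created(event: Event) -> None:
        # Payment service: local transaction, then publish the outcome.
        if event["amount"] <= credit_limit:
            bus.publish("PaymentCompleted", event)
        else:
            bus.publish("PaymentFailed", event)

    def shipping_on_payment_completed(event: Event) -> None:
        # Shipping service: reacts only to a successful payment.
        bus.publish("OrderShipped", event)

    def order_on_payment_failed(event: Event) -> None:
        # Order service: compensating transaction for the created order.
        bus.publish("OrderCancelled", event)

    bus.subscribe("OrderCreated", payment_on_order_created)
    bus.subscribe("PaymentCompleted", shipping_on_payment_completed)
    bus.subscribe("PaymentFailed", order_on_payment_failed)


ok_bus = EventBus()
wire_services(ok_bus, credit_limit=100)
ok_bus.publish("OrderCreated", {"order_id": 1, "amount": 40})

failed_bus = EventBus()
wire_services(failed_bus, credit_limit=100)
failed_bus.publish("OrderCreated", {"order_id": 2, "amount": 500})
```

Each service knows only which events it consumes and which it publishes. The overall workflow exists only as the chain of subscriptions, which is why it has to be reconstructed from `history` rather than read from one place.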
Orchestration
https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga#orchestration
Orchestration is a way to coordinate sagas where a centralized controller tells the saga participants which local transactions to execute. The saga orchestrator handles all the transactions and tells the participants which operation to perform based on events. It executes saga requests, stores and interprets the state of each task, and handles failure recovery with compensating transactions.
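A minimal sketch of an orchestrator might look like the following. It assumes participants are plain functions that return `True` on success, and it keeps saga state in memory; a real orchestrator would typically send commands over a messaging channel and persist its state. The class name `OrderSagaOrchestrator`, the state labels, and the participant functions are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# A participant command takes an order id and returns True on success.
Command = Callable[[int], bool]

DECLINED_ORDERS: Set[int] = {13}  # hypothetical: payments that will fail


def reserve_stock(order_id: int) -> bool:
    return True


def release_stock(order_id: int) -> bool:
    return True


def charge_payment(order_id: int) -> bool:
    return order_id not in DECLINED_ORDERS


def refund_payment(order_id: int) -> bool:
    return True


class OrderSagaOrchestrator:
    def __init__(self, steps: List[Tuple[str, Command, Command]]) -> None:
        # Each entry is (step name, command, compensating command).
        self._steps = steps
        self.state: Dict[int, str] = {}           # saga state per order
        self.step_log: Dict[int, List[str]] = {}  # per-step outcomes per order

    def run(self, order_id: int) -> str:
        self.state[order_id] = "STARTED"
        log = self.step_log.setdefault(order_id, [])
        completed: List[Tuple[str, Command]] = []
        for name, command, compensate in self._steps:
            if command(order_id):
                log.append(f"{name}:done")
                completed.append((name, compensate))
                continue
            # A step failed: undo the completed steps in reverse order.
            log.append(f"{name}:failed")
            self.state[order_id] = "COMPENSATING"
            for done_name, undo in reversed(completed):
                undo(order_id)
                log.append(f"{done_name}:compensated")
            self.state[order_id] = "ABORTED"
            return self.state[order_id]
        self.state[order_id] = "COMPLETED"
        return self.state[order_id]


orchestrator = OrderSagaOrchestrator([
    ("reserve_stock", reserve_stock, release_stock),
    ("charge_payment", charge_payment, refund_payment),
])

ok_result = orchestrator.run(1)
failed_result = orchestrator.run(13)
```

Because the orchestrator owns the step list and the state, the status of any saga can be read from `orchestrator.state` and `orchestrator.step_log` without querying the participants.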
Choreography vs Orchestration
Saga orchestration uses a central orchestrator to manage long-running business transactions, dictating which services to call and in what order. Choreography is a decentralized approach where services coordinate by publishing and listening for events, without a central coordinator.
The main trade-offs are: orchestration offers better visibility and control over the workflow, while choreography provides greater flexibility, scalability, and autonomy for individual services.
Saga orchestration
How it works:
- A central service, the orchestrator, handles the entire workflow. It issues commands to other services to start a step and receives responses.
Pros:
- Centralized control: Easier to see, manage, and debug the entire process.
- Simple workflows: Clearly defines the steps and their order.
- Visibility: The orchestrator can track the state of the entire transaction.
Cons:
- Single point of failure: The orchestrator can become a bottleneck or a single point of failure if not designed carefully.
- Increased coupling: Services become coupled to the orchestrator, and the orchestrator is coupled to all the services.
- Complexity: The orchestrator itself can become complex to manage.
Saga choreography
How it works:
- Each service performs its task and then publishes an event. Other services listen for these events and react accordingly, without a central controller dictating the next step.
Pros:
- Decentralized: No single point of failure for the workflow logic.
- Loose coupling: Services are only coupled to events, not to each other directly.
- Flexibility: Adding or changing services is easier as long as the event contract is maintained.
Cons:
- Complex interactions: The overall workflow is distributed, making it harder to understand and debug.
- Integration testing: Requires all services to be running to test the full flow, making testing more difficult.
- Less visibility: The overall status of the transaction is not held in one central place.