Cloud Design Patterns - Saga distributed transactions pattern

The Saga design pattern is a way to manage data consistency across microservices in distributed transaction scenarios.

A saga is a sequence of transactions that updates each service and publishes a message or event to trigger the next transaction step.

If a step fails, the saga executes compensating transactions that counteract the preceding transactions.

Reading material

  1. https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga
  2. https://microservices.io/patterns/data/saga.html

Problem

How can a transaction that spans multiple services be implemented?

Solution

The Saga pattern manages a distributed transaction as a sequence of local transactions. A local transaction is the atomic unit of work performed by a single saga participant. Each local transaction updates that participant's database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails, the saga executes a series of compensating transactions that undo the changes made by the preceding local transactions.
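The sequence-plus-compensation idea can be sketched in a few lines of Python. This is a minimal in-process illustration, not a production implementation: `run_saga`, the step names, and the in-memory `orders` and `stock` dictionaries are all hypothetical stand-ins for real services and their databases.

```python
from typing import Callable, Dict, List, Tuple

# (name, local transaction, compensating transaction)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            # Undo only the steps that already committed, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return log
        completed.append(step)
        log.append(f"{name}: committed")
    return log


# Hypothetical in-memory stores standing in for each service's database.
orders: Dict[str, str] = {}
stock: Dict[str, int] = {"widget": 5}


def create_order() -> None:
    orders["o1"] = "PENDING"


def cancel_order() -> None:
    orders["o1"] = "CANCELLED"


def reserve_stock() -> None:
    stock["widget"] -= 1


def release_stock() -> None:
    stock["widget"] += 1


def charge_payment() -> None:
    raise RuntimeError("card declined")  # simulated failure


def refund_payment() -> None:
    pass  # never called here: the charge did not commit


saga_log = run_saga([
    ("create_order", create_order, cancel_order),
    ("reserve_stock", reserve_stock, release_stock),
    ("charge_payment", charge_payment, refund_payment),
])
```

Note that the compensation counteracts the earlier work rather than erasing it: the order ends up `CANCELLED`, not deleted, while the reserved stock is returned.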

There are two common saga implementation approaches, choreography and orchestration. Each approach has its own set of challenges and technologies to coordinate the workflow.

Choreography

https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga#choreography

Choreography is a way to coordinate sagas where participants exchange events without a centralized point of control. With choreography, each local transaction publishes domain events that trigger local transactions in other services.
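A rough sketch of choreography, assuming a synchronous in-memory event bus in place of a real message broker. The `EventBus` class, the event names, and the `CREDIT_LIMIT` rule are hypothetical; the point is only that each handler reacts to an event and publishes the next one, with no central controller.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Hypothetical synchronous in-memory bus standing in for a message broker."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.history.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(payload)


bus = EventBus()
order_status: Dict[str, str] = {}
CREDIT_LIMIT = 100  # assumed business rule, for illustration only


def place_order(order_id: str, amount: int) -> None:
    # Order service: local transaction, then a domain event.
    order_status[order_id] = "PENDING"
    bus.publish("OrderCreated", {"order_id": order_id, "amount": amount})


def on_order_created(event: Dict) -> None:
    # Payment service: reacts to the order event and publishes its own outcome.
    if event["amount"] <= CREDIT_LIMIT:
        bus.publish("PaymentApproved", event)
    else:
        bus.publish("PaymentRejected", event)


def on_payment_approved(event: Dict) -> None:
    # Order service: completes the saga.
    order_status[event["order_id"]] = "APPROVED"


def on_payment_rejected(event: Dict) -> None:
    # Order service: compensating transaction.
    order_status[event["order_id"]] = "CANCELLED"


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("PaymentApproved", on_payment_approved)
bus.subscribe("PaymentRejected", on_payment_rejected)

place_order("o1", 40)
place_order("o2", 500)
```

The workflow is implicit in the subscriptions: nothing in the code lists the steps in one place, which is both the appeal of choreography and what makes larger sagas harder to follow.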

Orchestration

https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga#orchestration

Orchestration is a way to coordinate sagas where a centralized controller, the saga orchestrator, tells the saga participants which local transactions to execute. The orchestrator drives the whole workflow: based on the events it receives, it decides which operation each participant performs next. It executes saga requests, stores and interprets the state of each task, and handles failure recovery with compensating transactions.
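A small sketch of orchestration, assuming in-process participants instead of remote services. The `Participant` and `Orchestrator` classes and the command names are hypothetical; the sketch shows a controller that issues commands, records the state of each task, and sends compensating commands when a step fails.

```python
from typing import Dict, List, Tuple


class Participant:
    """Hypothetical saga participant that executes commands it is told to run."""

    def __init__(self, name: str, fail_on: str = "") -> None:
        self.name = name
        self.fail_on = fail_on  # command that this participant will reject
        self.executed: List[str] = []

    def handle(self, command: str) -> bool:
        if command == self.fail_on:
            return False  # report failure back to the orchestrator
        self.executed.append(command)
        return True


class Orchestrator:
    def __init__(self, plan: List[Tuple[Participant, str, str]]) -> None:
        self.plan = plan  # (participant, command, compensating command)
        self.state: Dict[str, str] = {}  # command -> status of that task

    def run(self) -> str:
        done: List[Tuple[Participant, str, str]] = []
        for participant, command, undo in self.plan:
            if participant.handle(command):
                self.state[command] = "SUCCEEDED"
                done.append((participant, command, undo))
                continue
            self.state[command] = "FAILED"
            # Failure recovery: compensate completed tasks, newest first.
            # A real orchestrator would also retry compensations that fail.
            for p, c, u in reversed(done):
                p.handle(u)
                self.state[c] = "COMPENSATED"
            return "ABORTED"
        return "COMPLETED"


order = Participant("order")
payment = Participant("payment")
shipping = Participant("shipping", fail_on="schedule_delivery")

orchestrator = Orchestrator([
    (order, "create_order", "cancel_order"),
    (payment, "charge_card", "refund_card"),
    (shipping, "schedule_delivery", "cancel_delivery"),
])
result = orchestrator.run()
```

Unlike the choreographed variant, the full sequence of steps and the state of each task live in one place, at the cost of the orchestrator becoming an extra component to build and operate.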

