Distributed Transactions & the SAGA Pattern — System Design Masterclass

In a monolith, transactions are easy. You wrap your database operations in a transaction, and either everything commits or everything rolls back. ACID guarantees handle the rest.

Then you split into microservices. The order service has its own database. The payment service has its own database. The inventory service has its own database. You need to charge the customer, reserve inventory, and create the order, and if any step fails, you need to undo the others. There’s no single database transaction that spans all three. Welcome to one of the hardest problems in distributed systems.

Figure: SAGA pattern with orchestration (a coordinator managing the steps)

The Problem: No Distributed ACID

In a monolith, placing an order looks like this:

BEGIN TRANSACTION;
    INSERT INTO orders (id, user_id, total) VALUES ('ord_789', 'usr_123', 99.99);
    UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 'prod_456';
    INSERT INTO payments (order_id, amount, status) VALUES ('ord_789', 99.99, 'charged');
COMMIT;
-- If anything fails, the entire transaction rolls back. Simple.

In microservices, those three operations live in three different databases on three different machines. You can’t wrap them in a single BEGIN/COMMIT. The fundamental challenge: how do you maintain consistency across multiple services when any service can fail at any point?

Four things can go wrong:

Partial failure. Payment succeeds, but inventory update fails. Customer is charged for an out-of-stock item.
Network partition. The order service can’t reach the payment service. Is the payment in progress? Did it succeed? Did it fail?
Service crash. The inventory service crashes after reserving stock but before confirming. The reservation is stuck.
Timeout ambiguity. You sent a payment request and got a timeout. Did the payment go through or not? You can’t safely retry (double charge) or ignore (lost revenue).
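
The timeout case above is the one that most directly shapes API design. A minimal sketch of the usual way out, assuming the payment provider deduplicates requests by an idempotency key: the caller retries with the same key, and the provider returns the original charge instead of creating a second one. FakePaymentGateway and charge_with_retry are hypothetical names used only for this illustration.

```python
import uuid


class FakePaymentGateway:
    """In-memory stand-in for a provider that deduplicates by idempotency key."""

    def __init__(self):
        self.charges = {}               # idempotency_key -> charge record
        self.drop_next_response = False

    def charge(self, idempotency_key, user_id, amount):
        # A replayed key returns the original charge instead of charging again.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {
                "charge_id": f"ch_{uuid.uuid4().hex[:8]}",
                "user_id": user_id,
                "amount": amount,
            }
        if self.drop_next_response:
            # The charge was recorded, but the caller never sees the reply.
            self.drop_next_response = False
            raise TimeoutError("no response from gateway")
        return self.charges[idempotency_key]


def charge_with_retry(gateway, order_id, user_id, amount, attempts=3):
    key = f"order_{order_id}"           # the same key on every attempt
    for _ in range(attempts):
        try:
            return gateway.charge(key, user_id, amount)
        except TimeoutError:
            continue                    # outcome unknown: retry with the same key
    raise RuntimeError(f"payment for {order_id} is still unresolved")


gateway = FakePaymentGateway()
gateway.drop_next_response = True       # simulate: charge succeeds, reply is lost
receipt = charge_with_retry(gateway, "ord_789", "usr_123", 99.99)
```

The first attempt times out after the charge was recorded, the retry finds the existing record, and the customer is charged once. Without the key, the same retry would be a double charge.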

There are two established solutions: two-phase commit (2PC) and the SAGA pattern.

Two-Phase Commit (2PC)

2PC is the textbook answer to distributed transactions. A coordinator orchestrates the transaction in two phases.

Phase 1: Prepare. The coordinator asks every participant: “Can you commit this transaction?” Each participant executes the transaction locally (but doesn’t commit), acquires locks, and votes YES or NO.

Phase 2: Commit/Abort. If all participants voted YES, the coordinator sends COMMIT. If any voted NO, the coordinator sends ABORT. Every participant follows the coordinator’s decision.

class TwoPhaseCommitCoordinator:
    def __init__(self, participants):
        self.participants = participants  # [PaymentService, InventoryService, OrderService]
        self.transaction_log = TransactionLog()

    def execute(self, transaction):
        tx_id = generate_tx_id()
        self.transaction_log.write("STARTED", tx_id)

        # Phase 1: Prepare
        votes = {}
        for participant in self.participants:
            try:
                vote = participant.prepare(tx_id, transaction)
                votes[participant.name] = vote
            except Exception:
                votes[participant.name] = "NO"

        # Phase 2: Commit or Abort
        if all(v == "YES" for v in votes.values()):
            self.transaction_log.write("COMMITTING", tx_id)
            for participant in self.participants:
                participant.commit(tx_id)
            self.transaction_log.write("COMMITTED", tx_id)
        else:
            self.transaction_log.write("ABORTING", tx_id)
            for participant in self.participants:
                if votes.get(participant.name) == "YES":
                    participant.abort(tx_id)
            self.transaction_log.write("ABORTED", tx_id)


class PaymentService:
    def prepare(self, tx_id, transaction):
        """Reserve the payment but don't charge yet."""
        auth = self.payment_gateway.authorize(
            amount=transaction.total,
            customer_id=transaction.user_id
        )
        self.pending_transactions[tx_id] = auth
        return "YES"

    def commit(self, tx_id):
        """Capture the authorized payment."""
        auth = self.pending_transactions.pop(tx_id)
        self.payment_gateway.capture(auth.authorization_id)

    def abort(self, tx_id):
        """Void the authorization."""
        auth = self.pending_transactions.pop(tx_id)
        self.payment_gateway.void(auth.authorization_id)

Why 2PC Falls Short

2PC works, but it has serious problems in practice:

Blocking. From the moment a participant votes YES until it hears the coordinator’s decision, it holds locks on the data. If the coordinator crashes between Phase 1 and Phase 2, every participant is stuck holding those locks until the coordinator comes back. No participant can decide on its own whether to commit or abort, so the data stays locked and unavailable.

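Latency. 2PC needs two round trips to every participant: one to prepare, one to commit. A coordinator that contacts 5 participants one at a time, as the code above does, makes 10 sequential round trips. At 50ms per round trip, that’s 500ms for the happy path. Contacting participants in parallel cuts the network cost to two round trips, but every transaction still waits for the slowest participant in each phase, plus the durable log writes.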

Single point of failure. The coordinator is the bottleneck. If it crashes at the wrong moment, the entire transaction is in limbo. Recovery requires reading the transaction log and re-contacting all participants.

Unavailable during partitions. If a network partition separates the coordinator from a participant, the protocol stalls until the partition heals. In CAP terms, 2PC chooses consistency over availability: it would rather block than let participants diverge.
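
The latency estimate above depends on whether the coordinator talks to participants one at a time or all at once. A small back-of-the-envelope sketch, counting network time only and ignoring log writes and lock waits (the function name is made up for this example):

```python
def two_phase_commit_latency_ms(participants, round_trip_ms, parallel):
    """Network time for the 2PC happy path, ignoring log writes and lock waits."""
    phases = 2  # prepare, then commit
    if parallel:
        # All participants are contacted at once: one round trip per phase.
        return phases * round_trip_ms
    # One participant at a time: every participant costs a round trip per phase.
    return phases * participants * round_trip_ms


sequential = two_phase_commit_latency_ms(5, 50, parallel=False)
fanned_out = two_phase_commit_latency_ms(5, 50, parallel=True)
print(sequential, fanned_out)  # 500 100
```

The 500ms figure is the sequential case. Fanning out helps, but the two phases can never overlap, so two round trips is the floor.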

Timeline of a 2PC failure:

t=0    Coordinator sends PREPARE to all 3 participants
t=50   All 3 vote YES (locks acquired on all 3 databases)
t=51   Coordinator writes COMMITTING to its log
t=52   Coordinator CRASHES before sending COMMIT to anyone
t=52+  All 3 participants are holding locks, waiting for COMMIT
       ... minutes pass ...
       No one can read or write the locked rows
       Locks stay held until the coordinator restarts and replays its log,
       or an operator steps in manually
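
Recovery from a crash like this comes down to replaying the coordinator’s log. A minimal sketch, assuming the log is a list of (state, tx_id) entries in the order the coordinator above writes them, and that participants tolerate a repeated commit or abort. The recover function and RecordingParticipant are hypothetical, not part of any library.

```python
def recover(log_entries, participants):
    """Replay the coordinator log after a restart and finish open transactions.

    log_entries: (state, tx_id) tuples in the order they were written.
    Returns {tx_id: action} for every transaction that needed work.
    """
    last_state = {}
    for state, tx_id in log_entries:
        last_state[tx_id] = state            # later entries win

    actions = {}
    for tx_id, state in last_state.items():
        if state == "COMMITTING":
            # The decision was durably logged, so it must be carried through.
            for participant in participants:
                participant.commit(tx_id)    # must be safe to repeat
            actions[tx_id] = "commit"
        elif state in ("STARTED", "ABORTING"):
            # No commit decision was ever logged, so aborting is safe.
            for participant in participants:
                participant.abort(tx_id)
            actions[tx_id] = "abort"
        # COMMITTED / ABORTED: already finished, nothing to do.
    return actions


class RecordingParticipant:
    """Test double that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def commit(self, tx_id):
        self.calls.append(("commit", tx_id))

    def abort(self, tx_id):
        self.calls.append(("abort", tx_id))


log = [
    ("STARTED", "tx1"), ("COMMITTING", "tx1"),                        # crashed mid-commit
    ("STARTED", "tx2"),                                               # crashed mid-prepare
    ("STARTED", "tx3"), ("COMMITTING", "tx3"), ("COMMITTED", "tx3"),  # finished
]
participant = RecordingParticipant()
actions = recover(log, [participant])
```

Here tx1 is committed, tx2 is aborted, and tx3 is left alone. The key rule is that a logged COMMITTING decision is final: participants that voted YES are waiting for exactly that outcome.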

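For these reasons, 2PC is rarely used across microservices. It’s common inside and between databases: MySQL uses an internal two-phase commit to keep the InnoDB storage engine and the binary log consistent, and both MySQL (XA transactions) and PostgreSQL (PREPARE TRANSACTION) expose 2PC primitives to external transaction managers. Across independent services, the SAGA pattern is the usual alternative.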

The SAGA Pattern

A saga is a sequence of local transactions. Each step has a corresponding compensating action. If step N fails, you execute the compensating actions for steps N-1, N-2, … down to step 1.

Forward flow (happy path):
  Step 1: Create order (status: pending)
  Step 2: Reserve inventory
  Step 3: Charge payment
  Step 4: Confirm order (status: confirmed)

Compensation flow (Step 3 fails):
  Compensate Step 2: Release inventory reservation
  Compensate Step 1: Cancel order (status: cancelled)

The critical insight: each step is a local ACID transaction in its own database. There’s no distributed lock. Each service maintains its own consistency. The saga coordinates the overall flow.

class OrderSaga:
    """Saga definition: steps and their compensations."""

    steps = [
        {
            "name": "create_order",
            "action": "order_service.create_order",
            "compensation": "order_service.cancel_order",
        },
        {
            "name": "reserve_inventory",
            "action": "inventory_service.reserve",
            "compensation": "inventory_service.release_reservation",
        },
        {
            "name": "charge_payment",
            "action": "payment_service.charge",
            "compensation": "payment_service.refund",
        },
        {
            "name": "confirm_order",
            "action": "order_service.confirm_order",
            "compensation": None,  # Last step: no compensation needed
        },
    ]

There are two ways to coordinate a saga: orchestration and choreography.

Orchestration: The Coordinator

An orchestrator (also called a saga coordinator or saga execution controller) manages the entire flow. It tells each service what to do and handles failures by calling compensations.

class SagaOrchestrator:
    def __init__(self, saga_definition, services):
        self.saga = saga_definition
        self.services = services
        self.state_store = SagaStateStore()  # Persistent state

    async def execute(self, saga_id, context):
        """Execute saga steps sequentially. Compensate on failure."""
        self.state_store.create(saga_id, status="STARTED", context=context)
        completed_steps = []

        for step in self.saga.steps:
            try:
                self.state_store.update(saga_id, current_step=step["name"],
                                        status="EXECUTING")

                service = self.services[step["name"]]
                result = await service.execute(step["action"], context)
                context.update(result)  # Pass results to next step
                completed_steps.append(step)

                self.state_store.update(saga_id, current_step=step["name"],
                                        status="STEP_COMPLETED")

            except Exception as e:
                self.state_store.update(saga_id, status="COMPENSATING",
                                        failed_step=step["name"], error=str(e))
                await self._compensate(saga_id, completed_steps, context)
                return {"status": "FAILED", "failed_step": step["name"],
                        "error": str(e)}

        self.state_store.update(saga_id, status="COMPLETED")
        return {"status": "COMPLETED", "context": context}

    async def _compensate(self, saga_id, completed_steps, context):
        """Execute compensating actions in reverse order."""
        all_compensated = True
        for step in reversed(completed_steps):
            if step["compensation"] is None:
                continue
            try:
                service = self.services[step["name"]]
                await service.execute(step["compensation"], context)
                self.state_store.update(
                    saga_id, compensated_step=step["name"]
                )
            except Exception as e:
                # Compensation failed -- this needs manual intervention
                all_compensated = False
                self.state_store.update(
                    saga_id, status="COMPENSATION_FAILED",
                    stuck_step=step["name"], error=str(e)
                )
                alert_operations_team(saga_id, step["name"], e)
                # Do NOT re-raise. Continue compensating other steps.

        if all_compensated:
            # Terminal state: without it, a restarted orchestrator would see
            # COMPENSATING and run the compensations again.
            self.state_store.update(saga_id, status="SAGA_FAILED")


# Usage (run inside an async function: execute() is a coroutine)
orchestrator = SagaOrchestrator(
    saga_definition=OrderSaga,
    services={
        "create_order": OrderServiceClient(),
        "reserve_inventory": InventoryServiceClient(),
        "charge_payment": PaymentServiceClient(),
        "confirm_order": OrderServiceClient(),
    }
)

result = await orchestrator.execute(
    saga_id="saga_001",
    context={"user_id": "usr_123", "product_id": "prod_456", "quantity": 1,
             "amount": 99.99}
)

Orchestrator State Machine

The orchestrator’s state machine is critical for recovery. If the orchestrator crashes mid-saga, it reads its state from the persistent store and resumes from where it left off.

STARTED --> EXECUTING(step_1) --> STEP_COMPLETED(step_1)
        --> EXECUTING(step_2) --> STEP_COMPLETED(step_2)
        --> EXECUTING(step_3) --> FAILED(step_3)
        --> COMPENSATING --> COMPENSATED(step_2)
                         --> COMPENSATED(step_1)
                         --> SAGA_FAILED

On restart: read state, resume from current position
  - If EXECUTING: retry the step (requires idempotent actions)
  - If COMPENSATING: continue compensating remaining steps
  - If COMPENSATION_FAILED: alert for manual intervention

Store the saga state in a durable datastore (PostgreSQL, DynamoDB) — not in memory, not in Redis. This is your recovery mechanism.
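
The restart rules above can be sketched as a pure function over the persisted record. This is an illustration under assumptions: the field names follow the orchestrator code earlier, except for `compensated`, a hypothetical list of already-compensated step names, and SAGA_FAILED is treated as a terminal status as in the state diagram.

```python
STEPS = ["create_order", "reserve_inventory", "charge_payment", "confirm_order"]


def plan_resume(state):
    """Decide what a restarted orchestrator should do for one persisted saga.

    Returns (mode, steps): mode is "execute", "compensate", "alert" or "none",
    and steps lists the step names still to be handled, in order.
    """
    status = state["status"]
    if status == "STARTED":
        return ("execute", list(STEPS))
    if status == "EXECUTING":
        # The in-flight step's outcome is unknown: run it again (idempotent).
        i = STEPS.index(state["current_step"])
        return ("execute", STEPS[i:])
    if status == "STEP_COMPLETED":
        i = STEPS.index(state["current_step"])
        return ("execute", STEPS[i + 1:])
    if status == "COMPENSATING":
        # Undo every step before the failed one that is not yet compensated,
        # newest first.
        i = STEPS.index(state["failed_step"])
        done = set(state.get("compensated", []))
        return ("compensate", [s for s in reversed(STEPS[:i]) if s not in done])
    if status == "COMPENSATION_FAILED":
        return ("alert", [state["stuck_step"]])
    return ("none", [])  # COMPLETED or SAGA_FAILED: nothing left to do


crashed_mid_charge = plan_resume(
    {"status": "EXECUTING", "current_step": "charge_payment"})
crashed_mid_rollback = plan_resume(
    {"status": "COMPENSATING", "failed_step": "charge_payment",
     "compensated": ["reserve_inventory"]})
```

A crash during charge_payment resumes by retrying that step, which is only safe because the step is idempotent. A crash halfway through a rollback resumes with the compensations that have not run yet.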

Choreography: Event-Driven Sagas

Figure: SAGA pattern with choreography (event-driven steps)

In choreography, there’s no central coordinator. Each service listens for events and reacts. Service A publishes an event, Service B picks it up and does its work, publishes another event, and so on.

# Order Service
class OrderService:
    async def create_order(self, order_data):
        order = self.db.insert_order(order_data, status="PENDING")
        await self.event_bus.publish("order.created", {
            "order_id": order.id,
            "user_id": order_data["user_id"],
            "product_id": order_data["product_id"],
            "quantity": order_data["quantity"],
            "amount": order_data["amount"],
        })
        return order

    async def on_payment_charged(self, event):
        self.db.update_order(event["order_id"], status="CONFIRMED")
        await self.event_bus.publish("order.confirmed", event)

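    async def on_inventory_reservation_failed(self, event):
        self.db.update_order(event["order_id"], status="CANCELLED")
        await self.event_bus.publish("order.cancelled", event)

    async def on_payment_failed(self, event):
        # Payment was declined: cancel the order. The inventory service
        # releases its reservation in response to the same event.
        self.db.update_order(event["order_id"], status="CANCELLED")
        await self.event_bus.publish("order.cancelled", event)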


# Inventory Service
class InventoryService:
    async def on_order_created(self, event):
        try:
            self.db.reserve_inventory(event["product_id"], event["quantity"])
            await self.event_bus.publish("inventory.reserved", {
                "order_id": event["order_id"],
                "product_id": event["product_id"],
                "quantity": event["quantity"],
                "amount": event["amount"],
                "user_id": event["user_id"],
            })
        except InsufficientStockError:
            await self.event_bus.publish("inventory.reservation_failed", {
                "order_id": event["order_id"],
                "reason": "out_of_stock",
            })

    async def on_payment_failed(self, event):
        """Compensate: release the reservation."""
        self.db.release_reservation(event["product_id"], event["quantity"])
        await self.event_bus.publish("inventory.released", event)


# Payment Service
class PaymentService:
    async def on_inventory_reserved(self, event):
        try:
            charge = self.payment_gateway.charge(
                user_id=event["user_id"],
                amount=event["amount"],
                idempotency_key=f"order_{event['order_id']}"
            )
            await self.event_bus.publish("payment.charged", {
                "order_id": event["order_id"],
                "charge_id": charge.id,
                **event,
            })
        except PaymentError as e:
            await self.event_bus.publish("payment.failed", {
                "order_id": event["order_id"],
                "reason": str(e),
                "product_id": event["product_id"],
                "quantity": event["quantity"],
            })

Event flow (happy path):
  order.created --> inventory.reserved --> payment.charged --> order.confirmed

Event flow (payment fails):
  order.created --> inventory.reserved --> payment.failed --> inventory.released
                                                          --> order.cancelled
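
Both event flows can be reproduced with a toy, synchronous event bus. This is a sketch for tracing the choreography on one machine, not a substitute for a real broker: InMemoryEventBus, wire_services and place_order are hypothetical names, and the handlers are stripped-down versions of the services above.

```python
from collections import defaultdict


class InMemoryEventBus:
    """Synchronous stand-in for a message broker that records every event."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)


def wire_services(bus, stock, card_ok):
    """Register simplified order, inventory and payment handlers on the bus."""
    orders = {}

    def reserve_stock(event):                       # inventory service
        if stock["available"] >= event["quantity"]:
            stock["available"] -= event["quantity"]
            bus.publish("inventory.reserved", event)
        else:
            bus.publish("inventory.reservation_failed", event)

    def charge(event):                              # payment service
        bus.publish("payment.charged" if card_ok else "payment.failed", event)

    def release_stock(event):                       # inventory compensation
        stock["available"] += event["quantity"]
        bus.publish("inventory.released", event)

    def set_status(status, event_type):             # order service
        def handler(event):
            orders[event["order_id"]] = status
            bus.publish(event_type, event)
        return handler

    bus.subscribe("order.created", reserve_stock)
    bus.subscribe("inventory.reserved", charge)
    bus.subscribe("payment.failed", release_stock)
    bus.subscribe("payment.failed", set_status("CANCELLED", "order.cancelled"))
    bus.subscribe("payment.charged", set_status("CONFIRMED", "order.confirmed"))
    bus.subscribe("inventory.reservation_failed",
                  set_status("CANCELLED", "order.cancelled"))
    return orders


def place_order(card_ok):
    bus, stock = InMemoryEventBus(), {"available": 5}
    orders = wire_services(bus, stock, card_ok)
    bus.publish("order.created", {"order_id": "ord_789", "quantity": 1})
    return bus.log, stock["available"], orders["ord_789"]


happy_log, happy_stock, happy_status = place_order(card_ok=True)
failed_log, failed_stock, failed_status = place_order(card_ok=False)
```

Note that payment.failed has two subscribers. The order service has to listen for it as well as the inventory service, otherwise the stock is released but the order stays pending forever.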

Orchestration vs Choreography

Dimension	Orchestration	Choreography
Coordination	Central coordinator	Distributed, event-driven
Visibility	Easy to trace (single state machine)	Hard to trace (events scattered)
Coupling	Services coupled to orchestrator	Services coupled to events
Complexity	Orchestrator is complex	Each service is simple, overall flow is complex
Adding steps	Change the orchestrator	Add a new subscriber (no existing code changes)
Debugging	Check orchestrator state	Correlate events across services
Circular deps	Impossible (linear flow)	Possible (service A triggers B triggers A)
Best for	Complex sagas (5+ steps), strict ordering	Simple sagas (2-3 steps), loose coupling

In practice: start with orchestration. It’s easier to reason about, easier to debug, and easier to add monitoring. Switch to choreography only when you have a compelling reason (e.g., services owned by different teams that can’t coordinate releases).

Compensation Logic

Compensation is the most underestimated part of the SAGA pattern. It’s not just “undo the action.” Compensation must handle partial states, timing windows, and external side effects.

class CompensationExamples:
    """Real-world compensation is messy."""

    def compensate_payment(self, order_id, charge_id):
        """Refund a payment. But what if the charge is still processing?"""
        charge = self.payment_gateway.get_charge(charge_id)

        if charge.status == "succeeded":
            self.payment_gateway.refund(charge_id)
        elif charge.status == "pending":
            # Can't refund a pending charge. Wait and retry.
            self.retry_queue.publish("compensate_payment", {
                "order_id": order_id,
                "charge_id": charge_id,
                "retry_count": 1,
            }, delay_seconds=30)
        elif charge.status == "failed":
            # Nothing to compensate. Log and move on.
            pass

    def compensate_inventory(self, product_id, quantity, reservation_id):
        """Release inventory reservation. Idempotent."""
        # One local transaction, so a crash can't release the reservation
        # without also returning the stock.
        with self.db.transaction():
            reservation = self.db.get_reservation(reservation_id)
            if reservation and reservation.status == "active":
                self.db.release_reservation(reservation_id)
                self.db.increment_available(product_id, quantity)
        # If reservation doesn't exist or already released, no-op (idempotent)

    def compensate_shipping(self, shipment_id):
        """Cancel shipping. What if the package already left the warehouse?"""
        shipment = self.shipping_service.get_shipment(shipment_id)

        if shipment.status == "label_created":
            self.shipping_service.cancel(shipment_id)
        elif shipment.status == "in_transit":
            # Too late to cancel. Create a return shipment instead.
            self.shipping_service.create_return(shipment_id)
            # This is a semantic compensation, not a technical undo
        elif shipment.status == "delivered":
            # Way too late. Initiate return process with customer.
            self.customer_service.initiate_return(shipment_id)

Key rules for compensation:

Compensations must be idempotent. The compensation itself might need to be retried.
Semantic, not technical. “Undo” often means “do the business-appropriate reversal,” not “literally reverse the operation.”
Not all actions are compensatable. Sending an email can’t be undone. Sending a push notification can’t be unsent. Design for this — maybe you delay the notification until the saga completes.
Compensation can fail. When it does, you need alerting and manual intervention tooling. This is not an edge case — it will happen in production.
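
The first and last rules combine naturally: because compensations are idempotent, they can be retried, and only when the retries run out does a human need to look. A minimal sketch, with run_compensation and its parameters being hypothetical names chosen for this example:

```python
import time


def run_compensation(compensate, max_attempts=3, base_delay_s=1.0, on_give_up=None):
    """Retry an idempotent compensation, then escalate if it keeps failing.

    Returns "COMPENSATED" or "COMPENSATION_FAILED".
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            compensate()
            return "COMPENSATED"
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff: base, 2x base, 4x base, ...
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    if on_give_up is not None:
        on_give_up(last_error)      # e.g. page the on-call engineer
    return "COMPENSATION_FAILED"


calls = {"count": 0}


def flaky_refund():
    """Fails twice, then succeeds, like a briefly unavailable gateway."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("gateway unavailable")


def broken_refund():
    raise ConnectionError("gateway down")


alerts = []
recovered = run_compensation(flaky_refund, base_delay_s=0, on_give_up=alerts.append)
stuck = run_compensation(broken_refund, base_delay_s=0, on_give_up=alerts.append)
```

The flaky refund succeeds on the third attempt without anyone being paged. The broken one exhausts its attempts, raises one alert, and reports COMPENSATION_FAILED so the saga can be flagged for manual handling.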

The Outbox Pattern

There’s a subtle but devastating bug in saga implementations. Consider this code:

async def on_order_created(self, event):
    # Step 1: Write to local database
    self.db.reserve_inventory(event["product_id"], event["quantity"])

    # Step 2: Publish event
    await self.event_bus.publish("inventory.reserved", event)

What if the service crashes between Step 1 and Step 2? The inventory is reserved in the database, but the event was never published. The saga is stuck: the payment service, which is waiting for inventory.reserved, never hears that the stock was reserved, and the order stays pending forever.

The outbox pattern fixes this by writing the event to the same database as the business data, in the same local transaction:

import json

async def on_order_created(self, event):
    # Single local transaction: business data + event in the same commit
    with self.db.transaction():
        self.db.reserve_inventory(event["product_id"], event["quantity"])
        self.db.insert_outbox_event(
            aggregate_id=event["order_id"],
            event_type="inventory.reserved",
            payload=json.dumps({
                "order_id": event["order_id"],
                "product_id": event["product_id"],
                "quantity": event["quantity"],
                # The payment service needs these to charge the customer
                "user_id": event["user_id"],
                "amount": event["amount"],
            })
        )
    # If the transaction commits, both the reservation AND the event are saved.
    # If it rolls back, neither is saved. Atomic.


# Separate process: poll the outbox and publish events
class OutboxPublisher:
    """Runs as a background worker or cron job."""

    async def publish_pending_events(self):
        events = self.db.query(
            "SELECT * FROM outbox WHERE published = FALSE "
            "ORDER BY created_at LIMIT 100"
        )
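        for event in events:
            try:
                await self.event_bus.publish(event.event_type,
                                             json.loads(event.payload))
                # A crash between the publish and this UPDATE re-sends the
                # event on the next poll. Delivery is at-least-once, so
                # consumers must be idempotent.
                self.db.execute(
                    "UPDATE outbox SET published = TRUE WHERE id = %s",
                    (event.id,)
                )
            except PublishError:
                # Stop here to preserve ordering; retry on the next poll
                break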

An alternative to polling is change data capture (CDC). Tools like Debezium watch the database’s write-ahead log and publish outbox events automatically. This eliminates the polling delay and is more efficient at scale.

Outbox table schema:

CREATE TABLE outbox (
    id            BIGSERIAL PRIMARY KEY,
    aggregate_id  VARCHAR(255) NOT NULL,
    event_type    VARCHAR(255) NOT NULL,
    payload       JSONB NOT NULL,
    published     BOOLEAN DEFAULT FALSE,
    created_at    TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_outbox_unpublished ON outbox (created_at)
    WHERE published = FALSE;
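
To see the outbox mechanics end to end without a PostgreSQL instance, here is a runnable sketch using Python’s standard-library sqlite3 module. The schema is a trimmed-down assumption, not the table above, and the `crash` flag exists only to simulate a failure between the two writes.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id TEXT PRIMARY KEY, available INTEGER NOT NULL);
    CREATE TABLE outbox (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload    TEXT NOT NULL,
        published  INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO products VALUES ('prod_456', 1);
""")


def reserve(conn, order_id, product_id, quantity, crash=False):
    """Decrement stock and record the event in one local transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE products SET available = available - ? WHERE id = ?",
            (quantity, product_id),
        )
        if crash:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("inventory.reserved", json.dumps({"order_id": order_id})),
        )


def publish_pending(conn, publish):
    """Send unpublished events in insertion order, marking each after sending."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # if this raises, retry next poll
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)


def stock(conn):
    return conn.execute("SELECT available FROM products").fetchone()[0]


def pending(conn):
    return conn.execute(
        "SELECT COUNT(*) FROM outbox WHERE published = 0").fetchone()[0]


try:
    reserve(conn, "ord_1", "prod_456", 1, crash=True)
except RuntimeError:
    pass
stock_after_crash, pending_after_crash = stock(conn), pending(conn)

reserve(conn, "ord_2", "prod_456", 1)
sent = []
first_poll = publish_pending(conn, lambda kind, body: sent.append((kind, body["order_id"])))
second_poll = publish_pending(conn, lambda kind, body: sent.append((kind, body["order_id"])))
```

After the simulated crash, the stock is untouched and the outbox is empty: the two writes succeed or fail together. The successful reservation leaves exactly one event, which the first poll publishes and the second poll skips.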

Idempotency in Sagas

Every step and every compensation in a saga must be idempotent. Messages will be delivered more than once. Services will be called more than once during retries.

class IdempotentInventoryService:
    async def reserve(self, event):
        """Idempotent reservation using the order_id as dedup key."""
        existing = self.db.query_one(
            "SELECT * FROM reservations WHERE order_id = %s",
            (event["order_id"],)
        )
        if existing:
            # Already reserved for this order. Return success.
            return {"reservation_id": existing.id}

        # No existing reservation -- create one
        with self.db.transaction():
            reservation = self.db.execute(
                "INSERT INTO reservations (order_id, product_id, quantity, status) "
                "VALUES (%s, %s, %s, 'active') "
                "ON CONFLICT (order_id) DO NOTHING "
                "RETURNING id",
                (event["order_id"], event["product_id"], event["quantity"])
            )
            if reservation:
                updated = self.db.execute(
                    "UPDATE products SET available = available - %s "
                    "WHERE id = %s AND available >= %s "
                    "RETURNING id",
                    (event["quantity"], event["product_id"], event["quantity"])
                )
                if not updated:
                    # Not enough stock: raising rolls back the reservation row too.
                    raise InsufficientStockError(event["product_id"])

        if not reservation:
            # A concurrent delivery inserted the row between our check and our
            # insert. Re-read it; `existing` is still None at this point.
            reservation = self.db.query_one(
                "SELECT * FROM reservations WHERE order_id = %s",
                (event["order_id"],)
            )
        return {"reservation_id": reservation.id}

The pattern: use a natural business key (order_id, transaction_id) as a deduplication key. Check before writing. Use ON CONFLICT DO NOTHING or equivalent for database-level safety. This ensures that retries and duplicate events produce the same result.
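
The same pattern can be tried locally. A minimal sketch with the standard-library sqlite3 module, assuming SQLite 3.24 or newer for ON CONFLICT ... DO NOTHING. The table layout is simplified for the example, with order_id as the primary key so the database itself enforces deduplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id TEXT PRIMARY KEY, available INTEGER NOT NULL);
    CREATE TABLE reservations (
        order_id   TEXT PRIMARY KEY,      -- the deduplication key
        product_id TEXT NOT NULL,
        quantity   INTEGER NOT NULL
    );
    INSERT INTO products VALUES ('prod_456', 10);
""")


def reserve(conn, order_id, product_id, quantity):
    """Reserve stock once per order_id, however often the event is delivered."""
    with conn:  # one local transaction: commit on success, rollback on error
        cur = conn.execute(
            "INSERT INTO reservations (order_id, product_id, quantity) "
            "VALUES (?, ?, ?) ON CONFLICT (order_id) DO NOTHING",
            (order_id, product_id, quantity),
        )
        if cur.rowcount == 0:
            return "duplicate"            # already handled: change nothing
        cur = conn.execute(
            "UPDATE products SET available = available - ? "
            "WHERE id = ? AND available >= ?",
            (quantity, product_id, quantity),
        )
        if cur.rowcount == 0:
            raise ValueError("insufficient stock")   # rolls back the insert
    return "reserved"


# The same event delivered three times
results = [reserve(conn, "ord_789", "prod_456", 1) for _ in range(3)]
available = conn.execute("SELECT available FROM products").fetchone()[0]

# An order that cannot be satisfied must leave no reservation behind
try:
    reserve(conn, "ord_999", "prod_456", 50)
    oversold = True
except ValueError:
    oversold = False
reservation_count = conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
```

Three deliveries of the same event decrement the stock once. The oversized order fails, and because the insert and the stock update share a transaction, its reservation row is rolled back with it.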

Real-World Example: E-Commerce Order

Putting it all together. Here’s a complete orchestrated saga for an e-commerce order flow:

class ECommerceOrderSaga:
    steps = [
        SagaStep(
            name="create_order",
            action=lambda ctx: order_service.create(
                user_id=ctx["user_id"],
                items=ctx["items"],
                total=ctx["total"]
            ),
            compensation=lambda ctx: order_service.cancel(ctx["order_id"]),
        ),
        SagaStep(
            name="verify_customer",
            action=lambda ctx: customer_service.verify_credit(
                user_id=ctx["user_id"],
                amount=ctx["total"]
            ),
            compensation=None,  # Verification has no side effect
        ),
        SagaStep(
            name="reserve_inventory",
            action=lambda ctx: inventory_service.reserve(
                items=ctx["items"],
                order_id=ctx["order_id"]
            ),
            compensation=lambda ctx: inventory_service.release(
                reservation_id=ctx["reservation_id"]
            ),
        ),
        SagaStep(
            name="process_payment",
            action=lambda ctx: payment_service.charge(
                user_id=ctx["user_id"],
                amount=ctx["total"],
                idempotency_key=f"order_{ctx['order_id']}"
            ),
            compensation=lambda ctx: payment_service.refund(
                charge_id=ctx["charge_id"]
            ),
        ),
        SagaStep(
            name="schedule_shipping",
            action=lambda ctx: shipping_service.create_shipment(
                order_id=ctx["order_id"],
                items=ctx["items"],
                address=ctx["shipping_address"]
            ),
            compensation=lambda ctx: shipping_service.cancel_shipment(
                shipment_id=ctx["shipment_id"]
            ),
        ),
        SagaStep(
            name="confirm_order",
            action=lambda ctx: order_service.confirm(ctx["order_id"]),
            compensation=None,  # Final step
        ),
    ]

# Execution trace (payment fails):
# 1. create_order       --> OK   (order_id: ord_789, status: pending)
# 2. verify_customer    --> OK   (credit: approved)
# 3. reserve_inventory  --> OK   (reservation_id: res_456)
# 4. process_payment    --> FAIL (insufficient funds)
# --- COMPENSATING ---
# 3. release_inventory  --> OK   (reservation res_456 released)
# 1. cancel_order       --> OK   (order ord_789 cancelled)
# (Step 2 has no compensation -- nothing to undo)
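
The saga definition above relies on a SagaStep type and a runner that aren’t shown. One possible minimal version, as a synchronous sketch with stubbed services: the dataclass fields mirror the keyword arguments used above, while run_saga and the stub lambdas are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], Optional[dict]]
    compensation: Optional[Callable[[dict], None]] = None


def run_saga(steps, ctx):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    trace, completed = [], []
    for step in steps:
        try:
            result = step.action(ctx)
            ctx.update(result or {})      # expose new ids to later steps
            completed.append(step)
            trace.append(f"{step.name}: OK")
        except Exception as exc:
            trace.append(f"{step.name}: FAIL ({exc})")
            for done in reversed(completed):
                if done.compensation is not None:
                    done.compensation(ctx)
                    trace.append(f"{done.name}: COMPENSATED")
            return "FAILED", trace
    return "COMPLETED", trace


def decline(ctx):
    raise RuntimeError("insufficient funds")


undone = []
steps = [
    SagaStep("create_order",
             action=lambda ctx: {"order_id": "ord_789"},
             compensation=lambda ctx: undone.append(("order", ctx["order_id"]))),
    SagaStep("verify_customer", action=lambda ctx: None),
    SagaStep("reserve_inventory",
             action=lambda ctx: {"reservation_id": "res_456"},
             compensation=lambda ctx: undone.append(
                 ("reservation", ctx["reservation_id"]))),
    SagaStep("process_payment", action=decline),
]
status, trace = run_saga(steps, {"user_id": "usr_123", "total": 99.99})
```

The resulting trace matches the execution trace above: three steps succeed, payment fails, then the reservation and the order are compensated in that order, skipping the verification step that has nothing to undo. A production runner would also persist state between steps and handle compensation failures, as the orchestrator earlier does.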

Eventual Consistency

Sagas provide eventual consistency, not strong consistency. Between the time a payment is charged and the order is confirmed, the system is in an inconsistent state. This is a fundamental tradeoff.

Strategies for handling the inconsistency window:

# Strategy 1: Optimistic UI
# Show the user "Order placed" immediately.
# Process the saga in the background.
# If it fails, notify the user (email, push notification).

# Strategy 2: Pending states
# Every entity has explicit intermediate states.
# The UI reflects these: "Payment processing...", "Reserving inventory..."
class OrderStatus:
    PENDING = "pending"              # Order created, saga in progress
    PAYMENT_PROCESSING = "payment_processing"
    INVENTORY_RESERVED = "inventory_reserved"
    CONFIRMED = "confirmed"          # Saga completed successfully
    CANCELLED = "cancelled"          # Saga failed, compensated
    COMPENSATION_FAILED = "needs_attention"  # Manual intervention needed

# Strategy 3: Read-your-writes
# After writing, route the user's subsequent reads to the
# write-side database (not the eventually-consistent read replica).
# This ensures the user sees their own changes immediately.

The pragmatic approach: use pending states, show progress to the user, and have clear error handling for failures. Most users understand “processing your order” — they don’t expect instant confirmation for complex operations.
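
Pending states work best when the allowed transitions are written down and enforced, so a late or duplicate event can’t move an order backwards. A small sketch: the status values come from OrderStatus above, but the transition graph and the user-facing messages are assumptions chosen for illustration.

```python
ALLOWED_TRANSITIONS = {
    "pending": {"inventory_reserved", "cancelled"},
    "inventory_reserved": {"payment_processing", "cancelled"},
    "payment_processing": {"confirmed", "cancelled", "needs_attention"},
    "needs_attention": {"cancelled"},
    "confirmed": set(),     # terminal
    "cancelled": set(),     # terminal
}

USER_MESSAGES = {
    "pending": "Processing your order...",
    "inventory_reserved": "Items reserved",
    "payment_processing": "Payment processing...",
    "confirmed": "Order confirmed",
    "cancelled": "Order cancelled",
    "needs_attention": "We're looking into your order",
}


def transition(current, new):
    """Return the new status, or raise if the saga may not move there."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


status = "pending"
for next_status in ("inventory_reserved", "payment_processing", "confirmed"):
    status = transition(status, next_status)

try:
    transition("confirmed", "pending")   # e.g. a stale event arriving late
    rejected = False
except ValueError:
    rejected = True
```

The happy path walks from pending to confirmed, and a stale event that tries to reopen a confirmed order is rejected instead of silently corrupting its state.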

Key Takeaways

Distributed transactions are unavoidable in microservices. Any operation that spans multiple services needs a coordination strategy.
Two-phase commit (2PC) provides strong consistency but blocks resources during the protocol, has a single point of failure (the coordinator), and stalls during network partitions. Use it within a single database, not across services.
The SAGA pattern replaces one distributed transaction with a sequence of local transactions. Each step has a compensating action for rollback. No distributed locks, no blocking.
Orchestration uses a central coordinator that manages the saga flow. Easier to reason about, easier to debug, easier to monitor. Start here.
Choreography uses events — each service reacts to events and publishes new ones. Simpler per-service but harder to trace the overall flow. Use for simple sagas or when services are owned by different teams.
Compensation is not “undo.” It’s the business-appropriate reversal for each step. Some actions can’t be undone (emails, notifications). Design your saga step ordering with this in mind — put irreversible actions last.
The outbox pattern prevents the dual-write problem. Write the business data and the event to the same database in one local transaction. A separate process publishes events from the outbox table.
Every saga step must be idempotent. Use natural business keys as deduplication keys. Use ON CONFLICT DO NOTHING at the database level. Assume any message can be delivered more than once.
Sagas provide eventual consistency. Between steps, the system is in an inconsistent state. Use pending states, optimistic UI, and explicit error handling to manage the inconsistency window.
When compensation fails, you need humans. Build alerting and tooling for stuck sagas. A dashboard showing sagas in COMPENSATION_FAILED state is not optional — it’s a production requirement.