Microservices Patterns Guide: Service Design, Communication, Saga, CQRS, Event Sourcing & Observability
Master microservices architecture patterns — service decomposition (DDD bounded contexts, strangler fig), inter-service communication (REST, gRPC, async messaging), API Gateway, saga orchestration vs choreography, CQRS, event sourcing, circuit breaker, bulkhead, distributed tracing (OpenTelemetry), service mesh, and data management strategies with production-ready code examples.
- Decompose via DDD bounded contexts; migrate incrementally with Strangler Fig
- Sync (REST/gRPC) for queries, async messaging for event-driven decoupling
- API Gateway as single entry point: routing, auth, rate limiting, protocol translation
- Saga for distributed transactions — orchestration (centralized) or choreography (event-driven)
- CQRS separates read/write models; event sourcing preserves full state change history
- Circuit breaker + bulkhead + exponential backoff retry = resilience trifecta
- OpenTelemetry tracing + service mesh (Istio/Linkerd) for end-to-end observability
- Database per service — avoid the shared database anti-pattern
- Microservices are not a silver bullet — adopt only when team size and business complexity demand it
- Service boundaries should align with business capabilities, not technical layers
- Embrace eventual consistency — abandon the illusion of distributed strong consistency
- Observability (logs, metrics, traces) is a lifeline for microservices, not optional
- Migrate incrementally with Strangler Fig first — avoid big-bang rewrites
- Every pattern has a cost — evaluate whether the benefit exceeds the introduced complexity
1. Service Decomposition Strategies
DDD Bounded Contexts
DDD bounded contexts are the most reliable method for defining microservice boundaries. Each bounded context encapsulates a complete business domain with its own domain model, ubiquitous language, and data store.
// E-commerce bounded contexts example
// Each context becomes a candidate microservice
// Order Context — owns order lifecycle
interface Order {
orderId: string;
customerId: string;
items: OrderItem[];
status: "pending" | "confirmed" | "shipped" | "delivered";
totalAmount: number;
}
// Inventory Context — owns stock management
interface InventoryItem {
sku: string;
warehouseId: string;
quantityAvailable: number;
reservedQuantity: number;
}
// Payment Context — owns payment processing
interface Payment {
paymentId: string;
orderId: string; // reference, not a foreign key
amount: number;
method: "credit_card" | "paypal" | "bank_transfer";
status: "pending" | "authorized" | "captured" | "refunded";
}
// Shipping Context — owns delivery logistics
interface Shipment {
shipmentId: string;
orderId: string;
carrier: string;
trackingNumber: string;
estimatedDelivery: Date;
}
Strangler Fig Pattern
The Strangler Fig pattern enables incremental migration from a monolith to microservices. An API gateway or reverse proxy gradually routes traffic from the monolith to new microservices until the monolith is fully replaced.
# Nginx configuration for Strangler Fig migration
# Route new endpoints to microservices, legacy to monolith
upstream monolith {
server monolith-app:8080;
}
upstream order-service {
server order-service:3001;
}
upstream inventory-service {
server inventory-service:3002;
}
server {
listen 80;
# Migrated: orders now served by microservice
location /api/v2/orders {
proxy_pass http://order-service;
}
# Migrated: inventory now served by microservice
location /api/v2/inventory {
proxy_pass http://inventory-service;
}
# Everything else still goes to the monolith
location / {
proxy_pass http://monolith;
}
}
2. Inter-Service Communication
Synchronous: REST vs gRPC
REST uses HTTP/JSON for simple request-response communication, suitable for external APIs and simple queries. gRPC uses Protocol Buffers and HTTP/2, offering higher performance, type safety, and bidirectional streaming, ideal for internal service-to-service communication.
| Feature | REST | gRPC |
|---|---|---|
| Protocol | HTTP/1.1 or HTTP/2 | HTTP/2 |
| Serialization | JSON (text) | Protobuf (binary) |
| Type Safety | None (needs OpenAPI codegen) | Built-in (proto codegen) |
| Streaming | Limited (SSE/WebSocket) | Native bidirectional streaming |
| Browser Support | Native | Requires grpc-web proxy |
| Best For | Public APIs, CRUD operations | Internal services, high performance |
// gRPC service definition (order.proto)
syntax = "proto3";
package order;
service OrderService {
rpc CreateOrder (CreateOrderRequest) returns (OrderResponse);
rpc GetOrder (GetOrderRequest) returns (OrderResponse);
rpc StreamOrderUpdates (GetOrderRequest) returns (stream OrderEvent);
}
message CreateOrderRequest {
string customer_id = 1;
repeated OrderItem items = 2;
}
message OrderItem {
string sku = 1;
int32 quantity = 2;
double unit_price = 3;
}
message OrderResponse {
string order_id = 1;
string status = 2;
double total_amount = 3;
}
Asynchronous Messaging
Asynchronous messaging achieves temporal decoupling through message brokers (Kafka, RabbitMQ, NATS). The sender does not wait for the receiver to finish processing, making it ideal for event-driven architectures and eventual consistency scenarios.
// Publishing domain events with Kafka (Node.js + kafkajs)
import crypto from "crypto"; // randomUUID (global only on newer Node versions)
import { Kafka, Partitioners } from "kafkajs";
const kafka = new Kafka({
clientId: "order-service",
brokers: ["kafka-1:9092", "kafka-2:9092"],
});
const producer = kafka.producer({
createPartitioner: Partitioners.DefaultPartitioner,
});
interface OrderCreatedEvent {
eventType: "OrderCreated";
orderId: string;
customerId: string;
items: Array<{ sku: string; quantity: number }>;
totalAmount: number;
timestamp: string;
}
async function publishOrderCreated(order: OrderCreatedEvent) {
await producer.connect();
await producer.send({
topic: "order-events",
messages: [
{
key: order.orderId,
value: JSON.stringify(order),
headers: {
"event-type": "OrderCreated",
"correlation-id": crypto.randomUUID(),
},
},
],
});
}
// Consuming events in inventory-service
const consumer = kafka.consumer({ groupId: "inventory-group" });
async function startConsumer() {
await consumer.connect();
await consumer.subscribe({ topic: "order-events" });
await consumer.run({
eachMessage: async ({ topic, partition, message }) => {
const event = JSON.parse(message.value!.toString());
if (event.eventType === "OrderCreated") {
await reserveInventory(event.orderId, event.items);
}
},
});
}
3. API Gateway Pattern
The API Gateway is the single entry point for all client requests, handling routing, authentication, rate limiting, protocol translation, and request aggregation. It hides the internal microservices topology and simplifies client interaction.
// Express.js API Gateway with routing + auth + rate limiting
import express from "express";
import jwt from "jsonwebtoken";
import { createProxyMiddleware } from "http-proxy-middleware";
import rateLimit from "express-rate-limit";
const app = express();
// Global rate limiting
app.use(rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 100, // 100 requests per window
standardHeaders: true,
}));
// JWT authentication middleware
function authenticate(req, res, next) {
const token = req.headers.authorization?.split(" ")[1];
if (!token) return res.status(401).json({ error: "Unauthorized" });
try {
req.user = jwt.verify(token, process.env.JWT_SECRET);
next();
} catch {
res.status(403).json({ error: "Invalid token" });
}
}
// Route to microservices
app.use("/api/orders", authenticate, createProxyMiddleware({
target: "http://order-service:3001",
changeOrigin: true,
pathRewrite: { "^/api/orders": "/orders" },
}));
app.use("/api/inventory", authenticate, createProxyMiddleware({
target: "http://inventory-service:3002",
changeOrigin: true,
pathRewrite: { "^/api/inventory": "/inventory" },
}));
app.use("/api/payments", authenticate, createProxyMiddleware({
target: "http://payment-service:3003",
changeOrigin: true,
}));
app.listen(8080, () => console.log("API Gateway on :8080"));
Service Discovery
Service discovery enables microservices to dynamically locate other services at runtime without hardcoded addresses. Client-side discovery (the service queries the registry itself) and server-side discovery (the load balancer queries the registry) are the two primary patterns.
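As a sketch of the client-side variant, a service can query Consul's HTTP health API itself and balance across the healthy instances it gets back. The helper names (`pickInstance`, `discover`) and the `CONSUL_HOST` fallback are illustrative; the `/v1/health/service/:name?passing=true` endpoint is Consul's real health-check query.

```typescript
interface ServiceInstance {
  address: string;
  port: number;
}

// Pick one healthy instance at random (simple client-side load balancing)
function pickInstance(instances: ServiceInstance[]): ServiceInstance {
  if (instances.length === 0) throw new Error("No healthy instances");
  return instances[Math.floor(Math.random() * instances.length)];
}

// Ask Consul for instances of a service that pass their health checks
async function discover(serviceName: string): Promise<ServiceInstance[]> {
  const consulHost = process.env.CONSUL_HOST ?? "localhost";
  const res = await fetch(
    `http://${consulHost}:8500/v1/health/service/${serviceName}?passing=true`
  );
  if (!res.ok) throw new Error(`Consul query failed: ${res.status}`);
  const entries = (await res.json()) as Array<{
    Service: { Address: string; Port: number };
  }>;
  return entries.map((e) => ({
    address: e.Service.Address,
    port: e.Service.Port,
  }));
}

// Usage: resolve order-service at call time instead of hardcoding a host
async function callOrderService(path: string) {
  const instance = pickInstance(await discover("order-service"));
  return fetch(`http://${instance.address}:${instance.port}${path}`);
}
```

In production this lookup is usually cached and refreshed on a timer or via Consul's blocking queries, rather than hitting the registry on every request.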
# docker-compose.yml with Consul for service discovery
version: "3.8"
services:
consul:
image: hashicorp/consul:1.17
ports:
- "8500:8500"
command: agent -server -bootstrap-expect=1 -ui -client=0.0.0.0
order-service:
build: ./order-service
environment:
CONSUL_HOST: consul
SERVICE_NAME: order-service
SERVICE_PORT: 3001
depends_on:
- consul
inventory-service:
build: ./inventory-service
environment:
CONSUL_HOST: consul
SERVICE_NAME: inventory-service
SERVICE_PORT: 3002
depends_on:
- consul
4. Saga Pattern: Distributed Transaction Management
The Saga pattern decomposes a distributed transaction into a sequence of local transactions, each with a corresponding compensating action. When a step fails, completed steps are compensated in reverse order, achieving eventual consistency.
Orchestration Saga (Central Coordinator)
The orchestration saga uses a central coordinator (Saga Orchestrator) to manage the entire transaction flow. The coordinator knows the execution order of all steps and triggers compensating actions on failure.
// Saga Orchestrator for Order Processing
interface SagaStep {
name: string;
execute: (context: SagaContext) => Promise<void>;
compensate: (context: SagaContext) => Promise<void>;
}
interface SagaContext {
orderId: string;
customerId: string;
items: Array<{ sku: string; qty: number }>;
paymentId?: string;
shipmentId?: string;
}
class SagaOrchestrator {
private steps: SagaStep[] = [];
private completedSteps: SagaStep[] = [];
addStep(step: SagaStep) {
this.steps.push(step);
return this;
}
async execute(context: SagaContext): Promise<void> {
for (const step of this.steps) {
try {
console.log("Executing: " + step.name);
await step.execute(context);
this.completedSteps.push(step);
} catch (error) {
console.error("Failed at: " + step.name + ", compensating...");
await this.compensate(context);
throw error;
}
}
}
private async compensate(context: SagaContext): Promise<void> {
// Copy before reversing so completedSteps itself is not mutated
for (const step of [...this.completedSteps].reverse()) {
try {
console.log("Compensating: " + step.name);
await step.compensate(context);
} catch (err) {
console.error("Compensation failed for: " + step.name);
// Log for manual intervention
}
}
}
}
// Usage
const orderSaga = new SagaOrchestrator()
.addStep({
name: "ReserveInventory",
execute: async (ctx) => { /* call inventory-service */ },
compensate: async (ctx) => { /* release reserved stock */ },
})
.addStep({
name: "ProcessPayment",
execute: async (ctx) => { /* call payment-service */ },
compensate: async (ctx) => { /* refund payment */ },
})
.addStep({
name: "CreateShipment",
execute: async (ctx) => { /* call shipping-service */ },
compensate: async (ctx) => { /* cancel shipment */ },
});
Choreography Saga (Event-Driven)
The choreography saga has no central coordinator. Each service listens for events and publishes new ones, forming an event chain. Services are fully decoupled, but the transaction flow is harder to trace and debug.
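The event chain described above can be sketched with an in-memory bus standing in for Kafka. All service names, event types, and handlers here are illustrative; in a real system each subscriber lives in its own service and consumes from the broker.

```typescript
type DomainEvent = { type: string; payload: { orderId: string } };
type Handler = (event: DomainEvent) => void;

// Minimal in-memory event bus (Kafka stand-in for the sketch)
class EventBus {
  private handlers = new Map<string, Handler[]>();
  subscribe(type: string, handler: Handler) {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }
  publish(event: DomainEvent) {
    for (const h of this.handlers.get(event.type) ?? []) h(event);
  }
}

const bus = new EventBus();
const log: string[] = [];

// inventory-service: reserve stock when an order is created
bus.subscribe("OrderCreated", (e) => {
  log.push("inventory: reserved for " + e.payload.orderId);
  bus.publish({ type: "InventoryReserved", payload: e.payload });
});

// payment-service: charge once inventory is reserved
bus.subscribe("InventoryReserved", (e) => {
  log.push("payment: charged for " + e.payload.orderId);
  bus.publish({ type: "PaymentCompleted", payload: e.payload });
});

// shipping-service: ship once payment completes
bus.subscribe("PaymentCompleted", (e) => {
  log.push("shipping: created for " + e.payload.orderId);
});

// Compensation is also event-driven: on PaymentFailed, inventory releases stock
bus.subscribe("PaymentFailed", (e) => {
  log.push("inventory: released for " + e.payload.orderId);
});

bus.publish({ type: "OrderCreated", payload: { orderId: "o-1" } });
// log now records the three-step happy path: reserve → charge → ship
```

Notice that no component sees the whole flow: the "transaction" is only the emergent sum of the subscriptions, which is exactly why choreography needs distributed tracing to stay debuggable.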
5. CQRS — Command Query Responsibility Segregation
CQRS separates the read (Query) and write (Command) models into different data models or even different databases. The write model is optimized for consistency and business rules; the read model is optimized for query performance.
// CQRS implementation with separate read/write models
// ── Command Side (Write Model) ──
interface CreateOrderCommand {
type: "CreateOrder";
customerId: string;
items: Array<{ sku: string; quantity: number; price: number }>;
}
class OrderCommandHandler {
constructor(
private writeRepo: OrderWriteRepository,
private eventBus: EventBus
) {}
async handle(cmd: CreateOrderCommand): Promise<string> {
// Business validation
if (cmd.items.length === 0) throw new Error("Order must have items");
const order = {
id: crypto.randomUUID(),
customerId: cmd.customerId,
items: cmd.items,
status: "pending" as const,
total: cmd.items.reduce((s, i) => s + i.price * i.quantity, 0),
createdAt: new Date(),
};
await this.writeRepo.save(order);
// Publish event to sync read model
await this.eventBus.publish({
type: "OrderCreated",
payload: order,
timestamp: new Date().toISOString(),
});
return order.id;
}
}
// ── Query Side (Read Model) ──
interface OrderReadModel {
orderId: string;
customerName: string; // denormalized
itemCount: number; // precomputed
totalFormatted: string; // preformatted
status: string;
createdAt: string;
}
// Read model projection — listens to events and updates view
class OrderProjection {
constructor(private readRepo: OrderReadRepository) {}
async onOrderCreated(event: OrderCreatedEvent) {
const customer = await lookupCustomer(event.payload.customerId);
await this.readRepo.upsert({
orderId: event.payload.id,
customerName: customer.name,
itemCount: event.payload.items.length,
totalFormatted: "$" + event.payload.total.toFixed(2),
status: event.payload.status,
createdAt: event.timestamp,
});
}
}
6. Event Sourcing
Event sourcing stores every state change as an immutable event instead of persisting only the current state. Current state is reconstructed by replaying the event sequence. This provides a complete audit trail, enables temporal queries, and pairs naturally with CQRS.
// Event Sourcing for an Order aggregate
type OrderEvent =
| { type: "OrderCreated"; data: { id: string; customerId: string; items: any[] } }
| { type: "OrderConfirmed"; data: { confirmedAt: string } }
| { type: "ItemAdded"; data: { sku: string; quantity: number } }
| { type: "OrderShipped"; data: { trackingNumber: string } }
| { type: "OrderCancelled"; data: { reason: string } };
interface EventStore {
append(streamId: string, events: OrderEvent[]): Promise<void>;
readStream(streamId: string): Promise<OrderEvent[]>;
}
class OrderAggregate {
private id = "";
private status = "draft";
private items: any[] = [];
private changes: OrderEvent[] = [];
// Rebuild state from event history
static fromHistory(events: OrderEvent[]): OrderAggregate {
const order = new OrderAggregate();
for (const event of events) {
order.apply(event);
}
return order;
}
private apply(event: OrderEvent) {
switch (event.type) {
case "OrderCreated":
this.id = event.data.id;
this.items = event.data.items;
this.status = "pending";
break;
case "OrderConfirmed":
this.status = "confirmed";
break;
case "OrderShipped":
this.status = "shipped";
break;
case "OrderCancelled":
this.status = "cancelled";
break;
}
}
// Command that produces new events
confirm() {
if (this.status !== "pending") throw new Error("Only pending orders");
const event: OrderEvent = {
type: "OrderConfirmed",
data: { confirmedAt: new Date().toISOString() },
};
this.apply(event);
this.changes.push(event);
}
getUncommittedChanges(): OrderEvent[] {
return [...this.changes];
}
}
7. Circuit Breaker Pattern
The circuit breaker pattern prevents cascading failures. When the failure rate of downstream calls exceeds a threshold, the breaker opens and returns errors immediately without making requests, protecting upstream resources. After a timeout, it enters half-open state and allows a few probe requests through.
// Circuit Breaker implementation in TypeScript
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";
class CircuitBreaker {
private state: CircuitState = "CLOSED";
private failureCount = 0;
private successCount = 0;
private lastFailureTime = 0;
constructor(
private failureThreshold: number = 5,
private resetTimeoutMs: number = 30000,
private halfOpenMaxCalls: number = 3
) {}
async call<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === "OPEN") {
if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
this.state = "HALF_OPEN";
this.successCount = 0;
} else {
throw new Error("Circuit is OPEN — request rejected");
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() {
if (this.state === "HALF_OPEN") {
this.successCount++;
if (this.successCount >= this.halfOpenMaxCalls) {
this.state = "CLOSED";
this.failureCount = 0;
}
} else {
this.failureCount = 0;
}
}
private onFailure() {
this.failureCount++;
this.lastFailureTime = Date.now();
if (this.failureCount >= this.failureThreshold) {
this.state = "OPEN";
}
}
}
// Usage
const breaker = new CircuitBreaker(5, 30000, 3);
async function getOrderFromService(orderId: string) {
return breaker.call(async () => {
const res = await fetch(
"http://order-service:3001/orders/" + orderId
);
if (!res.ok) throw new Error("Service error: " + res.status);
return res.json();
});
});
Resilience4j Configuration (Java/Spring Boot)
# application.yml — Resilience4j circuit breaker config
resilience4j:
circuitbreaker:
instances:
orderService:
registerHealthIndicator: true
slidingWindowType: COUNT_BASED
slidingWindowSize: 10
failureRateThreshold: 50
waitDurationInOpenState: 30s
permittedNumberOfCallsInHalfOpenState: 3
automaticTransitionFromOpenToHalfOpenEnabled: true
recordExceptions:
- java.io.IOException
- java.util.concurrent.TimeoutException
retry:
instances:
orderService:
maxAttempts: 3
waitDuration: 1s
enableExponentialBackoff: true
exponentialBackoffMultiplier: 2
bulkhead:
instances:
orderService:
maxConcurrentCalls: 25
maxWaitDuration: 500ms
8. Bulkhead Pattern
The bulkhead pattern borrows from the watertight compartment concept in ship design. By isolating resource pools (thread pools, connection pools, memory), it ensures that a failure in one downstream service does not exhaust the entire system, limiting the blast radius of failures.
// Bulkhead pattern — isolated resource pools per service
class Bulkhead {
private activeCount = 0;
private queue: Array<{ resolve: Function; reject: Function }> = [];
constructor(
private maxConcurrent: number,
private maxQueueSize: number,
private timeoutMs: number
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.activeCount >= this.maxConcurrent) {
if (this.queue.length >= this.maxQueueSize) {
throw new Error("Bulkhead full — request rejected");
}
await new Promise<void>((resolve, reject) => {
const entry = {
resolve: () => { clearTimeout(timer); resolve(); },
reject,
};
// On timeout, remove the entry so a freed slot is not wasted on it
const timer = setTimeout(() => {
const idx = this.queue.indexOf(entry);
if (idx !== -1) this.queue.splice(idx, 1);
reject(new Error("Bulkhead queue timeout"));
}, this.timeoutMs);
this.queue.push(entry);
});
}
this.activeCount++;
try {
return await fn();
} finally {
this.activeCount--;
if (this.queue.length > 0) {
const next = this.queue.shift()!;
next.resolve();
}
}
}
}
// Separate bulkheads per downstream dependency
const orderBulkhead = new Bulkhead(10, 20, 5000);
const paymentBulkhead = new Bulkhead(5, 10, 3000);
const inventoryBulkhead = new Bulkhead(15, 30, 5000);
9. Retry with Exponential Backoff
Transient failures (network jitter, temporary overload) can be resolved with retries, but fixed-interval retries risk thundering herd effects. Exponential backoff with jitter effectively distributes retry attempts, preventing simultaneous bursts of retries to downstream services.
// Retry with exponential backoff + jitter
interface RetryConfig {
maxAttempts: number;
baseDelayMs: number;
maxDelayMs: number;
jitter: boolean;
}
async function retryWithBackoff<T>(
fn: () => Promise<T>,
config: RetryConfig
): Promise<T> {
let lastError: Error | undefined;
for (let attempt = 0; attempt < config.maxAttempts; attempt++) {
try {
return await fn();
} catch (error) {
lastError = error as Error;
if (attempt === config.maxAttempts - 1) break;
// Calculate delay: base * 2^attempt
let delay = Math.min(
config.baseDelayMs * Math.pow(2, attempt),
config.maxDelayMs
);
// Add jitter: random value between 0 and delay
if (config.jitter) {
delay = Math.random() * delay;
}
console.log(
"Attempt " + (attempt + 1) + " failed. " +
"Retrying in " + Math.round(delay) + "ms..."
);
await new Promise(r => setTimeout(r, delay));
}
}
throw lastError;
}
// Usage — fetch only rejects on network errors, so surface
// HTTP-level failures as exceptions to make them retryable
const result = await retryWithBackoff(
async () => {
const res = await fetch("http://payment-service:3003/charge");
if (!res.ok) throw new Error("HTTP " + res.status);
return res;
},
{ maxAttempts: 4, baseDelayMs: 500, maxDelayMs: 8000, jitter: true }
);
10. Distributed Tracing with OpenTelemetry
OpenTelemetry is the CNCF's vendor-neutral observability framework. By propagating trace context (trace ID + span ID) across service boundaries, it enables end-to-end request tracing, latency analysis, and error localization across services.
// OpenTelemetry setup for a Node.js microservice
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from
"@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from
"@opentelemetry/exporter-trace-otlp-http";
import { Resource } from "@opentelemetry/resources";
import {
SEMRESATTRS_SERVICE_NAME,
SEMRESATTRS_SERVICE_VERSION,
} from "@opentelemetry/semantic-conventions";
const sdk = new NodeSDK({
resource: new Resource({
[SEMRESATTRS_SERVICE_NAME]: "order-service",
[SEMRESATTRS_SERVICE_VERSION]: "1.2.0",
}),
traceExporter: new OTLPTraceExporter({
url: "http://jaeger:4318/v1/traces",
}),
instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
console.log("OpenTelemetry tracing initialized");
// Custom span for business logic
import { trace, SpanStatusCode } from "@opentelemetry/api";
const tracer = trace.getTracer("order-service");
async function processOrder(orderId: string) {
return tracer.startActiveSpan(
"processOrder",
async (span) => {
try {
span.setAttribute("order.id", orderId);
const order = await fetchOrder(orderId);
span.setAttribute("order.total", order.total);
span.setAttribute("order.items_count", order.items.length);
await validateOrder(order);
await chargePayment(order);
span.setStatus({ code: SpanStatusCode.OK });
return order;
} catch (error) {
span.setStatus({
code: SpanStatusCode.ERROR,
message: (error as Error).message,
});
span.recordException(error as Error);
throw error;
} finally {
span.end();
}
}
);
}
11. Service Mesh (Istio / Linkerd)
A service mesh transparently handles inter-service communication through sidecar proxies, providing traffic management, security (mTLS), observability, and resilience capabilities without modifying application code. Istio and Linkerd are the two most popular implementations.
| Feature | Istio | Linkerd |
|---|---|---|
| Proxy | Envoy | linkerd2-proxy (Rust) |
| Complexity | High (feature-rich) | Low (lightweight, simple) |
| mTLS | Built-in, automatic | Built-in, on by default |
| Traffic Mgmt | Advanced (canary, fault injection, mirroring) | Basic (traffic split) |
| Resource Overhead | Higher | Lower |
| Best For | Large-scale enterprise deployments | Small-medium scale, quick start |
# Istio VirtualService for canary deployment
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: order-service
spec:
hosts:
- order-service
http:
- route:
- destination:
host: order-service
subset: stable
weight: 90
- destination:
host: order-service
subset: canary
weight: 10
retries:
attempts: 3
perTryTimeout: 2s
timeout: 10s
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: order-service
spec:
host: order-service
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
h2UpgradePolicy: DEFAULT
maxRequestsPerConnection: 10
outlierDetection:
consecutive5xxErrors: 5
interval: 30s
baseEjectionTime: 60s
subsets:
- name: stable
labels:
version: v1
- name: canary
labels:
version: v2
12. Data Management Strategies
Database Per Service
Each microservice owns its own database instance or schema, ensuring data isolation and independent deployment. Services exchange data through APIs or events, never through a shared database.
# docker-compose.yml — database per service
services:
order-db:
image: postgres:16
environment:
POSTGRES_DB: orders
POSTGRES_USER: order_svc
POSTGRES_PASSWORD_FILE: /run/secrets/order_db_pw
volumes:
- order-data:/var/lib/postgresql/data
networks:
- order-net # isolated network
inventory-db:
image: mongo:7
environment:
MONGO_INITDB_DATABASE: inventory
volumes:
- inventory-data:/data/db
networks:
- inventory-net
payment-db:
image: postgres:16
environment:
POSTGRES_DB: payments
POSTGRES_USER: payment_svc
POSTGRES_PASSWORD_FILE: /run/secrets/payment_db_pw
volumes:
- payment-data:/var/lib/postgresql/data
networks:
- payment-net
volumes:
order-data:
inventory-data:
payment-data:
networks:
order-net:
inventory-net:
payment-net:
Shared Database Anti-Pattern
When multiple services read and write the same database, their schemas become coupled: a table change made for one service can break every other service, deployments must be coordinated, and data ownership blurs. Keep each database private to its owning service and share data only through that service's API or published events.
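A minimal contrast makes the boundary concrete. The table name and endpoint shape below are hypothetical; the point is that the second version depends only on order-service's published contract, not on its schema.

```typescript
// ✗ Anti-pattern: inventory-service querying order-service's tables
//   directly couples it to a schema it does not own, e.g.:
//   await orderDb.query("SELECT * FROM orders WHERE id = $1", [orderId]);

// ✓ Fix: fetch the data through the owning service's API
//   (endpoint shape is illustrative)
interface OrderSummary {
  orderId: string;
  status: string;
}

async function getOrder(orderId: string): Promise<OrderSummary> {
  const res = await fetch(`http://order-service:3001/orders/${orderId}`);
  if (!res.ok) throw new Error(`order-service returned ${res.status}`);
  return (await res.json()) as OrderSummary;
}
```

If the consuming service needs the data constantly, a common alternative is to maintain a local read-only copy by subscribing to the owner's domain events, as in the CQRS projection shown earlier.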
13. Health Checks and Readiness Probes
Liveness probes confirm whether a service process is alive; readiness probes confirm whether a service is ready to receive traffic. In Kubernetes, these probes determine pod lifecycle management and traffic routing.
// Health check endpoints in Express.js
import express from "express";
const app = express();
let isReady = false;
// Liveness — is the process alive?
app.get("/health/live", (req, res) => {
res.status(200).json({ status: "alive" });
});
// Readiness — can it accept traffic?
app.get("/health/ready", async (req, res) => {
if (!isReady) {
return res.status(503).json({ status: "not ready" });
}
try {
// Check critical dependencies
await checkDatabaseConnection();
await checkCacheConnection();
res.status(200).json({
status: "ready",
checks: { database: "ok", cache: "ok" },
});
} catch (error) {
res.status(503).json({
status: "not ready",
error: (error as Error).message,
});
}
});
// Startup sequence
async function bootstrap() {
await connectToDatabase();
await connectToCache();
await warmUpCaches();
isReady = true;
console.log("Service is ready to accept traffic");
}
bootstrap().catch((err) => {
console.error("Startup failed:", err);
process.exit(1);
});
app.listen(3001);
# Kubernetes deployment with health probes
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
spec:
replicas: 3
selector:
matchLabels:
app: order-service
template:
metadata:
labels:
app: order-service
spec:
containers:
- name: order-service
image: order-service:1.2.0
ports:
- containerPort: 3001
livenessProbe:
httpGet:
path: /health/live
port: 3001
initialDelaySeconds: 10
periodSeconds: 15
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: 3001
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3
startupProbe:
httpGet:
path: /health/live
port: 3001
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 30
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
14. Pattern Decision Matrix
The table below helps you select the right microservices pattern based on your specific scenario.
| Scenario | Recommended Pattern | Key Consideration |
|---|---|---|
| Migrating from monolith | Strangler Fig | Incremental, low risk |
| Cross-service transactions | Saga (Orchestration) | Requires compensation logic |
| High read, low write | CQRS | Scale read/write models independently |
| Complete audit trail | Event Sourcing | Event store growth needs management |
| Prevent cascading failures | Circuit Breaker + Bulkhead | Combine with retry and fallback |
| Request chain tracing | OpenTelemetry | All services must integrate |
| Zero-trust security | Service Mesh (Istio) | Auto mTLS, higher ops overhead |
| Event-driven decoupling | Async Messaging (Kafka) | Accept eventual consistency |
Conclusion
Microservices architecture is not a goal but a means to achieve business agility and scalability. Success hinges on understanding the applicability and cost of each pattern, driving service decomposition with DDD bounded contexts, managing distributed transactions with Saga, addressing complex query and audit needs with CQRS/event sourcing, ensuring resilience with circuit breakers and bulkheads, and achieving observability with OpenTelemetry. Start small, introduce patterns as needed, and avoid over-engineering.
Remember: the complexity of distributed systems is a real cost. If a well-structured modular monolith meets your needs, that is the best choice. Only decompose into microservices gradually when team size, deployment frequency, and business domain complexity truly demand it.