Microservices decompose a large application into small, independently deployable services. Use REST or gRPC for synchronous calls, Kafka or RabbitMQ for async events. Put an API Gateway at the edge. Use circuit breakers (Resilience4j) to prevent cascades. Give each service its own database. Deploy on Kubernetes, observe with Prometheus and Grafana, trace with Jaeger and OpenTelemetry, and secure service-to-service communication with Istio mTLS.
Microservices architecture has become the dominant pattern for building large-scale, cloud-native applications. Netflix, Amazon, Uber, and Airbnb all migrated from monoliths to microservices to achieve independent scaling, faster deployments, and organizational autonomy. But microservices also introduce significant complexity: debugging across service boundaries, eventual consistency, network latency, and operational overhead. This guide covers the core aspects of microservices architecture, from design principles and communication patterns to Kubernetes deployment and security, with practical code examples you can use today.
- Start with a monolith; migrate to microservices when team size and complexity justify it
- Each service should have a single responsibility and own its own database
- Use gRPC for internal calls, REST for public APIs, and message queues for async workflows
- API Gateway centralizes auth, routing, rate limiting, and SSL termination
- Circuit breakers prevent cascade failures across service boundaries
- Implement distributed tracing with OpenTelemetry from day one
- CQRS and Event Sourcing help address cross-service data consistency challenges
- Istio service mesh provides zero-trust security with mTLS between services
Monolith vs Microservices: When to Use Which
The choice between a monolith and microservices is not about which is objectively better; it is about what fits your team, stage, and problem. Many successful companies run monoliths in production. The goal is to match your architecture to your organizational and scaling needs rather than blindly following trends.
The Monolith: Benefits and Limitations
A monolith is a single deployable unit containing all application functionality. All modules share a process, database, and deployment pipeline. Monoliths are simpler to develop, test, and debug at small scale. There is no network overhead between modules, database transactions are straightforward, and local debugging requires no special tooling. The danger arises as the codebase grows: slow builds, tight coupling between teams, inability to scale individual components, and deployment risk where every change deploys everything at once.
# Monolith - all in one deployable
my-app/
├── src/
│   ├── auth/            # Authentication module
│   ├── orders/          # Order management
│   ├── payments/        # Payment processing
│   ├── inventory/       # Inventory tracking
│   └── notifications/   # Email/SMS
├── database/            # One shared database schema
└── Dockerfile           # Single container image
# Single deployment - all modules go together
docker build -t my-app:v1.5.0 .
docker run -p 8080:8080 my-app:v1.5.0
# Problems at scale:
# - Slow build times (rebuild everything for any change)
# - One team's bad deploy breaks everyone
# - Cannot scale payment service independently of auth service
# - Tech debt accumulates: impossible to change DB schema safely
Microservices: Benefits and Trade-offs
Microservices split the application into independently deployable services, each with a focused responsibility. Benefits include independent scaling (scale only the payment service during peak sales), independent deployment (the notifications team deploys without coordination), technology diversity (use Python for ML, Go for high-throughput APIs), and fault isolation (a crash in the recommendations service does not take down checkout). The trade-offs are real: network latency replaces function calls, distributed transactions are hard, debugging requires tracing across services, and operational complexity multiplies significantly.
# Microservices - independently deployed per service
services/
├── auth-service/          # JWT issuance and validation
│   ├── src/
│   ├── Dockerfile
│   └── package.json       # Node.js
├── order-service/         # Order lifecycle management
│   ├── src/
│   ├── Dockerfile
│   └── go.mod             # Go
├── payment-service/       # Payment processing (PCI-DSS scope)
│   ├── src/
│   ├── Dockerfile
│   └── pom.xml            # Java/Spring Boot
├── inventory-service/     # Stock management
│   ├── src/
│   ├── Dockerfile
│   └── requirements.txt   # Python/FastAPI
├── notification-service/  # Async email/SMS/push
└── api-gateway/           # Single external entry point
# Each service deploys independently - no coordination needed
cd order-service
docker build -t order-service:v2.1.0 .
kubectl set image deployment/order-service order=order-service:v2.1.0
# payment-service continues running v3.0.0 - unaffected
Decision Framework
| Factor | Choose Monolith | Choose Microservices |
|---|---|---|
| Team size | 1-8 engineers | 10+ engineers, multiple teams |
| Product stage | Early MVP, validating ideas | Established product, clear domain boundaries |
| Scaling needs | Uniform scaling requirements | Different scaling per component |
| Deploy frequency | Weekly or monthly releases | Multiple deploys per day per service |
| Domain complexity | Simple, well-understood domain | Complex domain with distinct bounded contexts |
| Ops maturity | Limited DevOps capacity | Strong DevOps, Kubernetes expertise |
Microservices Design Principles
Well-designed microservices are not just small; they are designed around business domains, maintain clear boundaries, and minimize coupling. The following principles guide good microservice design. Violating any of them tends to produce distributed monoliths: systems with all the complexity of microservices and none of the benefits.
Single Responsibility Principle
Each microservice should do one thing and do it well. The challenge is defining what "one thing" means. Domain-Driven Design (DDD) provides the concept of a Bounded Context: a logical boundary within which a domain model is consistent and unambiguous. Each bounded context typically maps to one or a small group of microservices. When a service is too large, you find yourself making cross-team changes for a single feature. When it is too small (nano-services), you have excessive network overhead and operational burden.
# Bad: A "user-service" that does too many unrelated things
user-service:
- Manages user profiles
- Handles JWT authentication tokens
- Sends welcome and transactional emails
- Tracks user analytics events
- Processes subscription billing
# Good: Separate services per bounded context
auth-service -> JWT creation, token validation, OAuth2 flows
profile-service -> User profile CRUD, preferences, avatar upload
notification-service -> Email, SMS, push notifications (event-driven)
analytics-service -> Event ingestion, user behavior tracking
billing-service -> Subscription management, invoice generation
# Each service can be owned by a separate team
# and deployed on its own release schedule
Loose Coupling
Services should be able to change independently without requiring changes in other services. Loose coupling means services interact through well-defined, stable interfaces (APIs or events) and do not share implementation details, internal data structures, or databases. The most common coupling anti-pattern is a shared database: two services reading and writing the same tables creates tight coupling at the data layer, making independent evolution impossible and turning schema migrations into cross-team events.
// Tight coupling anti-pattern - AVOID THIS
// order-service directly queries payment-service's database
const payment = await db.query(
'SELECT * FROM payment_service.payments WHERE order_id = $1',
[orderId]
);
// Problems: schema change in payment-service breaks order-service,
// payment-service cannot migrate its DB independently
// Loose coupling - communicate via stable public API
const response = await fetch(
'http://payment-service/api/v1/payments/' + orderId,
{
headers: { Authorization: 'Bearer ' + await getServiceToken('payment-service') },
signal: AbortSignal.timeout(5000),
}
);
if (!response.ok) throw new PaymentServiceError(response.status);
const payment = await response.json() as PaymentRecord;
// payment-service can change its internal DB, rename columns,
// migrate to a different DB engine - order-service is unaffected
High Cohesion
Related functionality should live within the same service. If you find yourself frequently making cross-service calls to complete a single operation, your service boundaries may be wrong. A common symptom is the "distributed monolith" where microservices are deployed separately but are so tightly coupled that they must deploy together and share data through shared databases. High cohesion means the data a service needs to do its job lives within its own boundary.
# Checking service cohesion:
# 1. Does this service have a single, clear purpose?
# 2. Do most operations complete within this service's boundary?
# 3. Does the service own all the data it needs?
# 4. Would a domain expert call this a natural unit of business logic?
# Low cohesion warning sign:
# "Place order" operation requires sequential calls to:
# -> inventory-service (check stock)
# -> pricing-service (get current price)
# -> coupon-service (validate discount)
# -> customer-service (verify customer)
# -> fraud-service (score order)
# All synchronous, all required, all blocking
# Consider: orchestration layer or domain re-design
# High cohesion โ order-service owns its domain:
# - orders table (order lifecycle and status)
# - order_items table (line items with snapshot pricing)
# - order_history table (state machine transitions)
# Most operations complete without external calls
Design for Failure
In a distributed system, failures are not exceptional; they are normal operating conditions. Networks partition, services crash, latency spikes, disks fill up. Every microservice must be designed assuming that its dependencies will sometimes be unavailable. Always implement timeouts on all outbound calls, retry with exponential backoff for transient errors, use circuit breakers to prevent cascades, and define fallback behaviors that degrade gracefully.
// Resilient service call pattern
async function callInventoryService(productId: string, qty: number) {
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 3000); // 3s timeout
try {
const response = await fetch(
'http://inventory-service/api/v1/reserve',
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ productId, quantity: qty }),
signal: controller.signal,
}
);
clearTimeout(timeoutId);
if (response.status === 503) {
// Service unavailable - use fallback
return { reserved: false, fallback: true };
}
if (!response.ok) {
throw new Error('Inventory service error: ' + response.status);
}
return response.json();
} catch (err) {
if ((err as Error).name === 'AbortError') {
// Timeout - use fallback instead of letting the error propagate
return { reserved: false, timeout: true, fallback: true };
}
throw err; // Re-throw non-timeout errors for circuit breaker tracking
} finally {
clearTimeout(timeoutId);
}
}
Service Communication: REST vs gRPC vs Message Queues
How services communicate is one of the most consequential architectural decisions in microservices. There are two fundamental communication styles: synchronous (the caller blocks waiting for a response) and asynchronous (fire-and-forget, event-driven). Each has distinct tools and trade-offs. Most microservices systems use both: synchronous for queries requiring immediate responses, asynchronous for workflows that can tolerate eventual consistency.
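The contrast can be sketched without any infrastructure. In the Python sketch below, an in-memory queue stands in for a broker such as RabbitMQ or Kafka; the service name, stock values, and event shape are illustrative:

```python
import queue
import threading

# Synchronous style - the caller blocks until it has an answer.
def get_stock(product_id: str) -> int:
    # Stand-in for a blocking HTTP/gRPC call to inventory-service.
    return {"prod_abc": 47}.get(product_id, 0)

# Asynchronous style - the caller enqueues an event and moves on.
event_bus: "queue.Queue" = queue.Queue()
handled: list = []

def publish(event: dict) -> None:
    event_bus.put(event)  # Fire-and-forget: returns immediately

def consume() -> None:
    while True:
        event = event_bus.get()
        if event is None:  # Sentinel used to stop the demo worker
            break
        handled.append(event["event_type"])  # Stand-in for real handling
        event_bus.task_done()

worker = threading.Thread(target=consume, daemon=True)
worker.start()

stock = get_stock("prod_abc")             # Caller needs the answer now
publish({"event_type": "ORDER_CREATED"})  # Caller does not wait
event_bus.join()    # Demo only: wait for the consumer to drain the queue
event_bus.put(None) # Stop the worker
```

Real systems combine both styles: a checkout API answers a stock query synchronously, then emits an ORDER_CREATED event for downstream services to consume at their own pace.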
REST over HTTP/JSON (Synchronous)
REST is the most widely used inter-service communication protocol. It is human-readable, trivially debuggable with curl, and universally supported across every language and platform. REST is ideal for request-response patterns where the caller needs an immediate result: user authentication, fetching product details, creating a resource. The main drawbacks are payload verbosity compared to binary formats and the CPU cost of JSON serialization and deserialization at high request volumes.
# REST API - order-service calling inventory-service
# POST /api/v1/inventory/reserve
curl -X POST https://inventory-service/api/v1/inventory/reserve -H "Content-Type: application/json" -H "Authorization: Bearer ${SERVICE_TOKEN}" -d '{
"product_id": "prod_abc123",
"quantity": 2,
"order_id": "ord_xyz789",
"idempotency_key": "ord_xyz789-reserve-attempt-1"
}'
# Response
{
"reserved": true,
"reservation_id": "res_def456",
"expires_at": "2026-02-27T14:30:00Z",
"available_stock": 47
}
# Node.js - service-to-service REST call with timeout and retry
async function reserveInventory(productId: string, qty: number, orderId: string) {
for (let attempt = 1; attempt <= 3; attempt++) {
try {
const res = await fetch('http://inventory-service/api/v1/inventory/reserve', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer ' + await tokenCache.getToken('inventory-service'),
'X-Idempotency-Key': orderId + '-reserve-' + attempt,
},
body: JSON.stringify({ product_id: productId, quantity: qty, order_id: orderId }),
signal: AbortSignal.timeout(5000),
});
if (res.status === 429) {
// Rate limited - exponential backoff before the next attempt
await sleep(Math.pow(2, attempt) * 100);
continue;
}
if (!res.ok) throw new Error('Inventory service error: ' + res.status);
return res.json();
} catch (err) {
if (attempt === 3) throw err;
await sleep(Math.pow(2, attempt) * 100);
}
}
throw new Error('reserveInventory: retries exhausted');
}
gRPC (High-Performance Synchronous)
gRPC uses HTTP/2 and Protocol Buffers (protobuf) for binary serialization, making it significantly faster and more efficient than REST/JSON: typically 5 to 10 times faster for serialization, with considerably smaller payloads. gRPC supports four call patterns: unary (like REST), client streaming, server streaming, and bidirectional streaming. Strong typing via .proto schema files eliminates an entire class of integration bugs. Automatic client code generation in over 10 languages removes boilerplate. gRPC is ideal for internal service-to-service communication where performance matters.
// Step 1: Define the service contract in inventory.proto
syntax = "proto3";
package inventory.v1;
service InventoryService {
rpc CheckStock(StockRequest) returns (StockResponse);
rpc ReserveItems(ReserveRequest) returns (ReserveResponse);
// Server-streaming: push real-time inventory updates
rpc StreamUpdates(StreamRequest) returns (stream InventoryEvent);
}
message StockRequest {
string product_id = 1;
string warehouse_id = 2; // optional
}
message StockResponse {
int32 available_quantity = 1;
bool in_stock = 2;
string warehouse_location = 3;
}
message ReserveRequest {
string product_id = 1;
int32 quantity = 2;
string order_id = 3;
string idempotency_key = 4;
}
message ReserveResponse {
bool success = 1;
string reservation_id = 2;
string expires_at = 3;
string error_code = 4; // non-empty on failure
}
// Step 2: Generate Go server stub
// protoc --go_out=. --go-grpc_out=. inventory.proto
// Step 3: Implement the Go server
type inventoryServer struct {
pb.UnimplementedInventoryServiceServer
db *pgxpool.Pool
}
func (s *inventoryServer) ReserveItems(
ctx context.Context, req *pb.ReserveRequest,
) (*pb.ReserveResponse, error) {
if req.Quantity <= 0 {
return nil, status.Errorf(codes.InvalidArgument, "quantity must be positive")
}
reservation, err := s.db.Reserve(ctx, req.ProductId, req.Quantity, req.OrderId)
if err != nil {
if errors.Is(err, ErrInsufficientStock) {
return &pb.ReserveResponse{ErrorCode: "INSUFFICIENT_STOCK"}, nil
}
return nil, status.Errorf(codes.Internal, "reservation failed: %v", err)
}
return &pb.ReserveResponse{
Success: true,
ReservationId: reservation.ID,
ExpiresAt: reservation.ExpiresAt.Format(time.RFC3339),
}, nil
}
// Step 4: Call from Node.js client
// npm install @grpc/grpc-js @grpc/proto-loader
import * as grpc from '@grpc/grpc-js';
import * as protoLoader from '@grpc/proto-loader';
const packageDef = protoLoader.loadSync('inventory.proto', { keepCase: true });
const proto = grpc.loadPackageDefinition(packageDef) as any;
const client = new proto.inventory.v1.InventoryService(
'inventory-service:50051',
grpc.credentials.createSsl(),
);
client.reserveItems(
{ product_id: 'prod_abc', quantity: 2, order_id: 'ord_xyz', idempotency_key: 'key-1' },
(err: Error | null, response: any) => {
if (err) throw err;
console.log('Reservation:', response.reservation_id);
}
);
Message Queues: RabbitMQ and Apache Kafka
Message queues enable asynchronous, event-driven communication. The publisher sends a message and continues immediately without waiting. This decouples services in time: the consumer can be temporarily offline and messages are buffered. RabbitMQ is a traditional message broker with flexible routing, exchange types, and acknowledgment semantics, making it ideal for task queues and work distribution. Kafka is a distributed commit log designed for high-throughput event streaming that retains messages for configurable periods, making it ideal for audit logs, event sourcing, and stream processing pipelines.
# ── RabbitMQ: task queue pattern ──────────────────────────────
# Publisher: order-service queues a payment task
import pika, json
connection = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
channel = connection.channel()
channel.queue_declare(queue='payment_tasks', durable=True)
channel.basic_publish(
exchange='',
routing_key='payment_tasks',
body=json.dumps({
'order_id': 'ord_xyz789',
'amount': 99.99,
'currency': 'USD',
'customer_id': 'cust_abc123',
}),
properties=pika.BasicProperties(
delivery_mode=2, # Persistent - survives broker restart
content_type='application/json'
)
)
connection.close()
# Consumer: payment-service processes tasks
def process_payment(ch, method, properties, body):
payload = json.loads(body)
try:
charge_card(payload['customer_id'], payload['amount'])
publish_event('order-events', 'PAYMENT_COMPLETED', payload['order_id'])
ch.basic_ack(delivery_tag=method.delivery_tag) # Mark as done
except CardDeclinedError:
publish_event('order-events', 'PAYMENT_FAILED', payload['order_id'])
ch.basic_ack(delivery_tag=method.delivery_tag) # Do not retry
except Exception:
ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False) # Dead-letter
channel.basic_qos(prefetch_count=1) # Process one at a time
channel.basic_consume(queue='payment_tasks', on_message_callback=process_payment)
channel.start_consuming()
# ── Apache Kafka: event streaming pattern ─────────────────────
# Producer: order-service publishes domain events
from confluent_kafka import Producer
producer = Producer({'bootstrap.servers': 'kafka:9092', 'acks': 'all'})
producer.produce(
topic='order-events',
key=order_id.encode(), # Ensures same-order events go to same partition
value=json.dumps({
'event_type': 'ORDER_CREATED',
'order_id': order_id,
'customer_id': customer_id,
'items': items,
'timestamp': datetime.utcnow().isoformat(),
'schema_version': 'v1',
}).encode(),
on_delivery=lambda err, msg: print('Delivered' if not err else 'Error: ' + str(err))
)
producer.flush()
# Multiple independent consumers (each in its own consumer group)
from confluent_kafka import Consumer
# inventory-service consumer group
inventory_consumer = Consumer({
'bootstrap.servers': 'kafka:9092',
'group.id': 'inventory-service', # Unique group per service
'auto.offset.reset': 'earliest',
'enable.auto.commit': False, # Manual commit for reliability
})
inventory_consumer.subscribe(['order-events'])
while True:
msg = inventory_consumer.poll(timeout=1.0)
if msg is None or msg.error(): continue
event = json.loads(msg.value())
if event['event_type'] == 'ORDER_CREATED':
reserve_inventory(event['items'])
inventory_consumer.commit(msg)
| Protocol | Style | Best For | Payload | Performance |
|---|---|---|---|---|
| REST/HTTP | Sync | Public APIs, CRUD operations | JSON (text) | Good |
| gRPC | Sync | Internal service calls, streaming | Protobuf (binary) | Excellent |
| RabbitMQ | Async | Task queues, work distribution | Any (JSON/binary) | Very Good |
| Apache Kafka | Async | Event streaming, audit logs | Any (Avro/JSON) | Excellent |
API Gateway Pattern: Kong, AWS API Gateway, NGINX
The API Gateway is the single entry point for all external client requests. Instead of clients knowing the addresses of dozens of microservices, they send all requests to the gateway, which routes them to the appropriate service. The gateway handles cross-cutting concerns that would otherwise be duplicated in every service: authentication and authorization, rate limiting, SSL/TLS termination, request and response transformation, logging, caching, and request aggregation.
Kong API Gateway
Kong is a high-performance, open-source API gateway built on NGINX. It supports a rich plugin ecosystem for authentication (JWT, OAuth2, API keys, HMAC), rate limiting, request transformation, logging, and observability. Kong can run on Kubernetes as an Ingress Controller, replacing the default NGINX Ingress.
# Kong declarative configuration (deck sync)
_format_version: "3.0"
_transform: true
services:
- name: order-service
url: http://order-service.production.svc.cluster.local:80
connect_timeout: 5000
read_timeout: 30000
routes:
- name: orders-api
paths: ["/api/v1/orders"]
methods: ["GET", "POST", "PUT", "DELETE", "PATCH"]
strip_path: false
plugins:
- name: jwt
config:
secret_is_base64: false
claims_to_verify: ["exp"]
- name: rate-limiting
config:
minute: 100
hour: 3000
policy: redis
redis_host: redis
redis_port: 6379
- name: cors
config:
origins: ["https://app.mycompany.com", "https://www.mycompany.com"]
methods: ["GET", "POST", "PUT", "DELETE", "PATCH", "OPTIONS"]
headers: ["Authorization", "Content-Type", "X-Request-ID"]
max_age: 3600
- name: product-catalog-service
url: http://product-service.production.svc.cluster.local:80
routes:
- name: products-api
paths: ["/api/v1/products"]
methods: ["GET"]
plugins:
- name: proxy-cache
config:
response_code: [200]
request_method: ["GET"]
content_type: ["application/json; charset=utf-8"]
cache_ttl: 60
storage_ttl: 300
- name: internal-admin-service
url: http://admin-service.production.svc.cluster.local:80
routes:
- name: admin-api
paths: ["/api/v1/admin"]
plugins:
- name: ip-restriction
config:
allow: ["10.0.0.0/8", "192.168.0.0/16"] # Internal only
NGINX as API Gateway
# nginx.conf - production API gateway configuration
upstream order_service {
least_conn;
server order-service-1.production:8080;
server order-service-2.production:8080;
server order-service-3.production:8080;
keepalive 32;
}
upstream payment_service {
server payment-service.production:8080 max_fails=3 fail_timeout=30s;
keepalive 16;
}
limit_req_zone ${binary_remote_addr} zone=api_per_ip:10m rate=100r/m;
limit_req_zone ${http_authorization} zone=api_per_token:10m rate=1000r/m;
server {
listen 443 ssl http2;
server_name api.myapp.com;
ssl_certificate /etc/ssl/tls.crt;
ssl_certificate_key /etc/ssl/tls.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
# Security headers
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Content-Type-Options nosniff;
add_header X-Frame-Options DENY;
# Rate limiting
limit_req zone=api_per_ip burst=20 nodelay;
limit_req zone=api_per_token burst=100 nodelay;
# Internal auth validation endpoint
location = /_auth/validate {
internal;
proxy_pass http://auth-service.production:8080/validate;
proxy_pass_request_body off;
proxy_set_header Content-Length "";
proxy_set_header X-Original-URI ${request_uri};
proxy_set_header X-Original-Method ${request_method};
proxy_cache_valid 200 30s; # Cache valid tokens for 30s (requires a proxy_cache zone to take effect)
}
location /api/v1/orders {
auth_request /_auth/validate;
auth_request_set ${auth_user_id} ${upstream_http_x_user_id};
proxy_pass http://order_service;
proxy_set_header X-User-ID ${auth_user_id};
proxy_set_header X-Real-IP ${remote_addr};
proxy_set_header X-Request-ID ${request_id};
proxy_connect_timeout 5s;
proxy_read_timeout 30s;
}
location /api/v1/payments {
auth_request /_auth/validate;
# Payment endpoint - extra strict rate limiting
limit_req zone=api_per_ip burst=5 nodelay;
proxy_pass http://payment_service;
}
}
Service Discovery: Consul, Eureka, and Kubernetes DNS
In a microservices environment, service instances are dynamic: they scale up and down, crash and restart, and move between hosts when Kubernetes reschedules pods. Service discovery solves how services find each other's current network address without hardcoding IPs or requiring manual configuration updates. There are two patterns: client-side discovery (the client queries a registry and load-balances itself) and server-side discovery (the client goes through a load balancer that does the lookup).
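Client-side discovery can be sketched in a few lines. The registry snapshot below mimics the shape of a Consul health query result; all instance IDs and addresses are made up:

```python
import itertools

# Illustrative registry snapshot (what a client might fetch from Consul).
REGISTRY = {
    "order-service": [
        {"id": "order-1", "address": "10.0.1.50", "port": 8080, "healthy": True},
        {"id": "order-2", "address": "10.0.1.51", "port": 8080, "healthy": False},
        {"id": "order-3", "address": "10.0.1.52", "port": 8080, "healthy": True},
    ]
}

_cyclers: dict = {}  # Per-service round-robin state

def discover(service: str) -> str:
    """Client-side discovery: filter to healthy instances, round-robin among them."""
    healthy = [i for i in REGISTRY[service] if i["healthy"]]
    if not healthy:
        raise RuntimeError(f"no healthy instances of {service}")
    cycler = _cyclers.setdefault(service, itertools.cycle(healthy))
    inst = next(cycler)
    return f"http://{inst['address']}:{inst['port']}"

# Successive lookups rotate across healthy instances only
urls = [discover("order-service") for _ in range(3)]
```

In server-side discovery the same filtering happens inside a load balancer or the Kubernetes Service layer, so clients stay simple. Note the sketch caches the healthy list at first lookup; a real client would refresh it when the registry changes.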
Consul Service Mesh
# Deploy Consul in Kubernetes
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul \
  --set global.name=consul \
  --set server.replicas=3 \
  --set connectInject.enabled=true  # Enables sidecar injection
# Service registration via Consul API (for non-Kubernetes environments)
curl -X PUT http://consul:8500/v1/agent/service/register -H "Content-Type: application/json" -d '{
"ID": "order-service-pod-abc",
"Name": "order-service",
"Address": "10.0.1.50",
"Port": 8080,
"Tags": ["v2.1", "production"],
"Meta": {
"version": "2.1.0",
"region": "us-east-1"
},
"Check": {
"HTTP": "http://10.0.1.50:8080/health/ready",
"Interval": "10s",
"Timeout": "5s",
"DeregisterCriticalServiceAfter": "60s"
}
}'
# Discover healthy instances of order-service
curl "http://consul:8500/v1/health/service/order-service?passing=true" | jq '
.[] | { id: .Service.ID, address: .Service.Address, port: .Service.Port }
'
# DNS-based discovery - Consul serves DNS on port 8600
# order-service.service.consul resolves to all healthy instances
dig @consul -p 8600 order-service.service.consul SRV
# With datacenter targeting
dig @consul -p 8600 order-service.service.us-east.consul SRV
Kubernetes DNS (Built-in Service Discovery)
Kubernetes provides built-in service discovery through CoreDNS. Every Kubernetes Service gets a stable DNS entry automatically; no separate registry infrastructure is required. Services within the same namespace resolve by short name, cross-namespace by fully qualified name. This is the recommended approach for Kubernetes-native microservices.
# Create a Kubernetes Service - auto-registers in CoreDNS
apiVersion: v1
kind: Service
metadata:
name: order-service
namespace: production
labels:
app: order-service
version: v2.1.0
spec:
selector:
app: order-service # Routes to all pods with this label
ports:
- name: http
protocol: TCP
port: 80 # Service port (stable)
targetPort: 8080 # Container port
type: ClusterIP # Only accessible within cluster
# DNS entries created automatically:
# order-service (same namespace)
# order-service.production (namespace-qualified)
# order-service.production.svc (cluster domain)
# order-service.production.svc.cluster.local (fully qualified)
# From any pod in the 'production' namespace:
curl http://order-service/api/v1/orders
# Cross-namespace call (from 'staging' namespace):
curl http://order-service.production.svc.cluster.local/api/v1/orders
# Headless service - returns individual pod IPs (for stateful sets)
apiVersion: v1
kind: Service
metadata:
name: kafka
namespace: production
spec:
clusterIP: None # Headless - DNS returns all pod IPs
selector:
app: kafka
ports:
- port: 9092
Circuit Breaker Pattern: Resilience4j and Hystrix
The circuit breaker pattern prevents cascade failures, where one slow or failing service causes upstream services to exhaust their thread pools waiting for responses, eventually crashing the entire system. Named after electrical circuit breakers, the pattern has three states. Closed is normal operation where all calls pass through. Open is the state where the downstream service is failing and calls are rejected immediately without attempting the network call, returning a fallback response instead. Half-Open means the circuit is testing whether the service has recovered by allowing a limited number of calls through.
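The three-state machine itself fits in a few lines. The Python sketch below is a deliberately minimal illustration of the transitions; the thresholds and the flaky dependency are made up, and production code should use a library such as Resilience4j or opossum:

```python
import time

class CircuitBreaker:
    """Minimal closed/open/half-open state machine (illustration only)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # Probe: has the dependency recovered?
            else:
                return fallback()         # Fail fast - no network call at all
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"       # Trip (or re-trip) the circuit
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        self.state = "closed"             # A success fully closes the circuit
        return result

breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)

def flaky():
    raise ConnectionError("inventory-service down")

for _ in range(2):
    breaker.call(flaky, fallback=lambda: "queued-for-later")
# Two consecutive failures trip the circuit; further calls fail fast
```

Real implementations add sliding windows, slow-call detection, and per-state metrics on top of this core loop.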
Resilience4j for Java/Spring Boot
# application.yml - Resilience4j configuration
resilience4j:
circuitbreaker:
instances:
inventory-service:
sliding-window-type: COUNT_BASED
sliding-window-size: 10 # Evaluate last 10 calls
failure-rate-threshold: 50 # Open circuit if >50% fail
wait-duration-in-open-state: 30s # Wait 30s before testing
permitted-calls-in-half-open-state: 3
automatic-transition-from-open-to-half-open-enabled: true
slow-call-duration-threshold: 2s
slow-call-rate-threshold: 80 # Open if >80% calls are slow
event-consumer-buffer-size: 10
record-exceptions:
- java.io.IOException
- java.util.concurrent.TimeoutException
- feign.FeignException.ServiceUnavailable
retry:
instances:
inventory-service:
max-attempts: 3
wait-duration: 500ms
exponential-backoff-multiplier: 2.0 # 500ms -> 1s -> 2s
retry-exceptions:
- java.io.IOException
ignore-exceptions:
- com.example.BusinessException
timelimiter:
instances:
inventory-service:
timeout-duration: 3s
// Java service implementation
@Service
public class OrderService {
private final InventoryClient inventoryClient;
@CircuitBreaker(name = "inventory-service", fallbackMethod = "reserveFallback")
@Retry(name = "inventory-service")
@TimeLimiter(name = "inventory-service")
public CompletableFuture<ReservationResult> reserveInventory(
String productId, int quantity, String orderId
) {
return CompletableFuture.supplyAsync(() ->
inventoryClient.reserve(new ReserveRequest(productId, quantity, orderId))
);
}
// Fallback: called when circuit is open or all retries are exhausted
public CompletableFuture<ReservationResult> reserveFallback(
String productId, int quantity, String orderId, Exception ex
) {
log.warn("Circuit open for inventory-service, using fallback. orderId={} error={}",
orderId, ex.getMessage());
// Allow order to proceed with async reservation (backfill queue)
asyncReservationQueue.add(new PendingReservation(orderId, productId, quantity));
return CompletableFuture.completedFuture(
ReservationResult.pending(orderId, "inventory-service-degraded")
);
}
}
// Monitor circuit state events
@Component
public class CircuitBreakerMonitor {
@EventListener
public void onStateTransition(CircuitBreakerOnStateTransitionEvent event) {
log.warn("Circuit {} changed: {} -> {}",
event.getCircuitBreakerName(),
event.getStateTransition().getFromState(),
event.getStateTransition().getToState()
);
metrics.gauge("circuit_breaker_state",
event.getStateTransition().getToState().ordinal(),
"service", event.getCircuitBreakerName()
);
}
}
Circuit Breaker in Node.js with opossum
import CircuitBreaker from 'opossum';
import { metrics } from './observability';
// Wrap any async function
const breaker = new CircuitBreaker(
async (productId: string, qty: number, orderId: string) => {
return inventoryService.reserve(productId, qty, orderId);
},
{
timeout: 3000, // Trigger failure if > 3s
errorThresholdPercentage: 50, // Open if > 50% fail
resetTimeout: 30_000, // Try again after 30s
volumeThreshold: 5, // Minimum calls before calculating
rollingCountTimeout: 10_000, // 10s rolling window
}
);
// Graceful degradation
breaker.fallback((productId: string, qty: number, orderId: string) => ({
reserved: false,
pending: true,
orderId,
message: 'Inventory service temporarily unavailable. Order queued.',
}));
// Observability hooks
breaker.on('open', () => metrics.increment('circuit_breaker.open', { svc: 'inventory' }));
breaker.on('halfOpen', () => metrics.increment('circuit_breaker.half_open', { svc: 'inventory' }));
breaker.on('close', () => metrics.increment('circuit_breaker.close', { svc: 'inventory' }));
breaker.on('timeout', () => metrics.increment('circuit_breaker.timeout', { svc: 'inventory' }));
breaker.on('reject', () => metrics.increment('circuit_breaker.rejected', { svc: 'inventory' }));
// Expose circuit state in health check
app.get('/health/ready', (req, res) => {
const stats = breaker.stats;
res.json({
status: 'ok',
dependencies: {
'inventory-service': {
state: breaker.opened ? 'open' : breaker.halfOpen ? 'half-open' : 'closed',
successRate: stats.successes / Math.max(stats.fires, 1),
latencyMean: stats.latencyMean,
},
},
});
});
// Usage in business logic
export async function reserveInventory(productId: string, qty: number, orderId: string) {
return breaker.fire(productId, qty, orderId);
}
Distributed Tracing: Jaeger, Zipkin, and OpenTelemetry
When a user request flows through 5 microservices and response time is 3 seconds, finding which service is slow without tracing is like debugging production without logs. Distributed tracing assigns a unique trace ID to each incoming request. This ID propagates through every service call via HTTP headers, creating a tree of spans that records what happened, in which service, how long each step took, and what errors occurred. OpenTelemetry (OTel) is now the industry-standard, vendor-neutral framework. Jaeger and Zipkin are popular open-source trace visualization backends.
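The header propagation itself is mechanically simple. Below is a minimal sketch of W3C Trace Context (`traceparent`) handling, the format OpenTelemetry propagates by default; the helper functions are illustrative, not part of any SDK:

```typescript
import { randomBytes } from 'crypto';

// W3C traceparent: version-traceId-spanId-flags, e.g.
// 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
interface TraceContext {
  traceId: string; // 16 bytes hex - constant across the whole request tree
  spanId: string;  // 8 bytes hex  - unique per hop
  sampled: boolean;
}

export function parseTraceparent(header: string | undefined): TraceContext {
  const match = header?.match(/^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/);
  if (!match) {
    // No incoming context: this service is the root, so start a new trace
    return {
      traceId: randomBytes(16).toString('hex'),
      spanId: randomBytes(8).toString('hex'),
      sampled: true,
    };
  }
  return { traceId: match[1], spanId: match[2], sampled: (parseInt(match[3], 16) & 1) === 1 };
}

// Before calling a downstream service: keep the traceId, mint a new spanId
export function childHeader(ctx: TraceContext): string {
  const childSpanId = randomBytes(8).toString('hex');
  return `00-${ctx.traceId}-${childSpanId}-${ctx.sampled ? '01' : '00'}`;
}
```

Every service repeats this: parse the incoming header, attach the trace ID to its spans, and send a fresh child header downstream. The OpenTelemetry SDK below does exactly this automatically.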
# 1. Deploy Jaeger - trace collection and visualization
# docker-compose.yml
services:
jaeger:
image: jaegertracing/all-in-one:1.54
ports:
- "16686:16686" # Jaeger Web UI
- "4317:4317" # OTLP gRPC receiver
- "4318:4318" # OTLP HTTP receiver
environment:
COLLECTOR_OTLP_ENABLED: "true"
SPAN_STORAGE_TYPE: badger
BADGER_EPHEMERAL: "false"
BADGER_DIRECTORY_VALUE: /badger/data
BADGER_DIRECTORY_KEY: /badger/key
volumes:
- jaeger-data:/badger
# 2. Instrument Node.js with OpenTelemetry (auto-instrumentation)
# npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
# npm install @opentelemetry/exporter-trace-otlp-http @opentelemetry/resources
// tracing.ts - MUST be loaded before any other module
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { resourceFromAttributes } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
const sdk = new NodeSDK({
resource: resourceFromAttributes({
[ATTR_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME || 'order-service',
[ATTR_SERVICE_VERSION]: process.env.APP_VERSION || '0.0.0',
'deployment.environment': process.env.NODE_ENV || 'development',
}),
traceExporter: new OTLPTraceExporter({
url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://jaeger:4318'}/v1/traces`,
}),
instrumentations: [
getNodeAutoInstrumentations({
// Auto-instruments: express, http, pg, redis, mongodb, kafka, grpc
'@opentelemetry/instrumentation-http': {
enabled: true,
ignoreIncomingRequestHook: (req) => req.url === '/health',
},
'@opentelemetry/instrumentation-express': { enabled: true },
'@opentelemetry/instrumentation-pg': { enabled: true },
'@opentelemetry/instrumentation-ioredis': { enabled: true },
}),
],
});
sdk.start();
process.on('SIGTERM', () => sdk.shutdown());
// 3. Add custom business spans
import { trace, SpanStatusCode, SpanKind, context } from '@opentelemetry/api';
const tracer = trace.getTracer('order-service', '2.1.0');
async function createOrder(input: OrderInput): Promise<Order> {
// Start a span for the overall business operation
const span = tracer.startSpan('order.create', {
kind: SpanKind.INTERNAL,
attributes: {
'order.customer_id': input.customerId,
'order.item_count': input.items.length,
'order.total_amount': input.totalAmount,
'order.currency': input.currency,
},
});
return context.with(trace.setSpan(context.active(), span), async () => {
try {
// Child spans created automatically by auto-instrumentation
const inventory = await checkInventory(input.items);
const payment = await processPayment(input.payment);
const order = await db.orders.create(input);
span.setAttributes({ 'order.id': order.id, 'order.status': 'created' });
span.setStatus({ code: SpanStatusCode.OK });
return order;
} catch (err) {
span.recordException(err as Error);
span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
throw err;
} finally {
span.end();
}
});
}
# 4. Kubernetes ConfigMap for OTel configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-config
namespace: production
data:
OTEL_EXPORTER_OTLP_ENDPOINT: "http://jaeger-collector.observability:4318"
OTEL_PROPAGATORS: "tracecontext,baggage"
OTEL_TRACES_SAMPLER: "parentbased_traceidratio"
OTEL_TRACES_SAMPLER_ARG: "0.1" # Sample 10% of requests in production
Data Management: Database per Service, CQRS, Event Sourcing
Data management is where microservices architecture gets genuinely complex. The database-per-service pattern mandates that each service owns its data exclusively: no other service can access it directly via SQL or direct connections. This enables true independent evolution but creates challenges: cross-service queries require API calls or denormalized read models, and maintaining consistency across service boundaries without distributed transactions requires careful design using sagas, events, and eventual consistency.
Database per Service Pattern
# Each service has its own database - completely isolated
# docker-compose.yml (or Kubernetes StatefulSets in production)
services:
# Order service - PostgreSQL (ACID transactions, relational integrity)
order-db:
image: postgres:16-alpine
volumes:
- order-db-data:/var/lib/postgresql/data
environment:
POSTGRES_DB: orders
POSTGRES_USER: order_svc
POSTGRES_PASSWORD: ${ORDER_DB_PASSWORD}
networks:
- order-network # ONLY order-service connects here
# Product catalog - MongoDB (flexible schema, fast reads, geo-queries)
product-db:
image: mongo:7
volumes:
- product-db-data:/data/db
networks:
- product-network # Isolated to product-service
# Auth sessions - Redis (in-memory, TTL-based expiry)
auth-cache:
image: redis:7-alpine
command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
networks:
- auth-network
# Search service - Elasticsearch (full-text, faceted search)
search-db:
image: elasticsearch:8.12.0
environment:
discovery.type: single-node
ELASTIC_PASSWORD: ${ELASTIC_PASSWORD}
networks:
- search-network
# Analytics - ClickHouse (columnar, fast aggregations, time-series)
analytics-db:
image: clickhouse/clickhouse-server:23.12
networks:
- analytics-network
networks:
order-network: { internal: true } # No external internet access
product-network: { internal: true }
auth-network: { internal: true }
search-network: { internal: true }
analytics-network: { internal: true }
CQRS (Command Query Responsibility Segregation)
CQRS separates the write model (commands that change state, optimized for consistency) from the read model (queries, optimized for performance and shape). This is powerful in microservices because different services can maintain their own read-optimized projections of data from other services, kept up-to-date by consuming domain events. The order-service owns the authoritative order data; the analytics-service maintains a denormalized ClickHouse projection optimized for reporting; the search-service maintains an Elasticsearch index for full-text search, all updated asynchronously from the same stream of order events.
// CQRS pattern - separating writes and reads
// --- WRITE SIDE ----------------------------------------------
// Commands change state and publish events
interface CreateOrderCommand {
customerId: string;
items: Array<{ productId: string; quantity: number; unitPrice: number }>;
shippingAddress: Address;
idempotencyKey: string;
}
class OrderCommandHandler {
async handle(cmd: CreateOrderCommand): Promise<{ orderId: string }> {
// 1. Validate business invariants
const totalAmount = cmd.items.reduce((s, i) => s + i.unitPrice * i.quantity, 0);
if (totalAmount > 10_000) throw new BusinessError('Order exceeds limit');
// 2. Write to normalized write DB (PostgreSQL)
const order = await this.db.transaction(async (trx) => {
const o = await trx.orders.insert({
customer_id: cmd.customerId,
status: 'PENDING',
total_amount: totalAmount,
idempotency_key: cmd.idempotencyKey,
});
await trx.order_items.insertMany(
cmd.items.map(i => ({ order_id: o.id, ...i }))
);
return o;
});
// 3. Publish domain event - drives all read-model updates
await this.kafka.produce('order-events', {
eventType: 'ORDER_CREATED',
orderId: order.id,
customerId: cmd.customerId,
items: cmd.items,
totalAmount,
occurredAt: new Date().toISOString(),
version: 1,
});
return { orderId: order.id };
}
}
// --- READ SIDE -----------------------------------------------
// Each read model is a projection tailored to query needs
// Elasticsearch projection โ powers the order search UI
class OrderSearchProjection {
async onOrderCreated(event: OrderCreatedEvent): Promise<void> {
// Fetch denormalized data (acceptable: read-side can call APIs)
const [customer, products] = await Promise.all([
this.customerApi.get(event.customerId),
this.productApi.getBatch(event.items.map(i => i.productId)),
]);
await this.elasticsearch.index({
index: 'orders-v3',
id: event.orderId,
body: {
orderId: event.orderId,
customerName: customer.name, // Denormalized
customerEmail: customer.email,
productNames: products.map(p => p.name),
totalAmount: event.totalAmount,
status: 'PENDING',
createdAt: event.occurredAt,
},
});
}
async onOrderShipped(event: OrderShippedEvent): Promise<void> {
await this.elasticsearch.update({
index: 'orders-v3',
id: event.orderId,
body: { doc: { status: 'SHIPPED', trackingNumber: event.trackingNumber } },
});
}
}
// Dashboard analytics projection - powers the ClickHouse reporting DB
class OrderAnalyticsProjection {
async onOrderCreated(event: OrderCreatedEvent): Promise<void> {
await this.clickhouse.insert('order_analytics', [{
order_id: event.orderId,
customer_id: event.customerId,
item_count: event.items.length,
total_usd: event.totalAmount,
event_date: event.occurredAt.slice(0, 10),
hour: new Date(event.occurredAt).getUTCHours(),
}]);
}
}
Event Sourcing
Event Sourcing stores the complete history of state changes as an immutable sequence of events rather than the current state. Instead of an UPDATE orders SET status = 'SHIPPED', you append an OrderShipped event. The current state is always derived by replaying all events. Benefits include a complete audit trail, the ability to replay history and rebuild projections, temporal queries, and natural integration with CQRS. The trade-off is query complexity (you need projections for most reads) and storage growth over time (mitigated by snapshots).
// Event Sourcing - Order aggregate
type OrderEvent =
| { type: 'ORDER_CREATED'; orderId: string; customerId: string; items: LineItem[] }
| { type: 'PAYMENT_RECEIVED'; orderId: string; amount: number; paymentId: string }
| { type: 'ORDER_SHIPPED'; orderId: string; carrier: string; trackingNumber: string }
| { type: 'ORDER_DELIVERED'; orderId: string; deliveredAt: string }
| { type: 'ORDER_CANCELLED'; orderId: string; reason: string; refundId?: string };
interface OrderState {
id: string;
status: 'PENDING' | 'PAID' | 'SHIPPED' | 'DELIVERED' | 'CANCELLED';
customerId: string;
items: LineItem[];
paymentId?: string;
trackingNumber?: string;
cancelReason?: string;
}
// Pure function - derives state from sequence of events
function applyEvent(state: OrderState, event: OrderEvent): OrderState {
switch (event.type) {
case 'ORDER_CREATED':
return { ...state, status: 'PENDING', customerId: event.customerId, items: event.items };
case 'PAYMENT_RECEIVED':
return { ...state, status: 'PAID', paymentId: event.paymentId };
case 'ORDER_SHIPPED':
return { ...state, status: 'SHIPPED', trackingNumber: event.trackingNumber };
case 'ORDER_DELIVERED':
return { ...state, status: 'DELIVERED' };
case 'ORDER_CANCELLED':
return { ...state, status: 'CANCELLED', cancelReason: event.reason };
}
}
class EventSourcedOrderRepository {
async load(orderId: string): Promise<OrderState> {
// Check for a recent snapshot to avoid replaying all events
const snapshot = await this.snapshots.latest(orderId);
const fromVersion = snapshot?.version ?? 0;
const events = await this.eventStore.load(
'orders',
orderId,
fromVersion
);
const initialState = snapshot?.state ?? ({ id: orderId } as OrderState);
return events.reduce(applyEvent, initialState);
}
async append(orderId: string, events: OrderEvent[], expectedVersion: number): Promise<void> {
// Optimistic concurrency - fail if another process wrote first
await this.eventStore.appendWithVersion('orders', orderId, events, expectedVersion);
// Publish to Kafka for projections (eventually consistent)
await Promise.all(events.map(e => this.kafka.produce('order-events', e)));
}
}
Docker and Kubernetes for Microservices
Docker provides the packaging format for microservices: each service is a container image with all its dependencies, running identically in development, CI, and production. Kubernetes orchestrates those containers at scale: scheduling pods onto nodes, maintaining desired replica counts, performing rolling updates with zero downtime, self-healing by restarting failed pods, and exposing services via stable DNS and load-balanced ClusterIP services.
Production Dockerfile
# Multi-stage Dockerfile for a Node.js/TypeScript microservice
# Stage 1: Install production dependencies (cached layer)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force
# Stage 2: Build TypeScript to JavaScript
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json tsconfig.json ./
RUN npm ci # Install dev deps for build
COPY src/ ./src/
RUN npm run build # Emit JS to dist/
# Stage 3: Minimal production image
FROM node:20-alpine AS runner
# Security: run as non-root user
RUN addgroup -g 1001 -S nodejs && adduser -S appuser -u 1001 -G nodejs
WORKDIR /app
# Only copy what production needs
COPY --from=deps --chown=appuser:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --from=builder --chown=appuser:nodejs /app/package.json .
USER appuser
EXPOSE 8080
# Health check - Kubernetes will use /health/live and /health/ready
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 CMD wget -qO- http://localhost:8080/health/live || exit 1
CMD ["node", "dist/server.js"]
Kubernetes Manifests for a Microservice
# Complete Kubernetes setup for order-service
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
namespace: production
labels:
app: order-service
version: v2.1.0
spec:
replicas: 3
selector:
matchLabels:
app: order-service
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Spin up 1 extra pod before removing old
maxUnavailable: 0 # Never reduce below desired replica count
template:
metadata:
labels:
app: order-service
version: v2.1.0
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9090"
prometheus.io/path: "/metrics"
spec:
serviceAccountName: order-service-sa
containers:
- name: order-service
image: registry.mycompany.com/order-service:v2.1.0
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 8080
- name: metrics
containerPort: 9090
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: order-service-secrets
key: database-url
- name: KAFKA_BROKERS
valueFrom:
configMapKeyRef:
name: kafka-config
key: brokers
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://jaeger-collector.observability.svc:4318"
- name: OTEL_SERVICE_NAME
value: "order-service"
- name: NODE_ENV
value: "production"
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 3
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
failureThreshold: 3
startupProbe:
httpGet:
path: /health/live
port: 8080
failureThreshold: 30
periodSeconds: 2 # Allow up to 60s for startup
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: order-service
topologyKey: kubernetes.io/hostname # One pod per node
---
apiVersion: v1
kind: Service
metadata:
name: order-service
namespace: production
spec:
selector:
app: order-service
ports:
- name: http
port: 80
targetPort: 8080
type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: order-service-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: order-service
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Pods
value: 2
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
Security: Service Mesh, mTLS, and JWT Between Services
Security in microservices requires a zero-trust model: assume the network is hostile, never trust a caller just because it is inside the cluster, require authentication and authorization for every service-to-service call. A compromised service should not be able to freely call any other service. The combination of a service mesh for transport-level security (mTLS) and application-level authorization (JWT claims, RBAC) provides defense in depth.
Istio Service Mesh with mTLS
# Install Istio with production profile
istioctl install --set profile=default -y # the default profile is the one recommended for production
# Enable automatic sidecar injection in the namespace
kubectl label namespace production istio-injection=enabled
# Enforce strict mTLS cluster-wide - reject all plaintext traffic
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: production
spec:
mtls:
mode: STRICT # All service-to-service traffic must use mTLS
---
# Fine-grained authorization - only order-service can call payment endpoints
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: payment-service-authz
namespace: production
spec:
selector:
matchLabels:
app: payment-service
action: ALLOW
rules:
- from:
- source:
# Only traffic from order-service's service account is allowed
principals:
- "cluster.local/ns/production/sa/order-service-sa"
to:
- operation:
methods: ["POST"]
paths: ["/api/v1/payments", "/api/v1/payments/*"]
---
# Default deny - reject all unlisted traffic
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: deny-all
namespace: production
spec:
{} # Empty spec = deny all (explicit allow rules override this)
---
# Canary deployment with traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: order-service-vs
namespace: production
spec:
hosts: [order-service]
http:
- route:
- destination: { host: order-service, subset: stable }
weight: 90
- destination: { host: order-service, subset: canary }
weight: 10
timeout: 10s
retries:
attempts: 3
perTryTimeout: 3s
retryOn: gateway-error,connect-failure,retriable-4xx
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: order-service-dr
namespace: production
spec:
host: order-service
trafficPolicy:
connectionPool:
http:
h2UpgradePolicy: UPGRADE # Use HTTP/2
outlierDetection:
consecutive5xxErrors: 5
interval: 30s
baseEjectionTime: 30s
maxEjectionPercent: 50
subsets:
- name: stable
labels: { version: v2.1.0 }
- name: canary
labels: { version: v2.2.0-rc1 }
Service-to-Service JWT Authentication
// Pattern: short-lived service tokens issued by a central auth service
import jwt from 'jsonwebtoken';
// auth-service: issue tokens to other services
async function issueServiceToken(
callerService: string,
targetService: string
): Promise<string> {
return jwt.sign(
{
iss: 'auth-service.production.svc',
sub: callerService, // WHO is calling
aud: targetService, // WHO is being called
iat: Math.floor(Date.now() / 1000),
exp: Math.floor(Date.now() / 1000) + 300, // 5-minute lifetime
scope: 'service-call',
},
process.env.SERVICE_SIGNING_KEY!,
{ algorithm: 'RS256', keyid: currentKeyId }
);
}
// Middleware: validate incoming service tokens (in payment-service)
import jwksClient from 'jwks-rsa';
const client = jwksClient({
jwksUri: 'http://auth-service/.well-known/jwks.json',
cache: true,
cacheMaxEntries: 10,
cacheMaxAge: 600_000, // Cache JWKS for 10 minutes
});
async function validateServiceToken(req: Request, res: Response, next: NextFunction) {
const authHeader = req.headers.authorization;
if (!authHeader?.startsWith('Bearer ')) {
return res.status(401).json({ error: 'missing_token' });
}
const token = authHeader.slice(7);
try {
const decoded = jwt.decode(token, { complete: true }) as any;
const key = await client.getSigningKey(decoded?.header?.kid);
const publicKey = key.getPublicKey();
const payload = jwt.verify(token, publicKey, {
algorithms: ['RS256'],
audience: 'payment-service', // This service's name
issuer: 'auth-service.production.svc',
}) as ServiceTokenPayload;
req.callerService = payload.sub;
next();
} catch (err) {
return res.status(401).json({ error: 'invalid_token', detail: (err as Error).message });
}
}
// Client-side: token caching to avoid auth-service call on every request
class ServiceTokenManager {
private tokens = new Map<string, { value: string; expiresAt: number }>();
async getToken(targetService: string): Promise<string> {
const entry = this.tokens.get(targetService);
if (entry && entry.expiresAt - Date.now() > 30_000) { // 30s buffer
return entry.value;
}
const token = await this.authClient.issueToken(targetService);
const payload = jwt.decode(token) as { exp: number };
this.tokens.set(targetService, {
value: token,
expiresAt: payload.exp * 1000,
});
return token;
}
}Monitoring and Observability: Prometheus and Grafana
The three pillars of observability in microservices are metrics, logs, and traces. Metrics (Prometheus) provide aggregate numerical data: request rates, error rates, latency percentiles, resource usage. Logs (Loki, ELK) provide detailed contextual records of individual events. Traces (Jaeger) show request flow across services. Grafana provides unified dashboards and alerting across all three. The four golden signals (latency, traffic, errors, and saturation) should be monitored for every service, with automated alerts when thresholds are breached.
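Latency alerts are usually computed from histogram buckets rather than raw durations. As a rough illustration of the idea behind PromQL's `histogram_quantile`, here is a simplified linear-interpolation estimate over cumulative bucket counts (an illustrative sketch, not the exact Prometheus algorithm):

```typescript
// Cumulative histogram: count of observations <= each upper bound, like
// Prometheus http_request_duration_seconds_bucket with its "le" labels.
interface Bucket { le: number; count: number }

export function histogramQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count; // the +Inf bucket holds everything
  const rank = q * total;                          // target observation rank
  let prevLe = 0;
  let prevCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      // Interpolate linearly inside the first bucket whose cumulative
      // count reaches the target rank
      const fraction = (rank - prevCount) / (b.count - prevCount || 1);
      return prevLe + (b.le - prevLe) * fraction;
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return prevLe;
}
```

This is why bucket boundaries matter: the estimate can only be as precise as the bucket that contains the quantile, which is the reason the prom-client histogram below declares explicit buckets around the latencies you care about.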
# prometheus.yml - Kubernetes service discovery
global:
scrape_interval: 15s
evaluation_interval: 15s
external_labels:
cluster: production-us-east
region: us-east-1
rule_files:
- /etc/prometheus/rules/microservices.yml
alerting:
alertmanagers:
- static_configs:
- targets: [alertmanager.monitoring:9093]
scrape_configs:
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
namespaces:
names: [production, staging]
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: "true"
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
target_label: __metrics_path__
regex: (.+)
- source_labels: [__meta_kubernetes_pod_ip,
__meta_kubernetes_pod_annotation_prometheus_io_port]
target_label: __address__
regex: (.+);(.+)
replacement: "$1:$2"
- source_labels: [__meta_kubernetes_namespace]
target_label: namespace
- source_labels: [__meta_kubernetes_pod_label_app]
target_label: service
# Alerting rules - microservices golden signals
# /etc/prometheus/rules/microservices.yml
groups:
- name: microservices.golden_signals
interval: 30s
rules:
# Error rate > 5% for 2 minutes
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service, namespace)
/
sum(rate(http_requests_total[5m])) by (service, namespace) > 0.05
for: 2m
labels:
severity: critical
annotations:
summary: "High error rate on {{ $labels.service }}"
description: "Error rate is {{ $value | humanizePercentage }} (threshold 5%)"
runbook: "https://runbooks.mycompany.com/high-error-rate"
# p99 latency > 1s for 5 minutes
- alert: HighLatencyP99
expr: |
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "High p99 latency on {{ $labels.service }}"
description: "p99 latency is {{ $value | humanizeDuration }}"
# Service is completely down
- alert: ServiceDown
expr: up{job="kubernetes-pods"} == 0
for: 1m
labels:
severity: critical
pagerduty: "true"
annotations:
summary: "Service {{ $labels.service }} is down"
# Circuit breaker is open
- alert: CircuitBreakerOpen
expr: circuit_breaker_state{state="open"} == 1
for: 30s
labels:
severity: warning
annotations:
summary: "Circuit breaker open: {{ $labels.service }} -> {{ $labels.target }}"
# Node.js service exposing Prometheus metrics
import { Registry, Counter, Histogram, Gauge, Summary } from 'prom-client';
const register = new Registry();
register.setDefaultLabels({ service: 'order-service', environment: 'production' });
export const httpRequestsTotal = new Counter({
name: 'http_requests_total',
help: 'Total HTTP requests',
labelNames: ['method', 'route', 'status_code'],
registers: [register],
});
export const httpLatency = new Histogram({
name: 'http_request_duration_seconds',
help: 'HTTP request latency',
labelNames: ['method', 'route', 'status_code'],
buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
registers: [register],
});
export const ordersTotal = new Counter({
name: 'orders_created_total',
help: 'Total orders created',
labelNames: ['status', 'payment_method', 'region'],
registers: [register],
});
export const activeOrders = new Gauge({
name: 'active_orders_gauge',
help: 'Currently active (non-terminal) orders',
registers: [register],
});
// Instrument all routes automatically
app.use((req, res, next) => {
const timer = httpLatency.startTimer();
res.on('finish', () => {
const route = req.route?.path ?? 'unknown';
const labels = { method: req.method, route, status_code: String(res.statusCode) };
httpRequestsTotal.inc(labels);
timer(labels);
});
next();
});
app.get('/metrics', async (_req, res) => {
res.set('Content-Type', register.contentType);
res.end(await register.metrics());
});
Architecture Comparison: Monolith vs Microservices vs Serverless
| Attribute | Monolith | Microservices | Serverless |
|---|---|---|---|
| Deployment unit | Single artifact | Per-service containers | Individual functions |
| Scaling | Entire app scales together | Per-service horizontal scaling | Auto-scales to zero |
| Latency | No network overhead | Network hops between services | Cold start latency (100ms to 3s) |
| Development speed | Fast initially, slows with growth | Parallel team development | Very fast for simple workflows |
| Operational complexity | Low: one process to manage | High: many services, k8s, mesh | Low ops, high vendor lock-in |
| Data management | Shared relational database | Database per service | Typically shared or per-function |
| Technology choice | One language and framework | Per-service technology freedom | Runtime-constrained by provider |
| Cost model | Fixed (servers always running) | Fixed (pods always running) | Pay-per-invocation (spiky is cheap) |
| Fault isolation | One bug can crash everything | Failures are service-scoped | Functions are isolated by default |
| Testing strategy | Unit + integration locally | Contract testing (Pact) required | Mock cloud services (LocalStack) |
| Observability | Simple logging and metrics | Distributed tracing essential | Cloud vendor tools (CloudWatch, X-Ray) |
| Best fit | Small teams, early product | Large teams, complex domain | Event-driven, bursty, scheduled tasks |
Conclusion
Microservices architecture is a powerful tool for building large-scale, maintainable systems, but it is not a silver bullet, and it is not appropriate for every team or every stage of a product. The distributed nature introduces real, unavoidable complexity: network failures, eventual consistency, distributed tracing, and operational overhead that can overwhelm teams without the organizational maturity to manage it.
The key is to adopt microservices patterns incrementally, driven by real organizational and technical needs. Start with a well-structured monolith that follows domain boundaries in its module structure. When team size creates coordination friction or when scaling requirements diverge, extract services, starting with those that change most frequently or have the most distinct scaling needs. Add an API Gateway early to centralize cross-cutting concerns. Instrument everything with OpenTelemetry from day one; debugging distributed systems without tracing is extraordinarily painful. Use circuit breakers on every external call and define fallback behaviors before you need them. Give each service its own database and embrace eventual consistency through well-defined events. Deploy on Kubernetes for resilience, layer Istio for zero-trust security, and monitor with Prometheus and Grafana. When these patterns are applied thoughtfully, microservices enable teams to move fast at scale: independently, safely, and with confidence in each deployment.
Frequently Asked Questions
What is microservices architecture?
Microservices architecture is a software design approach where an application is built as a collection of small, independently deployable services. Each service runs in its own process, owns its own data, communicates via well-defined APIs, and can be developed, deployed, and scaled by an independent team.
When should I use microservices over a monolith?
Use microservices when your team is large enough to work on independent services (typically 3+ teams), when different parts of your system have divergent scaling requirements, when you need independent deployment cycles, or when you require technology diversity. Start with a monolith for small teams or early-stage products โ a well-structured monolith is easier to migrate later than a poorly designed distributed system.
What is the difference between REST, gRPC, and message queues?
REST uses HTTP/JSON for synchronous request-response communication: human-readable, universally supported, and ideal for public APIs. gRPC uses HTTP/2 and Protocol Buffers for high-performance, strongly typed binary synchronous calls, ideal for internal service-to-service communication. Message queues (RabbitMQ, Kafka) enable asynchronous, decoupled communication: the sender does not wait for a response, which suits event-driven workflows and eventual consistency.
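The difference shows up directly in calling code: a synchronous call blocks until the answer arrives, while publishing an event returns immediately and the consumer runs later. A toy in-memory sketch of that contrast (the `InMemoryTopic` stands in for Kafka/RabbitMQ and is purely illustrative):

```typescript
// Synchronous style: the caller waits for the answer (REST/gRPC semantics).
// In reality this would be an HTTP or gRPC call; simulated here.
async function reserveInventorySync(productId: string, qty: number): Promise<boolean> {
  return qty <= 10;
}

// Asynchronous style: the caller enqueues an event and moves on
// (Kafka/RabbitMQ semantics).
type Handler<T> = (event: T) => void;

class InMemoryTopic<T> {
  private handlers: Handler<T>[] = [];
  subscribe(h: Handler<T>) { this.handlers.push(h); }
  publish(event: T) {
    // Delivery is decoupled from the publisher's control flow
    for (const h of this.handlers) queueMicrotask(() => h(event));
  }
}

interface OrderPlaced { orderId: string; productId: string; qty: number }

const orderEvents = new InMemoryTopic<OrderPlaced>();
const reservations: string[] = [];
orderEvents.subscribe(e => reservations.push(e.orderId));

// publish() returns immediately; the subscriber runs on a later tick
orderEvents.publish({ orderId: 'o-1', productId: 'p-9', qty: 2 });
```

The publisher never learns whether the subscriber succeeded, which is exactly the trade: looser coupling and better availability in exchange for eventual, not immediate, consistency.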
What is an API Gateway and why do I need one?
An API Gateway is the single entry point for all external requests to your microservices. It centralizes authentication, rate limiting, SSL termination, request routing, logging, and caching, concerns that would otherwise need to be duplicated in every service. Without it, clients must know the address of every service and each service must implement its own auth and rate limiting.
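The routing-plus-cross-cutting core of a gateway fits in a short sketch: match a path prefix, apply a per-client check, and forward. A toy route table with token-bucket rate limiting (paths, service hosts, and limits are all illustrative):

```typescript
// Minimal gateway sketch: prefix routing + per-client token-bucket rate limiting
const routes: Record<string, string> = {
  '/api/orders': 'http://order-service.production.svc',
  '/api/products': 'http://product-service.production.svc',
};

class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();
  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }
  allow(): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at capacity
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec
    );
    this.lastRefill = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}

const buckets = new Map<string, TokenBucket>();

export function routeRequest(path: string, clientId: string):
  { target?: string; status: number } {
  const prefix = Object.keys(routes).find(p => path.startsWith(p));
  if (!prefix) return { status: 404 };            // no service owns this path
  let bucket = buckets.get(clientId);
  if (!bucket) { bucket = new TokenBucket(5, 1); buckets.set(clientId, bucket); }
  if (!bucket.allow()) return { status: 429 };    // Too Many Requests
  return { target: routes[prefix], status: 200 }; // forward upstream
}
```

A real gateway (Kong, NGINX, AWS API Gateway) adds auth, TLS termination, and observability around this same core loop.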
What is the circuit breaker pattern?
The circuit breaker prevents cascade failures by stopping outbound calls to a failing service after a failure threshold is reached. In the Open state, calls fail immediately without hitting the network, allowing the downstream service time to recover. After a reset timeout, it enters Half-Open state and tests with a small number of calls. Resilience4j (Java) and opossum (Node.js) are popular implementations.
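The three-state machine is small enough to sketch directly. This is a simplified illustration of what Resilience4j and opossum manage internally (the thresholds and the injectable clock are illustrative choices):

```typescript
type State = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

export class SimpleCircuitBreaker {
  private state: State = 'CLOSED';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,          // consecutive failures before opening
    private resetTimeoutMs = 30_000,       // how long to stay open
    private now: () => number = Date.now,  // injectable clock for testing
  ) {}

  getState(): State {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN'; // reset timeout elapsed: allow a trial call
    }
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === 'OPEN') {
      throw new Error('circuit open: failing fast'); // no network call made
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'CLOSED'; // trial call (or normal call) succeeded
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Production libraries add what this sketch omits: rolling failure-rate windows instead of consecutive counts, limited concurrency in half-open, fallbacks, and metrics.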
How do microservices handle data management and cross-service queries?
Each service owns its own database exclusively (database-per-service pattern). Cross-service queries are handled through CQRS: each service maintains a read-model projection updated by consuming events from other services. Distributed transactions use the Saga pattern (a sequence of local transactions coordinated by events) rather than two-phase commit, which does not scale.
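A minimal sketch of the saga idea: execute local steps in order and, on failure, run the compensations of the completed steps in reverse. This is an orchestrated, in-process simplification; real sagas are event-driven and persist their progress so they survive restarts:

```typescript
interface SagaStep {
  name: string;
  execute: () => Promise<void>;
  compensate: () => Promise<void>; // undoes an already-committed local transaction
}

export async function runSaga(
  steps: SagaStep[]
): Promise<{ ok: boolean; compensated: string[] }> {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.execute();
      done.push(step);
    } catch {
      // A step failed: roll back completed steps in reverse order
      const compensated: string[] = [];
      for (const prev of done.reverse()) {
        await prev.compensate();
        compensated.push(prev.name);
      }
      return { ok: false, compensated };
    }
  }
  return { ok: true, compensated: [] };
}
```

Note that compensation is a new forward action (refund, cancel), not a rollback: the intermediate states were visible to other services, which is the essence of eventual consistency.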
What is distributed tracing and how does OpenTelemetry help?
Distributed tracing tracks a request as it flows through multiple services, recording timing and context at each step. OpenTelemetry is the vendor-neutral CNCF standard that provides language SDKs, auto-instrumentation, and exporters for traces, metrics, and logs. It sends data to backends like Jaeger (open-source) or commercial vendors (Datadog, New Relic, Grafana Tempo) without vendor lock-in.
How does Istio improve microservices security?
Istio injects an Envoy sidecar proxy into every pod, handling all network traffic without code changes. It enforces mTLS (mutual TLS) for encrypted, mutually-authenticated service-to-service communication, implements fine-grained authorization policies specifying which services can call which endpoints, provides traffic management for canary deployments, and generates telemetry for all service communication.