DevOps Pipeline Guide: CI/CD, GitHub Actions, Docker, Infrastructure as Code & Deployment Strategies
A comprehensive DevOps pipeline guide covering CI/CD fundamentals, GitHub Actions workflows, GitLab CI/CD, Docker multi-stage builds, Terraform and Pulumi IaC, blue-green and canary deployments, secrets management with Vault, GitOps with ArgoCD and Flux, pipeline security with SAST/DAST, and monitoring strategies.
CI/CD Fundamentals
Continuous Integration (CI) and Continuous Delivery/Deployment (CD) form the backbone of modern DevOps pipelines. CI ensures that every code change is automatically built and tested, catching bugs early. CD extends this by automatically deploying validated changes to staging or production environments.
Build, Test, Deploy Pipeline
A standard pipeline has three core stages. The Build stage compiles code, resolves dependencies, and produces artifacts. The Test stage runs unit tests, integration tests, and linting. The Deploy stage pushes artifacts to target environments. Each stage acts as a quality gate — if any stage fails, the pipeline stops and notifies the team.
# Typical CI/CD Pipeline Stages
#
# 1. Source - Code commit triggers pipeline
# 2. Build - Compile code, install dependencies
# 3. Test - Unit tests, integration tests, linting
# 4. Security - SAST, dependency scanning
# 5. Package - Build Docker image, push to registry
# 6. Deploy - Deploy to staging environment
# 7. Verify - Smoke tests, health checks
# 8. Promote - Deploy to production (manual gate or auto)
# 9. Monitor - Track metrics, error rates, performance
# Each stage acts as a quality gate:
# If tests fail -> pipeline stops, team notified
# If scan finds critical CVE -> pipeline blocks merge
# If health check fails -> automatic rollback
GitHub Actions Workflows
GitHub Actions is a CI/CD platform built into GitHub. Workflows are defined in YAML files under .github/workflows/ and triggered by events like push, pull_request, or schedule. Key features include matrix builds for testing across multiple OS/language versions, caching for faster builds, encrypted secrets for credentials, and reusable workflows for DRY pipeline definitions.
# .github/workflows/ci.yml
name: CI Pipeline
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [18, 20, 22]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
          node-version: ${{ matrix.node-version }}
cache: "npm"
- run: npm ci
- run: npm test
- run: npm run lint
build-and-push:
needs: test
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/build-push-action@v5
with:
push: true
          tags: ghcr.io/${{ github.repository }}:latest
cache-from: type=gha
          cache-to: type=gha,mode=max
GitLab CI/CD
GitLab CI/CD uses a .gitlab-ci.yml file in the repository root. It features built-in container registry, Auto DevOps for zero-configuration pipelines, environments with review apps, and DAG (Directed Acyclic Graph) pipelines for complex dependency management. GitLab runners execute pipeline jobs and can be shared or project-specific.
# .gitlab-ci.yml
stages:
- test
- build
- deploy
variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
test:
stage: test
image: node:20-alpine
cache:
    key: $CI_COMMIT_REF_SLUG
paths:
- node_modules/
script:
- npm ci
- npm test
- npm run lint
artifacts:
reports:
junit: test-results.xml
build:
stage: build
image: docker:24
services:
- docker:24-dind
script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t "$DOCKER_IMAGE" .
    - docker push "$DOCKER_IMAGE"
deploy_staging:
stage: deploy
environment:
name: staging
url: https://staging.example.com
script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE
only:
    - main
Docker Multi-Stage Builds
Multi-stage builds use multiple FROM statements in a single Dockerfile. The build stage contains all development dependencies and compilers, while the final stage copies only the compiled artifact into a minimal base image. This dramatically reduces image size and attack surface. Always use specific version tags for base images, never use latest in production.
# Dockerfile — Multi-stage build
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install production-only deps, stash them, then install everything for the build
RUN npm ci --omit=dev && \
    cp -R node_modules /prod_modules && \
    npm ci
COPY . .
RUN npm run build
# Stage 2: Production
FROM node:20-alpine AS production
WORKDIR /app
RUN addgroup -g 1001 appgroup && \
adduser -u 1001 -G appgroup -s /bin/sh -D appuser
COPY --from=builder /prod_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
# Result: ~150MB instead of ~1.2GB
# No dev dependencies, no source code, non-root user
Container Registries
Container registries store and distribute Docker images. Options include Docker Hub (public and private repositories), GitHub Container Registry (ghcr.io, integrated with GitHub Actions), Amazon ECR (integrated with AWS services), Google Artifact Registry, and Azure Container Registry. Choose based on your cloud provider and access control requirements. Always scan images for vulnerabilities before pushing to production registries.
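As a concrete sketch of the push flow, the snippet below composes a fully-qualified image reference and shows the scan-before-push sequence. The registry, owner, image, and tag values are placeholders, and the commented docker/trivy commands assume those CLIs are installed and authenticated:

```shell
#!/bin/sh
# Compose a fully-qualified image reference (placeholder values).
REGISTRY="ghcr.io"
OWNER="org"
IMAGE="myapp"
TAG="1.4.2"
REF="$REGISTRY/$OWNER/$IMAGE:$TAG"
echo "$REF"
# With a running Docker daemon and registry credentials:
# docker tag "$IMAGE:$TAG" "$REF"
# trivy image --severity CRITICAL,HIGH --exit-code 1 "$REF"  # fail on findings
# docker push "$REF"
```

Running the scan before `docker push` means a vulnerable image never reaches the registry at all, rather than being quarantined after the fact.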
Terraform Basics
Terraform by HashiCorp is the most widely adopted Infrastructure as Code tool. It uses HCL (HashiCorp Configuration Language) to declaratively define infrastructure. The core workflow is terraform init (initialize providers), terraform plan (preview changes), and terraform apply (execute changes). State is stored in a backend (S3, Azure Blob, GCS) to track resource mappings. Use modules for reusable infrastructure components and workspaces for environment separation.
# main.tf — Terraform AWS ECS Service
terraform {
required_version = ">= 1.7"
backend "s3" {
bucket = "myapp-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
}
}
provider "aws" {
region = var.aws_region
}
resource "aws_ecs_cluster" "main" {
  name = "${var.project}-${var.environment}"
setting {
name = "containerInsights"
value = "enabled"
}
}
resource "aws_ecs_service" "app" {
  name            = "${var.project}-svc"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.app.arn
desired_count = var.app_count
launch_type = "FARGATE"
network_configuration {
subnets = var.private_subnets
security_groups = [aws_security_group.ecs.id]
}
load_balancer {
target_group_arn = aws_lb_target_group.app.arn
container_name = var.project
container_port = 3000
}
}
# terraform init -> download providers
# terraform plan -> preview changes
# terraform apply -> execute changes
Pulumi — IaC with Programming Languages
Pulumi takes a different approach to IaC by using real programming languages (TypeScript, Python, Go, C#) instead of domain-specific languages. This gives you access to loops, conditionals, type checking, IDE support, and testing frameworks. Pulumi manages state similarly to Terraform and supports all major cloud providers. It is particularly appealing to teams that prefer writing infrastructure code in the same language as their application.
// index.ts — Pulumi AWS ECS with TypeScript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";
const config = new pulumi.Config();
const environment = config.require("environment");
const desiredCount = config.getNumber("desiredCount") || 2;
// Create a VPC with best-practice defaults
const vpc = new awsx.ec2.Vpc("app-vpc", {
numberOfAvailabilityZones: 2,
natGateways: { strategy: "Single" },
});
// Create an ECS cluster
const cluster = new aws.ecs.Cluster("app-cluster", {
settings: [{
name: "containerInsights",
value: "enabled",
}],
});
// Build and publish the Docker image to an ECR repository
const repo = new awsx.ecr.Repository("app-repo");
const image = new awsx.ecr.Image("app-image", {
  repositoryUrl: repo.url,
  context: "./app",
  platform: "linux/amd64",
});
// Create Fargate service with ALB
const service = new awsx.ecs.FargateService("app-svc", {
cluster: cluster.arn,
desiredCount: desiredCount,
taskDefinitionArgs: {
container: {
image: image.imageUri,
cpu: 256,
memory: 512,
portMappings: [{ containerPort: 3000 }],
},
},
});
export const url = service.loadBalancer?.endpoint;
Deployment Strategies: Blue-Green and Canary
Blue-green deployment maintains two identical production environments. Blue is the current live version, Green is the new version. After deploying and testing Green, traffic is switched from Blue to Green. Rollback is instant — just switch back to Blue. Canary deployment gradually routes a small percentage of traffic (e.g., 5%) to the new version, monitors metrics, and increases traffic if everything looks healthy. Rolling deployments update instances one at a time, maintaining availability throughout.
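The canary half of the strategy is usually handled at the ingress or service-mesh layer rather than by swapping Deployments. A minimal sketch using the NGINX Ingress Controller's canary annotations (the host and service names are placeholders) could look like:

```yaml
# Route ~5% of traffic to the canary release (NGINX Ingress Controller)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "5"
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-canary-svc
                port:
                  number: 80
```

Raise canary-weight in steps (5, 25, 50, 100) while watching error rates and latency; deleting the canary Ingress rolls all traffic back to the stable release instantly.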
# Blue-Green Deployment with Kubernetes
# Step 1: Deploy green (new version) alongside blue (current)
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-green
labels:
app: myapp
version: green
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: green
template:
metadata:
labels:
app: myapp
version: green
spec:
containers:
- name: app
image: myapp:2.0.0
ports:
- containerPort: 3000
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
---
# Step 2: Switch service selector to green
apiVersion: v1
kind: Service
metadata:
name: app-service
spec:
selector:
app: myapp
version: green # Change from "blue" to "green"
ports:
- port: 80
      targetPort: 3000
Environment Management
Proper environment management requires at least three tiers: development, staging, and production. Staging should mirror production as closely as possible in terms of configuration, data shape, and infrastructure. Use environment-specific configuration files, feature flags for gradual rollouts, and database migration strategies that work across environments. Never share secrets between environments.
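One common mechanism for environment-specific configuration, sketched below with placeholder file and variable names, is deriving a config file from a single environment variable and failing fast on anything unexpected:

```shell
#!/bin/sh
# Select an environment-specific config file.
# APP_ENV and the config.*.yaml names are illustrative placeholders.
APP_ENV="${APP_ENV:-dev}"
case "$APP_ENV" in
  dev|staging|production)
    CONFIG="config.$APP_ENV.yaml"
    ;;
  *)
    echo "unknown environment: $APP_ENV" >&2
    exit 1
    ;;
esac
echo "using $CONFIG"
```

The explicit allowlist in the `case` statement is the point: a typo like `APP_ENV=prod` aborts the pipeline instead of silently deploying with the wrong configuration.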
Secrets Management
Never store secrets in code, environment variables on disk, or pipeline configurations. Use dedicated secrets management tools. HashiCorp Vault provides dynamic secrets, encryption as a service, and fine-grained access policies. AWS Secrets Manager integrates with IAM for access control and supports automatic rotation. For Kubernetes, use External Secrets Operator to sync secrets from Vault or cloud providers into Kubernetes Secrets.
# Vault integration in GitHub Actions
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Import Secrets from Vault
uses: hashicorp/vault-action@v3
with:
url: https://vault.example.com
method: jwt
role: github-deploy
secrets: |
secret/data/prod/db DB_PASSWORD ;
secret/data/prod/api API_KEY
- name: Deploy with secrets
run: |
echo "Deploying with injected secrets..."
# Secrets are available as env vars
# DB_PASSWORD and API_KEY injected by vault-action
helm upgrade app ./chart \
            --set db.password=$DB_PASSWORD \
            --set api.key=$API_KEY
# External Secrets Operator for Kubernetes
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: app-secrets
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-backend
kind: ClusterSecretStore
target:
name: app-secrets
data:
- secretKey: db-password
remoteRef:
key: secret/data/prod/db
        property: password
Monitoring Pipelines
Pipeline monitoring goes beyond application monitoring. Track build duration trends, test flakiness rates, deployment frequency, lead time for changes, mean time to recovery (MTTR), and change failure rate. These are the DORA metrics that indicate DevOps performance. Use tools like Grafana dashboards, Datadog CI Visibility, or built-in analytics from GitHub Actions or GitLab to visualize pipeline health.
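For example, change failure rate is simply failed deployments divided by total deployments. A toy sketch over a hypothetical one-line-per-deployment log (in practice this data would come from your CI system's API):

```shell
#!/bin/sh
# Hypothetical log: one line per deployment, "success" or "failure".
cat > /tmp/deploys.log <<'EOF'
success
failure
success
success
EOF
total=$(wc -l < /tmp/deploys.log)
failed=$(grep -c '^failure$' /tmp/deploys.log)
echo "change failure rate: $((failed * 100 / total))%"
```

The same counting approach extends to deployment frequency (deployments per day) and MTTR (time between a failed deploy and the next successful one).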
Rollback Strategies
Every deployment needs a rollback plan. Strategies include: reverting the Git commit and re-running the pipeline, using container image tags to redeploy the previous version, Kubernetes rollout undo for instant rollback, database migration rollback scripts (always write backward-compatible migrations), and feature flags to disable problematic features without redeploying. Automate rollbacks based on health check failures and error rate thresholds.
# Kubernetes rollback commands
# View rollout history
kubectl rollout history deployment/app
# Rollback to previous version
kubectl rollout undo deployment/app
# Rollback to specific revision
kubectl rollout undo deployment/app --to-revision=3
# Check rollout status
kubectl rollout status deployment/app
# -------------------------------------------
# Automated rollback with health checks
# -------------------------------------------
#!/bin/bash
# deploy.sh
set -e
DEPLOY_NAME="app"
NAMESPACE="production"
TIMEOUT="300s"
# Apply new deployment
kubectl apply -f deployment.yaml -n "$NAMESPACE"
# Wait for rollout with timeout
if ! kubectl rollout status deployment/$DEPLOY_NAME \
    -n $NAMESPACE --timeout=$TIMEOUT; then
  echo "Rollout failed! Initiating rollback..."
  kubectl rollout undo deployment/$DEPLOY_NAME -n $NAMESPACE
  kubectl rollout status deployment/$DEPLOY_NAME -n $NAMESPACE
echo "Rollback complete."
exit 1
fi
echo "Deployment successful!"
GitOps with ArgoCD and Flux
GitOps uses Git repositories as the single source of truth for declarative infrastructure and applications. ArgoCD and Flux are Kubernetes-native GitOps operators that continuously reconcile the desired state in Git with the actual state in the cluster. Changes are made via pull requests — no direct kubectl apply. This provides audit trails, easy rollbacks (git revert), and consistent environments. ArgoCD offers a web UI for visualization, while Flux is more lightweight and composable.
# ArgoCD Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp-production
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/org/k8s-manifests.git
targetRevision: main
path: environments/production
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true # Delete resources removed from Git
selfHeal: true # Revert manual cluster changes
syncOptions:
- CreateNamespace=true
retry:
limit: 3
backoff:
duration: 5s
factor: 2
maxDuration: 3m
# Flux Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: myapp
namespace: flux-system
spec:
interval: 5m
path: ./environments/production
prune: true
sourceRef:
kind: GitRepository
name: k8s-manifests
healthChecks:
- apiVersion: apps/v1
kind: Deployment
name: myapp
      namespace: production
Pipeline Security: SAST, DAST, and Dependency Scanning
Shift security left by integrating it into the pipeline. SAST (Static Application Security Testing) analyzes source code for vulnerabilities without running it — tools include Semgrep, SonarQube, and CodeQL. DAST (Dynamic Application Security Testing) tests the running application for vulnerabilities like XSS and SQL injection — tools include OWASP ZAP and Burp Suite. Dependency scanning checks third-party packages for known CVEs — tools include Snyk, Dependabot, and Trivy for container images. Run all three in every pipeline.
# Security scanning in GitHub Actions
name: Security Pipeline
on: [push, pull_request]
jobs:
sast:
name: Static Analysis (SAST)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run Semgrep
uses: returntocorp/semgrep-action@v1
with:
config: >-
p/security-audit
p/owasp-top-ten
p/nodejs
dependency-scan:
name: Dependency Scanning
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run Snyk
uses: snyk/actions/node@master
env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
with:
args: --severity-threshold=high
container-scan:
name: Container Image Scan
runs-on: ubuntu-latest
needs: [sast]
steps:
- uses: actions/checkout@v4
- name: Build image
run: docker build -t myapp:scan .
- name: Run Trivy
uses: aquasecurity/trivy-action@master
with:
image-ref: myapp:scan
format: table
exit-code: 1
severity: CRITICAL,HIGH
dast:
name: Dynamic Analysis (DAST)
runs-on: ubuntu-latest
needs: [container-scan]
steps:
- name: Deploy to test environment
run: |
docker run -d -p 3000:3000 myapp:scan
sleep 10
- name: OWASP ZAP Baseline Scan
uses: zaproxy/action-baseline@v0.12.0
with:
target: http://localhost:3000