- CI builds and tests every commit; CD automates deployment to staging or production
- GitHub Actions uses YAML workflows triggered by events; GitLab CI uses .gitlab-ci.yml with stages
- Docker multi-stage builds produce small, secure images; layer caching keeps CI fast
- Blue-green and canary deployments enable zero-downtime releases with safe rollback
- Store secrets outside code using GitHub Secrets, HashiCorp Vault, or cloud secret managers
- Speed up pipelines with parallelism, dependency caching, and path-based filtering
- A failing pipeline is not a problem — it is a fast feedback loop that prevents broken code from reaching users
- Treat your pipeline configuration as production code: review it, test it, version it
- Start simple (push → test → deploy to staging) and add complexity only when the pain justifies it
- Invest in caching early; it is the single highest-ROI pipeline optimization
- OIDC-based authentication to cloud providers eliminates the need for long-lived credentials in CI
CI/CD (continuous integration and continuous delivery/deployment) has become a core pillar of modern software engineering. Teams that release quickly and reliably, and recover fast from failures, share one trait: well-designed automated pipelines. This guide walks through building production-grade CI/CD workflows with GitHub Actions, GitLab CI, Docker, and proven deployment patterns.
Core CI/CD Concepts
Continuous Integration (CI)
Continuous integration means that every push to a shared repository automatically triggers a build and test run. The goal is to catch integration problems early and avoid "integration hell": the pile of conflicts that appears when several developers' long-lived branches are finally merged.
CI best practices: commit several times a day, keep build times under 10 minutes (slow feedback loops erode the habit), fix a broken main build immediately, and run CI on every branch.
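A job-level timeout makes the 10-minute budget enforceable rather than aspirational. A minimal GitHub Actions sketch (the job name and commands are illustrative):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10  # fail the job if it exceeds the feedback budget
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
```

A timeout that starts firing regularly is itself a useful signal that the suite needs caching, sharding, or pruning.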
Continuous Delivery (CD)
Continuous delivery builds on CI: every build that passes its tests is automatically packaged into a deployable artifact (a Docker image, JAR, zip, etc.) and deployed to a staging environment. Deploying to production still requires manual approval.
Continuous Deployment (CD)
Continuous deployment goes one step further: every commit that passes all tests and checks is deployed to production automatically, with no human intervention. This is the most mature CI/CD model, and it demands a thorough automated test suite, feature flags, and robust monitoring.
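Feature flags are what make continuous deployment tolerable: code ships dark, and flipping a flag, not deploying, activates the new path. A minimal sketch of the gating pattern (the in-memory store and flag name are illustrative; real systems use a flag service such as LaunchDarkly or Unleash):

```typescript
type FlagStore = Map<string, boolean>;

function isEnabled(flags: FlagStore, name: string): boolean {
  // Default to "off" so a missing or misconfigured flag fails safe
  return flags.get(name) ?? false;
}

const flags: FlagStore = new Map([["new-checkout", true]]);

// The deploy shipped both paths; the flag decides which one runs
const path = isEnabled(flags, "new-checkout") ? "new-checkout" : "legacy-checkout";
console.log(path); // → new-checkout
```

The fail-safe default matters in a continuous-deployment setting: a half-rolled-out flag config should degrade to the stable path, never to an error.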
| Aspect | CI | Continuous Delivery | Continuous Deployment |
|---|---|---|---|
| Trigger | Every push | Every passing build | Every passing build |
| Production deploy | Manual | Manual approval | Automatic |
| Human gate | Yes (deploy) | Yes (prod deploy) | None |
| Test suite required | Basic | Comprehensive | Very comprehensive |
GitHub Actions in Depth
GitHub Actions is GitHub's native CI/CD platform. Launched in 2018, it has become the default choice for open-source projects and many commercial ones. It is event-driven, supports thousands of community-maintained actions, and integrates deeply with GitHub Issues, PRs, Packages, and Releases.
Workflow Syntax Basics
Workflow files live in the <code>.github/workflows/</code> directory and are written in YAML:
# .github/workflows/ci.yml
name: CI Pipeline
# Triggers
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
workflow_dispatch: # manual trigger
# Environment variables available to all jobs
env:
NODE_VERSION: "20"
jobs:
test:
name: Run Tests
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Run linter
run: npm run lint
- name: Run unit tests
run: npm test -- --coverage
- name: Upload coverage
uses: codecov/codecov-action@v4
with:
          token: ${{ secrets.CODECOV_TOKEN }}
Matrix Strategy
A matrix strategy lets a single job definition run across multiple parameter combinations; GitHub Actions runs all combinations in parallel:
jobs:
test-matrix:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false # continue other combos if one fails
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
node: [18, 20, 22]
exclude:
- os: windows-latest
node: 18 # skip this specific combo
include:
- os: ubuntu-latest
node: 20
experimental: true # add extra property
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node }}
      - run: npm ci && npm test
Job Dependencies and Conditional Execution
jobs:
build:
runs-on: ubuntu-latest
outputs:
image-tag: ${{ steps.meta.outputs.tags }}
steps:
- uses: actions/checkout@v4
- id: meta
run: echo "tags=myapp:${{ github.sha }}" >> $GITHUB_OUTPUT
- run: docker build -t myapp:${{ github.sha }} .
deploy-staging:
needs: build
    if: github.ref == 'refs/heads/main'
environment: staging
runs-on: ubuntu-latest
steps:
- run: echo "Deploying ${{ needs.build.outputs.image-tag }} to staging"
deploy-prod:
needs: deploy-staging
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
environment:
name: production
url: https://myapp.com
runs-on: ubuntu-latest
steps:
      - run: echo "Deploying to production"
Secrets and OIDC Authentication
Avoid storing long-lived credentials in workflows. Use GitHub OIDC to obtain temporary credentials directly from AWS, GCP, or Azure:
jobs:
deploy:
runs-on: ubuntu-latest
permissions:
id-token: write # required for OIDC
contents: read
steps:
- name: Configure AWS credentials via OIDC
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
aws-region: us-east-1
# No ACCESS_KEY_ID or SECRET_ACCESS_KEY needed!
- name: Deploy to ECS
run: |
aws ecs update-service \
--cluster prod-cluster \
--service myapp \
            --force-new-deployment
A Complete Guide to GitLab CI/CD
GitLab CI centralizes pipeline configuration in a <code>.gitlab-ci.yml</code> file at the repository root. Pipelines are organized into stages: jobs in the same stage run in parallel, and stages execute in order.
A Complete .gitlab-ci.yml Example
# .gitlab-ci.yml
image: node:20-alpine
stages:
- install
- test
- build
- deploy
variables:
npm_config_cache: "$CI_PROJECT_DIR/.npm"
# Reusable cache configuration
.node-cache: &node-cache
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
install-deps:
stage: install
<<: *node-cache
script:
- npm ci
artifacts:
paths:
- node_modules/
expire_in: 1 hour
unit-tests:
stage: test
<<: *node-cache
script:
- npm test -- --coverage --ci
  coverage: /All files[^|]*\|[^|]*\s+([\d.]+)/
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
paths:
- coverage/
build-app:
stage: build
script:
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 week
only:
- main
- merge_requests
deploy-staging:
stage: deploy
environment:
name: staging
url: https://staging.myapp.com
script:
- echo "Deploying to staging..."
- ./scripts/deploy.sh staging
only:
- main
deploy-production:
stage: deploy
environment:
name: production
url: https://myapp.com
script:
- ./scripts/deploy.sh production
when: manual # requires human approval
only:
    - main
Rules and Path Filtering
The <code>rules</code> keyword gives fine-grained control over when a job runs, which is especially valuable in a monorepo:
build-api:
stage: build
script:
- cd packages/api && npm run build
rules:
# Run on merge request if api files changed
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
changes:
- packages/api/**/*
- packages/shared/**/*
# Always run on main branch pushes
- if: $CI_COMMIT_BRANCH == "main"
build-frontend:
stage: build
script:
- cd packages/frontend && npm run build
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
changes:
- packages/frontend/**/*
- packages/shared/**/*
    - if: $CI_COMMIT_BRANCH == "main"
GitLab Runner Types
| Runner Type | Use Case | Isolation | Notes |
|---|---|---|---|
| Shell | Legacy / simple jobs | None | Runs directly on host |
| Docker | Most workloads | Container | Clean env each run |
| Kubernetes | Scale-out / cloud-native | Pod | Auto-scales runner pods |
| Instance (SaaS) | gitlab.com users | VM | Free tier: 400 min/month |
Docker in CI/CD
Multi-Stage Builds
Multi-stage builds are the standard pattern for production Dockerfiles. They separate the build environment (compilers, test tooling, etc.) from the runtime environment (runtime dependencies only), yielding images that are smaller and expose a smaller attack surface.
# Dockerfile
# Stage 1: Install dependencies
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

# Stage 2: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: Runtime (smallest possible image)
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
# Copy only what is needed at runtime
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
# Run as non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
EXPOSE 3000
CMD ["node", "dist/server.js"]
Building and Pushing Images from GitHub Actions
jobs:
build-push:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write # for GHCR
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract Docker metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ghcr.io/${{ github.repository }}
tags: |
type=sha
type=ref,event=branch
type=semver,pattern={{version}}
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha # GitHub Actions cache
          cache-to: type=gha,mode=max
Docker Layer Caching Best Practices
Docker caches image layers from top to bottom. Once a layer is invalidated, every layer after it must be rebuilt, so the order of instructions in a Dockerfile is critical:
- Copy package.json and lock files FIRST, then run npm ci — this layer only invalidates when dependencies change
- Copy source code AFTER installing dependencies — source changes are frequent but fast to copy
- Use .dockerignore to exclude node_modules, .git, test files, and docs from the build context
- Use --mount=type=cache in BuildKit for package manager caches that persist across builds
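The last point deserves a sketch: with BuildKit, a cache mount keeps the package manager's download cache across builds without it ever entering an image layer (base image and paths below are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# The npm cache lives in a build-time cache mount: reused across builds,
# never written into the final image
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
RUN npm run build
```

Unlike ordinary layer caching, the cache mount survives even when the RUN layer itself is invalidated, so a lockfile change still reuses previously downloaded packages.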
Deployment Strategies in Detail
Rolling Deployment
A rolling deployment replaces instances with the new version incrementally. With 10 instances, for example, you might replace 2 at a time until all are updated. This is the default strategy for a Kubernetes Deployment. It is resource-efficient, but old and new versions coexist during the rollout, so APIs must remain backward compatible.
# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 2 # max extra pods during rollout
maxUnavailable: 0 # never reduce below desired count
template:
spec:
containers:
- name: myapp
image: ghcr.io/myorg/myapp:v2.1.0
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
            periodSeconds: 5
Blue-Green Deployment
Blue-green deployment maintains two identical production environments (blue and green); only one serves live traffic at any time. To deploy a new version, deploy it to the idle environment, run smoke tests, then switch the load balancer. If problems appear, rollback is immediate: just switch the load balancer back.
# blue-green deploy script
#!/bin/bash
set -euo pipefail
# The active target group is whatever the listener's default action points at
# (describe-target-groups does not return tags, so ask the listener instead)
CURRENT=$(aws elbv2 describe-target-groups \
  --target-group-arns "$(aws elbv2 describe-listeners \
    --listener-arns "$LISTENER_ARN" \
    --query 'Listeners[0].DefaultActions[0].TargetGroupArn' --output text)" \
  --query 'TargetGroups[0].TargetGroupName' --output text)
if [ "$CURRENT" = "myapp-blue" ]; then
NEW_TG="myapp-green"
OLD_TG="myapp-blue"
else
NEW_TG="myapp-blue"
OLD_TG="myapp-green"
fi
echo "Deploying to $NEW_TG"
# Update the idle target group
aws ecs update-service --cluster prod --service "$NEW_TG" \
--task-definition "myapp:$NEW_TASK_DEF_REVISION" \
--force-new-deployment
# Wait for stability
aws ecs wait services-stable --cluster prod --services "$NEW_TG"
# Run smoke tests
./scripts/smoke-test.sh "https://staging.myapp.com"
# Switch traffic
aws elbv2 modify-listener \
--listener-arn $LISTENER_ARN \
--default-actions "Type=forward,TargetGroupArn=$(aws elbv2 describe-target-groups --names $NEW_TG --query TargetGroups[0].TargetGroupArn --output text)"
echo "Successfully switched traffic to $NEW_TG"
Canary Deployment
A canary release routes a small share of real production traffic (e.g., 5%) to the new version while monitoring error rates, latency, and business metrics. If the metrics stay healthy, the share is increased step by step until it reaches 100%; if anything degrades, all traffic is switched back to the stable version immediately.
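What the traffic split looks like depends on your ingress layer. As one illustration, a weighted route in an Istio VirtualService sends 5% of traffic to a canary subset (Istio itself, plus the host and subset names, are assumptions; this guide's examples use ALB/ECS):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.prod.svc.cluster.local
  http:
    - route:
        - destination:
            host: myapp.prod.svc.cluster.local
            subset: stable
          weight: 95
        - destination:
            host: myapp.prod.svc.cluster.local
            subset: canary
          weight: 5   # increase gradually while metrics stay healthy
```

The promotion loop (bump the canary weight, watch metrics, repeat) is typically automated by a tool such as Argo Rollouts or Flagger rather than edited by hand.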
Testing Strategy in the Pipeline
The Test Pyramid
An effective CI test suite follows the test pyramid: many fast unit tests (milliseconds), a moderate number of integration tests (seconds), and a few end-to-end tests (minutes). The lower a test sits in the pyramid, the faster it runs, the cheaper it is to maintain, and the quicker its feedback.
# Full test pipeline with coverage gate
jobs:
unit-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
- run: npm run test:unit -- --coverage
- name: Check coverage threshold
run: |
COVERAGE=$(cat coverage/coverage-summary.json | \
jq ".total.lines.pct")
echo "Coverage: $COVERAGE%"
if (( $(echo "$COVERAGE < 80" | bc -l) )); then
echo "Coverage $COVERAGE% is below 80% threshold"
exit 1
fi
integration-tests:
runs-on: ubuntu-latest
needs: unit-tests
services:
postgres:
image: postgres:16
env:
POSTGRES_DB: testdb
POSTGRES_USER: testuser
POSTGRES_PASSWORD: testpass
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
redis:
image: redis:7-alpine
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
- run: npm run test:integration
env:
DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
Environment Management
A Typical Multi-Environment Setup
| Environment | Trigger | Approval | Purpose |
|---|---|---|---|
| Preview | Every PR | None | PR review, feature demo |
| Staging | Merge to main | None (auto) | QA, integration, UAT |
| Production | Tag / release | Required | Live user traffic |
Environment Variables and Secrets Management
Different environments need different configuration. A useful rule: non-sensitive settings (feature toggles, API endpoints) go in CI environment variables; sensitive data (database passwords, API keys) belongs in a dedicated secrets manager.
# Using HashiCorp Vault in GitHub Actions
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Import Secrets from Vault
uses: hashicorp/vault-action@v3
with:
url: https://vault.mycompany.com
method: jwt
role: github-actions
secrets: |
secret/data/myapp/prod database_url | DATABASE_URL ;
secret/data/myapp/prod redis_url | REDIS_URL ;
secret/data/myapp/prod stripe_key | STRIPE_SECRET_KEY
- name: Deploy
run: ./scripts/deploy.sh
env:
DATABASE_URL: ${{ env.DATABASE_URL }}
          REDIS_URL: ${{ env.REDIS_URL }}
Monorepo CI/CD
A monorepo (one repository containing multiple services or packages) poses a distinct CI/CD challenge: how do you avoid running a full build of every service on every commit? The answer is path filtering plus incremental build tools.
A Monorepo Pipeline with Turborepo
# .github/workflows/monorepo-ci.yml
name: Monorepo CI
on:
push:
branches: [main]
pull_request:
jobs:
changes:
runs-on: ubuntu-latest
outputs:
api: ${{ steps.filter.outputs.api }}
frontend: ${{ steps.filter.outputs.frontend }}
shared: ${{ steps.filter.outputs.shared }}
steps:
- uses: actions/checkout@v4
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
api:
- "packages/api/**"
- "packages/shared/**"
frontend:
- "packages/frontend/**"
- "packages/shared/**"
shared:
- "packages/shared/**"
test-api:
needs: changes
    if: needs.changes.outputs.api == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
- run: npx turbo run test --filter=@myapp/api...
test-frontend:
needs: changes
    if: needs.changes.outputs.frontend == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
      - run: npx turbo run test --filter=@myapp/frontend...
Pipeline Performance Optimization
Dependency Caching
# Cache node_modules across runs
- name: Cache node modules
uses: actions/cache@v4
with:
path: |
~/.npm
node_modules
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
# For Python projects
- uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: "pip"
# For Rust projects
- uses: Swatinem/rust-cache@v2
with:
workspaces: ". -> target"
# For Gradle (Android / Java)
- name: Cache Gradle
uses: actions/cache@v4
with:
path: |
~/.gradle/caches
~/.gradle/wrapper
    key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}
Parallelizing Tests
# Split test suite across 4 parallel jobs
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
shard: [1, 2, 3, 4]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
- name: Run test shard
run: |
npx jest \
--shard=${{ matrix.shard }}/4 \
--coverage \
            --ci
Notifications and Status Checks
Slack Notification Integration
# Notify Slack on deployment success or failure
- name: Notify Slack on success
if: success()
uses: slackapi/slack-github-action@v1
with:
channel-id: "deployments"
payload: |
{
"text": ":white_check_mark: Deployed to production",
"attachments": [{
"color": "good",
"fields": [
{ "title": "Version", "value": "${{ github.sha }}", "short": true },
{ "title": "Author", "value": "${{ github.actor }}", "short": true }
]
}]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
- name: Notify Slack on failure
if: failure()
uses: slackapi/slack-github-action@v1
with:
channel-id: "deployments"
payload: |
{
"text": ":x: Production deployment FAILED",
"attachments": [{
"color": "danger",
"fields": [
{ "title": "Run URL", "value": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}", "short": false }
]
}]
}
env:
    SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
GitHub Actions vs GitLab CI vs CircleCI
| Feature | GitHub Actions | GitLab CI | CircleCI |
|---|---|---|---|
| Config file | .github/workflows/*.yml | .gitlab-ci.yml | .circleci/config.yml |
| Free tier | 2,000 min/month | 400 min/month | 6,000 min/month |
| Marketplace | 20,000+ actions | Component catalog | Orbs registry |
| Self-hosted | Self-hosted runners | GitLab Runner | Self-hosted runners |
| Docker support | Services containers | Services + DinD | Native Docker layer |
| Caching | actions/cache | cache: keyword | restore_cache step |
| OIDC cloud auth | Yes (AWS/GCP/Azure) | Yes (ID tokens) | Yes (OIDC contexts) |
| Best for | GitHub-hosted repos | Self-hosted GitLab | Speed-focused teams |
| Parallelism | Matrix + jobs | Parallel + needs | Native parallel jobs |
A Complete Production-Grade Workflow
Below is a complete CI/CD workflow for a real-world Node.js application, covering everything from commit to production deployment:
# .github/workflows/production.yml
name: Production Pipeline
on:
push:
branches: [main]
pull_request:
branches: [main]
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true # cancel stale PR runs
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
quality:
name: Code Quality
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- run: npm ci
- run: npm run lint
- run: npm run typecheck
test:
name: Tests
runs-on: ubuntu-latest
needs: quality
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- run: npm ci
- run: npm test -- --coverage
- uses: codecov/codecov-action@v4
build-image:
name: Build Docker Image
runs-on: ubuntu-latest
needs: test
    if: github.event_name == 'push'
permissions:
contents: read
packages: write
id-token: write
outputs:
digest: ${{ steps.build.outputs.digest }}
image: ${{ steps.meta.outputs.tags }}
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- id: build
uses: docker/build-push-action@v5
with:
push: true
tags: ${{ steps.meta.outputs.tags }}
cache-from: type=gha
cache-to: type=gha,mode=max
deploy-staging:
name: Deploy to Staging
runs-on: ubuntu-latest
needs: build-image
environment: staging
steps:
- uses: actions/checkout@v4
- name: Configure AWS
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_STAGING_ROLE }}
aws-region: us-east-1
      - run: |
          # Register a new task definition revision pointing at the freshly
          # built image (helper script assumed), then roll the service
          ./scripts/register-task-def.sh staging ${{ needs.build-image.outputs.image }}
          aws ecs update-service \
            --cluster staging \
            --service myapp \
            --force-new-deployment
- run: aws ecs wait services-stable --cluster staging --services myapp
- run: ./scripts/smoke-test.sh https://staging.myapp.com
deploy-production:
name: Deploy to Production
runs-on: ubuntu-latest
needs: deploy-staging
environment:
name: production
url: https://myapp.com
steps:
- uses: actions/checkout@v4
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_PROD_ROLE }}
aws-region: us-east-1
      - run: |
          # Register a new task definition revision with the built image
          # (helper script assumed), then roll the service
          ./scripts/register-task-def.sh production ${{ needs.build-image.outputs.image }}
          aws ecs update-service \
            --cluster production \
            --service myapp \
            --force-new-deployment
- run: aws ecs wait services-stable --cluster production --services myapp
- name: Notify Slack
if: always()
uses: slackapi/slack-github-action@v1
with:
channel-id: deployments
payload: |
{
"text": "${{ job.status == 'success' && ':white_check_mark: Production deploy succeeded' || ':x: Production deploy FAILED' }}"
}
env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
FAQ
Q: What is the difference between CI, CD (Delivery), and CD (Deployment)?
Continuous Integration (CI) automatically builds and tests code on every commit. Continuous Delivery (CD) extends CI by automatically preparing a release artifact that is ready to deploy but requires manual approval to go live. Continuous Deployment goes one step further and automatically deploys every passing build to production without human intervention.
Q: How do I securely pass secrets in GitHub Actions?
Store sensitive values in Settings > Secrets and variables > Actions in your repository. Reference them in workflows using the ${{ secrets.YOUR_SECRET_NAME }} syntax. Never hard-code secrets in workflow files or print them in logs. For advanced cases, use GitHub OIDC to assume cloud roles without storing long-lived credentials at all.
Q: What is a matrix strategy in GitHub Actions?
A matrix strategy allows a single job definition to run across multiple combinations of variables (e.g., Node.js versions, operating systems). GitHub Actions fans out the job automatically, running all combinations in parallel. This is useful for cross-platform testing or multi-version compatibility checks without duplicating job definitions.
Q: How does Docker layer caching speed up CI builds?
Docker caches each layer of an image. If a layer and all preceding layers are unchanged, Docker reuses the cached result instead of rebuilding. In CI you can use --cache-from to pull a previously built image and use its layers as cache. Structuring your Dockerfile so dependency installation (slow, rarely changes) comes before source code copying (fast, changes frequently) maximizes cache hits.
Q: What is a blue-green deployment?
Blue-green deployment maintains two identical production environments called "blue" and "green". At any time, one environment serves live traffic. When deploying a new version, you deploy to the idle environment, run smoke tests, then switch the load balancer to route traffic to it. If problems occur, you instantly roll back by switching the load balancer back. This achieves zero-downtime deployments with a simple rollback path.
Q: How do I trigger a GitLab CI pipeline only for changed files in a monorepo?
Use the changes keyword under rules in your .gitlab-ci.yml. For example: rules: [{ if: '$CI_PIPELINE_SOURCE == "push"', changes: ["packages/api/**/*"] }]. This tells GitLab to run that job only when files under packages/api/ are modified. Combine with needs to build a dependency graph between jobs so downstream jobs only run if their upstream counterparts ran.
Q: What is a canary deployment and when should I use it?
A canary deployment routes a small percentage of real production traffic (e.g., 5%) to a new version while the rest continues to run the stable version. You monitor error rates, latency, and business metrics. If metrics look healthy you gradually increase the canary percentage until 100% of traffic runs the new version. Use canary deployments for high-traffic services where even a short outage is costly and you want to validate behavior under real load before full rollout.
Q: How do I optimize slow CI pipelines?
The main levers are: (1) Parallelism — split test suites across multiple runners. (2) Caching — cache dependency directories (node_modules, .gradle, ~/.cargo) between runs. (3) Path filtering — skip jobs for unrelated changes. (4) Fail-fast — cancel remaining matrix jobs when one fails. (5) Incremental builds — use tools like Nx, Turborepo, or Bazel to only rebuild affected packages. (6) Use faster runners — GitHub larger runners or self-hosted runners with SSDs can dramatically cut I/O-bound steps.
Summary and a Recommended Path
The best strategy for building a CI/CD pipeline is to start simple and add complexity only as pain points appear. For most teams, a sensible path is:
- Week 1: Set up basic CI (push → test) and keep the main branch always deployable
- Week 2: Add Docker builds pushed to a registry, plus automatic deployment to staging
- Week 3: Add dependency caching and matrix testing to speed up builds
- Week 4: Migrate to OIDC authentication to eliminate long-lived credentials; add notifications and a production approval gate
- Beyond: Explore blue-green/canary deployments, monorepo path filtering, and advanced security scanning as needed
The core principle: a failing pipeline is not a problem; it is a fast feedback mechanism that keeps broken code out of production. A pipeline that never fails is either testing nothing meaningful or protecting nothing. Invest in making failures fast, visible, and easy to fix, not in making the pipeline never fail.