- CI builds and tests every commit; CD automates deployment to staging or production
- GitHub Actions uses YAML workflows triggered by events; GitLab CI uses .gitlab-ci.yml with stages
- Docker multi-stage builds produce small, secure images; layer caching keeps CI fast
- Blue-green and canary deployments enable zero-downtime releases with safe rollback
- Store secrets outside code using GitHub Secrets, HashiCorp Vault, or cloud secret managers
- Speed up pipelines with parallelism, dependency caching, and path-based filtering
- A failing pipeline is not a problem — it is a fast feedback loop that prevents broken code from reaching users
- Treat your pipeline configuration as production code: review it, test it, version it
- Start simple (push → test → deploy to staging) and add complexity only when the pain justifies it
- Invest in caching early; it is the single highest-ROI pipeline optimization
- OIDC-based authentication to cloud providers eliminates the need for long-lived credentials in CI
CI/CD (continuous integration and continuous delivery/deployment) has become a core pillar of modern software engineering. Teams that release quickly and reliably, and recover fast from failures, share one trait: well-designed automated pipelines. This guide walks through building production-grade CI/CD workflows with GitHub Actions, GitLab CI, Docker, and proven deployment patterns.
Core CI/CD Concepts
Continuous Integration (CI)
Continuous integration means that every push to a shared repository automatically triggers a build and test run. The goal is to catch integration problems early and avoid "integration hell": the pile of conflicts that appears when several developers' long-lived branches are finally merged.
CI best practices: commit several times a day, keep build times under 10 minutes (slow feedback loops erode the habit), fix a broken main build immediately, and run CI on every branch.
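A job-level timeout makes the 10-minute budget enforceable rather than aspirational. A minimal GitHub Actions sketch (the job name and commands are illustrative):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10  # fail the job if it exceeds the feedback budget
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
```

A timeout that starts firing regularly is itself a useful signal that the suite needs caching, sharding, or pruning.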
Continuous Delivery (CD)
Continuous delivery builds on CI: every build that passes its tests is automatically packaged into a deployable artifact (a Docker image, JAR, zip, etc.) and deployed to a staging environment. Deploying to production still requires manual approval.
Continuous Deployment (CD)
Continuous deployment goes one step further: every commit that passes all tests and checks is deployed to production automatically, with no human intervention. This is the most mature CI/CD model, and it demands a thorough automated test suite, feature flags, and robust monitoring.
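Feature flags are what make continuous deployment tolerable: code ships dark, and flipping a flag, not deploying, activates the new path. A minimal sketch of the gating pattern (the in-memory store and flag name are illustrative; real systems use a flag service such as LaunchDarkly or Unleash):

```typescript
type FlagStore = Map<string, boolean>;

function isEnabled(flags: FlagStore, name: string): boolean {
  // Default to "off" so a missing or misconfigured flag fails safe
  return flags.get(name) ?? false;
}

const flags: FlagStore = new Map([["new-checkout", true]]);

// The deploy shipped both paths; the flag decides which one runs
const path = isEnabled(flags, "new-checkout") ? "new-checkout" : "legacy-checkout";
console.log(path); // → new-checkout
```

The fail-safe default matters in a continuous-deployment setting: a half-rolled-out flag config should degrade to the stable path, never to an error.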
| Aspect | CI | Continuous Delivery | Continuous Deployment |
|---|---|---|---|
| Trigger | Every push | Every passing build | Every passing build |
| Production deploy | Manual | Manual approval | Automatic |
| Human gate | Yes (deploy) | Yes (prod deploy) | None |
| Test suite required | Basic | Comprehensive | Very comprehensive |
GitHub Actions in Depth
GitHub Actions is GitHub's native CI/CD platform. Launched in 2018, it has become the default choice for open-source projects and many commercial ones. It is event-driven, supports thousands of community-maintained actions, and integrates deeply with GitHub Issues, PRs, Packages, and Releases.
Workflow Syntax Basics
Workflow files live in the <code>.github/workflows/</code> directory and are written in YAML:
# .github/workflows/ci.yml
name: CI Pipeline
# Triggers
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
workflow_dispatch: # manual trigger
# Environment variables available to all jobs
env:
NODE_VERSION: "20"
jobs:
test:
name: Run Tests
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Run linter
run: npm run lint
- name: Run unit tests
run: npm test -- --coverage
- name: Upload coverage
uses: codecov/codecov-action@v4
with:
          token: ${{ secrets.CODECOV_TOKEN }}
Matrix Strategy
A matrix strategy lets a single job definition run across multiple parameter combinations; GitHub Actions runs all combinations in parallel:
jobs:
test-matrix:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false # continue other combos if one fails
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
node: [18, 20, 22]
exclude:
- os: windows-latest
node: 18 # skip this specific combo
include:
- os: ubuntu-latest
node: 20
experimental: true # add extra property
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node }}
      - run: npm ci && npm test
Job Dependencies and Conditional Execution
jobs:
build:
runs-on: ubuntu-latest
outputs:
image-tag: ${{ steps.meta.outputs.tags }}
steps:
- uses: actions/checkout@v4
- id: meta
run: echo "tags=myapp:${{ github.sha }}" >> $GITHUB_OUTPUT
- run: docker build -t myapp:${{ github.sha }} .
deploy-staging:
needs: build
    if: github.ref == 'refs/heads/main'
environment: staging
runs-on: ubuntu-latest
steps:
- run: echo "Deploying ${{ needs.build.outputs.image-tag }} to staging"
deploy-prod:
needs: deploy-staging
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
environment:
name: production
url: https://myapp.com
runs-on: ubuntu-latest
steps:
      - run: echo "Deploying to production"
Secrets and OIDC Authentication
Avoid storing long-lived credentials in workflows. Use GitHub OIDC to obtain temporary credentials directly from AWS, GCP, or Azure:
jobs:
deploy:
runs-on: ubuntu-latest
permissions:
id-token: write # required for OIDC
contents: read
steps:
- name: Configure AWS credentials via OIDC
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
aws-region: us-east-1
# No ACCESS_KEY_ID or SECRET_ACCESS_KEY needed!
- name: Deploy to ECS
run: |
aws ecs update-service \
--cluster prod-cluster \
--service myapp \
            --force-new-deployment
A Complete Guide to GitLab CI/CD
GitLab CI centralizes pipeline configuration in a <code>.gitlab-ci.yml</code> file at the repository root. Pipelines are organized into stages: jobs in the same stage run in parallel, and stages execute in order.
A Complete .gitlab-ci.yml Example
# .gitlab-ci.yml
image: node:20-alpine
stages:
- install
- test
- build
- deploy
variables:
npm_config_cache: "$CI_PROJECT_DIR/.npm"
# Reusable cache configuration
.node-cache: &node-cache
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
install-deps:
stage: install
<<: *node-cache
script:
- npm ci
artifacts:
paths:
- node_modules/
expire_in: 1 hour
unit-tests:
stage: test
<<: *node-cache
script:
- npm test -- --coverage --ci
  coverage: /All files[^|]*\|[^|]*\s+([\d.]+)/
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
paths:
- coverage/
build-app:
stage: build
script:
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 week
only:
- main
- merge_requests
deploy-staging:
stage: deploy
environment:
name: staging
url: https://staging.myapp.com
script:
- echo "Deploying to staging..."
- ./scripts/deploy.sh staging
only:
- main
deploy-production:
stage: deploy
environment:
name: production
url: https://myapp.com
script:
- ./scripts/deploy.sh production
when: manual # requires human approval
only:
    - main
Rules and Path Filtering
The <code>rules</code> keyword gives fine-grained control over when a job runs, which is especially valuable in a monorepo:
build-api:
stage: build
script:
- cd packages/api && npm run build
rules:
# Run on merge request if api files changed
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
changes:
- packages/api/**/*
- packages/shared/**/*
# Always run on main branch pushes
- if: $CI_COMMIT_BRANCH == "main"
build-frontend:
stage: build
script:
- cd packages/frontend && npm run build
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
changes:
- packages/frontend/**/*
- packages/shared/**/*
    - if: $CI_COMMIT_BRANCH == "main"
GitLab Runner Types
| Runner Type | Use Case | Isolation | Notes |
|---|---|---|---|
| Shell | Legacy / simple jobs | None | Runs directly on host |
| Docker | Most workloads | Container | Clean env each run |
| Kubernetes | Scale-out / cloud-native | Pod | Auto-scales runner pods |
| Instance (SaaS) | gitlab.com users | VM | Free tier: 400 min/month |
Docker in CI/CD
Multi-Stage Builds
Multi-stage builds are the standard pattern for production Dockerfiles. They separate the build environment (compilers, test tooling, etc.) from the runtime environment (runtime dependencies only), yielding images that are smaller and expose a smaller attack surface.
# Dockerfile
# Stage 1: Install dependencies
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

# Stage 2: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: Runtime (smallest possible image)
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
# Copy only what is needed at runtime
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
# Run as non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
EXPOSE 3000
CMD ["node", "dist/server.js"]
Building and Pushing Images from GitHub Actions
jobs:
build-push:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write # for GHCR
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract Docker metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ghcr.io/${{ github.repository }}
tags: |
type=sha
type=ref,event=branch
type=semver,pattern={{version}}
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha # GitHub Actions cache
          cache-to: type=gha,mode=max
Docker Layer Caching Best Practices
Docker caches image layers from top to bottom. Once a layer is invalidated, every layer after it must be rebuilt, so the order of instructions in a Dockerfile is critical:
- Copy package.json and lock files FIRST, then run npm ci — this layer only invalidates when dependencies change
- Copy source code AFTER installing dependencies — source changes are frequent but fast to copy
- Use .dockerignore to exclude node_modules, .git, test files, and docs from the build context
- Use --mount=type=cache in BuildKit for package manager caches that persist across builds
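The last point deserves a sketch: with BuildKit, a cache mount keeps the package manager's download cache across builds without it ever entering an image layer (base image and paths below are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# The npm cache lives in a build-time cache mount: reused across builds,
# never written into the final image
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
RUN npm run build
```

Unlike ordinary layer caching, the cache mount survives even when the RUN layer itself is invalidated, so a lockfile change still reuses previously downloaded packages.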
Deployment Strategies in Detail
Rolling Deployment
A rolling deployment replaces instances with the new version incrementally. With 10 instances, for example, you might replace 2 at a time until all are updated. This is the default strategy for a Kubernetes Deployment. It is resource-efficient, but old and new versions coexist during the rollout, so APIs must remain backward compatible.
# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 2 # max extra pods during rollout
maxUnavailable: 0 # never reduce below desired count
template:
spec:
containers:
- name: myapp
image: ghcr.io/myorg/myapp:v2.1.0
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
            periodSeconds: 5
Blue-Green Deployment
Blue-green deployment maintains two identical production environments (blue and green); only one serves live traffic at any time. To deploy a new version, deploy it to the idle environment, run smoke tests, then switch the load balancer. If problems appear, rollback is immediate: just switch the load balancer back.
# blue-green deploy script
#!/bin/bash
set -euo pipefail
# The active target group is whatever the listener's default action points at
# (describe-target-groups does not return tags, so ask the listener instead)
CURRENT=$(aws elbv2 describe-target-groups \
  --target-group-arns "$(aws elbv2 describe-listeners \
    --listener-arns "$LISTENER_ARN" \
    --query 'Listeners[0].DefaultActions[0].TargetGroupArn' --output text)" \
  --query 'TargetGroups[0].TargetGroupName' --output text)
if [ "$CURRENT" = "myapp-blue" ]; then
NEW_TG="myapp-green"
OLD_TG="myapp-blue"
else
NEW_TG="myapp-blue"
OLD_TG="myapp-green"
fi
echo "Deploying to $NEW_TG"
# Update the idle target group
aws ecs update-service --cluster prod --service "$NEW_TG" \
--task-definition "myapp:$NEW_TASK_DEF_REVISION" \
--force-new-deployment
# Wait for stability
aws ecs wait services-stable --cluster prod --services "$NEW_TG"
# Run smoke tests
./scripts/smoke-test.sh "https://staging.myapp.com"
# Switch traffic
aws elbv2 modify-listener \
--listener-arn $LISTENER_ARN \
--default-actions "Type=forward,TargetGroupArn=$(aws elbv2 describe-target-groups --names $NEW_TG --query TargetGroups[0].TargetGroupArn --output text)"
echo "Successfully switched traffic to $NEW_TG"
Canary Deployment
A canary release routes a small share of real production traffic (e.g., 5%) to the new version while monitoring error rates, latency, and business metrics. If the metrics stay healthy, the share is increased step by step until it reaches 100%; if anything degrades, all traffic is switched back to the stable version immediately.
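What the traffic split looks like depends on your ingress layer. As one illustration, a weighted route in an Istio VirtualService sends 5% of traffic to a canary subset (Istio itself, plus the host and subset names, are assumptions; this guide's examples use ALB/ECS):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.prod.svc.cluster.local
  http:
    - route:
        - destination:
            host: myapp.prod.svc.cluster.local
            subset: stable
          weight: 95
        - destination:
            host: myapp.prod.svc.cluster.local
            subset: canary
          weight: 5   # increase gradually while metrics stay healthy
```

The promotion loop (bump the canary weight, watch metrics, repeat) is typically automated by a tool such as Argo Rollouts or Flagger rather than edited by hand.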
Testing Strategy in the Pipeline
The Test Pyramid
An effective CI test suite follows the test pyramid: many fast unit tests (milliseconds), a moderate number of integration tests (seconds), and a few end-to-end tests (minutes). The lower a test sits in the pyramid, the faster it runs, the cheaper it is to maintain, and the quicker its feedback.
# Full test pipeline with coverage gate
jobs:
unit-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
- run: npm run test:unit -- --coverage
- name: Check coverage threshold
run: |
COVERAGE=$(cat coverage/coverage-summary.json | \
jq ".total.lines.pct")
echo "Coverage: $COVERAGE%"
if (( $(echo "$COVERAGE < 80" | bc -l) )); then
echo "Coverage $COVERAGE% is below 80% threshold"
exit 1
fi
integration-tests:
runs-on: ubuntu-latest
needs: unit-tests
services:
postgres:
image: postgres:16
env:
POSTGRES_DB: testdb
POSTGRES_USER: testuser
POSTGRES_PASSWORD: testpass
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
redis:
image: redis:7-alpine
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
- run: npm run test:integration
env:
DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
Environment Management
A Typical Multi-Environment Setup
| Environment | Trigger | Approval | Purpose |
|---|---|---|---|
| Preview | Every PR | None | PR review, feature demo |
| Staging | Merge to main | None (auto) | QA, integration, UAT |
| Production | Tag / release | Required | Live user traffic |
Environment Variables and Secrets Management
Different environments need different configuration. A useful rule: non-sensitive settings (feature toggles, API endpoints) go in CI environment variables; sensitive data (database passwords, API keys) belongs in a dedicated secrets manager.
# Using HashiCorp Vault in GitHub Actions
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Import Secrets from Vault
uses: hashicorp/vault-action@v3
with:
url: https://vault.mycompany.com
method: jwt
role: github-actions
secrets: |
secret/data/myapp/prod database_url | DATABASE_URL ;
secret/data/myapp/prod redis_url | REDIS_URL ;
secret/data/myapp/prod stripe_key | STRIPE_SECRET_KEY
- name: Deploy
run: ./scripts/deploy.sh
env:
DATABASE_URL: ${{ env.DATABASE_URL }}
          REDIS_URL: ${{ env.REDIS_URL }}
Monorepo CI/CD
A monorepo (one repository containing multiple services or packages) poses a distinct CI/CD challenge: how do you avoid running a full build of every service on every commit? The answer is path filtering plus incremental build tools.
A Monorepo Pipeline with Turborepo
# .github/workflows/monorepo-ci.yml
name: Monorepo CI
on:
push:
branches: [main]
pull_request:
jobs:
changes:
runs-on: ubuntu-latest
outputs:
api: ${{ steps.filter.outputs.api }}
frontend: ${{ steps.filter.outputs.frontend }}
shared: ${{ steps.filter.outputs.shared }}
steps:
- uses: actions/checkout@v4
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
api:
- "packages/api/**"
- "packages/shared/**"
frontend:
- "packages/frontend/**"
- "packages/shared/**"
shared:
- "packages/shared/**"
test-api:
needs: changes
    if: needs.changes.outputs.api == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
- run: npx turbo run test --filter=@myapp/api...
test-frontend:
needs: changes
    if: needs.changes.outputs.frontend == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
      - run: npx turbo run test --filter=@myapp/frontend...
Pipeline Performance Optimization
Dependency Caching
# Cache node_modules across runs
- name: Cache node modules
uses: actions/cache@v4
with:
path: |
~/.npm
node_modules
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
# For Python projects
- uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: "pip"
# For Rust projects
- uses: Swatinem/rust-cache@v2
with:
workspaces: ". -> target"
# For Gradle (Android / Java)
- name: Cache Gradle
uses: actions/cache@v4
with:
path: |
~/.gradle/caches
~/.gradle/wrapper
    key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}
Parallelizing Tests
# Split test suite across 4 parallel jobs
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
shard: [1, 2, 3, 4]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: "20", cache: "npm" }
- run: npm ci
- name: Run test shard
run: |
npx jest \
--shard=${{ matrix.shard }}/4 \
--coverage \
            --ci
Notifications and Status Checks
Slack Notification Integration
# Notify Slack on deployment success or failure
- name: Notify Slack on success
if: success()
uses: slackapi/slack-github-action@v1
with:
channel-id: "deployments"
payload: |
{
"text": ":white_check_mark: Deployed to production",
"attachments": [{
"color": "good",
"fields": [
{ "title": "Version", "value": "${{ github.sha }}", "short": true },
{ "title": "Author", "value": "${{ github.actor }}", "short": true }
]
}]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
- name: Notify Slack on failure
if: failure()
uses: slackapi/slack-github-action@v1
with:
channel-id: "deployments"
payload: |
{
"text": ":x: Production deployment FAILED",
"attachments": [{
"color": "danger",
"fields": [
{ "title": "Run URL", "value": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}", "short": false }
]
}]
}
env:
    SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
GitHub Actions vs GitLab CI vs CircleCI
| Feature | GitHub Actions | GitLab CI | CircleCI |
|---|---|---|---|
| Config file | .github/workflows/*.yml | .gitlab-ci.yml | .circleci/config.yml |
| Free tier | 2,000 min/month | 400 min/month | 6,000 min/month |
| Marketplace | 20,000+ actions | Component catalog | Orbs registry |
| Self-hosted | Self-hosted runners | GitLab Runner | Self-hosted runners |
| Docker support | Services containers | Services + DinD | Native Docker layer |
| Caching | actions/cache | cache: keyword | restore_cache step |
| OIDC cloud auth | Yes (AWS/GCP/Azure) | Yes (ID tokens) | Yes (OIDC contexts) |
| Best for | GitHub-hosted repos | Self-hosted GitLab | Speed-focused teams |
| Parallelism | Matrix + jobs | Parallel + needs | Native parallel jobs |
A Complete Production-Grade Workflow
Below is a complete CI/CD workflow for a real-world Node.js application, covering everything from commit to production deployment:
# .github/workflows/production.yml
name: Production Pipeline
on:
push:
branches: [main]
pull_request:
branches: [main]
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true # cancel stale PR runs
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
quality:
name: Code Quality
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- run: npm ci
- run: npm run lint
- run: npm run typecheck
test:
name: Tests
runs-on: ubuntu-latest
needs: quality
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- run: npm ci
- run: npm test -- --coverage
- uses: codecov/codecov-action@v4
build-image:
name: Build Docker Image
runs-on: ubuntu-latest
needs: test
    if: github.event_name == 'push'
permissions:
contents: read
packages: write
id-token: write
outputs:
digest: ${{ steps.build.outputs.digest }}
image: ${{ steps.meta.outputs.tags }}
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- id: build
uses: docker/build-push-action@v5
with:
push: true
tags: ${{ steps.meta.outputs.tags }}
cache-from: type=gha
cache-to: type=gha,mode=max
deploy-staging:
name: Deploy to Staging
runs-on: ubuntu-latest
needs: build-image
environment: staging
steps:
- uses: actions/checkout@v4
- name: Configure AWS
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_STAGING_ROLE }}
aws-region: us-east-1
      - run: |
          # Register a new task definition revision pointing at the freshly
          # built image (helper script assumed), then roll the service
          ./scripts/register-task-def.sh staging ${{ needs.build-image.outputs.image }}
          aws ecs update-service \
            --cluster staging \
            --service myapp \
            --force-new-deployment
- run: aws ecs wait services-stable --cluster staging --services myapp
- run: ./scripts/smoke-test.sh https://staging.myapp.com
deploy-production:
name: Deploy to Production
runs-on: ubuntu-latest
needs: deploy-staging
environment:
name: production
url: https://myapp.com
steps:
- uses: actions/checkout@v4
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_PROD_ROLE }}
aws-region: us-east-1
      - run: |
          # Register a new task definition revision with the built image
          # (helper script assumed), then roll the service
          ./scripts/register-task-def.sh production ${{ needs.build-image.outputs.image }}
          aws ecs update-service \
            --cluster production \
            --service myapp \
            --force-new-deployment
- run: aws ecs wait services-stable --cluster production --services myapp
- name: Notify Slack
if: always()
uses: slackapi/slack-github-action@v1
with:
channel-id: deployments
payload: |
{
"text": "${{ job.status == 'success' && ':white_check_mark: Production deploy succeeded' || ':x: Production deploy FAILED' }}"
}
env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
FAQ
Q: What is the difference between CI, CD (Delivery), and CD (Deployment)?
Continuous Integration (CI) automatically builds and tests code on every commit. Continuous Delivery (CD) extends CI by automatically preparing a release artifact that is ready to deploy but requires manual approval to go live. Continuous Deployment goes one step further and automatically deploys every passing build to production without human intervention.
Q: How do I securely pass secrets in GitHub Actions?
Store sensitive values in Settings > Secrets and variables > Actions in your repository. Reference them in workflows using the ${{ secrets.YOUR_SECRET_NAME }} syntax. Never hard-code secrets in workflow files or print them in logs. For advanced cases, use GitHub OIDC to assume cloud roles without storing long-lived credentials at all.
Q: What is a matrix strategy in GitHub Actions?
A matrix strategy allows a single job definition to run across multiple combinations of variables (e.g., Node.js versions, operating systems). GitHub Actions fans out the job automatically, running all combinations in parallel. This is useful for cross-platform testing or multi-version compatibility checks without duplicating job definitions.
Q: How does Docker layer caching speed up CI builds?
Docker caches each layer of an image. If a layer and all preceding layers are unchanged, Docker reuses the cached result instead of rebuilding. In CI you can use --cache-from to pull a previously built image and use its layers as cache. Structuring your Dockerfile so dependency installation (slow, rarely changes) comes before source code copying (fast, changes frequently) maximizes cache hits.
Q: What is a blue-green deployment?
Blue-green deployment maintains two identical production environments called "blue" and "green". At any time, one environment serves live traffic. When deploying a new version, you deploy to the idle environment, run smoke tests, then switch the load balancer to route traffic to it. If problems occur, you instantly roll back by switching the load balancer back. This achieves zero-downtime deployments with a simple rollback path.
Q: How do I trigger a GitLab CI pipeline only for changed files in a monorepo?
Use the changes keyword under rules in your .gitlab-ci.yml. For example: rules: [{ if: '$CI_PIPELINE_SOURCE == "push"', changes: ["packages/api/**/*"] }]. This tells GitLab to run that job only when files under packages/api/ are modified. Combine with needs to build a dependency graph between jobs so downstream jobs only run if their upstream counterparts ran.
Q: What is a canary deployment and when should I use it?
A canary deployment routes a small percentage of real production traffic (e.g., 5%) to a new version while the rest continues to run the stable version. You monitor error rates, latency, and business metrics. If metrics look healthy you gradually increase the canary percentage until 100% of traffic runs the new version. Use canary deployments for high-traffic services where even a short outage is costly and you want to validate behavior under real load before full rollout.
Q: How do I optimize slow CI pipelines?
The main levers are: (1) Parallelism — split test suites across multiple runners. (2) Caching — cache dependency directories (node_modules, .gradle, ~/.cargo) between runs. (3) Path filtering — skip jobs for unrelated changes. (4) Fail-fast — cancel remaining matrix jobs when one fails. (5) Incremental builds — use tools like Nx, Turborepo, or Bazel to only rebuild affected packages. (6) Use faster runners — GitHub larger runners or self-hosted runners with SSDs can dramatically cut I/O-bound steps.
Summary and a Recommended Path
The best strategy for building a CI/CD pipeline is to start simple and add complexity only as pain points appear. For most teams, a sensible path is:
- Week 1: Set up basic CI (push → test) and keep the main branch always deployable
- Week 2: Add Docker builds pushed to a registry, plus automatic deployment to staging
- Week 3: Add dependency caching and matrix testing to speed up builds
- Week 4: Migrate to OIDC authentication to eliminate long-lived credentials; add notifications and a production approval gate
- Beyond: Explore blue-green/canary deployments, monorepo path filtering, and advanced security scanning as needed
The core principle: a failing pipeline is not a problem; it is a fast feedback mechanism that keeps broken code out of production. A pipeline that never fails is either testing nothing meaningful or protecting nothing. Invest in making failures fast, visible, and easy to fix, not in making the pipeline never fail.