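The Complete AWS Services Guide 2026: EC2, S3, RDS, Lambda, ECS/EKS, CloudFront, IAM, VPC, and Cost Optimization
A comprehensive walkthrough of core AWS services: compute (EC2, Lambda, ECS/EKS), storage (S3), databases (RDS, DynamoDB), networking (VPC, Route 53, CloudFront), security (IAM), messaging (SQS/SNS), monitoring (CloudWatch), and field-tested cost optimization strategies.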
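- EC2 provides elastic compute capacity; Spot Instances can cut costs by up to 90% compared with On-Demand
- S3 is designed for 99.999999999% (11 nines) durability and suits data at any scale
- RDS fits relational data; DynamoDB fits key-value and document workloads that need single-digit-millisecond latency
- Lambda bills per request and execution duration; ECS/EKS run containerized microservices
- VPC + security groups + IAM build a defense-in-depth security model
- CloudFront + Route 53 can bring response times for cached content under 100 ms for most users worldwide
- SQS/SNS decouple microservices; CloudWatch provides full-stack observability
- Reserved Instances + Savings Plans + auto scaling can trim a typical bill by 40-60%
- Choosing the right instance type and pricing model is the biggest cost lever
- Design everything for Multi-AZ high availability
- Least-privilege IAM is the cornerstone of security; use roles rather than long-lived access keys
- Serverless architectures (Lambda + DynamoDB + API Gateway) can reduce operational overhead to near zero
- Infrastructure as code (CloudFormation / CDK / Terraform) is a must for production
- The six pillars of the AWS Well-Architected Framework guide architecture decisions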
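1. The AWS Service Landscape
Amazon Web Services (AWS) is the largest cloud platform by market share, offering more than 200 services across compute, storage, databases, networking, security, analytics, machine learning, and more. Understanding the core services and how they compose is the foundation for building reliable, scalable cloud architectures.
| Category | Core services | Typical uses |
|---|---|---|
| Compute | EC2, Lambda, ECS, EKS | Web servers, APIs, batch processing, microservices |
| Storage | S3, EBS, EFS, Glacier | Object storage, block storage, file systems, archival |
| Database | RDS, DynamoDB, ElastiCache, Aurora | Relational, NoSQL, caching, high-performance OLTP |
| Networking | VPC, Route 53, CloudFront, ELB | Virtual networks, DNS, CDN, load balancing |
| Security | IAM, KMS, WAF, Shield | Identity management, encryption, web firewall, DDoS protection |
| Messaging / integration | SQS, SNS, EventBridge, Step Functions | Message queues, pub/sub, event-driven design, workflows |
| Monitoring | CloudWatch, X-Ray, CloudTrail | Metrics, logs, distributed tracing, auditing |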
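2. EC2: The Foundation of Elastic Compute
Amazon EC2 (Elastic Compute Cloud) provides on-demand, resizable compute capacity. From a single development server to thousands of high-performance computing nodes, EC2 covers the range. Understanding instance types, pricing models, and auto scaling is the key to controlling cost and guaranteeing performance.
Choosing an instance type
| Family | Optimized for | Example instance | Use cases |
|---|---|---|---|
| T3 / T4g | Burstable performance | t3.medium | Development environments, lightweight web servers |
| M6i / M7g | General purpose | m6i.xlarge | Application servers, mid-sized databases |
| C6i / C7g | Compute | c6i.2xlarge | Batch processing, scientific modeling, transcoding |
| R6i / R7g | Memory | r6i.4xlarge | In-memory databases, real-time big data |
| P4d / G5 | GPU acceleration | p4d.24xlarge | ML training, graphics rendering, HPC |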
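EC2 pricing models compared
| Model | Discount vs. On-Demand | Commitment | Best for |
|---|---|---|---|
| On-Demand | None (baseline) | None | Unpredictable, short-lived workloads |
| Reserved (RI) | Up to 72% | 1 or 3 years | Steady-state baseline workloads |
| Spot | Up to 90% | None (can be interrupted) | Fault-tolerant batch jobs, CI/CD, big data |
| Savings Plans | Up to 72% (up to 66% for Compute Savings Plans) | 1- or 3-year $/hour commitment | Flexibility across instance types and Regions (Compute Savings Plans) |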
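To make the table concrete, the following is a minimal sketch of how a blended monthly bill could be estimated when a fleet is split across pricing models. The hourly rate and the discount fractions are hypothetical inputs chosen for illustration, not AWS prices, and `Slice` and `monthly_cost` are names invented for this example.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


@dataclass
class Slice:
    """A group of identical instances bought under one pricing model."""
    instances: int
    discount: float  # fraction off the On-Demand rate, between 0.0 and 1.0


def monthly_cost(on_demand_hourly: float, slices: list[Slice]) -> float:
    """Estimate the monthly cost of a fleet split across pricing models."""
    total = 0.0
    for s in slices:
        if not 0.0 <= s.discount <= 1.0:
            raise ValueError("discount must be between 0.0 and 1.0")
        effective_rate = on_demand_hourly * (1.0 - s.discount)
        total += s.instances * effective_rate * HOURS_PER_MONTH
    return total
```

With a hypothetical $0.10/hour rate, ten On-Demand instances cost about $730 per month. Covering six of them with a 40% commitment discount and running four on Spot at a 70% discount brings the estimate to about $350, a saving of roughly 52%.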
Auto Scaling configuration example
# EC2 Auto Scaling group with a target-tracking policy (CloudFormation snippet)
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    LaunchTemplate:
      LaunchTemplateId: !Ref MyLaunchTemplate
      Version: !GetAtt MyLaunchTemplate.LatestVersionNumber
    MinSize: 2
    MaxSize: 10
    DesiredCapacity: 2
    # Spread instances across private subnets in two AZs
    VPCZoneIdentifier:
      - !Ref PrivateSubnet1
      - !Ref PrivateSubnet2
    TargetGroupARNs:
      - !Ref MyTargetGroup
    # Replace instances that fail the load balancer health check
    HealthCheckType: ELB
    HealthCheckGracePeriod: 300
ScalingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref AutoScalingGroup
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ASGAverageCPUUtilization
      # Keep average CPU across the group near 60%
      TargetValue: 60.0
3. S3: Object Storage That Scales Without Limit
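Amazon S3 (Simple Storage Service) offers industry-leading durability (designed for 99.999999999%), availability, and security. A bucket can hold any number of objects, each ranging from a few bytes up to 5 TB, and S3 is widely used for static website hosting, data lakes, backups, and media distribution.
S3 storage classes and cost
| Storage class | Retrieval latency | Minimum storage duration | Use case |
|---|---|---|---|
| S3 Standard | Milliseconds | None | Frequently accessed data |
| S3 Intelligent-Tiering | Milliseconds | None | Unknown or changing access patterns |
| S3 Standard-IA | Milliseconds | 30 days | Infrequent access that still needs fast retrieval |
| S3 Glacier Instant Retrieval | Milliseconds | 90 days | Archives accessed about once a quarter |
| S3 Glacier Deep Archive | 12-48 hours (standard within 12 hours, bulk within 48 hours) | 180 days | Long-term compliance archives |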
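The minimum storage duration column matters for cost: deleting or transitioning an object early is still billed for the full minimum. The sketch below illustrates that rule with the durations from the table. The dictionary keys follow the S3 API storage class names, while `billable_days` is a helper invented for this example.

```python
# Minimum storage duration in days, taken from the table above.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "INTELLIGENT_TIERING": 0,
    "STANDARD_IA": 30,
    "GLACIER_IR": 90,
    "DEEP_ARCHIVE": 180,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Return the number of days charged for an object kept `days_stored` days."""
    if days_stored < 0:
        raise ValueError("days_stored must not be negative")
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])
```

For example, an object removed from Standard-IA after 10 days is charged as if it had been stored for 30 days, so short-lived data usually belongs in S3 Standard.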
S3 lifecycle policy example
# AWS CLI: S3 lifecycle configuration
# Objects under logs/ move to cheaper classes at 30, 90, and 365 days
# and expire after 2555 days (about 7 years)
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-data-lake \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "ArchiveOldLogs",
        "Status": "Enabled",
        "Filter": { "Prefix": "logs/" },
        "Transitions": [
          { "Days": 30, "StorageClass": "STANDARD_IA" },
          { "Days": 90, "StorageClass": "GLACIER" },
          { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
        ],
        "Expiration": { "Days": 2555 }
      }
    ]
  }'
4. RDS and DynamoDB: Relational vs. NoSQL
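Choosing a database is one of the most consequential architecture decisions. AWS offers fully managed relational databases (RDS/Aurora) and a NoSQL database (DynamoDB), each with its own sweet spot.
RDS / Aurora overview
Amazon RDS supports MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server. Aurora is AWS's own MySQL- and PostgreSQL-compatible engine; AWS claims up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL, and its storage grows automatically up to 128 TiB.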
# Create a Multi-AZ PostgreSQL RDS instance
# "admin" is a reserved word for the PostgreSQL engine, so a different master username is used.
# SECURE_PASSWORD is expected to be set in the shell environment.
aws rds create-db-instance \
  --db-instance-identifier myapp-db \
  --db-instance-class db.r6g.xlarge \
  --engine postgres \
  --engine-version 15.4 \
  --master-username dbadmin \
  --master-user-password "${SECURE_PASSWORD}" \
  --allocated-storage 100 \
  --storage-type gp3 \
  --multi-az \
  --vpc-security-group-ids sg-0abc1234 \
  --db-subnet-group-name myapp-db-subnets \
  --backup-retention-period 7 \
  --storage-encrypted
DynamoDB key concepts
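DynamoDB is a fully serverless key-value and document database with single-digit-millisecond response times. Its core concepts are the partition key, the sort key, global secondary indexes (GSI), and local secondary indexes (LSI). Capacity comes in two modes: on-demand and provisioned.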
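As an illustration of how partition and sort keys are typically composed in a single-table design, here is a small sketch. The `ORDER#<id>` / `METADATA` key shape mirrors the Lambda handler shown later in this guide; the `CUSTOMER#<id>` value for `GSI1PK`, the `total_cents` attribute, and both helper functions are assumptions made for this example.

```python
def order_key(order_id: str) -> dict[str, str]:
    """Primary key for an order's metadata item (partition key + sort key)."""
    return {"PK": f"ORDER#{order_id}", "SK": "METADATA"}


def order_item(order_id: str, customer_id: str, total_cents: int) -> dict:
    """Full item, including the GSI partition key used to look up orders by customer."""
    item: dict = dict(order_key(order_id))
    item["GSI1PK"] = f"CUSTOMER#{customer_id}"  # hypothetical access pattern
    item["total_cents"] = total_cents
    return item
```

Prefixing keys with an entity type keeps different item kinds apart inside one table, and writing `GSI1PK` on each order lets a query against the index return all orders for one customer without a scan.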
# DynamoDB table with a GSI (CloudFormation)
OrdersTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: Orders
    BillingMode: PAY_PER_REQUEST # On-Demand
    # Only attributes used in a key schema are declared here
    AttributeDefinitions:
      - AttributeName: PK
        AttributeType: S
      - AttributeName: SK
        AttributeType: S
      - AttributeName: GSI1PK
        AttributeType: S
    KeySchema:
      - AttributeName: PK
        KeyType: HASH
      - AttributeName: SK
        KeyType: RANGE
    GlobalSecondaryIndexes:
      - IndexName: GSI1
        KeySchema:
          - AttributeName: GSI1PK
            KeyType: HASH
          - AttributeName: SK
            KeyType: RANGE
        Projection:
          ProjectionType: ALL
    PointInTimeRecoverySpecification:
      PointInTimeRecoveryEnabled: true
RDS vs. DynamoDB decision matrix
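| Dimension | RDS / Aurora | DynamoDB |
|---|---|---|
| Data model | Relational / SQL | Key-value / document |
| Latency | Low milliseconds (depends on query complexity) | Single-digit milliseconds (consistent) |
| Scaling | Vertical (read replicas for read scaling) | Automatic horizontal scaling |
| Transactions | Full ACID | Limited transactions (up to 100 items / 4 MB) |
| Operations | Instance sizing and maintenance windows required | Fully serverless |
5. Lambda: Event-Driven Serverless Compute
AWS Lambda runs your code without any servers to manage. You upload the code, and Lambda allocates compute, executes it, and scales automatically. You pay only for the compute time you actually use, billed in 1 ms increments, with no compute charge while the function is idle.
Lambda function best practices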
# Python Lambda with best practices
import json
import os

import boto3
from aws_lambda_powertools import Logger, Metrics, Tracer
from aws_lambda_powertools.utilities.typing import LambdaContext

# Initialize outside the handler so warm invocations reuse these objects
logger = Logger()
tracer = Tracer()
metrics = Metrics()  # namespace is read from POWERTOOLS_METRICS_NAMESPACE
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])


@logger.inject_lambda_context
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def handler(event, context: LambdaContext):
    """Process API Gateway request."""
    # Reject malformed input with a 400 instead of a generic 500
    try:
        body = json.loads(event["body"])
        order_id = body["order_id"]
    except (KeyError, TypeError, json.JSONDecodeError):
        return {"statusCode": 400, "body": "Invalid request"}
    try:
        # Single-table design: look up the order by its primary key
        response = table.get_item(
            Key={"PK": f"ORDER#{order_id}", "SK": "METADATA"}
        )
        if "Item" not in response:
            return {"statusCode": 404, "body": "Not found"}
        metrics.add_metric(name="OrderLookup", unit="Count", value=1)
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            # default=str serializes DynamoDB Decimal values
            "body": json.dumps(response["Item"], default=str),
        }
    except Exception:
        logger.exception("Failed to process request")
        return {"statusCode": 500, "body": "Internal error"}
Key Lambda limits: a maximum execution time of 15 minutes, a deployment package of 250 MB (unzipped), memory from 128 MB to 10 GB, and a default concurrency of 1,000 (which can be raised on request). Cold-start latency depends on the runtime and the package size; Provisioned Concurrency, or SnapStart on supported runtimes such as Java, can largely avoid it.
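The limits above lend themselves to a quick pre-deployment check. The following is a minimal sketch that validates a function configuration against them; the constants come from the limits listed in this section, while `check_lambda_config` and its parameter names are invented for this example and are not part of any AWS SDK.

```python
MAX_TIMEOUT_SECONDS = 15 * 60      # 15-minute maximum execution time
MIN_MEMORY_MB = 128
MAX_MEMORY_MB = 10_240             # 10 GB
MAX_UNZIPPED_PACKAGE_MB = 250


def check_lambda_config(
    timeout_seconds: int, memory_mb: int, unzipped_package_mb: float
) -> list[str]:
    """Return a list of limit violations; an empty list means the config fits."""
    problems = []
    if not 1 <= timeout_seconds <= MAX_TIMEOUT_SECONDS:
        problems.append(f"timeout must be 1-{MAX_TIMEOUT_SECONDS} seconds")
    if not MIN_MEMORY_MB <= memory_mb <= MAX_MEMORY_MB:
        problems.append(f"memory must be {MIN_MEMORY_MB}-{MAX_MEMORY_MB} MB")
    if unzipped_package_mb > MAX_UNZIPPED_PACKAGE_MB:
        problems.append(f"package exceeds {MAX_UNZIPPED_PACKAGE_MB} MB unzipped")
    return problems
```

A workload that fails this check, for example a job that needs an hour to run, is a candidate for containers rather than Lambda.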
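6. ECS and EKS: Container Orchestration
When a workload outgrows Lambda's limits (long-running processes, large memory, custom runtimes), containers are the next option. AWS offers two orchestration services: ECS (AWS-native) and EKS (managed Kubernetes). Both support the serverless Fargate launch type.
ECS vs. EKS
| Dimension | ECS | EKS |
|---|---|---|
| Learning curve | Low: AWS-native concepts | High: requires Kubernetes knowledge |
| Multi-cloud portability | AWS only | Yes (standard Kubernetes) |
| Control plane cost | Free | ~$73/month per cluster |
| Ecosystem | Deep integration with AWS services | CNCF / Helm / Istio and others |
| Fargate | Fully supported | Supported (with some limitations) |
ECS Fargate task definition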
# ECS Fargate task definition (CloudFormation)
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: myapp-api
    Cpu: "512"
    Memory: "1024"
    NetworkMode: awsvpc
    RequiresCompatibilities: [FARGATE]
    # Execution role: pulls the image and writes logs; task role: used by the app itself
    ExecutionRoleArn: !GetAtt ECSExecutionRole.Arn
    TaskRoleArn: !GetAtt ECSTaskRole.Arn
    ContainerDefinitions:
      - Name: api
        Image: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/myapp:latest"
        PortMappings:
          - ContainerPort: 8080
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: /ecs/myapp-api
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: ecs
        Environment:
          - Name: NODE_ENV
            Value: production
        # Injected at start-up from Secrets Manager rather than stored in the template
        Secrets:
          - Name: DB_PASSWORD
            ValueFrom: !Ref DbPasswordSecret
7. CloudFront and Route 53: Global Content Delivery and DNS
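CloudFront is AWS's CDN. It caches content at more than 400 edge locations worldwide, so cached responses are served from a point of presence close to the user and latency drops sharply compared with fetching from the origin. Route 53 is a highly available DNS service that combines domain registration, DNS routing, and health checks.
Route 53 routing policies
| Policy | Description | Scenario |
|---|---|---|
| Simple | Standard routing to a single resource | Single endpoint |
| Weighted | Splits traffic by weight | Blue/green and canary deployments |
| Latency-based | Routes to the Region with the lowest latency | Multi-Region applications |
| Failover | Automatic failover driven by health checks | Active-passive disaster recovery |
| Geolocation | Routes by the user's geographic location | Compliance and content localization |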
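Weighted routing is the policy that most benefits from a worked example: each record receives a share of traffic equal to its weight divided by the sum of all weights. The sketch below models that arithmetic for a canary rollout. The record names and both functions are hypothetical, and this is a simulation of the rule, not a call to the Route 53 API.

```python
import random


def traffic_share(weights: dict[str, int]) -> dict[str, float]:
    """Expected fraction of DNS responses that return each weighted record."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one record needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


def pick_record(weights: dict[str, int], rng: random.Random) -> str:
    """Simulate one DNS answer by sampling a record in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

With weights of 90 for the current version and 10 for the canary, about 10% of lookups resolve to the canary. Setting a record's weight to 0 stops new traffic to it, which is how a rollback would be modeled here.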
CloudFront distribution configuration
# CloudFront with an S3 origin (CloudFormation)
CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Origins:
        - Id: S3Origin
          DomainName: !GetAtt MyBucket.RegionalDomainName
          S3OriginConfig:
            OriginAccessIdentity: "" # empty because OAC replaces the legacy OAI
          # OriginAccessControlId is a property of the origin, not of S3OriginConfig
          OriginAccessControlId: !Ref OAC
      DefaultCacheBehavior:
        TargetOriginId: S3Origin
        ViewerProtocolPolicy: redirect-to-https
        CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6 # CachingOptimized
        Compress: true
      ViewerCertificate:
        AcmCertificateArn: !Ref Certificate
        SslSupportMethod: sni-only
        MinimumProtocolVersion: TLSv1.2_2021
      Enabled: true
      HttpVersion: http2and3
      PriceClass: PriceClass_100
8. IAM: Identity and Access Management
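IAM is the foundation of AWS security. It controls who (which identity) can perform which actions on which resources. IAM policies are JSON documents that define an Effect (Allow/Deny), Action, Resource, and an optional Condition. Following the principle of least privilege is the first rule of secure operations.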
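The way Effect, Action, and Resource combine can be sketched in a few lines: a request is denied by default, allowed if a matching Allow statement exists, and always denied if any matching statement says Deny. The evaluator below is a deliberately simplified model of that rule. It ignores Condition, NotAction, principals, permission boundaries, and SCPs, matches case-sensitively (real IAM action names are case-insensitive), and `EXAMPLE_POLICY` is a hypothetical policy made up for this example.

```python
from fnmatch import fnmatchcase

EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::my-app-data/*"},
        {"Effect": "Deny", "Action": "s3:*",
         "Resource": "arn:aws:s3:::my-app-data/secret/*"},
    ],
}


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def _matches(statement: dict, action: str, resource: str) -> bool:
    """True when both the action and the resource match a wildcard pattern."""
    return any(fnmatchcase(action, p) for p in _as_list(statement["Action"])) and any(
        fnmatchcase(resource, p) for p in _as_list(statement["Resource"])
    )


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Default deny; explicit Deny beats Allow; otherwise any Allow grants access."""
    matching = [s for s in policy["Statement"] if _matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)
```

In this model a read of `my-app-data/report.csv` is allowed, a read under `secret/` is blocked by the explicit Deny, and a write anywhere is refused because nothing allows it.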
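IAM best practices checklist
1) Enable MFA on the root account and lock it away; do not use it for day-to-day work.
2) Create a separate IAM user or role for each person and service.
3) Authorize EC2 and Lambda with IAM roles rather than access keys.
4) Attach policies to groups rather than to individual users.
5) Use AWS Organizations with SCPs for cross-account governance.
6) Audit permissions regularly with IAM Access Analyzer.
7) Manage local credentials with aws-vault or SSO.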
# Least-privilege IAM policy example
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3ReadOnly",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-app-data",
        "arn:aws:s3:::my-app-data/*"
      ]
    },
    {
      "Sid": "AllowDynamoDBCRUD",
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
        "dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["${aws:PrincipalTag/tenant_id}"]
        }
      }
    }
  ]
}
9. VPC: Network Isolation and Architecture
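A VPC (Virtual Private Cloud) is an isolated virtual network inside AWS. A production-grade VPC usually contains public and private subnets across multiple Availability Zones, an internet gateway, NAT gateways, security groups, and network ACLs. Sound VPC design is the foundation of both security and availability.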
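Subnet planning is mostly CIDR arithmetic, which Python's standard `ipaddress` module can check before anything is deployed. The sketch below carves equal-sized subnets out of a VPC range and verifies that a proposed set of subnets sits inside the VPC without overlapping. Both function names are invented for this example, and the CIDR values in the usage note are the ones used in the Terraform configuration later in this section.

```python
import ipaddress
from itertools import islice


def carve_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list[str]:
    """Return the first `count` subnets of size /new_prefix inside the VPC range."""
    network = ipaddress.ip_network(vpc_cidr)
    return [str(subnet) for subnet in islice(network.subnets(new_prefix=new_prefix), count)]


def inside_without_overlap(vpc_cidr: str, subnet_cidrs: list[str]) -> bool:
    """True when every subnet lies within the VPC and no two subnets overlap."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    inside = all(net.subnet_of(vpc) for net in nets)
    disjoint = all(
        not a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]
    )
    return inside and disjoint
```

Running `inside_without_overlap("10.0.0.0/16", [...])` on the nine /24 subnets of a three-tier layout is a cheap sanity check to add to a CI pipeline.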
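Security groups vs. network ACLs
| Feature | Security group | Network ACL |
|---|---|---|
| Scope | Instance / ENI level | Subnet level |
| State | Stateful | Stateless |
| Rule types | Allow rules only | Allow and deny rules |
| Evaluation | All rules evaluated together | Rules matched in ascending number order; first match wins |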
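The difference in evaluation order is easiest to see in code. The following sketch models only the port dimension of a rule (protocol, CIDR, and direction are left out for brevity), and `AclRule`, `nacl_allows`, and `security_group_allows` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    number: int      # lower numbers are evaluated first
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules: list[AclRule], port: int) -> bool:
    """Network ACL: the lowest-numbered matching rule decides; otherwise deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # implicit final deny when nothing matches


def security_group_allows(allowed_ranges: list[tuple[int, int]], port: int) -> bool:
    """Security group: allow rules only, so any matching range grants access."""
    return any(low <= port <= high for low, high in allowed_ranges)
```

Because the first match wins, a deny for port 22 only takes effect in a network ACL if it carries a lower number than a broader allow rule. In a security group the order of rules never matters.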
Production-grade VPC architecture
# Three-tier VPC architecture (Terraform)
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.0"

  name = "production-vpc"
  cidr = "10.0.0.0/16"

  azs              = ["us-east-1a", "us-east-1b", "us-east-1c"]
  public_subnets   = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  private_subnets  = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
  database_subnets = ["10.0.21.0/24", "10.0.22.0/24", "10.0.23.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true # One NAT gateway per AZ for HA

  enable_dns_hostnames = true
  enable_dns_support   = true

  # VPC Flow Logs
  enable_flow_log                      = true
  create_flow_log_cloudwatch_log_group = true
  create_flow_log_iam_role             = true

  tags = {
    Environment = "production"
    Terraform   = "true"
  }
}
10. SQS and SNS: Asynchronous Messaging and Notifications
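Loose coupling is a key principle of microservice architecture. SQS (message queues) and SNS (publish/subscribe) are the two core AWS services for asynchronous communication. They are often combined: SNS fans a message out to several SQS queues so that consumers can process it in parallel.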
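SQS standard queues vs. FIFO queues
| Feature | Standard | FIFO |
|---|---|---|
| Throughput | Nearly unlimited | 300 TPS (3,000 with batching) |
| Ordering | Best effort | Strict ordering |
| Delivery | At least once | Exactly-once processing |
| Best for | High-throughput workloads | Order processing, financial transactions |
SNS + SQS fan-out pattern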
# SNS topic fans out to multiple SQS queues (CloudFormation snippet)
# PaymentDLQ is assumed to be defined elsewhere in the template.
# Each queue also needs an AWS::SQS::QueuePolicy that lets the topic send to it (not shown).
OrderTopic:
  Type: AWS::SNS::Topic
  Properties:
    TopicName: order-events
PaymentQueue:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: payment-processing
    VisibilityTimeout: 300
    # After 3 failed receives, move the message to the dead-letter queue
    RedrivePolicy:
      deadLetterTargetArn: !GetAtt PaymentDLQ.Arn
      maxReceiveCount: 3
InventoryQueue:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: inventory-update
    VisibilityTimeout: 300
NotificationQueue:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: customer-notification
# Subscribe each queue to the topic (only the payment subscription is shown)
PaymentSub:
  Type: AWS::SNS::Subscription
  Properties:
    TopicArn: !Ref OrderTopic
    Protocol: sqs
    Endpoint: !GetAtt PaymentQueue.Arn
    # Deliver only messages whose event_type attribute matches
    FilterPolicy:
      event_type: [order_placed, order_updated]
11. CloudWatch: Full-Stack Observability
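CloudWatch provides metrics, logs, alarms, and dashboards, making it the unified observability platform for AWS. Combined with X-Ray for distributed tracing and CloudTrail for auditing API activity, it forms a complete observability trio.
Custom metrics and alarms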
# Python: publish a custom CloudWatch metric
from datetime import datetime, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")


def publish_order_metric(order_total: float, region: str):
    """Record one order's value as a data point, dimensioned by Region."""
    cloudwatch.put_metric_data(
        Namespace="MyApp/Orders",
        MetricData=[
            {
                "MetricName": "OrderValue",
                "Dimensions": [
                    {"Name": "Region", "Value": region},
                ],
                # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
                "Timestamp": datetime.now(timezone.utc),
                "Value": order_total,
                "Unit": "None",
                "StorageResolution": 60,  # standard (1-min)
            }
        ],
    )
CloudWatch alarm configuration
# CloudWatch alarm (CloudFormation)
HighCPUAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: high-cpu-api-cluster
    AlarmDescription: "CPU > 80% for 5 minutes"
    Namespace: AWS/ECS
    MetricName: CPUUtilization
    Dimensions:
      - Name: ClusterName
        Value: !Ref ECSCluster
      - Name: ServiceName
        Value: !GetAtt ECSService.Name
    # One 300-second period above the threshold triggers the alarm
    Statistic: Average
    Period: 300
    EvaluationPeriods: 1
    Threshold: 80
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref OpsNotificationTopic
    OKActions:
      - !Ref OpsNotificationTopic
CloudWatch Logs Insights provides a SQL-like, pipe-based query language for pinpointing problems quickly in large volumes of logs. Example query:
# Find the 10 slowest API requests (choose the time range, such as the last hour, when running the query)
fields @timestamp, @message
| filter @message like /duration/
| parse @message "duration=* ms" as duration_ms
| sort duration_ms desc
| limit 10
12. Cost Optimization: Cutting 40-60% from the Bill
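Cost optimization is not a one-off task; it is an ongoing operational practice. AWS provides a range of tools and strategies that help you cut spending substantially without sacrificing performance. The following is a checklist of field-tested strategies.
Compute cost optimization
| Strategy | Potential savings | Implementation effort |
|---|---|---|
| Right-sizing | 10-30% | Low |
| Savings Plans | Up to 72% | Low |
| Spot Instances | Up to 90% | Medium |
| Graviton (ARM) | 20-40% | Low to medium |
| Auto scaling | 20-50% | Medium |
| Migrating to Lambda/Fargate | 30-70% | High |
Storage and data transfer optimization
Storage optimization strategies:
1) Use S3 Intelligent-Tiering to move data with unpredictable access patterns between tiers automatically.
2) Enable S3 lifecycle policies to move old data to Glacier.
3) Replace gp2 EBS volumes with gp3 (equivalent performance at about 20% lower cost).
4) Regularly clean up unattached EBS volumes and expired snapshots.
5) Use CloudFront to reduce data transfer charges; serving from the edge is often cheaper than serving directly from S3 or EC2.
AWS cost management tools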
# Set up a monthly budget alert with the AWS CLI
# Sends an email when actual spend exceeds 80% of the 1,000 USD limit
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "MonthlySpend",
    "BudgetLimit": {
      "Amount": "1000",
      "Unit": "USD"
    },
    "BudgetType": "COST",
    "TimeUnit": "MONTHLY"
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {
          "SubscriptionType": "EMAIL",
          "Address": "ops-team@example.com"
        }
      ]
    }
  ]'
Cost optimization checklist
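- Use AWS Cost Explorer to analyze spending trends (by service, account, and tag)
- Enable AWS Compute Optimizer for right-sizing recommendations
- Run the cost optimization checks in Trusted Advisor
- Apply cost allocation tags to every resource (team, environment, project)
- Set AWS Budgets alerts (80%, 100%, and forecasted overspend)
- Review Savings Plans and RI coverage reports every month
- Clean up unused Elastic IPs, EBS volumes, old AMIs, and idle load balancers
- Weigh multi-Region against single-Region designs; data transfer charges are a hidden cost driver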
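The budget alert item in the checklist (80%, 100%, and forecasted overspend) boils down to a few comparisons. The sketch below shows that logic in isolation. It is an illustration of the thresholds, not the AWS Budgets API, and the function name and the alert label format are invented for this example.

```python
def triggered_alerts(
    limit: float,
    actual: float,
    forecast: float,
    thresholds_pct: tuple[float, ...] = (80.0, 100.0),
) -> list[str]:
    """Return the alerts that would fire for the current month."""
    if limit <= 0:
        raise ValueError("budget limit must be positive")
    used_pct = actual / limit * 100
    # One alert per threshold that actual spend has already crossed
    alerts = [f"ACTUAL>{t:g}%" for t in thresholds_pct if used_pct > t]
    # Forecast alert: projected month-end spend exceeds the budget
    if forecast > limit:
        alerts.append("FORECASTED>100%")
    return alerts
```

With a 1,000 USD budget, 850 USD spent so far, and a 1,200 USD forecast, both the 80% alert and the forecast alert fire, which gives the team time to react before the limit is actually reached.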
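13. AWS Well-Architected Reference Architecture
The AWS Well-Architected Framework defines six pillars: operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability. Below is a typical three-tier web application architecture that combines the core services covered in this guide.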
# Three-tier architecture overview
#
# Internet
#   |-- Route 53 (DNS, latency-based routing)
#   |-- CloudFront (CDN, static assets + API caching)
#   |-- WAF (rate limiting, SQL injection protection)
#   |
# VPC (10.0.0.0/16)
#   |
#   |-- Public Subnets (3 AZs)
#   |     |-- ALB (Application Load Balancer)
#   |     |-- NAT Gateways
#   |
#   |-- Private Subnets (3 AZs)
#   |     |-- ECS Fargate / EKS (app containers)
#   |     |-- Lambda (async processing)
#   |     |-- SQS queues + SNS topics (regional services, reached via VPC endpoints)
#   |
#   |-- Database Subnets (3 AZs)
#         |-- Aurora PostgreSQL (Multi-AZ)
#         |-- ElastiCache Redis (cluster mode)
#         |-- DynamoDB (session store; regional service, reached via a gateway endpoint)
#
# Observability:
#   CloudWatch Metrics + Logs + Alarms
#   X-Ray distributed tracing
#   CloudTrail API audit logging
#
# Security:
#   IAM roles (least privilege)
#   KMS encryption at rest, TLS in transit
#   Secrets Manager (DB creds, API keys)
#   GuardDuty (threat detection)
Pillar quick reference
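| Pillar | Key practices | Core services |
|---|---|---|
| Operational excellence | IaC, CI/CD, runbooks | CloudFormation, CodePipeline, Systems Manager |
| Security | Least privilege, encryption, auditing | IAM, KMS, GuardDuty, CloudTrail |
| Reliability | Multi-AZ, automatic recovery, backups | Auto Scaling, Route 53, S3 Cross-Region Replication |
| Performance efficiency | Right instance types, caching, CDN | CloudFront, ElastiCache, Graviton |
| Cost optimization | Reserved/Spot capacity, right-sizing, lifecycle policies | Savings Plans, Compute Optimizer, S3 Intelligent-Tiering |
| Sustainability | Right-sizing, high utilization, managed services | Graviton, Auto Scaling, Lambda |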
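Summary
The AWS ecosystem is vast, but it follows clear patterns. From compute (EC2/Lambda/ECS) to storage (S3), databases (RDS/DynamoDB), networking (VPC/CloudFront/Route 53), security (IAM), messaging (SQS/SNS), and monitoring (CloudWatch), every service has a scenario it fits best. The key is to pick the combination of services that matches your workload's characteristics, follow the pillars of the Well-Architected Framework, and keep optimizing cost. Start small, scale on demand, and lean on managed services: that is the core idea behind AWS cloud architecture.
Frequently Asked Questions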
What is the difference between EC2 On-Demand, Reserved, and Spot Instances?
On-Demand Instances are billed per second with no commitment, which makes them ideal for unpredictable workloads. Reserved Instances offer discounts of up to 72% in exchange for a 1- or 3-year commitment and are best for steady-state usage. Spot Instances offer discounts of up to 90% by using spare EC2 capacity but can be interrupted with a 2-minute notice, so they suit fault-tolerant batch jobs, CI/CD, and data processing.
When should I use RDS vs DynamoDB?
Use RDS when you need relational data with complex joins, transactions, and SQL support (MySQL, PostgreSQL, Oracle, SQL Server). Use DynamoDB for key-value or document workloads requiring single-digit millisecond latency at any scale, such as gaming leaderboards, session stores, IoT data, and e-commerce carts. DynamoDB is fully managed and serverless, while RDS requires instance sizing and maintenance windows.
How do I choose between ECS and EKS for container orchestration?
Choose ECS if you want a simpler, AWS-native container orchestration service with deep integration into other AWS services and no additional management overhead. Choose EKS if you need Kubernetes compatibility, want to run the same workloads across AWS and other clouds, or already have Kubernetes expertise. Both support Fargate for serverless containers, eliminating the need to manage underlying EC2 instances.
What are the best practices for AWS IAM security?
Follow the principle of least privilege — grant only permissions needed. Enable MFA on all accounts, especially root. Use IAM roles instead of long-lived access keys. Implement Service Control Policies in AWS Organizations. Rotate credentials regularly. Use IAM Access Analyzer to identify unused permissions. Never embed credentials in code; use IAM roles for EC2/Lambda or AWS Secrets Manager.
How does Amazon VPC work and what are the key components?
A VPC is an isolated virtual network within AWS. Key components include subnets (public and private), route tables, Internet Gateway (for public internet access), NAT Gateway (for private subnet outbound access), security groups (stateful instance-level firewall), and NACLs (stateless subnet-level firewall). Best practice is to use multiple Availability Zones with public subnets for load balancers and private subnets for application servers and databases.
What is the difference between SQS and SNS?
SQS (Simple Queue Service) is a message queue for decoupling producers and consumers: consumers pull messages, and each message is normally handled by a single consumer (standard queues deliver at least once, so handlers should be idempotent). SNS (Simple Notification Service) is a pub/sub system that pushes messages to multiple subscribers simultaneously (Lambda, SQS, HTTP, email, SMS). Use SQS for task queues and work distribution, SNS for fan-out notifications. They are often used together: SNS publishes to multiple SQS queues for parallel processing.
How can I reduce my AWS bill by 40-60%?
Key strategies: 1) Use Reserved Instances or Savings Plans for steady workloads (up to 72% savings). 2) Leverage Spot Instances for fault-tolerant workloads (up to 90% savings). 3) Right-size instances using AWS Compute Optimizer. 4) Use S3 Intelligent-Tiering for automatic storage cost optimization. 5) Enable auto-scaling to match capacity to demand. 6) Delete unused EBS volumes, snapshots, and Elastic IPs. 7) Use AWS Cost Explorer and set billing alerts.
How do CloudFront and Route 53 work together for global applications?
Route 53 provides DNS resolution with health checks and routing policies (latency-based, geolocation, failover, weighted). CloudFront is a CDN that caches content at 400+ edge locations worldwide, so cached responses are served close to the user. Together, a Route 53 alias record points your domain at the CloudFront distribution, CloudFront's own request routing directs each user to a nearby edge location, and that edge serves cached content or fetches it from the origin. For cacheable content this combination can deliver sub-100 ms response times for most users, with failover handled by Route 53 health checks or CloudFront origin groups.