SQL vs NoSQL：选择正确数据库的完整指南

在 SQL 和 NoSQL 数据库之间选择是最重要的架构决策之一。SQL 数据库使用结构化表格，NoSQL 数据库提供灵活的模式和水平扩展。

概述

根本区别在于数据如何组织和查询。SQL 使用表格，NoSQL 使用文档、键值对、宽列或图。

NoSQL 数据库类型

NoSQL 不是单一技术，而是具有不同数据模型的数据库家族。

NoSQL Database Types:

1. Document Databases (MongoDB, CouchDB, Firestore)
   - Store data as JSON/BSON documents
   - Flexible schema per document
   - Good for: content management, catalogs, user profiles
   - Example: { "name": "John", "orders": [...], "prefs": {...} }

2. Key-Value Stores (Redis, DynamoDB, Memcached)
   - Simple key → value pairs
   - Fastest reads/writes
   - Good for: caching, sessions, leaderboards, counters
   - Example: "user:123" → "{\"name\": \"John\"}"

3. Wide-Column Stores (Cassandra, ScyllaDB, HBase)
   - Tables with rows and dynamic columns
   - Optimized for writes and time-series data
   - Good for: IoT, logging, metrics, messaging
   - Example: row_key → {col1: val1, col2: val2, ...}

4. Graph Databases (Neo4j, ArangoDB, Amazon Neptune)
   - Nodes and edges (relationships are first-class)
   - Optimized for traversing relationships
   - Good for: social networks, recommendations, fraud detection
   - Example: (User)-[:FOLLOWS]->(User)

SQL vs NoSQL 对比

方面	SQL（关系型）	NoSQL
数据模型	行和列的表格	文档、键值、宽列、图
模式	严格（写时模式）	灵活（读时模式）
查询语言	SQL（标准化）	数据库特定
扩展	垂直扩展 + 只读副本	原生水平扩展
ACID 事务	完全 ACID	因数据库而异
连接	完整 JOIN 支持	有限或无连接
一致性	默认强一致性	最终一致性
性能	复杂查询优秀	大规模简单查询优秀
最适合	结构化数据、复杂查询	灵活数据、高写入、水平扩展

ACID vs BASE

SQL 遵循 ACID 模型，NoSQL 通常遵循 BASE 模型。

ACID (SQL Databases):
  A - Atomicity:    All operations succeed or all fail (no partial updates)
  C - Consistency:  Data always moves from one valid state to another
  I - Isolation:    Concurrent transactions don't interfere with each other
  D - Durability:   Committed data survives system failures

BASE (Many NoSQL Databases):
  BA - Basically Available:  System guarantees availability (may serve stale data)
  S  - Soft state:           State may change over time (even without input)
  E  - Eventual consistency: System will become consistent eventually

Practical Impact:
  ACID: "Transfer $100 from Account A to Account B"
    → Both debit and credit happen, or neither does
    → No money is lost or created
    → Other transactions see either the before or after state

  BASE: "Update user profile across 3 data centers"
    → All 3 data centers will eventually have the update
    → For a brief period, different data centers may show different data
    → Acceptable for social media posts, not for bank transfers

何时选择 SQL

结构化关系数据
复杂查询 — JOIN、子查询、聚合
ACID 事务 — 银行、电商
数据完整性 — 外键、约束

何时选择 NoSQL

灵活模式 — 快速原型
高写入吞吐 — IoT、日志
水平扩展 — PB 级数据
缓存 — Redis

代码示例

SQL (PostgreSQL)

-- PostgreSQL: Relational schema with strong typing
CREATE TABLE users (
  id          SERIAL PRIMARY KEY,
  name        VARCHAR(100) NOT NULL,
  email       VARCHAR(255) UNIQUE NOT NULL,
  role        VARCHAR(20) DEFAULT 'user',
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE posts (
  id          SERIAL PRIMARY KEY,
  user_id     INTEGER REFERENCES users(id) ON DELETE CASCADE,
  title       VARCHAR(300) NOT NULL,
  content     TEXT NOT NULL,
  tags        TEXT[],                    -- PostgreSQL array type
  metadata    JSONB DEFAULT '{}',        -- Flexible JSON column!
  published   BOOLEAN DEFAULT false,
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_posts_user ON posts(user_id);
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
CREATE INDEX idx_posts_metadata ON posts USING GIN(metadata);

-- Complex query: posts with author info and comment count
SELECT
  p.title,
  p.created_at,
  u.name AS author,
  COUNT(c.id) AS comment_count,
  p.metadata->>'views' AS views
FROM posts p
JOIN users u ON p.user_id = u.id
LEFT JOIN comments c ON c.post_id = p.id
WHERE p.published = true
  AND p.tags @> ARRAY['javascript']
GROUP BY p.id, u.name
ORDER BY p.created_at DESC
LIMIT 20;

-- ACID Transaction
BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
  INSERT INTO transfers (from_id, to_id, amount) VALUES (1, 2, 100);
COMMIT;

NoSQL (MongoDB)

// MongoDB: Document-based, flexible schema
// No schema definition needed — just insert documents

// Insert a user with embedded data
db.users.insertOne({
  name: "Jane Smith",
  email: "jane@example.com",
  role: "admin",
  profile: {
    bio: "Full-stack developer",
    skills: ["TypeScript", "React", "Node.js"],
    social: {
      twitter: "@janesmith",
      github: "janesmith",
    },
  },
  preferences: {
    theme: "dark",
    notifications: true,
  },
  createdAt: new Date(),
});

// Insert posts (embedded or referenced)
db.posts.insertOne({
  userId: ObjectId("user123"),
  title: "Getting Started with MongoDB",
  content: "MongoDB stores data as documents...",
  tags: ["mongodb", "database", "nosql"],
  metadata: { views: 1523, likes: 42 },
  comments: [
    { user: "Bob", text: "Great article!", date: new Date() },
    { user: "Alice", text: "Very helpful", date: new Date() },
  ],
  published: true,
  createdAt: new Date(),
});

// Query with aggregation pipeline
db.posts.aggregate([
  { $match: { published: true, tags: "javascript" } },
  { $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "author"
  }},
  { $unwind: "$author" },
  { $project: {
      title: 1,
      "author.name": 1,
      commentCount: { $size: "$comments" },
      views: "$metadata.views"
  }},
  { $sort: { createdAt: -1 } },
  { $limit: 20 }
]);

NoSQL (Redis)

# Redis: Key-Value store (in-memory, ultra-fast)

# Simple key-value
SET user:123:name "Jane Smith"
GET user:123:name  # → "Jane Smith"

# Hash (object-like)
HSET user:123 name "Jane" email "jane@example.com" role "admin"
HGET user:123 name  # → "Jane"
HGETALL user:123    # → all fields

# List (queue/stack)
LPUSH notifications:user:123 "New follower: Bob"
LPUSH notifications:user:123 "New comment on your post"
LRANGE notifications:user:123 0 9  # Get latest 10

# Set (unique values)
SADD post:456:likes "user:123" "user:456" "user:789"
SCARD post:456:likes  # → 3 (count unique likes)
SISMEMBER post:456:likes "user:123"  # → 1 (true)

# Sorted Set (leaderboard)
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2200 "player:bob"
ZADD leaderboard 1800 "player:charlie"
ZREVRANGE leaderboard 0 9 WITHSCORES  # Top 10 players

# TTL (auto-expiring keys — perfect for caching)
SET api:cache:/users "[\"data\"]" EX 300  # expires in 5 minutes
TTL api:cache:/users  # → seconds remaining

扩展策略

SQL 扩展

SQL 主要垂直扩展。

SQL Scaling Strategies:

1. Vertical Scaling (Scale Up)
   ├── More CPU cores
   ├── More RAM (buffer pool)
   ├── Faster SSD (NVMe)
   └── Effective up to ~1TB RAM, 128 cores

2. Read Replicas (Horizontal Read Scaling)
   Primary (writes) → Replica 1 (reads)
                    → Replica 2 (reads)
                    → Replica 3 (reads)

3. Connection Pooling (PgBouncer, ProxySQL)
   App → Pooler → Database (reuse connections)

4. Partitioning (Table-level)
   orders_2024, orders_2025, orders_2026

5. Sharding (Application-level)
   Users A-M → Shard 1
   Users N-Z → Shard 2
   (Complex, usually avoid if possible)

6. Managed Services
   - Supabase, Neon, PlanetScale (auto-scaling)
   - AWS Aurora, Google Cloud SQL

NoSQL 扩展

NoSQL 为水平扩展而设计。

NoSQL Scaling (Built-in Horizontal):

MongoDB (Sharding):
  Config Servers (metadata)
  ├── Shard 1: users with _id A-F
  ├── Shard 2: users with _id G-M
  ├── Shard 3: users with _id N-S
  └── Shard 4: users with _id T-Z
  Mongos Router → routes queries to correct shard

Redis (Cluster Mode):
  ├── Node 1: hash slots 0-5460
  ├── Node 2: hash slots 5461-10922
  └── Node 3: hash slots 10923-16383
  Each node has a replica for failover

Cassandra (Ring Architecture):
  ├── Node A: token range 0-25%
  ├── Node B: token range 25-50%
  ├── Node C: token range 50-75%
  └── Node D: token range 75-100%
  Replication factor: 3 (each row on 3 nodes)
  Add nodes → data automatically rebalances

流行数据库对比

Database          Type          Best For                         License
---------------------------------------------------------------------------
PostgreSQL        SQL           General purpose, JSONB, analytics  PostgreSQL (OSS)
MySQL             SQL           Web applications, WordPress        GPL / Commercial
SQLite            SQL           Embedded, mobile, serverless       Public Domain
MariaDB           SQL           MySQL drop-in replacement          GPL

MongoDB           Document      Flexible schema, content mgmt      SSPL
Redis             Key-Value     Caching, sessions, real-time       BSD (+ commercial)
DynamoDB          Key-Value     AWS serverless, auto-scaling       Proprietary
Cassandra         Wide-Column   Time-series, IoT, high write       Apache 2.0
Neo4j             Graph         Social networks, recommendations   GPL / Commercial
Elasticsearch     Search        Full-text search, logs, analytics  SSPL / Apache 2.0

CockroachDB       NewSQL        Global SQL, horizontal scale       BSL / Commercial
PlanetScale       NewSQL        Serverless MySQL (Vitess-based)    Proprietary
TiDB              NewSQL        HTAP, MySQL compatible             Apache 2.0
Supabase          SQL (PG)      Firebase alternative, PostgreSQL   Apache 2.0

迁移模式

实际上许多应用同时使用 SQL 和 NoSQL。

Polyglot Persistence Architecture:

                    ┌─────────────────────┐
                    │   Application Layer  │
                    └─────────┬───────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
  ┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐
  │  PostgreSQL   │  │    Redis      │  │   MongoDB     │
  │               │  │               │  │               │
  │ - Users       │  │ - Sessions    │  │ - CMS content │
  │ - Orders      │  │ - Cache       │  │ - Product     │
  │ - Payments    │  │ - Rate limits │  │   catalogs    │
  │ - Inventory   │  │ - Leaderboard │  │ - User prefs  │
  │ - Reports     │  │ - Pub/Sub     │  │ - Event logs  │
  └───────────────┘  └───────────────┘  └───────────────┘
     Transactions       Performance        Flexibility

Use each database for what it does best.

常见问题

可以同时使用 SQL 和 NoSQL 吗？

当然。这叫多语言持久化，是常见模式。

NoSQL 比 SQL 快吗？

取决于查询模式。简单查询 NoSQL 更快，复杂查询 SQL 更快。

创业公司应该用 SQL 还是 NoSQL？

大多数创业公司应该从 PostgreSQL 开始。

MongoDB 适合生产环境吗？

是的。MongoDB 在各种规模的公司中广泛使用。

NewSQL 数据库怎么样？

NewSQL 结合了 SQL 的 ACID 和 NoSQL 的水平扩展。

哪个数据库最适合 REST API？

PostgreSQL 是最安全的选择。