Database Design Guide: Normalization, ERDs, Indexing & SQL vs NoSQL (2026)
Good database design starts with normalization (1NF→3NF), clear primary/foreign key relationships, and appropriate indexing on columns used in WHERE and JOIN clauses. Use SQL for transactional data requiring ACID guarantees; use NoSQL for scale-out, unstructured data, or caching. Always define your entities and relationships in an ERD before writing DDL. Connection pooling is essential in production. PostgreSQL is the recommended default for new projects in 2026.
Introduction: Why Database Design Matters
The database is the foundation of virtually every application. A well-designed schema makes queries fast, code simple, and future changes manageable. A poorly designed schema creates data integrity issues, performance bottlenecks, and cascading technical debt that is extremely expensive to fix later.
Database design is both a science and an art. The science involves formal normalization theory, relational algebra, and query optimization. The art involves understanding your access patterns, anticipating future requirements, and knowing when to break the rules for practical reasons.
This guide covers the full spectrum of database design knowledge you need to build robust, scalable applications in 2026 — from first principles through production patterns used by companies at scale.
- Normalize to 3NF by default; selectively denormalize for read performance where measured.
- Always define entities, attributes, and relationships in an ERD before writing SQL.
- Primary keys should be immutable; prefer surrogate keys (UUIDs or auto-increment integers).
- Index every foreign key column and any column appearing frequently in WHERE/ORDER BY clauses.
- Use SQL for transactional, relational data; NoSQL for scale-out, caching, or unstructured data.
- ACID transactions protect data integrity — understand isolation levels to avoid anomalies.
- Connection pooling is mandatory in production; size your pool based on CPU cores and workload.
- PostgreSQL is the recommended default relational database for new projects in 2026.
Normalization: From Raw Data to Clean Schema
Normalization is the process of structuring a relational database according to a series of formal rules (normal forms) to reduce data redundancy and prevent update anomalies. Each normal form builds on the previous one.
First Normal Form (1NF)
A table is in 1NF when every column contains only atomic (indivisible) values, and each row is uniquely identifiable. There must be no repeating groups or arrays stored in a single column.
Violation example — storing multiple phone numbers in one column:
-- BAD: violates 1NF — multiple values in one column
CREATE TABLE contacts (
id INT PRIMARY KEY,
name VARCHAR(100),
phones VARCHAR(200) -- "555-1234, 555-5678, 555-9012"
);
-- GOOD: separate table for phone numbers (1NF compliant)
CREATE TABLE contacts (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE contact_phones (
id INT PRIMARY KEY,
contact_id INT REFERENCES contacts(id),
phone VARCHAR(20),
type VARCHAR(20) -- 'mobile', 'home', 'work'
);
Second Normal Form (2NF)
A table is in 2NF when it is in 1NF and every non-key attribute is fully functionally dependent on the entire primary key (not just part of it). This only applies to tables with composite primary keys.
-- BAD: violates 2NF — product_name depends only on product_id, not the full composite key
CREATE TABLE order_items (
order_id INT,
product_id INT,
product_name VARCHAR(100), -- depends on product_id only!
quantity INT,
unit_price DECIMAL(10,2),
PRIMARY KEY (order_id, product_id)
);
-- GOOD: 2NF compliant — separate products table
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
);
CREATE TABLE order_items (
order_id INT REFERENCES orders(id),
product_id INT REFERENCES products(id),
quantity INT,
unit_price DECIMAL(10,2), -- snapshot price at time of order
PRIMARY KEY (order_id, product_id)
);
Third Normal Form (3NF)
A table is in 3NF when it is in 2NF and every non-key attribute is directly dependent on the primary key — not on another non-key attribute (no transitive dependencies).
-- BAD: violates 3NF — zip_code determines city and state (transitive dependency)
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
zip_code VARCHAR(10),
city VARCHAR(100), -- depends on zip_code, not id!
state VARCHAR(2) -- depends on zip_code, not id!
);
-- GOOD: 3NF compliant — extract zip code lookup table
CREATE TABLE zip_codes (
zip_code VARCHAR(10) PRIMARY KEY,
city VARCHAR(100),
state VARCHAR(2)
);
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
zip_code VARCHAR(10) REFERENCES zip_codes(zip_code)
);
Boyce-Codd Normal Form (BCNF)
BCNF is a stricter version of 3NF. A table is in BCNF when for every functional dependency X → Y, X is a superkey (a column or set of columns that uniquely identifies rows). BCNF handles edge cases that 3NF misses involving multiple overlapping candidate keys. In practice, 3NF is sufficient for most production databases.
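To make the overlapping-candidate-key case concrete, here is a runnable sketch of the classic enrollment example, using SQLite via Python's sqlite3 (table and column names are illustrative, not from any schema above). The candidate keys are (student, course) and (student, instructor); instructor → course holds, but instructor is not a superkey, so the table satisfies 3NF yet violates BCNF:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Candidate keys: (student, course) and (student, instructor).
# instructor -> course also holds (each instructor teaches one course),
# but instructor is not a superkey, so BCNF is violated. 3NF is not,
# because course is a prime attribute (part of a candidate key).
conn.execute("""
    CREATE TABLE enrollments (
        student TEXT NOT NULL,
        course TEXT NOT NULL,
        instructor TEXT NOT NULL,
        PRIMARY KEY (student, course)
    )""")
conn.executemany("INSERT INTO enrollments VALUES (?, ?, ?)", [
    ("alice", "databases", "dr_codd"),
    ("bob", "databases", "dr_codd"),  # instructor/course pair stored twice
])
# Update anomaly: renaming the course in only one row silently breaks
# the instructor -> course dependency.
conn.execute("""UPDATE enrollments SET course = 'advanced databases'
                WHERE student = 'alice'""")
rows = conn.execute("""SELECT DISTINCT course FROM enrollments
                       WHERE instructor = 'dr_codd'""").fetchall()
print(len(rows))  # 2 -- dr_codd now appears to teach two courses
```

The BCNF decomposition splits this into instructor_courses(instructor, course) and enrollments(student, instructor), so the dependency is stored exactly once.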
When to Denormalize
Normalization optimizes for write consistency. Denormalization trades some redundancy for read performance. Consider denormalization when:
- A query requires joining 5+ tables and runs frequently (dashboard, API endpoint)
- You are building a read-heavy reporting or analytics system
- Latency requirements cannot be met with normalized queries even with indexes
- You are using a data warehouse (star/snowflake schema is intentionally denormalized)
Common denormalization techniques include storing computed counts (e.g., a post's comment_count), caching aggregate values (e.g., a user's total_spent), and duplicating frequently joined columns (e.g., storing category_name directly on products instead of always joining to categories).
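As a sketch of the first technique, a trigger can keep a denormalized comment_count in sync automatically. SQLite (via Python's sqlite3) is used here only because it runs anywhere; in PostgreSQL the equivalent would be a plpgsql trigger function:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    comment_count INTEGER NOT NULL DEFAULT 0  -- denormalized counter
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES posts(id),
    body TEXT NOT NULL
);
-- Keep the counter in sync on insert and delete
CREATE TRIGGER comments_ins AFTER INSERT ON comments BEGIN
    UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
END;
CREATE TRIGGER comments_del AFTER DELETE ON comments BEGIN
    UPDATE posts SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
END;
""")
conn.execute("INSERT INTO posts (id, title) VALUES (1, 'Hello')")
conn.executemany("INSERT INTO comments (post_id, body) VALUES (?, ?)",
                 [(1, "first"), (1, "second"), (1, "third")])
conn.execute("DELETE FROM comments WHERE body = 'second'")
count = conn.execute("SELECT comment_count FROM posts WHERE id = 1").fetchone()[0]
print(count)  # 2
```

The trade-off is typical of denormalization: reads of the count become a single-row lookup, while every comment write pays for the extra update.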
ERD Design: Entity-Relationship Diagrams
An Entity-Relationship Diagram (ERD) is a visual blueprint of your database schema. Creating an ERD before writing SQL prevents design mistakes and provides a shared language between developers, DBAs, and stakeholders.
Core ERD Concepts
ERDs consist of three fundamental building blocks:
- Entities — Real-world objects represented as tables (e.g., User, Product, Order)
- Attributes — Properties of entities represented as columns (e.g., user.email, product.price)
- Relationships — Associations between entities, characterized by cardinality (one-to-one, one-to-many, many-to-many)
Cardinality Notation
| Relationship | Example | Implementation |
|---|---|---|
| One-to-One (1:1) | User ↔ UserProfile | FK with UNIQUE constraint |
| One-to-Many (1:N) | User → Orders | FK on the "many" side |
| Many-to-Many (M:N) | Students ↔ Courses | Junction table with two FKs |
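The one-to-one row of the table deserves a concrete illustration, since it is the least obvious: a plain foreign key enforces "at most many", and adding UNIQUE tightens it to "at most one". A minimal sketch with SQLite via Python's sqlite3 (illustrative table names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK checks by default
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
-- 1:1 -- the UNIQUE constraint on the FK means each user
-- can have at most one profile row.
CREATE TABLE user_profiles (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL UNIQUE REFERENCES users(id),
    bio TEXT
);
""")
conn.execute("INSERT INTO users (id, email) VALUES (1, 'alice@example.com')")
conn.execute("INSERT INTO user_profiles (user_id, bio) VALUES (1, 'first profile')")
try:
    # A second profile for the same user violates the UNIQUE constraint
    conn.execute("INSERT INTO user_profiles (user_id, bio) VALUES (1, 'second profile')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```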
ERD Design Process
- Identify entities — List all the nouns in your system (User, Product, Order, Category, Review)
- Define attributes — For each entity, list its properties and data types
- Identify relationships — Determine how entities relate and the cardinality
- Define primary keys — Choose a unique identifier for each entity
- Add foreign keys — Implement relationships with foreign key constraints
- Apply normalization — Check each table against normal form rules
- Review with stakeholders — Validate the design reflects real business rules
Primary and Foreign Keys
Keys are the cornerstone of relational database integrity. Choosing the right key strategy significantly impacts performance, simplicity, and scalability.
Primary Key Strategies
-- Strategy 1: Auto-increment integer (simplest, most common)
-- Pros: compact, fast to index, easy to type/debug
-- Cons: sequential IDs expose business data (user count), not globally unique
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- PostgreSQL: auto-increment integer
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Strategy 2: UUID v4 (random, globally unique)
-- Pros: non-sequential (no ID enumeration), works across distributed systems
-- Cons: larger (16 bytes vs 4), random inserts cause B-tree page fragmentation
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- PostgreSQL 13+
user_id INT REFERENCES users(id),
total DECIMAL(10,2),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Strategy 3: UUID v7 (time-ordered UUID), the recommended default in 2026
-- Pros: globally unique AND sequential (no fragmentation), sortable by time
-- Best of both worlds for most modern applications
CREATE TABLE events (
id UUID PRIMARY KEY DEFAULT uuidv7(), -- built in since PostgreSQL 18; use an extension's uuid_generate_v7() on older versions
type VARCHAR(100),
payload JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Strategy 4: Natural key (rare, use carefully)
-- Pros: no surrogate key needed, meaningful column
-- Cons: natural keys can change (emails change, ISO codes change)
CREATE TABLE countries (
iso_code VARCHAR(2) PRIMARY KEY, -- 'US', 'GB', 'DE'
name VARCHAR(100) NOT NULL
);
Foreign Key Constraints and Referential Actions
-- Foreign key with referential integrity actions
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
title VARCHAR(255) NOT NULL,
body TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
-- ON DELETE CASCADE: delete posts when user is deleted
-- ON DELETE SET NULL: set user_id to NULL when user is deleted
-- ON DELETE RESTRICT: prevent user deletion if posts exist (NO ACTION, the default, behaves similarly)
CONSTRAINT fk_posts_user
FOREIGN KEY (user_id)
REFERENCES users(id)
ON DELETE CASCADE
ON UPDATE CASCADE
);
-- Always index foreign key columns (they are NOT auto-indexed!)
CREATE INDEX idx_posts_user_id ON posts(user_id);
Indexing Strategies
Indexes are the single most impactful tool for query performance. Understanding how different index types work and when to apply them separates good database design from great database design.
B-Tree Indexes (Default)
B-tree (balanced tree) indexes are the default index type in PostgreSQL and MySQL. They support equality checks (=), range queries (>, <, BETWEEN), and ORDER BY operations efficiently.
-- Single column index — most common
CREATE INDEX idx_users_email ON users(email);
-- Verify index usage with EXPLAIN ANALYZE (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';
-- Output shows: "Index Scan using idx_users_email on users"
-- Look for: Index Scan (good), Seq Scan on large table (investigate)
-- Partial index — index only a subset of rows (smaller, faster)
-- Index only active users (status = 'active') for a users table
CREATE INDEX idx_users_active ON users(email) WHERE status = 'active';
-- Expression index — index computed values
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now this query can use the index:
-- SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
Composite Indexes
Composite indexes cover multiple columns. The column order matters enormously — a composite index on (a, b) can be used for queries on column a alone, but NOT for queries on column b alone (the leading column rule).
-- Composite index for common query pattern
-- Query: SELECT * FROM orders WHERE user_id = 42 AND status = 'pending'
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- This index also helps: WHERE user_id = 42 (first column used)
-- This index does NOT help: WHERE status = 'pending' (missing first column)
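You can watch the leading-column rule in any engine's plan output. Here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 (SQLite's wording differs from PostgreSQL's EXPLAIN, but the rule is identical; a total column is included so a covering-index scan cannot mask the effect):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, status TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_user_status ON orders(user_id, status)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the step
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

by_user = plan("SELECT * FROM orders WHERE user_id = 42")
by_status = plan("SELECT * FROM orders WHERE status = 'pending'")
print("USING INDEX" in by_user)    # True  -- leading column present, index usable
print("USING INDEX" in by_status)  # False -- leading column missing, full scan
```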
-- Index for range queries: put the equality column first
-- Query: SELECT * FROM events WHERE user_id = 42 AND created_at > '2026-01-01'
CREATE INDEX idx_events_user_created ON events(user_id, created_at);
-- Covering index: include all columns needed by the query
-- The database can answer entirely from the index without touching the table
CREATE INDEX idx_orders_covering ON orders(user_id, status)
INCLUDE (id, total, created_at); -- PostgreSQL syntax
Other Index Types (PostgreSQL)
-- GIN index: for full-text search and JSONB containment queries
CREATE INDEX idx_articles_fts ON articles USING GIN(to_tsvector('english', title || ' ' || body));
-- Query: SELECT * FROM articles WHERE to_tsvector('english', title || ' ' || body) @@ plainto_tsquery('database design');
CREATE INDEX idx_products_attributes ON products USING GIN(attributes); -- JSONB column
-- Query: SELECT * FROM products WHERE attributes @> '{"color": "red"}';
-- GiST index: for geometric data, IP ranges, exclusion constraints
CREATE INDEX idx_reservations_period ON reservations USING GiST(tstzrange(start_time, end_time));
-- BRIN index: for very large tables with naturally ordered data (logs, time-series)
-- Much smaller than B-tree, faster to build, less accurate (block-level)
CREATE INDEX idx_logs_created_brin ON logs USING BRIN(created_at);
-- Hash index: for equality-only lookups (rarely better than B-tree in practice)
CREATE INDEX idx_sessions_token ON sessions USING HASH(token);
Index Best Practices
- Always index foreign key columns — PostgreSQL does NOT create them automatically
- Use EXPLAIN ANALYZE to confirm index usage before and after adding indexes
- Remove unused indexes; they slow down writes without helping reads
- Consider partial indexes for tables with heavily skewed data (e.g., 95% inactive records)
- In PostgreSQL, use CREATE INDEX CONCURRENTLY to add indexes without locking the table
- Keep index column count to 3 or fewer for most use cases
SQL vs NoSQL: Choosing the Right Database
One of the most consequential decisions in system design is choosing between relational (SQL) and non-relational (NoSQL) databases. Both have legitimate use cases, and modern applications often use both.
| Dimension | SQL (Relational) | NoSQL (Non-Relational) |
|---|---|---|
| Schema | Fixed schema, DDL migrations | Flexible/schemaless |
| Scaling | Vertical (scale-up); horizontal via read replicas | Horizontal (scale-out) native |
| Transactions | Full ACID transactions | BASE (eventually consistent); some offer ACID |
| Query Language | Standardized SQL | Database-specific APIs |
| Joins | Efficient JOINs | Limited or no native joins (denormalize, or join in the application) |
| Best for | Banking, e-commerce, CRM, ERP | Caching, sessions, IoT, catalogs, feeds |
| Examples | PostgreSQL, MySQL, SQLite, SQL Server | MongoDB, Redis, DynamoDB, Cassandra |
NoSQL Database Types
| Type | Database | Best Use Case |
|---|---|---|
| Document Store | MongoDB, CouchDB, Firestore | Catalogs, CMS, user profiles with variable attributes |
| Key-Value Store | Redis, DynamoDB, Memcached | Sessions, caching, rate limiting, leaderboards |
| Wide-Column Store | Cassandra, HBase, Bigtable | IoT time-series, write-heavy analytics at scale |
| Graph Database | Neo4j, Amazon Neptune, ArangoDB | Social graphs, recommendation engines, fraud detection |
| Search Engine | Elasticsearch, Typesense, Meilisearch | Full-text search, log analytics, auto-complete |
Relationships: One-to-Many and Many-to-Many
Understanding how to model relationships correctly is the most practically important skill in database design. Most real-world data involves complex webs of relationships.
One-to-Many Relationships
The most common relationship type. The foreign key always lives on the "many" side of the relationship.
-- One user has many posts (1:N)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
title VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE NOT NULL,
body TEXT,
published BOOLEAN DEFAULT FALSE,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Index the foreign key
CREATE INDEX idx_posts_user_id ON posts(user_id);
-- Query: get all posts by a user with author info
SELECT p.id, p.title, p.created_at, u.username
FROM posts p
JOIN users u ON u.id = p.user_id
WHERE u.username = 'alice'
ORDER BY p.created_at DESC;
Many-to-Many Relationships
Many-to-many relationships require a junction table (also called a bridge, pivot, or associative table). The junction table has foreign keys to both related tables and can carry additional data about the relationship itself.
-- Many-to-many: posts can have many tags; tags can belong to many posts
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL,
slug VARCHAR(50) UNIQUE NOT NULL
);
-- Junction table: post_tags
CREATE TABLE post_tags (
post_id INT REFERENCES posts(id) ON DELETE CASCADE,
tag_id INT REFERENCES tags(id) ON DELETE CASCADE,
tagged_at TIMESTAMPTZ DEFAULT NOW(), -- extra relationship data
PRIMARY KEY (post_id, tag_id) -- composite PK prevents duplicates
);
CREATE INDEX idx_post_tags_tag_id ON post_tags(tag_id);
-- Query: get all posts for a given tag
SELECT p.title, p.slug, p.created_at
FROM posts p
JOIN post_tags pt ON pt.post_id = p.id
JOIN tags t ON t.id = pt.tag_id
WHERE t.slug = 'javascript'
ORDER BY p.created_at DESC;
-- Query: get all tags for a given post
SELECT t.name, t.slug
FROM tags t
JOIN post_tags pt ON pt.tag_id = t.id
WHERE pt.post_id = 42;
Self-Referencing Relationships
-- Hierarchical data: categories with subcategories
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INT REFERENCES categories(id) ON DELETE SET NULL, -- self-reference
depth INT DEFAULT 0 -- for fast depth queries
);
-- Example data
INSERT INTO categories (id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Gaming Laptops', 3);
-- Recursive CTE to get full hierarchy
WITH RECURSIVE category_tree AS (
SELECT id, name, parent_id, 0 AS level, name::TEXT AS path
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id, ct.level + 1,
ct.path || ' > ' || c.name
FROM categories c
JOIN category_tree ct ON ct.id = c.parent_id
)
SELECT * FROM category_tree ORDER BY path;
Performance Optimization
Database performance optimization is an iterative process. Start with good schema design, then measure, then optimize. Never optimize prematurely without measuring first.
Query Optimization with EXPLAIN ANALYZE
-- Always use EXPLAIN ANALYZE to understand query execution plans
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 20;
-- Key things to look for in the output:
-- "Seq Scan" on large tables → add an index
-- "Rows Removed by Filter: 999999" → bad index selectivity
-- "Nested Loop" with large row counts → may need a Hash Join
-- Actual time much higher than estimated → stale statistics, run ANALYZE
-- "cost=0.00..1234.56" → higher cost = more expensive
-- Update table statistics if estimates are wrong
ANALYZE users;
ANALYZE posts;
Common Performance Patterns
-- Pattern 1: Pagination with keyset (cursor) instead of OFFSET
-- BAD: OFFSET becomes slower as offset grows (scans and discards rows)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- GOOD: keyset pagination (constant time regardless of page number)
-- "Give me 20 posts created before the last one I saw"
SELECT * FROM posts
WHERE created_at < '2026-01-15T12:00:00Z'
ORDER BY created_at DESC
LIMIT 20;
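To check that keyset pagination really pages correctly, here is a small end-to-end sketch with SQLite via Python's sqlite3 (synthetic data; in production, order by a tie-breaker pair such as (created_at, id) so rows with identical timestamps are not skipped):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
# 30 posts, one per day; zero-padded ISO strings sort chronologically
conn.executemany("INSERT INTO posts (id, created_at) VALUES (?, ?)",
                 [(i, f"2026-01-{i:02d}T00:00:00Z") for i in range(1, 31)])

page_size = 5
# Page 1: newest first
page1 = conn.execute("SELECT id, created_at FROM posts ORDER BY created_at DESC LIMIT ?",
                     (page_size,)).fetchall()
cursor = page1[-1][1]  # remember the last created_at we saw
# Page 2: keyset -- everything strictly older than the cursor
page2 = conn.execute("""SELECT id, created_at FROM posts
                        WHERE created_at < ? ORDER BY created_at DESC LIMIT ?""",
                     (cursor, page_size)).fetchall()
print([r[0] for r in page1])  # [30, 29, 28, 27, 26]
print([r[0] for r in page2])  # [25, 24, 23, 22, 21]
```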
-- Pattern 2: Avoid SELECT * in production queries
-- BAD: fetches all columns including large TEXT/BYTEA columns
SELECT * FROM posts WHERE id = 42;
-- GOOD: fetch only what you need
SELECT id, title, slug, created_at FROM posts WHERE id = 42;
-- Pattern 3: Use EXISTS instead of COUNT for existence checks
-- BAD: scans entire result set to count
SELECT COUNT(*) > 0 FROM posts WHERE user_id = 42;
-- GOOD: stops at first match
SELECT EXISTS(SELECT 1 FROM posts WHERE user_id = 42);
-- Pattern 4: Batch inserts for bulk data loading
-- BAD: 1000 individual INSERT statements
INSERT INTO events (type, payload) VALUES ('click', '{}');
-- ...repeated 1000 times
-- GOOD: single multi-row INSERT (or COPY for PostgreSQL bulk loading)
INSERT INTO events (type, payload) VALUES
('click', '{"x":100, "y":200}'),
('scroll', '{"delta":300}'),
('keypress', '{"key":"Enter"}');
-- ... up to a few thousand rows per statement
-- PostgreSQL COPY for maximum throughput (millions of rows)
COPY events(type, payload) FROM '/tmp/events.csv' CSV HEADER;
Materialized Views for Expensive Aggregations
-- Create a materialized view for an expensive report
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue,
COUNT(*) AS order_count,
COUNT(DISTINCT user_id) AS unique_customers
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month DESC;
-- Index the materialized view (UNIQUE is required for REFRESH ... CONCURRENTLY)
CREATE UNIQUE INDEX idx_monthly_revenue_month ON monthly_revenue(month);
-- Query is now instant (reads pre-computed data)
SELECT * FROM monthly_revenue WHERE month >= '2026-01-01';
-- Refresh when underlying data changes (schedule this)
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
Transactions and ACID Properties
Transactions allow multiple SQL statements to be executed as a single atomic unit. ACID properties guarantee that your database remains in a consistent state even when things go wrong.
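Atomicity, the A in ACID, is easy to demonstrate from client code: if any statement in the transaction fails, every statement is rolled back. A minimal sketch with SQLite via Python's sqlite3 (any PostgreSQL driver behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 50), (2, 0)])

def transfer(from_id, to_id, amount):
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id))
        balance = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                               (from_id,)).fetchone()[0]
        if balance < 0:
            raise ValueError("Insufficient funds")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # both updates are undone together
        raise

try:
    transfer(1, 2, 100)  # would overdraw account 1
except ValueError:
    pass
balances = [r[0] for r in conn.execute("SELECT balance FROM accounts ORDER BY id")]
print(balances)  # [50, 0] -- unchanged; no partial transfer survives
```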
Using Transactions
-- Transfer $100 from account A to account B atomically
-- Both updates succeed or both roll back — never a partial state
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Check for negative balance before committing
DO $$
BEGIN
IF (SELECT balance FROM accounts WHERE id = 1) < 0 THEN
RAISE EXCEPTION 'Insufficient funds';
END IF;
END $$;
COMMIT;
-- ROLLBACK; if anything goes wrong
// Transaction in Node.js with pg (node-postgres)
const { Pool } = require('pg');
const pool = new Pool();
async function transferFunds(fromId, toId, amount) {
const client = await pool.connect();
try {
await client.query('BEGIN');
const { rows } = await client.query(
'SELECT balance FROM accounts WHERE id = $1 FOR UPDATE',
[fromId] // FOR UPDATE locks the row during the transaction
);
if (rows[0].balance < amount) {
throw new Error('Insufficient funds');
}
await client.query(
'UPDATE accounts SET balance = balance - $1 WHERE id = $2',
[amount, fromId]
);
await client.query(
'UPDATE accounts SET balance = balance + $1 WHERE id = $2',
[amount, toId]
);
await client.query('COMMIT');
console.log('Transfer successful');
} catch (err) {
await client.query('ROLLBACK');
throw err;
} finally {
client.release(); // return connection to pool
}
}
Transaction Isolation Levels
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use When |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Almost never (highest throughput) |
| READ COMMITTED | Prevented | Possible | Possible | Default in PostgreSQL; most OLTP apps |
| REPEATABLE READ | Prevented | Prevented | Possible | Reports, consistent snapshots |
| SERIALIZABLE | Prevented | Prevented | Prevented | Financial systems, maximum correctness |
Real-World Schema Examples
The best way to learn database design is to study real schemas. Here are production-quality schemas for three common application types.
E-Commerce Schema
-- E-Commerce: Users, Products, Orders, Reviews
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
first_name VARCHAR(100),
last_name VARCHAR(100),
role VARCHAR(20) DEFAULT 'customer' CHECK (role IN ('customer', 'admin')),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
slug VARCHAR(100) UNIQUE NOT NULL,
parent_id INT REFERENCES categories(id)
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
category_id INT REFERENCES categories(id),
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
stock_qty INT NOT NULL DEFAULT 0 CHECK (stock_qty >= 0),
is_active BOOLEAN DEFAULT TRUE,
attributes JSONB DEFAULT '{}', -- flexible: {"color":"red","size":"XL"}
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_products_active ON products(is_active) WHERE is_active = TRUE;
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
status VARCHAR(20) DEFAULT 'pending'
CHECK (status IN ('pending','confirmed','shipped','delivered','cancelled')),
subtotal DECIMAL(10,2) NOT NULL,
tax DECIMAL(10,2) DEFAULT 0,
shipping_cost DECIMAL(10,2) DEFAULT 0,
total DECIMAL(10,2) NOT NULL,
shipping_addr JSONB,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status, created_at DESC);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id INT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price DECIMAL(10,2) NOT NULL, -- snapshot at time of purchase
total DECIMAL(10,2) NOT NULL
);
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);
CREATE TABLE reviews (
id SERIAL PRIMARY KEY,
product_id INT NOT NULL REFERENCES products(id) ON DELETE CASCADE,
user_id INT NOT NULL REFERENCES users(id),
rating SMALLINT NOT NULL CHECK (rating BETWEEN 1 AND 5),
title VARCHAR(255),
body TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE (product_id, user_id) -- one review per user per product
);
CREATE INDEX idx_reviews_product ON reviews(product_id, rating);
Blog/CMS Schema
-- Blog: Authors, Posts, Categories, Tags, Comments
CREATE TABLE authors (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
bio TEXT,
avatar_url VARCHAR(500),
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
author_id INT NOT NULL REFERENCES authors(id),
title VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE NOT NULL,
excerpt VARCHAR(500),
body TEXT NOT NULL,
status VARCHAR(20) DEFAULT 'draft' CHECK (status IN ('draft','published','archived')),
published_at TIMESTAMPTZ,
view_count INT DEFAULT 0,
search_vector TSVECTOR, -- for full-text search
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_posts_author ON posts(author_id);
CREATE INDEX idx_posts_status ON posts(status, published_at DESC);
CREATE INDEX idx_posts_search ON posts USING GIN(search_vector);
-- Auto-update search_vector on insert/update
CREATE OR REPLACE FUNCTION posts_search_vector_update() RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector := to_tsvector('english', NEW.title || ' ' || COALESCE(NEW.body, ''));
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER posts_search_vector_trigger
BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION posts_search_vector_update();
-- Full-text search query
SELECT title, slug, excerpt, published_at
FROM posts
WHERE search_vector @@ plainto_tsquery('english', 'database design normalization')
AND status = 'published'
ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'database design normalization')) DESC
LIMIT 10;
Social Media Schema
-- Social Media: Users, Follows, Posts, Likes, Comments
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
display_name VARCHAR(100),
bio VARCHAR(500),
avatar_url VARCHAR(500),
follower_count INT DEFAULT 0, -- denormalized counter (updated by trigger)
following_count INT DEFAULT 0,
post_count INT DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Follows: many-to-many self-referencing
CREATE TABLE follows (
follower_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
following_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (follower_id, following_id),
CHECK (follower_id != following_id) -- can't follow yourself
);
CREATE INDEX idx_follows_following ON follows(following_id);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
body VARCHAR(280), -- Twitter-like character limit
media_urls TEXT[], -- PostgreSQL array for multiple images
like_count INT DEFAULT 0,
comment_count INT DEFAULT 0,
is_reply_to INT REFERENCES posts(id), -- threaded replies
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_posts_user_id ON posts(user_id, created_at DESC);
CREATE INDEX idx_posts_is_reply ON posts(is_reply_to) WHERE is_reply_to IS NOT NULL;
-- Likes: many-to-many between users and posts
CREATE TABLE likes (
user_id INT REFERENCES users(id) ON DELETE CASCADE,
post_id INT REFERENCES posts(id) ON DELETE CASCADE,
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (user_id, post_id)
);
CREATE INDEX idx_likes_post_id ON likes(post_id);
-- User feed query: posts from users I follow, most recent first
SELECT p.*, u.username, u.avatar_url
FROM posts p
JOIN users u ON u.id = p.user_id
WHERE p.user_id IN (
SELECT following_id FROM follows WHERE follower_id = 42
)
ORDER BY p.created_at DESC
LIMIT 50;
Connection Pooling
Connection pooling is not optional in production — it is essential. Without it, each HTTP request would need to establish a fresh database connection, incurring significant overhead and potentially overwhelming the database.
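The mechanics are simple to sketch: a pool pre-opens N connections and hands them out and takes them back instead of opening a new one per request. The toy Python version below (sqlite3 plus a queue; real pools such as pg-pool or PgBouncer add health checks, timeouts, and reconnection) shows the checkout/return cycle:

```python
import queue
import sqlite3

# A toy pool: pre-open N connections, check out / return via a thread-safe queue.
class TinyPool:
    def __init__(self, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self, timeout=2.0):
        # Blocks while the pool is exhausted; raises queue.Empty after timeout
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)  # return the connection instead of closing it

pool = TinyPool(size=3)
conn = pool.acquire()
try:
    one = conn.execute("SELECT 1").fetchone()[0]
finally:
    pool.release(conn)  # always release, even on error (cf. client.release() below)
print(one)                 # 1
print(pool._pool.qsize())  # 3 -- all connections back in the pool
```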
Connection Pooling with Node.js (pg-pool)
// connection-pool.js
const { Pool } = require('pg');
const pool = new Pool({
host: process.env.DB_HOST || 'localhost',
port: parseInt(process.env.DB_PORT || '5432'),
database: process.env.DB_NAME || 'myapp',
user: process.env.DB_USER || 'postgres',
password: process.env.DB_PASSWORD,
// Pool configuration
max: 20, // Maximum connections in pool (default: 10)
min: 5, // Minimum idle connections to keep open
idleTimeoutMillis: 30000, // Close idle clients after 30s
connectionTimeoutMillis: 2000, // Timeout if no connection available in 2s
maxUses: 7500, // Close connection after 7500 uses (prevents memory leaks)
// SSL for production
ssl: process.env.NODE_ENV === 'production'
? { rejectUnauthorized: true, ca: process.env.DB_SSL_CA }
: false,
});
// Monitor pool health
pool.on('connect', (client) => {
console.log('New client connected to pool');
});
pool.on('error', (err, client) => {
console.error('Unexpected error on idle client', err);
process.exit(-1);
});
// Graceful shutdown
process.on('SIGTERM', async () => {
await pool.end();
console.log('Pool has ended');
});
module.exports = pool;
// Usage in application code
async function getUserById(id) {
// pool.query() automatically checks out and returns a client
const result = await pool.query(
'SELECT id, username, email FROM users WHERE id = $1',
[id]
);
return result.rows[0] || null;
}
PgBouncer: External Connection Pooling
For high-traffic applications, PgBouncer is a lightweight connection pooler that sits between your application and PostgreSQL. It supports three pooling modes:
- Session pooling — one server connection per client session (similar to no pooling)
- Transaction pooling — server connection held only during a transaction (recommended)
- Statement pooling — server connection held per statement (breaks multi-statement transactions)
; /etc/pgbouncer/pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_port = 6432
listen_addr = 127.0.0.1
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction ; recommended for most apps
max_client_conn = 1000 ; max app connections to PgBouncer
default_pool_size = 25 ; max connections to PostgreSQL per database
min_pool_size = 5
reserve_pool_size = 5
log_connections = 0 ; disable in production for performance
server_idle_timeout = 600
Best Practices and Production Tips
These are the lessons learned from operating databases at scale — patterns that prevent common mistakes and operational pain.
PostgreSQL vs MySQL: 2026 Recommendations
| Feature | PostgreSQL 16 | MySQL 8.4 |
|---|---|---|
| JSON support | JSONB (binary, indexed, fast) | JSON (text, limited indexing) |
| Full-text search | Built-in tsvector/tsquery | FULLTEXT index (limited) |
| Window functions | Full support since PostgreSQL 8.4 | Support since MySQL 8.0 |
| CTE (WITH) | Full recursive CTE support | Support since MySQL 8.0 |
| Concurrency | MVCC (readers don't block writers) | MVCC (InnoDB engine) |
| Hosting support | All major cloud providers | All major cloud providers |
| Extensions | PostGIS, pgvector, TimescaleDB | Limited extension ecosystem |
| Verdict (2026) | Recommended default | Good for existing MySQL stacks |
Schema Migration Best Practices
-- Safe schema changes in production (zero-downtime patterns)
-- 1. Adding a column: safe (no lock) with a default in PostgreSQL 11+
ALTER TABLE users ADD COLUMN last_login TIMESTAMPTZ DEFAULT NULL;
-- 2. Adding NOT NULL column safely (add nullable, backfill, then constrain)
-- UNSAFE on PostgreSQL < 11: one-step NOT NULL with DEFAULT rewrites the whole
-- table under an exclusive lock (on 11+ the default is stored in the catalog
-- and this is fast, but the three-step pattern remains the portable choice)
ALTER TABLE users ADD COLUMN preferences JSONB NOT NULL DEFAULT '{}';
-- SAFE: three-step approach
ALTER TABLE users ADD COLUMN preferences JSONB; -- Step 1: add nullable
UPDATE users SET preferences = '{}' WHERE preferences IS NULL; -- Step 2: backfill (in batches on large tables)
ALTER TABLE users ALTER COLUMN preferences SET NOT NULL; -- Step 3: add constraint
-- 3. Adding an index without blocking reads/writes (PostgreSQL)
CREATE INDEX CONCURRENTLY idx_users_last_login ON users(last_login);
-- 4. Renaming a column (dangerous — use a view or application-side rename)
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN user_name VARCHAR(50);
-- Step 2: Sync old to new via trigger or backfill
-- Step 3: Update application to use new name
-- Step 4: Drop old column after full deployment
ALTER TABLE users DROP COLUMN username;
-- 5. Partitioning a large table (range partitioning by date)
CREATE TABLE events (
id SERIAL,
type VARCHAR(100),
created_at TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

Security Best Practices
-- 1. Use parameterized queries ALWAYS (prevents SQL injection)
-- In Node.js with pg:
const result = await pool.query(
'SELECT * FROM users WHERE email = $1 AND status = $2',
[email, 'active']
// Never: 'SELECT * FROM users WHERE email = ' + email
);
-- 2. Principle of least privilege: create role-specific database users
CREATE USER app_readonly WITH PASSWORD 'secret';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;
CREATE USER app_readwrite WITH PASSWORD 'secret2';
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_readwrite;
-- 3. Row-level security (RLS) for multi-tenant apps (PostgreSQL)
ALTER TABLE posts ENABLE ROW LEVEL SECURITY;
-- Users can only see their own posts
CREATE POLICY posts_user_isolation ON posts
USING (user_id = current_setting('app.user_id')::INT);
-- Set the user context at connection/session time
SET app.user_id = '42';
-- 4. Always store password hashes, never plaintext
-- Use bcrypt or argon2 from application layer, not database
-- 5. Enable TLS for all database connections in production

Monitoring and Maintenance
-- Find slow queries (PostgreSQL — enable pg_stat_statements extension)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(stddev_exec_time::numeric, 2) AS stddev_ms,
rows,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Find missing indexes (tables with many sequential scans)
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_rows_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
-- Find unused indexes (wasteful, slow down writes)
SELECT
indexrelname AS index_name,
relname AS table_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS times_used
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0 AND NOT indisprimary AND NOT indisunique
ORDER BY pg_relation_size(indexrelid) DESC;
-- Table bloat check (VACUUM keeps tables healthy)
SELECT relname, n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
-- Configure autovacuum for busy tables
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum when 1% rows are dead
autovacuum_analyze_scale_factor = 0.005
);

Summary: Database Design Checklist
- Draw an ERD before writing any SQL — validate with stakeholders
- Normalize to 3NF; denormalize only where measured performance requires it
- Every table has a primary key; prefer auto-increment or UUID v7
- Every foreign key column has an index (not created automatically)
- Use parameterized queries everywhere — never string-concatenate SQL
- Apply appropriate NOT NULL, UNIQUE, and CHECK constraints at the database level
- Use TIMESTAMPTZ (not TIMESTAMP) for all datetime columns
- Add created_at and updated_at to every table
- Use EXPLAIN ANALYZE before and after adding indexes
- Configure connection pooling in all production services
- Use transactions for all multi-step operations
- Test schema migrations on a production-sized dataset before deploying
- Monitor slow queries, index usage, and table bloat in production
- Keep pg_stat_statements enabled in PostgreSQL for query analytics
- Enable SSL/TLS for all database connections