TL;DR
PostgreSQL is the world's most advanced open-source relational database. Use SERIAL/UUID (or identity columns) for primary keys, JSONB for semi-structured data, GIN indexes for full-text search and JSONB queries, and node-postgres (pg) or Prisma for Node.js applications. Always use parameterized queries to prevent SQL injection, run EXPLAIN ANALYZE on slow queries before optimizing them, and configure PgBouncer for connection pooling in production. Use our SQL formatter tool to clean up your queries instantly.
1. Core SQL — Tables, Data Types, and Constraints
PostgreSQL extends standard SQL with powerful native data types. Choosing the right type upfront avoids costly migrations later.
CREATE TABLE with Common Data Types
CREATE TABLE users (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
username TEXT NOT NULL UNIQUE,
email TEXT NOT NULL UNIQUE,
age INTEGER CHECK (age >= 0),
score NUMERIC(10, 2) DEFAULT 0.00,
tags TEXT[],
metadata JSONB DEFAULT '{}',
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- SERIAL alternative (legacy but still common)
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
title TEXT NOT NULL,
body TEXT,
published BOOLEAN DEFAULT FALSE,
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

ALTER TABLE — Schema Changes
-- Add a column
ALTER TABLE users ADD COLUMN phone TEXT;
-- Add NOT NULL with a default (safe for large tables)
ALTER TABLE users ADD COLUMN tier TEXT NOT NULL DEFAULT 'free';
-- Rename a column
ALTER TABLE users RENAME COLUMN username TO handle;
-- Drop a column
ALTER TABLE users DROP COLUMN phone;
-- Add a foreign key constraint
ALTER TABLE posts ADD CONSTRAINT fk_posts_user FOREIGN KEY (user_id) REFERENCES users(id);
-- Add a unique constraint
ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);
On a non-empty table, adding a NOT NULL column without a default fails outright, and before PostgreSQL 11 adding a column with a default forced a full table rewrite (since version 11, a constant default is a metadata-only change). For large production tables, use a two-step migration: add the column as nullable, backfill the data in batches, then add the NOT NULL constraint.
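A sketch of that two-step pattern, using a hypothetical timezone column (the column name and batch size are illustrative):

```sql
-- Step 1: add the column as nullable (metadata-only, effectively instant)
ALTER TABLE users ADD COLUMN timezone TEXT;

-- Step 2: backfill in batches to keep row locks and WAL bursts small
UPDATE users SET timezone = 'UTC'
WHERE id IN (SELECT id FROM users WHERE timezone IS NULL LIMIT 10000);
-- ...repeat until UPDATE reports 0 rows...

-- Step 3: set the default and add the constraint
ALTER TABLE users ALTER COLUMN timezone SET DEFAULT 'UTC';
ALTER TABLE users ALTER COLUMN timezone SET NOT NULL;
```

Note that SET NOT NULL still scans the table to validate existing rows; on very large tables you can first add a CHECK (timezone IS NOT NULL) NOT VALID constraint and VALIDATE CONSTRAINT it separately so the validating scan does not block writes.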
2. CRUD & Querying — SELECT, JOIN, GROUP BY, EXPLAIN ANALYZE
PostgreSQL supports the full SQL standard plus many extensions. Mastering EXPLAIN ANALYZE is the single most important skill for query optimization.
-- INSERT with RETURNING
INSERT INTO users (email, username)
VALUES ('alice@example.com', 'alice')
RETURNING id, created_at;
-- UPDATE with condition
UPDATE users
SET score = score + 10, updated_at = NOW()
WHERE id = $1
RETURNING *;
-- DELETE with RETURNING
DELETE FROM posts WHERE id = $1 RETURNING id;
-- Complex SELECT with JOIN, GROUP BY, HAVING
SELECT
u.id,
u.username,
COUNT(p.id) AS post_count,
MAX(p.published_at) AS last_published
FROM users u
LEFT JOIN posts p ON p.user_id = u.id AND p.published = TRUE
WHERE u.is_active = TRUE
GROUP BY u.id, u.username
HAVING COUNT(p.id) > 0
ORDER BY post_count DESC
LIMIT 20 OFFSET 0;
-- EXPLAIN ANALYZE to diagnose slow queries
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM posts WHERE user_id = 'some-uuid' ORDER BY created_at DESC;

Read EXPLAIN ANALYZE output from the innermost node outward. Look for Seq Scan on large tables (add an index), large gaps between actual and estimated row counts (run ANALYZE tablename), and nested-loop joins over large datasets, which may benefit from hash joins (test with SET enable_nestloop = off).
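Those three diagnostics map to concrete statements; a sketch (the table and index names here are illustrative):

```sql
-- Refresh planner statistics when estimated rows are far from actual
ANALYZE posts;

-- Eliminate a Seq Scan on a hot filter column
CREATE INDEX CONCURRENTLY idx_posts_user_id ON posts (user_id);

-- Temporarily rule out nested loops to see whether a hash join wins,
-- then reset (session-local setting; never change this globally)
SET enable_nestloop = off;
-- ...re-run the EXPLAIN ANALYZE here...
RESET enable_nestloop;
```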
3. Indexes — B-tree, GIN, GiST, BRIN, Partial, Multi-column
Indexes are the primary performance lever in PostgreSQL. The default index type is B-tree, but choosing the right type for your access pattern can make queries orders of magnitude faster.
-- B-tree (default): equality and range queries
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_posts_created ON posts (created_at DESC);
-- Multi-column: column order matters — most selective first
CREATE INDEX idx_posts_user_published ON posts (user_id, published, published_at DESC);
-- Partial index: only index a subset of rows
CREATE INDEX idx_active_users ON users (email) WHERE is_active = TRUE;
-- GIN: JSONB, full-text search, arrays
CREATE INDEX idx_metadata_gin ON users USING gin(metadata);
CREATE INDEX idx_tags_gin ON users USING gin(tags);
-- GiST: geometric types, nearest-neighbor (PostGIS)
CREATE INDEX idx_location_gist ON places USING gist(coordinates);
-- BRIN: very large tables with natural ordering (time-series, logs)
CREATE INDEX idx_logs_created_brin ON logs USING brin(created_at);
-- CONCURRENTLY: build index without locking writes (production-safe)
CREATE INDEX CONCURRENTLY idx_users_score ON users (score);
-- Check index usage (pg_stat_user_indexes exposes relname/indexrelname)
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'users'
ORDER BY idx_scan DESC;
A partial index is often dramatically smaller and faster than a full index. If 95% of your queries filter on WHERE is_active = TRUE, index only those rows. Unused indexes waste disk space and slow down writes — drop them with DROP INDEX CONCURRENTLY.
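For instance, once pg_stat_user_indexes shows an index with idx_scan = 0 over a representative period, it can be removed without blocking writes (the index name below is illustrative):

```sql
-- Drop an unused index without taking a write-blocking lock
-- (like CREATE INDEX CONCURRENTLY, this cannot run inside a transaction)
DROP INDEX CONCURRENTLY IF EXISTS idx_users_score;
```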
4. Node.js with node-postgres (pg) — Pool, Parameterized Queries, Transactions
node-postgres (the pg npm package) is the most widely used PostgreSQL driver for Node.js. Always use a Pool (not a single Client) in web applications to share connections across requests.
npm install pg
npm install --save-dev @types/pg
// db.ts — connection pool setup
import { Pool } from 'pg';
export const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20, // maximum pool size
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
// Basic parameterized query — NEVER use string interpolation
export async function getUserByEmail(email: string) {
const result = await pool.query(
'SELECT id, username, email, created_at FROM users WHERE email = $1',
[email]
);
return result.rows[0] ?? null;
}
// Insert with RETURNING
export async function createUser(email: string, username: string) {
const result = await pool.query(
`INSERT INTO users (email, username)
VALUES ($1, $2)
RETURNING id, email, username, created_at`,
[email, username]
);
return result.rows[0];
}
// Transaction — acquire client from pool
export async function transferScore(fromId: string, toId: string, points: number) {
const client = await pool.connect();
try {
await client.query('BEGIN');
await client.query(
'UPDATE users SET score = score - $1 WHERE id = $2',
[points, fromId]
);
await client.query(
'UPDATE users SET score = score + $1 WHERE id = $2',
[points, toId]
);
await client.query('COMMIT');
} catch (err) {
await client.query('ROLLBACK');
throw err;
} finally {
client.release(); // always release back to pool
}
}

5. Prisma ORM — Schema, Migrations, findMany/create/update/delete
Prisma provides a type-safe database client generated from your schema.prisma file. It is the most popular ORM for TypeScript/Next.js applications and integrates seamlessly with PostgreSQL.
npm install prisma @prisma/client
npx prisma init
// schema.prisma
generator client {
provider = "prisma-client-js"
}
datasource db {
provider = "postgresql"
url = env("DATABASE_URL")
}
model User {
id String @id @default(uuid())
email String @unique
username String @unique
score Decimal @default(0) @db.Decimal(10, 2)
isActive Boolean @default(true)
createdAt DateTime @default(now()) @map("created_at")
posts Post[]
@@map("users")
}
model Post {
id String @id @default(uuid())
userId String @map("user_id")
title String
body String?
published Boolean @default(false)
publishedAt DateTime? @map("published_at")
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
@@map("posts")
}

# Run migrations
npx prisma migrate dev --name init
npx prisma generate
// prisma-client.ts
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
// findMany with where, select, include, orderBy, take
const users = await prisma.user.findMany({
where: {
isActive: true,
score: { gte: 100 },
},
select: {
id: true,
email: true,
username: true,
posts: {
where: { published: true },
orderBy: { publishedAt: 'desc' },
take: 3,
},
},
orderBy: { score: 'desc' },
take: 20,
skip: 0,
});
// create
const user = await prisma.user.create({
data: { email: 'bob@example.com', username: 'bob' },
});
// update
await prisma.user.update({
where: { id: userId },
data: { score: { increment: 10 } },
});
// delete (cascade via schema)
await prisma.user.delete({ where: { id: userId } });
// upsert
await prisma.user.upsert({
where: { email: 'alice@example.com' },
create: { email: 'alice@example.com', username: 'alice' },
update: { isActive: true },
});

6. Python — asyncpg and psycopg2
Python has two dominant PostgreSQL libraries: psycopg2 (synchronous, battle-tested) and asyncpg (async, extremely fast, used with FastAPI/asyncio).
pip install asyncpg psycopg2-binary
# asyncpg — async/await (FastAPI, aiohttp)
import asyncpg
import asyncio
async def main():
conn = await asyncpg.connect(
'postgresql://user:password@localhost/dbname'
)
# Parameterized query — uses $1, $2 placeholders
rows = await conn.fetch(
'SELECT id, email FROM users WHERE is_active = $1 LIMIT $2',
True, 20
)
for row in rows:
print(dict(row))
# Insert with RETURNING
user = await conn.fetchrow(
'INSERT INTO users(email, username) VALUES($1, $2) RETURNING *',
'charlie@example.com', 'charlie'
)
# Transaction
async with conn.transaction():
await conn.execute(
'UPDATE users SET score = score - $1 WHERE id = $2',
50, from_id
)
await conn.execute(
'UPDATE users SET score = score + $1 WHERE id = $2',
50, to_id
)
await conn.close()
# Connection pool with asyncpg
async def create_pool():
return await asyncpg.create_pool(
dsn='postgresql://user:password@localhost/dbname',
min_size=5,
max_size=20
)
# Create the pool once at application startup and await it inside your
# async entry point (e.g., a FastAPI lifespan hook):
#   pool = await create_pool()
# (asyncio.get_event_loop().run_until_complete(...) is deprecated)

# psycopg2 — synchronous (Django, Flask)
import psycopg2
from psycopg2.pool import ThreadedConnectionPool
# Always use %s placeholders — NEVER f-strings
conn = psycopg2.connect(
host='localhost', dbname='mydb',
user='myuser', password='secret'
)
cur = conn.cursor()
# Safe parameterized query
cur.execute(
'SELECT id, email FROM users WHERE email = %s',
(email,) # note: tuple, even for single value
)
user = cur.fetchone()
# WRONG — SQL injection vulnerability:
# cur.execute(f"SELECT * FROM users WHERE email = '{email}'")
# Connection pool
pool = ThreadedConnectionPool(
minconn=1, maxconn=20,
dsn='postgresql://user:pass@localhost/dbname'
)
conn = pool.getconn()
try:
with conn.cursor() as cur:
cur.execute('SELECT COUNT(*) FROM users')
count = cur.fetchone()[0]
conn.commit()
finally:
pool.putconn(conn)

7. JSONB — Storing, Querying, Operators, and GIN Indexes
JSONB is PostgreSQL's binary JSON format. Unlike JSON, it is stored in a decomposed binary format that supports indexing and fast key-based access. Use JSONB for flexible schema attributes, configuration objects, and event payloads.
-- Store JSON
INSERT INTO users (email, username, metadata)
VALUES ('dave@example.com', 'dave',
'{"plan": "pro", "features": ["export", "api"], "settings": {"theme": "dark"}}');
-- -> returns JSONB, ->> returns TEXT
SELECT metadata->'plan' AS plan_json, -- "pro"
metadata->>'plan' AS plan_text, -- pro (no quotes)
metadata->'settings'->>'theme' AS theme -- dark
FROM users WHERE username = 'dave';
-- #> navigates nested path
SELECT metadata #> '{settings,theme}' AS theme FROM users;
-- ? checks for key existence
SELECT * FROM users WHERE metadata ? 'plan';
-- @> containment: rows where plan = "pro"
SELECT * FROM users WHERE metadata @> '{"plan": "pro"}';
-- ?| any of keys exists
SELECT * FROM users WHERE metadata ?| array['plan', 'tier'];
-- jsonb_build_object and aggregation
SELECT
jsonb_build_object(
'total', COUNT(*),
'plans', jsonb_agg(DISTINCT metadata->>'plan')
) AS stats
FROM users WHERE is_active = TRUE;
-- GIN index for fast JSONB queries
CREATE INDEX idx_users_metadata ON users USING gin(metadata);
-- Update a JSONB key (jsonb_set)
UPDATE users
SET metadata = jsonb_set(metadata, '{plan}', '"enterprise"')
WHERE id = $1;
-- Remove a key
UPDATE users
SET metadata = metadata - 'old_key'
WHERE id = $1;

8. Full-Text Search — tsvector, tsquery, GIN Index, ts_rank, ts_headline, pg_trgm
PostgreSQL has a built-in full-text search engine. It is not as powerful as Elasticsearch for large-scale search, but it is sufficient for most application search needs and requires no additional infrastructure.
-- Basic full-text search
SELECT title, body
FROM posts
WHERE to_tsvector('english', title || ' ' || COALESCE(body, ''))
@@ to_tsquery('english', 'postgresql & index');
-- Store pre-computed tsvector for performance
ALTER TABLE posts ADD COLUMN search_vector TSVECTOR;
UPDATE posts
SET search_vector = to_tsvector('english', title || ' ' || COALESCE(body, ''));
-- Create GIN index on the tsvector column
CREATE INDEX idx_posts_search ON posts USING gin(search_vector);
-- Auto-update tsvector with a trigger
CREATE FUNCTION posts_tsvector_update() RETURNS trigger AS $$
BEGIN
NEW.search_vector := to_tsvector('english',
coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, ''));
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER posts_search_trigger
BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION posts_tsvector_update();
-- Search with ranking and highlighting
SELECT
id,
title,
ts_rank(search_vector, query) AS rank,
ts_headline('english', body, query,
'MaxFragments=2, MaxWords=30, MinWords=10') AS snippet
FROM posts, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;
-- pg_trgm: fuzzy/similarity search
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_username_trgm ON users USING gin(username gin_trgm_ops);
SELECT username, similarity(username, 'alce') AS sim
FROM users
WHERE username % 'alce' -- trigram similarity threshold
ORDER BY sim DESC;

9. Window Functions — ROW_NUMBER, RANK, LAG, LEAD, Running Totals, FILTER
Window functions perform calculations across a set of rows related to the current row, without collapsing the result set (unlike GROUP BY). They are essential for rankings, running totals, and time-series analysis.
-- ROW_NUMBER, RANK, DENSE_RANK
SELECT
username,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
RANK() OVER (ORDER BY score DESC) AS rank, -- gaps on ties
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank -- no gaps
FROM users WHERE is_active = TRUE;
-- PARTITION BY: rank within groups
SELECT
username,
metadata->>'plan' AS plan,
score,
RANK() OVER (
PARTITION BY metadata->>'plan'
ORDER BY score DESC
) AS rank_in_plan
FROM users;
-- LAG / LEAD: access adjacent rows
SELECT
id,
score,
LAG(score, 1, 0) OVER (ORDER BY created_at) AS prev_score,
LEAD(score, 1, 0) OVER (ORDER BY created_at) AS next_score,
score - LAG(score, 1, 0) OVER (ORDER BY created_at) AS delta
FROM users ORDER BY created_at;
-- FIRST_VALUE / LAST_VALUE
SELECT
username,
score,
FIRST_VALUE(username) OVER (
ORDER BY score DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS top_user
FROM users;
-- Running total (cumulative sum)
SELECT
created_at::DATE AS day,
COUNT(*) AS new_users,
SUM(COUNT(*)) OVER (ORDER BY created_at::DATE) AS cumulative_users
FROM users
GROUP BY created_at::DATE
ORDER BY day;
-- FILTER clause: conditional aggregates
SELECT
COUNT(*) FILTER (WHERE is_active = TRUE) AS active_count,
COUNT(*) FILTER (WHERE is_active = FALSE) AS inactive_count,
AVG(score) FILTER (WHERE score > 0) AS avg_positive_score
FROM users;

10. Performance & Tuning — PgBouncer, autovacuum, ANALYZE, pg_stat_statements
PostgreSQL performance involves multiple layers: connection management, query planning, table maintenance, and monitoring. Address these systematically for production workloads.
Connection Pooling with PgBouncer
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      # best for stateless web apps
max_client_conn = 1000
default_pool_size = 25       # actual Postgres connections

# Connect your app to the PgBouncer port (6432), not Postgres (5432)
DATABASE_URL=postgresql://user:pass@localhost:6432/mydb
autovacuum and ANALYZE
-- Check table bloat / autovacuum stats
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
-- Manually run VACUUM and ANALYZE
VACUUM ANALYZE users;
-- VACUUM FULL reclaims disk (locks table — avoid in production)
VACUUM FULL users;
-- Tune autovacuum per table for high-churn tables
ALTER TABLE users SET (
  autovacuum_vacuum_scale_factor = 0.01,   -- vacuum when 1% of rows are dead
  autovacuum_analyze_scale_factor = 0.005
);
pg_stat_statements — Top Slow Queries
-- Enable the extension (needs shared_preload_libraries = 'pg_stat_statements'
-- and a server restart, then CREATE EXTENSION in each database)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 slowest queries by total time
SELECT
  LEFT(query, 80) AS query_snippet,
  calls,
  total_exec_time::BIGINT AS total_ms,
  mean_exec_time::BIGINT AS mean_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Key postgresql.conf settings for performance
-- shared_buffers = 25% of RAM (e.g., 4GB on a 16GB system)
-- effective_cache_size = 75% of RAM
-- work_mem = 4MB to 64MB (per sort/hash operation)
-- max_connections = 100-200 (use PgBouncer for more)
-- checkpoint_completion_target = 0.9
-- random_page_cost = 1.1 (for SSDs; the default 4.0 assumes HDD)
Bonus: CTEs, Recursive Queries, and UPSERT Patterns
Common Table Expressions (CTEs) introduced with WITH make complex queries modular and readable. Recursive CTEs are essential for hierarchical data like org charts, category trees, and threaded comments.
CTEs for Readable Multi-step Queries
-- CTE: each step is named and reusable
WITH active_users AS (
SELECT id, username, score
FROM users
WHERE is_active = TRUE
),
ranked_users AS (
SELECT
id,
username,
score,
RANK() OVER (ORDER BY score DESC) AS rnk
FROM active_users
)
SELECT * FROM ranked_users WHERE rnk <= 10;
-- Writeable CTE: UPDATE and return affected rows
WITH updated AS (
UPDATE users
SET score = score + 100, updated_at = NOW()
WHERE id = ANY($1::uuid[])
RETURNING id, score
)
SELECT COUNT(*) AS updated_count FROM updated;

Recursive CTEs for Tree Data
-- categories table: id, parent_id, name
WITH RECURSIVE category_tree AS (
-- Base case: root categories (no parent)
SELECT id, name, parent_id, 0 AS depth, name::TEXT AS path
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- Recursive case: join children to their parent
SELECT c.id, c.name, c.parent_id,
ct.depth + 1,
ct.path || ' > ' || c.name
FROM categories c
INNER JOIN category_tree ct ON ct.id = c.parent_id
)
SELECT * FROM category_tree ORDER BY path;
-- Find all ancestors of a given node
WITH RECURSIVE ancestors AS (
SELECT id, name, parent_id
FROM categories WHERE id = $1 -- start node
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c
INNER JOIN ancestors a ON a.parent_id = c.id
)
SELECT * FROM ancestors;

UPSERT with ON CONFLICT
-- ON CONFLICT DO UPDATE (upsert)
INSERT INTO users (email, username, score)
VALUES ($1, $2, $3)
ON CONFLICT (email) DO UPDATE SET
  username = EXCLUDED.username,
  score = GREATEST(users.score, EXCLUDED.score),
  updated_at = NOW()
RETURNING *;
-- ON CONFLICT DO NOTHING (insert-or-ignore)
INSERT INTO user_tags (user_id, tag)
VALUES ($1, $2)
ON CONFLICT (user_id, tag) DO NOTHING;
-- Bulk upsert with unnest (efficient batch insert)
INSERT INTO scores (user_id, value, recorded_at)
SELECT * FROM unnest($1::uuid[], $2::numeric[], $3::timestamptz[])
  AS t(user_id, value, recorded_at)
ON CONFLICT (user_id) DO UPDATE SET
  value = EXCLUDED.value,
  recorded_at = EXCLUDED.recorded_at;
Partitioning for Time-series Data
For high-volume append-only tables (logs, events, metrics), table partitioning dramatically improves query performance and simplifies data retention (drop old partitions instead of DELETE).
-- Range-partitioned events table by month
CREATE TABLE events (
id BIGSERIAL,
user_id UUID NOT NULL,
event_type TEXT NOT NULL,
payload JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE events_2026_01
PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02
PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- Indexes must be created on each partition (or use CREATE INDEX on parent)
CREATE INDEX ON events_2026_01 (user_id, created_at DESC);
CREATE INDEX ON events_2026_02 (user_id, created_at DESC);
-- Drop old partition (instantly deletes all data, no vacuum needed)
DROP TABLE events_2025_12;
-- Query automatically hits only relevant partition (partition pruning)
SELECT * FROM events
WHERE user_id = $1
AND created_at >= '2026-02-01' AND created_at < '2026-03-01';

Useful Administrative Queries
-- Database sizes
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database ORDER BY pg_database_size(datname) DESC;
-- Table sizes including indexes
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
-- Cache hit rate (aim for > 99%)
SELECT
sum(heap_blks_hit) AS heap_hit,
sum(heap_blks_read) AS heap_read,
round(sum(heap_blks_hit)::NUMERIC /
NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2) AS cache_hit_pct
FROM pg_statio_user_tables;
-- Index usage: share of scans that used an index (aim for > 95% on large tables)
SELECT
tablename,
round(idx_scan::NUMERIC / NULLIF(seq_scan + idx_scan, 0) * 100, 2) AS index_usage_pct,
n_live_tup AS row_count
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

Extensions Worth Installing
| Extension | Purpose | Install Command |
|---|---|---|
| pg_stat_statements | Track slow query statistics | CREATE EXTENSION pg_stat_statements; |
| pg_trgm | Fuzzy/trigram text search | CREATE EXTENSION pg_trgm; |
| uuid-ossp | UUID generation (legacy; prefer pgcrypto) | CREATE EXTENSION "uuid-ossp"; |
| pgcrypto | gen_random_uuid(), encryption functions | CREATE EXTENSION pgcrypto; |
| PostGIS | Geospatial queries and indexes | CREATE EXTENSION postgis; |
| hstore | Key-value pairs (use JSONB instead for new code) | CREATE EXTENSION hstore; |
| timescaledb | Time-series data, hypertables, continuous aggregates | CREATE EXTENSION timescaledb; |
Need to clean up or format your SQL queries? Try our free SQL Formatter tool — instantly format, indent, and validate your PostgreSQL, MySQL, or SQLite queries online.
Bonus: Replication, Backup, Monitoring, and Security
Streaming Replication and Read Replicas
PostgreSQL's built-in streaming replication sends WAL (Write-Ahead Log) records from a primary to one or more standby servers in near real-time. Read replicas offload SELECT queries from the primary, improving throughput. Most managed PostgreSQL providers (AWS RDS, Supabase, Neon) configure this automatically.
-- postgresql.conf on primary
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB   # keep enough WAL for replicas to catch up
-- pg_hba.conf on primary: allow replication connections
host replication replicator 10.0.0.0/8 md5
-- Create replication user
CREATE ROLE replicator REPLICATION LOGIN PASSWORD 'secret';
-- On replica: recovery.conf (PostgreSQL < 12) or postgresql.conf (>= 12)
primary_conninfo = 'host=primary port=5432 user=replicator password=secret'
hot_standby = on   # allow SELECT queries on replica
-- Check replication lag
SELECT client_addr, state, sent_lsn - replay_lsn AS replication_lag_bytes
FROM pg_stat_replication;
Backup Strategies — pg_dump, pg_basebackup, PITR
# pg_dump: logical backup of a single database (safe for live DB)
pg_dump -Fc -Z9 -U postgres mydb > mydb_$(date +%Y%m%d).dump

# Restore from dump
pg_restore -U postgres -d mydb_restore mydb_20260227.dump

# pg_dumpall: dump all databases + global objects (roles, tablespaces)
pg_dumpall -U postgres > all_databases.sql

# pg_basebackup: physical backup (full cluster, for streaming replication)
pg_basebackup -h localhost -U replicator -D /backup/base -P -Xs -R

# Point-in-time Recovery (PITR)
# 1. Take a base backup
# 2. Enable WAL archiving (archive_mode = on, archive_command = ...)
# 3. Restore base backup + replay WAL up to target time

# In postgresql.conf:
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'

# recovery.conf / postgresql.conf (>= 12) for PITR:
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2026-02-27 14:30:00'
Security — Row-Level Security (RLS) and Roles
Row-Level Security (RLS) lets you enforce data access policies at the database level. This is especially powerful in multi-tenant applications — each tenant can only see their own rows, regardless of how the application queries the database.
-- Enable RLS on a table
ALTER TABLE posts ENABLE ROW LEVEL SECURITY;
-- Policy: users can only see their own posts
CREATE POLICY posts_owner_policy ON posts
USING (user_id = current_setting('app.current_user_id')::UUID);
-- Policy: admins can see all posts
CREATE POLICY posts_admin_policy ON posts
TO admin_role
USING (TRUE);
-- In your app: set the current user before queries
-- (SET LOCAL only takes effect inside a transaction)
BEGIN;
SET LOCAL app.current_user_id = 'user-uuid-here';
SELECT * FROM posts; -- automatically filtered by RLS
COMMIT;
-- Role-based access control
CREATE ROLE app_readonly;
GRANT CONNECT ON DATABASE mydb TO app_readonly;
GRANT USAGE ON SCHEMA public TO app_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;
CREATE ROLE app_readwrite;
GRANT app_readonly TO app_readwrite;
GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_readwrite;
-- Principle of least privilege: app uses limited-privilege role
-- Never connect as postgres/superuser from application code

Monitoring with pg_stat_activity and Locks
-- Active connections and long-running queries
SELECT
pid,
usename,
application_name,
state,
now() - query_start AS duration,
left(query, 100) AS query_snippet,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
-- Cancel a long-running query (non-destructive: the connection stays open)
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE now() - query_start > interval '5 minutes';
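If a backend does not respond to cancellation — for example a connection stuck idle in transaction — pg_terminate_backend closes the whole connection. A sketch (the 10-minute threshold is an arbitrary example):

```sql
-- Terminate connections idle in a transaction for too long
-- (destructive: the client's connection is closed)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND now() - state_change > interval '10 minutes';
```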
-- Detect lock contention
SELECT
blocked.pid AS blocked_pid,
blocked_query.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking_query.query AS blocking_query
FROM pg_locks blocked_lock
JOIN pg_stat_activity blocked_query ON blocked_query.pid = blocked_lock.pid
JOIN pg_locks blocking_lock ON (
blocking_lock.locktype = blocked_lock.locktype
AND blocking_lock.granted = TRUE
AND blocked_lock.granted = FALSE
AND blocking_lock.relation = blocked_lock.relation
)
JOIN pg_stat_activity blocking_query ON blocking_query.pid = blocking_lock.pid
WHERE blocked_lock.granted = FALSE
LIMIT 10;

Key Takeaways
- Use UUID or BIGINT GENERATED ALWAYS AS IDENTITY for primary keys; prefer TIMESTAMPTZ over TIMESTAMP.
- Always use parameterized queries ($1, $2 in pg; %s in psycopg2) to prevent SQL injection.
- EXPLAIN ANALYZE is your first debugging tool — look for Seq Scan on large tables and high row count mismatches.
- Choose the right index type: B-tree for scalar equality/range, GIN for JSONB/arrays/FTS, BRIN for large sequential tables.
- Use JSONB with GIN indexes for flexible schema columns; the @> containment operator is the most powerful JSONB query.
- Window functions (ROW_NUMBER, LAG, SUM OVER) replace complex self-joins for ranking and time-series analysis.
- PgBouncer in transaction mode is essential when using serverless functions or running many application instances.
- pg_stat_statements + pg_stat_user_indexes give you data-driven insight into what to optimize.