
PostgreSQL Complete Guide: SQL, Indexes, JSONB, and Performance

13 min read · by DevToolBox

TL;DR

PostgreSQL is the world's most advanced open-source relational database. Use SERIAL/UUID for primary keys, JSONB for semi-structured data, GIN indexes for full-text search and JSONB queries, and node-postgres (pg) or Prisma for Node.js applications. Always use parameterized queries to prevent SQL injection, run EXPLAIN ANALYZE on slow queries, and configure PgBouncer for connection pooling in production. Use our SQL formatter tool to clean up your queries instantly.

1. Core SQL — Tables, Data Types, and Constraints

PostgreSQL extends standard SQL with powerful native data types. Choosing the right type upfront avoids costly migrations later.

CREATE TABLE with Common Data Types

CREATE TABLE users (
  id          UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  username    TEXT NOT NULL UNIQUE,
  email       TEXT NOT NULL UNIQUE,
  age         INTEGER CHECK (age >= 0),
  score       NUMERIC(10, 2) DEFAULT 0.00,
  tags        TEXT[],
  metadata    JSONB DEFAULT '{}',
  is_active   BOOLEAN NOT NULL DEFAULT TRUE,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- SERIAL alternative (legacy but still common)
CREATE TABLE posts (
  id          SERIAL PRIMARY KEY,
  user_id     UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  title       TEXT NOT NULL,
  body        TEXT,
  published   BOOLEAN DEFAULT FALSE,
  published_at TIMESTAMPTZ
);

ALTER TABLE — Schema Changes

-- Add a column
ALTER TABLE users ADD COLUMN phone TEXT;

-- Add NOT NULL with a default (safe for large tables)
ALTER TABLE users ADD COLUMN tier TEXT NOT NULL DEFAULT 'free';

-- Rename a column
ALTER TABLE users RENAME COLUMN username TO handle;

-- Drop a column
ALTER TABLE users DROP COLUMN phone;

-- Add a foreign key constraint
ALTER TABLE posts
  ADD CONSTRAINT fk_posts_user
  FOREIGN KEY (user_id) REFERENCES users(id);

-- Add a unique constraint
ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);

On a non-empty table, adding a NOT NULL column without a default fails outright, and before PostgreSQL 11 adding a column with a default forced a full table rewrite. For large production tables, use a two-step migration: add the nullable column, backfill the data in batches, then add the constraint.
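A minimal sketch of that two-step migration (the last_seen column, backfill source, and batch size are illustrative):

-- Step 1: add the column as nullable (metadata-only, instant)
ALTER TABLE users ADD COLUMN last_seen TIMESTAMPTZ;

-- Step 2: backfill in batches to keep row locks short
UPDATE users SET last_seen = created_at
WHERE id IN (
  SELECT id FROM users WHERE last_seen IS NULL LIMIT 10000
);
-- repeat until 0 rows are updated

-- Step 3: enforce NOT NULL; a NOT VALID check avoids a long exclusive-lock scan
ALTER TABLE users ADD CONSTRAINT users_last_seen_not_null
  CHECK (last_seen IS NOT NULL) NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT users_last_seen_not_null;

On PostgreSQL 12+, once the CHECK constraint is validated, ALTER TABLE users ALTER COLUMN last_seen SET NOT NULL can use it to skip the full-table scan.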

2. CRUD & Querying — SELECT, JOIN, GROUP BY, EXPLAIN ANALYZE

PostgreSQL supports the full SQL standard plus many extensions. Mastering EXPLAIN ANALYZE is the single most important skill for query optimization.

-- INSERT with RETURNING
INSERT INTO users (email, username)
VALUES ('alice@example.com', 'alice')
RETURNING id, created_at;

-- UPDATE with condition
UPDATE users
SET score = score + 10, updated_at = NOW()
WHERE id = $1
RETURNING *;

-- DELETE with RETURNING
DELETE FROM posts WHERE id = $1 RETURNING id;

-- Complex SELECT with JOIN, GROUP BY, HAVING
SELECT
  u.id,
  u.username,
  COUNT(p.id)          AS post_count,
  MAX(p.published_at)  AS last_published
FROM users u
LEFT JOIN posts p ON p.user_id = u.id AND p.published = TRUE
WHERE u.is_active = TRUE
GROUP BY u.id, u.username
HAVING COUNT(p.id) > 0
ORDER BY post_count DESC
LIMIT 20 OFFSET 0;

-- EXPLAIN ANALYZE to diagnose slow queries
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM posts WHERE user_id = 'some-uuid' ORDER BY published_at DESC;

Read EXPLAIN ANALYZE output from the innermost node outward. Look for Seq Scan on large tables (add an index), high actual rows vs estimated rows discrepancies (run ANALYZE tablename), and nested loop joins on large datasets (may benefit from hash joins via SET enable_nestloop = off for testing).
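For example, if the plan shows a Seq Scan with a large estimate mismatch, the typical follow-ups look like this (the index name is illustrative):

-- Refresh planner statistics when estimated vs actual rows diverge
ANALYZE posts;

-- Add an index matching the filter and sort
CREATE INDEX CONCURRENTLY idx_posts_user_published_at
  ON posts (user_id, published_at DESC);

-- Compare join strategies in the current session only
SET enable_nestloop = off;
-- ...re-run the slow query with EXPLAIN ANALYZE...
RESET enable_nestloop;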

3. Indexes — B-tree, GIN, GiST, BRIN, Partial, Multi-column

Indexes are the primary performance lever in PostgreSQL. The default index type is B-tree, but choosing the right type for your access pattern can make queries orders of magnitude faster.

-- B-tree (default): equality and range queries
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_posts_created ON posts (created_at DESC);

-- Multi-column: column order matters — equality-filtered columns first, then range/sort columns
CREATE INDEX idx_posts_user_published ON posts (user_id, published, published_at DESC);

-- Partial index: only index a subset of rows
CREATE INDEX idx_active_users ON users (email) WHERE is_active = TRUE;

-- GIN: JSONB, full-text search, arrays
CREATE INDEX idx_metadata_gin ON users USING gin(metadata);
CREATE INDEX idx_tags_gin ON users USING gin(tags);

-- GiST: geometric types, nearest-neighbor (PostGIS)
CREATE INDEX idx_location_gist ON places USING gist(coordinates);

-- BRIN: very large tables with natural ordering (time-series, logs)
CREATE INDEX idx_logs_created_brin ON logs USING brin(created_at);

-- CONCURRENTLY: build index without locking writes (production-safe)
CREATE INDEX CONCURRENTLY idx_users_score ON users (score);

-- Check index usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'users'
ORDER BY idx_scan DESC;

A partial index is often dramatically smaller and faster than a full index. If 95% of your queries filter on WHERE is_active = TRUE, index only those rows. Unused indexes waste disk space and slow down writes — drop them with DROP INDEX CONCURRENTLY.
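To find drop candidates, query pg_stat_user_indexes for indexes with zero scans since statistics were last reset (verify an index isn't backing a PRIMARY KEY or UNIQUE constraint before dropping it):

-- Never-scanned indexes, largest first
SELECT relname AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop without blocking writes
DROP INDEX CONCURRENTLY idx_users_score;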

4. Node.js with node-postgres (pg) — Pool, Parameterized Queries, Transactions

node-postgres (the pg npm package) is the most widely used PostgreSQL driver for Node.js. Always use a Pool (not a single Client) in web applications to share connections across requests.

npm install pg
npm install --save-dev @types/pg
// db.ts — connection pool setup
import { Pool } from 'pg';

export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,              // maximum pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Basic parameterized query — NEVER use string interpolation
export async function getUserByEmail(email: string) {
  const result = await pool.query(
    'SELECT id, username, email, created_at FROM users WHERE email = $1',
    [email]
  );
  return result.rows[0] ?? null;
}

// Insert with RETURNING
export async function createUser(email: string, username: string) {
  const result = await pool.query(
    `INSERT INTO users (email, username)
     VALUES ($1, $2)
     RETURNING id, email, username, created_at`,
    [email, username]
  );
  return result.rows[0];
}

// Transaction — acquire client from pool
export async function transferScore(fromId: string, toId: string, points: number) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      'UPDATE users SET score = score - $1 WHERE id = $2',
      [points, fromId]
    );
    await client.query(
      'UPDATE users SET score = score + $1 WHERE id = $2',
      [points, toId]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();  // always release back to pool
  }
}

5. Prisma ORM — Schema, Migrations, findMany/create/update/delete

Prisma provides a type-safe database client generated from your schema.prisma file. It is the most popular ORM for TypeScript/Next.js applications and integrates seamlessly with PostgreSQL.

npm install prisma @prisma/client
npx prisma init
// schema.prisma
generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

model User {
  id        String   @id @default(uuid())
  email     String   @unique
  username  String   @unique
  score     Decimal  @default(0) @db.Decimal(10, 2)
  isActive  Boolean  @default(true)
  createdAt DateTime @default(now()) @map("created_at")
  posts     Post[]

  @@map("users")
}

model Post {
  id          String    @id @default(uuid())
  userId      String    @map("user_id")
  title       String
  body        String?
  published   Boolean   @default(false)
  publishedAt DateTime? @map("published_at")
  user        User      @relation(fields: [userId], references: [id], onDelete: Cascade)

  @@map("posts")
}
// Run migrations
npx prisma migrate dev --name init
npx prisma generate

// prisma-client.ts
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();

// findMany with where, select, include, orderBy, take
const users = await prisma.user.findMany({
  where: {
    isActive: true,
    score: { gte: 100 },
  },
  select: {
    id: true,
    email: true,
    username: true,
    posts: {
      where: { published: true },
      orderBy: { publishedAt: 'desc' },
      take: 3,
    },
  },
  orderBy: { score: 'desc' },
  take: 20,
  skip: 0,
});

// create
const user = await prisma.user.create({
  data: { email: 'bob@example.com', username: 'bob' },
});

// update
await prisma.user.update({
  where: { id: userId },
  data: { score: { increment: 10 } },
});

// delete (cascade via schema)
await prisma.user.delete({ where: { id: userId } });

// upsert
await prisma.user.upsert({
  where: { email: 'alice@example.com' },
  create: { email: 'alice@example.com', username: 'alice' },
  update: { isActive: true },
});

6. Python — asyncpg and psycopg2

Python has two dominant PostgreSQL libraries: psycopg2 (synchronous, battle-tested) and asyncpg (async, extremely fast, used with FastAPI/asyncio).

pip install asyncpg psycopg2-binary
# asyncpg — async/await (FastAPI, aiohttp)
import asyncpg
import asyncio

async def main():
    conn = await asyncpg.connect(
        'postgresql://user:password@localhost/dbname'
    )

    # Parameterized query — uses $1, $2 placeholders
    rows = await conn.fetch(
        'SELECT id, email FROM users WHERE is_active = $1 LIMIT $2',
        True, 20
    )
    for row in rows:
        print(dict(row))

    # Insert with RETURNING
    user = await conn.fetchrow(
        'INSERT INTO users(email, username) VALUES($1, $2) RETURNING *',
        'charlie@example.com', 'charlie'
    )

    # Transaction
    async with conn.transaction():
        await conn.execute(
            'UPDATE users SET score = score - $1 WHERE id = $2',
            50, from_id
        )
        await conn.execute(
            'UPDATE users SET score = score + $1 WHERE id = $2',
            50, to_id
        )

    await conn.close()

# Connection pool with asyncpg (create it inside a running event loop)
async def create_pool():
    return await asyncpg.create_pool(
        dsn='postgresql://user:password@localhost/dbname',
        min_size=5,
        max_size=20
    )

# e.g. pool = await create_pool() in your app's startup hook;
# for a script, call it from asyncio.run(main()) rather than the
# deprecated asyncio.get_event_loop().run_until_complete(...)
# psycopg2 — synchronous (Django, Flask)
import psycopg2
from psycopg2.pool import ThreadedConnectionPool

# Always use %s placeholders — NEVER f-strings
conn = psycopg2.connect(
    host='localhost', dbname='mydb',
    user='myuser', password='secret'
)
cur = conn.cursor()

# Safe parameterized query
cur.execute(
    'SELECT id, email FROM users WHERE email = %s',
    (email,)   # note: tuple, even for single value
)
user = cur.fetchone()

# WRONG — SQL injection vulnerability:
# cur.execute(f"SELECT * FROM users WHERE email = '{email}'")

# Connection pool
pool = ThreadedConnectionPool(
    minconn=1, maxconn=20,
    dsn='postgresql://user:pass@localhost/dbname'
)
conn = pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute('SELECT COUNT(*) FROM users')
        count = cur.fetchone()[0]
    conn.commit()
finally:
    pool.putconn(conn)

7. JSONB — Storing, Querying, Operators, and GIN Indexes

JSONB is PostgreSQL's binary JSON format. Unlike JSON, it is stored in a decomposed binary format that supports indexing and fast key-based access. Use JSONB for flexible schema attributes, configuration objects, and event payloads.

-- Store JSON
INSERT INTO users (email, username, metadata)
VALUES ('dave@example.com', 'dave',
        '{"plan": "pro", "features": ["export", "api"], "settings": {"theme": "dark"}}');

-- -> returns JSONB, ->> returns TEXT
SELECT metadata->'plan'          AS plan_json,   -- "pro"
       metadata->>'plan'         AS plan_text,   -- pro (no quotes)
       metadata->'settings'->>'theme' AS theme   -- dark
FROM users WHERE username = 'dave';

-- #> navigates nested path
SELECT metadata #> '{settings,theme}' AS theme FROM users;

-- ? checks for key existence
SELECT * FROM users WHERE metadata ? 'plan';

-- @> containment: rows where plan = "pro"
SELECT * FROM users WHERE metadata @> '{"plan": "pro"}';

-- ?| any of keys exists
SELECT * FROM users WHERE metadata ?| array['plan', 'tier'];

-- jsonb_build_object and aggregation
SELECT
  jsonb_build_object(
    'total', COUNT(*),
    'plans', jsonb_agg(DISTINCT metadata->>'plan')
  ) AS stats
FROM users WHERE is_active = TRUE;

-- GIN index for fast JSONB queries
CREATE INDEX idx_users_metadata ON users USING gin(metadata);

-- Update a JSONB key (jsonb_set)
UPDATE users
SET metadata = jsonb_set(metadata, '{plan}', '"enterprise"')
WHERE id = $1;

-- Remove a key
UPDATE users
SET metadata = metadata - 'old_key'
WHERE id = $1;

8. Full-Text Search — tsvector, tsquery, GIN Index, ts_rank, ts_headline, pg_trgm

PostgreSQL has a built-in full-text search engine. It is not as powerful as Elasticsearch for large-scale search, but it is sufficient for most application search needs and requires no additional infrastructure.

-- Basic full-text search
SELECT title, body
FROM posts
WHERE to_tsvector('english', title || ' ' || COALESCE(body, ''))
      @@ to_tsquery('english', 'postgresql & index');

-- Store pre-computed tsvector for performance
ALTER TABLE posts ADD COLUMN search_vector TSVECTOR;

UPDATE posts
SET search_vector = to_tsvector('english', title || ' ' || COALESCE(body, ''));

-- Create GIN index on the tsvector column
CREATE INDEX idx_posts_search ON posts USING gin(search_vector);

-- Auto-update tsvector with a trigger
CREATE FUNCTION posts_tsvector_update() RETURNS trigger AS $$
BEGIN
  NEW.search_vector := to_tsvector('english',
    coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, ''));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_trigger
BEFORE INSERT OR UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION posts_tsvector_update();

-- Search with ranking and highlighting
SELECT
  id,
  title,
  ts_rank(search_vector, query)    AS rank,
  ts_headline('english', body, query,
    'MaxFragments=2, MaxWords=30, MinWords=10') AS snippet
FROM posts, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

-- pg_trgm: fuzzy/similarity search
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_username_trgm ON users USING gin(username gin_trgm_ops);

SELECT username, similarity(username, 'alce') AS sim
FROM users
WHERE username % 'alce'   -- trigram similarity threshold
ORDER BY sim DESC;

9. Window Functions — ROW_NUMBER, RANK, LAG, LEAD, Running Totals, FILTER

Window functions perform calculations across a set of rows related to the current row, without collapsing the result set (unlike GROUP BY). They are essential for rankings, running totals, and time-series analysis.

-- ROW_NUMBER, RANK, DENSE_RANK
SELECT
  username,
  score,
  ROW_NUMBER() OVER (ORDER BY score DESC)    AS row_num,
  RANK()       OVER (ORDER BY score DESC)    AS rank,      -- gaps on ties
  DENSE_RANK() OVER (ORDER BY score DESC)    AS dense_rank -- no gaps
FROM users WHERE is_active = TRUE;

-- PARTITION BY: rank within groups
SELECT
  username,
  metadata->>'plan' AS plan,
  score,
  RANK() OVER (
    PARTITION BY metadata->>'plan'
    ORDER BY score DESC
  ) AS rank_in_plan
FROM users;

-- LAG / LEAD: access adjacent rows
SELECT
  id,
  score,
  LAG(score,  1, 0) OVER (ORDER BY created_at) AS prev_score,
  LEAD(score, 1, 0) OVER (ORDER BY created_at) AS next_score,
  score - LAG(score, 1, 0) OVER (ORDER BY created_at) AS delta
FROM users ORDER BY created_at;

-- FIRST_VALUE / LAST_VALUE
SELECT
  username,
  score,
  FIRST_VALUE(username) OVER (
    ORDER BY score DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS top_user
FROM users;

-- Running total (cumulative sum)
SELECT
  created_at::DATE AS day,
  COUNT(*)         AS new_users,
  SUM(COUNT(*)) OVER (ORDER BY created_at::DATE) AS cumulative_users
FROM users
GROUP BY created_at::DATE
ORDER BY day;

-- FILTER clause: conditional aggregates
SELECT
  COUNT(*) FILTER (WHERE is_active = TRUE)  AS active_count,
  COUNT(*) FILTER (WHERE is_active = FALSE) AS inactive_count,
  AVG(score) FILTER (WHERE score > 0)       AS avg_positive_score
FROM users;

10. Performance & Tuning — PgBouncer, autovacuum, ANALYZE, pg_stat_statements

PostgreSQL performance involves multiple layers: connection management, query planning, table maintenance, and monitoring. Address these systematically for production workloads.

Connection Pooling with PgBouncer

# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   # best for stateless web apps
max_client_conn = 1000
default_pool_size = 25    # actual Postgres connections

# Connect your app to PgBouncer port (6432), not Postgres (5432)
DATABASE_URL=postgresql://user:pass@localhost:6432/mydb
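One caveat with pool_mode = transaction: session-level features (prepared statements, SET outside a transaction, LISTEN/NOTIFY, advisory locks) do not carry over between transactions, because each transaction may run on a different server connection. Some clients need to be told they are behind a transaction-mode pooler; Prisma, for example, uses a documented connection-string flag:

# Prisma behind PgBouncer in transaction mode
DATABASE_URL=postgresql://user:pass@localhost:6432/mydb?pgbouncer=true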

autovacuum and ANALYZE

-- Check table bloat / autovacuum stats
SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Manually run VACUUM and ANALYZE
VACUUM ANALYZE users;

-- VACUUM FULL reclaims disk (locks table — avoid in production)
VACUUM FULL users;

-- Tune autovacuum per table for high-churn tables
ALTER TABLE users SET (
  autovacuum_vacuum_scale_factor = 0.01,   -- vacuum when 1% rows are dead
  autovacuum_analyze_scale_factor = 0.005
);

pg_stat_statements — Top Slow Queries

-- Enable extension (requires restart or superuser)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by total time
SELECT
  LEFT(query, 80)          AS query_snippet,
  calls,
  total_exec_time::BIGINT  AS total_ms,
  mean_exec_time::BIGINT   AS mean_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Key postgresql.conf settings for performance
-- shared_buffers = 25% of RAM        (e.g., 4GB on 16GB system)
-- effective_cache_size = 75% of RAM
-- work_mem = 4MB to 64MB             (per sort/hash operation)
-- max_connections = 100-200          (use PgBouncer for more)
-- checkpoint_completion_target = 0.9
-- random_page_cost = 1.1             (for SSDs; default 4.0 is for HDD)

Bonus: CTEs, Recursive Queries, and UPSERT Patterns

Common Table Expressions (CTEs) introduced with WITH make complex queries modular and readable. Recursive CTEs are essential for hierarchical data like org charts, category trees, and threaded comments.

CTEs for Readable Multi-step Queries

-- CTE: each step is named and reusable
WITH active_users AS (
  SELECT id, username, score
  FROM users
  WHERE is_active = TRUE
),
ranked_users AS (
  SELECT
    id,
    username,
    score,
    RANK() OVER (ORDER BY score DESC) AS rnk
  FROM active_users
)
SELECT * FROM ranked_users WHERE rnk <= 10;

-- Writeable CTE: UPDATE and return affected rows
WITH updated AS (
  UPDATE users
  SET score = score + 100, updated_at = NOW()
  WHERE id = ANY($1::uuid[])
  RETURNING id, score
)
SELECT COUNT(*) AS updated_count FROM updated;

Recursive CTEs for Tree Data

-- categories table: id, parent_id, name
WITH RECURSIVE category_tree AS (
  -- Base case: root categories (no parent)
  SELECT id, name, parent_id, 0 AS depth, name::TEXT AS path
  FROM categories
  WHERE parent_id IS NULL

  UNION ALL

  -- Recursive case: join children to their parent
  SELECT c.id, c.name, c.parent_id,
         ct.depth + 1,
         ct.path || ' > ' || c.name
  FROM categories c
  INNER JOIN category_tree ct ON ct.id = c.parent_id
)
SELECT * FROM category_tree ORDER BY path;

-- Find all ancestors of a given node
WITH RECURSIVE ancestors AS (
  SELECT id, name, parent_id
  FROM categories WHERE id = $1   -- start node

  UNION ALL

  SELECT c.id, c.name, c.parent_id
  FROM categories c
  INNER JOIN ancestors a ON a.parent_id = c.id
)
SELECT * FROM ancestors;

UPSERT with ON CONFLICT

-- ON CONFLICT DO UPDATE (upsert)
INSERT INTO users (email, username, score)
VALUES ($1, $2, $3)
ON CONFLICT (email)
DO UPDATE SET
  username   = EXCLUDED.username,
  score      = GREATEST(users.score, EXCLUDED.score),
  updated_at = NOW()
RETURNING *;

-- ON CONFLICT DO NOTHING (insert-or-ignore)
INSERT INTO user_tags (user_id, tag)
VALUES ($1, $2)
ON CONFLICT (user_id, tag) DO NOTHING;

-- Bulk upsert with unnest (efficient batch insert)
INSERT INTO scores (user_id, value, recorded_at)
SELECT * FROM unnest($1::uuid[], $2::numeric[], $3::timestamptz[])
  AS t(user_id, value, recorded_at)
ON CONFLICT (user_id)
DO UPDATE SET value = EXCLUDED.value, recorded_at = EXCLUDED.recorded_at;

Partitioning for Time-series Data

For high-volume append-only tables (logs, events, metrics), table partitioning dramatically improves query performance and simplifies data retention (drop old partitions instead of DELETE).

-- Range-partitioned events table by month
CREATE TABLE events (
  id         BIGSERIAL,
  user_id    UUID NOT NULL,
  event_type TEXT NOT NULL,
  payload    JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2026_01
  PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE events_2026_02
  PARTITION OF events
  FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Indexes must be created on each partition (or use CREATE INDEX on parent)
CREATE INDEX ON events_2026_01 (user_id, created_at DESC);
CREATE INDEX ON events_2026_02 (user_id, created_at DESC);

-- Drop old partition (instantly deletes all data, no vacuum needed)
DROP TABLE events_2025_12;

-- Query automatically hits only relevant partition (partition pruning)
SELECT * FROM events
WHERE user_id = $1
  AND created_at BETWEEN '2026-02-01' AND '2026-03-01';

Useful Administrative Queries

-- Database sizes
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database ORDER BY pg_database_size(datname) DESC;

-- Table sizes including indexes
SELECT
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
  pg_size_pretty(pg_relation_size(schemaname || '.' || tablename))       AS table_size,
  pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename))        AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;

-- Cache hit rate (aim for > 99%)
SELECT
  sum(heap_blks_hit)  AS heap_hit,
  sum(heap_blks_read) AS heap_read,
  round(sum(heap_blks_hit)::NUMERIC /
    NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2) AS cache_hit_pct
FROM pg_statio_user_tables;

-- Index hit rate (aim for > 95%)
SELECT
  relname,
  round(idx_scan::NUMERIC / NULLIF(seq_scan + idx_scan, 0) * 100, 2) AS index_usage_pct,
  n_live_tup AS row_count
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

Extensions Worth Installing

Extension          | Purpose                                              | Install Command
pg_stat_statements | Track slow query statistics                          | CREATE EXTENSION pg_stat_statements;
pg_trgm            | Fuzzy/trigram text search                            | CREATE EXTENSION pg_trgm;
uuid-ossp          | UUID generation (legacy; prefer pgcrypto)            | CREATE EXTENSION "uuid-ossp";
pgcrypto           | gen_random_uuid(), encryption functions              | CREATE EXTENSION pgcrypto;
PostGIS            | Geospatial queries and indexes                       | CREATE EXTENSION postgis;
hstore             | Key-value pairs (use JSONB instead for new code)     | CREATE EXTENSION hstore;
timescaledb        | Time-series data, hypertables, continuous aggregates | CREATE EXTENSION timescaledb;

Need to clean up or format your SQL queries? Try our free SQL Formatter tool — instantly format, indent, and validate your PostgreSQL, MySQL, or SQLite queries online.

Bonus: Replication, Backup, Monitoring, and Security

Streaming Replication and Read Replicas

PostgreSQL's built-in streaming replication sends WAL (Write-Ahead Log) records from a primary to one or more standby servers in near real-time. Read replicas offload SELECT queries from the primary, improving throughput. Most managed PostgreSQL providers (AWS RDS, Supabase, Neon) configure this automatically.

# postgresql.conf on primary
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB   # keep enough WAL for replicas to catch up

# pg_hba.conf on primary: allow replication connections
host replication replicator 10.0.0.0/8 md5

-- Create replication user (SQL, on the primary)
CREATE ROLE replicator REPLICATION LOGIN PASSWORD 'secret';

# On replica: recovery.conf (PostgreSQL < 12) or postgresql.conf (>= 12)
primary_conninfo = 'host=primary port=5432 user=replicator password=secret'
hot_standby = on   # allow SELECT queries on replica

-- Check replication lag
SELECT
  client_addr,
  state,
  sent_lsn - replay_lsn AS replication_lag_bytes
FROM pg_stat_replication;

Backup Strategies — pg_dump, pg_basebackup, PITR

# pg_dump: logical backup of a single database (safe for live DB)
pg_dump -Fc -Z9 -U postgres mydb > mydb_$(date +%Y%m%d).dump

# Restore from dump
pg_restore -U postgres -d mydb_restore mydb_20260227.dump

# pg_dumpall: dump all databases + global objects (roles, tablespaces)
pg_dumpall -U postgres > all_databases.sql

# pg_basebackup: physical backup (full cluster, for streaming replication)
pg_basebackup -h localhost -U replicator -D /backup/base -P -Xs -R

# Point-in-time Recovery (PITR)
# 1. Take a base backup
# 2. Enable WAL archiving (archive_mode = on, archive_command = ...)
# 3. Restore base backup + replay WAL up to target time

# In postgresql.conf:
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'

# recovery.conf / postgresql.conf (>= 12) for PITR:
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2026-02-27 14:30:00'

Security — Row-Level Security (RLS) and Roles

Row-Level Security (RLS) lets you enforce data access policies at the database level. This is especially powerful in multi-tenant applications — each tenant can only see their own rows, regardless of how the application queries the database.

-- Enable RLS on a table
ALTER TABLE posts ENABLE ROW LEVEL SECURITY;

-- Policy: users can only see their own posts
CREATE POLICY posts_owner_policy ON posts
  USING (user_id = current_setting('app.current_user_id')::UUID);

-- Policy: admins can see all posts
CREATE POLICY posts_admin_policy ON posts
  TO admin_role
  USING (TRUE);

-- In your app: set the current user inside a transaction
-- (SET LOCAL has no effect outside a transaction block)
BEGIN;
SET LOCAL app.current_user_id = 'user-uuid-here';
SELECT * FROM posts;  -- automatically filtered by RLS
COMMIT;

-- Role-based access control
CREATE ROLE app_readonly;
GRANT CONNECT ON DATABASE mydb TO app_readonly;
GRANT USAGE ON SCHEMA public TO app_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;

CREATE ROLE app_readwrite;
GRANT app_readonly TO app_readwrite;
GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_readwrite;

-- Principle of least privilege: app uses limited-privilege role
-- Never connect as postgres/superuser from application code

Monitoring with pg_stat_activity and Locks

-- Active connections and long-running queries
SELECT
  pid,
  usename,
  application_name,
  state,
  now() - query_start        AS duration,
  left(query, 100)           AS query_snippet,
  wait_event_type,
  wait_event
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Cancel long-running queries (non-destructive; use pg_terminate_backend to kill the connection)
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
  AND pid <> pg_backend_pid();

-- Detect lock contention
SELECT
  blocked.pid        AS blocked_pid,
  blocked_query.query AS blocked_query,
  blocking.pid       AS blocking_pid,
  blocking_query.query AS blocking_query
FROM pg_locks blocked_lock
JOIN pg_stat_activity blocked_query ON blocked_query.pid = blocked_lock.pid
JOIN pg_locks blocking_lock ON (
    blocking_lock.locktype = blocked_lock.locktype
    AND blocking_lock.granted = TRUE
    AND blocked_lock.granted = FALSE
    AND blocking_lock.relation = blocked_lock.relation
)
JOIN pg_stat_activity blocking_query ON blocking_query.pid = blocking_lock.pid
WHERE blocked_lock.granted = FALSE
LIMIT 10;

Key Takeaways

  • Use UUID or BIGINT GENERATED ALWAYS AS IDENTITY for primary keys; prefer TIMESTAMPTZ over TIMESTAMP.
  • Always use parameterized queries ($1, $2 in pg; %s in psycopg2) to prevent SQL injection.
  • EXPLAIN ANALYZE is your first debugging tool — look for Seq Scan on large tables and high row count mismatches.
  • Choose the right index type: B-tree for scalar equality/range, GIN for JSONB/arrays/FTS, BRIN for large sequential tables.
  • Use JSONB with GIN indexes for flexible schema columns; the @> containment operator is the most powerful JSONB query.
  • Window functions (ROW_NUMBER, LAG, SUM OVER) replace complex self-joins for ranking and time-series analysis.
  • PgBouncer in transaction mode is essential when using serverless functions or running many application instances.
  • pg_stat_statements + pg_stat_user_indexes give you data-driven insight into what to optimize.
