Optimasi kueri SQL: indeks, EXPLAIN, dan pencegahan N+1

Kueri lambat adalah salah satu hambatan kinerja paling umum dalam aplikasi web.

Indeks B-Tree: Fondasi

B-Tree adalah tipe indeks default yang secara efisien menangani kesetaraan, rentang, dan pengurutan.

-- Understanding B-Tree indexes (default in PostgreSQL, MySQL, SQLite)
-- An index is a separate data structure that maps column values → row locations

-- Without index: full table scan (reads every row)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Seq Scan on orders  (cost=0.00..1842.00 rows=18 width=96)

-- Create a B-Tree index on the foreign key
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

-- With index: index scan (jumps directly to matching rows)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Index Scan using idx_orders_customer_id on orders
-- (cost=0.29..8.55 rows=18 width=96)

-- Composite index: order matters! (customer_id, created_at) covers:
--   WHERE customer_id = ?
--   WHERE customer_id = ? AND created_at > ?
-- But NOT:  WHERE created_at > ?  (left-prefix rule)
CREATE INDEX idx_orders_customer_date
  ON orders(customer_id, created_at DESC);

-- Partial index: only index rows matching a condition
-- Huge win when you frequently query a small subset
CREATE INDEX idx_orders_pending
  ON orders(created_at)
  WHERE status = 'pending';   -- only ~5% of rows get indexed

Indeks GIN dan ekspresi

GIN dioptimalkan untuk data multi-nilai seperti kolom JSONB, array, dan pencarian teks lengkap.

-- GIN (Generalized Inverted Index): for JSONB, arrays, full-text search
-- Much faster than B-Tree for containment queries

-- JSONB column with GIN index
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- Containment query: "find products where attributes includes {color: 'red'}"
SELECT * FROM products
WHERE attributes @> '{"color": "red"}';   -- uses GIN index

-- Full-text search with GIN
ALTER TABLE articles ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''))
  ) STORED;

CREATE INDEX idx_articles_fts ON articles USING GIN (tsv);

SELECT title, ts_rank(tsv, q) AS rank
FROM articles, to_tsquery('english', 'postgres & performance') q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT 10;

-- Expression index: index on a computed value
CREATE INDEX idx_users_email_lower ON users (lower(email));
SELECT * FROM users WHERE lower(email) = lower('User@Example.com');

Membaca EXPLAIN ANALYZE

EXPLAIN ANALYZE menjalankan kueri dan menampilkan waktu aktual dan jumlah baris.

-- EXPLAIN ANALYZE: actual execution stats (not just estimates)
-- Always use ANALYZE to see real row counts and timing

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
  c.name,
  COUNT(o.id) AS order_count,
  SUM(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.created_at >= '2025-01-01'
  AND o.status = 'completed'
GROUP BY c.id, c.name
ORDER BY revenue DESC
LIMIT 20;

-- Reading the output:
-- "Seq Scan"     → no index used, reading all rows   (BAD for large tables)
-- "Index Scan"   → using B-Tree index                (good)
-- "Bitmap Scan"  → multiple ranges, batched          (good for many rows)
-- "Hash Join"    → join via hash table               (good for large datasets)
-- "Nested Loop"  → row-by-row join                   (good for small inner set)
-- "actual time=X..Y"  → X=first row, Y=last row (ms)
-- "rows=N"       → if estimate vs actual differ a lot → stale stats!

-- Fix stale statistics:
ANALYZE orders;                          -- update stats for one table
ANALYZE;                                 -- update all tables
-- Or set autovacuum_analyze_scale_factor = 0.01 in postgresql.conf

Masalah N+1: deteksi dan perbaikan

Masalah N+1 terjadi ketika kode menjalankan satu kueri untuk daftar lalu N kueri tambahan untuk setiap item.

-- N+1 Problem: the silent query killer in ORMs

-- BAD: N+1 in pseudo-ORM code
-- for each user (1 query):
--   fetch user's orders (N queries)
-- Total: 1 + N queries for N users

-- Example: fetching 100 users + their order counts = 101 queries
const users = await db.query('SELECT * FROM users LIMIT 100');
for (const user of users) {
  const orders = await db.query(
    'SELECT COUNT(*) FROM orders WHERE customer_id = $1',
    [user.id]
  );
  user.orderCount = orders[0].count;
}

-- GOOD: Single query with JOIN or subquery
SELECT
  u.id,
  u.name,
  u.email,
  COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id
GROUP BY u.id, u.name, u.email
LIMIT 100;

-- GOOD: Use IN clause to batch (avoid when set is large)
-- Fetch users first, then batch-load orders
SELECT * FROM orders
WHERE customer_id = ANY($1::int[]);  -- pass array of user IDs

-- GOOD: Prisma/TypeORM eager loading
const users = await prisma.user.findMany({
  include: { orders: true },    -- generates single LEFT JOIN query
  take: 100,
});

Pola penulisan ulang kueri

Banyak kueri lambat dapat dipercepat secara dramatis dengan penulisan ulang.

-- Query rewrites that dramatically improve performance

-- 1. Replace correlated subquery with JOIN
-- SLOW: correlated subquery executes for each outer row
SELECT name,
  (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS cnt
FROM customers c;

-- FAST: single aggregation pass
SELECT c.name, COUNT(o.id) AS cnt
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

-- 2. Use window functions instead of self-join
-- SLOW: self-join to find each employee's department average
SELECT e.name, e.salary,
  (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e.dept) AS dept_avg
FROM employees e;

-- FAST: window function scans table once
SELECT name, salary,
  AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;

-- 3. Avoid SELECT * in subqueries (forces extra columns)
-- SLOW
SELECT * FROM (SELECT * FROM large_table) sub WHERE id > 1000;

-- FAST: only select needed columns
SELECT id, name FROM large_table WHERE id > 1000;

-- 4. Use EXISTS instead of COUNT for existence checks
-- SLOW: scans all matching rows
SELECT * FROM products p WHERE (SELECT COUNT(*) FROM reviews r WHERE r.product_id = p.id) > 0;

-- FAST: stops at first match
SELECT * FROM products p WHERE EXISTS (SELECT 1 FROM reviews r WHERE r.product_id = p.id);

Pooling koneksi dan operasi batch

Koneksi database mahal. Pooling, pernyataan siap pakai, dan INSERT batch meningkatkan throughput.

-- Connection pooling & query batching best practices

-- PostgreSQL: use PgBouncer or built-in pooling
-- Connection pool settings (nodejs pg pool)
import { Pool } from 'pg';

const pool = new Pool({
  max: 20,                // max connections (default: 10)
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
  // keepAlive improves performance for long-lived processes
  keepAlive: true,
  keepAliveInitialDelayMillis: 10000,
});

// Prepared statements: parse once, execute many times
await pool.query({
  name: 'get-user-orders',
  text: 'SELECT * FROM orders WHERE customer_id = $1 AND status = $2',
  values: [customerId, 'completed'],
});

-- Batch INSERT with single statement (much faster than loop)
INSERT INTO events (user_id, event_type, created_at)
SELECT * FROM UNNEST(
  $1::int[],
  $2::text[],
  $3::timestamptz[]
);
-- Inserts thousands of rows in one round-trip

-- Use COPY for bulk data loading (fastest option)
COPY orders (customer_id, total, status, created_at)
FROM '/tmp/orders.csv'
WITH (FORMAT CSV, HEADER true);

Perbandingan tipe indeks

Index Type	Best For	Operators	Write Overhead	Notes
B-Tree	Equality, range, sort	=, <, >, BETWEEN, LIKE prefix	Low	Default; most queries
GIN	Multi-valued (JSONB, arrays)	@>, <@, @@, ANY	High write	Full-text, JSONB queries
GiST	Geometric, full-text	&&, @>, <@, ~~	Medium	Nearest-neighbor, overlap
BRIN	Sequential data (timestamps)	=, <, >	Very low	Huge tables, append-only
Hash	Equality only	=	Low	Faster than B-Tree for =, no range
Partial	Subset of rows	Any	Low	WHERE clause filter in index def
Expression	Computed values	Any	Medium	lower(email), date_trunc()

Pertanyaan yang sering diajukan

Bagaimana cara mengetahui kueri mana yang harus dioptimalkan lebih dulu?

Aktifkan pencatatan kueri lambat dan gunakan pg_stat_statements.

Mengapa menambahkan indeks terkadang membuat kueri lebih lambat?

Pemindaian indeks memiliki overhead; perencana mungkin memilih pemindaian berurutan.

Apa perbedaan antara EXPLAIN dan EXPLAIN ANALYZE?

EXPLAIN menampilkan rencana perkiraan, EXPLAIN ANALYZE benar-benar menjalankan kueri.

Bagaimana cara memperbaiki N+1 di ORM?

Gunakan eager loading (include/relations) atau DataLoader untuk batching.

Alat terkait

json-formatter regex-tester base64-encoder jwt-decoder url-encoder

Optimasi Query SQL: 15 Teknik untuk Mempercepat Database Anda

Optimasi kueri SQL: indeks, EXPLAIN, dan pencegahan N+1

Indeks B-Tree: Fondasi

Indeks GIN dan ekspresi

Membaca EXPLAIN ANALYZE

Masalah N+1: deteksi dan perbaikan

Pola penulisan ulang kueri

Pooling koneksi dan operasi batch

Perbandingan tipe indeks

Pertanyaan yang sering diajukan

Bagaimana cara mengetahui kueri mana yang harus dioptimalkan lebih dulu?

Mengapa menambahkan indeks terkadang membuat kueri lebih lambat?

Apa perbedaan antara EXPLAIN dan EXPLAIN ANALYZE?

Bagaimana cara memperbaiki N+1 di ORM?

Alat terkait

Stay Updated

Sponsor this article

Coba Alat Terkait

Artikel Terkait

PostgreSQL Performance Tuning: Indexing dan Optimasi Query

SQL vs NoSQL: Panduan Lengkap Memilih Database yang Tepat

SQL Joins Dijelaskan Secara Visual: INNER, LEFT, RIGHT, FULL, CROSS