SQLクエリ最適化：インデックス、EXPLAIN、N+1問題の解決

遅いクエリはWebアプリケーションで最も一般的なパフォーマンスボトルネックの一つです。

B-Treeインデックス：基礎

B-Treeはデフォルトのインデックスタイプで、等値、範囲クエリ、ソートを効率的に処理します。

-- Understanding B-Tree indexes (default in PostgreSQL, MySQL, SQLite)
-- An index is a separate data structure that maps column values → row locations

-- Without index: full table scan (reads every row)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Seq Scan on orders  (cost=0.00..1842.00 rows=18 width=96)

-- Create a B-Tree index on the foreign key
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

-- With index: index scan (jumps directly to matching rows)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Index Scan using idx_orders_customer_id on orders
-- (cost=0.29..8.55 rows=18 width=96)

-- Composite index: order matters! (customer_id, created_at) covers:
--   WHERE customer_id = ?
--   WHERE customer_id = ? AND created_at > ?
-- But NOT:  WHERE created_at > ?  (left-prefix rule)
CREATE INDEX idx_orders_customer_date
  ON orders(customer_id, created_at DESC);

-- Partial index: only index rows matching a condition
-- Huge win when you frequently query a small subset
CREATE INDEX idx_orders_pending
  ON orders(created_at)
  WHERE status = 'pending';   -- only ~5% of rows get indexed

GINと式インデックス

GINはJSONB列、配列、全文検索などの多値データに最適化されています。

-- GIN (Generalized Inverted Index): for JSONB, arrays, full-text search
-- Much faster than B-Tree for containment queries

-- JSONB column with GIN index
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- Containment query: "find products where attributes includes {color: 'red'}"
SELECT * FROM products
WHERE attributes @> '{"color": "red"}';   -- uses GIN index

-- Full-text search with GIN
ALTER TABLE articles ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''))
  ) STORED;

CREATE INDEX idx_articles_fts ON articles USING GIN (tsv);

SELECT title, ts_rank(tsv, q) AS rank
FROM articles, to_tsquery('english', 'postgres & performance') q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT 10;

-- Expression index: index on a computed value
CREATE INDEX idx_users_email_lower ON users (lower(email));
SELECT * FROM users WHERE lower(email) = lower('User@Example.com');

EXPLAIN ANALYZEの読み方

EXPLAIN ANALYZEはクエリを実行し、実際のタイミングと行数を表示します。

-- EXPLAIN ANALYZE: actual execution stats (not just estimates)
-- Always use ANALYZE to see real row counts and timing

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
  c.name,
  COUNT(o.id) AS order_count,
  SUM(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.created_at >= '2025-01-01'
  AND o.status = 'completed'
GROUP BY c.id, c.name
ORDER BY revenue DESC
LIMIT 20;

-- Reading the output:
-- "Seq Scan"     → no index used, reading all rows   (BAD for large tables)
-- "Index Scan"   → using B-Tree index                (good)
-- "Bitmap Scan"  → multiple ranges, batched          (good for many rows)
-- "Hash Join"    → join via hash table               (good for large datasets)
-- "Nested Loop"  → row-by-row join                   (good for small inner set)
-- "actual time=X..Y"  → X=first row, Y=last row (ms)
-- "rows=N"       → if estimate vs actual differ a lot → stale stats!

-- Fix stale statistics:
ANALYZE orders;                          -- update stats for one table
ANALYZE;                                 -- update all tables
-- Or set autovacuum_analyze_scale_factor = 0.01 in postgresql.conf

N+1問題：検出と修正

N+1問題はコードがリストを取得するクエリを1回実行し、各アイテムに追加クエリをN回実行するときに発生します。

-- N+1 Problem: the silent query killer in ORMs

-- BAD: N+1 in pseudo-ORM code
-- for each user (1 query):
--   fetch user's orders (N queries)
-- Total: 1 + N queries for N users

-- Example: fetching 100 users + their order counts = 101 queries
const users = await db.query('SELECT * FROM users LIMIT 100');
for (const user of users) {
  const orders = await db.query(
    'SELECT COUNT(*) FROM orders WHERE customer_id = $1',
    [user.id]
  );
  user.orderCount = orders[0].count;
}

-- GOOD: Single query with JOIN or subquery
SELECT
  u.id,
  u.name,
  u.email,
  COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id
GROUP BY u.id, u.name, u.email
LIMIT 100;

-- GOOD: Use IN clause to batch (avoid when set is large)
-- Fetch users first, then batch-load orders
SELECT * FROM orders
WHERE customer_id = ANY($1::int[]);  -- pass array of user IDs

-- GOOD: Prisma/TypeORM eager loading
const users = await prisma.user.findMany({
  include: { orders: true },    -- generates single LEFT JOIN query
  take: 100,
});

クエリ書き換えパターン

多くの遅いクエリは書き換えで大幅に高速化できます：相関サブクエリをJOINで置き換え、ウィンドウ関数を使用。

-- Query rewrites that dramatically improve performance

-- 1. Replace correlated subquery with JOIN
-- SLOW: correlated subquery executes for each outer row
SELECT name,
  (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS cnt
FROM customers c;

-- FAST: single aggregation pass
SELECT c.name, COUNT(o.id) AS cnt
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

-- 2. Use window functions instead of self-join
-- SLOW: self-join to find each employee's department average
SELECT e.name, e.salary,
  (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e.dept) AS dept_avg
FROM employees e;

-- FAST: window function scans table once
SELECT name, salary,
  AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;

-- 3. Avoid SELECT * in subqueries (forces extra columns)
-- SLOW
SELECT * FROM (SELECT * FROM large_table) sub WHERE id > 1000;

-- FAST: only select needed columns
SELECT id, name FROM large_table WHERE id > 1000;

-- 4. Use EXISTS instead of COUNT for existence checks
-- SLOW: scans all matching rows
SELECT * FROM products p WHERE (SELECT COUNT(*) FROM reviews r WHERE r.product_id = p.id) > 0;

-- FAST: stops at first match
SELECT * FROM products p WHERE EXISTS (SELECT 1 FROM reviews r WHERE r.product_id = p.id);

接続プールとバッチ操作

データベース接続はコストが高い。接続プーリング、プリペアドステートメント、バッチINSERTでスループットが向上します。

-- Connection pooling & query batching best practices

-- PostgreSQL: use PgBouncer or built-in pooling
-- Connection pool settings (nodejs pg pool)
import { Pool } from 'pg';

const pool = new Pool({
  max: 20,                // max connections (default: 10)
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
  // keepAlive improves performance for long-lived processes
  keepAlive: true,
  keepAliveInitialDelayMillis: 10000,
});

// Prepared statements: parse once, execute many times
await pool.query({
  name: 'get-user-orders',
  text: 'SELECT * FROM orders WHERE customer_id = $1 AND status = $2',
  values: [customerId, 'completed'],
});

-- Batch INSERT with single statement (much faster than loop)
INSERT INTO events (user_id, event_type, created_at)
SELECT * FROM UNNEST(
  $1::int[],
  $2::text[],
  $3::timestamptz[]
);
-- Inserts thousands of rows in one round-trip

-- Use COPY for bulk data loading (fastest option)
COPY orders (customer_id, total, status, created_at)
FROM '/tmp/orders.csv'
WITH (FORMAT CSV, HEADER true);

インデックスタイプ比較

Index Type	Best For	Operators	Write Overhead	Notes
B-Tree	Equality, range, sort	=, <, >, BETWEEN, LIKE prefix	Low	Default; most queries
GIN	Multi-valued (JSONB, arrays)	@>, <@, @@, ANY	High write	Full-text, JSONB queries
GiST	Geometric, full-text	&&, @>, <@, ~~	Medium	Nearest-neighbor, overlap
BRIN	Sequential data (timestamps)	=, <, >	Very low	Huge tables, append-only
Hash	Equality only	=	Low	Faster than B-Tree for =, no range
Partial	Subset of rows	Any	Low	WHERE clause filter in index def
Expression	Computed values	Any	Medium	lower(email), date_trunc()

よくある質問

どのクエリを最初に最適化すべきか分かりますか？

スロークエリログを有効にし、pg_stat_statementsを使用してください。

インデックスを追加するとクエリが遅くなることがあるのはなぜですか？

インデックススキャンにはオーバーヘッドがあり、プランナーは場合によってはシーケンシャルスキャンを好む場合があります。

EXPLAINとEXPLAIN ANALYZEの違いは？

EXPLAINは推定プランを表示し、EXPLAIN ANALYZEは実際にクエリを実行します。

ORMでN+1を修正するには？

Eager Loading（include/relations）またはDataLoaderをバッチング用に使用してください。

SQLクエリ最適化：データベースを高速化する15のテクニック

SQLクエリ最適化：インデックス、EXPLAIN、N+1問題の解決

B-Treeインデックス：基礎

GINと式インデックス

EXPLAIN ANALYZEの読み方

N+1問題：検出と修正

クエリ書き換えパターン

接続プールとバッチ操作

インデックスタイプ比較

よくある質問

どのクエリを最初に最適化すべきか分かりますか？

インデックスを追加するとクエリが遅くなることがあるのはなぜですか？

EXPLAINとEXPLAIN ANALYZEの違いは？

ORMでN+1を修正するには？

関連ツール

最新情報を受け取る

Try These Related Tools

Related Articles

PostgreSQLパフォーマンスチューニング：インデックス、クエリ最適化、設定

SQL vs NoSQL：正しいデータベースを選ぶ完全ガイド

SQL Joins図解ガイド：INNER、LEFT、RIGHT、FULL、CROSS