Optimisation des requêtes SQL : index, EXPLAIN et prévention du N+1
Les requêtes lentes sont l'un des goulots d'étranglement les plus courants dans les applications web.
Index B-Tree : les bases
Le B-Tree est le type d'index par défaut, gérant efficacement l'égalité, les plages et le tri.
-- Understanding B-Tree indexes (default in PostgreSQL, MySQL, SQLite)
-- An index is a separate data structure that maps column values → row locations
-- Without index: full table scan (reads every row)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Seq Scan on orders (cost=0.00..1842.00 rows=18 width=96)
-- Create a B-Tree index on the foreign key
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);
-- With index: index scan (jumps directly to matching rows)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Index Scan using idx_orders_customer_id on orders
-- (cost=0.29..8.55 rows=18 width=96)
-- Composite index: order matters! (customer_id, created_at) covers:
-- WHERE customer_id = ?
-- WHERE customer_id = ? AND created_at > ?
-- But NOT: WHERE created_at > ? (left-prefix rule)
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, created_at DESC);
-- Partial index: only index rows matching a condition
-- Huge win when you frequently query a small subset
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending'; -- only ~5% of rows get indexedIndex GIN et index d'expression
GIN est optimisé pour les données multi-valuées comme JSONB, les tableaux et la recherche plein texte.
-- GIN (Generalized Inverted Index): for JSONB, arrays, full-text search
-- Much faster than B-Tree for containment queries
-- JSONB column with GIN index
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);
-- Containment query: "find products where attributes includes {color: 'red'}"
SELECT * FROM products
WHERE attributes @> '{"color": "red"}'; -- uses GIN index
-- Full-text search with GIN
ALTER TABLE articles ADD COLUMN tsv tsvector
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''))
) STORED;
CREATE INDEX idx_articles_fts ON articles USING GIN (tsv);
SELECT title, ts_rank(tsv, q) AS rank
FROM articles, to_tsquery('english', 'postgres & performance') q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT 10;
-- Expression index: index on a computed value
CREATE INDEX idx_users_email_lower ON users (lower(email));
SELECT * FROM users WHERE lower(email) = lower('User@Example.com');Lire EXPLAIN ANALYZE
EXPLAIN ANALYZE exécute la requête et montre les temps réels et les comptages de lignes.
-- EXPLAIN ANALYZE: actual execution stats (not just estimates)
-- Always use ANALYZE to see real row counts and timing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
c.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.created_at >= '2025-01-01'
AND o.status = 'completed'
GROUP BY c.id, c.name
ORDER BY revenue DESC
LIMIT 20;
-- Reading the output:
-- "Seq Scan" → no index used, reading all rows (BAD for large tables)
-- "Index Scan" → using B-Tree index (good)
-- "Bitmap Scan" → multiple ranges, batched (good for many rows)
-- "Hash Join" → join via hash table (good for large datasets)
-- "Nested Loop" → row-by-row join (good for small inner set)
-- "actual time=X..Y" → X=first row, Y=last row (ms)
-- "rows=N" → if estimate vs actual differ a lot → stale stats!
-- Fix stale statistics:
ANALYZE orders; -- update stats for one table
ANALYZE; -- update all tables
-- Or set autovacuum_analyze_scale_factor = 0.01 in postgresql.confProblème N+1 : détection et corrections
Le problème N+1 survient quand le code exécute une requête pour une liste puis N requêtes supplémentaires.
-- N+1 Problem: the silent query killer in ORMs
-- BAD: N+1 in pseudo-ORM code
-- for each user (1 query):
-- fetch user's orders (N queries)
-- Total: 1 + N queries for N users
-- Example: fetching 100 users + their order counts = 101 queries
const users = await db.query('SELECT * FROM users LIMIT 100');
for (const user of users) {
const orders = await db.query(
'SELECT COUNT(*) FROM orders WHERE customer_id = $1',
[user.id]
);
user.orderCount = orders[0].count;
}
-- GOOD: Single query with JOIN or subquery
SELECT
u.id,
u.name,
u.email,
COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id
GROUP BY u.id, u.name, u.email
LIMIT 100;
-- GOOD: Use IN clause to batch (avoid when set is large)
-- Fetch users first, then batch-load orders
SELECT * FROM orders
WHERE customer_id = ANY($1::int[]); -- pass array of user IDs
-- GOOD: Prisma/TypeORM eager loading
const users = await prisma.user.findMany({
include: { orders: true }, -- generates single LEFT JOIN query
take: 100,
});Modèles de réécriture de requêtes
Beaucoup de requêtes lentes peuvent être accélérées : remplacer les sous-requêtes par des JOINs, utiliser les fonctions de fenêtrage.
-- Query rewrites that dramatically improve performance
-- 1. Replace correlated subquery with JOIN
-- SLOW: correlated subquery executes for each outer row
SELECT name,
(SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS cnt
FROM customers c;
-- FAST: single aggregation pass
SELECT c.name, COUNT(o.id) AS cnt
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
-- 2. Use window functions instead of self-join
-- SLOW: self-join to find each employee's department average
SELECT e.name, e.salary,
(SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e.dept) AS dept_avg
FROM employees e;
-- FAST: window function scans table once
SELECT name, salary,
AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;
-- 3. Avoid SELECT * in subqueries (forces extra columns)
-- SLOW
SELECT * FROM (SELECT * FROM large_table) sub WHERE id > 1000;
-- FAST: only select needed columns
SELECT id, name FROM large_table WHERE id > 1000;
-- 4. Use EXISTS instead of COUNT for existence checks
-- SLOW: scans all matching rows
SELECT * FROM products p WHERE (SELECT COUNT(*) FROM reviews r WHERE r.product_id = p.id) > 0;
-- FAST: stops at first match
SELECT * FROM products p WHERE EXISTS (SELECT 1 FROM reviews r WHERE r.product_id = p.id);Pooling de connexions et opérations par lots
Les connexions à la base sont coûteuses. Le pooling, les instructions préparées et les INSERT en lot améliorent le débit.
-- Connection pooling & query batching best practices
-- PostgreSQL: use PgBouncer or built-in pooling
-- Connection pool settings (nodejs pg pool)
import { Pool } from 'pg';
const pool = new Pool({
max: 20, // max connections (default: 10)
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
// keepAlive improves performance for long-lived processes
keepAlive: true,
keepAliveInitialDelayMillis: 10000,
});
// Prepared statements: parse once, execute many times
await pool.query({
name: 'get-user-orders',
text: 'SELECT * FROM orders WHERE customer_id = $1 AND status = $2',
values: [customerId, 'completed'],
});
-- Batch INSERT with single statement (much faster than loop)
INSERT INTO events (user_id, event_type, created_at)
SELECT * FROM UNNEST(
$1::int[],
$2::text[],
$3::timestamptz[]
);
-- Inserts thousands of rows in one round-trip
-- Use COPY for bulk data loading (fastest option)
COPY orders (customer_id, total, status, created_at)
FROM '/tmp/orders.csv'
WITH (FORMAT CSV, HEADER true);Comparaison des types d'index
| Index Type | Best For | Operators | Write Overhead | Notes |
|---|---|---|---|---|
| B-Tree | Equality, range, sort | =, <, >, BETWEEN, LIKE prefix | Low | Default; most queries |
| GIN | Multi-valued (JSONB, arrays) | @>, <@, @@, ANY | High write | Full-text, JSONB queries |
| GiST | Geometric, full-text | &&, @>, <@, ~~ | Medium | Nearest-neighbor, overlap |
| BRIN | Sequential data (timestamps) | =, <, > | Very low | Huge tables, append-only |
| Hash | Equality only | = | Low | Faster than B-Tree for =, no range |
| Partial | Subset of rows | Any | Low | WHERE clause filter in index def |
| Expression | Computed values | Any | Medium | lower(email), date_trunc() |
Questions fréquentes
Comment savoir quelles requêtes optimiser en premier ?
Activez le journal des requêtes lentes et utilisez pg_stat_statements.
Pourquoi un index peut-il parfois ralentir les requêtes ?
Les scans d'index ont un surcoût ; le planificateur peut préférer un scan séquentiel si plus de ~15% des lignes sont retournées.
Quelle est la différence entre EXPLAIN et EXPLAIN ANALYZE ?
EXPLAIN montre le plan estimé, EXPLAIN ANALYZE exécute réellement la requête.
Comment corriger le N+1 dans un ORM ?
Utilisez le chargement eager (include/relations) ou DataLoader pour le batching.