
15 min read · by DevToolBox

API Rate Limiting: Strategies and Implementation Guide

Rate limiting is a critical technique for protecting APIs from abuse, ensuring fair usage, and maintaining service stability. Whether you are building a public API, a SaaS platform, or an internal microservice, understanding rate limiting strategies is essential. This guide covers the major algorithms, implementation patterns, and best practices used by production APIs in 2026.

Why Rate Limiting Matters

Without rate limiting, a single client can overwhelm your API, causing cascading failures for all users. Rate limiting serves multiple critical purposes:

  • Abuse prevention - Stop malicious actors from brute-forcing endpoints, scraping data, or launching DDoS attacks
  • Fair resource allocation - Ensure no single tenant monopolizes shared infrastructure
  • Cost control - Prevent unexpected cloud computing bills from traffic spikes
  • Service stability - Keep response times consistent under load
  • Compliance - Meet SLA commitments and regulatory requirements

Rate Limiting Algorithms

1. Fixed Window Counter

The simplest algorithm. Divide time into fixed windows (e.g., 1-minute intervals) and count requests in each window. When the count exceeds the limit, reject until the next window starts.

// Fixed Window Counter implementation
class FixedWindowCounter {
  private counts = new Map<string, { count: number; windowStart: number }>();
  private windowSize: number; // in milliseconds
  private limit: number;

  constructor(windowSizeMs: number, limit: number) {
    this.windowSize = windowSizeMs;
    this.limit = limit;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const record = this.counts.get(clientId);

    if (!record || record.windowStart !== windowStart) {
      // New window — reset counter
      this.counts.set(clientId, { count: 1, windowStart });
      return true;
    }

    if (record.count < this.limit) {
      record.count++;
      return true;
    }

    return false; // Rate limited
  }
}

// 100 requests per minute
const limiter = new FixedWindowCounter(60_000, 100);

Pros: Simple to implement, low memory usage.
Cons: Boundary burst problem — a client can make 200 requests in about 2 seconds by hitting the last second of one window and the first second of the next.

2. Sliding Window Log

Stores the timestamp of every request and counts how many fall within the sliding window. This eliminates the boundary burst problem but uses more memory.

class SlidingWindowLog {
  private logs = new Map<string, number[]>();
  private windowSize: number;
  private limit: number;

  constructor(windowSizeMs: number, limit: number) {
    this.windowSize = windowSizeMs;
    this.limit = limit;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const windowStart = now - this.windowSize;

    let timestamps = this.logs.get(clientId) || [];

    // Remove expired timestamps
    timestamps = timestamps.filter(t => t > windowStart);

    if (timestamps.length < this.limit) {
      timestamps.push(now);
      this.logs.set(clientId, timestamps);
      return true;
    }

    this.logs.set(clientId, timestamps);
    return false;
  }
}

// 100 requests per 60 seconds (true sliding window)
const limiter = new SlidingWindowLog(60_000, 100);

Pros: Precise, no boundary burst issue.
Cons: Higher memory usage (stores every timestamp), not ideal for high-traffic APIs.

3. Sliding Window Counter

A hybrid approach that combines the low memory of fixed windows with the accuracy of sliding windows. It calculates a weighted count based on the overlap between the current and previous windows.

class SlidingWindowCounter {
  private windows = new Map<string, { prev: number; curr: number; prevStart: number; currStart: number }>();
  private windowSize: number;
  private limit: number;

  constructor(windowSizeMs: number, limit: number) {
    this.windowSize = windowSizeMs;
    this.limit = limit;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    const currStart = Math.floor(now / this.windowSize) * this.windowSize;
    const prevStart = currStart - this.windowSize;
    let record = this.windows.get(clientId);

    if (!record || record.currStart !== currStart) {
      // Slide window, carrying the previous window's count forward
      record = {
        prev: record?.currStart === prevStart ? record.curr : 0,
        curr: 0,
        prevStart,
        currStart,
      };
      // Persist the slide even if this request ends up rejected
      this.windows.set(clientId, record);
    }

    // Weight the previous window by how much of it overlaps
    const elapsed = now - currStart;
    const weight = 1 - elapsed / this.windowSize;
    const estimatedCount = record.prev * weight + record.curr;

    if (estimatedCount < this.limit) {
      record.curr++;
      this.windows.set(clientId, record);
      return true;
    }

    return false;
  }
}

Pros: Low memory (only two counters per client), smooth rate limiting, no bursts.
Cons: Slightly approximate count (but close enough for production use).

4. Token Bucket

The most widely used algorithm. A bucket holds tokens that are refilled at a constant rate. Each request consumes a token. When the bucket is empty, requests are rejected. This naturally allows short bursts while maintaining an average rate.

class TokenBucket {
  private buckets = new Map<string, { tokens: number; lastRefill: number }>();
  private capacity: number;    // max tokens
  private refillRate: number;  // tokens per second

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
  }

  isAllowed(clientId: string, tokensNeeded = 1): boolean {
    const now = Date.now();
    let bucket = this.buckets.get(clientId);

    if (!bucket) {
      bucket = { tokens: this.capacity, lastRefill: now };
      this.buckets.set(clientId, bucket);
    }

    // Refill tokens based on elapsed time
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsed * this.refillRate
    );
    bucket.lastRefill = now;

    if (bucket.tokens >= tokensNeeded) {
      bucket.tokens -= tokensNeeded;
      return true;
    }

    return false;
  }
}

// 10 tokens max, refill 2 tokens per second
// Allows bursts of 10, sustained rate of 2/sec
const limiter = new TokenBucket(10, 2);

Pros: Allows controlled bursts, smooth rate limiting, intuitive to configure.
Cons: Slightly more complex than fixed window.

5. Leaky Bucket

Similar to token bucket, but requests are processed at a constant rate regardless of arrival pattern. Incoming requests are queued, and the queue drains at a fixed rate. Overflow is rejected.

class LeakyBucket {
  private buckets = new Map<string, { queue: number; lastDrain: number }>();
  private capacity: number;   // max queue size
  private drainRate: number;  // requests processed per second

  constructor(capacity: number, drainRate: number) {
    this.capacity = capacity;
    this.drainRate = drainRate;
  }

  isAllowed(clientId: string): boolean {
    const now = Date.now();
    let bucket = this.buckets.get(clientId);

    if (!bucket) {
      bucket = { queue: 0, lastDrain: now };
      this.buckets.set(clientId, bucket);
    }

    // Drain queue based on elapsed time
    const elapsed = (now - bucket.lastDrain) / 1000;
    bucket.queue = Math.max(0, bucket.queue - elapsed * this.drainRate);
    bucket.lastDrain = now;

    if (bucket.queue < this.capacity) {
      bucket.queue += 1;
      return true;
    }

    return false;
  }
}

// Queue up to 20 requests, process 5 per second
const limiter = new LeakyBucket(20, 5);

Pros: Produces a smooth, constant output rate. Prevents bursts.
Cons: No burst allowance, potential latency from queuing.

Algorithm Comparison

| Algorithm | Memory | Accuracy | Burst Handling | Best For |
|---|---|---|---|---|
| Fixed Window | Very Low | Low | Edge bursts | Simple internal APIs |
| Sliding Window Log | High | Exact | No bursts | Low-volume, strict APIs |
| Sliding Window Counter | Low | Near-exact | Smooth | Most APIs (recommended) |
| Token Bucket | Low | Good | Controlled bursts | Public APIs, CDNs |
| Leaky Bucket | Low | Good | No bursts | Message queues, streaming |

HTTP Response Headers

Well-designed APIs communicate rate limit status through standard HTTP headers. This allows clients to implement adaptive behavior and avoid unnecessary rejections.

// Common rate limit headers. X-RateLimit-* is a de facto convention;
// the 429 status is defined in RFC 6585, and draft-ietf-httpapi-ratelimit-headers
// standardizes RateLimit-* equivalents.
HTTP/1.1 200 OK
X-RateLimit-Limit: 100        // Max requests per window
X-RateLimit-Remaining: 42     // Requests remaining in current window
X-RateLimit-Reset: 1708646400 // Unix timestamp when window resets
Retry-After: 30               // Seconds to wait (on 429 response)

// When rate limited
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708646400

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Please retry after 30 seconds.",
  "retryAfter": 30
}

Express.js Middleware Implementation

Here is a rate limiting middleware for Express.js using a simple in-memory fixed-window counter. It works for a single process; for multiple instances behind a load balancer, use the Redis-backed approach shown in the next section.

import { Request, Response, NextFunction } from "express";

interface RateLimitConfig {
  windowMs: number;
  max: number;
  keyGenerator?: (req: Request) => string;
  message?: string;
}

function rateLimit(config: RateLimitConfig) {
  const {
    windowMs,
    max,
    keyGenerator = (req) => req.ip || "unknown",
    message = "Too many requests, please try again later.",
  } = config;

  // Simple in-memory store. Entries are never evicted, so a long-running
  // process should periodically prune expired records.
  const store = new Map<string, { count: number; resetTime: number }>();

  return (req: Request, res: Response, next: NextFunction) => {
    const key = keyGenerator(req);
    const now = Date.now();
    let record = store.get(key);

    if (!record || now > record.resetTime) {
      record = { count: 0, resetTime: now + windowMs };
      store.set(key, record);
    }

    record.count++;
    const remaining = Math.max(0, max - record.count);
    const resetSeconds = Math.ceil((record.resetTime - now) / 1000);

    // Set rate limit headers
    res.set("X-RateLimit-Limit", String(max));
    res.set("X-RateLimit-Remaining", String(remaining));
    res.set("X-RateLimit-Reset", String(Math.ceil(record.resetTime / 1000)));

    if (record.count > max) {
      res.set("Retry-After", String(resetSeconds));
      return res.status(429).json({
        error: "rate_limit_exceeded",
        message,
        retryAfter: resetSeconds,
      });
    }

    next();
  };
}

// Usage
app.use("/api/", rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 100,                   // 100 requests per window
}));

// Stricter limit for auth endpoints
app.use("/api/auth/", rateLimit({
  windowMs: 60 * 1000,  // 1 minute
  max: 5,               // 5 attempts per minute
  message: "Too many login attempts. Please wait before trying again.",
}));

Distributed Rate Limiting with Redis

For applications running multiple instances behind a load balancer, in-memory rate limiting does not work because each instance has its own counter. Redis provides a shared, atomic counter.

import Redis from "ioredis";

const redis = new Redis();

async function slidingWindowRateLimit(
  clientId: string,
  limit: number,
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
  const key = `ratelimit:${clientId}`;
  const now = Date.now();
  const windowStart = now - windowSeconds * 1000;
  const member = `${now}-${Math.random()}`; // unique member for this request

  // Batch the Redis operations in a pipeline
  // (use a Lua script if you need strict atomicity under contention)
  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart); // Remove expired entries
  pipeline.zadd(key, now, member); // Add current request
  pipeline.zcard(key); // Count requests in window
  pipeline.expire(key, windowSeconds); // Set TTL for cleanup

  const results = await pipeline.exec();
  const count = results?.[2]?.[1] as number;

  const allowed = count <= limit;
  const remaining = Math.max(0, limit - count);
  const resetAt = Math.ceil((now + windowSeconds * 1000) / 1000);

  if (!allowed) {
    // Remove the member we just added, since the request was rejected
    await redis.zrem(key, member);
  }

  return { allowed, remaining, resetAt };
}

// Usage
const result = await slidingWindowRateLimit("user:123", 100, 60);
if (!result.allowed) {
  res.status(429).json({ error: "rate_limit_exceeded" });
}

Rate Limiting Strategies by Tier

Most production APIs implement tiered rate limits based on authentication level and subscription plan:

| Tier | Rate Limit | Burst | Identification |
|---|---|---|---|
| Anonymous | 60/hour | 10/minute | IP address |
| Free | 1,000/hour | 100/minute | API key |
| Pro | 10,000/hour | 500/minute | API key |
| Enterprise | 100,000/hour | 5,000/minute | API key + IP |
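
A tier table like this can be resolved per request with a small lookup. This is a minimal sketch, not a prescribed design: the tier names mirror the table above, but `resolveLimits` and the way the tier string is obtained (from an API key lookup, for example) are illustrative assumptions.

```typescript
// Hypothetical tier configuration mirroring the table above
interface TierLimits {
  perHour: number;
  burstPerMinute: number;
}

const TIER_LIMITS: Record<string, TierLimits> = {
  anonymous:  { perHour: 60,      burstPerMinute: 10 },
  free:       { perHour: 1_000,   burstPerMinute: 100 },
  pro:        { perHour: 10_000,  burstPerMinute: 500 },
  enterprise: { perHour: 100_000, burstPerMinute: 5_000 },
};

// Resolve limits for a request: the API key's plan determines the tier,
// and unauthenticated requests fall back to the anonymous (IP-based) tier.
function resolveLimits(apiKeyTier?: string): TierLimits {
  return TIER_LIMITS[apiKeyTier ?? "anonymous"] ?? TIER_LIMITS.anonymous;
}
```

Each resolved limit pair would then feed two limiters: an hourly one for the sustained rate and a per-minute one for bursts.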

Client-Side Rate Limit Handling

Well-behaved API clients should detect rate limits and implement exponential backoff with jitter to avoid thundering herd problems.

async function fetchWithRetry(
  url: string,
  options: RequestInit = {},
  maxRetries = 3
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status !== 429) {
      return response;
    }

    if (attempt === maxRetries) {
      throw new Error("Rate limit exceeded after max retries");
    }

    // Get retry delay from header or use exponential backoff
    const retryAfter = response.headers.get("Retry-After");
    let delayMs: number;

    if (retryAfter) {
      delayMs = parseInt(retryAfter, 10) * 1000;
    } else {
      // Exponential backoff with jitter
      const baseDelay = Math.pow(2, attempt) * 1000;
      const jitter = Math.random() * 1000;
      delayMs = baseDelay + jitter;
    }

    console.log(`Rate limited. Retrying in ${delayMs}ms (attempt ${attempt + 1})`);
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }

  throw new Error("Unreachable");
}

Best Practices

  • Always return rate limit headers - Include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After in every response
  • Use 429 status code - Never return 200 with an error body for rate-limited requests
  • Identify by API key, not just IP - Multiple users may share an IP (corporate NAT, VPN)
  • Implement multiple tiers - Different endpoints need different limits (auth vs read vs write)
  • Document your limits - Publish rate limits in your API documentation
  • Use Redis for distributed systems - In-memory counters do not work across multiple instances
  • Consider burst allowance - Token bucket is ideal because it allows short bursts without exceeding average rate
  • Log rate limit events - Monitor who gets rate-limited and adjust limits accordingly
  • Exempt health checks - Do not rate-limit monitoring and health check endpoints
  • Graceful degradation - Consider returning cached responses instead of hard rejecting
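
The health-check exemption above can be sketched as a wrapper around any limiter middleware. This is a minimal illustration: the `Handler` type is a simplified stand-in for an Express-style middleware signature, and the path names are assumptions.

```typescript
// Simplified middleware signature (stand-in for Express's RequestHandler)
type Handler = (req: { path: string }, res: unknown, next: () => void) => void;

// Wrap a rate limiter so that listed paths bypass it entirely.
// The path list is illustrative: adjust it to your monitoring endpoints.
function exemptPaths(paths: string[], limiter: Handler): Handler {
  const exempt = new Set(paths);
  return (req, res, next) => {
    if (exempt.has(req.path)) {
      return next(); // health checks skip rate limiting
    }
    return limiter(req, res, next);
  };
}
```

With Express, this would wrap the `rateLimit(...)` middleware from earlier, e.g. `app.use(exemptPaths(["/healthz", "/readyz"], limiter))`.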

Frequently Asked Questions

What is the difference between rate limiting and throttling?

Rate limiting rejects excess requests immediately with a 429 status code. Throttling delays (queues) excess requests and processes them later at a controlled rate. Rate limiting is simpler and more common for APIs, while throttling is used in streaming and real-time systems.
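
The throttling side of that distinction can be sketched as a tiny delay-based limiter: callers wait for their turn instead of receiving a 429. `Throttler` and its single-interval design are illustrative assumptions, not a standard API.

```typescript
// Minimal throttle sketch: each caller is delayed until its slot,
// enforcing at most one request per intervalMs on average.
class Throttler {
  private nextSlot = 0; // earliest time the next request may run

  constructor(private intervalMs: number) {}

  async acquire(): Promise<void> {
    const now = Date.now();
    const slot = Math.max(now, this.nextSlot);
    this.nextSlot = slot + this.intervalMs;
    const wait = slot - now;
    if (wait > 0) {
      await new Promise(resolve => setTimeout(resolve, wait));
    }
  }
}

// Usage: await throttler.acquire() before each call.
// Excess requests are delayed rather than rejected.
```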

Should I rate limit by IP address or API key?

Use API keys when available, as multiple users may share the same IP address (corporate networks, VPNs). Fall back to IP-based limiting for unauthenticated endpoints. For maximum protection, combine both: per-key limits for fairness and per-IP limits for abuse prevention.
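
The fallback logic above can be sketched as a key generator that prefixes each identifier, so the API-key and IP namespaces never collide. The `x-api-key` header name is an assumption; use whatever your API defines.

```typescript
// Prefer the API key when present; otherwise fall back to the client IP.
// Prefixes keep "key:" and "ip:" limits in separate namespaces.
function rateLimitKey(
  headers: Record<string, string | undefined>,
  ip: string
): string {
  const apiKey = headers["x-api-key"]; // assumed header name
  return apiKey ? `key:${apiKey}` : `ip:${ip}`;
}
```

A generator like this plugs directly into the `keyGenerator` option of the Express middleware shown earlier.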

Which algorithm should I choose?

For most APIs, the sliding window counter provides the best balance of accuracy and simplicity. If you need burst allowance (e.g., a public API), use the token bucket. Use the fixed window only for simple internal services where boundary bursts are acceptable.

How do I handle rate limiting in a microservices architecture?

Use a centralized rate limiter at the API gateway level (e.g., Kong, NGINX, AWS API Gateway) for global limits. For service-to-service communication, implement local rate limiters with circuit breakers. Redis or Memcached provides shared state across instances.
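
At the gateway level, this can look like NGINX's built-in request limiter. The sketch below uses real `limit_req` directives, but the zone name, rate, and upstream are illustrative values, not a recommendation:

```nginx
# Leaky-bucket style limiting at the gateway (values illustrative)
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=api burst=20 nodelay;
        proxy_pass http://backend;
    }
}
```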

What rate limits should I set for my API?

Start with generous limits and tighten based on actual usage patterns. Common starting points are 100 requests per 15 minutes for authenticated users and 20 requests per 15 minutes for anonymous access. Monitor your infrastructure capacity and adjust accordingly.
