API Rate Limiting: Strategies and Implementation Guide
Rate limiting is a critical technique for protecting APIs from abuse, ensuring fair usage, and maintaining service stability. Whether you are building a public API, a SaaS platform, or an internal microservice, understanding rate limiting strategies is essential. This guide covers the major algorithms, implementation patterns, and best practices used by production APIs in 2026.
Why Rate Limiting Matters
Without rate limiting, a single client can overwhelm your API, causing cascading failures for all users. Rate limiting serves multiple critical purposes:
- Abuse prevention - Stop malicious actors from brute-forcing endpoints, scraping data, or launching DDoS attacks
- Fair resource allocation - Ensure no single tenant monopolizes shared infrastructure
- Cost control - Prevent unexpected cloud computing bills from traffic spikes
- Service stability - Keep response times consistent under load
- Compliance - Meet SLA commitments and regulatory requirements
Rate Limiting Algorithms
1. Fixed Window Counter
The simplest algorithm. Divide time into fixed windows (e.g., 1-minute intervals) and count requests in each window. When the count exceeds the limit, reject until the next window starts.
// Fixed Window Counter implementation
class FixedWindowCounter {
private counts = new Map<string, { count: number; windowStart: number }>();
private windowSize: number; // in milliseconds
private limit: number;
constructor(windowSizeMs: number, limit: number) {
this.windowSize = windowSizeMs;
this.limit = limit;
}
isAllowed(clientId: string): boolean {
const now = Date.now();
const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
const record = this.counts.get(clientId);
if (!record || record.windowStart !== windowStart) {
// New window — reset counter
this.counts.set(clientId, { count: 1, windowStart });
return true;
}
if (record.count < this.limit) {
record.count++;
return true;
}
return false; // Rate limited
}
}
// 100 requests per minute
const limiter = new FixedWindowCounter(60_000, 100);
Pros: Simple to implement, low memory usage.
Cons: Boundary burst problem — a client can make 200 requests in 2 seconds by hitting the last second of one window and the first second of the next.
2. Sliding Window Log
Stores the timestamp of every request and counts how many fall within the sliding window. This eliminates the boundary burst problem but uses more memory.
class SlidingWindowLog {
private logs = new Map<string, number[]>();
private windowSize: number;
private limit: number;
constructor(windowSizeMs: number, limit: number) {
this.windowSize = windowSizeMs;
this.limit = limit;
}
isAllowed(clientId: string): boolean {
const now = Date.now();
const windowStart = now - this.windowSize;
let timestamps = this.logs.get(clientId) || [];
// Remove expired timestamps
timestamps = timestamps.filter(t => t > windowStart);
if (timestamps.length < this.limit) {
timestamps.push(now);
this.logs.set(clientId, timestamps);
return true;
}
this.logs.set(clientId, timestamps);
return false;
}
}
// 100 requests per 60 seconds (true sliding window)
const limiter = new SlidingWindowLog(60_000, 100);
Pros: Precise, no boundary burst issue.
Cons: Higher memory usage (stores every timestamp), not ideal for high-traffic APIs.
3. Sliding Window Counter
A hybrid approach that combines the low memory of fixed windows with the accuracy of sliding windows. It calculates a weighted count based on the overlap between the current and previous windows.
class SlidingWindowCounter {
private windows = new Map<string, { prev: number; curr: number; prevStart: number; currStart: number }>();
private windowSize: number;
private limit: number;
constructor(windowSizeMs: number, limit: number) {
this.windowSize = windowSizeMs;
this.limit = limit;
}
isAllowed(clientId: string): boolean {
const now = Date.now();
const currStart = Math.floor(now / this.windowSize) * this.windowSize;
const prevStart = currStart - this.windowSize;
let record = this.windows.get(clientId);
if (!record || record.currStart !== currStart) {
// Slide window
record = {
prev: record?.currStart === prevStart ? record.curr : 0,
curr: 0,
prevStart,
currStart,
};
}
// Weight the previous window by how much of it overlaps
const elapsed = now - currStart;
const weight = 1 - elapsed / this.windowSize;
const estimatedCount = record.prev * weight + record.curr;
if (estimatedCount < this.limit) {
record.curr++;
this.windows.set(clientId, record);
return true;
}
return false;
}
}
Pros: Low memory (only 2 counters per client), smooth rate limiting, no bursts.
Cons: Slightly approximate count (but close enough for production use).
4. Token Bucket
The most widely used algorithm. A bucket holds tokens that are refilled at a constant rate. Each request consumes a token. When the bucket is empty, requests are rejected. This naturally allows short bursts while maintaining an average rate.
class TokenBucket {
private buckets = new Map<string, { tokens: number; lastRefill: number }>();
private capacity: number; // max tokens
private refillRate: number; // tokens per second
constructor(capacity: number, refillRate: number) {
this.capacity = capacity;
this.refillRate = refillRate;
}
isAllowed(clientId: string, tokensNeeded = 1): boolean {
const now = Date.now();
let bucket = this.buckets.get(clientId);
if (!bucket) {
bucket = { tokens: this.capacity, lastRefill: now };
this.buckets.set(clientId, bucket);
}
// Refill tokens based on elapsed time
const elapsed = (now - bucket.lastRefill) / 1000;
bucket.tokens = Math.min(
this.capacity,
bucket.tokens + elapsed * this.refillRate
);
bucket.lastRefill = now;
if (bucket.tokens >= tokensNeeded) {
bucket.tokens -= tokensNeeded;
return true;
}
return false;
}
}
// 10 tokens max, refill 2 tokens per second
// Allows bursts of 10, sustained rate of 2/sec
const limiter = new TokenBucket(10, 2);
Pros: Allows controlled bursts, smooth rate limiting, intuitive to configure.
Cons: Slightly more complex than fixed window.
5. Leaky Bucket
Similar to token bucket, but requests are processed at a constant rate regardless of arrival pattern. Incoming requests are queued, and the queue drains at a fixed rate. Overflow is rejected. (The sketch below implements the "leaky bucket as a meter" variant: it tracks virtual queue depth to decide admission, but does not itself delay requests.)
class LeakyBucket {
private buckets = new Map<string, { queue: number; lastDrain: number }>();
private capacity: number; // max queue size
private drainRate: number; // requests processed per second
constructor(capacity: number, drainRate: number) {
this.capacity = capacity;
this.drainRate = drainRate;
}
isAllowed(clientId: string): boolean {
const now = Date.now();
let bucket = this.buckets.get(clientId);
if (!bucket) {
bucket = { queue: 0, lastDrain: now };
this.buckets.set(clientId, bucket);
}
// Drain queue based on elapsed time
const elapsed = (now - bucket.lastDrain) / 1000;
bucket.queue = Math.max(0, bucket.queue - elapsed * this.drainRate);
bucket.lastDrain = now;
if (bucket.queue < this.capacity) {
bucket.queue += 1;
return true;
}
return false;
}
}
// Queue up to 20 requests, process 5 per second
const limiter = new LeakyBucket(20, 5);
Pros: Produces a smooth, constant output rate. Prevents bursts.
Cons: No burst allowance, potential latency from queuing.
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Handling | Best For |
|---|---|---|---|---|
| Fixed Window | Very Low | Low | Edge bursts | Simple internal APIs |
| Sliding Window Log | High | Exact | No bursts | Low-volume, strict APIs |
| Sliding Window Counter | Low | Near-exact | Smooth | Most APIs (recommended) |
| Token Bucket | Low | Good | Controlled bursts | Public APIs, CDNs |
| Leaky Bucket | Low | Good | No bursts | Message queues, streaming |
HTTP Response Headers
Well-designed APIs communicate rate limit status through standard HTTP headers. This allows clients to implement adaptive behavior and avoid unnecessary rejections.
// Conventional rate limit headers; RFC 6585 defines the 429 status code,
// and draft-ietf-httpapi-ratelimit-headers standardizes RateLimit-* fields
HTTP/1.1 200 OK
X-RateLimit-Limit: 100 // Max requests per window
X-RateLimit-Remaining: 42 // Requests remaining in current window
X-RateLimit-Reset: 1708646400 // Unix timestamp when window resets
Retry-After: 30 // Seconds to wait (on 429 response)
// When rate limited
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708646400
{
"error": "rate_limit_exceeded",
"message": "Too many requests. Please retry after 30 seconds.",
"retryAfter": 30
}
Express.js Middleware Implementation
Here is a self-contained rate limiting middleware for Express.js using a fixed window counter with in-memory state. For deployments running multiple instances, replace the in-memory Map with the shared Redis store shown in the next section.
import { Request, Response, NextFunction } from "express";
interface RateLimitConfig {
windowMs: number;
max: number;
keyGenerator?: (req: Request) => string;
message?: string;
}
function rateLimit(config: RateLimitConfig) {
const {
windowMs,
max,
keyGenerator = (req) => req.ip || "unknown",
message = "Too many requests, please try again later.",
} = config;
const store = new Map<string, { count: number; resetTime: number }>();
return (req: Request, res: Response, next: NextFunction) => {
const key = keyGenerator(req);
const now = Date.now();
let record = store.get(key);
if (!record || now > record.resetTime) {
record = { count: 0, resetTime: now + windowMs };
store.set(key, record);
}
record.count++;
const remaining = Math.max(0, max - record.count);
const resetSeconds = Math.ceil((record.resetTime - now) / 1000);
// Set rate limit headers
res.set("X-RateLimit-Limit", String(max));
res.set("X-RateLimit-Remaining", String(remaining));
res.set("X-RateLimit-Reset", String(Math.ceil(record.resetTime / 1000)));
if (record.count > max) {
res.set("Retry-After", String(resetSeconds));
return res.status(429).json({
error: "rate_limit_exceeded",
message,
retryAfter: resetSeconds,
});
}
next();
};
}
// Usage
app.use("/api/", rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // 100 requests per window
}));
// Stricter limit for auth endpoints
app.use("/api/auth/", rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 5, // 5 attempts per minute
message: "Too many login attempts. Please wait before trying again.",
}));
Distributed Rate Limiting with Redis
For applications running multiple instances behind a load balancer, in-memory rate limiting does not work because each instance has its own counter. Redis provides a shared, atomic counter.
import Redis from "ioredis";
const redis = new Redis();
async function slidingWindowRateLimit(
clientId: string,
limit: number,
windowSeconds: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
const key = `ratelimit:${clientId}`;
const now = Date.now();
const windowStart = now - windowSeconds * 1000;
// Atomic Redis operations using a pipeline
const member = `${now}-${Math.random()}`; // unique member for this request
const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, 0, windowStart); // Remove expired entries
pipeline.zadd(key, now, member); // Add current request
pipeline.zcard(key); // Count requests in window
pipeline.expire(key, windowSeconds); // Set TTL for cleanup
const results = await pipeline.exec();
const count = results?.[2]?.[1] as number;
const allowed = count <= limit;
const remaining = Math.max(0, limit - count);
const resetAt = Math.ceil((now + windowSeconds * 1000) / 1000);
if (!allowed) {
// Remove the exact member we just added, since the request was rejected
await redis.zrem(key, member);
}
return { allowed, remaining, resetAt };
}
// Usage
const result = await slidingWindowRateLimit("user:123", 100, 60);
if (!result.allowed) {
res.status(429).json({ error: "rate_limit_exceeded" });
}
Rate Limiting Strategies by Tier
Most production APIs implement tiered rate limits based on authentication level and subscription plan:
| Tier | Rate Limit | Burst | Identification |
|---|---|---|---|
| Anonymous | 60/hour | 10/minute | IP address |
| Free | 1,000/hour | 100/minute | API key |
| Pro | 10,000/hour | 500/minute | API key |
| Enterprise | 100,000/hour | 5,000/minute | API key + IP |
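The tier table above can be expressed directly as configuration. The sketch below (the TierConfig shape and helper names are illustrative, not from any particular framework) maps each tier to token bucket parameters, using the burst ceiling as the bucket capacity and the hourly limit as the sustained refill rate:

```typescript
type Tier = "anonymous" | "free" | "pro" | "enterprise";

interface TierConfig {
  hourlyLimit: number; // sustained requests per hour
  burstLimit: number; // per-minute burst ceiling
  identifyBy: "ip" | "apiKey" | "apiKey+ip";
}

// Limits mirror the tier table above
const TIER_CONFIGS: Record<Tier, TierConfig> = {
  anonymous: { hourlyLimit: 60, burstLimit: 10, identifyBy: "ip" },
  free: { hourlyLimit: 1_000, burstLimit: 100, identifyBy: "apiKey" },
  pro: { hourlyLimit: 10_000, burstLimit: 500, identifyBy: "apiKey" },
  enterprise: { hourlyLimit: 100_000, burstLimit: 5_000, identifyBy: "apiKey+ip" },
};

// Resolve a tier to token bucket parameters:
// capacity = burst allowance, refill rate = sustained hourly rate
function bucketParamsFor(tier: Tier): { capacity: number; refillPerSecond: number } {
  const cfg = TIER_CONFIGS[tier];
  return {
    capacity: cfg.burstLimit,
    refillPerSecond: cfg.hourlyLimit / 3600,
  };
}
```

Keeping tiers as data rather than scattered constants makes it straightforward to adjust a plan's limits without touching limiter code.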
Client-Side Rate Limit Handling
Well-behaved API clients should detect rate limits and implement exponential backoff with jitter to avoid thundering herd problems.
async function fetchWithRetry(
url: string,
options: RequestInit = {},
maxRetries = 3
): Promise<Response> {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
const response = await fetch(url, options);
if (response.status !== 429) {
return response;
}
if (attempt === maxRetries) {
throw new Error("Rate limit exceeded after max retries");
}
// Get retry delay from header or use exponential backoff
const retryAfter = response.headers.get("Retry-After");
let delayMs: number;
if (retryAfter) {
delayMs = parseInt(retryAfter, 10) * 1000;
} else {
// Exponential backoff with jitter
const baseDelay = Math.pow(2, attempt) * 1000;
const jitter = Math.random() * 1000;
delayMs = baseDelay + jitter;
}
console.log(`Rate limited. Retrying in ${delayMs}ms (attempt ${attempt + 1})`);
await new Promise(resolve => setTimeout(resolve, delayMs));
}
throw new Error("Unreachable");
}
Best Practices
- Always return rate limit headers - Include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After in every response
- Use 429 status code - Never return 200 with an error body for rate-limited requests
- Identify by API key, not just IP - Multiple users may share an IP (corporate NAT, VPN)
- Implement multiple tiers - Different endpoints need different limits (auth vs read vs write)
- Document your limits - Publish rate limits in your API documentation
- Use Redis for distributed systems - In-memory counters do not work across multiple instances
- Consider burst allowance - Token bucket is ideal because it allows short bursts without exceeding average rate
- Log rate limit events - Monitor who gets rate-limited and adjust limits accordingly
- Exempt health checks - Do not rate-limit monitoring and health check endpoints
- Graceful degradation - Consider returning cached responses instead of hard rejecting
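As a minimal sketch of the health-check exemption, the wrapper below bypasses the limiter for monitoring endpoints (the RateLimiter interface and path list are assumptions for illustration, not a specific library's API):

```typescript
// Minimal limiter interface matching the isAllowed() shape used in this guide
interface RateLimiter {
  isAllowed(clientId: string): boolean;
}

// Paths that should never be rate-limited (illustrative list)
const EXEMPT_PATHS = new Set(["/healthz", "/readyz", "/metrics"]);

function checkWithExemptions(
  limiter: RateLimiter,
  clientId: string,
  path: string
): boolean {
  // Monitoring and health-check endpoints bypass the limiter entirely
  if (EXEMPT_PATHS.has(path)) return true;
  return limiter.isAllowed(clientId);
}
```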
Frequently Asked Questions
What is the difference between rate limiting and throttling?
Rate limiting rejects excess requests immediately with a 429 status code. Throttling delays (queues) excess requests and processes them later at a controlled rate. Rate limiting is simpler and more common for APIs, while throttling is used in streaming and real-time systems.
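To make the contrast concrete, here is a minimal throttling sketch: rather than rejecting with a 429, it delays each call so that successive calls are spaced at least a fixed interval apart (the function names are illustrative):

```typescript
// Throttle: delay (rather than reject) so calls run at most once per intervalMs
function makeThrottle(intervalMs: number) {
  let nextFree = 0; // timestamp when the next call may run
  return async function throttled<T>(fn: () => Promise<T>): Promise<T> {
    const now = Date.now();
    const wait = Math.max(0, nextFree - now);
    nextFree = Math.max(now, nextFree) + intervalMs; // reserve the next slot
    if (wait > 0) await new Promise(resolve => setTimeout(resolve, wait));
    return fn();
  };
}
```

A production throttle would also cap the pending queue length, falling back to rejection once too many calls are waiting.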
Should I rate limit by IP address or API key?
Use API keys when available, as multiple users may share the same IP address (corporate networks, VPNs). Fall back to IP-based limiting for unauthenticated endpoints. For maximum protection, combine both: per-key limits for fairness and per-IP limits for abuse prevention.
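The combined approach can be sketched as a small helper that derives both limiter keys for a request; each returned key would then be checked against its own limit (the key format here is an assumption):

```typescript
// Derive the limiter keys for one request: always a per-IP key for abuse
// protection, plus a per-key entry for fairness when authenticated
function rateLimitKeys(apiKey: string | undefined, ip: string): string[] {
  const keys = [`ip:${ip}`];
  if (apiKey) keys.push(`key:${apiKey}`);
  return keys;
}
```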
Which algorithm should I choose?
For most APIs, the sliding window counter provides the best balance of accuracy and simplicity. If you need burst allowance (e.g., a public API), use the token bucket. Use the fixed window only for simple internal services where boundary bursts are acceptable.
How do I handle rate limiting in a microservices architecture?
Use a centralized rate limiter at the API gateway level (e.g., Kong, NGINX, AWS API Gateway) for global limits. For service-to-service communication, implement local rate limiters with circuit breakers. Redis or Memcached provides shared state across instances.
What rate limits should I set for my API?
Start with generous limits and tighten based on actual usage patterns. Common starting points are 100 requests per 15 minutes for authenticated users and 20 requests per 15 minutes for anonymous access. Monitor your infrastructure capacity and adjust accordingly.