JSON API Rate Limiting: Token Bucket vs Sliding Window vs Fixed Window, Redis & 429
JSON API rate limiting controls how often each client may call your API. The three main algorithms are fixed window (simple, burst-prone), sliding window (smooth but memory-heavy in Redis), and token bucket (bursts allowed up to the bucket size). Token bucket tolerates bursts: a bucket of 100 tokens that refills at 10 tokens/second lets a client make 100 requests instantly and then 10 requests/second sustained. A fixed window limit of 1000 requests per hour, by contrast, can admit 2000 requests within 2 minutes if they straddle the window boundary. This guide covers implementing all three algorithms with Redis, rate limit response headers (RateLimit-* and X-RateLimit-* headers, Retry-After), client-side exponential backoff for 429 responses, express-rate-limit middleware, and Next.js API route rate limiting. All code examples are in TypeScript.
Rate Limiting Algorithms: Fixed Window, Sliding Window, Token Bucket
The three core rate limiting algorithms differ in burst tolerance, memory cost, and boundary accuracy. Fixed window is cheapest (two Redis commands and one small counter per key) but can admit double the quota around a window boundary. Sliding window is the most accurate, but it stores one sorted-set entry per request in the window, so its memory grows with the limit: at 100 requests/minute a key holds up to 100 entries, roughly 87x the memory of a fixed-window counter. Token bucket offers configurable burst tolerance with moderate Redis overhead via a Lua script. Choose based on whether boundary accuracy or burst tolerance matters more for your API.
// ── Algorithm comparison ──────────────────────────────────────────
// Fixed window: 1000 req/hr, window resets at :00 every hour
// Exploit: send 1000 at 00:59, 1000 at 01:00 = 2000 in 2 minutes
// Redis cost: INCR + EXPIRE (2 commands, ~32 bytes/key)
// Use when: loose IP throttling, DDoS protection, simple quotas
// Sliding window: 1000 req/hr, counts last 3600 seconds at all times
// No exploit: never more than 1000 in any 3600-second window
// Redis cost: ZADD + ZREMRANGEBYSCORE + ZCARD + EXPIRE (~28 bytes/entry, ~28 KB/key at 1000 entries)
// Use when: paid API keys, authenticated users, strict SLA enforcement
// Token bucket: capacity=100, refill_rate=10 tokens/s
// Burst: up to 100 simultaneous requests allowed (bucket full)
// Sustained: 10 requests/second once bucket empties
// Redis cost: Lua script (HMGET + HMSET, ~96 bytes/key)
// Use when: interactive clients, mobile apps, batch-friendly APIs
// ── Fixed window — the boundary exploit ──────────────────────────
// Time:    00:59:59      01:00:00                    01:00:01
// Count:   1000/1000     0/1000                      1000/1000   ← 2000 in 2 seconds!
// Window:  old window    [reset] → new window starts
// ── Sliding window — boundary safe ───────────────────────────────
// At any point T, count requests from (T - 3600s) to T
// Cannot exceed 1000 in any rolling 3600-second window
// ── Token bucket — burst then sustain ────────────────────────────
// t=0: 100 tokens available → send 100 requests instantly (all allowed)
// t=0: bucket now empty → next request blocked (next token in 0.1s; integer Retry-After rounds up to 1)
// t=5: 50 tokens refilled → can send 50 more immediately
// t=10: 100 tokens refilled → full burst capacity restored
// TypeScript types shared across all algorithms
interface RateLimitResult {
allowed: boolean
limit: number
remaining: number
reset: number // Unix timestamp (seconds)
retryAfter?: number // seconds to wait if denied
}
Fixed window is best when implementation simplicity and memory efficiency outweigh accuracy: anonymous IP throttling and DDoS protection are good candidates because limits are already generous and boundary exploits matter less. Sliding window is the right choice for paid tiers and authenticated users, where clients may attempt boundary exploitation. Token bucket is ideal for interactive web and mobile apps that make legitimate parallel requests (e.g., loading a page that fires 15 concurrent API calls). See also our guide on JSON API design for conventions around exposing rate limit information in API responses.
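To see the boundary exploit concretely without a Redis server, the sketch below replays it against two in-memory limiters. It is a minimal illustration under stated assumptions: a single process, timestamps passed in explicitly, and hypothetical classes FixedWindowCounter and SlidingWindowLog standing in for the Redis-backed versions. Like the Redis sliding window in the next section, the log does not record denied requests.

```typescript
// In-memory stand-ins for the Redis limiters (single process only, illustrative).
interface Limiter {
  allow(identifier: string, nowMs: number): boolean
}

// Fixed window: one counter per (identifier, windowId), like INCR on rl:fixed:<id>:<windowId>
class FixedWindowCounter implements Limiter {
  private counts = new Map<string, number>()
  private limit: number
  private windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  allow(identifier: string, nowMs: number): boolean {
    const windowId = Math.floor(nowMs / this.windowMs)
    const key = `${identifier}:${windowId}`
    const count = (this.counts.get(key) ?? 0) + 1
    this.counts.set(key, count)
    return count <= this.limit
  }
}

// Sliding window log: keeps timestamps of accepted requests, like the sorted set
class SlidingWindowLog implements Limiter {
  private log = new Map<string, number[]>()
  private limit: number
  private windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  allow(identifier: string, nowMs: number): boolean {
    const windowStart = nowMs - this.windowMs
    // Evict timestamps that fell out of the rolling window
    const entries = (this.log.get(identifier) ?? []).filter(t => t > windowStart)
    const allowed = entries.length < this.limit
    if (allowed) entries.push(nowMs) // denied requests are not recorded
    this.log.set(identifier, entries)
    return allowed
  }
}

// 1000 requests at 00:59:59 and 1000 more at 01:00:00 against a 1000/hour limit
function replayBoundaryBurst(limiter: Limiter): number {
  const hourMs = 3_600_000
  let allowed = 0
  for (let i = 0; i < 1000; i++) if (limiter.allow('client', hourMs - 1000)) allowed++
  for (let i = 0; i < 1000; i++) if (limiter.allow('client', hourMs)) allowed++
  return allowed
}

const fixedAllowed = replayBoundaryBurst(new FixedWindowCounter(1000, 3_600_000))
const slidingAllowed = replayBoundaryBurst(new SlidingWindowLog(1000, 3_600_000))
console.log({ fixedAllowed, slidingAllowed }) // { fixedAllowed: 2000, slidingAllowed: 1000 }
```

The fixed window admits both bursts because they land in different window keys, while the sliding log refuses the second burst: the first 1000 timestamps are still inside the rolling hour.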
Redis-Based Rate Limiting Implementation
Redis is the standard backend for distributed rate limiting across multiple Node.js instances. Each algorithm maps to specific Redis commands chosen for atomicity and latency. Fixed window uses INCR + EXPIRE (atomic counting). Sliding window uses a sorted set pipeline (ZADD + ZREMRANGEBYSCORE + ZCARD). Token bucket uses a Lua script for full read-modify-write atomicity — a Lua script runs server-side and is never interrupted between operations.
import Redis from 'ioredis'
const redis = new Redis(process.env.REDIS_URL!, {
maxRetriesPerRequest: 1,
enableReadyCheck: false, // skip the INFO ready check; commonly disabled in serverless environments
})
// ── Fixed window — INCR + EXPIRE ──────────────────────────────────
async function fixedWindowLimit(
identifier: string,
limit: number,
windowSeconds: number
): Promise<RateLimitResult> {
const windowId = Math.floor(Date.now() / (windowSeconds * 1000))
const key = `rl:fixed:${identifier}:${windowId}`
const reset = (windowId + 1) * windowSeconds
const pipeline = redis.pipeline()
pipeline.incr(key)
pipeline.expire(key, windowSeconds)
const results = await pipeline.exec()
const count = (results?.[0]?.[1] as number) ?? 1
const remaining = Math.max(0, limit - count)
return {
allowed: count <= limit,
limit,
remaining,
reset,
retryAfter: count > limit ? reset - Math.floor(Date.now() / 1000) : undefined,
}
}
// ── Sliding window — sorted set pipeline ──────────────────────────
async function slidingWindowLimit(
identifier: string,
limit: number,
windowSeconds: number
): Promise<RateLimitResult> {
const key = `rl:sliding:${identifier}`
const now = Date.now()
const windowStart = now - windowSeconds * 1000
const reset = Math.ceil(now / 1000) + windowSeconds
const member = `${now}-${Math.random()}` // unique member per request
const pipeline = redis.pipeline()
pipeline.zadd(key, now, member)
pipeline.zremrangebyscore(key, '-inf', windowStart) // evict entries outside window
pipeline.zcard(key) // count remaining entries
pipeline.expire(key, windowSeconds + 1) // auto-cleanup TTL
const results = await pipeline.exec()
const count = (results?.[2]?.[1] as number) ?? 0
const allowed = count <= limit
if (!allowed) {
// Undo only this request's ZADD; removing by score would also delete other
// requests that landed in the same millisecond
await redis.zrem(key, member)
}
return {
allowed,
limit,
remaining: Math.max(0, limit - (allowed ? count : count - 1)),
reset,
retryAfter: allowed ? undefined : windowSeconds,
}
}
// ── Token bucket — atomic Lua script ──────────────────────────────
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refillRate = tonumber(ARGV[2]) -- tokens/second
local now = tonumber(ARGV[3]) -- current time in milliseconds
local cost = tonumber(ARGV[4]) -- tokens per request (usually 1)
local bucket = redis.call('HMGET', key, 'tokens', 'lastRefill')
local tokens = tonumber(bucket[1]) or capacity
local lastRefill = tonumber(bucket[2]) or now
-- Compute tokens earned since last refill
local elapsedSec = math.max(0, now - lastRefill) / 1000
local newTokens = math.min(capacity, tokens + elapsedSec * refillRate)
if newTokens >= cost then
-- Allow: deduct cost, save state
local remaining = newTokens - cost
redis.call('HMSET', key, 'tokens', remaining, 'lastRefill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refillRate) + 60)
return { 1, math.floor(remaining) }
else
-- Deny: save refilled state (don't deduct)
redis.call('HMSET', key, 'tokens', newTokens, 'lastRefill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refillRate) + 60)
return { 0, math.floor(newTokens) }
end
`
async function tokenBucketLimit(
identifier: string,
capacity: number, // max tokens (burst size)
refillRate: number, // tokens per second
cost = 1
): Promise<RateLimitResult> {
const key = `rl:token:${identifier}`
const now = Date.now()
const result = await redis.eval(
TOKEN_BUCKET_SCRIPT, 1, key,
capacity, refillRate, now, cost
) as [number, number]
const allowed = result[0] === 1
const remaining = result[1]
const tokensNeeded = cost - (allowed ? 0 : remaining)
const retryAfter = allowed ? undefined : Math.ceil(tokensNeeded / refillRate)
return {
allowed,
limit: capacity,
remaining,
reset: Math.floor(now / 1000) + (retryAfter ?? 0),
retryAfter,
}
}
The Lua script for token bucket is critical. Without it, a read-then-write sequence (HMGET, compute, HMSET) has a race condition: two concurrent requests can both read the same token count, both conclude they have enough tokens, and both succeed, while each writes back a balance that reflects only its own deduction, so one request is never charged. Redis runs Lua scripts atomically: no other command executes between the HMGET and HMSET inside the script. For sliding window, note that an ioredis pipeline only batches commands into one round trip; it does not make them atomic (use redis.multi() for a MULTI/EXEC transaction if you need that). Concurrent requests can briefly see each other's ZADD, including one that is later undone on denial, so counts may be slightly conservative under load, which is acceptable for most APIs. See JSON caching to reduce the number of requests reaching rate-limited backends.
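The refill arithmetic inside TOKEN_BUCKET_SCRIPT can be checked without Redis. Below is a minimal pure TypeScript port of the same steps, assuming the caller persists the returned state; in production that state lives in the Redis hash and the update must stay inside the Lua script for atomicity. The names takeToken, BucketState and replay are hypothetical, and unlike tokenBucketLimit this sketch computes the wait from the exact fractional balance rather than the floored remaining count.

```typescript
interface BucketState {
  tokens: number // fractional balance
  lastRefill: number // Unix ms of the last update
}

interface BucketDecision {
  allowed: boolean
  remaining: number // whole tokens left, floored like the Lua script's return value
  retryAfterSec?: number // integer seconds, only set when denied
  state: BucketState // new state to persist
}

// Same steps as TOKEN_BUCKET_SCRIPT: refill by elapsed time, cap at capacity, then deduct or deny
function takeToken(
  state: BucketState | undefined,
  capacity: number,
  refillRate: number, // tokens per second
  nowMs: number,
  cost = 1
): BucketDecision {
  const tokens = state?.tokens ?? capacity // a missing bucket starts full
  const lastRefill = state?.lastRefill ?? nowMs
  const elapsedSec = Math.max(0, nowMs - lastRefill) / 1000
  const refilled = Math.min(capacity, tokens + elapsedSec * refillRate)

  if (refilled >= cost) {
    const left = refilled - cost
    return { allowed: true, remaining: Math.floor(left), state: { tokens: left, lastRefill: nowMs } }
  }
  // Denied: keep the refilled balance without deducting, like the script's else-branch
  const waitSec = (cost - refilled) / refillRate
  return {
    allowed: false,
    remaining: Math.floor(refilled),
    retryAfterSec: Math.ceil(waitSec),
    state: { tokens: refilled, lastRefill: nowMs },
  }
}

// Replay the comparison scenario: capacity=100, refill=10 tokens/s
function replay(start: BucketState | undefined, nowMs: number, attempts: number) {
  let state = start
  let allowed = 0
  let last: BucketDecision | undefined
  for (let i = 0; i < attempts; i++) {
    last = takeToken(state, 100, 10, nowMs)
    state = last.state
    if (last.allowed) allowed++
  }
  return { allowed, state, last }
}

const burst = replay(undefined, 0, 101) // t=0: full bucket
const later = replay(burst.state, 5_000, 60) // t=5s: 50 tokens refilled
console.log(burst.allowed, burst.last?.retryAfterSec, later.allowed) // 100 1 50
```

The 101st request at t=0 is denied with a 0.1 s wait that rounds up to an integer Retry-After of 1, and five seconds later exactly 50 tokens are available again.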
Rate Limit Response Headers: X-RateLimit-Limit, Remaining, Retry-After
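Every rate-limited JSON API response should include rate limit headers so clients know their current quota status. Earlier revisions of the IETF RateLimit header fields draft (draft-ietf-httpapi-ratelimit-headers) define RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset; later revisions consolidate them into RateLimit and RateLimit-Policy fields. In the draft, RateLimit-Reset is the number of seconds until the quota resets, not a timestamp. On a 429 response, always send Retry-After: RFC 6585 says a 429 MAY include it, but it is the signal most clients know how to act on. Provider-specific X-RateLimit-* headers carry similar semantics, although reset formats differ (GitHub, for example, sends a Unix timestamp).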
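One way to derive these headers from the RateLimitResult type defined earlier is sketched below (the interface is repeated so the snippet stands alone). It assumes result.reset is a Unix timestamp in seconds, as returned by the Redis functions in this guide, and converts it to the delta-seconds form the draft specifies. buildRateLimitHeaders is a hypothetical helper, and nowSec is injectable so the output is deterministic.

```typescript
// Same shape as the RateLimitResult interface earlier in this guide
interface RateLimitResult {
  allowed: boolean
  limit: number
  remaining: number
  reset: number // Unix timestamp (seconds)
  retryAfter?: number // seconds to wait if denied
}

// Build separate draft-style RateLimit-* headers, plus Retry-After on denial
function buildRateLimitHeaders(
  result: RateLimitResult,
  nowSec: number = Math.floor(Date.now() / 1000)
): Record<string, string> {
  // The draft's RateLimit-Reset is seconds until reset, not an absolute time
  const secondsUntilReset = Math.max(0, result.reset - nowSec)
  const headers: Record<string, string> = {
    'RateLimit-Limit': String(result.limit),
    'RateLimit-Remaining': String(result.allowed ? result.remaining : 0),
    'RateLimit-Reset': String(secondsUntilReset),
  }
  if (!result.allowed) {
    // Integer seconds; fall back to the time until the window resets
    headers['Retry-After'] = String(Math.ceil(result.retryAfter ?? secondsUntilReset))
  }
  return headers
}

const now = 1716220800
const ok = buildRateLimitHeaders({ allowed: true, limit: 100, remaining: 43, reset: now + 30 }, now)
const denied = buildRateLimitHeaders(
  { allowed: false, limit: 100, remaining: 0, reset: now + 30, retryAfter: 30 },
  now
)
console.log(ok, denied)
```

In the Express middleware in this section, the returned object could be passed directly to res.set(headers), which accepts an object of header fields.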
// ── Standard IETF headers (draft-ietf-httpapi-ratelimit-headers) ──
// Every API response (200 and 429):
RateLimit-Limit: 100 // quota ceiling for current window
RateLimit-Remaining: 43 // requests left in this window
RateLimit-Reset: 30 // seconds until the quota resets (delta-seconds, not a timestamp)
// 429 only, strongly recommended (RFC 6585 makes it optional):
Retry-After: 30 // seconds to wait (integer) OR HTTP-date string
RateLimit-Remaining: 0 // always 0 on 429
// ── Setting headers in Express ────────────────────────────────────
import express from 'express'
const app = express()
app.use('/api', async (req, res, next) => {
const result = await slidingWindowLimit(`ip:${req.ip}`, 100, 60)
// Always set rate limit headers
res.set('RateLimit-Limit', String(result.limit))
res.set('RateLimit-Remaining', String(result.remaining))
res.set('RateLimit-Reset', String(Math.max(0, result.reset - Math.floor(Date.now() / 1000)))) // delta-seconds
if (!result.allowed) {
const retryAfter = result.retryAfter ?? 60
res.set('Retry-After', String(retryAfter))
return res.status(429)
.set('Content-Type', 'application/problem+json')
.json({
type: 'https://jsonic.io/errors/rate-limit-exceeded',
title: 'Too Many Requests',
status: 429,
detail: `Rate limit exceeded. Retry after ${retryAfter} seconds.`,
retryAfter,
limit: result.limit,
remaining: 0,
reset: result.reset,
})
}
next()
})
// ── Header parsing on the client side ────────────────────────────
function parseRateLimitHeaders(headers: Headers) {
const limit = Number(headers.get('ratelimit-limit') ?? headers.get('x-ratelimit-limit'))
const remaining = Number(headers.get('ratelimit-remaining') ?? headers.get('x-ratelimit-remaining'))
const reset = Number(headers.get('ratelimit-reset') ?? headers.get('x-ratelimit-reset'))
// Retry-After can be integer seconds OR an HTTP-date string
const retryAfterHeader = headers.get('retry-after')
let retryAfterMs = 0
if (retryAfterHeader) {
const parsed = Number(retryAfterHeader)
retryAfterMs = isNaN(parsed)
? Math.max(0, new Date(retryAfterHeader).getTime() - Date.now())
: parsed * 1000
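} else if (reset) {
// Fallback: the IETF draft sends delta-seconds, but some providers send a Unix
// timestamp; treat values >= 1e9 as epoch seconds
retryAfterMs = reset >= 1e9 ? Math.max(0, reset * 1000 - Date.now()) : reset * 1000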
}
return { limit, remaining, reset, retryAfterMs }
}
// ── Provider header variations ────────────────────────────────────
// GitHub: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (epoch)
// X-RateLimit-Used, X-RateLimit-Resource (core/search/graphql)
// OpenAI: x-ratelimit-limit-requests, x-ratelimit-remaining-requests
// x-ratelimit-reset-requests (duration string, e.g. "1s", "6m0s")
// Stripe: RateLimit-Limit, RateLimit-Remaining, Retry-After on 429
// Twilio: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Remaining-Period
The Retry-After header (RFC 7231, now RFC 9110) supports two formats: an integer number of seconds (Retry-After: 30) and an HTTP-date string (Retry-After: Tue, 21 Oct 2025 07:28:00 GMT). Integer seconds are simpler to parse and preferred for rate limiting; HTTP-date is more common for scheduled maintenance. Always emit Retry-After as an integer on 429 responses. The IETF draft defines RateLimit-Reset as delta-seconds, but when a provider sends a Unix timestamp (as GitHub does in X-RateLimit-Reset), convert it to a wait duration with Math.max(0, reset - Math.floor(Date.now() / 1000)). See JSON error handling for the full RFC 7807 error response pattern.
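A stricter version of the parsing logic in parseRateLimitHeaders is sketched below. These are hypothetical helpers, not a library API: retryAfterToMs and resetToMs return undefined for unusable values instead of NaN, and resetToMs uses a heuristic (values of 1e9 or more are treated as Unix epoch seconds, smaller values as delta-seconds) to cope with both the IETF draft and GitHub-style timestamps.

```typescript
// Parse Retry-After: integer seconds or an HTTP-date. Returns milliseconds, or undefined if unusable.
function retryAfterToMs(value: string | null, nowMs: number = Date.now()): number | undefined {
  if (value === null) return undefined
  const trimmed = value.trim()
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000
  const dateMs = Date.parse(trimmed) // e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  if (Number.isNaN(dateMs)) return undefined
  return Math.max(0, dateMs - nowMs)
}

// Parse a reset header. Heuristic: >= 1e9 means Unix epoch seconds, otherwise delta-seconds.
function resetToMs(value: string | null, nowMs: number = Date.now()): number | undefined {
  if (value === null || value.trim() === '') return undefined
  const n = Number(value)
  if (!Number.isFinite(n) || n < 0) return undefined
  return n >= 1e9 ? Math.max(0, n * 1000 - nowMs) : n * 1000
}

const now = Date.UTC(2015, 9, 21, 7, 27, 30) // 21 Oct 2015 07:27:30 GMT
console.log(
  retryAfterToMs('30', now),
  retryAfterToMs('Wed, 21 Oct 2015 07:28:00 GMT', now),
  retryAfterToMs('soon', now),
  resetToMs('45', now),
  resetToMs(String(now / 1000 + 45), now)
) // 30000 30000 undefined 45000 45000
```

The 1e9 threshold works because any realistic delta-seconds value is far smaller than a current Unix timestamp, but it is a heuristic; if you know which format a provider uses, parse that format explicitly.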
HTTP 429 Too Many Requests JSON Error Response
HTTP 429 is the correct status code for rate limit exceeded responses (defined in RFC 6585). The JSON response body should follow Problem Details for HTTP APIs (RFC 7807, since superseded by RFC 9457 with the same core members), using Content-Type: application/problem+json. A well-formed 429 response lets clients parse the retry timing programmatically instead of relying on a human-readable error message. Include retryAfter in the body as a numeric backup for clients that cannot read the Retry-After header.
// ── Minimal RFC 7807 Problem Details for 429 ─────────────────────
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 30
Retry-After: 30

{
"type": "https://jsonic.io/errors/rate-limit-exceeded",
"title": "Too Many Requests",
"status": 429,
"detail": "You have exceeded 100 requests per minute. Retry after 30 seconds.",
"retryAfter": 30
}
// ── Extended 429 body with quota details ──────────────────────────
{
"type": "https://jsonic.io/errors/rate-limit-exceeded",
"title": "Too Many Requests",
"status": 429,
"detail": "API rate limit exceeded for key sk-abc. Retry after 30 seconds.",
"instance": "/api/v1/completions",
"retryAfter": 30,
"limit": 100,
"remaining": 0,
"reset": 1716220800,
"window": "1m",
"tier": "free",
"upgradeUrl": "https://jsonic.io/pricing"
}
// ── TypeScript helper to build RFC 7807 429 responses ─────────────
import { NextResponse } from 'next/server'
interface RateLimitErrorOptions {
retryAfter: number // seconds
limit: number
reset: number // Unix timestamp (seconds)
instance?: string // URI of the specific request that was rejected
tier?: string
}
function rateLimitResponse(opts: RateLimitErrorOptions): NextResponse {
const body = {
type: 'https://jsonic.io/errors/rate-limit-exceeded',
title: 'Too Many Requests',
status: 429,
detail: `Rate limit of ${opts.limit} requests exceeded. Retry after ${opts.retryAfter} seconds.`,
retryAfter: opts.retryAfter,
limit: opts.limit,
remaining: 0,
reset: opts.reset,
...(opts.instance ? { instance: opts.instance } : {}),
...(opts.tier ? { tier: opts.tier } : {}),
}
return new NextResponse(JSON.stringify(body), {
status: 429,
headers: {
'Content-Type': 'application/problem+json',
'Retry-After': String(opts.retryAfter),
'RateLimit-Limit': String(opts.limit),
'RateLimit-Remaining': '0',
'RateLimit-Reset': String(Math.max(0, opts.reset - Math.floor(Date.now() / 1000))), // delta-seconds; body keeps the timestamp
},
})
}
// ── Usage in a Next.js API route (getIp is a placeholder for your own IP helper) ──
// export async function GET(req: Request) {
// const result = await slidingWindowLimit('ip:' + getIp(req), 100, 60)
// if (!result.allowed) {
// return rateLimitResponse({
// retryAfter: result.retryAfter ?? 60,
// limit: result.limit,
// reset: result.reset,
// instance: new URL(req.url).pathname,
// })
// }
// return Response.json({ data: '...' })
// }
The type field in RFC 7807 must be a URI reference: a stable identifier for this error class that clients can key on programmatically, not just a human-readable string. Use your own domain (https://yourapi.com/errors/rate-limit-exceeded) so you can host documentation at that URL explaining the error. The instance field identifies the specific occurrence of the problem, which is useful for support tickets where a user reports a specific failed request. The upgradeUrl extension field (not standard) is a useful addition for freemium APIs that want to drive upgrades directly from rate limit error responses. See JSON security for preventing information leakage in error responses.
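On the client, keying on the type URI might look roughly like the following. This is a minimal sketch: RateLimitProblem mirrors the extended 429 body shown above, RATE_LIMIT_TYPE reuses this guide's example URI (substitute your own), and isRateLimitProblem and retryDelayFromProblem are hypothetical helper names.

```typescript
// Mirrors the extended 429 body above; extension members are allowed by RFC 7807
interface RateLimitProblem {
  type: string
  title: string
  status: number
  detail?: string
  instance?: string
  retryAfter?: number // seconds
  limit?: number
  remaining?: number
  reset?: number // Unix timestamp (seconds)
  [extension: string]: unknown
}

const RATE_LIMIT_TYPE = 'https://jsonic.io/errors/rate-limit-exceeded'

// Narrow an unknown parsed JSON body to a rate limit problem by its stable `type` URI
function isRateLimitProblem(body: unknown): body is RateLimitProblem {
  if (typeof body !== 'object' || body === null) return false
  const b = body as Record<string, unknown>
  return b.type === RATE_LIMIT_TYPE && b.status === 429 && typeof b.title === 'string'
}

// Prefer the body's retryAfter; otherwise use the caller's fallback delay
function retryDelayFromProblem(body: unknown, fallbackMs: number): number {
  if (isRateLimitProblem(body) && typeof body.retryAfter === 'number' && body.retryAfter >= 0) {
    return body.retryAfter * 1000
  }
  return fallbackMs
}

const minimal = JSON.parse(
  '{"type":"https://jsonic.io/errors/rate-limit-exceeded","title":"Too Many Requests","status":429,"retryAfter":30}'
)
const other = JSON.parse('{"type":"about:blank","title":"Not Found","status":404}')
console.log(retryDelayFromProblem(minimal, 5000), retryDelayFromProblem(other, 5000)) // 30000 5000
```

Matching on type rather than on title or detail means the human-readable text can change or be localized without breaking clients.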
express-rate-limit Middleware Setup
express-rate-limit is the standard rate limiting middleware for Express.js JSON APIs. It supports fixed window limiting out of the box with an in-memory store, and plugs into Redis via rate-limit-redis for distributed deployments. The standardHeaders option emits IETF-draft headers automatically ('draft-6' for separate RateLimit-* headers, 'draft-7' for the combined RateLimit header). Custom handler and keyGenerator functions support RFC 7807 error bodies and per-user limits.
// npm install express-rate-limit rate-limit-redis ioredis
import rateLimit from 'express-rate-limit'
import { RedisStore } from 'rate-limit-redis'
import Redis from 'ioredis'
import express from 'express'
const redis = new Redis(process.env.REDIS_URL!)
const app = express()
// ── Basic in-memory rate limiter (single instance only) ───────────
const basicLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
limit: 100, // 100 requests per window
standardHeaders: 'draft-7', // emit combined RateLimit + RateLimit-Policy headers (IETF draft-7)
legacyHeaders: false, // disable X-RateLimit-* legacy headers
message: { // RFC 7807 JSON body
type: 'https://jsonic.io/errors/rate-limit-exceeded',
title: 'Too Many Requests',
status: 429,
detail: 'Too many requests, please try again later.',
},
})
// ── Redis-backed distributed limiter ─────────────────────────────
const distributedLimiter = rateLimit({
windowMs: 60 * 1000,
limit: 100,
standardHeaders: 'draft-7',
legacyHeaders: false,
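store: new RedisStore({
// ioredis expects call(command, ...args); spreading a plain string[] into call() does not type-check
sendCommand: (command: string, ...args: string[]) => redis.call(command, ...args) as Promise<any>,
prefix: 'rl:express:',
}),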
// Custom key: per-user if authenticated, per-IP if anonymous
keyGenerator: (req) => {
const apiKey = req.headers['x-api-key'] as string
return apiKey ? `apikey:${apiKey}` : `ip:${req.ip}`
},
// Custom RFC 7807 handler with Retry-After in body
handler: (req, res) => {
const resetTime = req.rateLimit.resetTime?.getTime() ?? Date.now() + 60_000
const retryAfter = Math.max(0, Math.ceil((resetTime - Date.now()) / 1000))
res.status(429)
.set('Content-Type', 'application/problem+json')
.set('Retry-After', String(retryAfter))
.json({
type: 'https://jsonic.io/errors/rate-limit-exceeded',
title: 'Too Many Requests',
status: 429,
detail: `Rate limit exceeded. Retry after ${retryAfter} seconds.`,
retryAfter,
limit: req.rateLimit.limit,
remaining: 0,
reset: Math.floor(resetTime / 1000),
})
},
// Skip rate limiting for health check endpoints
skip: (req) => req.path === '/health',
})
// ── Tiered limiters: different limits per user tier ─────────────
// Helper: one Redis-backed store per limiter, each with its own key prefix
const redisStore = (prefix: string) => new RedisStore({
sendCommand: (command: string, ...args: string[]) => redis.call(command, ...args) as Promise<any>,
prefix,
})
const freeLimiter = rateLimit({ windowMs: 60 * 60 * 1000, limit: 100, store: redisStore('rl:free:') })
const proLimiter = rateLimit({ windowMs: 60 * 60 * 1000, limit: 10_000, store: redisStore('rl:pro:') })
// Apply to all API routes
app.use('/api/', distributedLimiter)
// Tighter limits for high-cost endpoints (these stack with distributedLimiter)
app.use('/api/search', rateLimit({ windowMs: 60_000, limit: 10, store: redisStore('rl:search:') }))
app.use('/api/completions', rateLimit({ windowMs: 60_000, limit: 5, store: redisStore('rl:completions:') }))
The standardHeaders option controls which IETF draft format is emitted: 'draft-6' sends separate RateLimit-Policy, RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset headers, while 'draft-7' (supported in express-rate-limit v7) combines limit, remaining, and reset into a single RateLimit header alongside RateLimit-Policy. Clients that only parse the separate RateLimit-* headers must also handle the combined form if you choose 'draft-7'. The keyGenerator function controls what is rate-limited: by default it uses the client IP, but authenticated APIs should use the API key or user ID to avoid penalizing clients behind shared IP addresses (corporate NAT, university networks). The skip function exempts specific routes from rate limiting; always exclude health check and metrics endpoints so load balancers and monitoring systems are never throttled.
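A keyGenerator that follows this advice might delegate to a small pure function like the one below. It is a sketch under assumptions: rateLimitKey and KeySource are hypothetical, the user ID is assumed to come from authentication middleware you already trust, and hashing the API key with SHA-256 (node:crypto) is a design choice so raw secrets never appear in Redis key names.

```typescript
import { createHash } from 'node:crypto'

interface KeySource {
  userId?: string // from verified auth, if any
  apiKey?: string // raw x-api-key header value, if any
  ip?: string // client IP (req.ip behind a correctly configured proxy)
}

// Most specific identity wins: user, then API key, then IP
function rateLimitKey(src: KeySource): string {
  if (src.userId) return `user:${src.userId}`
  if (src.apiKey) {
    // Store a truncated SHA-256 digest instead of the secret itself
    const digest = createHash('sha256').update(src.apiKey).digest('hex').slice(0, 32)
    return `apikey:${digest}`
  }
  return `ip:${src.ip ?? 'unknown'}`
}

console.log(rateLimitKey({ userId: '42', ip: '203.0.113.7' })) // user:42
console.log(rateLimitKey({ apiKey: 'sk-abc' }))
console.log(rateLimitKey({ ip: '203.0.113.7' })) // ip:203.0.113.7
```

Inside express-rate-limit this could be wired up as keyGenerator: (req) => rateLimitKey({ apiKey: req.get('x-api-key'), ip: req.ip }). Requests without an IP all share the ip:unknown bucket, so make sure req.ip is populated (trust proxy settings) in production.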
Client-Side Exponential Backoff for 429 Responses
Clients that retry 429 responses at a fixed interval cause retry storms: all rate-limited clients hit the server at the same moment, get rate limited again, and repeat. Exponential backoff with jitter spreads retries across a time window. The AWS Architecture Blog's analysis of backoff strategies found full jitter (a random delay between 0 and the exponential ceiling) and decorrelated jitter to be the most effective options for distributed retry scenarios.
// ── Exponential backoff formulas ──────────────────────────────────
// Full jitter (AWS recommended): delay = random(0, min(cap, base * 2^attempt))
// Equal jitter: delay = min(cap, base * 2^attempt) / 2 + random(0, min(cap, base * 2^attempt) / 2)
// Decorrelated jitter:         delay = min(cap, random(base, prevDelay * 3))
function fullJitter(attempt: number, base = 1000, cap = 60_000): number {
return Math.random() * Math.min(cap, base * Math.pow(2, attempt))
}
// ── Fetch wrapper with full 429 retry logic ───────────────────────
async function fetchWithRetry(
url: string,
options: RequestInit = {},
maxAttempts = 4,
absoluteTimeoutMs = 120_000
): Promise<Response> {
const startTime = Date.now()
for (let attempt = 0; attempt < maxAttempts; attempt++) {
const res = await fetch(url, options)
// Return immediately for any non-429 response (success or other error)
if (res.status !== 429) return res
// Last attempt — do not wait, just throw
if (attempt === maxAttempts - 1) {
throw new Error(`Rate limited after ${maxAttempts} attempts. URL: ${url}`)
}
// ── Parse wait time (priority order) ──────────────────────────
const retryAfterHeader = res.headers.get('retry-after')
const resetHeader = res.headers.get('ratelimit-reset') ?? res.headers.get('x-ratelimit-reset')
let waitMs = fullJitter(attempt) // fallback
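if (retryAfterHeader) {
const asInt = Number(retryAfterHeader)
const serverWaitMs = isNaN(asInt)
? Math.max(0, new Date(retryAfterHeader).getTime() - Date.now()) // HTTP-date
: asInt * 1000 // integer seconds
// Retry-After is a floor: add up to 1s of jitter so clients do not retry in lockstep
if (Number.isFinite(serverWaitMs)) waitMs = serverWaitMs + Math.random() * 1000
} else if (resetHeader && Number.isFinite(Number(resetHeader))) {
// Draft RateLimit-Reset is delta-seconds; values >= 1e9 are treated as Unix epoch seconds
const reset = Number(resetHeader)
waitMs = (reset >= 1e9 ? Math.max(0, reset * 1000 - Date.now()) : reset * 1000) + Math.random() * 1000
}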
// Read retryAfter from JSON body as additional signal
try {
const body = await res.clone().json()
if (typeof body.retryAfter === 'number') {
waitMs = Math.max(waitMs, body.retryAfter * 1000)
}
} catch { /* ignore body parse errors */ }
// Respect absolute timeout — do not wait if it would exceed it
const remaining = absoluteTimeoutMs - (Date.now() - startTime)
if (waitMs > remaining) {
throw new Error(`Rate limit retry would exceed timeout. Giving up after attempt ${attempt + 1}.`)
}
await new Promise(resolve => setTimeout(resolve, waitMs))
}
throw new Error('Rate limit retry loop exhausted')
}
// ── axios-retry setup ─────────────────────────────────────────────
import axios from 'axios'
import axiosRetry from 'axios-retry'
axiosRetry(axios, {
retries: 3,
retryCondition: (error) => error.response?.status === 429,
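retryDelay: (retryCount, error) => {
const retryAfterSec = Number(error.response?.headers['retry-after'])
// Integer-seconds Retry-After is a floor; HTTP-date or missing values fall back to backoff
if (Number.isFinite(retryAfterSec) && retryAfterSec > 0) return retryAfterSec * 1000 + Math.random() * 1000
return axiosRetry.exponentialDelay(retryCount) + Math.random() * 1000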
},
onRetry: (retryCount, error) => {
console.warn(`429 retry ${retryCount} for ${error.config?.url}`)
},
})
The absolute timeout guard (absoluteTimeoutMs) is critical for production API clients: without it, a misbehaving rate limit response with a very large Retry-After value could make a client wait far longer than the caller expects. Set the absolute timeout based on the calling context: a synchronous HTTP request handler should not wait more than about 5 seconds in total, while a background job can afford 120 seconds. For payment and webhook delivery systems, use a dead-letter queue instead of retrying inline: after 3 attempts, push the failed request to a queue (SQS, Redis list, database row) for asynchronous retry with a longer timeout. See also JSON error handling for broader error response patterns.
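The three jitter formulas listed at the start of this section, plus the absolute-timeout check, can be expressed as small pure functions with an injectable random source so they are testable. This is a sketch rather than a library API: planRetry and RetryPlanInput are hypothetical, and adding at most 1 s of jitter on top of a server-provided wait is an assumed policy that follows this guide's advice to treat Retry-After as a floor.

```typescript
type Rng = () => number // returns a value in [0, 1), like Math.random

// Full jitter: random(0, min(cap, base * 2^attempt))
function fullJitter(attempt: number, base = 1000, cap = 60_000, rng: Rng = Math.random): number {
  return rng() * Math.min(cap, base * 2 ** attempt)
}

// Equal jitter: half of the ceiling is fixed, half is random
function equalJitter(attempt: number, base = 1000, cap = 60_000, rng: Rng = Math.random): number {
  const ceiling = Math.min(cap, base * 2 ** attempt)
  return ceiling / 2 + rng() * (ceiling / 2)
}

// Decorrelated jitter: min(cap, random(base, prevDelay * 3)); start with prevDelay = base
function decorrelatedJitter(prevDelay: number, base = 1000, cap = 60_000, rng: Rng = Math.random): number {
  const upper = Math.max(base, prevDelay * 3)
  return Math.min(cap, base + rng() * (upper - base))
}

interface RetryPlanInput {
  attempt: number // 0-based retry attempt
  serverMinMs?: number // parsed Retry-After, if the server sent one
  elapsedMs: number // time already spent on this call
  budgetMs: number // absolute timeout for the whole call
  rng?: Rng
}

// Returns the delay to wait, or undefined if waiting would exceed the absolute timeout
function planRetry(input: RetryPlanInput): number | undefined {
  const jitter = fullJitter(input.attempt, 1000, 60_000, input.rng)
  const delay = input.serverMinMs !== undefined
    ? input.serverMinMs + Math.min(jitter, 1000) // server value is a floor; small jitter on top
    : jitter
  return input.elapsedMs + delay > input.budgetMs ? undefined : delay
}

const half: Rng = () => 0.5
console.log(
  fullJitter(3, 1000, 60_000, half), // 4000
  equalJitter(2, 1000, 60_000, half), // 3000
  decorrelatedJitter(2000, 1000, 60_000, half), // 3500
  planRetry({ attempt: 0, serverMinMs: 30_000, elapsedMs: 0, budgetMs: 120_000, rng: half }), // 30500
  planRetry({ attempt: 0, serverMinMs: 200_000, elapsedMs: 0, budgetMs: 120_000, rng: half }) // undefined
)
```

Returning undefined instead of throwing lets the caller decide whether to give up, surface an error, or hand the request to a dead-letter queue.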
Next.js API Route Rate Limiting
Next.js offers two rate limiting integration points: middleware.ts (runs at the edge before any route handler, best for blanket API protection) and inside individual route handlers (best for per-endpoint granularity). The App Router and Pages Router require slightly different approaches. The @upstash/ratelimit library is the recommended choice for both — its HTTP-based Redis client works in the edge runtime where Node.js TCP clients like ioredis do not.
// ── Option 1: middleware.ts — edge rate limiting ─────────────────
// npm install @upstash/ratelimit @upstash/redis
import { NextRequest, NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'
// Instantiate once at module level — reused across edge requests
const ratelimit = new Ratelimit({
redis: Redis.fromEnv(), // UPSTASH_REDIS_REST_URL + TOKEN
limiter: Ratelimit.slidingWindow(100, '1 m'), // 100 req/min sliding window
prefix: 'rl',
analytics: true,
})
// Per-endpoint limiters with different quotas
const searchLimiter = new Ratelimit({ redis: Redis.fromEnv(), limiter: Ratelimit.fixedWindow(10, '1 m'), prefix: 'rl:search' })
const llmLimiter = new Ratelimit({ redis: Redis.fromEnv(), limiter: Ratelimit.tokenBucket(20, '1 m', 20), prefix: 'rl:llm' })
export async function middleware(req: NextRequest) {
// Only rate limit API routes
if (!req.nextUrl.pathname.startsWith('/api/')) return NextResponse.next()
// Extract identifier: prefer the JWT subject, fall back to IP (parseJwtSub does not verify the token)
const ip = req.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? '127.0.0.1'
const authHeader = req.headers.get('authorization')
const identifier = authHeader ? `user:${parseJwtSub(authHeader)}` : `ip:${ip}`
// Select limiter based on endpoint
const limiter =
req.nextUrl.pathname.startsWith('/api/search') ? searchLimiter :
req.nextUrl.pathname.startsWith('/api/llm') ? llmLimiter :
ratelimit
const { success, limit, remaining, reset } = await limiter.limit(identifier)
const rateLimitHeaders = {
'RateLimit-Limit': String(limit),
'RateLimit-Remaining': String(remaining),
'RateLimit-Reset': String(Math.max(0, Math.ceil((reset - Date.now()) / 1000))), // delta-seconds; Upstash reset is Unix ms
}
if (!success) {
const retryAfter = Math.max(0, Math.ceil((reset - Date.now()) / 1000))
return new NextResponse(
JSON.stringify({
type: 'https://jsonic.io/errors/rate-limit-exceeded',
title: 'Too Many Requests',
status: 429,
detail: `Rate limit exceeded. Retry after ${retryAfter} seconds.`,
retryAfter,
limit,
remaining: 0,
reset: Math.floor(reset / 1000),
}),
{
status: 429,
headers: { ...rateLimitHeaders, 'Retry-After': String(retryAfter), 'Content-Type': 'application/problem+json' },
}
)
}
const response = NextResponse.next()
Object.entries(rateLimitHeaders).forEach(([k, v]) => response.headers.set(k, v))
return response
}
export const config = { matcher: '/api/:path*' }
// ── Option 2: Rate limit inside App Router route handler ──────────
// app/api/completions/route.ts (assumes llmLimiter is defined or imported in this module)
export async function POST(req: Request) {
const ip = req.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? '127.0.0.1'
const { success, limit, remaining, reset } = await llmLimiter.limit(`ip:${ip}`)
if (!success) {
const retryAfter = Math.max(0, Math.ceil((reset - Date.now()) / 1000))
return Response.json(
{ type: 'https://jsonic.io/errors/rate-limit-exceeded', title: 'Too Many Requests', status: 429, retryAfter },
{ status: 429, headers: { 'Content-Type': 'application/problem+json', 'Retry-After': String(retryAfter) } }
)
}
return Response.json({ result: 'ok' })
}
// Decodes the JWT payload WITHOUT verifying the signature, so `sub` is client-controlled.
// Verify the token (for example with the jose library) before relying on it for limits.
function parseJwtSub(authHeader: string): string {
try {
// JWT segments are base64url; map to base64 for atob, which the edge runtime provides
const segment = authHeader.replace(/^Bearer\s+/i, '').split('.')[1]
const payload = JSON.parse(atob(segment.replace(/-/g, '+').replace(/_/g, '/')))
return payload.sub ?? 'unknown'
} catch { return 'invalid' }
}
The middleware approach is preferred for blanket API protection because it runs before route handlers, so rate-limited requests never incur compute cost (database queries, LLM calls). Use route-level rate limiting when endpoints have substantially different quota requirements; for example, an expensive /api/search endpoint should have a tighter limit than /api/users. For self-hosted Next.js deployments without Upstash, implement the Redis-based sliding window with ioredis inside API route handlers (not middleware), because the edge runtime does not support TCP connections. Use the Node.js runtime (export const runtime = 'nodejs') for any route that uses ioredis. See the JSON caching strategies guide for complementary patterns to reduce backend load.
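Both Next.js options repeat two small steps: picking the client IP out of x-forwarded-for and turning Upstash's reset value (a Unix timestamp in milliseconds, as the middleware's division by 1000 assumes) into header values. A minimal sketch of those steps as pure functions follows; clientIp and resetToHeaders are hypothetical names, and the left-most x-forwarded-for entry is only trustworthy when a proxy you control sets or sanitizes that header.

```typescript
// Left-most x-forwarded-for entry is the original client ("client, proxy1, proxy2")
function clientIp(forwardedFor: string | null, fallback = '127.0.0.1'): string {
  const first = forwardedFor?.split(',')[0]?.trim()
  return first ? first : fallback
}

interface ResetHeaders {
  retryAfterSec: number // integer seconds for Retry-After / RateLimit-Reset
  resetEpochSec: number // Unix seconds, e.g. for the JSON body's `reset` field
}

// Convert a reset timestamp in Unix milliseconds into header-friendly values
function resetToHeaders(resetMs: number, nowMs: number = Date.now()): ResetHeaders {
  return {
    retryAfterSec: Math.max(0, Math.ceil((resetMs - nowMs) / 1000)),
    resetEpochSec: Math.floor(resetMs / 1000),
  }
}

console.log(clientIp(' 203.0.113.7 , 10.0.0.1'), clientIp(null)) // 203.0.113.7 127.0.0.1
console.log(resetToHeaders(1_716_220_830_500, 1_716_220_800_000)) // { retryAfterSec: 31, resetEpochSec: 1716220830 }
```

Rounding the wait up with Math.ceil keeps clients from retrying a fraction of a second too early, at the cost of up to one extra second of waiting.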
Key Terms
- rate limiting
- A traffic control mechanism that restricts the number of requests a client can make to an API within a defined time window. Rate limiting protects APIs from abuse, controls infrastructure costs, and enforces fair usage across tenants. It is implemented server-side (the API enforces limits) and signaled to clients via HTTP headers (
RateLimit-Limit,RateLimit-Remaining,RateLimit-Reset) and the HTTP 429 status code. Rate limiting differs from throttling (which queues excess requests and adds latency) and circuit breaking (which stops all requests to a failing downstream service). Common identifiers for rate limiting keys include IP address, API key, user ID, and endpoint path. - fixed window
- A rate limiting algorithm that divides time into fixed intervals (e.g., each minute from :00 to :59) and resets a request counter at each boundary. Implemented in Redis with two commands: INCR (increment the counter atomically) and EXPIRE (set a TTL so the key auto-deletes after the window). The Redis key includes a window identifier — typically
Math.floor(Date.now() / windowMs)— so different windows use different keys. Fixed window is the cheapest algorithm (~32 bytes per key, 2 commands, ~0.1 ms) but is vulnerable to the boundary exploit: a client can send double the quota by splitting requests across a window boundary. Use fixed window for loose throttling where the exploit is acceptable, such as anonymous IP limiting or DDoS mitigation. - sliding window
- A rate limiting algorithm that continuously counts requests within the last N seconds, rather than in a fixed interval that resets at a boundary. Each request is stored in a Redis sorted set with its timestamp as the score (ZADD). On each new request, entries older than the window are evicted (ZREMRANGEBYSCORE), and ZCARD returns the current count. Sliding window eliminates the boundary exploit — at no point can a client send more than
limitrequests in any rolling window of N seconds. The memory cost is higher: one sorted set entry per live request (~28 bytes per entry) versus two integers for fixed window. For a 100 req/min limit, a key can hold up to 100 entries (~2.8 KB), versus ~32 bytes for fixed window — roughly 87x more memory. - token bucket
- A rate limiting algorithm that models a virtual bucket with a maximum capacity (burst size) and a refill rate (tokens per second). Each request consumes one token; the request is denied when the bucket is empty. The bucket refills continuously at the refill rate, up to the capacity. Token bucket uniquely allows burst traffic up to the capacity while enforcing the average sustained rate. Implemented in Redis using a hash storing
tokens(current count) andlastRefill(timestamp), updated atomically via a Lua script to prevent race conditions. Ideal for interactive clients that make legitimate parallel requests — a bucket with capacity=20 allows 20 simultaneous requests (a page load firing 20 parallel API calls) while enforcing the sustained rate thereafter. - HTTP 429
- The HTTP status code "Too Many Requests" (defined in RFC 6585) returned by a server when a client has exceeded its rate limit quota. A 429 response must include a
Retry-Afterheader indicating how long the client should wait before retrying. The response body for a JSON API should useContent-Type: application/problem+jsonwith an RFC 7807 Problem Details object containingtype,title,status: 429,detail, andretryAfterfields. HTTP 429 differs from 503 Service Unavailable (the server is overloaded) and 503 with Retry-After (scheduled maintenance) — 429 specifically communicates a per-client quota violation, not a server-wide problem. - Retry-After
- An HTTP response header (defined in RFC 7231, now RFC 9110, and used for rate limiting via RFC 6585) that tells a client how long to wait before making another request. Its value is either an integer number of seconds (Retry-After: 30) or an HTTP-date string (Retry-After: Tue, 21 Oct 2025 07:28:00 GMT). For rate limiting responses, integer seconds are preferred: they are simpler to parse and directly mean "wait this many seconds." The Retry-After value should also appear in the JSON response body as retryAfter (integer seconds), since some HTTP client libraries do not expose response headers easily. Clients should treat Retry-After as a minimum wait time and add jitter on top.
- exponential backoff
- A retry strategy where the wait time between attempts grows exponentially: delay = base × 2^attempt (e.g., 1s, 2s, 4s, 8s). API clients use it when responding to 429 Too Many Requests to prevent retry storms, where all rate-limited clients retry simultaneously and overload the server again. Adding jitter (randomness) to the backoff formula spreads retries across a time window: full jitter uses delay = random(0, min(cap, base × 2^attempt)), where cap is the maximum delay (e.g., 60 seconds). The AWS Architecture Blog recommends full jitter or decorrelated jitter as the most effective strategies for distributed retry scenarios. Always use the server-provided Retry-After value as a minimum wait floor and apply exponential backoff on top of that server-mandated minimum.
- Redis INCR
- An atomic Redis command that increments an integer value stored at a key by 1 and returns the new value (a missing key is treated as 0). INCR is the foundation of fixed-window rate limiting: INCR rate:ip:192.168.1.1:1716220800 increments the counter for this IP in the current window and returns the count in a single atomic operation. Because Redis executes commands one at a time, two concurrent requests cannot both read the same count and both believe they are under the limit. Combined with EXPIRE (a TTL so the key deletes itself after the window), INCR + EXPIRE implements fixed window rate limiting in 2 commands and ~32 bytes of memory per key, at ~0.1 ms latency per request; sending both in one MULTI/EXEC transaction ensures a crash between them cannot leave a counter without a TTL. For sliding window, ZADD + ZREMRANGEBYSCORE + ZCARD replace the single INCR, at the cost of higher memory and latency.
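The INCR + EXPIRE flow can be exercised without a Redis server. In this sketch, `FakeRedis` is a hypothetical in-memory stand-in that mimics only the two commands used here (a missing key counts from 0; expired keys vanish on access), and `fixedWindowHit` is a hypothetical helper using the rate:ip:<ip>:<windowStart> key layout from the example above. Time is passed in explicitly so the boundary behaviour is reproducible.

```typescript
type Entry = { value: number; expiresAt: number | null };

// Hypothetical in-memory stand-in for Redis INCR and EXPIRE (not a Redis client).
class FakeRedis {
  private store = new Map<string, Entry>();

  // Return the live entry, deleting it first if its TTL has passed.
  private live(key: string, nowMs: number): Entry | undefined {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt !== null && entry.expiresAt <= nowMs) {
      this.store.delete(key);
      return undefined;
    }
    return entry;
  }

  // INCR: a missing key is treated as 0, then incremented; returns the new value.
  incr(key: string, nowMs: number): number {
    const entry: Entry = this.live(key, nowMs) ?? { value: 0, expiresAt: null };
    entry.value += 1;
    this.store.set(key, entry);
    return entry.value;
  }

  // EXPIRE: set a TTL in seconds on an existing key.
  expire(key: string, seconds: number, nowMs: number): void {
    const entry = this.live(key, nowMs);
    if (entry) entry.expiresAt = nowMs + seconds * 1000;
  }
}

interface FixedWindowResult {
  allowed: boolean;
  count: number;
  resetAtSec: number; // Unix seconds at which the current window ends
}

function fixedWindowHit(
  redis: FakeRedis,
  ip: string,
  limit: number,
  windowSec: number,
  nowMs: number,
): FixedWindowResult {
  const nowSec = Math.floor(nowMs / 1000);
  const windowStart = nowSec - (nowSec % windowSec); // aligned to window boundaries
  const key = `rate:ip:${ip}:${windowStart}`;
  const count = redis.incr(key, nowMs);
  // Refreshes the TTL on every hit, like the INCR + EXPIRE pair; with real Redis,
  // send both in MULTI/EXEC so a crash cannot leave a key without a TTL.
  redis.expire(key, windowSec, nowMs);
  return { allowed: count <= limit, count, resetAtSec: windowStart + windowSec };
}
```

With a limit of 3 per 60 s, three requests at second 59 and three more at second 60 are all accepted: six requests in one second, which is the fixed-window boundary burst.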
FAQ
What is the difference between fixed window and sliding window rate limiting?
Fixed window divides time into fixed intervals (e.g., each minute starting at :00) and resets a request counter at each boundary. It is implemented with Redis INCR + EXPIRE: two commands, ~32 bytes per key, ~0.1 ms latency. The weakness is the boundary exploit: with a 1000 req/min limit, a client can send 1000 requests at 12:00:59 and 1000 more at 12:01:00, making 2000 requests within about 2 seconds while technically respecting the limit. Sliding window eliminates this by continuously counting requests in the past N seconds. Each request is stored with a timestamp in a Redis sorted set; old entries are evicted with ZREMRANGEBYSCORE before counting with ZCARD, so no rolling window ever contains more requests than the limit. The cost is ~28 bytes per request entry: at a 100 req/min limit a full key holds 100 entries (100 × 28 B ≈ 2.8 KB), versus ~32 bytes total for a fixed-window counter, roughly 87x more memory. Use fixed window for loose IP throttling where the exploit is acceptable; use sliding window for paid API keys and authenticated users where accurate enforcement matters.
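The sliding-window log can be sketched in memory as well. `SlidingWindowLog` is a hypothetical class: filtering old timestamps plays the role of ZREMRANGEBYSCORE (scores at or below windowStart are removed), appending plays ZADD, and the array length plays ZCARD. One assumption differs from the Redis pipeline described later in this guide: only accepted requests are recorded, so a rejected client regains capacity as soon as its oldest entries age out.

```typescript
// Hypothetical in-memory sliding-window log: one timestamp array per client.
class SlidingWindowLog {
  private logs = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  hit(clientId: string, nowMs: number): { allowed: boolean; count: number } {
    const windowStart = nowMs - this.windowMs;
    // Like ZREMRANGEBYSCORE key -inf windowStart: keep only scores > windowStart.
    const live = (this.logs.get(clientId) ?? []).filter((ts) => ts > windowStart);
    if (live.length >= this.limit) {
      this.logs.set(clientId, live);
      return { allowed: false, count: live.length };
    }
    live.push(nowMs); // like ZADD key nowMs <unique member>
    this.logs.set(clientId, live);
    return { allowed: true, count: live.length }; // like ZCARD
  }
}
```

Replaying the boundary scenario (limit 3 per 60 s, three requests at 59 s and three more at 60 s) accepts only the first three; capacity returns once the 59 s entries fall out of the window at 119 s.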
How do I implement rate limiting in a Node.js JSON API?
For Express.js, use express-rate-limit: npm install express-rate-limit, then app.use('/api/', rateLimit({ windowMs: 60_000, limit: 100, standardHeaders: "draft-7", legacyHeaders: false })). By default this uses an in-memory store, which is fine for single-instance deployments but is not shared between processes. For distributed deployments (multiple Node.js processes or servers), add a Redis store: npm install rate-limit-redis ioredis and set store: new RedisStore({ sendCommand: (command: string, ...args: string[]) => redis.call(command, ...args) }). Use a custom handler function to return RFC 7807 JSON bodies with Content-Type: application/problem+json instead of the default plain text message. Use keyGenerator to rate limit by API key or user ID rather than IP address for authenticated endpoints. For per-endpoint limits, create multiple rateLimit() instances and apply them as separate middleware to specific routes.
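One way to implement the keyGenerator advice is a precedence of API key, then authenticated user ID, then IP address. The `MinimalRequest` shape, the x-api-key header name and the req.user object are assumptions for illustration (Node lowercases incoming header names, hence the lowercase lookup); adapt them to your auth setup before passing the function as keyGenerator.

```typescript
// Hypothetical minimal request shape so the sketch runs without Express installed.
interface MinimalRequest {
  headers: Record<string, string | string[] | undefined>;
  user?: { id: string }; // assumed to be populated by an auth middleware
  ip?: string;
}

// Prefer the most specific stable identity; prefixes keep identity types from colliding.
function rateLimitKey(req: MinimalRequest): string {
  const apiKey = req.headers["x-api-key"];
  if (typeof apiKey === "string" && apiKey.length > 0) return `apikey:${apiKey}`;
  if (req.user?.id) return `user:${req.user.id}`;
  return `ip:${req.ip ?? "unknown"}`;
}
```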
What headers should a rate-limited JSON API response include?
Every API response (200 and 429) should include RateLimit-Limit (the quota ceiling, e.g., 100), RateLimit-Remaining (requests left in the current window, e.g., 43), and RateLimit-Reset (seconds until the quota resets, e.g., 30). On a 429 response, additionally include Retry-After (integer seconds to wait, e.g., 30) and set RateLimit-Remaining: 0. These separate field names come from earlier revisions of the IETF RateLimit Headers draft (draft-ietf-httpapi-ratelimit-headers); draft-7 folds the three values into a single RateLimit header plus RateLimit-Policy. Many APIs use X-RateLimit-* prefixed names instead (GitHub, OpenAI, Twilio); the meaning is similar but not identical, for example GitHub's x-ratelimit-reset is a Unix timestamp in seconds (e.g., 1716220800) rather than a number of seconds to wait. express-rate-limit emits the separate RateLimit-* fields with standardHeaders: 'draft-6' and the combined RateLimit header with 'draft-7'. Always set Content-Type: application/problem+json on 429 responses alongside the rate limit headers.
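A small helper can keep the header set consistent between 200 and 429 responses. This sketch assumes the separate-field layout (as emitted by express-rate-limit with standardHeaders: 'draft-6'), where RateLimit-Reset is seconds until reset; `QuotaState` and `rateLimitHeaders` are hypothetical names.

```typescript
interface QuotaState {
  limit: number;     // quota ceiling for the window
  remaining: number; // requests left; may go negative in some counters
  resetAtMs: number; // absolute reset time in ms since the Unix epoch
}

// RateLimit-* headers with a delta-seconds reset, plus Retry-After when limited.
function rateLimitHeaders(q: QuotaState, nowMs: number, limited: boolean): Record<string, string> {
  // Round up so clients never retry a fraction of a second too early.
  const secondsToReset = Math.max(0, Math.ceil((q.resetAtMs - nowMs) / 1000));
  const headers: Record<string, string> = {
    "RateLimit-Limit": String(q.limit),
    "RateLimit-Remaining": String(limited ? 0 : Math.max(0, q.remaining)),
    "RateLimit-Reset": String(secondsToReset),
  };
  if (limited) headers["Retry-After"] = String(secondsToReset);
  return headers;
}
```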
How do I return a proper JSON 429 error response?
Return HTTP status 429 with Content-Type: application/problem+json and a body following RFC 7807 Problem Details. RFC 7807 makes every member optional, but include the standard ones: type (a URI identifying this error class, e.g., 'https://yourapi.com/errors/rate-limit-exceeded'), title ('Too Many Requests'), status (integer 429), and detail (human-readable explanation with retry timing). Then add extension members: retryAfter (integer seconds, mirroring the Retry-After header in the body for clients that cannot read headers), limit (the quota), remaining (always 0 on a 429), and reset (Unix timestamp). Set the Retry-After HTTP header to the same value as the body's retryAfter. Express example: res.status(429).set('Content-Type', 'application/problem+json').set('Retry-After', String(retryAfter)).json({ type, title: "Too Many Requests", status: 429, detail, retryAfter }).
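The body and headers described above can come from one helper so they never disagree. `rateLimitProblem` is a hypothetical name, the type URI is the placeholder from the text, and clamping retryAfter to at least 1 second is a design assumption so clients are never told to retry immediately.

```typescript
// RFC 7807 Problem Details body with rate-limit extension members.
interface RateLimitProblem {
  type: string;
  title: string;
  status: 429;
  detail: string;
  retryAfter: number; // seconds; mirrors the Retry-After header
  limit: number;
  remaining: 0;
  reset: number; // Unix timestamp in seconds
}

function rateLimitProblem(
  limit: number,
  resetUnixSec: number,
  nowUnixSec: number,
): { headers: Record<string, string>; body: RateLimitProblem } {
  const retryAfter = Math.max(1, resetUnixSec - nowUnixSec);
  return {
    headers: {
      "Content-Type": "application/problem+json",
      "Retry-After": String(retryAfter), // same value as body.retryAfter
    },
    body: {
      type: "https://yourapi.com/errors/rate-limit-exceeded",
      title: "Too Many Requests",
      status: 429,
      detail: `Rate limit of ${limit} requests exceeded. Retry in ${retryAfter} seconds.`,
      retryAfter,
      limit,
      remaining: 0,
      reset: resetUnixSec,
    },
  };
}
```

In Express, res.status(429).set(headers).json(body) would send it; res.json only adds its own application/json Content-Type when none has been set.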
How do I implement client-side retry for rate-limited JSON APIs?
When a response returns 429, read the Retry-After header first; it is the server-mandated minimum wait. Parse it as integer seconds or as an HTTP-date string. Then apply exponential backoff with full jitter on top: delay = random(0, min(cap, base * 2^attempt)), where base is 1000 ms and cap is 60000 ms. Always wait at least the Retry-After duration. If Retry-After is absent, fall back to RateLimit-Reset (seconds until reset under the IETF draft), to an X-RateLimit-Reset value that is a Unix timestamp in seconds (reset × 1000 minus Date.now() gives the wait in milliseconds), or to the retryAfter field of the RFC 7807 JSON body. Cap retries at 3-5 attempts and track an absolute timeout (e.g., 120 seconds) to prevent unbounded waits. With axios, use axios-retry with a custom retryDelay function that reads retry-after from the error response headers. Never retry at a fixed interval: when all rate-limited clients retry simultaneously, the resulting retry storm overwhelms the server again.
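The retry policy above can be sketched as a transport-agnostic loop. `sendWithRetry`, `retryAfterMs` and the injected `send`, `sleep`, `now` and `random` parameters are hypothetical; injecting them lets the logic run without a network, and a fetch Response satisfies the `MinimalResponse` shape. For brevity only the Retry-After floor is implemented; the RateLimit-Reset, X-RateLimit-Reset and body fallbacks are omitted.

```typescript
// Anything with a status and a header getter (a fetch Response qualifies).
interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}

interface RetryOptions {
  maxAttempts: number; // total tries, including the first request
  baseMs: number;      // e.g. 1000
  capMs: number;       // e.g. 60000
  timeoutMs: number;   // absolute budget across all attempts, e.g. 120000
}

// Retry-After is either delta-seconds ("30") or an HTTP-date; returns a wait in ms.
function retryAfterMs(raw: string | null, nowMs: number): number {
  if (raw === null) return 0;
  const value = raw.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  return isNaN(date) ? 0 : Math.max(0, date - nowMs);
}

async function sendWithRetry(
  send: () => Promise<MinimalResponse>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  now: () => number = Date.now,
  random: () => number = Math.random,
): Promise<MinimalResponse> {
  const startedAt = now();
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt + 1 >= opts.maxAttempts) return res;
    // Server-mandated floor, plus full jitter: random(0, min(cap, base * 2^attempt)).
    const floorMs = retryAfterMs(res.headers.get("retry-after"), now());
    const jitterMs = random() * Math.min(opts.capMs, opts.baseMs * 2 ** attempt);
    const delayMs = floorMs + jitterMs;
    // Return the 429 rather than exceed the absolute timeout.
    if (now() - startedAt + delayMs > opts.timeoutMs) return res;
    await sleep(delayMs);
  }
}
```

Returning the final 429 instead of throwing leaves the caller free to surface the problem+json body to the user.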
What is a token bucket and how does it allow bursts?
A token bucket is a rate limiting algorithm with two parameters: capacity (maximum tokens, the burst size) and refill_rate (tokens added per second, the sustained rate). The bucket starts full. Each request consumes one token and is denied if the bucket is empty. The bucket refills continuously at refill_rate tokens per second, up to capacity. A bucket with capacity=100 and refill_rate=10 tokens/s allows 100 requests instantly (consuming all tokens), then 10 requests/second sustained. After 10 seconds of inactivity (100 tokens ÷ 10 tokens/s), the bucket refills completely, allowing another burst of 100. This makes token bucket well suited to APIs consumed by interactive clients: a mobile app loading a screen might fire 15 API calls in parallel. The capacity=100 bucket admits all 15, whereas a fixed or sliding window enforcing the same 10 req/s over one-second windows would reject 5 of them. Implement token bucket in Redis using a hash (tokens, lastRefill) updated atomically via a Lua script: the script computes elapsed time, adds earned tokens, caps at capacity, deducts 1, and writes back in one atomic operation.
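The refill arithmetic described above can be sketched as an in-memory class. `TokenBucket` and `tryConsume` are hypothetical names; the two fields mirror the tokens and lastRefill hash fields, and time is passed in explicitly. In Redis, the same read, compute and write steps would run inside one Lua script so concurrent requests cannot interleave.

```typescript
// In-memory token bucket with lazy refill, mirroring the (tokens, lastRefill) hash.
class TokenBucket {
  private tokens: number;
  private lastRefill: number; // ms timestamp of the last refill computation
  private readonly capacity: number;        // burst size
  private readonly refillPerSecond: number; // sustained rate

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // the bucket starts full
    this.lastRefill = nowMs;
  }

  tryConsume(nowMs: number): boolean {
    // Add tokens earned since the last call, capped at capacity.
    const elapsedMs = Math.max(0, nowMs - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs * this.refillPerSecond) / 1000);
    this.lastRefill = nowMs;
    if (this.tokens < 1) return false; // empty: deny without consuming
    this.tokens -= 1;
    return true;
  }
}
```

Fractional tokens are kept rather than rounded, so short gaps between requests still accumulate toward the next token.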
How do I rate limit a Next.js API route?
Two approaches for Next.js App Router: (1) middleware.ts, which runs at the edge before route handlers and blocks rate-limited requests before any computation. Install @upstash/ratelimit and @upstash/redis, create a Ratelimit instance at module level, extract the client IP from x-forwarded-for, call await ratelimit.limit(identifier), and return a NextResponse with status 429 and an RFC 7807 body if success is false. Export config = { matcher: ['/api/:path*'] } to apply it only to API routes. (2) Inside the route handler: add the rate limit check at the top of the GET/POST handler in route.ts. Use this for per-endpoint granularity when different routes need different limits. For self-hosted Next.js using ioredis, do the check in route handlers on the Node.js runtime (export const runtime = 'nodejs'), because ioredis needs TCP sockets, which the edge runtime does not provide.
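Both approaches need a stable identifier for ratelimit.limit(identifier). This hypothetical `clientIdentifier` helper takes the leftmost address from x-forwarded-for and accepts anything with a get(name) method, so a Web Headers object (such as request.headers) fits; the "anonymous" fallback is an assumption. The leftmost entry is only trustworthy when a proxy you control, such as your hosting platform's edge, sets or overwrites the header; otherwise clients can spoof it.

```typescript
interface HeaderReader {
  get(name: string): string | null;
}

// x-forwarded-for looks like "client, proxy1, proxy2"; the client is the leftmost entry.
function clientIdentifier(headers: HeaderReader, fallback = "anonymous"): string {
  const forwarded = headers.get("x-forwarded-for");
  const first = forwarded?.split(",")[0]?.trim();
  return first || fallback;
}
```

In middleware.ts this would feed await ratelimit.limit(clientIdentifier(request.headers)).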
How do I use Redis for distributed JSON API rate limiting?
Redis is the standard distributed rate limiting backend because its atomic operations are visible to all Node.js instances simultaneously. For fixed window: send INCR rate:fixed:ip:X:windowId and EXPIRE key windowSeconds together (MULTI/EXEC for atomicity, or a pipeline to save a round trip); INCR is atomic, so concurrent requests cannot both read the same stale count. For sliding window: ZADD key now member, ZREMRANGEBYSCORE key '-inf' windowStart, ZCARD key, and EXPIRE key windowSeconds+1, four commands in one round trip at ~0.3 ms p99 and ~2.8 KB per key at a 100 req/min limit. For token bucket: use a Lua script (HMGET + compute + HSET) for full read-modify-write atomicity; without Lua, two concurrent requests can read the same token count and both succeed, exceeding the limit. Namespace keys by algorithm and identifier: rl:fixed:ip:192.168.1.1:1716220800, rl:sliding:user:123, rl:token:apikey:sk-abc. Use @upstash/redis for Vercel/edge deployments (HTTP-based, no TCP required); use a shared, long-lived ioredis client for Node.js server deployments.
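The key layout and command sequences can be produced by plain functions and then sent with a client such as ioredis, which accepts an array of command arrays in redis.multi(...) and redis.pipeline(...). The function names, and the random nonce that keeps ZADD members unique when two requests share a millisecond, are assumptions of this sketch.

```typescript
type Algorithm = "fixed" | "sliding" | "token";
type IdKind = "ip" | "user" | "apikey";

// e.g. rl:fixed:ip:192.168.1.1:1716220800 or rl:sliding:user:123
function rlKey(algorithm: Algorithm, kind: IdKind, id: string, windowStartSec?: number): string {
  const base = `rl:${algorithm}:${kind}:${id}`;
  return windowStartSec === undefined ? base : `${base}:${windowStartSec}`;
}

// Fixed window: the INCR reply is the current count.
function fixedWindowCommands(key: string, windowSec: number): string[][] {
  return [
    ["INCR", key],
    ["EXPIRE", key, String(windowSec)],
  ];
}

// Sliding window: add this request, evict old entries, count, refresh TTL.
// The ZCARD reply includes the current request (and any rejected ones already added).
function slidingWindowCommands(key: string, nowMs: number, windowSec: number, nonce: string): string[][] {
  const windowStart = nowMs - windowSec * 1000;
  return [
    ["ZADD", key, String(nowMs), `${nowMs}-${nonce}`], // member must be unique per request
    ["ZREMRANGEBYSCORE", key, "-inf", String(windowStart)],
    ["ZCARD", key],
    ["EXPIRE", key, String(windowSec + 1)],
  ];
}
```

With ioredis, await redis.multi(slidingWindowCommands(...)).exec() resolves to [error, result] pairs, so the count is at index [2][1]; MULTI makes the group atomic, whereas a plain pipeline only saves round trips.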
Further reading and primary sources
- IETF RateLimit Headers for HTTP APIs (draft-ietf-httpapi-ratelimit-headers): draft specification; earlier revisions define RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset, and draft-7 combines them into RateLimit plus RateLimit-Policy (Retry-After itself is defined in RFC 9110)
- RFC 7807: Problem Details for HTTP APIs: IETF standard defining the application/problem+json format for HTTP error responses such as 429 Too Many Requests; obsoleted by RFC 9457, which keeps the same format
- express-rate-limit documentation: official docs for the express-rate-limit middleware (fixed window, Redis store, custom handlers, standardHeaders option)
- Upstash Ratelimit: edge-compatible token bucket and sliding window rate limiting backed by Upstash Redis, compatible with Next.js edge middleware
- Exponential Backoff and Jitter (AWS Architecture Blog): analysis comparing full jitter, equal jitter, and decorrelated jitter strategies for distributed retry storms