
Rate Limiting Your Next.js API Routes - The Complete Guide

Protect your API from abuse with rate limiting. From simple in-memory solutions to production-ready Redis implementations.

I once woke up to a $400 Vercel bill. Someone had written a script that hit our public API endpoint 2 million times overnight. No authentication required, no rate limiting in place.

The endpoint just fetched some public data. But 2 million requests meant 2 million function invocations. At Vercel's pricing, that adds up fast.

Rate limiting would have stopped this in its tracks. Here's how to implement it properly.

Why Rate Limiting Matters

Three reasons you need rate limiting on any SaaS:

1. Cost control - Every API call costs money. Serverless functions, database queries, third-party API calls. An attacker (or buggy client) can rack up thousands in charges.

2. Security - Brute force attacks against login endpoints. Credential stuffing. Enumeration attacks. Rate limiting doesn't stop them completely, but it makes them impractical.

3. Fair usage - If one user hammers your API, everyone else suffers. Rate limiting ensures resources are distributed fairly.

Rate Limiting Algorithms

Before implementing, understand the two main approaches:

Token Bucket

Imagine a bucket that fills with tokens at a steady rate. Each request takes a token. If the bucket is empty, the request is denied.

Bucket capacity: 10 tokens
Refill rate: 1 token per second

Request 1: 10 tokens → 9 tokens ✓
Request 2: 9 tokens → 8 tokens ✓
...
Request 10: 1 token → 0 tokens ✓
Request 11: 0 tokens → denied ✗
(wait 1 second)
Request 12: 1 token → 0 tokens ✓

Good for allowing bursts while maintaining an average rate.
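The trace above can be sketched in a few lines of code. This is an illustrative in-process version (the `TokenBucket` class and injectable clock are my own names, not from any library):

```typescript
// Minimal token bucket sketch. Tokens refill continuously based on
// elapsed time; `now` is injectable so the logic is deterministic to test.
class TokenBucket {
  private tokens: number

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    private lastRefill: number = Date.now()
  ) {
    this.tokens = capacity
  }

  take(now: number = Date.now()): boolean {
    // Refill based on time elapsed since the last check
    const elapsedSeconds = (now - this.lastRefill) / 1000
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    )
    this.lastRefill = now

    if (this.tokens < 1) return false
    this.tokens -= 1
    return true
  }
}
```

Because refill is computed from elapsed time rather than a timer, a burst of up to `capacity` requests goes through immediately, then the bucket settles at the average rate.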

Sliding Window

Count requests in a rolling time window. If count exceeds limit, deny.

Window: 60 seconds
Limit: 100 requests

Requests in last 60 seconds: 99 → allowed ✓
Requests in last 60 seconds: 100 → allowed ✓
Requests in last 60 seconds: 101 → denied ✗

Simpler to understand and implement. This is what most rate limiters use.

Implementation 1: Simple In-Memory (Development)

For local development or very small apps, a simple Map works:

// lib/rate-limit.ts
const requests = new Map<string, number[]>()

export function rateLimit(
  identifier: string,
  limit: number = 10,
  windowMs: number = 60000
): { success: boolean; remaining: number } {
  const now = Date.now()
  const windowStart = now - windowMs

  // Get existing timestamps and filter to current window
  const timestamps = requests.get(identifier) || []
  const recentRequests = timestamps.filter(t => t > windowStart)

  if (recentRequests.length >= limit) {
    return { success: false, remaining: 0 }
  }

  // Add current request
  recentRequests.push(now)
  requests.set(identifier, recentRequests)

  return { success: true, remaining: limit - recentRequests.length }
}

Use it in an API route:

// app/api/data/route.ts
import { rateLimit } from '@/lib/rate-limit'
import { headers } from 'next/headers'

export async function GET() {
  const headersList = await headers()
  const forwarded = headersList.get('x-forwarded-for')
  const ip = forwarded?.split(',')[0].trim() || 'anonymous'

  const { success, remaining } = rateLimit(ip, 10, 60000)

  if (!success) {
    return Response.json(
      { error: 'Too many requests' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Remaining': '0',
          'Retry-After': '60'
        }
      }
    )
  }

  // Your actual logic here
  return Response.json(
    { data: 'Hello' },
    {
      headers: {
        'X-RateLimit-Remaining': remaining.toString()
      }
    }
  )
}

Why this doesn't work in production:

  1. Serverless = no shared memory - Each function instance has its own Map. User hits different instances, rate limit resets.
  2. Memory leaks - The Map grows forever if you don't clean it up.
  3. No persistence - Function cold starts wipe the Map.

Use this for development only.
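If you do keep the in-memory version around for development, a periodic sweep keeps the Map bounded. A minimal sketch, shown self-contained with its own Map so it runs standalone:

```typescript
// Cleanup sketch for the in-memory limiter: drop identifiers with no
// activity inside the current window. `now` is injectable for testing.
const requests = new Map<string, number[]>()

function sweep(windowMs: number, now: number = Date.now()): number {
  const cutoff = now - windowMs
  let removed = 0
  for (const [key, timestamps] of requests) {
    const recent = timestamps.filter(t => t > cutoff)
    if (recent.length === 0) {
      requests.delete(key) // no recent activity, free the entry
      removed++
    } else {
      requests.set(key, recent)
    }
  }
  return removed
}

// Run roughly once a minute in development:
// setInterval(() => sweep(60_000), 60_000)
```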

Implementation 2: Upstash Redis (Production)

Upstash provides serverless Redis with a rate limiting SDK. It's the easiest production solution.

pnpm add @upstash/ratelimit @upstash/redis

Set up the client:

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

// Create a new ratelimiter that allows 10 requests per 10 seconds
export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '10 s'),
  analytics: true,
  prefix: '@upstash/ratelimit',
})

Environment variables needed:

UPSTASH_REDIS_REST_URL=https://xxx.upstash.io
UPSTASH_REDIS_REST_TOKEN=xxx

Use in an API route:

// app/api/data/route.ts
import { ratelimit } from '@/lib/rate-limit'
import { headers } from 'next/headers'

export async function GET() {
  const headersList = await headers()
  const forwarded = headersList.get('x-forwarded-for')
  const ip = forwarded?.split(',')[0].trim() || 'anonymous'

  const { success, limit, remaining, reset } = await ratelimit.limit(ip)

  if (!success) {
    return Response.json(
      { error: 'Too many requests. Please try again later.' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': reset.toString(),
        }
      }
    )
  }

  return Response.json(
    { data: 'Hello' },
    {
      headers: {
        'X-RateLimit-Limit': limit.toString(),
        'X-RateLimit-Remaining': remaining.toString(),
        'X-RateLimit-Reset': reset.toString(),
      }
    }
  )
}

Upstash's SDK handles all the complexity - sliding windows, atomic operations, distributed state.

Different Limits for Different Routes

You can create multiple rate limiters:

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = Redis.fromEnv()

// Strict limit for auth endpoints
export const authRatelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, '1 m'),
  prefix: 'ratelimit:auth',
})

// Relaxed limit for general API
export const apiRatelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100, '1 m'),
  prefix: 'ratelimit:api',
})

// Very strict for sensitive operations
export const sensitiveRatelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(3, '1 h'),
  prefix: 'ratelimit:sensitive',
})
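One way to route requests to the right limiter is a prefix lookup. Sketched here against a minimal stand-in interface so the selection logic runs on its own; in real code the values would be the `Ratelimit` instances above, and the paths are illustrative:

```typescript
// Pick a limiter by route prefix. `Limiter` is a minimal stand-in for the
// Ratelimit instances defined above; prefixes and names are illustrative.
interface Limiter {
  name: string
}

// Ordered most-specific first, since the first matching prefix wins
const limiters: Array<[prefix: string, limiter: Limiter]> = [
  ['/api/auth', { name: 'auth' }],                 // 5 req/min
  ['/api/account/delete', { name: 'sensitive' }],  // 3 req/hour
  ['/api', { name: 'api' }],                       // 100 req/min fallback
]

function pickLimiter(pathname: string): Limiter | undefined {
  return limiters.find(([prefix]) => pathname.startsWith(prefix))?.[1]
}
```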

Implementation 3: Middleware (Global Protection)

For app-wide rate limiting, use Next.js middleware:

// middleware.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'
import { NextResponse } from 'next/server'
import type { NextRequest } from 'next/server'

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, '1 m'),
})

export async function middleware(request: NextRequest) {
  // Only rate limit API routes
  if (!request.nextUrl.pathname.startsWith('/api')) {
    return NextResponse.next()
  }

  const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? 'anonymous'

  const { success, limit, remaining, reset } = await ratelimit.limit(ip)

  if (!success) {
    return NextResponse.json(
      { error: 'Too many requests' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': '0',
          'X-RateLimit-Reset': reset.toString(),
        }
      }
    )
  }

  const response = NextResponse.next()
  response.headers.set('X-RateLimit-Limit', limit.toString())
  response.headers.set('X-RateLimit-Remaining', remaining.toString())
  response.headers.set('X-RateLimit-Reset', reset.toString())

  return response
}

export const config = {
  matcher: '/api/:path*',
}

This catches all API requests before they hit your route handlers.

Implementation 4: Unkey (Managed Solution)

Unkey provides API key management with built-in rate limiting. If you already need API keys, this is a two-for-one:

// app/api/data/route.ts
import { verifyKey } from '@unkey/api'
import { headers } from 'next/headers'

export async function GET() {
  const headersList = await headers()
  const apiKey = headersList.get('x-api-key')

  if (!apiKey) {
    return Response.json({ error: 'Missing API key' }, { status: 401 })
  }

  const { result, error } = await verifyKey(apiKey)

  if (error || !result.valid) {
    return Response.json({ error: 'Invalid API key' }, { status: 401 })
  }

  if (result.ratelimit?.remaining === 0) {
    return Response.json(
      { error: 'Rate limit exceeded' },
      { status: 429 }
    )
  }

  return Response.json({ data: 'Hello' })
}

Rate limits are configured in the Unkey dashboard per API key. Different customers can have different limits.

Identifying Users

Rate limiting by IP address is the default, but it has issues:

  • Shared IPs - Users behind corporate NAT or VPNs share IPs
  • Dynamic IPs - Some users change IPs frequently
  • IPv6 - Users might have multiple IPv6 addresses

Better approaches:

For authenticated endpoints:

const userId = session.user.id
await ratelimit.limit(`user:${userId}`)

For public endpoints, combine signals:

const ip = headers.get('x-forwarded-for')
const fingerprint = headers.get('x-fingerprint') // If using client-side fingerprinting
const identifier = fingerprint || ip || 'anonymous'
await ratelimit.limit(identifier)

For API keys:

const apiKey = headers.get('x-api-key')
await ratelimit.limit(`key:${apiKey}`)

The Right Response

When you rate limit someone, be helpful:

return Response.json(
  {
    error: 'Too many requests',
    message: 'Please wait before making another request',
    retryAfter: Math.ceil((reset - Date.now()) / 1000),
  },
  {
    status: 429,
    headers: {
      'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
      'X-RateLimit-Limit': limit.toString(),
      'X-RateLimit-Remaining': '0',
      'X-RateLimit-Reset': reset.toString(),
    }
  }
)

The Retry-After header tells well-behaved clients when to try again. The JSON body helps developers debug.
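A well-behaved client can honor `Retry-After` with a small wrapper. This sketch injects its `fetch` and `sleep` so it runs without a network; the function names are hypothetical:

```typescript
// Client-side sketch: retry once after a 429, waiting the number of
// seconds the server advertised in Retry-After.
type FetchLike = (url: string) => Promise<{
  status: number
  headers: { get(name: string): string | null }
}>

async function fetchWithRetry(
  url: string,
  fetchFn: FetchLike,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms))
) {
  const res = await fetchFn(url)
  if (res.status !== 429) return res

  // Fall back to 1 second if the server sent no Retry-After header
  const retryAfterSeconds = Number(res.headers.get('Retry-After') ?? '1')
  await sleep(retryAfterSeconds * 1000)
  return fetchFn(url)
}
```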

Testing Your Rate Limiter

Before deploying, verify it works:

# Quick test with curl
for i in {1..15}; do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/test
done

You should see 200s turn into 429s once you hit the limit.

For automated tests:

// __tests__/rate-limit.test.ts
import { ratelimit } from '@/lib/rate-limit'

describe('rate limiter', () => {
  it('allows requests under limit', async () => {
    const identifier = `test-${Date.now()}`

    for (let i = 0; i < 10; i++) {
      const { success } = await ratelimit.limit(identifier)
      expect(success).toBe(true)
    }
  })

  it('blocks requests over limit', async () => {
    const identifier = `test-${Date.now()}`

    // Exhaust the limit
    for (let i = 0; i < 10; i++) {
      await ratelimit.limit(identifier)
    }

    // Next request should fail
    const { success } = await ratelimit.limit(identifier)
    expect(success).toBe(false)
  })
})

Common Mistakes

1. Rate limiting after expensive operations

// Bad - database query runs before rate limit check
const data = await db.query.expensive.findMany()
const { success } = await ratelimit.limit(ip)

// Good - check first
const { success } = await ratelimit.limit(ip)
if (!success) return rateLimitResponse()
const data = await db.query.expensive.findMany()

2. Not rate limiting authenticated users

Just because someone's logged in doesn't mean they can't abuse your API. Rate limit everyone.

3. Using IP only for sensitive endpoints

For login, password reset, and other sensitive operations, combine IP with the target identifier:

// Rate limit login attempts per IP per email
await ratelimit.limit(`login:${ip}:${email}`)

4. Forgetting about preview deployments

Your rate limit Redis instance should be different for preview vs production. Otherwise, testing can exhaust production rate limits.
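If you'd rather share one Redis instance, an alternative is to scope the key prefix by environment so preview and production counters never collide. A sketch assuming Vercel's `VERCEL_ENV` variable (one of `production`, `preview`, or `development`):

```typescript
// Build an environment-scoped key prefix so preview deployments never
// touch production counters. VERCEL_ENV is set by Vercel at build time.
function ratelimitPrefix(env: string | undefined = process.env.VERCEL_ENV): string {
  return `ratelimit:${env ?? 'development'}`
}

// Used when constructing the limiter:
// new Ratelimit({
//   redis,
//   limiter: Ratelimit.slidingWindow(100, '1 m'),
//   prefix: ratelimitPrefix(),
// })
```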

What We Do in vibestacks

In vibestacks, we use a layered approach:

  1. Middleware - Global rate limit on all API routes (100 req/min per IP)
  2. Auth routes - Stricter limits (5 req/min per IP per email) using Upstash
  3. Sensitive operations - Very strict (3 per hour) for things like password changes
  4. Cloudflare - Additional protection at the edge for DDoS

The exact limits depend on your app, but having multiple layers means one failure doesn't expose everything.

The Bottom Line

Rate limiting isn't optional. It's like locking your front door - you might never need it, but when you do, you really need it.

Start with Upstash. It's free for small projects, scales with your traffic, and takes about 10 minutes to set up. The alternative is a surprise bill or a compromised system.


Questions about rate limiting or API security? Hit me up at raman@vibestacks.io or join us on Discord.