Quota

Rate limiting for global AI traffic

Control costs and prevent abuse with fine-grained rate limits per user, IP, or project. Flexible quotas that protect your AI budget.

Sliding window limits · User & IP identification · Configurable burst guards · Multiple scopes

Rate limit flow

How we identify, check, and enforce quotas on incoming requests.

01

Request arrives

Includes user ID or IP metadata

02

Identity resolved

Resolves specific rate-limit keys

03

Check sliding window

Validates current usage against the quota

04

Burst guard

Smooths traffic spikes

05

Decision point

Allowed → forward to providers

Over limit → 429 + Retry-After

06

Analytics + request logs

Throttle events recorded
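The six steps above can be sketched as a small decision function. This is a minimal illustration, not ModelRiver's implementation: the `project:`/`user:`/`ip:` key format, the `check` hook, and the `Decision` type are all hypothetical, and the limiter behind `check` could be a sliding window, a burst guard, or both.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical limiter hook: check(key) -> (allowed, seconds until a retry could succeed).
CheckFn = Callable[[str], Tuple[bool, float]]


@dataclass
class Decision:
    allowed: bool
    retry_after: int = 0  # whole seconds, as sent in Retry-After
    throttled_keys: List[str] = field(default_factory=list)  # recorded for analytics (step 06)


def resolve_keys(project_id: str, user_id: Optional[str] = None,
                 ip: Optional[str] = None) -> List[str]:
    """Step 02: turn request metadata into rate-limit keys (key format is hypothetical)."""
    keys = [f"project:{project_id}"]
    if user_id:
        keys.append(f"user:{user_id}")
    if ip:
        keys.append(f"ip:{ip}")
    return keys


def decide(keys: List[str], check: CheckFn) -> Decision:
    """Steps 03-05: check every scope; any exhausted scope blocks the request."""
    waits = {}
    for key in keys:
        allowed, wait = check(key)
        if not allowed:
            waits[key] = wait
    if not waits:
        return Decision(allowed=True)  # forward to providers
    # Tell the client to wait for the slowest scope, rounded up to whole seconds.
    return Decision(False, math.ceil(max(waits.values())), sorted(waits))


# Usage: the user scope is exhausted, the project and IP scopes are not.
keys = resolve_keys("proj_1", user_id="alice", ip="203.0.113.7")
print(decide(keys, lambda k: (False, 12.3) if k == "user:alice" else (True, 0.0)))
# Decision(allowed=False, retry_after=13, throttled_keys=['user:alice'])
```

Checking every scope, rather than stopping at the first failure, lets one Retry-After value cover the slowest scope. A real limiter would also need to avoid charging quota in one scope for a request that another scope rejected.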

Flexible logic
limits:
  user: 120 req/min
  ip: 300 req/min
  project: 1_000 req/min
strategy: "token-bucket"
action: "429 + retry-after"
log_throttles: true
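The config pairs each scope with a `req/min` budget and names a `"token-bucket"` strategy. As a sketch only (the parser, the unit names, and the `burst` parameter are assumptions, not documented ModelRiver options), a limit string can be turned into a token bucket that refills at the sustained rate while capping how many requests may arrive at once:

```python
import re
from typing import Optional, Tuple

# Hypothetical unit names; only "min" appears in the config above.
UNIT_SECONDS = {"sec": 1, "min": 60, "hour": 3600}
LIMIT_RE = re.compile(r"^\s*([\d_]+)\s*req/(sec|min|hour)\s*$")


def parse_limit(text: str) -> Tuple[int, int]:
    """Parse '120 req/min' or '1_000 req/min' into (requests, window_seconds)."""
    match = LIMIT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised limit: {text!r}")
    # int() accepts underscore digit separators such as "1_000".
    return int(match.group(1)), UNIT_SECONDS[match.group(2)]


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full, so an idle client may burst
        self.updated = now

    def allow(self, now: float) -> Tuple[bool, float]:
        """Spend one token if available; otherwise return seconds until one refills."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        return False, (1 - self.tokens) / self.rate


def bucket_from_limit(text: str, burst: Optional[int] = None) -> TokenBucket:
    """Refill at the sustained rate; cap bursts at `burst` (defaults to the full limit)."""
    count, seconds = parse_limit(text)
    return TokenBucket(capacity=burst or count, rate=count / seconds)
```

A `burst` smaller than the per-minute count is what turns this into a burst guard: `bucket_from_limit("120 req/min", burst=10)` lets a client fire ten requests at once and then roughly two per second.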
              
1

Per user or IP

Pass a user identifier to enforce limits for that user across all of their sessions.

2

Sliding windows

Precise limiting that prevents bursts at window boundaries, such as the turn of the minute or hour.

3

Project-wide

Set project-wide caps so that no single project overruns your provider-side billing.
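One common way to get the sliding-window behaviour described above is a sliding window log, which keeps a timestamp for each accepted request. The sketch below assumes that approach; ModelRiver's actual algorithm is not documented here and may be a cheaper approximation such as a sliding window counter. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class SlidingWindowLog:
    """Allow at most `limit` requests per key in any `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = {}

    def check(self, key: str, now: float) -> Tuple[bool, float]:
        hits = self._hits.setdefault(key, deque())
        # Forget requests that have slid out of the window ending at `now`.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0.0
        # Blocked: a slot frees up when the oldest request leaves the window.
        return False, hits[0] + self.window - now


# A fixed per-minute window would accept 120 requests at 0:59 and 120 more at 1:01.
limiter = SlidingWindowLog(limit=120, window=60)
end_of_minute = [limiter.check("user:alice", 59.0)[0] for _ in range(120)]
print(all(end_of_minute), limiter.check("user:alice", 61.0))  # True (False, 58.0)
```

The log gives exact answers at the cost of storing one timestamp per request, which is why high-volume systems often approximate it.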

Scopes

Per user/IP

Limit traffic for individual users or IP addresses across any session.

Algorithm

Sliding

Smooth enforcement that avoids fixed-window edge cases.

Response

Standard 429

Standard headers, including Retry-After, to drive client backoff logic.
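On the client side, backoff logic mostly comes down to reading Retry-After. In HTTP the header carries either a number of seconds or an HTTP-date, so a client should handle both. A minimal standard-library sketch, assuming a one-second fallback when the header is missing or malformed (the helper name and default are hypothetical):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: datetime, default: float = 1.0) -> float:
    """Convert a Retry-After header value into seconds to wait (now must be timezone-aware)."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "60"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparseable: use the fallback
        return default
    if when.tzinfo is None:  # treat naive dates as UTC
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())
```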

Playbook

01 · Define

Set limits per project, user, or IP in the console.

02 · Identify

Pass request metadata to trigger the correct quota bucket.

03 · Enforce

Excess traffic receives 429 responses with clear Retry-After headers.

04 · Protect

Keep runaway scripts and abuse from blowing through your AI budget.

Use cases

  • Public API keys with per-IP protections.
  • Multi-tenant apps needing project-level quotas.
  • Freemium plans with tight burst limits.

What’s unique

  • Enforced before provider calls to save tokens.
  • Analytics + logs show who was throttled and why.
  • Works alongside failover, streaming, and webhooks.

Programmatic access

Rate limits are enforced automatically per project

POST https://api.modelriver.com/v1/ai
Authorization: Bearer mr_live_your_key

{
  "workflow": "user-query",
  "messages": [
    { "role": "user", "content": "..." }
  ]
}

// When rate limited, you receive HTTP 429 with a Retry-After header and this body:
{
  "error": {
    "message": "Rate limit exceeded",
    "retry_after": 60
  }
}

Configure limits per project in the console. Limits are checked before any provider call, so throttled requests never spend provider tokens. Throttle events appear in analytics.
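A client for the endpoint above might wait and retry when it is throttled. This is a sketch using only the Python standard library, not an official SDK: the `Content-Type` header, the retry count, and the helper names are assumptions. It prefers the `Retry-After` header and falls back to `error.retry_after` from the error body shown above; `urlopen` and `sleep` are injectable so the logic can be exercised offline.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Dict, List

API_URL = "https://api.modelriver.com/v1/ai"


def build_request(api_key: str, workflow: str, messages: List[Dict[str, str]]) -> urllib.request.Request:
    body = json.dumps({"workflow": workflow, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; the example body is JSON
        },
    )


def wait_seconds(err: urllib.error.HTTPError, default: float = 1.0) -> float:
    """Prefer the Retry-After header, then the body's error.retry_after, then a default."""
    header = err.headers.get("Retry-After") if err.headers is not None else None
    if header and header.strip().isdigit():
        return float(header)
    try:
        return float(json.loads(err.read())["error"]["retry_after"])
    except (ValueError, KeyError, TypeError):  # missing or malformed body
        return default


def post_with_retry(req: urllib.request.Request, urlopen=urllib.request.urlopen,
                    sleep=time.sleep, max_attempts: int = 3) -> Any:
    for attempt in range(1, max_attempts + 1):
        try:
            with urlopen(req) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as err:
            # Only 429 is retried; any other error, or the last attempt, propagates.
            if err.code != 429 or attempt == max_attempts:
                raise
            sleep(wait_seconds(err))
    raise RuntimeError("max_attempts must be at least 1")
```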

Ship safe by default

Combine limits with failover, structured outputs, and webhooks for resilient, observable traffic.