Rate limiting for global AI traffic
Control costs and prevent abuse with fine-grained rate limits per user, IP, or project. Flexible quotas that protect your AI budget.
Visual
Rate limit flow
How we identify, check, and enforce quotas on incoming requests.
Request arrives
Includes user ID or IP metadata
Identity resolved
Resolves specific rate-limit keys
Check sliding window
Validates usage against quota
Burst guard
Smooths traffic spikes
Decision point
Allowed → forward to providers
Over limit → 429 + Retry-After
Analytics + request logs
Throttle events recorded
limits:
  user: 120 req/min
  ip: 300 req/min
  project: 1_000 req/min
strategy: "token-bucket"
action: "429 + retry-after"
log_throttles: true
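The sample config names a "token-bucket" strategy, which fits the burst guard step in the flow. The sketch below is one plausible way such a guard could work, not ModelRiver's actual implementation. The class name, field names, and the caller-supplied clock are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Illustrative token bucket (hypothetical, not the gateway's real code)."""
    capacity: float                     # largest burst allowed, in requests
    refill_per_sec: float               # steady-state rate, in requests per second
    tokens: float = field(init=False)   # tokens currently available
    last: float = 0.0                   # timestamp of the previous call, in seconds

    def __post_init__(self) -> None:
        # Start full so an initial burst up to `capacity` is allowed.
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A per-user bucket matching the sample limit of 120 req/min.
user_bucket = TokenBucket(capacity=120, refill_per_sec=120 / 60)
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would likely supply `time.monotonic()`.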
Per user or IP
Pass an identifier to enforce limits across sessions for any specific user.
Sliding windows
Precise limiting that prevents double-sized bursts at fixed-window boundaries, such as the turn of the minute or hour.
Project-wide
Set global guards to ensure no project overruns your provider-side billing.
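To make the sliding-window idea concrete, here is a minimal sketch of a sliding-window log limiter. It is an assumption about how such a check might look, not the product's implementation; `SlidingWindowLimiter` and its parameters are hypothetical names.

```python
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window log: allow at most `limit` requests per `window_sec`."""

    def __init__(self, limit: int, window_sec: float) -> None:
        self.limit = limit
        self.window_sec = window_sec
        self.hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window ending at `now`.
        while self.hits and self.hits[0] <= now - self.window_sec:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False
```

With a limit of 2 per 60 seconds, requests at t=59 and t=59.5 are accepted and a request at t=60.5 is rejected. A fixed window that resets at t=60 would have let that third request through.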
Scopes
Per user/IP
Limit traffic for individual users across any session.
Algorithm
Sliding
Smooth enforcement that avoids fixed-window edge cases.
Response
429 Standard
Clean headers with Retry-After for client backoff logic.
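A client has to turn the Retry-After header into a wait time. The HTTP specification allows the header to carry either a number of seconds or an HTTP date; this page does not say which form the gateway sends, so the sketch below handles both. The function name `retry_delay_seconds` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(header_value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    value = header_value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "60"
        return float(value)
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    now = now or datetime.now(timezone.utc)
    target = parsedate_to_datetime(value)
    if target.tzinfo is None:
        # Treat a date without zone information as UTC.
        target = target.replace(tzinfo=timezone.utc)
    return max(0.0, (target - now).total_seconds())
```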
01 · Define
Set limits per project, user, or IP in the console.
02 · Identify
Pass request metadata to trigger the correct quota bucket.
03 · Enforce
Excess traffic gets 429 responses with clear Retry-After headers.
04 · Protect
Keep runaway scripts and abuse from exhausting your AI budget.
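The Identify step can be pictured as mapping request metadata to quota bucket keys. The sketch below is illustrative only: the key format, the `resolve_keys` helper, and the choice to count a request against every identifier it carries are assumptions, and the limits are taken from the sample config on this page.

```python
from typing import Dict, Optional

# Requests per minute, taken from the sample config on this page.
LIMITS_PER_MIN = {"user": 120, "ip": 300, "project": 1_000}


def resolve_keys(
    project_id: str,
    user_id: Optional[str] = None,
    ip: Optional[str] = None,
) -> Dict[str, int]:
    """Map request metadata to the quota buckets it counts against (hypothetical key format)."""
    # Every request counts against its project-wide bucket.
    keys = {f"project:{project_id}": LIMITS_PER_MIN["project"]}
    if user_id:
        keys[f"user:{project_id}:{user_id}"] = LIMITS_PER_MIN["user"]
    if ip:
        keys[f"ip:{project_id}:{ip}"] = LIMITS_PER_MIN["ip"]
    return keys
```

A request would then be allowed only if every returned bucket still has room.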
Use cases
- Public API keys with per-IP protections.
- Multi-tenant apps needing project-level quotas.
- Freemium plans with tight burst limits.
What’s unique
- Enforced before provider calls to save tokens.
- Analytics + logs show who was throttled and why.
- Works alongside failover, streaming, and webhooks.
Programmatic access
Rate limits are enforced automatically per project.
POST https://api.modelriver.com/v1/ai
Authorization: Bearer mr_live_your_key

{
  "workflow": "user-query",
  "messages": [
    { "role": "user", "content": "..." }
  ]
}

// When rate limited, you receive:
{
  "error": {
    "message": "Rate limit exceeded",
    "retry_after": 60
  }
}
Configure limits per project in the console. Limits are enforced before provider calls, so throttled requests consume no provider tokens. Throttle events appear in analytics.
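On the client side, a throttled call can be retried after the advertised delay. The sketch below assumes the error body shape shown above (`error.retry_after`, taken to be in seconds, which the page does not state) and takes the HTTP call as an injected function, so no particular HTTP library is implied. `call_with_backoff` and the `Response` alias are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]  # (HTTP status, parsed JSON body)


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting out 429 responses, for at most `max_attempts` tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Assumed to be seconds; fall back to 1 second if the field is absent.
        delay = float(body.get("error", {}).get("retry_after", 1))
        sleep(delay)
    raise RuntimeError("unreachable")
```

In practice `send` would wrap the POST request to the endpoint above and return the status code with the decoded JSON body.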
Ship safe by default
Combine limits with failover, structured outputs, and webhooks for resilient, observable traffic.