Rate limits that guard every request
Set limits per user, IP, or project. Stop runaway costs and abuse before they hit your providers.
Visual
How limits shape traffic
Requests enter buckets, bursts are smoothed, and healthy traffic passes through.
Source
Incoming request
user · ip · project
Identify scopes
user + ip + project
Sliding window
60s · 120 req
Burst guard
smooths spikes
Decision
Allowed → providers
Over limit → 429 + Retry-After
Logged
Analytics + request logs
limits:
user: 120 req/min
ip: 300 req/min
project: 1_000 req/min
strategy: "token-bucket"
action: "429 + retry-after"
log_throttles: true
Enforced before provider calls
Over-limit traffic is stopped up front to avoid wasted tokens and provider-side 429s.
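A minimal sketch of that enforcement gate, assuming the token-bucket strategy and the 120 req/min user ceiling from the config above (the `TokenBucket` class and its numbers are illustrative, not the product's API):

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/sec up to `capacity`; each request costs one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> tuple[bool, float]:
        # Top up based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Seconds until one token is available -> the Retry-After hint.
        return False, (1 - self.tokens) / self.rate

# user: 120 req/min, as in the config above.
bucket = TokenBucket(rate=120 / 60, capacity=120)
allowed, retry_after = bucket.allow()
```

Because the check runs before any provider call, a denied request never consumes tokens upstream; the caller gets a 429 with the computed `retry_after` instead.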
Granular scopes
Combine user, IP, and project ceilings to smooth both burst and sustained traffic.
Audit every throttle
Throttled events hit analytics and request logs with timing and identifiers.
Multi-scope
User · IP · Project
Stack limits for layered protection.
Throttle feedback
429 + Retry-After
Clients get clear backoff guidance.
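On the client side, that guidance is easy to honor: prefer the `Retry-After` header when it is present and fall back to exponential backoff otherwise. A hedged sketch (`backoff_delay` and `call_with_backoff` are illustrative helper names; `send` stands in for whatever HTTP call the client makes):

```python
import time

def backoff_delay(headers: dict, attempt: int) -> float:
    """Prefer the server's Retry-After hint; otherwise back off exponentially."""
    value = headers.get("Retry-After")
    return float(value) if value is not None else float(2 ** attempt)

def call_with_backoff(send, max_retries: int = 3, sleep=time.sleep):
    """`send()` returns (status, headers, body); only 429 responses are retried."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, body
        sleep(backoff_delay(headers, attempt))
```

Injecting `sleep` keeps the loop testable and lets callers plug in async-friendly waiting.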
Visibility
Tracked
Analytics show who hit limits and when.
01 · Detect
Identify caller by user, IP, and project before provider calls.
02 · Enforce
Apply burst + sustained buckets; respond with 429 and Retry-After.
03 · Observe
Log throttles to analytics with identifiers and timing.
04 · Tune
Adjust ceilings per plan, project, or user cohort.
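The detect → enforce → observe steps above can be wired together in a few lines. This is a sketch under assumptions: `ThrottleEvent` is an illustrative record schema, `handle` a hypothetical entry point, and `check`/`forward` stand in for the limit checker and the downstream provider call:

```python
import time
from dataclasses import dataclass, field

@dataclass
class ThrottleEvent:
    """What analytics records for each throttled request (illustrative schema)."""
    user_id: str
    ip: str
    project_id: str
    scope: str           # which ceiling tripped: user | ip | project
    retry_after_s: int
    at: float = field(default_factory=time.time)

throttle_log: list[ThrottleEvent] = []

def handle(user_id, ip, project_id, check, forward):
    """01 detect scopes -> 02 enforce -> 03 observe; tuning happens offline on the log."""
    allowed, scope, retry_after = check(user_id, ip, project_id)
    if not allowed:
        throttle_log.append(ThrottleEvent(user_id, ip, project_id, scope, retry_after))
        return 429, {"Retry-After": str(retry_after)}
    return forward()  # the provider call happens only after the limit check passes
```

Every throttle lands in the log with identifiers and timing, which is exactly the data step 04 needs to tune ceilings per plan, project, or cohort.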
Use cases
- Public API keys with per-IP protections.
- Multi-tenant apps needing project-level quotas.
- Freemium plans with tight burst limits.
What’s unique
- Enforced before provider calls to save tokens.
- Analytics + logs show who was throttled and why.
- Works alongside failover, streaming, and webhooks.
Ship safe by default
Combine limits with failover, structured outputs, and webhooks for resilient, observable traffic.