Gateway input guardrails for every AI request
Scan user prompts before they reach providers. Enforce or monitor content policies across API, streaming, async, and playground traffic — with always-on minors protection.
Guardrail decision flow
How we extract, classify, and act on user-supplied input at the gateway.
Request arrives
API, streaming, async, or playground
Extract user text
Scans messages, prompt, and input fields
Local classifier
Fast regex patterns + always-on minors check
Remote moderation
OpenAI omni-moderation for ambiguous cases
Decision point
Allow → forward to providers
Block → 403 with generic message
Audit log
Categories recorded — request body never stored
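The decision flow above can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not ModelRiver's implementation. `Verdict`, `extract_user_text`, and `decide` are hypothetical names. The local and remote classifiers are passed in as plain callables, and the sketch assumes that only `user`-role messages are scanned. Mode handling (enforce vs. monitor) is left out here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical classifier result: flagged categories plus an 'unsure' flag."""
    categories: set = field(default_factory=set)
    ambiguous: bool = False


def extract_user_text(body: dict) -> str:
    """Join text found in the `messages`, `prompt`, and `input` fields."""
    parts = []
    for msg in body.get("messages", []):
        if msg.get("role") != "user":
            continue  # assumption: only user-role messages are scanned
        content = msg.get("content", "")
        if isinstance(content, str):
            parts.append(content)
        elif isinstance(content, list):
            # Assumed OpenAI-style content parts: [{"type": "text", "text": ...}]
            parts.extend(p.get("text", "") for p in content
                         if isinstance(p, dict) and p.get("type") == "text")
    for key in ("prompt", "input"):
        value = body.get(key)
        if isinstance(value, str):
            parts.append(value)
        elif isinstance(value, list):
            parts.extend(v for v in value if isinstance(v, str))
    return "\n".join(parts)


def decide(body: dict,
           local: Callable[[str], Verdict],
           remote: Callable[[str], Verdict]) -> tuple:
    """Return ("allow", set()) or ("block", categories)."""
    text = extract_user_text(body)
    verdict = local(text)          # fast local tier runs first
    if verdict.ambiguous:
        verdict = remote(text)     # escalate only unclear cases
    if verdict.categories:
        return "block", verdict.categories
    return "allow", set()
```

Injecting the classifiers keeps the remote moderation call, and its latency, off the path for prompts that the local tier can settle on its own.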
guardrail_mode: "enforce"
guardrail_categories:
  - sexual
  - self-harm
  - hate
  - violence
# minors/CSAM: always enforced
# modes: enforce | monitor | disabled
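Read as data, the policy above might be validated like this. The sketch assumes that the config has already been parsed into a dict. `load_policy` and the `"minors"` label are hypothetical. It enforces the one hard rule from the snippet: the minors/CSAM category is always added, whatever the config says.

```python
MODES = {"enforce", "monitor", "disabled"}
CONFIGURABLE = {"sexual", "self-harm", "hate", "violence"}
ALWAYS_ON = "minors"  # hypothetical label for the always-enforced minors/CSAM check


def load_policy(config: dict) -> tuple:
    """Validate a guardrail config and return (mode, frozenset of active categories)."""
    mode = config.get("guardrail_mode")
    if mode not in MODES:
        raise ValueError(f"guardrail_mode must be one of {sorted(MODES)}, got {mode!r}")
    requested = set(config.get("guardrail_categories") or [])
    unknown = requested - CONFIGURABLE
    if unknown:
        raise ValueError(f"unknown guardrail categories: {sorted(unknown)}")
    # Minors protection is appended unconditionally, so no config can remove it.
    return mode, frozenset(requested | {ALWAYS_ON})
```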
Enforce or monitor
Block violating requests in enforce mode, or log violations and allow traffic in monitor mode for gradual rollout.
Category control
Toggle sexual, self-harm, hate, and violence categories per project. Minors protection cannot be disabled.
Privacy-first blocking
Blocked prompts are never stored in logs or returned in error responses. No provider tokens are consumed.
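Enforce and monitor modes combine with per-project categories roughly as follows. `guardrail_action` and its return labels are hypothetical. The sketch assumes that the classifier reports every category it detects and that the project's toggles are applied afterwards, with a minors match always blocking.

```python
def guardrail_action(mode: str, enabled: set, flagged: set) -> str:
    """Map a classifier result to an action (labels are illustrative)."""
    if "minors" in flagged:
        return "block"                  # always enforced, in every mode
    relevant = flagged & enabled        # ignore categories the project turned off
    if not relevant or mode == "disabled":
        return "allow"
    if mode == "enforce":
        return "block"                  # gateway answers 403; no provider call
    return "log_and_allow"              # monitor: record categories, forward request
```

Monitor mode is useful for a gradual rollout because it runs the same classification and logging path as enforce mode and only the final action differs.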
Local check
<5ms
Regex classifier runs before any provider call.
Categories
4 + minors
Configurable policy categories plus always-on CSAM protection.
Coverage
All entrypoints
API, streaming, async, playground, and OpenAI-compatible routes.
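The local check is described as a regex classifier that escalates ambiguous cases. A minimal sketch of that idea follows. The `RULES` table, its patterns, and the strong/weak split are assumptions for illustration, and the always-on minors check is deliberately omitted.

```python
import re

# Illustrative patterns only; a production rule set would be far larger.
# "strong" matches flag a category outright; "weak" matches are escalated.
RULES = {
    "violence": {
        "strong": [re.compile(r"\bkill (him|her|them)\b", re.IGNORECASE)],
        "weak": [re.compile(r"\b(kill|shoot|attack)\b", re.IGNORECASE)],
    },
}


def local_check(text: str) -> tuple:
    """Return (flagged_categories, ambiguous) using regex rules only."""
    flagged, ambiguous = set(), False
    for category, tiers in RULES.items():
        if any(p.search(text) for p in tiers["strong"]):
            flagged.add(category)
        elif any(p.search(text) for p in tiers["weak"]):
            ambiguous = True  # e.g. "kill the process" needs a second opinion
    return flagged, ambiguous
```

Splitting patterns into strong and weak tiers is one way to define "ambiguous". A bare word such as "kill" is common in benign technical prompts, so it is sent to remote moderation rather than being blocked outright.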
01 · Configure
Set enforce, monitor, or disabled mode and pick categories in project settings.
02 · Scan
Local classifier checks every request; ambiguous cases escalate to remote moderation.
03 · Decide
Violations are blocked or logged based on your mode. Decisions are cached for repeat prompts.
04 · Audit
Request logs capture categories and latency — never the blocked prompt text.
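Caching decisions for repeat prompts does not require keeping the prompt itself. This sketch makes assumed design choices: it keys an LRU cache on a SHA-256 digest of a policy identifier plus the text, so a policy change naturally misses the cache. `DecisionCache` and `policy_id` are hypothetical, and a real cache would likely also expire entries after a time limit.

```python
import hashlib
from collections import OrderedDict


class DecisionCache:
    """Hypothetical LRU cache of guardrail decisions keyed by a prompt digest."""

    def __init__(self, max_entries: int = 10_000):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def key(policy_id: str, text: str) -> str:
        # Only a SHA-256 digest is kept, never the prompt text itself.
        return hashlib.sha256(f"{policy_id}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, policy_id: str, text: str):
        k = self.key(policy_id, text)
        if k in self._entries:
            self._entries.move_to_end(k)      # mark as recently used
            return self._entries[k]
        return None

    def put(self, policy_id: str, text: str, decision) -> None:
        k = self.key(policy_id, text)
        self._entries[k] = decision
        self._entries.move_to_end(k)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```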
Abuse protection after repeated denials
When enforce mode blocks the same actor repeatedly, ModelRiver applies a cooldown throttle, similar to smart rate limiting, that returns HTTP 429 with a Retry-After header. This stops bad actors from hammering the gateway, and the throttled requests consume no provider tokens.
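A cooldown throttle of this kind can be sketched as a sliding window of recent denials per actor. The thresholds, the `DenialThrottle` class, and `throttle_response` are hypothetical, and how an "actor" is identified (API key, IP, end-user ID) is left open. The clock is injectable so that the behavior can be tested without sleeping.

```python
import math
import time


class DenialThrottle:
    """Hypothetical per-actor cooldown after repeated guardrail blocks."""

    def __init__(self, max_denials=5, window_s=60.0, cooldown_s=300.0,
                 clock=time.monotonic):
        self.max_denials = max_denials  # blocks allowed inside the window
        self.window_s = window_s        # sliding window length, seconds
        self.cooldown_s = cooldown_s    # throttle duration once tripped
        self.clock = clock
        self._denials = {}              # actor -> timestamps of recent blocks
        self._blocked_until = {}        # actor -> time the cooldown ends

    def record_denial(self, actor):
        """Call whenever enforce mode blocks a request from `actor`."""
        now = self.clock()
        recent = [t for t in self._denials.get(actor, []) if now - t < self.window_s]
        recent.append(now)
        if len(recent) >= self.max_denials:
            self._blocked_until[actor] = now + self.cooldown_s
            recent = []                 # start counting afresh after tripping
        self._denials[actor] = recent

    def retry_after(self, actor):
        """Whole seconds until `actor` may retry, or None if not throttled."""
        remaining = self._blocked_until.get(actor, 0.0) - self.clock()
        return math.ceil(remaining) if remaining > 0 else None


def throttle_response(throttle, actor):
    """Return (429, headers) for a throttled actor, else None to continue."""
    wait = throttle.retry_after(actor)
    if wait is None:
        return None
    return 429, {"Retry-After": str(wait)}
```

Retry-After is sent as a whole number of seconds, rounded up so that a well-behaved client never retries early.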
Use cases
- Public-facing chatbots that need content policy enforcement.
- Enterprise apps rolling out safety policies in monitor mode first.
- Multi-tenant platforms requiring per-project policy controls.
What’s unique
- Runs before provider calls, so blocked requests are never billed.
- Two-tier local + remote classification with decision caching.
- Works across sync, async, streaming, and OpenAI-compatible APIs.
Programmatic access
Guardrails run automatically on every request; configure them per project in the console.
POST https://api.modelriver.com/v1/ai
Authorization: Bearer mr_live_your_key

{
  "model": "chat-assistant",
  "messages": [
    { "role": "user", "content": "..." }
  ]
}

// When blocked by content policy:
{
  "error": "content_policy_violation",
  "message": "Request blocked by content policy.",
  "categories": ["violence"]
}
Configure guardrail mode and categories in project settings. Only organization owners and admins can disable or weaken policies. Blocked requests are never billed.
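On the client side, a content-policy block (403) and a cooldown throttle (429) can be told apart by status code. The helper below is a hypothetical sketch that takes an already-received response (status, headers, raw body), so it works with any HTTP library. It assumes that the 403 body has the `content_policy_violation` shape from the example request and that `Retry-After` carries seconds rather than an HTTP date. The body of a 429 is not inspected.

```python
import json


def _header(headers, name):
    """Case-insensitive header lookup (HTTP header names are case-insensitive)."""
    return next((v for k, v in headers.items() if k.lower() == name.lower()), None)


def interpret_response(status, headers, body):
    """Classify a gateway response; `body` is the raw response bytes."""
    if status == 403:
        try:
            payload = json.loads(body)
        except ValueError:              # json.JSONDecodeError subclasses ValueError
            payload = None
        if isinstance(payload, dict) and payload.get("error") == "content_policy_violation":
            # Only the matched categories come back, never the prompt text.
            return {"kind": "blocked", "categories": payload.get("categories", [])}
    if status == 429:
        retry = _header(headers, "Retry-After")
        # Assumes delay-seconds; an HTTP-date value would yield None here.
        seconds = int(retry) if retry is not None and retry.isdigit() else None
        return {"kind": "throttled", "retry_after_s": seconds}
    if 200 <= status < 300:
        return {"kind": "ok", "data": json.loads(body)}
    return {"kind": "error", "status": status}
```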
Ship with safety built in
Combine guardrails with rate limiting, failover, and analytics for resilient, policy-compliant AI traffic.