Content Safety

Gateway input guardrails for every AI request

Scan user prompts before they reach providers. Enforce or monitor content policies across API, streaming, async, and playground traffic — with always-on minors protection.

Enforce & monitor modes · Four policy categories · Always-on minors protection · Local + remote moderation

Visual

Guardrail decision flow

How we extract, classify, and act on user-supplied input at the gateway.

Request arrives

API, streaming, async, or playground

Extract user text

Scans messages, prompt, and input fields

Local classifier

Fast regex patterns + always-on minors check

Remote moderation

OpenAI omni-moderation for ambiguous cases

Decision point

Allow → forward to providers

Block → 403 with generic message

Audit log

Categories recorded — request body never stored
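
The steps above can be sketched in a few lines of Python. This is an illustration only, not ModelRiver's implementation: the function names (`extract_user_text`, `local_classify`, `classify`, `decide`), the two regex patterns, and the `remote` callable standing in for the moderation call are all assumptions. The sketch also treats "no local match" as the ambiguous case that goes to the remote tier, which the page does not specify.

```python
import re
from typing import Callable

# Hypothetical patterns; a real classifier would use a curated, much larger set.
LOCAL_PATTERNS = {
    "violence": re.compile(r"\b(kill|stab)\b", re.IGNORECASE),
    "self-harm": re.compile(r"\bhurt myself\b", re.IGNORECASE),
}


def extract_user_text(body: dict) -> str:
    """Collect user-supplied text from the messages, prompt, and input fields."""
    parts: list[str] = []
    for message in body.get("messages", []):
        if message.get("role") == "user" and isinstance(message.get("content"), str):
            parts.append(message["content"])
    for field in ("prompt", "input"):
        value = body.get(field)
        if isinstance(value, str):
            parts.append(value)
    return "\n".join(parts)


def local_classify(text: str) -> set[str]:
    """Tier 1: fast regex check returning the categories that matched."""
    return {name for name, pattern in LOCAL_PATTERNS.items() if pattern.search(text)}


def classify(text: str, remote: Callable[[str], set[str]]) -> set[str]:
    """Tier 2 (remote) is consulted only when the local tier finds nothing (assumed)."""
    flagged = local_classify(text)
    return flagged if flagged else remote(text)


def decide(body: dict, remote: Callable[[str], set[str]]) -> tuple[int, dict]:
    """Return (status, audit_record). The audit record holds categories, never the body."""
    flagged = classify(extract_user_text(body), remote)
    status = 403 if flagged else 200
    audit = {"decision": "block" if flagged else "allow", "categories": sorted(flagged)}
    return status, audit
```

Passing the remote check in as a callable keeps the sketch testable offline; in practice it would wrap a call to the moderation provider.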

Per-project policy

guardrail_mode: "enforce"
guardrail_categories:
  - sexual
  - self-harm
  - hate
  - violence
# minors/CSAM: always enforced, regardless of the categories listed above
# guardrail_mode accepts: enforce | monitor | disabled

Enforce or monitor

Block violating requests in enforce mode, or log violations and allow traffic in monitor mode for gradual rollout.

Category control

Toggle sexual, self-harm, hate, and violence categories per project. Minors protection cannot be disabled.
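
One way to picture how mode and category toggles combine is the small sketch below. It is a guess at the logic rather than the gateway's actual code: `ProjectPolicy`, `evaluate`, and the `"minors"` category label are hypothetical names, and the sketch assumes minors protection still blocks when the mode is `monitor` or `disabled`.

```python
from dataclasses import dataclass

ALWAYS_ENFORCED = frozenset({"minors"})  # assumed label for the always-on check
TOGGLEABLE = frozenset({"sexual", "self-harm", "hate", "violence"})


@dataclass(frozen=True)
class ProjectPolicy:
    guardrail_mode: str = "enforce"  # enforce | monitor | disabled
    guardrail_categories: frozenset = TOGGLEABLE


def evaluate(policy: ProjectPolicy, flagged: set) -> str:
    """Return "block", "log", or "allow" for a set of flagged categories."""
    # Minors protection ignores both the mode and the category toggles.
    if flagged & ALWAYS_ENFORCED:
        return "block"
    if policy.guardrail_mode == "disabled":
        return "allow"
    # Only categories enabled for this project count as violations.
    active = flagged & policy.guardrail_categories
    if not active:
        return "allow"
    return "block" if policy.guardrail_mode == "enforce" else "log"
```

For example, `evaluate(ProjectPolicy("monitor"), {"violence"})` returns `"log"`: the violation is recorded and the request is still forwarded.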

Privacy-first blocking

Blocked prompts are never stored in logs or returned in error responses. No provider tokens are consumed.

Local check

<5ms

Regex classifier runs before any provider call.

Abuse protection after repeated denials

When enforce mode blocks the same actor repeatedly, ModelRiver applies a cooldown throttle — similar to smart rate limiting — returning HTTP 429 with Retry-After headers. This stops bad actors from hammering the gateway without consuming provider tokens.
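
A cooldown of this kind could look roughly like the sketch below. The class name `DenialThrottle` and every threshold (denial count, window, cooldown length) are assumptions chosen for illustration; the page does not state the real values or how an actor is identified.

```python
import math
import time
from collections import defaultdict, deque


class DenialThrottle:
    """Put an actor into cooldown after too many blocked requests in a window."""

    def __init__(self, max_denials=5, window_s=60.0, cooldown_s=300.0, clock=time.monotonic):
        self.max_denials = max_denials
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._denials = defaultdict(deque)  # actor -> timestamps of recent blocks
        self._cooldown_until = {}           # actor -> time the cooldown ends

    def record_denial(self, actor: str) -> None:
        """Call this each time enforce mode blocks a request from the actor."""
        now = self.clock()
        recent = self._denials[actor]
        recent.append(now)
        # Drop denials that fell out of the sliding window.
        while recent and now - recent[0] > self.window_s:
            recent.popleft()
        if len(recent) >= self.max_denials:
            self._cooldown_until[actor] = now + self.cooldown_s
            recent.clear()

    def check(self, actor: str):
        """Return None if the actor may proceed, else (429, headers)."""
        remaining = self._cooldown_until.get(actor, 0.0) - self.clock()
        if remaining <= 0:
            self._cooldown_until.pop(actor, None)
            return None
        # Retry-After is expressed here as whole seconds, rounded up.
        return 429, {"Retry-After": str(math.ceil(remaining))}
```

Running `check` before classification means a throttled actor is rejected without any scanning or provider work.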

Use cases

● Public-facing chatbots that need content policy enforcement.
● Enterprise apps rolling out safety policies in monitor mode first.
● Multi-tenant platforms requiring per-project policy controls.

What’s unique

● Runs before provider calls — blocked requests never bill.
● Two-tier local + remote classification with decision caching.
● Works across sync, async, streaming, and OpenAI-compatible APIs.
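
Decision caching, mentioned in the list above, might work along these lines. This is a sketch under assumptions: `DecisionCache`, the five-minute TTL, and the choice to key on a SHA-256 hash of normalized text (so the prompt itself is not held) are illustrative, not documented behavior.

```python
import hashlib
import time


class DecisionCache:
    """TTL cache of moderation decisions keyed by a hash of the text."""

    def __init__(self, ttl_s=300.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries = {}  # sha256 hex digest -> (expires_at, frozenset of categories)

    @staticmethod
    def _key(text: str) -> str:
        # Hash the normalized text so the prompt itself is not kept in the cache.
        return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

    def get_or_classify(self, text, classify):
        """Return cached categories, or run classify(text) and cache the result."""
        key = self._key(text)
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]
        categories = frozenset(classify(text))
        self._entries[key] = (self.clock() + self.ttl_s, categories)
        return categories
```

A cache like this would mainly save repeat calls to the slower remote tier when the same text is submitted again.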

Programmatic access

Guardrails run automatically on every request — configure per project in the console

POST https://api.modelriver.com/v1/ai
Authorization: Bearer mr_live_your_key

{
  "model": "chat-assistant",
  "messages": [
    { "role": "user", "content": "..." }
  ]
}

// When blocked by content policy (HTTP 403):
{
  "error": "content_policy_violation",
  "message": "Request blocked by content policy.",
  "categories": ["violence"]
}

Configure guardrail mode and categories in project settings. Only organization owners and admins can disable or weaken policies. Blocked requests are never billed.
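
On the client side, a caller may want to tell a policy block apart from a cooldown. The sketch below uses only the Python standard library and the endpoint and error shape shown above. The exception classes, the helper names, the `Content-Type` header, and the assumption that `Retry-After` carries whole seconds are illustrative choices, not part of a documented SDK.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.modelriver.com/v1/ai"


class ContentPolicyError(Exception):
    def __init__(self, categories):
        super().__init__("Request blocked by content policy.")
        self.categories = categories


class CooldownError(Exception):
    def __init__(self, retry_after_s):
        super().__init__(f"Throttled; retry after {retry_after_s}s")
        self.retry_after_s = retry_after_s


def raise_for_guardrail(status, headers, body) -> None:
    """Translate guardrail responses into exceptions; do nothing otherwise."""
    if status == 403 and body.get("error") == "content_policy_violation":
        raise ContentPolicyError(body.get("categories", []))
    if status == 429:
        # Assumes Retry-After is given in seconds, not as an HTTP date.
        raise CooldownError(int(headers.get("Retry-After", "1")))


def send(api_key: str, payload: dict) -> dict:
    """POST a request and surface guardrail outcomes as typed exceptions."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        try:
            body = json.loads(err.read() or b"{}")
        except ValueError:
            body = {}
        raise_for_guardrail(err.code, err.headers, body)
        raise  # not a guardrail response; let the original error propagate
```

A caller would typically catch `ContentPolicyError` to show a generic refusal to the end user, and catch `CooldownError` to wait `retry_after_s` seconds before retrying.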

Ship with safety built in

Combine guardrails with rate limiting, failover, and analytics for resilient, policy-compliant AI traffic.
