Content Safety

Gateway input guardrails for every AI request

Scan user prompts before they reach providers. Enforce or monitor content policies across API, streaming, async, and playground traffic — with always-on minors protection.

  • Enforce & monitor modes
  • Four policy categories
  • Always-on minors protection
  • Local + remote moderation

Guardrail decision flow

How we extract, classify, and act on user-supplied input at the gateway.

01 · Request arrives: API, streaming, async, or playground traffic.

02 · Extract user text: scan the messages, prompt, and input fields.

03 · Local classifier: fast regex patterns plus an always-on minors check.

04 · Remote moderation: OpenAI omni-moderation for ambiguous cases.

05 · Decision point: allow → forward to providers; block → HTTP 403 with a generic message.

06 · Audit log: categories are recorded; the request body is never stored.
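Steps 02 to 04 can be sketched in Python as follows. This is a minimal illustration, not ModelRiver's implementation: the rule list, the `confident` flag, and the function names `extract_user_text` and `classify` are hypothetical, and the remote tier is passed in as a plain callable (in a real deployment it might wrap OpenAI's moderation endpoint with the `omni-moderation-latest` model). Scanning only user-role messages plus `prompt` and `input` strings is an assumption about which fields count as user-supplied.

```python
import re
from typing import Callable

# Hypothetical local rules: (category, pattern, confident).
# A confident hit is decided locally; a non-confident hit marks the input
# as ambiguous and escalates it to remote moderation.
LOCAL_RULES = [
    ("violence", re.compile(r"\bbuild a bomb\b", re.IGNORECASE), True),
    ("violence", re.compile(r"\b(kill|attack)\b", re.IGNORECASE), False),
]


def extract_user_text(body: dict) -> str:
    """Collect user-supplied text from the messages, prompt, and input fields."""
    parts: list[str] = []
    for message in body.get("messages", []):
        content = message.get("content")
        if message.get("role") == "user" and isinstance(content, str):
            parts.append(content)
    for field_name in ("prompt", "input"):
        value = body.get(field_name)
        if isinstance(value, str):
            parts.append(value)
        elif isinstance(value, list):
            parts.extend(item for item in value if isinstance(item, str))
    return "\n".join(parts)


def classify(text: str, remote_moderation: Callable[[str], list[str]]) -> list[str]:
    """Two-tier check: local regex first, remote moderation only when ambiguous."""
    flagged: set[str] = set()
    ambiguous = False
    for category, pattern, confident in LOCAL_RULES:
        if pattern.search(text):
            if confident:
                flagged.add(category)
            else:
                ambiguous = True
    if ambiguous and not flagged:
        # Only ambiguous inputs pay the latency of a remote moderation call.
        flagged.update(remote_moderation(text))
    return sorted(flagged)
```

The word `kill` shows why the second tier exists: it matches in "kill the process" as well as in a genuine threat, so the local pass defers to remote moderation instead of guessing.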

Per-project policy
guardrail_mode: "enforce"   # enforce | monitor | disabled
guardrail_categories:
  - sexual
  - self-harm
  - hate
  - violence
# minors/CSAM protection is always enforced and is not configurable here

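The policy above, together with the enforce, monitor, and disabled modes, can be modelled as a small decision function. This sketch assumes the always-on check emits a `minors` label and reads "always enforced" as meaning minors hits are blocked even in monitor or disabled mode; `Policy` and `decide` are hypothetical names, not a ModelRiver SDK.

```python
from dataclasses import dataclass, field
from typing import Literal

Mode = Literal["enforce", "monitor", "disabled"]
CONFIGURABLE = {"sexual", "self-harm", "hate", "violence"}
MINORS = "minors"  # hypothetical label for the always-on minors/CSAM check


@dataclass
class Policy:
    """Per-project settings mirroring guardrail_mode / guardrail_categories."""
    guardrail_mode: Mode = "enforce"
    guardrail_categories: list[str] = field(default_factory=lambda: sorted(CONFIGURABLE))

    def __post_init__(self) -> None:
        if self.guardrail_mode not in ("enforce", "monitor", "disabled"):
            raise ValueError(f"unknown mode: {self.guardrail_mode}")
        unknown = set(self.guardrail_categories) - CONFIGURABLE
        if unknown:
            # Minors protection is not a toggle, so it is rejected here too.
            raise ValueError(f"unknown categories: {sorted(unknown)}")


def decide(policy: Policy, detected: list[str]) -> tuple[str, list[str]]:
    """Return ("allow" | "log" | "block", categories that triggered it)."""
    if MINORS in detected:
        return "block", [MINORS]  # always enforced, whatever the mode
    if policy.guardrail_mode == "disabled":
        return "allow", []
    hits = sorted(set(detected) & set(policy.guardrail_categories))
    if not hits:
        return "allow", []
    if policy.guardrail_mode == "monitor":
        return "log", hits  # record the violation, still forward the request
    return "block", hits
```

Monitor mode returning `"log"` rather than `"allow"` keeps the violation visible in audit data during a gradual rollout while traffic still flows.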
1 · Enforce or monitor

Block violating requests in enforce mode, or log violations and allow traffic in monitor mode for gradual rollout.

2 · Category control

Toggle sexual, self-harm, hate, and violence categories per project. Minors protection cannot be disabled.

3 · Privacy-first blocking

Blocked prompts are never stored in logs or returned in error responses. No provider tokens are consumed.

Local check: <5 ms

The regex classifier runs before any provider call.

Categories: 4 + minors

Configurable policy categories plus always-on CSAM protection.

Coverage: all entrypoints

API, streaming, async, playground, and OpenAI-compatible routes.

01 · Configure

Set enforce, monitor, or disabled mode and pick categories in project settings.

02 · Scan

Local classifier checks every request; ambiguous cases escalate to remote moderation.

03 · Decide

Violations are blocked or logged based on your mode. Decisions are cached for repeat prompts.

04 · Audit

Request logs capture categories and latency — never the blocked prompt text.
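Steps 03 and 04 (caching and audit) might look like this. The sketch assumes the cache stores classifier output keyed by a SHA-256 digest of the prompt, so neither the cache nor the audit record ever holds prompt text; caching categories rather than final decisions lets one entry serve projects with different policies. `DecisionCache`, `AuditRecord`, `scan_and_audit`, and the 300-second TTL are all hypothetical.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Optional


class DecisionCache:
    """Caches classifier results under a SHA-256 digest of the prompt,
    so repeat prompts skip classification and no prompt text is kept."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, list[str]]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> Optional[list[str]]:
        key = self._key(text)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, categories = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # expired: force a fresh classification
            return None
        return list(categories)

    def put(self, text: str, categories: list[str]) -> None:
        self._entries[self._key(text)] = (self.clock(), list(categories))


@dataclass(frozen=True)
class AuditRecord:
    """What gets logged: categories and latency, never the prompt itself."""
    project_id: str
    action: str                  # "allow", "log", or "block"
    categories: tuple[str, ...]
    latency_ms: float


def scan_and_audit(project_id: str, text: str,
                   classify: Callable[[str], list[str]],
                   cache: DecisionCache,
                   decide_action: Callable[[list[str]], str]) -> AuditRecord:
    start = time.perf_counter()
    categories = cache.get(text)
    if categories is None:
        categories = classify(text)
        cache.put(text, categories)
    action = decide_action(categories)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return AuditRecord(project_id, action, tuple(categories), latency_ms)
```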

Abuse protection after repeated denials

When enforce mode repeatedly blocks the same actor, ModelRiver applies a cooldown throttle, similar to smart rate limiting, that returns HTTP 429 with a Retry-After header. This stops bad actors from hammering the gateway, and throttled requests consume no provider tokens.
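A cooldown throttle of this kind could be built as a sliding-window counter of denials per actor. The thresholds below are invented for illustration, "actor" is assumed to be something like an API key or client IP, and `DenialThrottle` and `throttle_response` are hypothetical names. The check is meant to run before classification, so a throttled request never reaches a provider.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class DenialThrottle:
    """Per-actor cooldown after repeated content-policy denials.

    max_denials, window_seconds, and cooldown_seconds are illustrative
    assumptions, not ModelRiver's actual thresholds.
    """

    def __init__(self, max_denials: int = 5, window_seconds: float = 60.0,
                 cooldown_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_denials = max_denials
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._denials: defaultdict[str, deque] = defaultdict(deque)
        self._cooldown_until: dict[str, float] = {}

    def retry_after(self, actor: str) -> Optional[int]:
        """Whole seconds the actor must wait (for Retry-After), or None."""
        until = self._cooldown_until.get(actor)
        if until is None:
            return None
        remaining = until - self.clock()
        if remaining <= 0:
            del self._cooldown_until[actor]
            return None
        return math.ceil(remaining)

    def record_denial(self, actor: str) -> None:
        """Call each time enforce mode blocks a request from this actor."""
        now = self.clock()
        recent = self._denials[actor]
        recent.append(now)
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()  # forget denials outside the sliding window
        if len(recent) >= self.max_denials:
            self._cooldown_until[actor] = now + self.cooldown_seconds
            recent.clear()


def throttle_response(throttle: DenialThrottle,
                      actor: str) -> Optional[tuple[int, dict[str, str]]]:
    """Short-circuit with 429 + Retry-After while the actor is cooling down."""
    wait = throttle.retry_after(actor)
    if wait is None:
        return None
    return 429, {"Retry-After": str(wait)}
```

Rounding the remaining time up keeps Retry-After a whole number of seconds and never tells a client to come back before the cooldown ends.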

Use cases

  • Public-facing chatbots that need content policy enforcement.
  • Enterprise apps rolling out safety policies in monitor mode first.
  • Multi-tenant platforms requiring per-project policy controls.

What’s unique

  • Runs before provider calls — blocked requests never bill.
  • Two-tier local + remote classification with decision caching.
  • Works across sync, async, streaming, and OpenAI-compatible APIs.

Programmatic access

Guardrails run automatically on every request; configure them per project in the console.

POST https://api.modelriver.com/v1/ai
Authorization: Bearer mr_live_your_key
Content-Type: application/json

{
  "model": "chat-assistant",
  "messages": [
    { "role": "user", "content": "..." }
  ]
}

// Response when blocked by content policy (HTTP 403):
{
  "error": "content_policy_violation",
  "message": "Request blocked by content policy.",
  "categories": ["violence"]
}
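A Python client for the request above might separate guardrail errors from other failures. This is a sketch using only the standard library: the exception and function names are hypothetical, the 403 body shape is taken from the example response, and it assumes the 429 cooldown sends `Retry-After` as a number of seconds.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Mapping

API_URL = "https://api.modelriver.com/v1/ai"


class ContentPolicyError(Exception):
    """Raised when the gateway blocks the prompt (HTTP 403)."""

    def __init__(self, categories: list[str]) -> None:
        super().__init__(f"blocked by content policy: {categories}")
        self.categories = categories


class GuardrailThrottled(Exception):
    """Raised when repeated denials trigger a cooldown (HTTP 429)."""

    def __init__(self, retry_after: int) -> None:
        super().__init__(f"throttled; retry after {retry_after}s")
        self.retry_after = retry_after


def build_request(api_key: str, model: str, user_text: str) -> urllib.request.Request:
    payload = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def raise_for_guardrail(status: int, headers: Mapping[str, str], body: bytes) -> None:
    """Turn guardrail-related error responses into typed exceptions."""
    if status == 429:
        raw = headers.get("Retry-After")
        # Assume 1 s when Retry-After is missing or not an integer (e.g. an HTTP-date).
        try:
            wait = int(raw) if raw is not None else 1
        except ValueError:
            wait = 1
        raise GuardrailThrottled(wait)
    if status == 403:
        try:
            data = json.loads(body or b"{}")
        except ValueError:
            return  # non-JSON 403: not a guardrail response we recognize
        if data.get("error") == "content_policy_violation":
            raise ContentPolicyError(list(data.get("categories", [])))


def call_gateway(api_key: str, model: str, user_text: str) -> Any:
    request = build_request(api_key, model, user_text)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        raise_for_guardrail(err.code, err.headers, err.read())
        raise  # any other HTTP error is re-raised unchanged
```

Catching `ContentPolicyError` lets an app show its own message without retrying the same prompt, while `GuardrailThrottled.retry_after` tells it how long to back off.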

Configure guardrail mode and categories in project settings. Only organization owners and admins can disable or weaken policies. Blocked requests are never billed.

Ship with safety built in

Combine guardrails with rate limiting, failover, and analytics for resilient, policy-compliant AI traffic.