Performance

Zero-latency, zero-cost AI responses

Stop paying for repeated AI generations. Automatically serve cached, exactly matched responses instantly for incoming requests.

Zero-cost hits · Sub-10ms latency · Exact match criteria · Per-workflow windows

Visual

Cache flow

How we instantly serve previously successful responses; a conceptual sketch follows the steps below.

01

Input prompt

Unstructured text or data stream

02

Match settings

System prompt, user prompt, temperature, and model must match exactly

03

Instant delivery

Served instantly without hitting the provider API

04

Decision point

Hit → serve cached response

Miss → route request to provider normally

05

Analytics

Cache hit rate, latency, and tokens saved are logged
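
Conceptually, the lookup behaves like the TypeScript sketch below: every match criterion is folded into a single hash key, and the decision point either serves the cached body or routes to the provider. This is illustrative only; the names (CacheRequest, cacheKey, serve) are hypothetical and not part of ModelRiver's API.

import { createHash } from "node:crypto";

// Hypothetical request/entry shapes; not ModelRiver's internals.
interface CacheRequest {
  systemPrompt: string;
  userPrompt: string;
  model: string;
  temperature: number;
  attachmentHashes: string[]; // pre-computed hashes of any attachments
}

interface CachedResponse {
  body: unknown;
  createdAt: number; // epoch ms
}

// Exact-match key: every criterion is folded into one hash, so any
// difference in prompt, settings, or attachments produces a miss.
function cacheKey(req: CacheRequest): string {
  const material = JSON.stringify([
    req.systemPrompt,
    req.userPrompt,
    req.model,
    req.temperature,
    [...req.attachmentHashes].sort(),
  ]);
  return createHash("sha256").update(material).digest("hex");
}

// Decision point: hit -> serve the cached body, miss -> call the provider.
async function serve(
  req: CacheRequest,
  cache: Map<string, CachedResponse>,
  callProvider: (req: CacheRequest) => Promise<unknown>,
  windowMs: number,
): Promise<unknown> {
  const key = cacheKey(req);
  const hit = cache.get(key);
  if (hit && Date.now() - hit.createdAt <= windowMs) {
    return hit.body; // served without touching the provider API
  }
  const body = await callProvider(req); // miss: route normally
  cache.set(key, { body, createdAt: Date.now() });
  return body;
}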

Configuration
workflow: "chat-agent"
cache:
  enabled: true
  window: "15m"  # cached responses older than 15m are ignored
criteria:
  - "exact match on system/user prompts"
  - "exact match on provider settings"
  - "exact match on attachment hashes"
              
1

Enable instantly

Toggle caching per workflow and configure the cache duration window.

2

Strict matching

A cache hit only triggers if prompt, settings, and payload attachments match completely.

3

Metrics

Cache hits count towards overall usage but save you latency and provider cost; a sketch of the tracked counters follows below.
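
For intuition, the per-workflow counters could be accumulated as in this TypeScript sketch, assuming savings on a hit are estimated from prior provider calls. The shapes and names here are hypothetical, not ModelRiver's reporting schema.

// Hypothetical per-workflow counters mirroring the metrics the analytics
// pane reports: hit rate, latency saved, and tokens saved.
interface WorkflowCacheStats {
  hits: number;
  misses: number;
  latencySavedMs: number;
  tokensSaved: number;
}

function recordServe(
  stats: WorkflowCacheStats,
  outcome: { hit: boolean; estProviderLatencyMs: number; estTokens: number },
): void {
  if (outcome.hit) {
    // A hit still counts toward overall usage, but the provider call is
    // skipped, so its estimated latency and tokens are booked as savings.
    stats.hits += 1;
    stats.latencySavedMs += outcome.estProviderLatencyMs;
    stats.tokensSaved += outcome.estTokens;
  } else {
    stats.misses += 1;
  }
}

function hitRate(stats: WorkflowCacheStats): number {
  const total = stats.hits + stats.misses;
  return total === 0 ? 0 : stats.hits / total;
}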

Cost

Zero tokens

Hits bypass underlying providers.

Latency

< 10ms

Serve complex generations in milliseconds.

Failover

Configuration-aware

Automatic retries if parsing fails.

Scroll the lifecycle

01 · Enable

Toggle workflow caching directly from the settings menu.

02 · Compute

Strict hash matching on input parameters ensures only exact matches are served.

03 · Serve

Return previous valid completions immediately.

04 · Monitor

View exact token and bandwidth savings in the analytics pane.

Use cases

  • Repetitive batch processing flows.
  • High-volume identical user queries.
  • E-commerce categorization workflows.

What’s unique

  • Caching applied dynamically per-workflow.
  • Hash-based validation of match criteria before routing.
  • Tracks precise metrics per workflow.

Programmatic access

Metrics are reported directly from the dashboard endpoints.

POST https://api.modelriver.com/v1/ai
Authorization: Bearer mr_live_your_key

{
  "workflow": "product-extractor",
  "messages": [...]
}

// Response returns validated structured data:
{
  "data": {
    "name": "Widget Pro",
    "price": 49.99,
    "category": "Electronics"
  },
  "meta": { "structured_output": true }
}

Cache hit indicators are returned with each generation request as metadata headers, as in the client sketch below.
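
A minimal client sketch in TypeScript. The endpoint and bearer auth are taken from the example above; the "x-cache" header name and its values are assumptions, so check the actual metadata headers your responses carry.

// Minimal client sketch against the documented endpoint.
async function callWorkflow(apiKey: string) {
  const res = await fetch("https://api.modelriver.com/v1/ai", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      workflow: "product-extractor",
      messages: [{ role: "user", content: "Categorize: Widget Pro, $49.99" }],
    }),
  });

  // Cache hit indicators arrive as metadata headers on each response.
  const cacheStatus = res.headers.get("x-cache"); // e.g. "hit" or "miss" (assumed values)
  const payload = await res.json();
  return { cacheStatus, payload };
}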

Cache queries, drop latency, crush costs

Native support for Zod, JSON Configuration, and typed outputs across all providers. Zero parsing code required.