Zero-latency, zero-cost AI responses
Stop paying for repeated AI generations. Automatically serve instantly cached, perfectly matched responses for incoming requests.
Visual
Cache flow
How we instantly serve previously successful responses.
Input prompt
Unstructured text or data stream
Instant delivery
Served instantly without hitting provider API
Match settings
System, User, Temp, Model must match exactly
Decision point
Hit → serve cached response
Miss → route request to provider normally
Analytics
Cache hit rate, latency, and tokens saved logged
workflow: "chat-agent"
cache:
enabled: true
window: "15m" // Cached responses older than 15m ignored
criteria:
- "exact match on system/user prompts"
- "exact match on provider settings"
- "exact match on attachment hashes"
Enable instantly
Toggle caching per-workflow and configure cache duration window.
Strict matching
A cache hit only triggers if prompt, settings, and payload attachments match completely.
Metrics
Cache hits count towards overall usage but save you latency and provider cost.
Cost
Zero tokens
Hits bypass underlying providers.
Latency
< 10ms
Serve complex generation in milliseconds.
Failover
Configuration-aware
Automatic retries if parsing fails.
01 · Enable
Toggle workflow caching directly from the settings menu.
02 · Compute
Strict hash matching on input parameters guarantees relevance.
03 · Serve
Return previous valid completions immediately.
04 · Monitor
View exact token and bandwidth savings in the analytics pane.
Use cases
- ● Repetitive batch processing flows.
- ● High-volume identical user queries.
- ● E-commerce categorization workflows.
What’s unique
- ● Caching applied dynamically per-workflow.
- ● Hard criteria hash validations prior to routing.
- ● Tracks precise metrics per workflow.
Programmatic access
Metrics are reported right from the dashboard endpoints
POST https://api.modelriver.com/v1/ai Authorization: Bearer mr_live_your_key { "workflow": "product-extractor", "messages": [...] } // Response returns validated structured data: { "data": { "name": "Widget Pro", "price": 49.99, "category": "Electronics" }, "meta": { "structured_output": true } }
Cache hit indicators returned dynamically per-generation request as metadata headers.
Cache queries, drop latency, crush costs
Native support for Zod, JSON Configuration, and typed outputs across all providers. Zero parsing code required.