Why workflow caching?
Running AI models in production can become expensive as user volume grows. Often, users ask similar queries or trigger the exact same workflow multiple times. Re-processing these identical requests leads to high latency and duplicate token costs.
ModelRiver Cache solves this by capturing and instantly serving previously successful responses across your workflows.
How it works
- Intelligent Capture: When caching is enabled and an AI request completes successfully, ModelRiver records the input context and the structured output.
- Instant Delivery: When an incoming request matches a cached pattern exactly, the cached response is served in under 10ms.
- Analytics: The Cache tab in your project console shows your cache hit rate, bandwidth saved, and total latency reduction.
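The capture-and-serve flow above can be sketched as a simple lookup keyed by a fingerprint of the full input context. This is an illustrative, in-memory approximation, not ModelRiver's actual implementation (which runs as a managed global service); the function and variable names are hypothetical.

```python
import hashlib
import json

# Hypothetical in-memory stand-in for the managed cache.
_cache: dict[str, str] = {}

def fingerprint(request: dict) -> str:
    """Derive a deterministic key from the full input context."""
    canonical = json.dumps(request, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def serve(request: dict, run_model) -> str:
    key = fingerprint(request)
    if key in _cache:                  # exact match: instant delivery
        return _cache[key]
    response = run_model(request)      # cache miss: run the workflow
    _cache[key] = response             # capture the successful response
    return response
```

Serializing the request with `sort_keys=True` before hashing keeps the key deterministic regardless of field order, which is what makes exact-match lookups reliable.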
Enabling the cache
- Navigate to the Workflows section and open the settings for a specific workflow.
- Locate the Caching (optional) section.
- Select your desired Cache window (e.g., 15m, 1h, or 1d). Cached responses older than this window are automatically ignored.
- You can also manually clear the cache for this workflow at any time using the Clear button.
Key metrics
With the Workflow Cache enabled, your team can expect:
- Zero-cost hits: Cached requests don't incur token costs from your underlying providers (like OpenAI or Anthropic).
- Sub-10ms latency: Instead of waiting 2-5 seconds for an LLM to generate text, the response is available immediately.
Note: While cache hits save you from costly external provider usage, they are still counted as standard requests towards your ModelRiver limits to help us maintain our high-performance global cache network.
Cache hit criteria
To ensure responses are accurate when caching, ModelRiver enforces strict matching requirements. A cache hit only triggers if:
- System and User prompts match exactly.
- Temperature, Top P, and Model settings remain identical.
- Attachments (such as images for vision models) compute to the exact same hash.
If any of these criteria differ, the request bypasses the cache and runs normally.
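The hit criteria above can be illustrated as a single cache key derived from every matching vector: prompts, model settings, and attachment hashes. This is a hedged sketch, not ModelRiver's documented key format; `cache_key` and its parameters are hypothetical.

```python
import hashlib
import json

def cache_key(system: str, user: str, model: str,
              temperature: float, top_p: float,
              attachments: tuple[bytes, ...] = ()) -> str:
    """Every field below must match exactly for a cache hit;
    changing any one of them produces a different key."""
    attachment_hashes = [hashlib.sha256(a).hexdigest() for a in attachments]
    payload = json.dumps({
        "system": system, "user": user, "model": model,
        "temperature": temperature, "top_p": top_p,
        "attachments": attachment_hashes,
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Because attachments are reduced to content hashes, two requests with byte-identical images share a key, while any edit to an image (or to temperature, Top P, or the model) yields a new key and a cache bypass.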
Next steps
- Type-safe solutions: Learn how cache works with structured outputs.
- Workflows: Discover how workflows process cached queries.
- Observability: Understand how cached responses appear in your timeline.