Workflow Cache

Stop paying for repeated AI generations. ModelRiver automatically serves cached, exact-match responses for incoming requests it has already processed.

Why workflow caching?

Running AI models in production becomes expensive as user volume grows. Users often send the same query or trigger the exact same workflow multiple times, and re-processing these identical requests adds latency and duplicate token costs.

ModelRiver Cache solves this by capturing and instantly serving previously successful responses across your workflows.

How it works

  1. Intelligent Capture: When an AI request completes successfully, ModelRiver records the input context and the structured output if caching is enabled.
  2. Instant Delivery: When an incoming request exactly matches a cached entry, the cached response is served in under 10ms.
  3. Analytics: The Cache tab in your project console outlines your cache hit rate, bandwidth saved, and total latency reduction.

Enabling the cache

  1. Navigate to the Workflows section and open the settings for a specific workflow.
  2. Locate the Caching (optional) section.
  3. Select your desired Cache window (e.g., 15m, 1h, or 1d). Cached responses older than this window are automatically ignored.
  4. You can also manually clear the cache for this workflow at any time using the Clear button.

Key metrics

With the Workflow Cache enabled, your team can expect:

  • Zero-cost hits: Cached requests don't incur token costs from your underlying providers (like OpenAI or Anthropic).
  • Sub-10ms latency: Instead of waiting 2-5 seconds for an LLM to generate text, the response is instantly available.

Note: While cache hits save you from costly external provider usage, they are still counted as standard requests towards your ModelRiver limits to help us maintain our high-performance global cache network.

Cache hit criteria

To ensure responses are accurate when caching, ModelRiver enforces strict matching requirements. A cache hit only triggers if:

  • System and User prompts match exactly.
  • Temperature, Top P, and Model settings remain identical.
  • Attachments (such as images for vision models) compute to the exact same hash.

If any of these fields differ, the request bypasses the cache and runs normally.
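One way to picture these criteria is as a single hash computed over every field that must match exactly. The field names below are illustrative, not ModelRiver's internal schema:

```python
import hashlib

def cache_key(system_prompt: str, user_prompt: str, model: str,
              temperature: float, top_p: float,
              attachments: tuple[bytes, ...] = ()) -> str:
    """Hash every field that participates in cache-hit matching.

    Any change to a prompt, a sampling setting, the model, or an
    attachment's bytes produces a different key, i.e. a cache miss.
    """
    h = hashlib.sha256()
    for part in (system_prompt, user_prompt, model,
                 repr(temperature), repr(top_p)):
        h.update(part.encode())
        h.update(b"\x00")  # separator so "ab" + "c" never equals "a" + "bc"
    for blob in attachments:
        # Attachments contribute via their own content hash.
        h.update(hashlib.sha256(blob).digest())
    return h.hexdigest()
```

Hashing each attachment separately mirrors the criterion above: an image only matches if its bytes compute to the exact same hash.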

Next steps