Trace every provider failover attempt

When providers fail, ModelRiver automatically retries with fallback providers. The timeline captures every attempt—successful or failed—so you can understand resilience in action.

Overview

When a request's primary provider fails, ModelRiver's multi-provider failover system automatically retries with the next provider in your workflow's fallback chain. Each failed attempt is logged in the timeline as a failover attempt, capturing the full request/response data and the failure reason. This gives you complete visibility into your system's resilience behavior.


How failover attempts appear

In the log list

Requests that required failovers show a failed models badge — a red badge with a count indicating how many provider attempts failed before the request was ultimately resolved.

Example: A badge showing "2 failed" means two providers failed before the third succeeded (or before all providers were exhausted).

In the timeline

Failover attempts appear as amber/yellow badges at the beginning of the timeline, before the main request:

  • Position: Before the main request, in chronological order
  • Badge color: Amber/yellow
  • Badge content: Provider icon, name, and model
  • Additional info: Duration and timestamp

When clicked

Clicking a failover attempt reveals:

  • Provider & model – Which provider and model were attempted
  • Status – "Failed" with the specific error
  • Duration – How long the attempt took before failing
  • Timestamp – When the attempt occurred
  • Request Body tab – The exact request sent to the provider
  • Response Body tab – The provider's error response

Understanding failover data

Primary Request ID

Every failover attempt is linked to its eventual main request via the primary_req_id field. This creates a chain:

Failover Attempt 1 (primary_req_id: "abc-123")
Failover Attempt 2 (primary_req_id: "abc-123")
Main Request (id: "abc-123")

This linking ensures you can always trace the complete failover chain for any request.
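As a minimal sketch of how this linking can be used, the snippet below groups raw log entries into a failover chain. The entry shape (`id`, `provider` keys and the list-of-dicts format) is a hypothetical example for illustration; only the `primary_req_id` field name comes from the documentation above.

```python
# Reconstruct a failover chain from raw log entries.
# Entry shape is hypothetical; primary_req_id is the linking field
# described above.

def failover_chain(entries, request_id):
    """Return (failed attempts, main request) for one request ID."""
    attempts = [e for e in entries if e.get("primary_req_id") == request_id]
    main = next((e for e in entries if e.get("id") == request_id), None)
    return attempts, main

logs = [
    {"id": "f-1", "primary_req_id": "abc-123", "provider": "provider-a"},
    {"id": "f-2", "primary_req_id": "abc-123", "provider": "provider-b"},
    {"id": "abc-123", "provider": "provider-c"},
]

attempts, main = failover_chain(logs, "abc-123")
print(len(attempts))      # two failed attempts preceded the main request
print(main["provider"])   # the provider that ultimately succeeded
```

Because every attempt carries the same `primary_req_id`, the chain can be rebuilt from flat logs without any ordering guarantees.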

Common failure reasons

  • Rate limit (429) – Provider's rate limit exceeded. Action: reduce request rate or upgrade your provider plan.
  • Server error (500/502/503) – Provider infrastructure issue. Action: usually transient; monitor for patterns.
  • Model unavailable – Model offline or deprecated. Action: update the workflow to use an available model.
  • Authentication error (401/403) – Invalid or expired API key. Action: rotate credentials in ModelRiver.
  • Content policy violation – Input rejected by provider safety filters. Action: review and adjust the input content.
  • Timeout – Provider didn't respond in time. Action: the provider may be overloaded.
  • Invalid request (400) – Request format incompatible with the provider. Action: check provider-specific requirements.

Failover impact on performance

Each failover attempt adds to total request latency:

Total latency = Attempt 1 duration + Attempt 2 duration + ... + Main request duration

Frequent failovers indicate provider instability and directly impact user experience. See Performance Monitoring for latency analysis.
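The latency formula above can be sketched as a simple sum. The durations here are made-up example figures, and milliseconds are an assumed unit; substitute whatever your logs report.

```python
# Total latency = every failed attempt's duration + main request duration.
# Durations below are hypothetical illustration values, in milliseconds.

failover_durations_ms = [850, 1200]  # two failed provider attempts
main_request_ms = 640                # the attempt that succeeded

total_ms = sum(failover_durations_ms) + main_request_ms
print(total_ms)  # 2690
```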

Failover impact on cost

Some providers charge for tokens even on failed requests (if the model processed tokens before the error). This means failovers can increase the true cost of a request beyond what the main request shows.

See Cost Analysis for cost optimization strategies.
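To make the cost point concrete, a sketch of "true cost" accounting: the per-attempt costs below are hypothetical figures, assuming the first failed provider billed input tokens before erroring and the second failed without billing.

```python
# True cost = cost billed on failed attempts + cost of the main request.
# All dollar figures are hypothetical illustration values.

failed_attempt_costs = [0.0012, 0.0]  # first provider billed tokens before failing
main_request_cost = 0.0034            # what the main request alone shows

true_cost = sum(failed_attempt_costs) + main_request_cost
print(round(true_cost, 4))  # higher than the main request's 0.0034
```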


Debugging failover attempts

Step-by-step investigation

  1. Open the request detail – Click the request in the log list
  2. Review the timeline – Note the number and order of failover attempts
  3. Click each failover attempt – View the error details
  4. Compare request bodies – Verify the same request was sent to each provider
  5. Read error messages – Identify the specific failure reason
  6. Check for patterns – Is the same provider always failing? Is the error always the same?

What to look for

  • Same error across providers – May indicate a request-level issue (e.g., content policy violation) rather than a provider issue
  • One provider always failing – Provider-specific issue; consider removing or deprioritizing
  • Rate limit errors – Too many requests; need to distribute load or upgrade plans
  • Intermittent failures – Transient provider issues; failover is working as designed

Failover best practices

Configure fallback providers wisely

  • Diversity: Use providers from different vendors to avoid correlated failures
  • Priority: Put the most reliable provider first
  • Cost awareness: Put cheaper providers earlier in the chain
  • Compatibility: Ensure all fallback providers support your request format

Monitor failover rates

  • Track what percentage of requests require failovers
  • Set alerts for failover rate thresholds (e.g., > 5%)
  • Review Provider Reliability regularly
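The monitoring steps above can be sketched as a rate check over a window of requests. The `failed_attempts` key and the window shape are assumptions for illustration; the 5% threshold matches the example threshold mentioned above.

```python
# Flag when the share of requests needing failover crosses a threshold.
# Request shape ("failed_attempts" count per request) is hypothetical.

def failover_rate(requests):
    """Fraction of requests that needed at least one failover attempt."""
    if not requests:
        return 0.0
    with_failover = sum(1 for r in requests if r.get("failed_attempts", 0) > 0)
    return with_failover / len(requests)

window = [{"failed_attempts": 0}] * 92 + [{"failed_attempts": 1}] * 8
rate = failover_rate(window)
print(f"{rate:.0%}")  # 8%
print(rate > 0.05)    # True -> fire an alert
```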

Respond to failover patterns

  • Consistent failures: Remove the failing provider or update credentials
  • Rate limiting: Upgrade provider plan or reduce request rate
  • Model deprecation: Update workflow to use the replacement model

Next steps