Overview
When a request's primary provider fails, ModelRiver's multi-provider failover system automatically retries with the next provider in your workflow's fallback chain. Each failed attempt is logged as a failover attempt in the timeline, capturing the full request/response data and the failure reason. This gives you complete visibility into your system's resilience behavior.
How failover attempts appear
In the log list
Requests that required failovers show a failed models badge — a red badge with a count indicating how many provider attempts failed before the request was ultimately resolved.
Example: A badge showing "2 failed" means two providers failed before the third succeeded (or before all providers were exhausted).
In the timeline
Failover attempts appear as amber/yellow badges at the beginning of the timeline, before the main request:
- Position: Before the main request, in chronological order
- Badge color: Amber/yellow
- Badge content: Provider icon, name, and model
- Additional info: Duration and timestamp
When clicked
Clicking a failover attempt reveals:
- Provider & model – Which provider and model were attempted
- Status – "Failed" with the specific error
- Duration – How long the attempt took before failing
- Timestamp – When the attempt occurred
- Request Body tab – The exact request sent to the provider
- Response Body tab – The provider's error response
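For reference in the examples on this page, here is one plausible shape for a single failover-attempt record, written as a Python dict. The field names are assumptions for illustration; apart from primary_req_id (described in the next section), they are not a documented ModelRiver schema.

```python
# Illustrative shape of a single failover-attempt record.
# Field names are assumptions used by the sketches on this page,
# not ModelRiver's documented schema.
failover_attempt = {
    "id": "f-1",
    "primary_req_id": "abc-123",   # links to the main request (see below)
    "provider": "provider-a",
    "model": "model-a",
    "status": "failed",
    "error": "Rate limit exceeded (429)",
    "duration_ms": 1200,
    "timestamp": "2024-05-01T12:00:00Z",
    "request_body": {"messages": [{"role": "user", "content": "..."}]},
    "response_body": {"error": {"type": "rate_limit_error"}},
}
```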
Understanding failover data
Primary Request ID
Every failover attempt is linked to its eventual main request via the primary_req_id field. This creates a chain:
Failover Attempt 1 (primary_req_id: "abc-123")
  ↓
Failover Attempt 2 (primary_req_id: "abc-123")
  ↓
Main Request (id: "abc-123")

This linking ensures you can always trace the complete failover chain for any request.
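If you work with an exported batch of log entries, the chains can be reconstructed from this field alone. A minimal sketch, assuming entries are plain dicts with id and primary_req_id:

```python
# Group failover attempts under their main request using primary_req_id.
# The list-of-dicts shape is an assumption about an exported log batch.
from collections import defaultdict

def group_failover_chains(entries):
    """Return {main_request_id: [failover attempts in order]}."""
    chains = defaultdict(list)
    for entry in entries:
        parent = entry.get("primary_req_id")
        if parent:                      # only failover attempts carry primary_req_id
            chains[parent].append(entry)
    return dict(chains)

entries = [
    {"id": "f-1", "primary_req_id": "abc-123", "provider": "provider-a"},
    {"id": "f-2", "primary_req_id": "abc-123", "provider": "provider-b"},
    {"id": "abc-123", "provider": "provider-c"},   # the main request
]
print(group_failover_chains(entries))              # {'abc-123': [f-1, f-2]}
```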
Common failure reasons
| Error type | Description | Action |
|---|---|---|
| Rate limit (429) | Provider's rate limit exceeded | Reduce request rate or upgrade provider plan |
| Server error (500/502/503) | Provider infrastructure issue | Usually transient; monitor for patterns |
| Model unavailable | Model offline or deprecated | Update workflow to use available model |
| Authentication error (401/403) | Invalid or expired API key | Rotate credentials in ModelRiver |
| Content policy violation | Input rejected by provider safety filters | Review and adjust input content |
| Timeout | Provider didn't respond in time | Provider may be overloaded |
| Invalid request (400) | Request format incompatible with provider | Check provider-specific requirements |
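If you triage failures programmatically, the table above maps naturally onto a small classifier. A sketch; the classify_failure helper is purely illustrative and not part of ModelRiver:

```python
# Map a failed attempt's status code / error message onto the error types above.
# Illustrative only; not a ModelRiver API.
def classify_failure(status_code, message=""):
    msg = message.lower()
    if status_code == 429:
        return "rate_limit"
    if status_code in (500, 502, 503):
        return "server_error"
    if status_code in (401, 403):
        return "authentication_error"
    if status_code == 400:
        return "invalid_request"
    if "policy" in msg or "safety" in msg:
        return "content_policy_violation"
    if "timeout" in msg or "timed out" in msg:
        return "timeout"
    return "unknown"

print(classify_failure(429))                         # rate_limit
print(classify_failure(0, "Request timed out"))      # timeout
```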
Failover impact on performance
Each failover attempt adds to total request latency:
Total latency = Attempt 1 duration + Attempt 2 duration + ... + Main request duration
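As a concrete example of the formula above, with made-up durations:

```python
# Total user-visible latency for a request with two failed attempts.
# Durations are example values in milliseconds.
failover_durations_ms = [1200, 800]   # Attempt 1 and Attempt 2, both failed
main_request_ms = 950                 # the attempt that finally succeeded

total_latency_ms = sum(failover_durations_ms) + main_request_ms
print(total_latency_ms)               # 2950 ms, versus 950 ms with no failovers
```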
Frequent failovers indicate provider instability and directly impact user experience. See Performance Monitoring for latency analysis.
Failover impact on cost
Some providers charge for tokens even on failed requests (if the model processed tokens before the error). This means failovers can increase the true cost of a request beyond what the main request shows.
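A sketch of this effect, with made-up token counts and prices; actual billing depends on each provider's rules:

```python
# The "true" request cost includes tokens billed on failed attempts.
# Token counts and per-1K prices below are invented for illustration.
attempts = [
    {"billed_tokens": 1500, "price_per_1k_usd": 0.010},  # failed attempt, tokens still billed
    {"billed_tokens": 0,    "price_per_1k_usd": 0.008},  # failed before any tokens were processed
    {"billed_tokens": 2100, "price_per_1k_usd": 0.006},  # main (successful) request
]

main_cost = attempts[-1]["billed_tokens"] / 1000 * attempts[-1]["price_per_1k_usd"]
true_cost = sum(a["billed_tokens"] / 1000 * a["price_per_1k_usd"] for a in attempts)

print(f"main request only:   ${main_cost:.4f}")   # $0.0126
print(f"including failovers: ${true_cost:.4f}")   # $0.0276
```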
See Cost Analysis for cost optimization strategies.
Debugging failover attempts
Step-by-step investigation
1. Open the request detail – Click the request in the log list
2. Review the timeline – Note the number and order of failover attempts
3. Click each failover attempt – View the error details
4. Compare request bodies – Verify the same request was sent to each provider
5. Read error messages – Identify the specific failure reason
6. Check for patterns – Is the same provider always failing? Is the error always the same?
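The same checks can also be run outside the UI against an exported batch of logs. A minimal sketch, assuming a JSON export and the illustrative field names used earlier on this page:

```python
# Walk one request's failover chain and print what each attempt shows
# in the UI (provider, model, error, duration, timestamp).
# The export file name and field names are assumptions.
import json

with open("request_logs.json") as f:
    entries = json.load(f)

main_id = "abc-123"
attempts = [e for e in entries if e.get("primary_req_id") == main_id]

for i, attempt in enumerate(attempts, start=1):
    print(f"Failover attempt {i}: {attempt.get('provider')}/{attempt.get('model')}")
    print(f"  error:    {attempt.get('error')}")
    print(f"  duration: {attempt.get('duration_ms')} ms")
    print(f"  at:       {attempt.get('timestamp')}")
```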
What to look for
- Same error across providers – May indicate a request-level issue (e.g., content policy violation) rather than a provider issue
- One provider always failing – Provider-specific issue; consider removing or deprioritizing
- Rate limit errors – Too many requests; need to distribute load or upgrade plans
- Intermittent failures – Transient provider issues; failover is working as designed
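A small sketch that applies these checks to one request's failover attempts, using the same assumed fields:

```python
# Distinguish a request-level problem from a provider-level problem
# for the failover attempts belonging to one request. Illustrative only.
def diagnose(attempts):
    providers = {a.get("provider") for a in attempts}
    errors = {a.get("error") for a in attempts}
    if len(errors) == 1 and len(providers) > 1:
        return "Same error across providers: likely a request-level issue"
    if len(providers) == 1:
        return "Single provider failing repeatedly: provider-specific issue"
    return "Mixed errors: likely transient provider issues; failover working as designed"

attempts = [
    {"provider": "provider-a", "error": "content policy violation"},
    {"provider": "provider-b", "error": "content policy violation"},
]
print(diagnose(attempts))   # Same error across providers: likely a request-level issue
```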
Failover best practices
Configure fallback providers wisely
- Diversity: Use providers from different vendors to avoid correlated failures
- Priority: Put the most reliable provider first
- Cost awareness: Put cheaper providers earlier in the chain
- Compatibility: Ensure all fallback providers support your request format
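To make the ordering principles concrete, here is a hypothetical fallback chain expressed as plain data. This is not ModelRiver's workflow configuration format, only an illustration of the trade-offs above:

```python
# Hypothetical fallback chain: most reliable first, cross-vendor diversity,
# and (where reliability is comparable) cheaper options earlier.
# Not ModelRiver's configuration format.
fallback_chain = [
    {"vendor": "vendor-a", "model": "model-a", "uptime": 0.999, "est_cost_per_1k_usd": 0.008},
    {"vendor": "vendor-b", "model": "model-b", "uptime": 0.998, "est_cost_per_1k_usd": 0.006},
    {"vendor": "vendor-c", "model": "model-c", "uptime": 0.995, "est_cost_per_1k_usd": 0.004},
]

# Sanity check: vendors are distinct, so failures are less likely to be correlated.
assert len({p["vendor"] for p in fallback_chain}) == len(fallback_chain)
```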
Monitor failover rates
- Track what percentage of requests require failovers
- Set alerts for failover rate thresholds (e.g., > 5%); see the sketch after this list
- Review Provider Reliability regularly
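A minimal sketch of such a threshold check over a batch of exported requests; the per-request failed_attempts count is an assumed export field:

```python
# Compute the share of requests that needed at least one failover and
# compare it with an alert threshold (the 5% example above).
ALERT_THRESHOLD = 0.05

requests = [
    {"id": "abc-123", "failed_attempts": 2},
    {"id": "def-456", "failed_attempts": 0},
    {"id": "ghi-789", "failed_attempts": 0},
]

failover_rate = sum(1 for r in requests if r["failed_attempts"] > 0) / len(requests)
print(f"failover rate: {failover_rate:.1%}")

if failover_rate > ALERT_THRESHOLD:
    print("ALERT: failover rate above threshold; check Provider Reliability")
```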
Respond to failover patterns
- Consistent failures: Remove the failing provider or update credentials
- Rate limiting: Upgrade provider plan or reduce request rate
- Model deprecation: Update workflow to use the replacement model
Next steps
- Primary Request – The main request details
- Provider Reliability – Aggregate provider failure analysis
- Performance Monitoring – Latency impact of failovers
- Back to Timeline – Timeline overview