Overview
AI providers are external services that can fail for many reasons: rate limiting, infrastructure issues, model unavailability, and more. ModelRiver's multi-provider failover system ensures your requests succeed even when individual providers fail. Request Logs capture every provider attempt, successful or failed, giving you the data to evaluate provider reliability and optimize your workflow configuration.
Understanding provider reliability data
Failover attempts in Request Logs
When a provider fails and ModelRiver retries with a fallback:
- A log entry is created for the failed attempt with `status: "failed"` and a `primary_req_id` linking to the eventual successful request
- The successful request (or the final failed request, if all providers fail) is the main log entry
- The timeline in the detail view shows all attempts in chronological order
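To make the linkage concrete, here is a minimal sketch of what two related log records might look like. Only `status` and `primary_req_id` come from the description above; every other field name and value is an illustrative assumption, not ModelRiver's actual log schema.

```python
# Hypothetical log records; fields other than `status` and `primary_req_id`
# are illustrative assumptions, not ModelRiver's real export schema.
failed_attempt = {
    "id": "req_abc",                 # the failed provider attempt
    "status": "failed",
    "primary_req_id": "req_def",     # links to the eventual successful request
    "provider": "provider-a",
    "model": "model-x",
    "error": {"code": 429, "message": "Too Many Requests"},
    "created_at": "2024-05-01T12:00:00+00:00",
}

successful_request = {
    "id": "req_def",                 # the main log entry shown in the list view
    "status": "success",
    "primary_req_id": None,
    "provider": "provider-b",
    "model": "model-y",
    "created_at": "2024-05-01T12:00:02+00:00",
}
```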
Key indicators of provider reliability
| Indicator | Where to find it | What it tells you |
|---|---|---|
| Failed models badge | Log list view (red badge with count) | How many provider attempts failed before success |
| Failover attempts | Timeline in detail view (amber badges) | Which specific providers failed and why |
| Error messages | Failed attempt detail view | Root cause of each failure |
| Overall success/error rate | Status column across many requests | Provider health over time |
Reliability analysis workflow
Step 1: Identify failover frequency
- Navigate to Request Logs in your project console
- Filter to Live mode for production traffic
- Look for requests with failed models badges — these required fallbacks
- Count how frequently failovers occur relative to total requests
Healthy benchmark: Failovers should be rare events (< 5% of requests). If more than 10% of requests require fallbacks, there's likely a provider issue to address.
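If you export your logs for analysis, the failover rate is easy to compute. The sketch below assumes the illustrative record shape from the earlier sketch (failed attempts carry a `primary_req_id` pointing at the main entry); adapt it to whatever export format you actually use.

```python
def failover_rate(logs: list[dict]) -> float:
    """Fraction of main requests that needed at least one fallback attempt."""
    main_requests = [r for r in logs if not r.get("primary_req_id")]
    with_failover = {r["primary_req_id"] for r in logs if r.get("primary_req_id")}
    return len(with_failover) / len(main_requests) if main_requests else 0.0


sample = [
    {"id": "req_1", "status": "success", "primary_req_id": None},
    {"id": "req_2", "status": "failed", "primary_req_id": "req_3"},
    {"id": "req_3", "status": "success", "primary_req_id": None},
]
print(f"{failover_rate(sample):.0%}")  # 50%: one of the two main requests needed a fallback
```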
Step 2: Analyze failure patterns by provider
Click on requests with failover attempts and review the timeline:
- Which provider failed? – Note the provider name and model
- Why did it fail? – Read the error message in the failed attempt detail
- When did it fail? – Check timestamps for time-based patterns
Common failure reasons by provider:
| Failure type | Description | Indicates |
|---|---|---|
| Rate limiting | Provider returned 429 (Too Many Requests) | Need to reduce request rate or upgrade provider plan |
| Server errors | Provider returned 500/502/503 | Provider infrastructure issues |
| Model unavailable | Model is temporarily offline | Provider is updating or deprecating the model |
| Authentication error | Invalid or expired API key | Credentials need to be updated in ModelRiver |
| Content policy | Request rejected for policy violations | Input content may need filtering |
| Timeout | Provider didn't respond within time limit | Provider is overloaded or experiencing issues |
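To see which failure types dominate per provider, tally failed attempts by provider and error code. A minimal sketch, again assuming the illustrative record shape with a `provider` field and an `error` object containing a numeric `code`:

```python
from collections import Counter


def failure_breakdown(logs: list[dict]) -> Counter:
    """Count failed attempts by (provider, HTTP-style error code)."""
    return Counter(
        (r["provider"], r["error"]["code"])
        for r in logs
        if r["status"] == "failed"
    )


# Example: failure_breakdown(logs).most_common(5) surfaces the top
# provider/error pairs, e.g. ("provider-a", 429) for rate limiting.
```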
Step 3: Track provider trends
Review reliability over time:
- Is a provider consistently failing? – May need to be removed or deprioritized
- Are failures time-based? – Some providers have peak-hour degradation
- Are failures model-specific? – The provider may be fine, but a specific model is unreliable
- Are failures increasing? – May indicate a worsening provider issue
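Time-based and model-specific patterns stand out once failures are bucketed. The sketch below groups failed attempts by provider and hour of day, assuming the same illustrative record shape and ISO-8601 `created_at` timestamps with explicit offsets:

```python
from collections import defaultdict
from datetime import datetime


def hourly_failures_by_provider(logs: list[dict]) -> dict:
    """Count failed attempts per (provider, hour of day) to spot peak-hour degradation."""
    buckets: dict = defaultdict(int)
    for r in logs:
        if r["status"] != "failed":
            continue
        hour = datetime.fromisoformat(r["created_at"]).hour
        buckets[(r["provider"], hour)] += 1
    return dict(buckets)
```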
Step 4: Optimize provider configuration
Based on your analysis:
- Reorder fallback providers – Put the most reliable provider first (see the sketch after this list)
- Remove unreliable providers – If a provider consistently fails, remove it from the workflow
- Diversify providers – Use providers from different vendors to reduce correlated failures
- Update credentials – If authentication errors occur, rotate API keys
- Adjust rate limits – If rate limiting is frequent, consider upgrading your provider plan
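Reordering can be driven directly by the failure rates you measured in the earlier steps. The workflow structure below is a hypothetical stand-in for however you keep your fallback list, not ModelRiver's actual workflow schema:

```python
def reorder_fallbacks(models: list[dict], failure_rates: dict[str, float]) -> list[dict]:
    """Sort a fallback list so the provider with the lowest observed failure rate goes first."""
    return sorted(models, key=lambda m: failure_rates.get(m["provider"], 0.0))


models = [
    {"provider": "provider-a", "model": "model-x"},
    {"provider": "provider-b", "model": "model-y"},
]
failure_rates = {"provider-a": 0.12, "provider-b": 0.02}
print(reorder_fallbacks(models, failure_rates))
# provider-b moves to the front: it failed 2% of the time versus 12%
```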
Provider-specific considerations
Rate limiting patterns
Each provider has different rate limits:
- Per-minute request limits – Too many requests in a short window
- Per-minute token limits – Total tokens consumed too quickly
- Per-day limits – Daily quota exceeded
How to identify: Look for 429 errors in failed attempt details. Rate limit errors typically come in bursts during high-traffic periods.
How to address:
- Spread requests more evenly over time
- Use multiple providers to distribute load
- Upgrade provider plans for higher limits
- Implement client-side rate limiting (a minimal sketch follows this list)
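For the last suggestion, a simple throttle placed in front of your request code is often enough to stay under a per-minute limit. A minimal sketch using a fixed-interval throttle; a token bucket is the more flexible choice for bursty traffic:

```python
import threading
import time


class IntervalThrottle:
    """Block callers so that at most `max_per_minute` calls go through per minute."""

    def __init__(self, max_per_minute: int):
        self.min_interval = 60.0 / max_per_minute
        self._last = 0.0
        self._lock = threading.Lock()

    def wait(self) -> None:
        with self._lock:
            sleep_for = self._last + self.min_interval - time.monotonic()
            if sleep_for > 0:
                time.sleep(sleep_for)
            self._last = time.monotonic()


throttle = IntervalThrottle(max_per_minute=60)
# Call throttle.wait() immediately before each provider request.
```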
Provider outages
How to identify: Sudden spike in failures from a single provider, with 500/502/503 errors.
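If you analyze logs programmatically, a sudden burst of 5xx failures from one provider is easy to flag. A sketch assuming the illustrative record shape used earlier, with timezone-aware ISO-8601 timestamps; the window and threshold are arbitrary starting points:

```python
from datetime import datetime, timedelta, timezone


def recent_5xx_spike(logs: list[dict], provider: str,
                     window_minutes: int = 10, threshold: int = 5) -> bool:
    """True if `provider` logged `threshold` or more 5xx failures in the last window."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    count = 0
    for r in logs:
        if r["status"] != "failed" or r["provider"] != provider:
            continue
        if r["error"]["code"] not in (500, 502, 503):
            continue
        if datetime.fromisoformat(r["created_at"]) >= cutoff:
            count += 1
    return count >= threshold
```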
How to address:
- ModelRiver's fallback system handles this automatically
- Monitor the timeline to confirm fallbacks are working
- Check the provider's status page for outage announcements
- Consider temporarily removing the provider from workflows if outages are prolonged
Model deprecations
How to identify: Consistent "model not found" or "model unavailable" errors.
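A small scan over failed attempts can catch these early. The keyword list below is an illustrative guess at how providers word these errors, so tune it to the messages you actually see in your logs:

```python
DEPRECATION_HINTS = ("model not found", "model unavailable", "deprecated")


def models_showing_deprecation(logs: list[dict]) -> set[tuple[str, str]]:
    """Collect (provider, model) pairs whose failure messages look like deprecations."""
    flagged = set()
    for r in logs:
        if r["status"] != "failed":
            continue
        if any(hint in r["error"]["message"].lower() for hint in DEPRECATION_HINTS):
            flagged.add((r["provider"], r["model"]))
    return flagged
```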
How to address:
- Update your workflow to use the replacement model
- Check the provider's documentation for model lifecycle announcements
- Set up monitoring to catch deprecation warnings early
Making data-driven decisions
When to keep a provider
- Failure rate is low (< 5%)
- Failures are transient and self-recovering
- Provider offers unique models or capabilities
- Cost-performance ratio is favorable
When to remove a provider
- Consistent failure rate above 10%
- Frequent rate limiting despite reasonable usage
- Provider latency is significantly higher than alternatives
- Authentication or credential issues are recurring
When to change provider order
- Primary provider has higher failure rate than secondary
- Secondary provider has better latency for your use case
- Primary provider is more expensive and failing often (double cost impact)
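If you want to apply these rules consistently across many providers, the thresholds above are easy to encode. A sketch that mirrors the 5% and 10% benchmarks from this guide; tune the cutoffs to your own traffic:

```python
def provider_recommendation(failure_rate: float) -> str:
    """Map an observed per-provider failure rate onto the rough guidance above."""
    if failure_rate < 0.05:
        return "keep"
    if failure_rate <= 0.10:
        return "watch - consider deprioritizing in the fallback order"
    return "remove or deprioritize"


print(provider_recommendation(0.12))  # "remove or deprioritize"
```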
Related metrics
Provider reliability interacts with other observability metrics:
- Performance – Unreliable providers add latency through failover attempts. See Performance Monitoring.
- Cost – Failed attempts still consume tokens on some providers. See Cost Analysis.
- Debugging – Provider failures can cause unexpected results if fallback models behave differently. See Debugging.
Next steps
- Provider Failover Timeline – Deep dive into failover attempt details
- Performance Monitoring – Impact of reliability on latency
- Cost Analysis – Cost impact of provider failures
- Debugging – Investigate specific provider failures
- Back to Observability – Return to the overview