Overview
Request failures happen for many reasons — provider outages, rate limits, malformed inputs, expired credentials, or content policy violations. Request Logs give you the tools to categorize, diagnose, and resolve failures systematically.
Failure categories
Provider-level failures
Failures caused by the AI provider:
| Error type | Example message | Typical cause | Resolution |
|---|---|---|---|
| Rate limit | rate_limit_exceeded | Too many requests to provider | Back off, add more providers |
| Model unavailable | model_not_found | Model deprecated or temporarily unavailable | Switch model or wait |
| Authentication | invalid_api_key | API key expired or revoked | Update credentials in Settings |
| Content policy | content_filtered | Request violated provider guidelines | Review and modify prompt content |
| Server error | internal_server_error | Provider infrastructure issue | Relies on failover; wait for resolution |
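For the rate-limit row above, the usual client-side mitigation is exponential backoff with jitter before retrying or failing over. A minimal sketch in Python, where `RateLimitError` is a hypothetical stand-in for whatever your client raises on a provider 429:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the provider's rate_limit_exceeded / HTTP 429 error."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Retry a provider call on rate limits with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to failover logic
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter avoids thundering herds
```

Backoff smooths over short rate-limit windows; adding more providers (as the table suggests) is still the durable fix for sustained load.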
Webhook delivery failures
Failures in delivering notifications to your backend:
| Error type | Example | Typical cause | Resolution |
|---|---|---|---|
| Connection refused | ECONNREFUSED | Endpoint is down | Fix or restart your server |
| Timeout | Request timeout after 30s | Endpoint is too slow | Optimize endpoint performance |
| DNS failure | ENOTFOUND | Domain doesn't resolve | Check webhook URL configuration |
| SSL error | UNABLE_TO_VERIFY_LEAF_SIGNATURE | Certificate issue | Fix SSL certificate |
| Non-2xx response | HTTP 500 | Your endpoint returned an error | Check your server logs |
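A webhook endpoint avoids most of the timeout and non-2xx rows above by acknowledging the delivery immediately and deferring heavy work. A sketch of that pattern using only the Python standard library (the payload shape and function names are illustrative):

```python
import json
import queue
import threading

work_queue: "queue.Queue[dict]" = queue.Queue()

def handle_webhook(raw_body: bytes) -> int:
    """Return the HTTP status to send. Acknowledge fast; defer heavy work.

    Enqueueing instead of processing inline keeps the response well under the
    30 s delivery timeout, so slow downstream work can't fail the delivery.
    """
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400  # malformed payload: reject so the sender logs it
    work_queue.put(event)
    return 200  # a 2xx tells the platform the delivery succeeded

def worker() -> None:
    """Background consumer; start with threading.Thread(target=worker, daemon=True)."""
    while True:
        event = work_queue.get()
        # ... do the slow processing here ...
        work_queue.task_done()
```

The design choice is deliberate: your endpoint's only job is to receive and persist the event; anything slower belongs behind a queue.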
Callback failures
Failures in event-driven workflow callbacks:
| Error type | Typical cause | Resolution |
|---|---|---|
| Timeout (5 min) | Backend didn't call back in time | Optimize backend processing speed |
| Invalid payload | Callback data doesn't match expected format | Review callback documentation |
| Missing callback | Backend never called the callback URL | Verify webhook handling code |
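To avoid the timeout and missing-callback rows above, it helps to treat the callback as mandatory: post it on success, on failure, and flag work that overran the window. A hedged sketch, where `process_fn` and `post_callback` are placeholders for your own backend code, not a real API:

```python
import time

CALLBACK_TIMEOUT_S = 300  # the platform waits 5 minutes for the callback

def run_with_deadline(process_fn, post_callback, started_at, safety_margin_s=10):
    """Always post the callback, and flag work that finished past the deadline."""
    deadline = started_at + CALLBACK_TIMEOUT_S - safety_margin_s
    try:
        result = process_fn()
    except Exception as exc:
        post_callback("error", str(exc))  # report failures instead of going silent
        return
    if time.monotonic() > deadline:
        post_callback("late", result)  # platform has likely timed out; log for review
        return
    post_callback("completed", result)
```

Pass `time.monotonic()` captured when the webhook arrived as `started_at`, so the deadline is measured from delivery rather than from when your worker picked up the job.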
Step-by-step troubleshooting
1. Assess the scope
Before diving into individual failures, understand the scope:
- Filter to Live mode and Error status
- Count the number of failures in the affected time period
- Look for patterns:
  - All requests failing? → Likely a systemic issue (bad credentials, provider outage)
  - Specific models failing? → Provider-specific issue
  - Intermittent failures? → Rate limits or transient errors
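This triage can be automated over an exported window of logs. A rough sketch, assuming each entry is a dict with `status` and `model` keys (an illustrative shape, not the real log schema):

```python
def classify_failure_pattern(logs):
    """Rough triage over a window of request logs."""
    failed = [entry for entry in logs if entry["status"] == "error"]
    if not failed:
        return "healthy"
    if len(failed) == len(logs):
        return "systemic"           # everything failing: credentials or outage
    failing_models = {e["model"] for e in failed}
    healthy_models = {e["model"] for e in logs if e["status"] == "success"}
    if not (failing_models & healthy_models):
        return "provider-specific"  # some models always fail, others never do
    return "intermittent"           # same model both succeeds and fails: rate limits
```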
2. Categorize the failure
Click a failed request and inspect the timeline:
Scenario A: All provider attempts failed

```
┌──────────────────────────────────────────────────┐
│ ✗ OpenAI gpt-4o        failed    150ms           │
│ ✗ Anthropic claude-3.5 failed    120ms           │
│ ✗ Google gemini-1.5    failed    180ms           │
│   No successful request                          │
└──────────────────────────────────────────────────┘
```

→ Check: Are all providers rejecting the same input?

Scenario B: Provider succeeded, webhook failed

```
┌──────────────────────────────────────────────────┐
│ ✓ OpenAI gpt-4o        success   1,200ms         │
│ ✗ Webhook delivery     error     30,000ms        │
└──────────────────────────────────────────────────┘
```

→ Check: Is your webhook endpoint responding?

Scenario C: Everything worked except callback

```
┌──────────────────────────────────────────────────┐
│ ✓ OpenAI gpt-4o        success   1,200ms         │
│ ✓ Webhook delivery     success   45ms            │
│ ✗ Backend callback     timeout   300,000ms       │
└──────────────────────────────────────────────────┘
```

→ Check: Is your backend processing and calling back?

3. Inspect error details
Click the failed item and review:
Provider errors:
```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "This model's maximum context length is 128000 tokens. However, your messages resulted in 135420 tokens. Please reduce the length of the messages.",
    "code": "context_length_exceeded"
  }
}
```

Webhook errors:
```
Status: Error
HTTP Status: 502
Error: Bad Gateway
Duration: 120ms
URL: https://api.yourapp.com/webhooks/modelriver
```

4. Apply the fix
Based on the error category:
- Rate limits: Add more providers, implement request queuing, or upgrade your plan
- Context length exceeded: Trim conversation history, summarize older messages, or use a model with a larger context window
- Authentication failures: Navigate to Settings → Providers and update your API keys
- Webhook failures: Fix your endpoint, then use the Retry button in Request Logs
- Callback timeouts: Optimize your backend processing time or increase parallelism
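For the context-length fix, trimming the oldest turns while preserving the system prompt is a common approach. A sketch using a rough 4-characters-per-token estimate (swap in your provider's tokenizer, such as tiktoken for OpenAI, when exact counts matter):

```python
def estimate_tokens(message):
    """Very rough heuristic (~4 characters per token); not tokenizer-accurate."""
    return max(1, len(message["content"]) // 4)

def trim_history(messages, max_tokens):
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(estimate_tokens(m) for m in system + rest) > max_tokens:
        rest.pop(0)  # oldest turn goes first; the system prompt is always kept
    return system + rest
```

Summarizing the dropped turns into a single message is a gentler variant of the same idea when older context still matters.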
5. Verify the fix
After applying the fix:
- Run a test in the Playground
- Check Request Logs for the new test request
- Confirm the timeline shows success
- Monitor for 30 minutes to ensure the fix holds
Advanced failure analysis
Identifying cascading failures
When one component fails, it can cascade:
1. Provider rate limit hit → failover to secondary provider
2. Secondary provider also rate limited → failover to tertiary
3. All providers exhausted → request fails
4. Webhook not sent (no response to deliver) → backend never notified
5. Downstream features that depend on the AI response also fail

How to trace: Open the failed request, review each timeline item, and note the chain of events. The first failure is usually the root cause.
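Because the first failure in the chain is usually the root cause, finding it programmatically is trivial once the timeline items are in chronological order. A sketch, assuming items are dicts with `name` and `ok` keys (an illustrative shape, not the real timeline schema):

```python
def root_cause(timeline):
    """Return the first failed item in a chronological timeline, or None."""
    return next((item for item in timeline if not item["ok"]), None)
```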
Provider-specific error patterns
Common patterns by provider:
OpenAI:
- `rate_limit_exceeded` — Most common during peak hours
- `context_length_exceeded` — Prompt too long for the selected model
- `invalid_api_key` — Key rotated or revoked
Anthropic:
- `overloaded` — High demand periods
- `invalid_request_error` — Format mismatch
Google:
- `RESOURCE_EXHAUSTED` — Quota exceeded
- `INVALID_ARGUMENT` — Parameter validation failure
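When aggregating failures across providers, it helps to normalize these provider-specific codes into the shared categories used in the tables above. A sketch with an illustrative, easily extended mapping:

```python
# Illustrative mapping from provider-specific codes to cross-provider
# categories; extend it as new codes show up in your logs.
ERROR_CATEGORY = {
    "rate_limit_exceeded": "rate_limit",           # OpenAI
    "overloaded": "rate_limit",                    # Anthropic
    "RESOURCE_EXHAUSTED": "rate_limit",            # Google
    "context_length_exceeded": "invalid_request",  # OpenAI
    "invalid_request_error": "invalid_request",    # Anthropic
    "INVALID_ARGUMENT": "invalid_request",         # Google
    "invalid_api_key": "authentication",           # OpenAI
}

def categorize_error(code: str) -> str:
    """Map a raw provider error code to a cross-provider category."""
    return ERROR_CATEGORY.get(code, "unknown")
```

Normalized categories make it easy to chart, say, rate-limit incidents per provider without special-casing each provider's vocabulary.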
Next steps
- Provider Reliability — Track which providers fail most
- Webhook Delivery Monitoring — Ensure reliable webhook delivery
- Debugging Production Issues — Deep-dive debugging
- Back to Observability — Return to the overview