Debug production AI issues with full request visibility

Inspect exact request and response payloads, trace provider failover chains, and pinpoint the root cause of any AI request failure or unexpected result.

Overview

When an AI feature isn't working as expected in production, Request Logs provide everything you need to diagnose the problem. Every request captures the complete lifecycle—from the initial provider attempt through failovers, webhook deliveries, and backend callbacks—giving you full visibility into what happened and why.


Debugging workflow

Step 1: Isolate the problem

Start by filtering logs to narrow down the issue:

  1. Navigate to Request Logs in your project console
  2. Filter to Live mode to focus on production requests only
  3. Use the time selector to narrow to the period when the issue occurred
  4. Look for requests with red Error status badges

Why this works: Live mode filters out playground and test mode traffic, so you're only seeing real production requests. Error badges instantly highlight failing requests.

Step 2: Inspect the timeline

Click the problematic request to open the detail view. The timeline tells the complete story:

  • Were there failover attempts? – Amber/yellow badges indicate provider failures before the final result. This suggests provider instability rather than a configuration issue.
  • Did the main request succeed or fail? – A red badge on the main request means the actual AI call failed.
  • Were webhooks delivered? – For async requests, check if webhook deliveries succeeded. Failed webhooks mean your backend didn't receive the response.
  • Did callbacks complete? – For event-driven workflows, check callback status. Timeouts indicate your backend didn't respond within the 5-minute window.

Step 3: Examine request and response payloads

Click on each timeline item to view detailed information:

Request body inspection

  • Verify prompt content – Ensure the messages array contains the expected content
  • Check model configuration – Verify temperature, max_tokens, and other parameters
  • Review structured output schema – If using structured outputs, confirm the schema is correct
  • Look for data quality issues – Malformed input data can cause unexpected responses
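
For reference, the sketch below shows a request body in the common OpenAI-style chat-completion shape, along with a few quick sanity checks. The field names and checks are illustrative assumptions rather than ModelRiver's exact schema; always compare against the payload you actually see in the log.

```typescript
// Hypothetical request body in the common OpenAI-style chat-completion shape.
// Field names are assumptions for illustration; check your actual logged payload.
interface ChatRequest {
  model: string;
  temperature?: number;
  max_tokens?: number;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  response_format?: unknown; // structured output schema, if used
}

const loggedRequest: ChatRequest = {
  model: "gpt-4o-mini",
  temperature: 0.2,
  max_tokens: 512,
  messages: [
    { role: "system", content: "You are a support ticket summarizer." },
    { role: "user", content: "" }, // empty content: a typical data-quality bug
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "summary",
      schema: { type: "object", properties: { text: { type: "string" } } },
    },
  },
};

// Quick sanity checks you might run on a payload copied out of the log view.
const problems: string[] = [];
if (!loggedRequest.messages.some((m) => m.role === "system")) problems.push("no system prompt");
loggedRequest.messages.forEach((m, i) => {
  if (!m.content.trim()) problems.push(`message ${i} has empty content`);
});
if ((loggedRequest.temperature ?? 0) > 1.5) problems.push("temperature unusually high");
console.log(problems.length ? problems : "request body looks sane");
```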

Response body inspection

  • Read error messages – Provider error responses contain specific failure reasons (rate limits, invalid content, model not available, etc.)
  • Verify response structure – Ensure the response matches your expected format
  • Check token usage – Unusually high or low token counts may indicate issues with the prompt or response
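
As a reference point, the sketch below models the two response shapes you will typically see, a provider error envelope and a successful completion with token usage, using the common OpenAI-style format. The exact fields vary by provider, so treat this as an assumption and read the real body from the log.

```typescript
// Hypothetical provider response shapes (OpenAI-style). Your provider's exact
// envelope may differ; copy the real body from the log's response panel.
type ProviderResponse =
  | { error: { message: string; type: string; code?: string } }
  | {
      choices: { message: { content: string }; finish_reason: string }[];
      usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
    };

function summarize(body: ProviderResponse): string {
  if ("error" in body) {
    // The error message usually names the concrete cause: rate limit,
    // unknown model, content policy rejection, and so on.
    return `failed: [${body.error.type}] ${body.error.message}`;
  }
  const { finish_reason } = body.choices[0];
  const { total_tokens } = body.usage;
  // finish_reason "length" means the reply was cut off by max_tokens, a common
  // cause of truncated or invalid structured output.
  return `ok: finish_reason=${finish_reason}, total_tokens=${total_tokens}`;
}

console.log(summarize({ error: { message: "Rate limit reached", type: "rate_limit_error" } }));
```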

Step 4: Trace the failover chain

If the timeline shows failover attempts:

  1. Click each failed attempt to see why it failed
  2. Note the provider and model for each failure
  3. Check if failures are provider-specific (e.g., all OpenAI attempts failing) or model-specific
  4. Review error messages — common causes include:
    • Rate limiting – Provider's rate limit exceeded
    • Model not available – Model is temporarily unavailable
    • Invalid request – Request format doesn't match provider requirements
    • Authentication failure – Provider API key is invalid or expired
    • Content policy violation – Request content was rejected by the provider
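
If you need to triage many failed attempts at once, a small classifier over the HTTP status and error message can help. This is a sketch built on typical provider conventions (429 for rate limits, 401/403 for auth, 404 for unknown models, 400 for malformed requests); verify each case against the actual error text in the log.

```typescript
// Rough triage helper: map a failed attempt's HTTP status and error message to
// one of the causes listed above. The status-code conventions are typical of
// major providers but not universal.
type Cause =
  | "rate_limited"
  | "model_unavailable"
  | "invalid_request"
  | "auth_failure"
  | "content_policy"
  | "unknown";

function classifyFailure(status: number, message: string): Cause {
  const msg = message.toLowerCase();
  if (status === 429 || msg.includes("rate limit")) return "rate_limited";
  if (status === 401 || status === 403 || msg.includes("api key")) return "auth_failure";
  if (msg.includes("content policy") || msg.includes("safety")) return "content_policy";
  if (status === 404 || msg.includes("model")) return "model_unavailable";
  if (status === 400) return "invalid_request";
  return "unknown";
}

// Example: paste status/message pairs from each failed attempt in the chain.
console.log(classifyFailure(429, "Rate limit exceeded for gpt-4o"));            // "rate_limited"
console.log(classifyFailure(404, "The model `gpt-5-turbo` does not exist"));    // "model_unavailable"
```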

Step 5: Check webhook and callback flow

For async requests:

  1. Webhook delivery status – If "Error", inspect the error message and HTTP status code
  2. Webhook payload – Verify the payload sent to your endpoint matches expectations
  3. Webhook response – Check your endpoint's response for errors
  4. Callback status – For event-driven workflows, verify your callback was received

Common debugging scenarios

Requests returning wrong results

Symptoms: Request succeeds but the AI response is incorrect or unexpected.

Debugging steps:

  1. Open the request detail and inspect the Request Body
  2. Verify the system prompt and user messages are correct
  3. Check if structured output schema is properly defined
  4. Compare with a known-good request to spot differences (see the comparison sketch after this list)
  5. Review temperature and other generation parameters
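
Step 4 is easier with a quick diff. The helper below is a minimal, generic JSON comparison, not anything specific to the log UI: paste in a known-good request body and the failing one, and it prints the paths that differ.

```typescript
// Minimal sketch: compare a failing request body with a known-good one and
// print the paths where they differ. Purely generic JSON diffing; use whatever
// you copy from the Request Body panel.
function diffPaths(good: unknown, bad: unknown, path = ""): string[] {
  if (JSON.stringify(good) === JSON.stringify(bad)) return [];
  if (typeof good !== "object" || good === null || typeof bad !== "object" || bad === null) {
    return [`${path || "(root)"}: ${JSON.stringify(good)} -> ${JSON.stringify(bad)}`];
  }
  const keys = new Set([...Object.keys(good), ...Object.keys(bad)]);
  return [...keys].flatMap((k) =>
    diffPaths(
      (good as Record<string, unknown>)[k],
      (bad as Record<string, unknown>)[k],
      path ? `${path}.${k}` : k
    )
  );
}

const knownGood = { model: "gpt-4o", temperature: 0.2, messages: [{ role: "user", content: "Summarize: ..." }] };
const failing = { model: "gpt-4o", temperature: 1.9, messages: [{ role: "user", content: "" }] };
console.log(diffPaths(knownGood, failing));
// [ 'temperature: 0.2 -> 1.9', 'messages.0.content: "Summarize: ..." -> ""' ]
```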

Intermittent failures

Symptoms: Same request type sometimes succeeds and sometimes fails.

Debugging steps:

  1. Compare successful and failed requests side-by-side
  2. Check if failures correlate with specific providers or models
  3. Look for rate limiting patterns (failures clustered in time)
  4. Review failover behavior — successful requests may have used fallback providers
  5. Check Provider Reliability for trends
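
To make step 3 concrete, you can bucket failure timestamps by minute and look for bursts. The sketch below is generic; the timestamps are simply whatever you note down from the failed entries in the log list.

```typescript
// Sketch: count failures per minute to see whether they cluster (a typical
// rate-limiting signature) or are spread evenly (more likely provider flakiness).
function failuresPerMinute(timestamps: Date[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const t of timestamps) {
    const key = t.toISOString().slice(0, 16); // YYYY-MM-DDTHH:MM
    buckets.set(key, (buckets.get(key) ?? 0) + 1);
  }
  return buckets;
}

const failures = [
  new Date("2024-05-01T10:02:03Z"),
  new Date("2024-05-01T10:02:41Z"),
  new Date("2024-05-01T10:02:55Z"),
  new Date("2024-05-01T14:30:10Z"),
];
console.log(failuresPerMinute(failures));
// "2024-05-01T10:02" => 3, "2024-05-01T14:30" => 1; the 10:02 burst points at
// rate limiting rather than random failures.
```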

High latency requests

Symptoms: Requests are completing but taking much longer than expected.

Debugging steps:

  1. Check the Duration column in the log list
  2. Open slow requests and check the timeline for multiple failover attempts
  3. Compare provider latency — some providers may be consistently slower
  4. Look for unusually high token counts that could increase processing time
  5. See Performance Monitoring for trend analysis

Webhook delivery failures

Symptoms: Your backend isn't receiving webhook notifications.

Debugging steps:

  1. Open the request detail and check the webhook delivery status in the timeline
  2. Inspect the error message — common causes include:
    • Connection refused – Your endpoint is down or unreachable
    • Timeout – Your endpoint is too slow to respond
    • Non-2xx response – Your endpoint returned an error
  3. Verify the webhook URL is correct
  4. Use the Retry button (if available) after fixing the issue
  5. Check your endpoint logs for processing errors
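
A frequent root cause for the Timeout and Non-2xx cases is a handler that does heavy work before responding. Below is a minimal receiver sketch using Node's built-in http module; the route, port, and payload fields are placeholders, and the point is simply to acknowledge with a 200 as soon as the body is read and do slow processing afterwards.

```typescript
// Minimal webhook receiver sketch (Node built-in http, no framework).
// Key idea: return 200 immediately, then do the slow work after responding.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/ai-webhook") {
    res.writeHead(404).end();
    return;
  }

  let raw = "";
  req.on("data", (chunk) => (raw += chunk.toString()));
  req.on("end", () => {
    // Acknowledge right away so the delivery is recorded as successful.
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ received: true }));

    // Heavy work (DB writes, downstream calls) happens after the response.
    try {
      const payload = JSON.parse(raw); // payload shape: check the log's webhook panel
      console.log("webhook for request:", payload.request_id ?? "(unknown id)");
    } catch (err) {
      console.error("could not parse webhook payload", err);
    }
  });
});

server.listen(3000, () => console.log("listening on :3000/ai-webhook"));
```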

Callback timeouts

Symptoms: Event-driven workflow requests fail with callback timeout.

Debugging steps:

  1. Open the request detail and check the callback status
  2. Verify your backend received the webhook (check webhook delivery status)
  3. Ensure your backend is calling the callback_url within 5 minutes (a sketch follows this list)
  4. Check your backend logs for processing errors
  5. Verify the callback payload format matches ModelRiver's expectations
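
For step 3, the sketch below shows one way to post a result back to the callback_url from your webhook handler well inside the 5-minute window. The result body and the payload fields other than callback_url are hypothetical; confirm the expected callback format against ModelRiver's documentation.

```typescript
// Sketch of the callback leg: after your webhook handler has done its work,
// POST the result to the callback_url from the webhook payload.
async function completeCallback(callbackUrl: string, result: unknown): Promise<void> {
  const res = await fetch(callbackUrl, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(result),
  });
  if (!res.ok) {
    // A non-2xx here will surface as a failed or timed-out callback in the log.
    throw new Error(`callback rejected: HTTP ${res.status}`);
  }
}

// Example usage inside webhook processing, with a hypothetical payload shape.
async function handleWebhookPayload(payload: { callback_url: string; request_id: string }) {
  const result = { request_id: payload.request_id, status: "processed" }; // placeholder body
  await completeCallback(payload.callback_url, result);
}
```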

Best practices for debugging

Keep production logs clean

  • Use test mode and playground for development and testing
  • Filter to Live mode when debugging production issues
  • This ensures you're analyzing real user traffic, not test noise

Use the timeline for context

  • Always review the complete timeline, not just the final result
  • Failover attempts often reveal the root cause of issues
  • Webhook and callback status shows the full async request lifecycle

Compare with known-good requests

  • When debugging unexpected results, find a similar successful request
  • Compare request bodies to identify differences
  • Similar requests with different outcomes often reveal the issue

Monitor patterns, not just individual failures

  • Single failures may be transient (provider hiccups)
  • Repeated failures indicate systemic issues
  • Use Provider Reliability to track failure patterns

Next steps