Debug production AI issues with full request visibility

Inspect exact request and response payloads, trace provider failover chains, and pinpoint the root cause of any AI request failure or unexpected result.

Overview

When an AI feature isn't working as expected in production, Request Logs provide everything you need to diagnose the problem. Every request captures the complete lifecycle—from the initial provider attempt through failovers, webhook deliveries, and backend callbacks—giving you full visibility into what happened and why.


Debugging workflow

Step 1: Isolate the problem

Start by filtering logs to narrow down the issue:

  1. Navigate to Request Logs in your project console
  2. Filter to Live mode to focus on production requests only
  3. Use the time selector to narrow to the period when the issue occurred
  4. Look for requests with red Error status badges

Why this works: Live mode filters out playground and test mode traffic, so you're only seeing real production requests. Error badges instantly highlight failing requests.

Step 2: Inspect the timeline

Click the problematic request to open the detail view. The timeline tells the complete story:

  • Were there failover attempts? – Amber/yellow badges indicate provider failures before the final result. This suggests provider instability rather than a configuration issue.
  • Did the main request succeed or fail? – A red badge on the main request means the actual AI call failed.
  • Were webhooks delivered? – For async requests, check if webhook deliveries succeeded. Failed webhooks mean your backend didn't receive the response.
  • Did callbacks complete? – For event-driven workflows, check callback status. Timeouts indicate your backend didn't respond within the 5-minute window.

Step 3: Examine request and response payloads

Click on each timeline item to view detailed information:

Request body inspection

  • Verify prompt content – Ensure the messages array contains the expected content
  • Check model configuration – Verify temperature, max_tokens, and other parameters
  • Review structured output schema – If using structured outputs, confirm the schema is correct
  • Look for data quality issues – Malformed input data can cause unexpected responses
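
For reference, the sketch below shows a request body in the common OpenAI-style chat-completion shape, along with a few quick sanity checks. The field names and checks are illustrative assumptions rather than ModelRiver's exact schema; always compare against the payload you actually see in the log.

```typescript
// Hypothetical request body in the common OpenAI-style chat-completion shape.
// Field names are assumptions for illustration; check your actual logged payload.
interface ChatRequest {
  model: string;
  temperature?: number;
  max_tokens?: number;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  response_format?: unknown; // structured output schema, if used
}

const loggedRequest: ChatRequest = {
  model: "gpt-4o-mini",
  temperature: 0.2,
  max_tokens: 512,
  messages: [
    { role: "system", content: "You are a support ticket summarizer." },
    { role: "user", content: "" }, // empty content: a typical data-quality bug
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "summary",
      schema: { type: "object", properties: { text: { type: "string" } } },
    },
  },
};

// Quick sanity checks you might run on a payload copied out of the log view.
const problems: string[] = [];
if (!loggedRequest.messages.some((m) => m.role === "system")) problems.push("no system prompt");
loggedRequest.messages.forEach((m, i) => {
  if (!m.content.trim()) problems.push(`message ${i} has empty content`);
});
if ((loggedRequest.temperature ?? 0) > 1.5) problems.push("temperature unusually high");
console.log(problems.length ? problems : "request body looks sane");
```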

Response body inspection

  • Read error messages – Provider error responses contain specific failure reasons (rate limits, invalid content, model not available, etc.)
  • Verify response structure – Ensure the response matches your expected format
  • Check token usage – Unusually high or low token counts may indicate issues with the prompt or response
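
As a reference point, the sketch below models the two response shapes you will typically see, a provider error envelope and a successful completion with token usage, using the common OpenAI-style format. The exact fields vary by provider, so treat this as an assumption and read the real body from the log.

```typescript
// Hypothetical provider response shapes (OpenAI-style). Your provider's exact
// envelope may differ; copy the real body from the log's response panel.
type ProviderResponse =
  | { error: { message: string; type: string; code?: string } }
  | {
      choices: { message: { content: string }; finish_reason: string }[];
      usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
    };

function summarize(body: ProviderResponse): string {
  if ("error" in body) {
    // The error message usually names the concrete cause: rate limit,
    // unknown model, content policy rejection, and so on.
    return `failed: [${body.error.type}] ${body.error.message}`;
  }
  const { finish_reason } = body.choices[0];
  const { total_tokens } = body.usage;
  // finish_reason "length" means the reply was cut off by max_tokens, a common
  // cause of truncated or invalid structured output.
  return `ok: finish_reason=${finish_reason}, total_tokens=${total_tokens}`;
}

console.log(summarize({ error: { message: "Rate limit reached", type: "rate_limit_error" } }));
```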

Step 4: Trace the failover chain

If the timeline shows failover attempts:

  1. Click each failed attempt to see why it failed
  2. Note the provider and model for each failure
  3. Check if failures are provider-specific (e.g., all OpenAI attempts failing) or model-specific
  4. Review error messages — common causes include:
    • Rate limiting – Provider's rate limit exceeded
    • Model not available – Model is temporarily unavailable
    • Invalid request – Request format doesn't match provider requirements
    • Authentication failure – Provider API key is invalid or expired
    • Content policy violation – Request content was rejected by the provider
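
If you need to triage many failed attempts at once, a small classifier over the HTTP status and error message can help. This is a sketch built on typical provider conventions (429 for rate limits, 401/403 for auth, 404 for unknown models, 400 for malformed requests); verify each case against the actual error text in the log.

```typescript
// Rough triage helper: map a failed attempt's HTTP status and error message to
// one of the causes listed above. The status-code conventions are typical of
// major providers but not universal.
type Cause =
  | "rate_limited"
  | "model_unavailable"
  | "invalid_request"
  | "auth_failure"
  | "content_policy"
  | "unknown";

function classifyFailure(status: number, message: string): Cause {
  const msg = message.toLowerCase();
  if (status === 429 || msg.includes("rate limit")) return "rate_limited";
  if (status === 401 || status === 403 || msg.includes("api key")) return "auth_failure";
  if (msg.includes("content policy") || msg.includes("safety")) return "content_policy";
  if (status === 404 || msg.includes("model")) return "model_unavailable";
  if (status === 400) return "invalid_request";
  return "unknown";
}

// Example: paste status/message pairs from each failed attempt in the chain.
console.log(classifyFailure(429, "Rate limit exceeded for gpt-4o"));            // "rate_limited"
console.log(classifyFailure(404, "The model `gpt-5-turbo` does not exist"));    // "model_unavailable"
```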

Step 5: Check webhook and callback flow

For async requests:

  1. Webhook delivery status – If "Error", inspect the error message and HTTP status code
  2. Webhook payload – Verify the payload sent to your endpoint matches expectations
  3. Webhook response – Check your endpoint's response for errors
  4. Callback status – For event-driven workflows, verify your callback was received

Common debugging scenarios

Requests returning wrong results

Symptoms: Request succeeds but the AI response is incorrect or unexpected.

Debugging steps:

  1. Open the request detail and inspect the Request Body
  2. Verify the system prompt and user messages are correct
  3. Check if structured output schema is properly defined
  4. Compare with a known-good request to spot differences (see the comparison sketch after this list)
  5. Review temperature and other generation parameters
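
Step 4 is easier with a quick diff. The helper below is a minimal, generic JSON comparison, not anything specific to the log UI: paste in a known-good request body and the failing one, and it prints the paths that differ.

```typescript
// Minimal sketch: compare a failing request body with a known-good one and
// print the paths where they differ. Purely generic JSON diffing; use whatever
// you copy from the Request Body panel.
function diffPaths(good: unknown, bad: unknown, path = ""): string[] {
  if (JSON.stringify(good) === JSON.stringify(bad)) return [];
  if (typeof good !== "object" || good === null || typeof bad !== "object" || bad === null) {
    return [`${path || "(root)"}: ${JSON.stringify(good)} -> ${JSON.stringify(bad)}`];
  }
  const keys = new Set([...Object.keys(good), ...Object.keys(bad)]);
  return [...keys].flatMap((k) =>
    diffPaths(
      (good as Record<string, unknown>)[k],
      (bad as Record<string, unknown>)[k],
      path ? `${path}.${k}` : k
    )
  );
}

const knownGood = { model: "gpt-4o", temperature: 0.2, messages: [{ role: "user", content: "Summarize: ..." }] };
const failing = { model: "gpt-4o", temperature: 1.9, messages: [{ role: "user", content: "" }] };
console.log(diffPaths(knownGood, failing));
// [ 'temperature: 0.2 -> 1.9', 'messages.0.content: "Summarize: ..." -> ""' ]
```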

Intermittent failures

Symptoms: Same request type sometimes succeeds and sometimes fails.

Debugging steps:

  1. Compare successful and failed requests side-by-side
  2. Check if failures correlate with specific providers or models
  3. Look for rate limiting patterns (failures clustered in time)
  4. Review failover behavior — successful requests may have used fallback providers
  5. Check Provider Reliability for trends
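
To make step 3 concrete, you can bucket failure timestamps by minute and look for bursts. The sketch below is generic; the timestamps are simply whatever you note down from the failed entries in the log list.

```typescript
// Sketch: count failures per minute to see whether they cluster (a typical
// rate-limiting signature) or are spread evenly (more likely provider flakiness).
function failuresPerMinute(timestamps: Date[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const t of timestamps) {
    const key = t.toISOString().slice(0, 16); // YYYY-MM-DDTHH:MM
    buckets.set(key, (buckets.get(key) ?? 0) + 1);
  }
  return buckets;
}

const failures = [
  new Date("2024-05-01T10:02:03Z"),
  new Date("2024-05-01T10:02:41Z"),
  new Date("2024-05-01T10:02:55Z"),
  new Date("2024-05-01T14:30:10Z"),
];
console.log(failuresPerMinute(failures));
// "2024-05-01T10:02" => 3, "2024-05-01T14:30" => 1; the 10:02 burst points at
// rate limiting rather than random failures.
```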

High latency requests

Symptoms: Requests are completing but taking much longer than expected.

Debugging steps:

  1. Check the Duration column in the log list
  2. Open slow requests and check the timeline for multiple failover attempts
  3. Compare provider latency — some providers may be consistently slower
  4. Look for unusually high token counts that could increase processing time
  5. See Performance Monitoring for trend analysis

Webhook delivery failures

Symptoms: Your backend isn't receiving webhook notifications.

Debugging steps:

  1. Open the request detail and check the webhook delivery status in the timeline
  2. Inspect the error message — common causes include:
    • Connection refused – Your endpoint is down or unreachable
    • Timeout – Your endpoint is too slow to respond
    • Non-2xx response – Your endpoint returned an error
  3. Verify the webhook URL is correct
  4. Use the Retry button (if available) after fixing the issue
  5. Check your endpoint logs for processing errors
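
A frequent root cause for the Timeout and Non-2xx cases is a handler that does heavy work before responding. Below is a minimal receiver sketch using Node's built-in http module; the route, port, and payload fields are placeholders, and the point is simply to acknowledge with a 200 as soon as the body is read and do slow processing afterwards.

```typescript
// Minimal webhook receiver sketch (Node built-in http, no framework).
// Key idea: return 200 immediately, then do the slow work after responding.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/ai-webhook") {
    res.writeHead(404).end();
    return;
  }

  let raw = "";
  req.on("data", (chunk) => (raw += chunk.toString()));
  req.on("end", () => {
    // Acknowledge right away so the delivery is recorded as successful.
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ received: true }));

    // Heavy work (DB writes, downstream calls) happens after the response.
    try {
      const payload = JSON.parse(raw); // payload shape: check the log's webhook panel
      console.log("webhook for request:", payload.request_id ?? "(unknown id)");
    } catch (err) {
      console.error("could not parse webhook payload", err);
    }
  });
});

server.listen(3000, () => console.log("listening on :3000/ai-webhook"));
```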

Callback timeouts

Symptoms: Event-driven workflow requests fail with callback timeout.

Debugging steps:

  1. Open the request detail and check the callback status
  2. Verify your backend received the webhook (check webhook delivery status)
  3. Ensure your backend is calling the callback_url within 5 minutes (a sketch follows this list)
  4. Check your backend logs for processing errors
  5. Verify the callback payload format matches ModelRiver's expectations
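
For step 3, the sketch below shows one way to post a result back to the callback_url from your webhook handler well inside the 5-minute window. The result body and the payload fields other than callback_url are hypothetical; confirm the expected callback format against ModelRiver's documentation.

```typescript
// Sketch of the callback leg: after your webhook handler has done its work,
// POST the result to the callback_url from the webhook payload.
async function completeCallback(callbackUrl: string, result: unknown): Promise<void> {
  const res = await fetch(callbackUrl, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(result),
  });
  if (!res.ok) {
    // A non-2xx here will surface as a failed or timed-out callback in the log.
    throw new Error(`callback rejected: HTTP ${res.status}`);
  }
}

// Example usage inside webhook processing, with a hypothetical payload shape.
async function handleWebhookPayload(payload: { callback_url: string; request_id: string }) {
  const result = { request_id: payload.request_id, status: "processed" }; // placeholder body
  await completeCallback(payload.callback_url, result);
}
```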

Best practices for debugging

Keep production logs clean

  • Use test mode and playground for development and testing
  • Filter to Live mode when debugging production issues
  • This ensures you're analyzing real user traffic, not test noise

Use the timeline for context

  • Always review the complete timeline, not just the final result
  • Failover attempts often reveal the root cause of issues
  • Webhook and callback status shows the full async request lifecycle

Compare with known-good requests

  • When debugging unexpected results, find a similar successful request
  • Compare request bodies to identify differences
  • Similar requests with different outcomes often reveal the issue

Monitor patterns, not just individual failures

  • Single failures may be transient (provider hiccups)
  • Repeated failures indicate systemic issues
  • Use Provider Reliability to track failure patterns

Next steps