Test workflows safely before production

Use playground and test mode to validate AI workflow changes without affecting production traffic or consuming unnecessary provider credits.

Overview

Before deploying workflow changes to production, you need confidence that everything works correctly. ModelRiver provides three testing environments — Playground (Production), Playground (Test mode), and Test mode (API) — each with its own purpose. Request Logs let you review every test run in detail.

Testing environments explained

Playground (Production)

  • What it does: Runs your workflow with real provider API calls
  • When to use: Final validation before deploying, testing with real AI responses
  • Cost: Consumes provider credits (real API calls)
  • Log filter: Playground (Production)
  • Seed batch prefix: pg:

Playground (Test mode)

  • What it does: Runs your workflow with sample data from structured outputs
  • When to use: Testing workflow configuration and data flow without consuming credits
  • Cost: Free — no provider API calls
  • Log filter: Playground (Test mode)
  • Seed batch prefix: pg_test_mode:

Test mode (API)

  • What it does: Accepts API calls with the test mode flag, using sample data instead of real providers
  • When to use: Automated testing, CI/CD pipelines, integration test suites
  • Cost: Free — no provider API calls
  • Log filter: Test mode
  • Seed batch prefix: test:

Step-by-step testing workflow

1. Validate in test mode first

Start in Playground (Test mode) to verify the basic workflow structure:

  1. Open your workflow in the console
  2. Click Test (the beaker icon) to run in test mode
  3. Navigate to Request Logs → filter to Playground (Test mode)
  4. Inspect the request and response to confirm correct data flow

Example test mode response:

JSON
{
  "choices": [
    {
      "message": {
        "content": "{\"product_name\": \"Sample Product\", \"description\": \"This is a test response using structured output sample data.\"}",
        "role": "assistant"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

Notice: Token usage is 0 in test mode because no real provider call was made. The response content comes from your structured output's sample data.
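
Because the content field here is an escaped JSON string, it can be handy to unwrap it when inspecting a test run locally. A small sketch, assuming the response body above is saved to a file named response.json and jq is installed:

Bash
# response.json: the test mode response body saved locally (an assumed filename)
# Extract the message content (an escaped JSON string) and pretty-print it as JSON
jq -r '.choices[0].message.content' response.json | jq .

This prints the structured output sample data as plain JSON, which is easier to read than the escaped string.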

2. Test with real providers

Once you're confident in the workflow structure, test with real providers:

  1. Click Run in the Playground (this uses Production mode)
  2. Navigate to Request Logs → filter to Playground (Production)
  3. Inspect the full timeline — verify provider selection, token usage, and response quality
  4. Review the response body for accuracy and correctness

3. Validate webhook and callback flow

For async and event-driven workflows:

  1. Run the workflow in playground
  2. Check the timeline for webhook delivery status
  3. Verify the webhook payload matches your backend's expectations
  4. For event-driven workflows, confirm the callback was received

Example playground test timeline:

  OpenAI gpt-4o-mini: success (650ms)
  Webhook delivery: success (32ms)
  Backend callback: simulated

Note: In playground mode, backend callbacks are simulated — ModelRiver doesn't wait for your actual backend to respond. This lets you test the AI generation and webhook delivery independently.
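
To check the webhook payload itself against your backend's expectations, you can temporarily point the workflow's webhook URL at a local listener and inspect what arrives. A rough sketch, assuming ncat (from the nmap package) is installed and the webhook URL points at a host and port that reach this machine:

Bash
# Print every incoming webhook request (headers and JSON body) and keep a copy
# -l listens, -k keeps the listener open across requests; 8080 is an arbitrary local port
ncat -lk 8080 | tee webhook-capture.log

A raw listener like this never sends an HTTP response, so the delivery may show as failed or be retried on the ModelRiver side; it is only meant for inspecting the payload shape while you iterate.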

4. Compare with production behavior

After deploying changes:

  1. Filter to Live mode to see real production requests
  2. Compare response quality, latency, and token usage with your playground tests
  3. Monitor for any unexpected failures or behavior changes
  4. If issues arise, compare the failing production request with your successful playground test

CI/CD integration example

Use the test mode API in your CI/CD pipeline:

Bash
# Run a test mode request
curl -X POST https://api.modelriver.com/v1/chat/completions \
  -H "Authorization: Bearer $MODELRIVER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-workflow-id",
    "messages": [{"role": "user", "content": "Test message"}],
    "test_mode": true
  }'

# Check the response status
# Response will use sample data, not real providers
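
In a pipeline you usually want the job to fail when the response doesn't look right. A minimal follow-up check, assuming the curl command above is run with -s and its output redirected to response.json, and that jq is available; the field names mirror the test mode response shown earlier:

Bash
# response.json: output of the curl command above, saved to a file (assumed name)

# Fail the job if no message content came back
jq -e '.choices[0].message.content | length > 0' response.json

# Fail the job if real provider tokens were consumed (test mode should report 0)
jq -e '.usage.total_tokens == 0' response.json

With set -e (or an explicit check of the exit code), either failing assertion stops the pipeline.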

Then verify in Request Logs:

  1. Filter to Test mode
  2. Confirm the request was logged correctly
  3. Validate the response structure matches expectations

Tips for effective testing

  • Always test both happy path and edge cases — Test with various input types, empty messages, and long prompts (see the sketch after this list)
  • Verify structured outputs — If your workflow uses structured outputs, confirm the schema is enforced correctly
  • Test failover behavior — If possible, configure a test scenario where the primary provider fails to verify fallback works
  • Keep test data realistic — Use production-like prompts in your test data for accurate validation
  • Review token usage — Compare token usage between test and production to catch unexpected differences
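
For the first tip, one lightweight approach is to loop a test mode request over a handful of fixture prompts. A sketch, assuming hypothetical fixture files under test-prompts/ that each contain a single prompt, and reusing the endpoint and test_mode flag from the CI/CD example above:

Bash
# test-prompts/ is a placeholder directory of one-prompt-per-file fixtures
# Run a test mode request for each fixture (empty, very long, unusual input, ...)
for f in test-prompts/*.txt; do
  echo "== $f =="
  curl -s -X POST https://api.modelriver.com/v1/chat/completions \
    -H "Authorization: Bearer $MODELRIVER_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg content "$(cat "$f")" \
          '{model: "your-workflow-id", messages: [{role: "user", content: $content}], test_mode: true}')" \
    | jq '.choices[0].finish_reason, .usage.total_tokens'
done

Building the request body with jq -n keeps long or oddly quoted prompts properly escaped.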

Next steps