Overview
Every AI request through ModelRiver is logged with detailed token usage and pricing information, enabling you to understand exactly where your AI budget is going. By analyzing cost data across providers, models, and workflows, you can make data-driven decisions to optimize spending without sacrificing quality.
Important: The cost shown in Request Logs is an estimated cost based on the tokens sent and received, calculated using the pricing configured in your ModelRiver model definitions. For the most accurate and authoritative cost data, always refer to the billing dashboards of your specific AI providers (e.g., OpenAI Usage, Anthropic Console, Google Cloud billing).
Understanding cost data in Request Logs
Token usage breakdown
Each request log captures three key token metrics:
| Metric | Description | Cost implication |
|---|---|---|
| Prompt tokens (input) | Tokens in the request sent to the provider | Typically lower cost per token |
| Completion tokens (output) | Tokens in the response generated by the provider | Typically higher cost per token |
| Total tokens | Sum of prompt and completion tokens | Total consumption for the request |
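These metrics originate in the usage data returned by the provider. As a minimal sketch (in Python, assuming the gateway exposes an OpenAI-compatible endpoint — the base URL, key, and model below are placeholders, not a prescribed ModelRiver configuration), this is how the same three numbers appear on a response:

```python
from openai import OpenAI

# Placeholder endpoint and key -- substitute your own values.
client = OpenAI(base_url="https://your-modelriver-endpoint/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this ticket: 'My invoice is wrong.'"}],
)

usage = response.usage
print("Prompt tokens:    ", usage.prompt_tokens)      # input side of the request
print("Completion tokens:", usage.completion_tokens)  # output generated by the model
print("Total tokens:     ", usage.total_tokens)       # prompt + completion
```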
Price field
The Price field shows the estimated cost of each individual request, calculated as:
estimated_cost = (prompt_tokens × input_price_per_token) + (completion_tokens × output_price_per_token)
This calculation uses the pricing information stored in your ModelRiver model definitions. The actual cost charged by the provider may differ due to:
- Provider pricing updates that haven't been reflected in your model definitions
- Cached tokens or context that receive discounted pricing
- Batch pricing or volume discounts from the provider
- Currency conversion or rounding differences
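As a concrete illustration of the formula above, here is a minimal sketch in Python. The per-token prices are made-up placeholder values, not real provider rates — use the pricing from your own ModelRiver model definitions:

```python
# Placeholder per-token prices (illustrative only -- provider rates are usually
# quoted per 1M tokens, so divide accordingly).
INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000    # e.g. $0.15 per 1M input tokens
OUTPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000   # e.g. $0.60 per 1M output tokens

def estimated_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Mirrors the Price field calculation described above."""
    return (prompt_tokens * INPUT_PRICE_PER_TOKEN
            + completion_tokens * OUTPUT_PRICE_PER_TOKEN)

# A request with 1,200 prompt tokens and 350 completion tokens:
print(f"${estimated_cost(1_200, 350):.6f}")  # -> $0.000390
```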
Cost analysis workflow
Step 1: Filter to production traffic
- Navigate to Request Logs in your project console
- Filter to Live mode to exclude test and playground traffic
- This ensures you're analyzing actual production costs
Step 2: Identify high-cost requests
Review the logs and look for:
- High token counts – Requests with unusually large prompt or completion token values
- Expensive models – Requests using premium models (e.g., gpt-4o vs gpt-4o-mini)
- Failed attempts with costs – Failover chains where failed provider attempts still consumed tokens
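One way to surface these automatically is to scan an export of your request logs. The sketch below assumes a list of entries with model, prompt_tokens, completion_tokens, and price fields — hypothetical field names, so adapt them to your export format:

```python
# Hypothetical exported log entries -- field names and values are illustrative.
logs = [
    {"model": "gpt-4o", "prompt_tokens": 9_500, "completion_tokens": 2_100, "price": 0.0446},
    {"model": "gpt-4o-mini", "prompt_tokens": 800, "completion_tokens": 150, "price": 0.0002},
    {"model": "gpt-4o", "prompt_tokens": 1_100, "completion_tokens": 300, "price": 0.0057},
]

PRICE_THRESHOLD = 0.01   # flag anything above one cent per request
TOKEN_THRESHOLD = 8_000  # or with unusually large prompt + completion totals

expensive = [
    entry for entry in logs
    if entry["price"] > PRICE_THRESHOLD
    or entry["prompt_tokens"] + entry["completion_tokens"] > TOKEN_THRESHOLD
]

for entry in expensive:
    print(f"{entry['model']}: {entry['prompt_tokens']}+{entry['completion_tokens']} tokens, ${entry['price']:.4f}")
```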
Step 3: Analyze spending patterns
Look across multiple requests to identify trends:
- Which models drive the most cost? – Compare token usage across different models
- Are prompts efficiently sized? – Large prompt tokens may indicate excessive system prompts or context
- Are completions appropriately bounded? – Very large completion tokens may suggest missing max_tokens limits
- Are failovers adding hidden costs? – Failed provider attempts still incur token usage on providers that processed the request before failing
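A quick way to answer the first question is to group an exported slice of logs by model and sum the estimated price, as in this sketch (same hypothetical field names as the earlier example):

```python
from collections import defaultdict

def cost_by_model(logs: list[dict]) -> dict[str, float]:
    """Sum estimated price per model to see where spend concentrates."""
    totals: dict[str, float] = defaultdict(float)
    for entry in logs:
        totals[entry["model"]] += entry["price"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# print(cost_by_model(logs))  # e.g. {"gpt-4o": 0.0503, "gpt-4o-mini": 0.0002}
```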
Step 4: Optimize and validate
Apply optimizations and track their impact:
- Switch to more cost-effective models for appropriate use cases
- Optimize prompt length — reduce unnecessary context
- Set max_tokens limits to prevent runaway completions
- Adjust provider fallback order to prefer cheaper providers first
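To validate the impact of a change, compare the average estimated cost per request before and after it. A minimal sketch, assuming you can export log entries for the two periods:

```python
def average_price(entries: list[dict]) -> float:
    """Average estimated price per request for a slice of exported logs."""
    return sum(e["price"] for e in entries) / len(entries) if entries else 0.0

# before = [...]  # exported entries from the week before the change
# after = [...]   # exported entries from the week after
# print(f"avg before: ${average_price(before):.4f}, after: ${average_price(after):.4f}")
```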
Cost optimization strategies
Choose the right model for the task
Not every request needs the most powerful (and expensive) model:
| Use case | Recommended approach |
|---|---|
| Simple classification or extraction | Use smaller models (e.g., gpt-4o-mini, claude-3-5-haiku) |
| Creative writing or complex reasoning | Use larger models (e.g., gpt-4o, claude-3-5-sonnet) |
| High-volume batch processing | Use the most cost-efficient model that meets quality requirements |
| Critical business logic | Balance quality and cost — use larger models with shorter prompts |
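One way to apply this table in code is a simple routing map from task type to model. The task names and model choices below are illustrative defaults, not a prescribed configuration — tune the mapping to your own quality bar:

```python
# Illustrative task-to-model routing -- adjust models and tasks to your needs.
MODEL_BY_TASK = {
    "classification": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "creative_writing": "gpt-4o",
    "complex_reasoning": "gpt-4o",
}
DEFAULT_MODEL = "gpt-4o-mini"

def pick_model(task_type: str) -> str:
    """Route simple tasks to cheaper models, reserving larger models for hard ones."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)
```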
Optimize prompt tokens
Prompt (input) tokens are charged on every request:
- Trim system prompts – Remove unnecessary instructions or examples
- Use structured outputs – Define schemas to get precise responses instead of instructing the model to format output
- Compress context – Summarize long conversation histories instead of sending full transcripts
- Avoid redundant data – Don't include data the model doesn't need
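For long conversations, a common pattern is to send only a rolling window of recent turns (plus a short summary of older ones) instead of the full transcript. A minimal sketch of the windowing part, assuming OpenAI-style message dicts:

```python
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the system prompt plus the most recent turns; summarize or drop the rest."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```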
Control completion tokens
Set limits to prevent unexpectedly large responses:
- Use max_tokens – Set appropriate limits based on your expected response size
- Use structured outputs – Schema-driven responses naturally limit output to relevant fields
- Be specific in prompts – "Answer in one sentence" is cheaper than "Explain in detail"
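Putting the first and third points together, here is a hedged sketch of a request with a bound on output size (assuming an OpenAI-compatible client pointed at your gateway; the endpoint, key, and limit are placeholders):

```python
from openai import OpenAI

# Placeholder endpoint and key -- assumes the gateway speaks the OpenAI API shape.
client = OpenAI(base_url="https://your-modelriver-endpoint/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this ticket in one sentence: ..."}],
    max_tokens=100,  # hard ceiling on completion tokens, and therefore on output cost
)
print(response.choices[0].message.content)
```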
Reduce failover costs
Failover attempts can add hidden costs:
- Monitor failover frequency with Provider Reliability
- Order providers by cost – Put cheaper providers first in your fallback chain
- Fix unreliable providers – Address the root cause of failures instead of relying on expensive fallbacks
- Consider removing consistently failing providers from your workflow
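If your fallback chain is defined in code or configuration, you can sort candidates by their configured input price so cheaper providers are tried first. The structure below is purely illustrative — it is not ModelRiver's actual workflow schema, and the prices are placeholders:

```python
# Illustrative fallback candidates -- prices per 1M input tokens are placeholders.
candidates = [
    {"provider": "openai", "model": "gpt-4o", "input_price_per_1m": 2.50},
    {"provider": "openai", "model": "gpt-4o-mini", "input_price_per_1m": 0.15},
    {"provider": "anthropic", "model": "claude-3-5-haiku", "input_price_per_1m": 0.80},
]

# Cheapest-first fallback order; quality and reliability should still veto pure price.
fallback_order = sorted(candidates, key=lambda c: c["input_price_per_1m"])
```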
Provider cost comparison
Use Request Logs to compare costs across providers:
- Filter logs by provider to see per-provider token usage
- Compare the Price field across providers for similar request types
- Factor in quality differences — cheaper isn't always better
- Consider latency tradeoffs — faster providers may justify higher costs
Remember: The price shown is an estimated cost based on configured model pricing. For definitive cost data, always check your provider's billing dashboard.
Monitoring cost trends
Daily review
- Check total token consumption and estimated costs
- Identify any sudden cost spikes
- Review high-cost individual requests
Weekly analysis
- Compare week-over-week cost trends
- Analyze cost distribution across models and providers
- Evaluate the impact of recent optimizations
Monthly reconciliation
- Match estimated costs with actual provider invoices
- Update model pricing definitions if provider prices have changed
- See Billing Reconciliation for detailed guidance
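A simple reconciliation check is to compare the month's summed estimated cost against the provider invoice and flag large drift, as in this sketch (the 5% tolerance and the sample totals are arbitrary example values):

```python
def reconcile(estimated_total: float, invoice_total: float, tolerance: float = 0.05) -> None:
    """Flag when estimated costs drift more than `tolerance` from the invoice."""
    if invoice_total == 0:
        return
    drift = abs(estimated_total - invoice_total) / invoice_total
    if drift > tolerance:
        print(f"Drift of {drift:.1%}: review model pricing definitions and cached/batch discounts.")

# reconcile(estimated_total=142.70, invoice_total=151.30)  # -> ~5.7% drift, worth reviewing
```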
Next steps
- Billing Reconciliation – Match API usage to invoices
- Performance Monitoring – Balance cost vs. speed
- Provider Reliability – Reduce failover-related costs
- Debugging – Investigate unexpected cost spikes
- Back to Observability – Return to the overview