Overview
Every AI request through ModelRiver is logged with detailed token usage and pricing information, enabling you to understand exactly where your AI budget is going. By analyzing cost data across providers, models, and workflows, you can make data-driven decisions to optimize spending without sacrificing quality.
Important: The cost shown in Request Logs is an estimated cost based on the tokens sent and received, calculated using the pricing configured in your ModelRiver model definitions. For the most accurate and authoritative cost data, always refer to the billing dashboards of your specific AI providers (e.g., OpenAI Usage, Anthropic Console, Google Cloud billing).
Understanding cost data in Request Logs
Token usage breakdown
Each request log captures three key token metrics:
| Metric | Description | Cost implication |
|---|---|---|
| Prompt tokens (input) | Tokens in the request sent to the provider | Typically lower cost per token |
| Completion tokens (output) | Tokens in the response generated by the provider | Typically higher cost per token |
| Total tokens | Sum of prompt and completion tokens | Total consumption for the request |
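These metrics originate in the usage data returned by the provider. As a minimal sketch (in Python, assuming the gateway exposes an OpenAI-compatible endpoint — the base URL, key, and model below are placeholders, not a prescribed ModelRiver configuration), this is how the same three numbers appear on a response:

```python
from openai import OpenAI

# Placeholder endpoint and key -- substitute your own values.
client = OpenAI(base_url="https://your-modelriver-endpoint/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this ticket: 'My invoice is wrong.'"}],
)

usage = response.usage
print("Prompt tokens:    ", usage.prompt_tokens)      # input side of the request
print("Completion tokens:", usage.completion_tokens)  # output generated by the model
print("Total tokens:     ", usage.total_tokens)       # prompt + completion
```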
Price field
The Price field shows the estimated cost of each individual request, calculated as:
estimated_cost = (prompt_tokens × input_price_per_token) + (completion_tokens × output_price_per_token)
This calculation uses the pricing information stored in your ModelRiver model definitions. The actual cost charged by the provider may differ due to:
- Provider pricing updates that haven't been reflected in your model definitions
- Cached tokens or context that receive discounted pricing
- Batch pricing or volume discounts from the provider
- Currency conversion or rounding differences
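As a concrete illustration of the formula above, here is a minimal sketch in Python. The per-token prices are made-up placeholder values, not real provider rates — use the pricing from your own ModelRiver model definitions:

```python
# Placeholder per-token prices (illustrative only -- provider rates are usually
# quoted per 1M tokens, so divide accordingly).
INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000    # e.g. $0.15 per 1M input tokens
OUTPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000   # e.g. $0.60 per 1M output tokens

def estimated_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Mirrors the Price field calculation described above."""
    return (prompt_tokens * INPUT_PRICE_PER_TOKEN
            + completion_tokens * OUTPUT_PRICE_PER_TOKEN)

# A request with 1,200 prompt tokens and 350 completion tokens:
print(f"${estimated_cost(1_200, 350):.6f}")  # -> $0.000390
```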
Cost analysis workflow
Step 1: Filter to production traffic
- Navigate to Request Logs in your project console
- Filter to Live mode to exclude test and playground traffic
- This ensures you're analyzing actual production costs
Step 2: Identify high-cost requests
Review the logs and look for:
- High token counts – Requests with unusually large prompt or completion token values
- Expensive models – Requests using premium models (e.g., gpt-4o vs gpt-4o-mini)
- Failed attempts with costs – Failover chains where failed provider attempts still consumed tokens
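One way to surface these automatically is to scan an export of your request logs. The sketch below assumes a list of entries with model, prompt_tokens, completion_tokens, and price fields — hypothetical field names, so adapt them to your export format:

```python
# Hypothetical exported log entries -- field names and values are illustrative.
logs = [
    {"model": "gpt-4o", "prompt_tokens": 9_500, "completion_tokens": 2_100, "price": 0.0446},
    {"model": "gpt-4o-mini", "prompt_tokens": 800, "completion_tokens": 150, "price": 0.0002},
    {"model": "gpt-4o", "prompt_tokens": 1_100, "completion_tokens": 300, "price": 0.0057},
]

PRICE_THRESHOLD = 0.01   # flag anything above one cent per request
TOKEN_THRESHOLD = 8_000  # or with unusually large prompt + completion totals

expensive = [
    entry for entry in logs
    if entry["price"] > PRICE_THRESHOLD
    or entry["prompt_tokens"] + entry["completion_tokens"] > TOKEN_THRESHOLD
]

for entry in expensive:
    print(f"{entry['model']}: {entry['prompt_tokens']}+{entry['completion_tokens']} tokens, ${entry['price']:.4f}")
```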
Step 3: Analyze spending patterns
Look across multiple requests to identify trends:
- Which models drive the most cost? – Compare token usage across different models
- Are prompts efficiently sized? – Large prompt tokens may indicate excessive system prompts or context
- Are completions appropriately bounded? – Very large completion tokens may suggest missing max_tokens limits
- Are failovers adding hidden costs? – Failed provider attempts still incur token usage on providers that processed the request before failing
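A quick way to answer the first question is to group an exported slice of logs by model and sum the estimated price, as in this sketch (same hypothetical field names as the earlier example):

```python
from collections import defaultdict

def cost_by_model(logs: list[dict]) -> dict[str, float]:
    """Sum estimated price per model to see where spend concentrates."""
    totals: dict[str, float] = defaultdict(float)
    for entry in logs:
        totals[entry["model"]] += entry["price"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# print(cost_by_model(logs))  # e.g. {"gpt-4o": 0.0503, "gpt-4o-mini": 0.0002}
```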
Step 4: Optimize and validate
Apply optimizations and track their impact:
- Switch to more cost-effective models for appropriate use cases
- Optimize prompt length — reduce unnecessary context
- Set max_tokens limits to prevent runaway completions
- Adjust provider fallback order to prefer cheaper providers first
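To validate the impact of a change, compare the average estimated cost per request before and after it. A minimal sketch, assuming you can export log entries for the two periods:

```python
def average_price(entries: list[dict]) -> float:
    """Average estimated price per request for a slice of exported logs."""
    return sum(e["price"] for e in entries) / len(entries) if entries else 0.0

# before = [...]  # exported entries from the week before the change
# after = [...]   # exported entries from the week after
# print(f"avg before: ${average_price(before):.4f}, after: ${average_price(after):.4f}")
```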
Cost optimization strategies
Choose the right model for the task
Not every request needs the most powerful (and expensive) model:
| Use case | Recommended approach |
|---|---|
| Simple classification or extraction | Use smaller models (e.g., gpt-4o-mini, claude-3-5-haiku) |
| Creative writing or complex reasoning | Use larger models (e.g., gpt-4o, claude-3-5-sonnet) |
| High-volume batch processing | Use the most cost-efficient model that meets quality requirements |
| Critical business logic | Balance quality and cost — use larger models with shorter prompts |
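One way to apply this table in code is a simple routing map from task type to model. The task names and model choices below are illustrative defaults, not a prescribed configuration — tune the mapping to your own quality bar:

```python
# Illustrative task-to-model routing -- adjust models and tasks to your needs.
MODEL_BY_TASK = {
    "classification": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "creative_writing": "gpt-4o",
    "complex_reasoning": "gpt-4o",
}
DEFAULT_MODEL = "gpt-4o-mini"

def pick_model(task_type: str) -> str:
    """Route simple tasks to cheaper models, reserving larger models for hard ones."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)
```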
Optimize prompt tokens
Prompt (input) tokens are charged on every request:
- Trim system prompts – Remove unnecessary instructions or examples
- Use structured outputs – Define schemas to get precise responses instead of instructing the model to format output
- Compress context – Summarize long conversation histories instead of sending full transcripts
- Avoid redundant data – Don't include data the model doesn't need
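For long conversations, a common pattern is to send only a rolling window of recent turns (plus a short summary of older ones) instead of the full transcript. A minimal sketch of the windowing part, assuming OpenAI-style message dicts:

```python
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the system prompt plus the most recent turns; summarize or drop the rest."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```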
Control completion tokens
Set limits to prevent unexpectedly large responses:
- Use max_tokens – Set appropriate limits based on your expected response size
- Use structured outputs – Schema-driven responses naturally limit output to relevant fields
- Be specific in prompts – "Answer in one sentence" is cheaper than "Explain in detail"
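Putting the first and third points together, here is a hedged sketch of a request with a bound on output size (assuming an OpenAI-compatible client pointed at your gateway; the endpoint, key, and limit are placeholders):

```python
from openai import OpenAI

# Placeholder endpoint and key -- assumes the gateway speaks the OpenAI API shape.
client = OpenAI(base_url="https://your-modelriver-endpoint/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this ticket in one sentence: ..."}],
    max_tokens=100,  # hard ceiling on completion tokens, and therefore on output cost
)
print(response.choices[0].message.content)
```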
Reduce failover costs
Failover attempts can add hidden costs:
- Monitor failover frequency with Provider Reliability
- Order providers by cost – Put cheaper providers first in your fallback chain
- Fix unreliable providers – Address the root cause of failures instead of relying on expensive fallbacks
- Consider removing consistently failing providers from your workflow
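If your fallback chain is defined in code or configuration, you can sort candidates by their configured input price so cheaper providers are tried first. The structure below is purely illustrative — it is not ModelRiver's actual workflow schema, and the prices are placeholders:

```python
# Illustrative fallback candidates -- prices per 1M input tokens are placeholders.
candidates = [
    {"provider": "openai", "model": "gpt-4o", "input_price_per_1m": 2.50},
    {"provider": "openai", "model": "gpt-4o-mini", "input_price_per_1m": 0.15},
    {"provider": "anthropic", "model": "claude-3-5-haiku", "input_price_per_1m": 0.80},
]

# Cheapest-first fallback order; quality and reliability should still veto pure price.
fallback_order = sorted(candidates, key=lambda c: c["input_price_per_1m"])
```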
Provider cost comparison
Use Request Logs to compare costs across providers:
- Filter logs by provider to see per-provider token usage
- Compare the Price field across providers for similar request types
- Factor in quality differences — cheaper isn't always better
- Consider latency tradeoffs — faster providers may justify higher costs
Remember: The price shown is an estimated cost based on configured model pricing. For definitive cost data, always check your provider's billing dashboard.
Monitoring cost trends
Daily review
- Check total token consumption and estimated costs
- Identify any sudden cost spikes
- Review high-cost individual requests
Weekly analysis
- Compare week-over-week cost trends
- Analyze cost distribution across models and providers
- Evaluate the impact of recent optimizations
Monthly reconciliation
- Match estimated costs with actual provider invoices
- Update model pricing definitions if provider prices have changed
- See Billing Reconciliation for detailed guidance
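A simple reconciliation check is to compare the month's summed estimated cost against the provider invoice and flag large drift, as in this sketch (the 5% tolerance and the sample totals are arbitrary example values):

```python
def reconcile(estimated_total: float, invoice_total: float, tolerance: float = 0.05) -> None:
    """Flag when estimated costs drift more than `tolerance` from the invoice."""
    if invoice_total == 0:
        return
    drift = abs(estimated_total - invoice_total) / invoice_total
    if drift > tolerance:
        print(f"Drift of {drift:.1%}: review model pricing definitions and cached/batch discounts.")

# reconcile(estimated_total=142.70, invoice_total=151.30)  # -> ~5.7% drift, worth reviewing
```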
Next steps
- Billing Reconciliation – Match API usage to invoices
- Performance Monitoring – Balance cost vs. speed
- Provider Reliability – Reduce failover-related costs
- Debugging – Investigate unexpected cost spikes
- Back to Observability – Return to the overview