Understand and optimize your AI spend

Track token usage, compare pricing across providers, and identify optimization opportunities to reduce costs while maintaining quality.

Overview

Every AI request through ModelRiver is logged with detailed token usage and pricing information, enabling you to understand exactly where your AI budget is going. By analyzing cost data across providers, models, and workflows, you can make data-driven decisions to optimize spending without sacrificing quality.

Important: The cost shown in Request Logs is an estimated cost based on the tokens sent and received, calculated using the pricing configured in your ModelRiver model definitions. For the most accurate and authoritative cost data, always refer to the billing dashboards of your specific AI providers (e.g., OpenAI Usage, Anthropic Console, Google Cloud billing).


Understanding cost data in Request Logs

Token usage breakdown

Each request log captures three key token metrics:

  • Prompt tokens (input) – Tokens in the request sent to the provider; typically the lower cost per token
  • Completion tokens (output) – Tokens in the response generated by the provider; typically the higher cost per token
  • Total tokens – The sum of prompt and completion tokens; the total consumption for the request

Price field

The Price field shows the estimated cost of each individual request, calculated as:

estimated_cost = (prompt_tokens × input_price_per_token) + (completion_tokens × output_price_per_token)

This calculation uses the pricing information stored in your ModelRiver model definitions. The actual cost charged by the provider may differ due to:

  • Provider pricing updates that haven't been reflected in your model definitions
  • Cached tokens or context that receive discounted pricing
  • Batch pricing or volume discounts from the provider
  • Currency conversion or rounding differences
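As a worked example of the formula above, the estimate for a single request can be reproduced from its token counts and the per-token prices configured in your model definitions. The function and prices below are hypothetical, a minimal sketch of the calculation rather than ModelRiver's actual implementation.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_token: float, output_price_per_token: float) -> float:
    """Reproduce the estimated-cost formula from token counts and configured prices."""
    return (prompt_tokens * input_price_per_token
            + completion_tokens * output_price_per_token)

# Hypothetical pricing, expressed per token (providers usually quote per 1M tokens).
INPUT_PRICE = 0.15 / 1_000_000    # e.g. $0.15 per 1M input tokens
OUTPUT_PRICE = 0.60 / 1_000_000   # e.g. $0.60 per 1M output tokens

# A request with 1,200 prompt tokens and 300 completion tokens:
print(f"${estimate_cost(1200, 300, INPUT_PRICE, OUTPUT_PRICE):.6f}")  # ≈ $0.000360
```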

Cost analysis workflow

Step 1: Filter to production traffic

  1. Navigate to Request Logs in your project console
  2. Filter to Live mode to exclude test and playground traffic, so you're analyzing actual production costs

Step 2: Identify high-cost requests

Review the logs and look for:

  • High token counts – Requests with unusually large prompt or completion token values
  • Expensive models – Requests using premium models (e.g., gpt-4o vs gpt-4o-mini)
  • Failed attempts with costs – Failover chains where failed provider attempts still consumed tokens
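If you export request logs (for example as JSON, where your plan allows it), a short script can surface these outliers. The field names used here (`model`, `prompt_tokens`, `completion_tokens`, `price`) are assumptions for illustration; adapt them to whatever your export actually contains.

```python
import json

# Assumed export format: a JSON array of request log entries.
with open("request_logs.json") as f:
    logs = json.load(f)

TOKEN_THRESHOLD = 8_000   # flag unusually large requests
PRICE_THRESHOLD = 0.05    # flag requests estimated above $0.05

for entry in logs:
    total_tokens = entry.get("prompt_tokens", 0) + entry.get("completion_tokens", 0)
    if total_tokens > TOKEN_THRESHOLD or entry.get("price", 0) > PRICE_THRESHOLD:
        print(f'{entry.get("model")}: {total_tokens} tokens, ~${entry.get("price", 0):.4f}')
```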

Step 3: Analyze spending patterns

Look across multiple requests to identify trends:

  • Which models drive the most cost? – Compare token usage across different models
  • Are prompts efficiently sized? – Large prompt tokens may indicate excessive system prompts or context
  • Are completions appropriately bounded? – Very large completion tokens may suggest missing max_tokens limits
  • Are failovers adding hidden costs? – Failed provider attempts still incur token usage on providers that processed the request before failing
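To see which models drive the most cost, the same hypothetical export can be aggregated per model; the field names are again assumptions.

```python
from collections import defaultdict

cost_by_model: dict[str, float] = defaultdict(float)
tokens_by_model: dict[str, int] = defaultdict(int)

for entry in logs:  # `logs` loaded as in the previous sketch
    model = entry.get("model", "unknown")
    cost_by_model[model] += entry.get("price", 0.0)
    tokens_by_model[model] += entry.get("prompt_tokens", 0) + entry.get("completion_tokens", 0)

for model, cost in sorted(cost_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: ~${cost:.2f} across {tokens_by_model[model]:,} tokens")
```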

Step 4: Optimize and validate

Apply optimizations and track their impact:

  • Switch to more cost-effective models for appropriate use cases
  • Optimize prompt length — reduce unnecessary context
  • Set max_tokens limits to prevent runaway completions
  • Adjust provider fallback order to prefer cheaper providers first
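One lightweight way to track impact is to compare the average estimated cost per request before and after an optimization shipped. This sketch assumes each exported log entry carries an ISO-8601 `created_at` timestamp and a `price` field, which may not match your export.

```python
from statistics import mean

CUTOVER = "2024-06-01"  # date the optimization shipped (ISO format, matching the log timestamps)

def avg_cost(entries):
    prices = [e.get("price", 0.0) for e in entries]
    return mean(prices) if prices else 0.0

before = [e for e in logs if e["created_at"][:10] < CUTOVER]   # `logs` as loaded earlier
after  = [e for e in logs if e["created_at"][:10] >= CUTOVER]

print(f"before: ~${avg_cost(before):.4f}/request, after: ~${avg_cost(after):.4f}/request")
```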

Cost optimization strategies

Choose the right model for the task

Not every request needs the most powerful (and expensive) model:

  • Simple classification or extraction – Use smaller models (e.g., gpt-4o-mini, claude-3-5-haiku)
  • Creative writing or complex reasoning – Use larger models (e.g., gpt-4o, claude-3-5-sonnet)
  • High-volume batch processing – Use the most cost-efficient model that meets quality requirements
  • Critical business logic – Balance quality and cost; use larger models with shorter prompts
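A simple pattern is to route requests to a cheaper model whenever the task type allows it, reserving larger models for complex reasoning. The task labels and mapping below are placeholders to illustrate the idea, not a recommendation for specific workloads.

```python
# Hypothetical routing table: small models for mechanical tasks, larger models otherwise.
MODEL_BY_TASK = {
    "classification": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "summarization": "claude-3-5-haiku",
    "creative_writing": "gpt-4o",
    "complex_reasoning": "claude-3-5-sonnet",
}

def pick_model(task_type: str) -> str:
    # Default to the smallest model and escalate only when quality checks demand it.
    return MODEL_BY_TASK.get(task_type, "gpt-4o-mini")
```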

Optimize prompt tokens

Prompt (input) tokens are charged on every request:

  • Trim system prompts – Remove unnecessary instructions or examples
  • Use structured outputs – Define schemas to get precise responses instead of instructing the model to format output
  • Compress context – Summarize long conversation histories instead of sending full transcripts
  • Avoid redundant data – Don't include data the model doesn't need
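One way to act on the points above is to send only a rolling window of recent conversation turns plus a short summary of everything older, which keeps prompt tokens bounded as a conversation grows. This is a generic sketch; the summarization itself is left as a placeholder and nothing here is specific to ModelRiver.

```python
MAX_RECENT_TURNS = 6  # keep the last few turns verbatim

def compact_history(messages: list[dict], summary: str) -> list[dict]:
    """Replace older turns with a one-paragraph summary to cap prompt size."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    recent = dialogue[-MAX_RECENT_TURNS:]
    if len(dialogue) > MAX_RECENT_TURNS and summary:
        recent = [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
    return system + recent
```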

Control completion tokens

Set limits to prevent unexpectedly large responses:

  • Use max_tokens – Set appropriate limits based on your expected response size (see the sketch after this list)
  • Use structured outputs – Schema-driven responses naturally limit output to relevant fields
  • Be specific in prompts – "Answer in one sentence" is cheaper than "Explain in detail"
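If your application reaches providers through an OpenAI-compatible chat completions API (directly or via a gateway), capping the response is a one-line change. The base URL, model name, and limit below are placeholder assumptions for illustration.

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible endpoint; base_url and api_key are placeholders.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this ticket in one sentence: ..."}],
    max_tokens=60,  # hard cap on completion tokens to prevent runaway responses
)
print(response.choices[0].message.content)
```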

Reduce failover costs

Failover attempts can add hidden costs:

  • Monitor failover frequency with Provider Reliability
  • Order providers by cost – Put cheaper providers first in your fallback chain (see the sketch after this list)
  • Fix unreliable providers – Address the root cause of failures instead of relying on expensive fallbacks
  • Consider removing consistently failing providers from your workflow
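If you maintain your fallback chain in configuration, you can order it by a blended per-token price so cheaper providers are tried first. The structure below is a hypothetical config, not ModelRiver's workflow format.

```python
# Hypothetical fallback configuration with per-1M-token pricing.
providers = [
    {"name": "provider-a", "input_per_1m": 2.50, "output_per_1m": 10.00},
    {"name": "provider-b", "input_per_1m": 0.15, "output_per_1m": 0.60},
    {"name": "provider-c", "input_per_1m": 3.00, "output_per_1m": 15.00},
]

def blended_price(p: dict, output_ratio: float = 0.3) -> float:
    """Rough per-1M-token price, weighting output tokens by their typical share."""
    return (1 - output_ratio) * p["input_per_1m"] + output_ratio * p["output_per_1m"]

fallback_chain = sorted(providers, key=blended_price)
print([p["name"] for p in fallback_chain])  # cheapest first
```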

Provider cost comparison

Use Request Logs to compare costs across providers:

  1. Filter logs by provider to see per-provider token usage
  2. Compare the Price field across providers for similar request types
  3. Factor in quality differences — cheaper isn't always better
  4. Consider latency tradeoffs — faster providers may justify higher costs

Remember: The price shown is an estimated cost based on configured model pricing. For definitive cost data, always check your provider's billing dashboard.
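To weigh price against latency across providers, the exported logs can be grouped per provider; `provider` and `latency_ms` are assumed field names, so adjust them to your export.

```python
from collections import defaultdict
from statistics import mean

by_provider: dict[str, list[dict]] = defaultdict(list)
for entry in logs:  # `logs` as loaded earlier
    by_provider[entry.get("provider", "unknown")].append(entry)

for provider, entries in by_provider.items():
    avg_price = mean(e.get("price", 0.0) for e in entries)
    avg_latency = mean(e.get("latency_ms", 0) for e in entries)
    print(f"{provider}: ~${avg_price:.4f}/request, ~{avg_latency:.0f} ms")
```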

Ongoing cost monitoring

Daily review

  • Check total token consumption and estimated costs
  • Identify any sudden cost spikes
  • Review high-cost individual requests

Weekly analysis

  • Compare week-over-week cost trends
  • Analyze cost distribution across models and providers
  • Evaluate the impact of recent optimizations

Monthly reconciliation

  • Match estimated costs with actual provider invoices
  • Update model pricing definitions if provider prices have changed
  • See Billing Reconciliation for detailed guidance
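A quick reconciliation check is to compare the month's summed estimates against the provider invoice and flag large drift, which usually means your pricing definitions need updating. The invoice figure here is a hand-entered placeholder; everything else uses the same assumed export fields.

```python
# Sum estimated costs for one month of exported logs (assumed `created_at` and `price` fields).
estimated_total = sum(e.get("price", 0.0) for e in logs if e["created_at"][:7] == "2024-06")
invoice_total = 1234.56  # placeholder: copy this from the provider's billing dashboard

drift = (estimated_total - invoice_total) / invoice_total * 100
print(f"estimated ~${estimated_total:.2f} vs invoiced ${invoice_total:.2f} ({drift:+.1f}% drift)")
```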

Next steps