Overview
AI costs can quickly grow as usage scales. Request Logs capture detailed token usage and estimated pricing for every request, giving you the data needed to understand where your money goes and how to reduce spending without sacrificing quality.
Understanding cost data in Request Logs
Token usage breakdown
Every request logs three key metrics:
| Metric | Description | Cost impact |
|---|---|---|
| Prompt tokens | Tokens in your request (messages, system prompt, context) | Input cost — typically lower per-token rate |
| Completion tokens | Tokens generated by the AI model | Output cost — typically higher per-token rate |
| Total tokens | Sum of prompt + completion tokens | Overall usage metric |
Price estimation
The Price column shows the estimated cost per request, calculated from token usage and provider pricing:
```
Request #1: gpt-4o
  Prompt tokens:      1,250
  Completion tokens:    380
  Estimated cost:   $0.0124

Request #2: claude-3-5-sonnet
  Prompt tokens:      1,250
  Completion tokens:    420
  Estimated cost:   $0.0098
```

Note: Prices shown are estimates based on published provider rates. For accurate billing, always refer to your provider's dashboard.
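Under the hood, the estimate is token counts multiplied by per-token rates. A minimal sketch, assuming placeholder rates (substitute your provider's current published pricing):

```python
# Minimal sketch of how a per-request price estimate can be derived.
# The per-million-token rates below are placeholders, not live provider
# pricing; substitute your provider's published rates.
RATES = {
    # model: (input $ per 1M tokens, output $ per 1M tokens) -- illustrative
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    in_rate, out_rate = RATES[model]
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000

print(f"${estimate_cost('gpt-4o', 1250, 380):.4f}")
```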
Step-by-step cost analysis
1. Identify high-cost requests
- Navigate to Request Logs and filter to Live mode
- Sort by the Price column (if visible) or review token counts
- Look for requests with unusually high token usage
- Click to inspect the request body to understand why
Common causes of high token usage:
- Large system prompts — Repeated context or instructions in every request
- Long conversation history — Sending full chat history without summarization
- Verbose structured outputs — Complex JSON schemas that increase output tokens
- Unhelpful retries — Failed requests that still consume tokens
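If you export your logs, a short script can surface the most expensive requests automatically. A hedged sketch, assuming a hypothetical JSON export where each entry carries `model`, `prompt_tokens`, `completion_tokens`, and `estimated_cost` fields (adapt the names to your actual export format):

```python
import json

# Hedged sketch: assumes a hypothetical JSON export of Request Logs where each
# entry has "model", "prompt_tokens", "completion_tokens", and "estimated_cost".
with open("request_logs.json") as f:
    logs = json.load(f)

# Surface the ten most expensive requests for manual inspection.
top = sorted(logs, key=lambda r: r.get("estimated_cost", 0), reverse=True)[:10]
for entry in top:
    print(f'{entry["model"]:<24} {entry["prompt_tokens"]:>6} in / '
          f'{entry["completion_tokens"]:>5} out   ${entry["estimated_cost"]:.4f}')
```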
2. Compare provider costs
For the same type of request, compare costs across providers:
```
┌──────────────────────────────────────────────────────────┐
│ Same prompt across providers                             │
│                                                          │
│ OpenAI     gpt-4o           1,250 in / 380 out   $0.0124 │
│ Anthropic  claude-3.5       1,250 in / 420 out   $0.0098 │
│ Google     gemini-1.5-pro   1,250 in / 395 out   $0.0071 │
└──────────────────────────────────────────────────────────┘
```

Use this data to make informed decisions about provider selection in your workflows.
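Beyond a single request, you can aggregate an exported log to compare average cost per request by model. A sketch assuming the same hypothetical export format as above:

```python
import json
from collections import defaultdict

# Hedged sketch: average cost per request by model, from the same hypothetical
# export format as the previous example.
with open("request_logs.json") as f:
    logs = json.load(f)

totals = defaultdict(lambda: [0.0, 0])  # model -> [total cost, request count]
for r in logs:
    totals[r["model"]][0] += r.get("estimated_cost", 0)
    totals[r["model"]][1] += 1

# Print cheapest-per-request models first.
for model, (cost, n) in sorted(totals.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{model:<24} avg ${cost / n:.4f} over {n} requests")
```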
3. Track cost trends over time
Review logs across time periods to identify trends:
- Daily cost spikes — Unusual traffic patterns driving up costs
- Growing token usage — Conversation histories or prompts gradually increasing
- New model costs — Impact of switching to a more expensive model
- Failover costs — Hidden costs from failed provider attempts, which still consume tokens
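A sketch of the daily aggregation, again assuming the hypothetical export format with an ISO-8601 `timestamp` field, flagging days that run well above average:

```python
import json
from collections import defaultdict

# Hedged sketch: bucket exported logs by calendar day to spot cost spikes.
# Assumes each entry carries an ISO-8601 "timestamp" field (hypothetical).
with open("request_logs.json") as f:
    logs = json.load(f)

daily = defaultdict(float)
for r in logs:
    daily[r["timestamp"][:10]] += r.get("estimated_cost", 0)  # "YYYY-MM-DD" prefix

average = sum(daily.values()) / len(daily)
for day in sorted(daily):
    flag = "  <-- spike?" if daily[day] > 2 * average else ""
    print(f"{day}  ${daily[day]:8.2f}{flag}")
```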
4. Calculate true request cost (including failovers)
When a request triggers failovers, the true cost includes all attempts:
```
Request lifecycle:

  ⚠ OpenAI     gpt-4o    FAILED    320 tokens   $0.0032
  ⚠ Anthropic  claude    FAILED    310 tokens   $0.0025
  ✓ Google     gemini    SUCCESS   715 tokens   $0.0071

  True cost:      $0.0032 + $0.0025 + $0.0071 = $0.0128
  Displayed cost: $0.0071 (only the successful request)
```

Important: Failed provider attempts may still consume tokens and incur costs with the provider. Check the timeline's failed model attempts for the complete picture.
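Computing the true cost is a simple sum over all attempts. A sketch assuming a hypothetical `attempts` structure that mirrors the timeline above:

```python
# Hedged sketch: sum every attempt in a request's timeline, not just the
# successful one. The "attempts" structure is hypothetical; mirror whatever
# fields your log detail view actually exposes.
request = {
    "attempts": [
        {"provider": "openai/gpt-4o",    "status": "failed",  "estimated_cost": 0.0032},
        {"provider": "anthropic/claude", "status": "failed",  "estimated_cost": 0.0025},
        {"provider": "google/gemini",    "status": "success", "estimated_cost": 0.0071},
    ]
}

true_cost = sum(a["estimated_cost"] for a in request["attempts"])
displayed = sum(a["estimated_cost"] for a in request["attempts"]
                if a["status"] == "success")
print(f"displayed ${displayed:.4f} vs true ${true_cost:.4f}")  # $0.0071 vs $0.0128
```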
Cost optimization strategies
Optimize system prompts
Reduce prompt tokens by keeping system prompts concise:
```json
// Before: 450 tokens
{
  "role": "system",
  "content": "You are a helpful assistant for Acme Corp. You should always be polite and professional. You have access to our product catalog which includes electronics, clothing, and home goods. When a customer asks about returns, refer them to our return policy which allows returns within 30 days with receipt. For shipping questions, we offer free shipping on orders over $50..."
}

// After: 180 tokens
{
  "role": "system",
  "content": "You are Acme Corp's assistant. Key policies: Returns within 30 days with receipt. Free shipping over $50. Products: electronics, clothing, home goods. Be concise and professional."
}
```

Impact: ~60% reduction in prompt tokens per request.
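To verify a trim like this before deploying it, you can count tokens locally. A sketch using the `tiktoken` library with the `o200k_base` encoding (used by the gpt-4o family; other providers tokenize differently, so treat the counts as approximate):

```python
import tiktoken

# Hedged sketch: count tokens locally before shipping a trimmed prompt.
# o200k_base is the encoding used by gpt-4o-family models; other providers
# tokenize differently, so treat these counts as approximate.
enc = tiktoken.get_encoding("o200k_base")

before = "You are a helpful assistant for Acme Corp. ..."  # paste the full prompt
after = ("You are Acme Corp's assistant. Key policies: Returns within 30 days "
         "with receipt. Free shipping over $50. Products: electronics, clothing, "
         "home goods. Be concise and professional.")

print(len(enc.encode(before)), "->", len(enc.encode(after)), "tokens")
```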
Implement conversation summarization
Instead of sending the full chat history, periodically summarize:
```json
// Before: sending 20 messages (2,500 tokens)
"messages": [
  {"role": "system", "content": "..."},
  {"role": "user", "content": "message 1"},
  {"role": "assistant", "content": "response 1"},
  // ... 18 more messages
]

// After: summarize older messages (800 tokens)
"messages": [
  {"role": "system", "content": "..."},
  {"role": "system", "content": "Previous conversation summary: User asked about product returns and shipping options. They're interested in the Premium Widget in blue."},
  {"role": "user", "content": "latest message"},
  {"role": "assistant", "content": "latest response"}
]
```
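One way to implement this pattern, sketched here with the OpenAI Python client (any chat-completion API works the same way); the `compact` helper and `KEEP_RECENT` threshold are illustrative names, not part of any SDK:

```python
from openai import OpenAI

client = OpenAI()
KEEP_RECENT = 4  # pass this many recent messages through verbatim

def compact(messages: list[dict]) -> list[dict]:
    """Replace older messages with a cheap model-generated summary."""
    if len(messages) <= KEEP_RECENT + 1:  # nothing worth summarizing yet
        return messages
    system, old, recent = messages[0], messages[1:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # a cheap model is fine for the summary itself
        messages=[{
            "role": "user",
            "content": "Summarize this conversation in under 100 words:\n"
                       + "\n".join(f'{m["role"]}: {m["content"]}' for m in old),
        }],
    ).choices[0].message.content
    return [system,
            {"role": "system", "content": f"Previous conversation summary: {summary}"},
            *recent]
```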
Choose the right model for the task

Not every request needs the most powerful model:
| Task type | Recommended model | Cost savings |
|---|---|---|
| Simple Q&A, classification | gpt-4o-mini, claude-3-haiku | 5-10x cheaper |
| Content generation | gpt-4o, claude-3.5-sonnet | Baseline |
| Complex reasoning, analysis | gpt-4o, claude-3-opus | Premium — use only when needed |
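Routing can be as simple as a lookup from task type to model tier. A minimal sketch; the task labels and model names below are illustrative and mirror the table above:

```python
# Hedged sketch: route by task type to a cheaper tier when the task allows it.
# Task labels and model names are illustrative, mirroring the table above.
MODEL_BY_TASK = {
    "classification": "gpt-4o-mini",
    "simple_qa": "gpt-4o-mini",
    "content_generation": "gpt-4o",
    "complex_reasoning": "gpt-4o",  # swap in a premium model only when needed
}

def pick_model(task_type: str) -> str:
    return MODEL_BY_TASK.get(task_type, "gpt-4o-mini")  # default to the cheap tier

print(pick_model("classification"))  # gpt-4o-mini
```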
Monitor and set alerts
- Set up cost thresholds and alert when daily spending exceeds normal levels (see the sketch after this list)
- Review the Cost Analytics dashboard regularly
- Track cost-per-request trends to catch gradual increases
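A hedged sketch of a daily threshold check; the webhook URL is a placeholder for whatever alerting channel you use:

```python
import json
import urllib.request

DAILY_BUDGET = 25.00  # USD; pick a threshold from your own baseline spend

def alert_if_over(day: str, spend: float) -> None:
    """Post an alert when a day's spend exceeds the budget."""
    if spend <= DAILY_BUDGET:
        return
    payload = json.dumps({"text": f"AI spend alert: {day} hit ${spend:.2f} "
                                  f"(budget ${DAILY_BUDGET:.2f})"}).encode()
    req = urllib.request.Request(
        "https://hooks.example.com/ai-costs",  # placeholder webhook URL
        data=payload, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)
```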
Next steps
- Performance Monitoring — Balance cost with speed
- Provider Reliability — Factor reliability into cost decisions
- Cost Analytics Dashboard — Aggregated cost views
- Back to Observability — Return to the overview