Optimize token usage to reduce AI costs

Token usage directly drives your AI spending. Learn to identify waste, trim prompts, manage conversation history, and choose cost-effective models — all guided by Request Logs data.

Overview

Every AI request consumes tokens — input tokens for your prompt and output tokens for the response. These tokens directly determine your costs. Request Logs show exactly how many tokens each request uses, enabling data-driven optimization.


Finding optimization opportunities

Step 1: Identify high-consumption requests

  1. Filter to Live mode in Request Logs
  2. Look for requests with unusually high token counts
  3. Click to inspect the request body

Red flags:

Normal request: 800 prompt + 200 completion = 1,000 total
High consumption: 4,500 prompt + 800 completion = 5,300 total (over 5x higher)
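
If you export your request logs (for example as JSON), you can flag outliers programmatically instead of scanning by eye. The sketch below is an illustration only: the file name and the "prompt_tokens", "completion_tokens", and "event" fields are assumptions about the export format, so adjust them to whatever your logs actually contain.

Python
# Sketch: flag unusually token-heavy requests in a hypothetical JSON export.
# Field names and the threshold are placeholders; adapt to your log schema.
import json

THRESHOLD = 3000  # total tokens treated as "high consumption" here

def find_heavy_requests(path: str, threshold: int = THRESHOLD) -> list[dict]:
    """Return log records at or above the threshold, heaviest first."""
    with open(path) as f:
        records = json.load(f)
    heavy = []
    for record in records:
        total = record.get("prompt_tokens", 0) + record.get("completion_tokens", 0)
        if total >= threshold:
            heavy.append({**record, "total_tokens": total})
    return sorted(heavy, key=lambda r: r["total_tokens"], reverse=True)

for r in find_heavy_requests("request_logs.json"):
    print(f'{r["total_tokens"]:>6} tokens  {r.get("event", "?")}')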

Step 2: Analyze prompt structure

Common sources of high prompt tokens:

JSON
// Problem: Repeated system instructions (85 tokens, sent with every request)
{
  "role": "system",
  "content": "You are an expert customer service agent for Acme Corporation, a leading provider of widget solutions since 1985. Our company values include excellence, integrity, and customer satisfaction. We offer three product lines: Standard Widgets, Premium Widgets, and Enterprise Widget Solutions. Each product line has specific warranty terms, return policies, and support tiers. Our standard warranty covers manufacturing defects for 12 months..."
}

// Solution: Condensed system prompt (32 tokens)
{
  "role": "system",
  "content": "You are Acme Corp's customer service agent. Be concise and helpful. Products: Standard, Premium, Enterprise widgets. Warranty: 12 months for defects."
}

Savings: ~53 tokens × thousands of requests = significant cost reduction.
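
Before committing to a shorter prompt, you can verify the saving locally rather than estimating it. A minimal sketch using the tiktoken tokenizer (an assumed dependency, installed with pip install tiktoken); counts will differ slightly from the rounded figures above depending on the model and the exact prompt text.

Python
# Count prompt tokens locally with tiktoken (assumed dependency).
# The prompts here are abbreviated stand-ins for the examples above.
import tiktoken

verbose_prompt = (
    "You are an expert customer service agent for Acme Corporation, a leading "
    "provider of widget solutions since 1985. Our company values include "
    "excellence, integrity, and customer satisfaction. ..."
)
condensed_prompt = (
    "You are Acme Corp's customer service agent. Be concise and helpful. "
    "Products: Standard, Premium, Enterprise widgets. Warranty: 12 months for defects."
)

enc = tiktoken.encoding_for_model("gpt-4o")  # selects the tokenizer for the model
before = len(enc.encode(verbose_prompt))
after = len(enc.encode(condensed_prompt))
print(f"verbose: {before} tokens, condensed: {after}, saved per request: {before - after}")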

Step 3: Review conversation history management

Long conversations accumulate tokens:

Turn 1: system(32) + user(15) + assistant(80) = 127 tokens
Turn 5: system(32) + 5 pairs(475) = 507 tokens
Turn 10: system(32) + 10 pairs(950) = 982 tokens
Turn 20: system(32) + 20 pairs(1,900) = 1,932 tokens
Turn 50: system(32) + 50 pairs(4,750) = 4,782 tokens (expensive!)

Solution: Sliding window + summarization

JSON
// Keep only the last 5 turns, summarize older ones
{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "system", "content": "Context: User previously discussed product returns for Order #1234 and asked about shipping to Canada."},
    {"role": "user", "content": "Turn 6 message"},
    {"role": "assistant", "content": "Turn 6 response"},
    {"role": "user", "content": "Turn 7 message"},
    {"role": "assistant", "content": "Turn 7 response"},
    {"role": "user", "content": "Current message"}
  ]
}
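
Putting the pattern into application code is straightforward. Below is a minimal sketch, assuming OpenAI-style message dicts and a summarize_messages() helper that you would back with a cheap model call; both names are illustrative, not part of any SDK.

Python
# Sketch: sliding-window history with a summary of older turns.
SYSTEM_PROMPT = {"role": "system", "content": "You are Acme Corp's customer service agent."}
WINDOW_TURNS = 5  # keep the last 5 user/assistant pairs verbatim

def summarize_messages(messages: list[dict]) -> str:
    # Stub: in practice, send the older turns to a cheap model
    # (e.g. gpt-4o-mini) and ask for a 2-3 sentence summary.
    topics = [m["content"] for m in messages if m["role"] == "user"]
    return "Context: earlier in the conversation the user asked about: " + "; ".join(topics)

def build_prompt(history: list[dict], current_user_message: str) -> list[dict]:
    recent = history[-WINDOW_TURNS * 2:]   # last N user/assistant pairs
    older = history[:-WINDOW_TURNS * 2]    # everything before the window
    messages = [SYSTEM_PROMPT]
    if older:
        messages.append({"role": "system", "content": summarize_messages(older)})
    messages.extend(recent)
    messages.append({"role": "user", "content": current_user_message})
    return messages

With this structure, prompt size stays roughly flat after turn 5 instead of growing linearly with conversation length.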

Model selection for cost efficiency

Not every task needs the most expensive model. Use Request Logs to identify where cheaper models would suffice:

Task                   | Expensive model      | Cost-effective model      | Savings
Routing/classification | gpt-4o ($0.01/1K)    | gpt-4o-mini ($0.0002/1K)  | ~50x
Simple Q&A             | claude-3-opus        | claude-3-haiku            | ~30x
Content summarization  | gpt-4o               | gpt-4o-mini               | ~50x
Complex reasoning      | gpt-4o (baseline)    | gpt-4o (keep)             | n/a

How to identify model candidates

  1. Filter logs to specific task types (by workflow or event name)
  2. Compare response quality between models
  3. If a cheaper model produces acceptable results, switch (see the comparison sketch below)
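
One practical way to run that comparison is to replay a handful of logged prompts against both models and inspect the outputs side by side. A sketch using the OpenAI Python SDK; the sample prompts and the model pair are assumptions you should replace with prompts taken from your own logs.

Python
# Sketch: run the same prompts against an expensive and a cheap model so a
# human (or an eval script) can compare quality. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()
sample_prompts = [
    "Classify this ticket as billing, shipping, or technical: 'My invoice is wrong.'",
    "Classify this ticket as billing, shipping, or technical: 'The app crashes on login.'",
]

for prompt in sample_prompts:
    for model in ("gpt-4o", "gpt-4o-mini"):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=20,
        )
        print(f"[{model}] {response.choices[0].message.content!r}")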

Setting token budgets

Use max_tokens to prevent runaway output:

JSON
// Without max_tokens: model could generate 4,000+ tokens
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Describe our product"}]
}

// With max_tokens: output capped at 200 tokens
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Describe our product in 2-3 sentences"}],
  "max_tokens": 200
}

Monitor in Request Logs: If finish_reason is "length", the response was truncated. Adjust max_tokens accordingly.
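
The same check can also run at call time rather than only in the logs. A sketch with the OpenAI Python SDK (assumes OPENAI_API_KEY is set in the environment); it flags truncation so you can decide whether to raise the budget or tighten the prompt.

Python
# Sketch: cap output with max_tokens and detect truncation via finish_reason.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Describe our product in 2-3 sentences"}],
    max_tokens=200,
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # The model hit the cap mid-answer; raise max_tokens or tighten the prompt.
    print("Warning: response truncated at max_tokens")
print(choice.message.content)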


Measuring optimization impact

After making changes, compare in Request Logs:

Before optimization (week of Feb 3):
Avg prompt tokens: 1,850
Avg completion tokens: 420
Avg total tokens: 2,270
Estimated weekly cost: $42.30
 
After optimization (week of Feb 10):
Avg prompt tokens: 680 (-63%)
Avg completion tokens: 380 (-10%)
Avg total tokens: 1,060 (-53%)
Estimated weekly cost: $19.80 (-53%)
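
If you want to reproduce these aggregates yourself from an exported batch of log records, a sketch is below. The field names and per-1K prices are placeholders; substitute whatever your export schema and your model's actual pricing use.

Python
# Sketch: average token usage and estimated cost for a batch of log records.
def summarize(records: list[dict], price_per_1k_prompt: float, price_per_1k_completion: float) -> dict:
    n = len(records)
    prompt = sum(r["prompt_tokens"] for r in records)
    completion = sum(r["completion_tokens"] for r in records)
    cost = prompt / 1000 * price_per_1k_prompt + completion / 1000 * price_per_1k_completion
    return {
        "avg_prompt_tokens": prompt / n,
        "avg_completion_tokens": completion / n,
        "avg_total_tokens": (prompt + completion) / n,
        "estimated_cost": round(cost, 2),
    }

# Example usage (placeholder rates and record lists):
# before = summarize(week_of_feb_3_records, price_per_1k_prompt=0.0025, price_per_1k_completion=0.01)
# after = summarize(week_of_feb_10_records, price_per_1k_prompt=0.0025, price_per_1k_completion=0.01)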

Next steps