Optimize token usage to reduce AI costs

Token usage directly drives your AI spending. Learn to identify waste, trim prompts, manage conversation history, and choose cost-effective models — all guided by Request Logs data.

Overview

Every AI request consumes tokens — input tokens for your prompt and output tokens for the response. These tokens directly determine your costs. Request Logs show exactly how many tokens each request uses, enabling data-driven optimization.


Finding optimization opportunities

Step 1: Identify high-consumption requests

  1. Filter to Live mode in Request Logs
  2. Look for requests with unusually high token counts
  3. Click to inspect the request body

Red flags:

Normal request: 800 prompt + 200 completion = 1,000 total
High consumption: 4,500 prompt + 800 completion = 5,300 total (over 5x higher)
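
If you export your request logs (for example as JSON), you can flag outliers programmatically instead of scanning by eye. The sketch below is an illustration only: the file name and the "prompt_tokens", "completion_tokens", and "event" fields are assumptions about the export format, so adjust them to whatever your logs actually contain.

Python
# Sketch: flag unusually token-heavy requests in a hypothetical JSON export.
# Field names and the threshold are placeholders; adapt to your log schema.
import json

THRESHOLD = 3000  # total tokens treated as "high consumption" here

def find_heavy_requests(path: str, threshold: int = THRESHOLD) -> list[dict]:
    """Return log records at or above the threshold, heaviest first."""
    with open(path) as f:
        records = json.load(f)
    heavy = []
    for record in records:
        total = record.get("prompt_tokens", 0) + record.get("completion_tokens", 0)
        if total >= threshold:
            heavy.append({**record, "total_tokens": total})
    return sorted(heavy, key=lambda r: r["total_tokens"], reverse=True)

for r in find_heavy_requests("request_logs.json"):
    print(f'{r["total_tokens"]:>6} tokens  {r.get("event", "?")}')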

Step 2: Analyze prompt structure

Common sources of high prompt tokens:

JSON
// Problem: Repeated system instructions (85 tokens, sent with every request)
{
  "role": "system",
  "content": "You are an expert customer service agent for Acme Corporation, a leading provider of widget solutions since 1985. Our company values include excellence, integrity, and customer satisfaction. We offer three product lines: Standard Widgets, Premium Widgets, and Enterprise Widget Solutions. Each product line has specific warranty terms, return policies, and support tiers. Our standard warranty covers manufacturing defects for 12 months..."
}

// Solution: Condensed system prompt (32 tokens)
{
  "role": "system",
  "content": "You are Acme Corp's customer service agent. Be concise and helpful. Products: Standard, Premium, Enterprise widgets. Warranty: 12 months for defects."
}

Savings: ~53 tokens × thousands of requests = significant cost reduction.
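
Before committing to a shorter prompt, you can verify the saving locally rather than estimating it. A minimal sketch using the tiktoken tokenizer (an assumed dependency, installed with pip install tiktoken); counts will differ slightly from the rounded figures above depending on the model and the exact prompt text.

Python
# Count prompt tokens locally with tiktoken (assumed dependency).
# The prompts here are abbreviated stand-ins for the examples above.
import tiktoken

verbose_prompt = (
    "You are an expert customer service agent for Acme Corporation, a leading "
    "provider of widget solutions since 1985. Our company values include "
    "excellence, integrity, and customer satisfaction. ..."
)
condensed_prompt = (
    "You are Acme Corp's customer service agent. Be concise and helpful. "
    "Products: Standard, Premium, Enterprise widgets. Warranty: 12 months for defects."
)

enc = tiktoken.encoding_for_model("gpt-4o")  # selects the tokenizer for the model
before = len(enc.encode(verbose_prompt))
after = len(enc.encode(condensed_prompt))
print(f"verbose: {before} tokens, condensed: {after}, saved per request: {before - after}")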

Step 3: Review conversation history management

Long conversations accumulate tokens:

Turn 1: system(32) + user(15) + assistant(80) = 127 tokens
Turn 5: system(32) + 5 pairs(475) = 507 tokens
Turn 10: system(32) + 10 pairs(950) = 982 tokens
Turn 20: system(32) + 20 pairs(1,900) = 1,932 tokens
Turn 50: system(32) + 50 pairs(4,750) = 4,782 tokens (expensive!)

Solution: Sliding window + summarization

JSON
// Keep only the last 5 turns, summarize older ones
{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "system", "content": "Context: User previously discussed product returns for Order #1234 and asked about shipping to Canada."},
    {"role": "user", "content": "Turn 6 message"},
    {"role": "assistant", "content": "Turn 6 response"},
    {"role": "user", "content": "Turn 7 message"},
    {"role": "assistant", "content": "Turn 7 response"},
    {"role": "user", "content": "Current message"}
  ]
}
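
Putting the pattern into application code is straightforward. Below is a minimal sketch, assuming OpenAI-style message dicts and a summarize_messages() helper that you would back with a cheap model call; both names are illustrative, not part of any SDK.

Python
# Sketch: sliding-window history with a summary of older turns.
SYSTEM_PROMPT = {"role": "system", "content": "You are Acme Corp's customer service agent."}
WINDOW_TURNS = 5  # keep the last 5 user/assistant pairs verbatim

def summarize_messages(messages: list[dict]) -> str:
    # Stub: in practice, send the older turns to a cheap model
    # (e.g. gpt-4o-mini) and ask for a 2-3 sentence summary.
    topics = [m["content"] for m in messages if m["role"] == "user"]
    return "Context: earlier in the conversation the user asked about: " + "; ".join(topics)

def build_prompt(history: list[dict], current_user_message: str) -> list[dict]:
    recent = history[-WINDOW_TURNS * 2:]   # last N user/assistant pairs
    older = history[:-WINDOW_TURNS * 2]    # everything before the window
    messages = [SYSTEM_PROMPT]
    if older:
        messages.append({"role": "system", "content": summarize_messages(older)})
    messages.extend(recent)
    messages.append({"role": "user", "content": current_user_message})
    return messages

With this structure, prompt size stays roughly flat after turn 5 instead of growing linearly with conversation length.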

Model selection for cost efficiency

Not every task needs the most expensive model. Use Request Logs to identify where cheaper models would suffice:

Task                   | Expensive model      | Cost-effective model      | Savings
Routing/classification | gpt-4o ($0.01/1K)    | gpt-4o-mini ($0.0002/1K)  | ~50x
Simple Q&A             | claude-3-opus        | claude-3-haiku            | ~30x
Content summarization  | gpt-4o               | gpt-4o-mini               | ~50x
Complex reasoning      | gpt-4o (baseline)    | gpt-4o (keep)             | n/a

How to identify model candidates

  1. Filter logs to specific task types (by workflow or event name)
  2. Compare response quality between models
  3. If a cheaper model produces acceptable results, switch (see the comparison sketch below)
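
One practical way to run that comparison is to replay a handful of logged prompts against both models and inspect the outputs side by side. A sketch using the OpenAI Python SDK; the sample prompts and the model pair are assumptions you should replace with prompts taken from your own logs.

Python
# Sketch: run the same prompts against an expensive and a cheap model so a
# human (or an eval script) can compare quality. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()
sample_prompts = [
    "Classify this ticket as billing, shipping, or technical: 'My invoice is wrong.'",
    "Classify this ticket as billing, shipping, or technical: 'The app crashes on login.'",
]

for prompt in sample_prompts:
    for model in ("gpt-4o", "gpt-4o-mini"):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=20,
        )
        print(f"[{model}] {response.choices[0].message.content!r}")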

Setting token budgets

Use max_tokens to prevent runaway output:

JSON
// Without max_tokens: model could generate 4,000+ tokens
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Describe our product"}]
}

// With max_tokens: output capped at 200 tokens
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Describe our product in 2-3 sentences"}],
  "max_tokens": 200
}

Monitor in Request Logs: If finish_reason is "length", the response was truncated. Adjust max_tokens accordingly.
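
The same check can also run at call time rather than only in the logs. A sketch with the OpenAI Python SDK (assumes OPENAI_API_KEY is set in the environment); it flags truncation so you can decide whether to raise the budget or tighten the prompt.

Python
# Sketch: cap output with max_tokens and detect truncation via finish_reason.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Describe our product in 2-3 sentences"}],
    max_tokens=200,
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # The model hit the cap mid-answer; raise max_tokens or tighten the prompt.
    print("Warning: response truncated at max_tokens")
print(choice.message.content)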


Measuring optimization impact

After making changes, compare in Request Logs:

Before optimization (week of Feb 3):
Avg prompt tokens: 1,850
Avg completion tokens: 420
Avg total tokens: 2,270
Estimated weekly cost: $42.30
 
After optimization (week of Feb 10):
Avg prompt tokens: 680 (-63%)
Avg completion tokens: 380 (-10%)
Avg total tokens: 1,060 (-53%)
Estimated weekly cost: $19.80 (-53%)
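
If you want to reproduce these aggregates yourself from an exported batch of log records, a sketch is below. The field names and per-1K prices are placeholders; substitute whatever your export schema and your model's actual pricing use.

Python
# Sketch: average token usage and estimated cost for a batch of log records.
def summarize(records: list[dict], price_per_1k_prompt: float, price_per_1k_completion: float) -> dict:
    n = len(records)
    prompt = sum(r["prompt_tokens"] for r in records)
    completion = sum(r["completion_tokens"] for r in records)
    cost = prompt / 1000 * price_per_1k_prompt + completion / 1000 * price_per_1k_completion
    return {
        "avg_prompt_tokens": prompt / n,
        "avg_completion_tokens": completion / n,
        "avg_total_tokens": (prompt + completion) / n,
        "estimated_cost": round(cost, 2),
    }

# Example usage (placeholder rates and record lists):
# before = summarize(week_of_feb_3_records, price_per_1k_prompt=0.0025, price_per_1k_completion=0.01)
# after = summarize(week_of_feb_10_records, price_per_1k_prompt=0.0025, price_per_1k_completion=0.01)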

Next steps