## Overview
Every AI request consumes tokens — input tokens for your prompt and output tokens for the response. These tokens directly determine your costs. Request Logs show exactly how many tokens each request uses, enabling data-driven optimization.
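As a rule of thumb, cost is a linear function of these two token counts. A minimal sketch of the arithmetic, using illustrative placeholder rates rather than any provider's actual pricing:

```python
# Sketch: estimating per-request cost from token counts.
# The rates below are illustrative placeholders, not current provider pricing.
PRICE_PER_1K = {
    "input": 0.0025,   # $ per 1K prompt tokens (assumed)
    "output": 0.0100,  # $ per 1K completion tokens (assumed)
}

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated dollar cost of a single request."""
    return (
        prompt_tokens / 1000 * PRICE_PER_1K["input"]
        + completion_tokens / 1000 * PRICE_PER_1K["output"]
    )
```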
## Finding optimization opportunities

### Step 1: Identify high-consumption requests
- Filter to Live mode in Request Logs
- Look for requests with unusually high token counts
- Click to inspect the request body
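If you export log entries for offline analysis, the same scan can be scripted. A sketch, assuming each entry exposes `prompt_tokens` and `completion_tokens` fields (the field names are an assumption, not the Request Logs schema):

```python
# Sketch: flagging unusually token-heavy requests in exported log entries.
# Field names are assumed for illustration.

def flag_high_consumption(logs, threshold=3000):
    """Return entries whose total token count exceeds the threshold."""
    flagged = []
    for entry in logs:
        total = entry["prompt_tokens"] + entry["completion_tokens"]
        if total > threshold:
            flagged.append({**entry, "total_tokens": total})
    return flagged

logs = [
    {"id": "req-1", "prompt_tokens": 800, "completion_tokens": 200},
    {"id": "req-2", "prompt_tokens": 4500, "completion_tokens": 800},
]
# req-2 (5,300 total) is flagged; req-1 (1,000 total) is not.
```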
Red flags:
```
Normal request:   800 prompt + 200 completion   = 1,000 total
High consumption: 4,500 prompt + 800 completion = 5,300 total ← 5x higher
```

### Step 2: Analyze prompt structure
Common sources of high prompt tokens:
```json
// Problem: Repeated system instructions (85 tokens, sent with every request)
{
  "role": "system",
  "content": "You are an expert customer service agent for Acme Corporation, a leading provider of widget solutions since 1985. Our company values include excellence, integrity, and customer satisfaction. We offer three product lines: Standard Widgets, Premium Widgets, and Enterprise Widget Solutions. Each product line has specific warranty terms, return policies, and support tiers. Our standard warranty covers manufacturing defects for 12 months..."
}

// Solution: Condensed system prompt (32 tokens)
{
  "role": "system",
  "content": "You are Acme Corp's customer service agent. Be concise and helpful. Products: Standard, Premium, Enterprise widgets. Warranty: 12 months for defects."
}
```

Savings: ~53 tokens × thousands of requests = significant cost reduction.
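To see whether a shorter system prompt is worth the effort, multiply out the savings. A quick sketch with an assumed request volume and an illustrative input-token rate:

```python
# Sketch: projecting monthly savings from a condensed system prompt.
tokens_saved_per_request = 85 - 32    # 53 tokens, per the example above
requests_per_month = 100_000          # assumed volume for illustration
price_per_1k_input = 0.0025           # illustrative rate, $/1K input tokens

monthly_savings = (
    tokens_saved_per_request * requests_per_month / 1000 * price_per_1k_input
)
# Small per-request savings compound quickly at volume.
```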
### Step 3: Review conversation history management
Long conversations accumulate tokens:
```
Turn 1:  system(32) + user(15) + assistant(80) = 127 tokens
Turn 5:  system(32) + 5 pairs(475)    = 507 tokens
Turn 10: system(32) + 10 pairs(950)   = 982 tokens
Turn 20: system(32) + 20 pairs(1,900) = 1,932 tokens
Turn 50: system(32) + 50 pairs(4,750) = 4,782 tokens ← Expensive!
```

**Solution: Sliding window + summarization**
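One way to implement the sliding window in application code, sketched in Python; `summarize` is a hypothetical helper that would call a cheap model in practice:

```python
# Sketch of the sliding-window approach: keep the system prompt, inject a
# summary of older turns, and retain only the most recent exchanges.

def summarize(messages):
    # Placeholder: in practice, call a cheap model to condense the old turns.
    return f"Context: summary of {len(messages)} earlier messages."

def window_messages(messages, keep_last_turns=5):
    """Return system prompt + summary of old turns + the last N turns."""
    system, rest = messages[0], messages[1:]
    keep = keep_last_turns * 2              # each turn = user + assistant pair
    if len(rest) <= keep:
        return messages                     # short conversation: send as-is
    old, recent = rest[:-keep], rest[-keep:]
    summary = {"role": "system", "content": summarize(old)}
    return [system, summary] + recent
```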
```json
// Keep only the last 5 turns, summarize older ones
{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "system", "content": "Context: User previously discussed product returns for Order #1234 and asked about shipping to Canada."},
    {"role": "user", "content": "Turn 6 message"},
    {"role": "assistant", "content": "Turn 6 response"},
    {"role": "user", "content": "Turn 7 message"},
    {"role": "assistant", "content": "Turn 7 response"},
    {"role": "user", "content": "Current message"}
  ]
}
```

## Model selection for cost efficiency
Not every task needs the most expensive model. Use Request Logs to identify where cheaper models would suffice:
| Task | Expensive model | Cost-effective model | Savings |
|---|---|---|---|
| Routing/classification | gpt-4o ($0.01/1K) | gpt-4o-mini ($0.0002/1K) | ~50x |
| Simple Q&A | claude-3-opus | claude-3-haiku | ~30x |
| Content summarization | gpt-4o | gpt-4o-mini | ~50x |
| Complex reasoning | gpt-4o (baseline) | gpt-4o (keep) | — |
### How to identify model candidates
- Filter logs to specific task types (by workflow or event name)
- Compare response quality between models
- If a cheaper model produces acceptable results, switch
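A routing layer can encode the table above as a simple lookup. A sketch; the task-type names and the model mapping are assumptions for illustration, not a recommendation:

```python
# Sketch: routing tasks to cheaper models where quality allows.
# Task-type keys and the mapping itself are illustrative assumptions.
MODEL_FOR_TASK = {
    "routing": "gpt-4o-mini",
    "classification": "gpt-4o-mini",
    "simple_qa": "claude-3-haiku",
    "summarization": "gpt-4o-mini",
    "complex_reasoning": "gpt-4o",   # keep the stronger model here
}

def pick_model(task_type: str) -> str:
    """Fall back to the strong model for unknown task types."""
    return MODEL_FOR_TASK.get(task_type, "gpt-4o")
```

Defaulting unknown tasks to the stronger model is the safer failure mode: quality degrades gracefully to "too expensive" rather than "too wrong".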
## Setting token budgets

Use `max_tokens` to prevent runaway output:
```json
// Without max_tokens: model could generate 4,000+ tokens
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Describe our product"}]
}

// With max_tokens: output capped at 200 tokens
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Describe our product in 2-3 sentences"}],
  "max_tokens": 200
}
```

**Monitor in Request Logs:** if `finish_reason` is `"length"`, the response was truncated. Adjust `max_tokens` accordingly.
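When post-processing logged responses, the truncation check can be automated. A sketch assuming an OpenAI-style response shape (`choices[].finish_reason`):

```python
# Sketch: detecting truncated responses in logged API output.
# Assumes an OpenAI-style response dict with choices[].finish_reason.

def was_truncated(response: dict) -> bool:
    """True if any choice stopped because it hit the max_tokens cap."""
    return any(
        choice.get("finish_reason") == "length"
        for choice in response.get("choices", [])
    )

response = {"choices": [{"finish_reason": "length", "message": {"content": "..."}}]}
```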
## Measuring optimization impact
After making changes, compare in Request Logs:
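The percentage deltas in such a comparison are simple to compute; a minimal helper:

```python
# Sketch: rounded percent change between two weekly averages.

def pct_change(before: float, after: float) -> int:
    """E.g. average prompt tokens falling from 1,850 to 680 gives -63."""
    return round((after - before) / before * 100)
```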
```
Before optimization (week of Feb 3):
  Avg prompt tokens:     1,850
  Avg completion tokens: 420
  Avg total tokens:      2,270
  Estimated weekly cost: $42.30

After optimization (week of Feb 10):
  Avg prompt tokens:     680 (-63%)
  Avg completion tokens: 380 (-10%)
  Avg total tokens:      1,060 (-53%)
  Estimated weekly cost: $19.80 (-53%)
```

## Next steps
- Duration & Performance — Balance cost with speed
- Cost Analysis Use Case — Detailed cost analysis guide
- Back to Best Practices — Return to the overview