Overview
AI costs can quickly grow as usage scales. Request Logs capture detailed token usage and estimated pricing for every request, giving you the data needed to understand where your money goes and how to reduce spending without sacrificing quality.
Understanding cost data in Request Logs
Token usage breakdown
Every request logs three key metrics:
| Metric | Description | Cost impact |
|---|---|---|
| Prompt tokens | Tokens in your request (messages, system prompt, context) | Input cost — typically lower per-token rate |
| Completion tokens | Tokens generated by the AI model | Output cost — typically higher per-token rate |
| Total tokens | Sum of prompt + completion tokens | Overall usage metric |
Price estimation
The Price column shows the estimated cost per request, calculated from token usage and provider pricing:
```
Request #1: gpt-4o
  Prompt tokens:      1,250
  Completion tokens:    380
  Estimated cost:   $0.0124

Request #2: claude-3-5-sonnet
  Prompt tokens:      1,250
  Completion tokens:    420
  Estimated cost:   $0.0098
```

Note: Prices shown are estimates based on published provider rates. For accurate billing, always refer to your provider's dashboard.
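Under the hood, the estimate is token counts multiplied by per-token rates. A minimal sketch, assuming placeholder rates (substitute your provider's current published pricing):

```python
# Minimal sketch of how a per-request price estimate can be derived.
# The per-million-token rates below are placeholders, not live provider
# pricing; substitute your provider's published rates.
RATES = {
    # model: (input $ per 1M tokens, output $ per 1M tokens) -- illustrative
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    in_rate, out_rate = RATES[model]
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000

print(f"${estimate_cost('gpt-4o', 1250, 380):.4f}")
```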
Step-by-step cost analysis
1. Identify high-cost requests
- Navigate to Request Logs and filter to Live mode
- Sort by the Price column (if visible) or review token counts
- Look for requests with unusually high token usage
- Click to inspect the request body to understand why
Common causes of high token usage:
- Large system prompts — Repeated context or instructions in every request
- Long conversation history — Sending full chat history without summarization
- Verbose structured outputs — Complex JSON schemas that increase output tokens
- Unhelpful retries — Failed requests that still consume tokens
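If you export your logs, a short script can surface the most expensive requests automatically. A hedged sketch, assuming a hypothetical JSON export where each entry carries `model`, `prompt_tokens`, `completion_tokens`, and `estimated_cost` fields (adapt the names to your actual export format):

```python
import json

# Hedged sketch: assumes a hypothetical JSON export of Request Logs where each
# entry has "model", "prompt_tokens", "completion_tokens", and "estimated_cost".
with open("request_logs.json") as f:
    logs = json.load(f)

# Surface the ten most expensive requests for manual inspection.
top = sorted(logs, key=lambda r: r.get("estimated_cost", 0), reverse=True)[:10]
for entry in top:
    print(f'{entry["model"]:<24} {entry["prompt_tokens"]:>6} in / '
          f'{entry["completion_tokens"]:>5} out   ${entry["estimated_cost"]:.4f}')
```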
2. Compare provider costs
For the same type of request, compare costs across providers:
```
┌──────────────────────────────────────────────────────────┐
│ Same prompt across providers                             │
│                                                          │
│ OpenAI     gpt-4o           1,250 in / 380 out   $0.0124 │
│ Anthropic  claude-3.5       1,250 in / 420 out   $0.0098 │
│ Google     gemini-1.5-pro   1,250 in / 395 out   $0.0071 │
└──────────────────────────────────────────────────────────┘
```

Use this data to make informed decisions about provider selection in your workflows.
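Beyond a single request, you can aggregate an exported log to compare average cost per request by model. A sketch assuming the same hypothetical export format as above:

```python
import json
from collections import defaultdict

# Hedged sketch: average cost per request by model, from the same hypothetical
# export format as the previous example.
with open("request_logs.json") as f:
    logs = json.load(f)

totals = defaultdict(lambda: [0.0, 0])  # model -> [total cost, request count]
for r in logs:
    totals[r["model"]][0] += r.get("estimated_cost", 0)
    totals[r["model"]][1] += 1

# Print cheapest-per-request models first.
for model, (cost, n) in sorted(totals.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{model:<24} avg ${cost / n:.4f} over {n} requests")
```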
3. Track cost trends over time
Review logs across time periods to identify trends:
- Daily cost spikes — Unusual traffic patterns driving up costs
- Growing token usage — Conversation histories or prompts gradually increasing
- New model costs — Impact of switching to a more expensive model
- Failover costs — Hidden costs from failed provider attempts, which still consume tokens
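A sketch of the daily aggregation, again assuming the hypothetical export format with an ISO-8601 `timestamp` field, flagging days that run well above average:

```python
import json
from collections import defaultdict

# Hedged sketch: bucket exported logs by calendar day to spot cost spikes.
# Assumes each entry carries an ISO-8601 "timestamp" field (hypothetical).
with open("request_logs.json") as f:
    logs = json.load(f)

daily = defaultdict(float)
for r in logs:
    daily[r["timestamp"][:10]] += r.get("estimated_cost", 0)  # "YYYY-MM-DD" prefix

average = sum(daily.values()) / len(daily)
for day in sorted(daily):
    flag = "  <-- spike?" if daily[day] > 2 * average else ""
    print(f"{day}  ${daily[day]:8.2f}{flag}")
```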
4. Calculate true request cost (including failovers)
When a request triggers failovers, the true cost includes all attempts:
```
Request lifecycle:

  ⚠ OpenAI     gpt-4o    FAILED    320 tokens   $0.0032
  ⚠ Anthropic  claude    FAILED    310 tokens   $0.0025
  ✓ Google     gemini    SUCCESS   715 tokens   $0.0071

  True cost:      $0.0032 + $0.0025 + $0.0071 = $0.0128
  Displayed cost: $0.0071 (only the successful request)
```

Important: Failed provider attempts may still consume tokens and incur costs with the provider. Check the timeline's failed model attempts for the complete picture.
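Computing the true cost is a simple sum over all attempts. A sketch assuming a hypothetical `attempts` structure that mirrors the timeline above:

```python
# Hedged sketch: sum every attempt in a request's timeline, not just the
# successful one. The "attempts" structure is hypothetical; mirror whatever
# fields your log detail view actually exposes.
request = {
    "attempts": [
        {"provider": "openai/gpt-4o",    "status": "failed",  "estimated_cost": 0.0032},
        {"provider": "anthropic/claude", "status": "failed",  "estimated_cost": 0.0025},
        {"provider": "google/gemini",    "status": "success", "estimated_cost": 0.0071},
    ]
}

true_cost = sum(a["estimated_cost"] for a in request["attempts"])
displayed = sum(a["estimated_cost"] for a in request["attempts"]
                if a["status"] == "success")
print(f"displayed ${displayed:.4f} vs true ${true_cost:.4f}")  # $0.0071 vs $0.0128
```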
Cost optimization strategies
Optimize system prompts
Reduce prompt tokens by keeping system prompts concise:
```json
// Before: 450 tokens
{
  "role": "system",
  "content": "You are a helpful assistant for Acme Corp. You should always be polite and professional. You have access to our product catalog which includes electronics, clothing, and home goods. When a customer asks about returns, refer them to our return policy which allows returns within 30 days with receipt. For shipping questions, we offer free shipping on orders over $50..."
}

// After: 180 tokens
{
  "role": "system",
  "content": "You are Acme Corp's assistant. Key policies: Returns within 30 days with receipt. Free shipping over $50. Products: electronics, clothing, home goods. Be concise and professional."
}
```

Impact: ~60% reduction in prompt tokens per request.
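To verify a trim like this before deploying it, you can count tokens locally. A sketch using the `tiktoken` library with the `o200k_base` encoding (used by the gpt-4o family; other providers tokenize differently, so treat the counts as approximate):

```python
import tiktoken

# Hedged sketch: count tokens locally before shipping a trimmed prompt.
# o200k_base is the encoding used by gpt-4o-family models; other providers
# tokenize differently, so treat these counts as approximate.
enc = tiktoken.get_encoding("o200k_base")

before = "You are a helpful assistant for Acme Corp. ..."  # paste the full prompt
after = ("You are Acme Corp's assistant. Key policies: Returns within 30 days "
         "with receipt. Free shipping over $50. Products: electronics, clothing, "
         "home goods. Be concise and professional.")

print(len(enc.encode(before)), "->", len(enc.encode(after)), "tokens")
```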
Implement conversation summarization
Instead of sending the full chat history, periodically summarize:
```json
// Before: sending 20 messages (2,500 tokens)
"messages": [
  {"role": "system", "content": "..."},
  {"role": "user", "content": "message 1"},
  {"role": "assistant", "content": "response 1"},
  // ... 18 more messages
]

// After: summarize older messages (800 tokens)
"messages": [
  {"role": "system", "content": "..."},
  {"role": "system", "content": "Previous conversation summary: User asked about product returns and shipping options. They're interested in the Premium Widget in blue."},
  {"role": "user", "content": "latest message"},
  {"role": "assistant", "content": "latest response"}
]
```
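One way to implement this pattern, sketched here with the OpenAI Python client (any chat-completion API works the same way); the `compact` helper and `KEEP_RECENT` threshold are illustrative names, not part of any SDK:

```python
from openai import OpenAI

client = OpenAI()
KEEP_RECENT = 4  # pass this many recent messages through verbatim

def compact(messages: list[dict]) -> list[dict]:
    """Replace older messages with a cheap model-generated summary."""
    if len(messages) <= KEEP_RECENT + 1:  # nothing worth summarizing yet
        return messages
    system, old, recent = messages[0], messages[1:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # a cheap model is fine for the summary itself
        messages=[{
            "role": "user",
            "content": "Summarize this conversation in under 100 words:\n"
                       + "\n".join(f'{m["role"]}: {m["content"]}' for m in old),
        }],
    ).choices[0].message.content
    return [system,
            {"role": "system", "content": f"Previous conversation summary: {summary}"},
            *recent]
```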
Choose the right model for the task

Not every request needs the most powerful model:
| Task type | Recommended model | Cost savings |
|---|---|---|
| Simple Q&A, classification | gpt-4o-mini, claude-3-haiku | 5-10x cheaper |
| Content generation | gpt-4o, claude-3.5-sonnet | Baseline |
| Complex reasoning, analysis | gpt-4o, claude-3-opus | Premium — use only when needed |
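Routing can be as simple as a lookup from task type to model tier. A minimal sketch; the task labels and model names below are illustrative and mirror the table above:

```python
# Hedged sketch: route by task type to a cheaper tier when the task allows it.
# Task labels and model names are illustrative, mirroring the table above.
MODEL_BY_TASK = {
    "classification": "gpt-4o-mini",
    "simple_qa": "gpt-4o-mini",
    "content_generation": "gpt-4o",
    "complex_reasoning": "gpt-4o",  # swap in a premium model only when needed
}

def pick_model(task_type: str) -> str:
    return MODEL_BY_TASK.get(task_type, "gpt-4o-mini")  # default to the cheap tier

print(pick_model("classification"))  # gpt-4o-mini
```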
Monitor and set alerts
- Set up cost thresholds and alert when daily spending exceeds normal levels (see the sketch after this list)
- Review the Cost Analytics dashboard regularly
- Track cost-per-request trends to catch gradual increases
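A hedged sketch of a daily threshold check; the webhook URL is a placeholder for whatever alerting channel you use:

```python
import json
import urllib.request

DAILY_BUDGET = 25.00  # USD; pick a threshold from your own baseline spend

def alert_if_over(day: str, spend: float) -> None:
    """Post an alert when a day's spend exceeds the budget."""
    if spend <= DAILY_BUDGET:
        return
    payload = json.dumps({"text": f"AI spend alert: {day} hit ${spend:.2f} "
                                  f"(budget ${DAILY_BUDGET:.2f})"}).encode()
    req = urllib.request.Request(
        "https://hooks.example.com/ai-costs",  # placeholder webhook URL
        data=payload, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)
```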
Next steps
- Performance Monitoring — Balance cost with speed
- Provider Reliability — Factor reliability into cost decisions
- Cost Analytics Dashboard — Aggregated cost views
- Back to Observability — Return to the overview