Analyze and optimize AI costs with Request Logs

Track token usage and pricing per request to understand spending patterns, compare provider costs, and make data-driven optimization decisions.

Overview

AI costs can quickly grow as usage scales. Request Logs capture detailed token usage and estimated pricing for every request, giving you the data needed to understand where your money goes and how to reduce spending without sacrificing quality.


Understanding cost data in Request Logs

Token usage breakdown

Every request logs three key metrics:

  • Prompt tokens — Tokens in your request (messages, system prompt, context). Counts as input cost, typically the lower per-token rate.
  • Completion tokens — Tokens generated by the AI model. Counts as output cost, typically the higher per-token rate.
  • Total tokens — Sum of prompt and completion tokens. The overall usage metric.

Price estimation

The Price column shows the estimated cost per request, calculated from token usage and provider pricing:

Request #1: gpt-4o
Prompt tokens: 1,250
Completion tokens: 380
Estimated cost: $0.0124
 
Request #2: claude-3-5-sonnet
Prompt tokens: 1,250
Completion tokens: 420
Estimated cost: $0.0098

Note: Prices shown are estimates based on published provider rates. For accurate billing, always refer to your provider's dashboard.
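
The arithmetic behind the estimate is token counts multiplied by per-token rates. A minimal sketch in Python, using illustrative per-million-token rates (placeholders, not live pricing; check your provider's pricing page for current rates):

Python
# Estimate request cost from token usage.
# RATES_PER_MILLION holds illustrative placeholder rates, not live pricing:
# (input $/1M tokens, output $/1M tokens) per model.
RATES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

print(f"${estimate_cost('gpt-4o', 1_250, 380):.4f}")  # $0.0069 with these placeholder rates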


Step-by-step cost analysis

1. Identify high-cost requests

  1. Navigate to Request Logs and filter to Live mode
  2. Sort by the Price column (if visible) or review token counts
  3. Look for requests with unusually high token usage (the sketch after these steps automates this triage)
  4. Click a request to inspect its body and see what drove the usage
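
If your platform supports exporting logs, a short script can automate this triage. A sketch assuming a hypothetical JSON export where each record has model, prompt_tokens, completion_tokens, and price fields (adjust to your actual schema):

Python
import json

# Flag requests above the 95th percentile of total token usage.
# The file name and field names are hypothetical; match your export schema.
with open("request_logs.json") as f:
    logs = json.load(f)

totals = sorted(r["prompt_tokens"] + r["completion_tokens"] for r in logs)
threshold = totals[int(len(totals) * 0.95)]

for r in logs:
    total = r["prompt_tokens"] + r["completion_tokens"]
    if total >= threshold:
        print(f"{r['model']}: {total} tokens, ~${r['price']:.4f}")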

Common causes of high token usage:

  • Large system prompts — Repeated context or instructions in every request
  • Long conversation history — Sending full chat history without summarization
  • Verbose structured outputs — Complex JSON schemas that increase output tokens
  • Unhelpful retries — Failed requests that still consume tokens

2. Compare provider costs

For the same type of request, compare costs across providers:

Same prompt across providers:
OpenAI     gpt-4o           1,250 in / 380 out   $0.0124
Anthropic  claude-3.5       1,250 in / 420 out   $0.0098
Google     gemini-1.5-pro   1,250 in / 395 out   $0.0071

Use this data to make informed decisions about provider selection in your workflows.
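
To compare beyond a handful of spot checks, you can aggregate average cost per model from the same kind of export (same hypothetical schema as in the triage sketch above):

Python
from collections import defaultdict

# Average estimated cost per model from exported log records.
def avg_cost_by_model(logs: list[dict]) -> dict[str, float]:
    prices: dict[str, list[float]] = defaultdict(list)
    for r in logs:
        prices[r["model"]].append(r["price"])
    return {model: sum(p) / len(p) for model, p in prices.items()}

logs = [
    {"model": "gpt-4o", "price": 0.0124},
    {"model": "claude-3-5-sonnet", "price": 0.0098},
    {"model": "gemini-1.5-pro", "price": 0.0071},
]
for model, avg in sorted(avg_cost_by_model(logs).items(), key=lambda kv: kv[1]):
    print(f"{model}: ~${avg:.4f} per request")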

3. Track spending trends over time

Review logs across time periods to identify trends (a daily aggregation sketch follows this list):

  • Daily cost spikes — Unusual traffic patterns driving up costs
  • Growing token usage — Conversation histories or prompts gradually increasing
  • New model costs — Impact of switching to a more expensive model
  • Failover costs — Hidden costs from failed provider attempts (they still consume tokens for the failed attempt)
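
One way to spot daily spikes is to sum estimated spend per calendar day. A sketch, assuming each exported record carries an ISO-8601 timestamp field (hypothetical; match your schema):

Python
from collections import defaultdict
from datetime import datetime

# Sum estimated spend per calendar day to surface spikes.
def daily_spend(logs: list[dict]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for r in logs:
        day = datetime.fromisoformat(r["timestamp"]).date().isoformat()
        totals[day] += r["price"]
    return dict(sorted(totals.items()))

logs = [
    {"timestamp": "2024-06-01T10:15:00", "price": 0.0124},
    {"timestamp": "2024-06-01T11:02:00", "price": 0.0098},
    {"timestamp": "2024-06-02T09:40:00", "price": 0.0410},
]
for day, total in daily_spend(logs).items():
    print(day, f"${total:.4f}")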

4. Calculate true request cost (including failovers)

When a request triggers failovers, the true cost includes all attempts:

Request lifecycle:
OpenAI     gpt-4o   FAILED    320 tokens   $0.0032
Anthropic  claude   FAILED    310 tokens   $0.0025
Google     gemini   SUCCESS   715 tokens   $0.0071

True cost: $0.0032 + $0.0025 + $0.0071 = $0.0128
Displayed cost: $0.0071 (only the successful request)

Important: Failed provider attempts may still consume tokens and incur costs with the provider. Check the timeline's failed model attempts for the complete picture.
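
The same arithmetic in code, assuming the log record exposes its failover attempts as a list (a hypothetical shape):

Python
# Sum cost across every failover attempt, not just the successful one.
request = {
    "attempts": [
        {"model": "gpt-4o", "status": "failed", "price": 0.0032},
        {"model": "claude-3-5-sonnet", "status": "failed", "price": 0.0025},
        {"model": "gemini-1.5-pro", "status": "success", "price": 0.0071},
    ]
}

true_cost = sum(a["price"] for a in request["attempts"])
displayed = sum(a["price"] for a in request["attempts"] if a["status"] == "success")
print(f"True cost: ${true_cost:.4f}, displayed: ${displayed:.4f}")
# True cost: $0.0128, displayed: $0.0071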


Cost optimization strategies

Optimize system prompts

Reduce prompt tokens by keeping system prompts concise:

JSON
// Before: 450 tokens
{
  "role": "system",
  "content": "You are a helpful assistant for Acme Corp. You should always be polite and professional. You have access to our product catalog which includes electronics, clothing, and home goods. When a customer asks about returns, refer them to our return policy which allows returns within 30 days with receipt. For shipping questions, we offer free shipping on orders over $50..."
}

// After: 180 tokens
{
  "role": "system",
  "content": "You are Acme Corp's assistant. Key policies: Returns within 30 days with receipt. Free shipping over $50. Products: electronics, clothing, home goods. Be concise and professional."
}

Impact: ~60% reduction in prompt tokens per request.
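
To verify savings like this before shipping a trimmed prompt, you can count tokens locally. A sketch using the tiktoken library (its o200k_base encoding matches recent OpenAI models; other providers tokenize differently, so treat cross-provider counts as approximate):

Python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("o200k_base")

# Paste the full original prompt here; truncated for brevity.
before = "You are a helpful assistant for Acme Corp. You should always be polite..."
after = (
    "You are Acme Corp's assistant. Key policies: Returns within 30 days with "
    "receipt. Free shipping over $50. Products: electronics, clothing, home "
    "goods. Be concise and professional."
)

print("before:", len(enc.encode(before)), "tokens")
print("after:", len(enc.encode(after)), "tokens")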

Implement conversation summarization

Instead of sending the full chat history, periodically summarize:

JSON
// Before: sending 20 messages (2,500 tokens)
"messages": [
  {"role": "system", "content": "..."},
  {"role": "user", "content": "message 1"},
  {"role": "assistant", "content": "response 1"},
  // ... 18 more messages
]

// After: summarize older messages (800 tokens)
"messages": [
  {"role": "system", "content": "..."},
  {"role": "system", "content": "Previous conversation summary: User asked about product returns and shipping options. They're interested in the Premium Widget in blue."},
  {"role": "user", "content": "latest message"},
  {"role": "assistant", "content": "latest response"}
]
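
In code, the pattern is: keep the system prompt and the most recent turns verbatim, and replace everything older with one summary message. A sketch where summarize() is a placeholder for a call to a cheap model:

Python
# Keep recent turns verbatim; collapse older turns into one summary message.
KEEP_RECENT = 4  # tune to how much verbatim context your app needs

def compact_history(messages: list[dict], summarize) -> list[dict]:
    system, rest = messages[0], messages[1:]  # assumes messages[0] is the system prompt
    if len(rest) <= KEEP_RECENT:
        return messages  # nothing old enough to summarize yet
    older, recent = rest[:-KEEP_RECENT], rest[-KEEP_RECENT:]
    summary = summarize(older)  # placeholder: e.g. one gpt-4o-mini call
    return [
        system,
        {"role": "system", "content": f"Previous conversation summary: {summary}"},
        *recent,
    ]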

Choose the right model for the task

Not every request needs the most powerful model:

  • Simple Q&A, classification — gpt-4o-mini, claude-3-haiku (5-10x cheaper)
  • Content generation — gpt-4o, claude-3.5-sonnet (baseline cost)
  • Complex reasoning, analysis — gpt-4o, claude-3-opus (premium — use only when needed)
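
A routing layer can encode this mapping so callers never hard-code a model. A minimal sketch (the task labels and fallback choice are illustrative):

Python
# Map task types to model tiers, mirroring the list above.
MODEL_BY_TASK = {
    "classification": "gpt-4o-mini",
    "simple_qa": "gpt-4o-mini",
    "content_generation": "gpt-4o",
    "complex_reasoning": "claude-3-opus",
}

def pick_model(task_type: str) -> str:
    # Unknown tasks fall back to the baseline tier rather than the premium one.
    return MODEL_BY_TASK.get(task_type, "gpt-4o")

print(pick_model("classification"))  # gpt-4o-mini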

Monitor and set alerts

  • Set up cost thresholds and alert when daily spending exceeds normal levels (a minimal check is sketched after this list)
  • Review the Cost Analytics dashboard regularly
  • Track cost-per-request trends to catch gradual increases
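
A minimal threshold check, with the budget value and the notify() hook as placeholders to wire into your own alerting channel (email, Slack, etc.):

Python
# Alert when the day's estimated spend crosses a fixed budget.
DAILY_BUDGET = 25.00  # placeholder; set from your own baseline

def check_budget(today_spend: float, notify) -> None:
    if today_spend > DAILY_BUDGET:
        notify(f"Daily AI spend ${today_spend:.2f} exceeded budget ${DAILY_BUDGET:.2f}")

check_budget(31.40, notify=print)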

Next steps