Overview
Response time directly impacts user experience. A chatbot that takes 8 seconds to respond feels broken; one that responds in 1-2 seconds feels magical. Request Logs capture precise duration data for every request and every provider attempt, giving you the tools to monitor, compare, and optimize performance.
Key performance metrics
Request duration
The Duration column shows end-to-end latency in milliseconds:
| Duration range | User experience | Action |
|---|---|---|
| < 1,000ms | Excellent — feels instant | No action needed |
| 1,000-3,000ms | Good — acceptable for most use cases | Monitor |
| 3,000-5,000ms | Moderate — noticeable delay | Investigate |
| > 5,000ms | Poor — users may abandon | Immediate investigation |
Time to first token (streaming)
For streaming requests, the key metric is how quickly the first token arrives. While Request Logs show total duration, you can estimate first-token latency by comparing streaming vs. non-streaming requests with similar prompts.
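If you need a more direct measurement, you can time the first streamed chunk at the client. Below is a minimal sketch, assuming you call the provider with the OpenAI Python SDK; the model name and prompt are placeholders:

```python
import time
from openai import OpenAI

client = OpenAI()  # point base_url at your gateway instead, if applicable

start = time.perf_counter()
first_token_at = None

# Stream the response and note when the first content chunk arrives.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Summarize our refund policy in one sentence."}],
    stream=True,
)
for chunk in stream:
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()

total_ms = (time.perf_counter() - start) * 1000
if first_token_at is not None:
    print(f"Time to first token: {(first_token_at - start) * 1000:.0f}ms")
print(f"Total duration:      {total_ms:.0f}ms")
```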
Failover latency impact
Each failed provider attempt adds latency. A request that only succeeds on its third attempt may show:
```text
Attempt 1: OpenAI failed after 2,100ms
Attempt 2: Anthropic failed after 1,800ms
Attempt 3: Google success in 890ms
─────────────────────────────────────
Total effective latency: 4,790ms
```

Without failovers, this request would have completed in ~890ms.
Step-by-step performance monitoring
1. Establish baseline metrics
Before you can identify anomalies, establish what "normal" looks like:
- Filter to Live mode and review 7 days of requests
- Note typical duration ranges for each provider/model
- Record average token counts per request type
- Document typical failover frequency
Example baseline:
```text
Provider: OpenAI gpt-4o
  P50 latency: 1,200ms
  P95 latency: 3,400ms
  P99 latency: 5,100ms
  Avg tokens:  1,800 total

Provider: Anthropic claude-3.5-sonnet
  P50 latency: 1,100ms
  P95 latency: 2,800ms
  P99 latency: 4,200ms
  Avg tokens:  1,900 total
```
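If you export request durations from the logs, percentile baselines like these can be computed with a few lines of Python. The CSV file name and `duration_ms` column below are assumptions about your export format:

```python
import csv
import statistics

# durations.csv is a hypothetical export with one "duration_ms" value per request.
with open("durations.csv", newline="") as f:
    durations = [float(row["duration_ms"]) for row in csv.DictReader(f)]

# quantiles(n=100) returns the 1st..99th percentiles; index 49 = P50, 94 = P95, 98 = P99.
percentiles = statistics.quantiles(durations, n=100)
print(f"P50: {percentiles[49]:,.0f}ms  P95: {percentiles[94]:,.0f}ms  P99: {percentiles[98]:,.0f}ms")
```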
2. Identify performance outliers
- Navigate to Request Logs and filter to Live mode
- Look for requests with abnormally high duration values
- Click to inspect — check the timeline for:
  - Multiple failover attempts — Primary cause of unexpected latency
  - High token counts — More tokens = longer processing
  - Specific provider/model — Some models are consistently slower
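As a rough automated check, the same kind of export can be scanned for outliers. The sketch below assumes hypothetical `request_id`, `duration_ms`, and `attempt_count` columns and flags anything slower than twice the example P95 baseline:

```python
import csv

P95_BASELINE_MS = 3_400  # taken from the example baseline above

# requests.csv is a hypothetical export of recent requests.
with open("requests.csv", newline="") as f:
    rows = list(csv.DictReader(f))

outliers = [r for r in rows if float(r["duration_ms"]) > 2 * P95_BASELINE_MS]
for r in outliers:
    attempts = int(r.get("attempt_count", 1))
    reason = "multiple failover attempts" if attempts > 1 else "slow single attempt"
    print(f"{r['request_id']}: {float(r['duration_ms']):,.0f}ms ({reason})")
```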
3. Compare provider performance
Run the same prompt through different providers to compare:
```text
Provider Performance Comparison

Provider              Latency   Tokens      Cost
────────────────────────────────────────────────
OpenAI gpt-4o-mini      420ms      800   $0.0004
OpenAI gpt-4o         1,200ms      780   $0.0120
Anthropic claude-3.5  1,100ms      820   $0.0098
Google gemini-1.5       950ms      810   $0.0071
Anthropic claude-3    2,100ms      790   $0.0450
```

Use this data to configure your workflow's provider priority for the best balance of speed, quality, and cost.
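One way to gather numbers like these is to time the same prompt against each candidate model. The sketch below assumes an OpenAI-compatible endpoint; the base URL, API key, and model identifiers are placeholders for your own setup:

```python
import time
from openai import OpenAI

# Assumption: your gateway or provider exposes an OpenAI-compatible endpoint.
client = OpenAI(base_url="https://gateway.example.com/v1", api_key="YOUR_KEY")

PROMPT = "Classify this support ticket as billing, technical, or other: ..."
MODELS = ["gpt-4o-mini", "gpt-4o", "claude-3.5-sonnet", "gemini-1.5-pro"]  # placeholder identifiers

for model in MODELS:
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{model:20} {elapsed_ms:8,.0f}ms  {response.usage.total_tokens} tokens")
```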
4. Monitor peak-hour performance
Performance often degrades during peak usage:
- Filter logs to specific time windows (e.g., 2-6 PM)
- Compare duration statistics to off-peak hours
- Look for higher failover rates during peak times
- Check if provider rate limits are being hit
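If your export includes timestamps, a quick way to spot peak-hour degradation is to bucket durations by hour. The `timestamp` (ISO 8601) and `duration_ms` columns below are assumptions about the export format:

```python
import csv
import statistics
from collections import defaultdict
from datetime import datetime

# Group request durations by hour of day from a hypothetical export.
by_hour = defaultdict(list)
with open("requests.csv", newline="") as f:
    for row in csv.DictReader(f):
        hour = datetime.fromisoformat(row["timestamp"]).hour
        by_hour[hour].append(float(row["duration_ms"]))

for hour in sorted(by_hour):
    median = statistics.median(by_hour[hour])
    print(f"{hour:02d}:00  median {median:,.0f}ms  ({len(by_hour[hour])} requests)")
```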
Performance optimization strategies
Optimize provider selection
Configure your workflow to use the fastest reliable provider as primary:
```text
Workflow provider priority:
  1. Google Gemini 1.5 Pro  (fastest, cost-effective)
  2. Anthropic Claude 3.5   (reliable, good quality)
  3. OpenAI GPT-4o          (fallback, highest quality)
```

Reduce token count
Fewer tokens = faster processing:
- Trim system prompts — Remove unnecessary instructions
- Limit conversation history — Summarize older messages
- Set appropriate `max_tokens` — Don't allow more output than needed
- Use structured outputs — Constrained outputs are typically faster
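Putting the first three suggestions together, here is a minimal sketch assuming the OpenAI Python SDK; the model name, history limit, and `max_tokens` value are illustrative:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = "You are a concise support assistant."  # trimmed of unnecessary instructions
HISTORY_LIMIT = 6  # keep only recent turns; older context could be summarized instead

def reply(history: list[dict], user_message: str) -> str:
    # Limit conversation history and cap output length to keep requests fast.
    messages = (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history[-HISTORY_LIMIT:]
        + [{"role": "user", "content": user_message}]
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=messages,
        max_tokens=300,       # don't allow more output than the use case needs
    )
    return response.choices[0].message.content
```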
Minimize failovers
Failovers are the biggest source of unexpected latency:
- Monitor provider reliability — Use Provider Reliability data to choose stable providers
- Configure sensible timeouts — Don't wait too long for a slow provider before failing over
- Keep provider credentials current — Expired API keys cause immediate failures
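For example, with the OpenAI Python SDK (an assumption about how you call providers), timeouts and SDK-level retries can be kept tight so workflow-level failover stays predictable:

```python
from openai import OpenAI

# Tight client-level timeout so a slow provider fails over quickly;
# max_retries=0 avoids stacking SDK retries on top of workflow-level failover.
client = OpenAI(timeout=10.0, max_retries=0)

# Per-request override for an especially latency-sensitive path.
fast_client = client.with_options(timeout=5.0)
```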
Consider model-task matching
Use faster models for simpler tasks:
- Classification, routing → `gpt-4o-mini` (< 500ms typical)
- Content generation → `gpt-4o` or `claude-3.5-sonnet` (1-2s typical)
- Complex analysis → `claude-3-opus` (2-4s typical, but highest quality)
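If routing happens in your own code, a small lookup table is often enough; the task names and default below are illustrative:

```python
# Pick a model by task type, defaulting to the fast, inexpensive option.
MODEL_BY_TASK = {
    "classification": "gpt-4o-mini",
    "routing": "gpt-4o-mini",
    "content_generation": "gpt-4o",
    "complex_analysis": "claude-3-opus",
}

def pick_model(task_type: str) -> str:
    return MODEL_BY_TASK.get(task_type, "gpt-4o-mini")
```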
Setting up performance alerts
Based on your baseline metrics, set alerts for:
- P95 latency exceeding 2x baseline — Early warning for degradation
- Failover rate exceeding 10% — Provider instability
- Duration exceeding absolute threshold — e.g., 5,000ms for user-facing requests
- Consistent slowdown over time — Gradual degradation may indicate growing prompt sizes
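These checks are straightforward to script against an export of recent requests. The sketch below assumes you already have lists of durations and per-request attempt counts; the baseline and threshold values mirror the examples above:

```python
import statistics

def check_alerts(durations_ms: list[float], attempt_counts: list[int],
                 p95_baseline_ms: float = 3_400, hard_limit_ms: float = 5_000) -> list[str]:
    alerts = []
    # P95 latency exceeding 2x baseline.
    p95 = statistics.quantiles(durations_ms, n=100)[94]
    if p95 > 2 * p95_baseline_ms:
        alerts.append(f"P95 latency {p95:,.0f}ms exceeds 2x baseline ({p95_baseline_ms:,.0f}ms)")
    # Failover rate exceeding 10% (any request with more than one attempt).
    failover_rate = sum(1 for a in attempt_counts if a > 1) / len(attempt_counts)
    if failover_rate > 0.10:
        alerts.append(f"Failover rate {failover_rate:.0%} exceeds 10%")
    # Requests over the absolute threshold for user-facing traffic.
    slow = sum(1 for d in durations_ms if d > hard_limit_ms)
    if slow:
        alerts.append(f"{slow} requests exceeded the {hard_limit_ms:,.0f}ms threshold")
    return alerts
```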
Next steps
- Troubleshooting Failures — When performance issues become failures
- Provider Reliability — Choose reliable providers
- Performance Monitoring Dashboard — Aggregated views
- Back to Observability — Return to the overview