Reliability

Auto failover across providers

When a provider goes down or starts rate-limiting, requests automatically route to healthy backups. Your users never see an error.

Health-checked providers · Exponential backoff · Per-request routing · Structured logs
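
For a sense of the mechanics, here is a minimal failover loop sketched in TypeScript. The callProvider stub, error shape, and function names are illustrative assumptions, not the actual SDK; in practice the router does this for you.

// Placeholder provider call; in this sketch it always fails so the loop
// advances. Swap in a real client to see successes.
async function callProvider(provider: string, payload: unknown): Promise<unknown> {
  throw Object.assign(new Error(`${provider} unavailable`), { status: 503 });
}

type Attempt = { provider: string; status: number; latencyMs: number };

// Try providers in order, back off before each retry, and record every hop.
async function callWithFailover(
  providers: string[],          // e.g. ["openai:gpt-4o", "anthropic:sonnet", "groq:mixtral"]
  payload: unknown,
  backoffMs = [400, 800, 1600], // wait this long before attempts 2, 3, ...
): Promise<{ result: unknown; trace: Attempt[] }> {
  const trace: Attempt[] = [];
  for (let i = 0; i < providers.length; i++) {
    const started = Date.now();
    try {
      const result = await callProvider(providers[i], payload);
      trace.push({ provider: providers[i], status: 200, latencyMs: Date.now() - started });
      return { result, trace };
    } catch (err) {
      const status = (err as { status?: number }).status ?? 500;
      trace.push({ provider: providers[i], status, latencyMs: Date.now() - started });
      // Retry only on rate limits and server errors; surface anything else.
      if (status !== 429 && status < 500) throw err;
      if (i < providers.length - 1) {
        await new Promise((r) => setTimeout(r, backoffMs[Math.min(i, backoffMs.length - 1)]));
      }
    }
  }
  throw new Error("all providers failed");
}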

Visual

Failover path at a glance

Straight-line view of how a request moves through primary, retry, and fallback.

Simple flow · health-aware · traced

  • Source · User request
  • Router · Health-aware routing (picks fastest healthy)
  • Primary try · Provider A (success when healthy)
  • Retry (429/5xx) · Backoff + Provider B (next attempt if needed)
  • Fallback · Provider C (healthy, succeeds)
  • Logged · Trace + metrics (every hop recorded)

Live flow
workflow: "smart_summary"
providers: ["openai:gpt-4o", "anthropic:sonnet", "groq:mixtral"]
on_error: "next_available"
max_attempts: 3
backoff_ms: [400, 800, 1600]
trace_id: req_92f0...
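
Read as data, the snippet above maps to a typed object along these lines; the interface is an illustrative sketch of the fields shown, not an official type.

// Illustrative shape for the failover config above; field meanings are
// inferred from the snippet, not taken from an official schema.
interface FailoverConfig {
  workflow: string;            // logical name, e.g. "smart_summary"
  providers: string[];         // ordered by preference, as "vendor:model"
  on_error: "next_available";  // advance to the next healthy provider on failure
  max_attempts: number;        // total tries across all providers
  backoff_ms: number[];        // delay before attempt 2, 3, ...
  trace_id?: string;           // attached per request for log correlation
}

const smartSummary: FailoverConfig = {
  workflow: "smart_summary",
  providers: ["openai:gpt-4o", "anthropic:sonnet", "groq:mixtral"],
  on_error: "next_available",
  max_attempts: 3,
  backoff_ms: [400, 800, 1600],
};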
              
1 · Detect degraded providers

Health windows decide eligibility before the first call is made (sketched after these steps).

2 · Retry with context

Same payload and routing metadata flow through retries with backoff.

3 · See every attempt

Request logs capture provider, latency, tokens, and status for each hop.
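
One way to picture the health window behind step 1 is a rolling record of recent outcomes per provider, with eligibility gated on error rate. The class below, its window size, and its threshold are assumptions for illustration, not the product's internals.

// Rolling per-provider health window: providers with too many recent
// failures are skipped before the first call is made. The window size and
// error-rate threshold are illustrative defaults.
class HealthWindow {
  private outcomes = new Map<string, boolean[]>(); // provider -> recent ok/fail flags

  constructor(private windowSize = 50, private maxErrorRate = 0.2) {}

  record(provider: string, ok: boolean): void {
    const window = this.outcomes.get(provider) ?? [];
    window.push(ok);
    if (window.length > this.windowSize) window.shift();
    this.outcomes.set(provider, window);
  }

  isHealthy(provider: string): boolean {
    const window = this.outcomes.get(provider) ?? [];
    if (window.length === 0) return true; // no data yet: assume healthy
    const errors = window.filter((ok) => !ok).length;
    return errors / window.length <= this.maxErrorRate;
  }

  // Keep only the providers currently eligible for the first attempt.
  eligible(providers: string[]): string[] {
    return providers.filter((p) => this.isHealthy(p));
  }
}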

Failover speed

~450ms

Median time to recover after a provider 429/500.

Observability

Full trace

Each hop recorded in request logs with tokens & latency.

Control

Per-request

Opt-in or customize retries for specific workflows.
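
Per-request control usually comes down to overriding the workflow defaults at call time. A sketch of that merge, using a hypothetical settings shape rather than the actual API:

// Hypothetical per-request override: workflow defaults apply unless the
// request opts in to its own retry budget or provider order.
interface RetrySettings {
  max_attempts: number;
  backoff_ms: number[];
  providers: string[];
}

function resolveRetrySettings(
  workflowDefaults: RetrySettings,
  perRequest: Partial<RetrySettings> = {},
): RetrySettings {
  // The later spread wins, so any field the request sets overrides the default.
  return { ...workflowDefaults, ...perRequest };
}

// Example: a latency-sensitive call tightens the budget for this request only.
const effective = resolveRetrySettings(
  {
    max_attempts: 3,
    backoff_ms: [400, 800, 1600],
    providers: ["openai:gpt-4o", "anthropic:sonnet", "groq:mixtral"],
  },
  { max_attempts: 2, backoff_ms: [200] },
);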

Scroll the playbook

01 · Prepare

Select providers with health scores and prioritize by latency or cost.

02 · Route

Route requests with retry budgets and per-workflow configs.

03 · Recover

Failover to healthy models with exponential backoff.

04 · Observe

Inspect attempts, tokens, and timing in request logs.
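
Concretely, "inspect attempts" means one log record per hop, carrying the provider, status, latency, and token counts. The field names below are assumptions for illustration, not the exact log schema.

// Illustrative shape of a request-log entry for a single attempt ("hop").
interface AttemptLog {
  trace_id: string;     // shared across all attempts for one request
  attempt: number;      // 1-based position in the failover chain
  provider: string;     // e.g. "anthropic:sonnet"
  status: number;       // HTTP-style status for this attempt
  latency_ms: number;
  tokens_in?: number;   // only present on attempts that reached the model
  tokens_out?: number;
}

// A rate-limited primary followed by a successful fallback might log:
const attempts: AttemptLog[] = [
  { trace_id: "req_example", attempt: 1, provider: "openai:gpt-4o", status: 429, latency_ms: 120 },
  { trace_id: "req_example", attempt: 2, provider: "anthropic:sonnet", status: 200, latency_ms: 310, tokens_in: 512, tokens_out: 188 },
];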

When to use

  • Critical user flows that must not 500.
  • Routing across OpenAI, Anthropic, and Groq with cost/latency preferences.
  • Experiments where you want automatic fallbacks without client changes.

What you get

  • Health-aware routing before calls are made.
  • Consistent payloads with structured outputs and streaming.
  • Full transparency via analytics and request logs.

Ship uptime your users feel

Pair failover with rate limiting, structured outputs, and analytics to keep experiences fast and predictable.