Streaming responses over WebSockets
Show responses as they're generated for a snappy, real-time experience. No waiting for the full response.
Stream journey
A WebSocket session emits partials, tool calls, and the final message:
- Client ↔ WebSocket channel ↔ ModelRiver stream (backpressure-aware)
- Tokens: streamed partials
- Tool calls: functions + args
- Final message: completion + metrics
- Logs + close: request log + channel end
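A TypeScript view of those stages, as event shapes a client might discriminate on. The names mirror the journey above; the payload fields are assumptions for illustration, not a published schema.

```typescript
// Event shapes inferred from the journey above; field names beyond
// `type` are assumptions, not a published schema.
type TokensEvent = { type: "tokens"; data: string };                      // streamed partials
type ToolCallEvent = { type: "tool_call"; name: string; args: unknown }; // functions + args
type FinalEvent = { type: "final"; latency_ms: number };                  // completion + metrics
type StatusEvent = { type: "status"; data?: string };                     // progress updates

type StreamEvent = TokensEvent | ToolCallEvent | FinalEvent | StatusEvent;
```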
A streaming channel and a sample of its events:

```yaml
channel_id: "ws_9f2b..."
websocket_url: "wss://api.modelriver.com/ws"
status: "pending"
stream: true
events:
  - type: "tokens"
    data: "Once upon..."
  - type: "final"
    latency_ms: 1240
```
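A minimal connect sketch using the fields from that response. Passing auth as a query parameter is an assumption; check your ModelRiver credentials for the actual handshake.

```typescript
// Minimal sketch: open the channel from the response above.
// Auth via query parameter is an assumption, not documented behavior.
const API_KEY = "mr_..."; // hypothetical placeholder
const ws = new WebSocket(
  `wss://api.modelriver.com/ws?channel_id=ws_9f2b...&api_key=${API_KEY}`
);

ws.onmessage = (msg) => {
  const event = JSON.parse(String(msg.data));
  console.log(event.type, event); // "tokens", "status", "final", ...
};
```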
Instant UX
Render partial text while the model runs so users feel progress immediately.
Single source of truth
Streamed partials and the final response are tracked in request logs with token counts and timing.
Fallback aware
If a provider fails mid-stream, failover continues the response on a healthy model.
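Failover happens server-side, so client code mostly just keeps appending tokens. If you want to surface the switch, a hypothetical sketch (the status payload here is assumed, not documented):

```typescript
// Hypothetical sketch: surface a mid-stream failover to the user.
// A "status" event describing the switch is an assumption about the payload.
declare const ws: WebSocket; // the open channel from the connect sketch above
const banner = document.getElementById("status-banner")!; // hypothetical element
const output = document.getElementById("output")!;        // hypothetical element

ws.onmessage = (msg) => {
  const event = JSON.parse(String(msg.data));
  if (event.type === "status" && String(event.data ?? "").includes("failover")) {
    banner.textContent = "Continuing on a healthy model...";
  } else if (event.type === "tokens") {
    output.textContent += event.data; // the stream simply keeps appending
  }
};
```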
UX speed
<200ms
Typical time to first token after connect.
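To verify that number in your own client, time the gap between the socket opening and the first tokens event. A rough sketch, assuming an open channel `ws`:

```typescript
// Rough sketch: measure time to first token on the client.
declare const ws: WebSocket; // the open channel from the connect sketch above
let connectedAt = 0;
let seenFirstToken = false;

ws.addEventListener("open", () => { connectedAt = performance.now(); });
ws.addEventListener("message", (msg) => {
  const event = JSON.parse(String(msg.data));
  if (event.type === "tokens" && !seenFirstToken) {
    seenFirstToken = true;
    console.log(`first token after ${Math.round(performance.now() - connectedAt)}ms`);
  }
});
```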
Event types
tokens · status
Progress you can pipe straight to UI.
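Piping those events into the page takes only a few lines. A minimal sketch, assuming an open channel `ws` and a hypothetical `#output` element:

```typescript
// Minimal sketch: pipe tokens straight into the page.
declare const ws: WebSocket; // the open channel from the connect sketch above
const output = document.getElementById("output")!; // hypothetical element

ws.addEventListener("message", (msg) => {
  const event = JSON.parse(String(msg.data));
  if (event.type === "tokens") output.textContent += event.data;
  if (event.type === "status") console.debug("status:", event.data);
});
```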
Observability
Logs on finish
Final response and metrics captured.
01 · Connect
Open the WebSocket with channel_id and auth.
02 · Receive
Render tokens and status events as they arrive.
03 · Finish
Capture final payload and metrics in logs.
04 · Recover
If a provider fails, failover continues the stream. The sketch below walks through all four steps.
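Put together, the four steps fit in one small client. The event names match the sample response above, while auth-by-query-param and resume-by-channel_id are assumptions:

```typescript
// Sketch of the full lifecycle. Auth-by-query-param and resume-by-channel_id
// on reconnect are assumptions, not documented behavior.
const CHANNEL_ID = "ws_9f2b...";
const WS_URL = `wss://api.modelriver.com/ws?channel_id=${CHANNEL_ID}&api_key=mr_...`;

function connect(): void {
  const ws = new WebSocket(WS_URL); // 01 · Connect

  ws.onmessage = (msg) => {
    const event = JSON.parse(String(msg.data));
    if (event.type === "tokens") {
      render(event.data); // 02 · Receive
    } else if (event.type === "final") {
      console.log("done in", event.latency_ms, "ms"); // 03 · Finish
      ws.close();
    }
  };

  // 04 · Recover: provider failover happens server-side mid-stream; if the
  // socket itself drops, we assume reconnecting on the same channel_id resumes.
  ws.onclose = (e) => {
    if (!e.wasClean) setTimeout(connect, 1000);
  };
}

function render(text: string): void {
  document.getElementById("output")!.textContent += text;
}

connect();
```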
Use cases
- Chat UIs that need token-by-token updates.
- Dashboards that monitor long-running tasks.
- Any flow where perceived latency matters.
What’s unique
- Same channel_id also appears in request logs.
- Plays nicely with structured outputs and webhooks.
- Built-in failover if a stream errors mid-flight.
Delight users in real time
Pair streaming with structured outputs and webhooks for reliable, verifiable completions.