Streaming responses over WebSockets
Show responses as they're generated for a snappy, real-time experience. No waiting for the full response.
Visual
Stream journey
WebSocket session emits partials, tool calls, and the final message.
Client connects
WebSocket channel established
ModelRiver streams
Backpressure-aware delivery
Tokens streamed
Partial responses as they're generated
Tool calls (optional)
Function names + arguments
Final message
Completion + metrics
Logs + channel close
Request log captured, connection ends
channel_id: "ws_9f2b..." websocket_url: "wss://api.modelriver.com/ws" status: "pending" stream: true events: - type: "tokens" data: "Once upon..." - type: "final" latency_ms: 1240
Instant UX
Render partial text while the model runs so users feel progress immediately.
Single source of truth
Streaming and finals are tracked in request logs with tokens and timing.
Fallback aware
If a provider fails mid-stream, failover continues the response on a healthy model.
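To make the instant-UX point concrete, here is a tiny rendering sketch that appends each "tokens" event to a DOM node as it arrives. The element id and the event shape passed in are illustrative.

```typescript
// Sketch: render partial text immediately (element id is illustrative).
const output = document.getElementById("assistant-output")!;

function renderStreamEvent(event: { type: string; data?: string }) {
  if (event.type === "tokens" && event.data) {
    output.textContent += event.data;   // user sees progress token by token
  }
}
```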
UX speed
<200ms
Typical time to first token after connect.
Event types
tokens · status
Progress you can pipe straight to UI.
Observability
Logs on finish
Final response and metrics captured.
01 · Connect
Open the WebSocket with channel_id and auth.
02 · Receive
Render tokens and status events as they arrive.
03 · Finish
Capture final payload and metrics in logs.
04 · Recover
If a provider fails, failover continues the stream.
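Step 04 covers provider failover, which ModelRiver handles server-side. If the socket itself drops, the client can reconnect; a rough sketch with exponential backoff follows (the backoff policy is an assumption, and the SDK below ships its own reconnection).

```typescript
// Sketch: client-side reconnect with exponential backoff for transport drops.
// Provider failover happens server-side; this only covers a dropped socket.
function connectWithRetry(url: string, attempt = 0): void {
  const ws = new WebSocket(url);

  ws.onmessage = (msg) => {
    // Render tokens and status events as in step 02.
  };

  ws.onclose = (ev) => {
    if (!ev.wasClean && attempt < 5) {
      const delayMs = Math.min(1000 * 2 ** attempt, 10_000);
      setTimeout(() => connectWithRetry(url, attempt + 1), delayMs);
    }
  };
}

connectWithRetry("wss://api.modelriver.com/ws?channel_id=ws_9f2b...");
```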
Event-Driven Workflows with Real-Time Updates
With event-driven workflows, the WebSocket channel also carries intermediate status updates. Your frontend receives status: "ai_generated" when the AI completes, then status: "completed" after your backend processes the result and calls back. This keeps users informed throughout the entire workflow.
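A sketch of reacting to those two statuses on the frontend. The exact wire shape of a status event isn't specified here, so the { type: "status", status } form and the UI helper are assumptions.

```typescript
// Sketch: react to workflow status updates.
// The event shape and the showBanner helper are illustrative assumptions.
declare function showBanner(text: string): void;

function onStatusEvent(event: { type: string; status?: string }) {
  if (event.type !== "status") return;

  if (event.status === "ai_generated") {
    showBanner("AI response ready, finalizing…");  // model finished
  } else if (event.status === "completed") {
    showBanner("Done");                            // backend called back
  }
}
```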
Use cases
- Chat UIs that need token-by-token updates.
- Dashboards that monitor long-running tasks.
- Any flow where perceived latency matters.
What's unique
- Same channel_id also appears in request logs.
- Plays nicely with structured outputs and webhooks.
- Built-in failover if a stream errors mid-flight.
Programmatic access
Use the async API + WebSocket for real-time streaming
```
// 1. Backend: Start async request
POST https://api.modelriver.com/v1/ai/async
{
  "workflow": "chat-assistant",
  "messages": [...]
}
// Returns: { channel_id, ws_token, websocket_url }

// 2. Frontend: Connect via SDK
import { ModelRiverClient } from '@modelriver/client';

const client = new ModelRiverClient({
  baseUrl: 'wss://api.modelriver.com/socket'
});

client.on('response', (data) => {
  console.log('AI Response:', data.data);
});

client.connect({ wsToken: ws_token });
```
The backend calls the async API; the frontend connects over WebSocket with the SDK. Automatic reconnection and failover are built in.
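For the backend half of step 1, the async call might look like the sketch below. The endpoint, request body, and response fields come from the snippet above; the Bearer auth header and the message content are assumptions.

```typescript
// Sketch: backend kicks off an async request, then hands ws_token to the
// frontend. Endpoint and response fields match the example above; the
// Bearer header is an assumption.
async function startAsyncChat(apiKey: string) {
  const res = await fetch("https://api.modelriver.com/v1/ai/async", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      workflow: "chat-assistant",
      messages: [{ role: "user", content: "Tell me a story" }],
    }),
  });

  // Pass ws_token (and websocket_url) down to the frontend client.
  const { channel_id, ws_token, websocket_url } = await res.json();
  return { channel_id, ws_token, websocket_url };
}
```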
Delight users in real time
Pair streaming with structured outputs and webhooks for reliable, verifiable completions.