Real-time

Streaming responses over WebSockets

Show responses as they generate for a snappy, real-time experience. No waiting for the full response.

WebSocket channels · Live status events · Token-by-token · Logged completion

Visual

Stream journey

A WebSocket session emits partials, tool calls, and the final message.

01 · Client connects

WebSocket channel established

02 · ModelRiver streams

Backpressure-aware delivery

03 · Tokens streamed

Partial responses as they generate

04 · Tool calls (optional)

Function names + arguments

05 · Final message

Completion + metrics

06 · Logs + channel close

Request log captured, connection ends

Channel example
channel_id: "ws_9f2b..."
websocket_url: "wss://api.modelriver.com/ws"
status: "pending"
stream: true
events:
  - type: "tokens"
    data: "Once upon..."
  - type: "final"
    latency_ms: 1240
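
To make those shapes concrete, here is a minimal consumer sketch in TypeScript. It assumes events arrive as JSON frames matching the channel example above; the query-string auth and exact URL are illustrative, and the SDK shown under Programmatic access handles the real handshake.

// Event shapes inferred from the channel example above (illustrative).
type StreamEvent =
  | { type: 'tokens'; data: string }
  | { type: 'final'; latency_ms: number };

// websocket_url and channel_id come from the async API response.
const ws = new WebSocket('wss://api.modelriver.com/ws?channel_id=ws_9f2b...');

let text = '';
ws.onmessage = (msg) => {
  const event = JSON.parse(msg.data) as StreamEvent;
  if (event.type === 'tokens') {
    text += event.data;            // partial response: render immediately
  } else if (event.type === 'final') {
    console.log(`Done in ${event.latency_ms}ms:`, text);
    ws.close();                    // channel closes after the final event
  }
};
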
1

Instant UX

Render partial text while the model runs so users feel progress immediately; a rendering sketch follows these cards.

2

Single source of truth

Streaming and finals are tracked in request logs with tokens and timing.

3

Fallback aware

If a provider fails mid-stream, failover continues the response on a healthy model.
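
As referenced in the first card, incremental rendering can be as simple as appending each partial to the message element. A tiny sketch; the element id and the wiring of the tokens handler are hypothetical.

// Append each partial to the chat bubble as it arrives.
// 'assistant-message' is a hypothetical element id; wire onTokens
// up as your handler for 'tokens' events.
const bubble = document.getElementById('assistant-message')!;

function onTokens(partial: string) {
  bubble.textContent += partial;            // user sees progress immediately
  bubble.scrollIntoView({ block: 'end' });  // keep the latest text visible
}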

UX speed

<200ms

Typical time to first token after connect.

Event types

tokens · status

Progress you can pipe straight to UI.

Observability

Logs on finish

Final response and metrics captured.

Scroll the stream

01 · Connect

Open the WebSocket with channel_id and auth.

02 · Receive

Render tokens and status events as they arrive.

03 · Finish

Capture final payload and metrics in logs.

04 · Recover

If a provider fails, failover continues the stream.
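
Failover itself happens server-side; on the client, recovery usually means reconnecting with the same ws_token. A sketch under that assumption; the URL, token parameter, and retry policy are illustrative.

// Open a socket, resolving once the connection is established.
function openSocket(wsToken: string): Promise<WebSocket> {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(`wss://api.modelriver.com/ws?token=${wsToken}`);
    ws.onopen = () => resolve(ws);
    ws.onerror = () => reject(new Error('connect failed'));
  });
}

// Reconnect with exponential backoff using the same ws_token.
async function connectWithRetry(wsToken: string, maxAttempts = 5): Promise<WebSocket> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await openSocket(wsToken);
    } catch {
      // Backoff: 250ms, 500ms, 1s, 2s, ...
      await new Promise((r) => setTimeout(r, 250 * 2 ** attempt));
    }
  }
  throw new Error('could not re-establish the stream');
}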

Event-Driven Workflows with Real-Time Updates

When using event-driven workflows, the WebSocket channel provides intermediate status updates. Your frontend receives status: "ai_generated" when the AI completes, then status: "completed" once your backend finishes processing and calls back. This keeps users informed throughout the entire workflow, as sketched below.
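
A sketch of reacting to those two statuses on the frontend. Listening for a 'status' event is an assumption based on the documented event types; showBanner is a hypothetical UI helper.

import { ModelRiverClient } from '@modelriver/client';

const client = new ModelRiverClient({
  baseUrl: 'wss://api.modelriver.com/socket'
});

// Hypothetical UI helper.
function showBanner(text: string) {
  console.log(`[status] ${text}`);
}

// 'status' as an event name is an assumption; the values match the docs above.
client.on('status', (event: { status: string }) => {
  if (event.status === 'ai_generated') {
    showBanner('AI response ready, finishing workflow…');
  } else if (event.status === 'completed') {
    showBanner('Workflow complete');
  }
});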

Use cases

  • Chat UIs that need token-by-token updates.
  • Dashboards that monitor long-running tasks.
  • Any flow where perceived latency matters.

What's unique

  • Same channel_id also appears in request logs.
  • Plays nicely with structured outputs and webhooks.
  • Built-in failover if a stream errors mid-flight.

Programmatic access

Use the async API + WebSocket for real-time streaming

// 1. Backend: start the async request
POST https://api.modelriver.com/v1/ai/async
{
  "workflow": "chat-assistant",
  "messages": [...]
}
// Returns: { channel_id, ws_token, websocket_url }

// 2. Frontend: connect via the SDK
import { ModelRiverClient } from '@modelriver/client';

const client = new ModelRiverClient({
  baseUrl: 'wss://api.modelriver.com/socket'
});

// Streamed data arrives as 'response' events
client.on('response', (data) => {
  console.log('AI Response:', data.data);
});

// ws_token is the token returned by the async call in step 1
client.connect({ wsToken: ws_token });

The backend calls the async API; the frontend connects over WebSocket with the SDK. Automatic reconnection and failover are built in. A server-side sketch of step 1 follows.
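
For completeness, here is a server-side sketch of step 1 (Node 18+). The endpoint and response fields are from the example above; the Authorization header and environment variable name are assumptions.

// Start the async request server-side and return only what the browser needs.
export async function startChat(messages: unknown[]) {
  const res = await fetch('https://api.modelriver.com/v1/ai/async', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Header name is an assumption; keep the API key server-side.
      Authorization: `Bearer ${process.env.MODELRIVER_API_KEY}`,
    },
    body: JSON.stringify({ workflow: 'chat-assistant', messages }),
  });
  if (!res.ok) throw new Error(`async request failed: ${res.status}`);
  const { channel_id, ws_token, websocket_url } = await res.json();
  return { channel_id, ws_token, websocket_url };
}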

Delight users in real time

Pair streaming with structured outputs and webhooks for reliable, verifiable completions.