
Real-time streaming with Server-Sent Events

Deliver AI responses token-by-token for faster perceived performance. Includes heartbeat support, timeout handling, and examples in Python, Node.js, cURL, and React.

Streaming allows your application to receive AI-generated tokens as they're produced, rather than waiting for the entire response. This dramatically reduces time-to-first-token and creates a more responsive user experience.

Enabling streaming

Set "stream": true in your request body to receive Server-Sent Events (SSE):

Python (OpenAI SDK)

PYTHON
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelriver.com/v1",
    api_key="mr_live_YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="my_workflow",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Node.js (OpenAI SDK)

JAVASCRIPT
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.modelriver.com/v1",
  apiKey: "mr_live_YOUR_API_KEY",
});

const stream = await client.chat.completions.create({
  model: "my_workflow",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Native API (cURL)

BASH
curl -X POST https://api.modelriver.com/v1/ai \
  -H "Authorization: Bearer mr_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "workflow": "my-workflow",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

Vercel AI SDK (React)

TYPESCRIPT
import { useChat } from "ai/react";

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/chat",
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

SSE event format

Each Server-Sent Event follows the OpenAI delta format:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
 
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
 
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}
 
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
 
data: [DONE]

Chunk structure

Field | Description
id | Consistent across all chunks in a stream
object | Always "chat.completion.chunk"
choices[0].delta.role | Set to "assistant" in the first chunk
choices[0].delta.content | The incremental text content
choices[0].finish_reason | null during streaming, "stop" on the final content chunk
[DONE] | Sentinel value indicating stream end
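
These fields are all you need to reassemble the complete assistant message on the client. Here is a minimal sketch that iterates over a stream created as in the Python example above (the variable names are illustrative):

PYTHON
full_text = []
finish_reason = None

for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content:              # incremental text content
        full_text.append(choice.delta.content)
    if choice.finish_reason is not None:  # "stop" arrives on the final content chunk
        finish_reason = choice.finish_reason

print("".join(full_text))
print("finish_reason:", finish_reason)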

Streaming features

Heartbeat

ModelRiver sends heartbeat comments (: heartbeat) every 15 seconds to keep connections alive. This prevents proxy servers and load balancers from closing idle connections.

Timeout

Streaming requests have a 5-minute timeout. If the provider doesn't begin generating within this window, the stream closes with an error event.

Stream termination

The stream ends with data: [DONE] to signal completion. Always check for this sentinel value to properly close your connection and handle cleanup.
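
The official SDKs handle heartbeats and the [DONE] sentinel for you. If you consume the native endpoint directly (as in the cURL example above), you need to parse the event stream yourself. Below is a minimal sketch using the requests library, assuming the native endpoint emits the same OpenAI-style chunks documented above; adapt it to your own HTTP client as needed.

PYTHON
import json
import requests

# Raw SSE consumer for the native endpoint (mirrors the cURL example above).
response = requests.post(
    "https://api.modelriver.com/v1/ai",
    headers={
        "Authorization": "Bearer mr_live_YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "workflow": "my-workflow",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True,
    },
    stream=True,  # don't buffer the whole response body
)

for line in response.iter_lines(decode_unicode=True):
    if not line or line.startswith(":"):
        continue  # skip blank separators and ": heartbeat" comments
    if not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # sentinel value: the stream is complete
    event = json.loads(payload)
    if "error" in event:
        # Error events are described in "Error handling in streams" below.
        raise RuntimeError(event["error"]["message"])
    delta = event["choices"][0]["delta"]
    if delta.get("content"):
        print(delta["content"], end="", flush=True)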


Error handling in streams

If an error occurs during streaming, ModelRiver sends an error event before closing the stream:

data: {"error":{"message":"Provider timeout","type":"upstream_error","code":"timeout"}}
 
data: [DONE]

Handling stream errors

PYTHON
try:
    stream = client.chat.completions.create(
        model="my_workflow",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
except Exception as e:
    print(f"Stream error: {e}")
    # Implement retry logic or fallback
JAVASCRIPT
try {
  const stream = await client.chat.completions.create({
    model: "my_workflow",
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
} catch (error) {
  console.error("Stream error:", error.message);
  // Implement retry logic or fallback
}

Streaming with function calling

When streaming responses that include tool calls, the tool call data arrives incrementally:

PYTHON
stream = client.chat.completions.create(
    model="my_workflow",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }],
    stream=True
)

tool_calls = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            # First fragment of a tool call: create a slot at its index
            if tc.index >= len(tool_calls):
                tool_calls.append({"id": tc.id, "function": {"name": "", "arguments": ""}})
            # The name arrives once; the JSON arguments arrive in pieces
            if tc.function.name:
                tool_calls[tc.index]["function"]["name"] = tc.function.name
            if tc.function.arguments:
                tool_calls[tc.index]["function"]["arguments"] += tc.function.arguments

print("Tool calls:", tool_calls)
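
Once the stream finishes, each accumulated arguments string contains complete JSON that you can parse and dispatch. A hypothetical continuation (calling your own get_weather implementation is left out):

PYTHON
import json

for call in tool_calls:
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"] or "{}")
    print(f"Would call {name} with {args}")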

For more details on tool use, see Function calling.


Best practices

  1. Use streaming for user-facing chat: Reduces perceived latency significantly
  2. Buffer before rendering: Consider buffering a few tokens before displaying to avoid jittery output
  3. Handle disconnections: Implement reconnection logic for long streams
  4. Set reasonable timeouts: Don't rely solely on the server timeout; implement client-side timeouts (see the sketch after this list)
  5. Process [DONE] sentinel: Always handle stream termination properly
  6. Use SDKs when possible: They handle SSE parsing, error handling, and reconnection automatically
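
As a starting point for items 3 and 4, the sketch below wraps the Python SDK call with a per-request timeout and a simple retry loop. The limits and backoff values are illustrative, the with_options timeout is assumed to bound how long the client waits for each chunk, and a retry restarts the stream from the beginning, so tokens already rendered may repeat.

PYTHON
import time

import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelriver.com/v1",
    api_key="mr_live_YOUR_API_KEY",
)

# Illustrative limits; tune them for your workload.
MAX_ATTEMPTS = 3
REQUEST_TIMEOUT = httpx.Timeout(connect=10.0, read=30.0, write=10.0, pool=10.0)

def stream_with_retry(messages):
    """Yield content tokens, retrying the whole request on failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            stream = client.with_options(timeout=REQUEST_TIMEOUT).chat.completions.create(
                model="my_workflow",
                messages=messages,
                stream=True,
            )
            for chunk in stream:
                content = chunk.choices[0].delta.content
                if content:
                    yield content
            return  # reached the end of the stream cleanly
        except Exception:
            if attempt == MAX_ATTEMPTS:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff before retrying

for token in stream_with_retry([{"role": "user", "content": "Tell me a story"}]):
    print(token, end="", flush=True)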

Next steps