Streaming allows your application to receive AI-generated tokens as they're produced, rather than waiting for the entire response. This dramatically reduces time-to-first-token and creates a more responsive user experience.
Enabling streaming
Set "stream": true in your request body to receive Server-Sent Events (SSE):
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelriver.com/v1",
    api_key="mr_live_YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="my_workflow",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Node.js (OpenAI SDK)
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.modelriver.com/v1",
  apiKey: "mr_live_YOUR_API_KEY",
});

const stream = await client.chat.completions.create({
  model: "my_workflow",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```
Native API (cURL)
```bash
curl -X POST https://api.modelriver.com/v1/ai \
  -H "Authorization: Bearer mr_live_your_key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "workflow": "my-workflow",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
Vercel AI SDK (React)
```tsx
import { useChat } from "ai/react";

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/chat",
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```
SSE event format
Each Server-Sent Event follows the OpenAI delta format:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]} data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]} data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]} data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]} data: [DONE]Chunk structure
| Field | Description |
|---|---|
| `id` | Consistent across all chunks in a stream |
| `object` | Always `"chat.completion.chunk"` |
| `choices[0].delta.role` | Set to `"assistant"` in the first chunk |
| `choices[0].delta.content` | The incremental text content |
| `choices[0].finish_reason` | `null` during streaming, `"stop"` on the final content chunk |
| `[DONE]` | Sentinel value indicating stream end |
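If you parse the SSE stream yourself rather than through an SDK, the fields above are all you need to reassemble the response. Below is a minimal sketch assuming the chunk format shown above; the helper name `collect_sse_text` and its line-by-line input are illustrative, not part of the ModelRiver API:

```python
import json

def collect_sse_text(lines):
    """Accumulate assistant text from a sequence of raw SSE lines (illustrative helper)."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skips blank separators and ": heartbeat" comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # sentinel: the stream is complete
        chunk = json.loads(payload)
        choice = chunk["choices"][0]
        content = choice.get("delta", {}).get("content")
        if content:
            parts.append(content)
        # choice.get("finish_reason") == "stop" marks the final content chunk
    return "".join(parts)
```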
Streaming features
Heartbeat
ModelRiver sends heartbeat comments (: heartbeat) every 15 seconds to keep connections alive. This prevents proxy servers and load balancers from closing idle connections.
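Heartbeats are SSE comment lines, so SDKs discard them automatically. If you read the stream manually, skip any line starting with a colon; since a heartbeat arrives at least every 15 seconds, a per-read timeout of roughly twice that is a simple way to detect a silently dropped connection. A sketch, assuming the `requests` library and the native endpoint shown above:

```python
import requests  # assumption: any HTTP client with streaming reads works

resp = requests.post(
    "https://api.modelriver.com/v1/ai",
    headers={"Authorization": "Bearer mr_live_your_key",
             "Content-Type": "application/json"},
    json={
        "workflow": "my-workflow",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True,
    },
    stream=True,
    timeout=(10, 30),  # (connect, read): heartbeats every 15 s make a 30 s read timeout reasonable
)

for line in resp.iter_lines(decode_unicode=True):
    if not line or line.startswith(":"):
        continue  # blank separator or ": heartbeat" comment -- connection is alive, nothing to render
    # lines starting with "data: " carry the chunks described above
```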
Timeout
Streaming requests have a 5-minute timeout to accommodate long-running generations. If the provider doesn't begin generating within this window, the stream closes with an error event.
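The server-side limit doesn't replace a client-side one. With the OpenAI Python SDK you can pass a `timeout` when constructing the client; the values below are illustrative, not required by ModelRiver:

```python
import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelriver.com/v1",
    api_key="mr_live_YOUR_API_KEY",
    timeout=httpx.Timeout(300.0, connect=10.0),  # overall cap mirrors the 5-minute server limit
)
```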
Stream termination
The stream ends with data: [DONE] to signal completion. Always check for this sentinel value to properly close your connection and handle cleanup.
Error handling in streams
If an error occurs during streaming, ModelRiver sends an error event before closing the stream:
data: {"error":{"message":"Provider timeout","type":"upstream_error","code":"timeout"}} data: [DONE]Handling stream errors
```python
try:
    stream = client.chat.completions.create(
        model="my_workflow",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
except Exception as e:
    print(f"Stream error: {e}")
    # Implement retry logic or fallback
```

```javascript
try {
  const stream = await client.chat.completions.create({
    model: "my_workflow",
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
} catch (error) {
  console.error("Stream error:", error.message);
  // Implement retry logic or fallback
}
```
Streaming with function calling
When streaming responses that include tool calls, the tool call data arrives incrementally:
```python
stream = client.chat.completions.create(
    model="my_workflow",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }],
    stream=True
)

tool_calls = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            if tc.index >= len(tool_calls):
                tool_calls.append({"id": tc.id, "function": {"name": "", "arguments": ""}})
            if tc.function.name:
                tool_calls[tc.index]["function"]["name"] = tc.function.name
            if tc.function.arguments:
                tool_calls[tc.index]["function"]["arguments"] += tc.function.arguments

print("Tool calls:", tool_calls)
```

For more details on tool use, see Function calling.
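Once the stream completes, each accumulated `arguments` value is a JSON string that still needs to be parsed before the tool can be executed. A minimal sketch continuing from the `tool_calls` list built above; the `get_weather` body here is a stand-in for a real implementation:

```python
import json

def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # placeholder for a real weather lookup

for call in tool_calls:
    if call["function"]["name"] == "get_weather":
        args = json.loads(call["function"]["arguments"])  # accumulated arguments are a JSON string
        print(get_weather(**args))
```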
Best practices
- Use streaming for user-facing chat: Reduces perceived latency significantly
- Buffer before rendering: Consider buffering a few tokens before displaying to avoid jittery output (see the sketch after this list)
- Handle disconnections: Implement reconnection logic for long streams
- Set reasonable timeouts: Don't rely solely on the server timeout; implement client-side timeouts
- Process the [DONE] sentinel: Always handle stream termination properly
- Use SDKs when possible: They handle SSE parsing, error handling, and reconnection automatically
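For the buffering point above, one approach is to flush output in small batches rather than per token. A sketch, assuming an OpenAI-SDK stream like the ones in the examples above; the helper name and threshold are arbitrary:

```python
import sys

def render_buffered(stream, min_chars=24):
    """Print streamed content in small batches to avoid jittery output (illustrative helper)."""
    buffer = ""
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if not content:
            continue
        buffer += content
        if len(buffer) >= min_chars:
            sys.stdout.write(buffer)
            sys.stdout.flush()
            buffer = ""
    if buffer:  # flush whatever is left when the stream ends
        sys.stdout.write(buffer)
        sys.stdout.flush()
```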
Next steps
- Function calling: Use tools with streaming
- OpenAI compatibility: Compatible SDKs
- Error handling: Complete error reference