Real-time AI Streaming

Deliver AI responses character-by-character. Learn how to use WebSockets to build lightning-fast interfaces that users love.

In the world of AI, latency is the biggest enemy of user engagement. Waiting 5–10 seconds for a full response feels slow. Real-time AI streaming allows you to show words as they are generated, making your application feel instant and interactive.

How AI streaming works

Most Large Language Models (LLMs) generate text token by token. Instead of waiting for the model to finish, ModelRiver captures these tokens and pushes them to your application immediately.

Polling vs. Streaming

  • Polling: Your app repeatedly asks the server "Is it ready yet?". This adds significant overhead and latency.
  • Streaming: The server pushes data to your app as it becomes available. This is the gold standard for AI interfaces (the sketch below shows the difference in practice).
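
To make the contrast concrete, here is a minimal client-side sketch of both approaches. The endpoint paths and response shape are illustrative only, not part of the ModelRiver API.

JAVASCRIPT
// Polling: repeatedly ask the server until the full response is ready.
async function pollForResult(requestId) {
  while (true) {
    const res = await fetch(`/api/results/${requestId}`); // hypothetical endpoint
    const body = await res.json();
    if (body.status === 'done') return body.text;
    await new Promise((resolve) => setTimeout(resolve, 1000)); // wait, then ask again
  }
}

// Streaming: read chunks off the response as soon as they arrive.
async function streamResult(onToken) {
  const res = await fetch('/api/stream'); // hypothetical streaming endpoint
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    onToken(decoder.decode(value, { stream: true })); // render each chunk immediately
  }
}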

Implementing streaming with ModelRiver

ModelRiver provides two primary ways to handle real-time delivery:

1. Server-Sent Events (SSE)

Standard HTTP streaming. Best for simple integrations where you handle the connection on your own server.
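
If you proxy the stream through your own server as an SSE endpoint, the browser's built-in EventSource can consume it. The endpoint path, the '[DONE]' sentinel, and the element id below are assumptions for illustration, not ModelRiver specifics.

JAVASCRIPT
// Hypothetical SSE endpoint on your own server that proxies the model stream.
const source = new EventSource('/api/chat/stream?promptId=123');

source.onmessage = (event) => {
  // Assume the server sends '[DONE]' as its final event, and tokens otherwise.
  if (event.data === '[DONE]') {
    source.close();
    return;
  }
  document.querySelector('#assistant-message').textContent += event.data;
};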

2. WebSockets

ModelRiver's native WebSocket support allows for the lowest possible latency. Using the @modelriver/client SDK, you can connect directly to the ModelRiver gateway.

JAVASCRIPT
import { useModelRiver } from '@modelriver/client';

const { connect, message } = useModelRiver();

// Connect to a specific AI request stream
connect({ websocket_url, ws_token, channel_id });

// 'message' updates automatically in real-time
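
As a usage sketch, the hook can drive a component directly. This assumes a React app and that your backend hands you websocket_url, ws_token, and channel_id when it creates the AI request; the component itself is illustrative.

JAVASCRIPT
import { useEffect } from 'react';
import { useModelRiver } from '@modelriver/client';

// Sketch only: assumes the connection details come from your backend
// when the AI request is created.
function AssistantReply({ websocket_url, ws_token, channel_id }) {
  const { connect, message } = useModelRiver();

  useEffect(() => {
    connect({ websocket_url, ws_token, channel_id });
  }, [websocket_url, ws_token, channel_id]);

  // 'message' grows as tokens stream in, so the reply renders incrementally.
  return <p>{message || 'Thinking...'}</p>;
}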

Optimizing for perceived performance

Streaming isn't just about faster data—it's about better UX.

  • Immediate Feedback: Show a "Thinking..." state the moment the user hits send.
  • Smooth Animation: Render tokens with a slight fade-in or slide-up effect.
  • Auto-scroll: Ensure the chat window stays at the bottom as new content flows in (see the sketch below).
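
Here is a small sketch of the auto-scroll behavior. The element id is illustrative, and the bottom-distance threshold is an arbitrary choice so the view doesn't jump while a user is reading older messages.

JAVASCRIPT
const chatWindow = document.querySelector('#chat-window'); // illustrative element id

function appendToken(token) {
  // Only stick to the bottom if the user hasn't scrolled up to read older messages.
  const nearBottom =
    chatWindow.scrollHeight - chatWindow.scrollTop - chatWindow.clientHeight < 40;
  chatWindow.textContent += token;
  if (nearBottom) {
    chatWindow.scrollTop = chatWindow.scrollHeight;
  }
}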

Handling edge cases in streaming

Connection drops

The @modelriver/client SDK includes automatic reconnection logic. If a user loses Wi-Fi for a moment, the stream can resume without losing context.
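
You only need to handle this yourself if you manage a raw WebSocket connection. The sketch below shows the general reconnect-with-backoff pattern; it illustrates the idea, not the SDK's internals.

JAVASCRIPT
// Illustrative pattern for a raw WebSocket, not the @modelriver/client implementation.
function connectWithRetry(url, onToken, attempt = 0) {
  const ws = new WebSocket(url);

  ws.onopen = () => {
    attempt = 0; // reset the backoff once the connection is healthy
  };

  ws.onmessage = (event) => onToken(event.data);

  ws.onclose = () => {
    // Exponential backoff, capped at 10 seconds.
    const delay = Math.min(10000, 500 * 2 ** attempt);
    setTimeout(() => connectWithRetry(url, onToken, attempt + 1), delay);
  };
}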

Incomplete JSON

When using Structured Outputs, streaming partial JSON can be tricky. ModelRiver handles this by buffering or providing validated partial chunks so your UI doesn't break.
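
As a rough client-side illustration of the buffering approach (not the ModelRiver implementation), you can accumulate chunks and only hand complete JSON to the UI:

JAVASCRIPT
let buffer = '';

function onChunk(chunk, renderStructured) {
  buffer += chunk;
  try {
    const parsed = JSON.parse(buffer); // throws until the JSON is complete
    renderStructured(parsed);
    buffer = '';
  } catch {
    // Not valid JSON yet; keep buffering until more chunks arrive.
  }
}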

Why use ModelRiver for streaming?

Building a robust streaming infrastructure is hard. You have to manage WebSocket states, handle cross-region connections, and deal with provider-specific streaming formats.

ModelRiver acts as a unified abstraction layer:

  • One protocol for OpenAI, Anthropic, and Gemini.
  • Built-in failover support—even mid-stream.
  • Dedicated CLI tool for testing streams locally.

Next steps