LlamaIndex builds the retrieval. Production breaks everything else.
ModelRiver adds the production layer LlamaIndex doesn't include — auto-failover, built-in observability, real-time streaming, and structured outputs.
No extra tools. No callback wiring. No observability patchwork.
Auto-failover, observability, and structured outputs — built in, not bolted on.
LlamaIndex production setup
3–5 tools
ModelRiver setup
One platform
Production Infrastructure
What LlamaIndex leaves to you
1. Provider failover
When a model goes down, traffic routes to a backup automatically.
2. Full observability
Every request tracked end-to-end — no Langfuse, no Phoenix, no extra setup.
3. Real-time streaming
WebSocket streaming with auto-reconnection, persistent connections, and client SDKs.
4. Structured outputs
Enforced response contracts across any provider — not best-effort parsing.
Why teams outgrow LlamaIndex alone
Retrieval works. Everything around it doesn't.
LlamaIndex is the best RAG framework available. But once your RAG pipeline hits real users, the gaps show up fast — and they're not retrieval problems.
No native failover
LlamaIndex doesn't handle provider outages. When OpenAI returns a 429, your RAG pipeline breaks.
Observability is patchwork
Callback-based tracing requires Langfuse, Arize, or W&B — and none show the full request lifecycle.
Stateless by default
LlamaIndex Workflows are stateless. Managing state across sessions requires custom persistence code.
Streaming is limited
Real-time streaming needs custom WebSocket infrastructure — reconnection, persistence, and SDK support.
Structured outputs aren't enforced
Output parsing varies across providers. What passes with GPT-4 breaks with Claude or Mistral.
The LlamaIndex production stack
RAG works. Then you need 3–5 more tools to run it.
One platform
ModelRiver replaces all of that
Production infrastructure for AI workflows, in one place. Use it alongside LlamaIndex or on its own.
Keep the retrieval. Fix the production layer.
Auto-failover
Built-in observability
Real-time streaming
Structured outputs
Exact-match caching
Event-driven workflows
Beyond retrieval
LlamaIndex handles retrieval. ModelRiver handles everything that breaks after.
ModelRiver is production infrastructure for AI workflows — failover, observability, streaming, and structured outputs in one platform.
LlamaIndex gets data into your model. ModelRiver gets your model into production.
Automatic provider failover
When OpenAI hits a rate limit or Anthropic returns an error, requests automatically route to a healthy backup provider. No code changes needed.
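Under the hood this is the classic priority-ordered failover pattern. A minimal stdlib sketch of the routing semantics — the provider names and the `RateLimited` error are illustrative, not ModelRiver APIs:

```python
class RateLimited(Exception):
    """Simulates a provider returning HTTP 429."""

def call_provider(name, prompt):
    # Stand-in for a real provider call; "primary" is rate-limited here.
    if name == "primary":
        raise RateLimited("429 Too Many Requests")
    return f"{name}: response to {prompt!r}"

def complete_with_failover(prompt, providers):
    last_error = None
    for name in providers:
        try:
            return call_provider(name, prompt)
        except RateLimited as err:
            last_error = err  # note the failure, try the next healthy provider
    raise RuntimeError("all providers failed") from last_error

# Primary is down, so the request transparently lands on the backup.
print(complete_with_failover("hello", ["primary", "backup"]))
```

The caller never sees the 429; it only sees a successful response or, if every provider is exhausted, a single terminal error.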
Full lifecycle observability
Track every request from entry to response — which provider was called, what was returned, how long each step took, and where failures occurred.
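In data terms, "end-to-end" means one trace record per request covering provider choice, per-step timing, and any failure. A sketch of that shape — the field names are hypothetical, not ModelRiver's actual schema:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class RequestTrace:
    """Hypothetical per-request trace: provider, timed steps, failure point."""
    provider: str
    steps: list = field(default_factory=list)  # (step_name, seconds) pairs
    error: str = ""

    def record(self, step, seconds):
        self.steps.append((step, seconds))

    @property
    def total_seconds(self):
        return sum(s for _, s in self.steps)

trace = RequestTrace(provider="openai")
trace.record("route", 0.002)
trace.record("completion", 1.4)
trace.record("validate_schema", 0.001)
print(trace.provider, trace.total_seconds)
```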
Real-time WebSocket streaming
Client SDKs for React, Vue, Angular, and Svelte with auto-reconnection, persistent connections, and graceful degradation built in.
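The SDKs manage reconnection for you; the underlying pattern is exponential-backoff retry. A sketch with a stand-in `connect()` rather than a real WebSocket — the function and delays are illustrative, not SDK internals:

```python
def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, sleep=lambda s: None):
    """Retry `connect` with exponential backoff: 0.5s, 1s, 2s, 4s, ..."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the failure
            sleep(base_delay * (2 ** attempt))

attempts = {"n": 0}
def flaky_connect():
    # Fails twice, then succeeds -- mimics a dropped connection coming back.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("dropped")
    return "socket"

delays = []
result = connect_with_backoff(flaky_connect, sleep=delays.append)
print(result, delays)  # socket [0.5, 1.0]
```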
Enforced structured outputs
Define your response schema once. Every response is validated against the contract — regardless of which provider or model served it.
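"Enforced" means the response is checked against the contract before it reaches your code, whichever model produced it. A minimal stdlib illustration of that kind of check — the schema and validator are illustrative, not ModelRiver's API:

```python
import json

# Declared contract: every response must carry these fields with these types.
SCHEMA = {"title": str, "tags": list, "confidence": float}

def validate(raw, schema):
    data = json.loads(raw)
    for key, expected in schema.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} failed contract {expected.__name__}")
    return data

ok = validate('{"title": "Q3 report", "tags": ["finance"], "confidence": 0.92}', SCHEMA)
print(ok["title"])  # Q3 report
```

A response missing a field, or returning `"confidence": "high"` instead of a number, is rejected at the gateway instead of surfacing as a parsing bug downstream.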
Comparison
LlamaIndex vs ModelRiver
| Feature | LlamaIndex | ModelRiver |
|---|---|---|
| RAG & retrieval | Best-in-class | Not a retrieval framework |
| Provider failover | Not included | Automatic, multi-provider |
| Observability | Callbacks + third-party tools | Built-in, full lifecycle |
| Real-time streaming | Basic (no client SDKs) | WebSocket + SDKs + auto-reconnect |
| Structured outputs | Output parsers (best-effort) | Enforced response contracts |
| Caching | Not built-in | Exact-match caching |
| State management | Stateless by default | Event-driven workflows |
| Production readiness | Needs additional tooling | Built-in |
Use LlamaIndex if
- You need advanced RAG pipelines with custom chunking and retrieval strategies.
- You are building document Q&A or enterprise knowledge bases.
- You need fine-grained control over data ingestion and indexing.
Use ModelRiver if
- You need auto-failover across providers without custom code.
- You want built-in observability for every request, not patchwork tracing.
- You need real-time streaming with client SDKs and auto-reconnection.
- You want enforced structured outputs that work across all providers.
How it works
Use LlamaIndex for retrieval. Use ModelRiver for production.
ModelRiver is OpenAI-compatible. Route your LlamaIndex model calls through ModelRiver and gain production infrastructure without rewriting retrieval logic.
Keep your LlamaIndex pipeline
Your data ingestion, indexing, and retrieval logic stays exactly as it is.
Route model calls through ModelRiver
Point your LLM's base_url to ModelRiver. Two lines of config — no code rewrite.
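With LlamaIndex's OpenAI LLM class, that config change looks roughly like this. The endpoint and key are placeholders (the real values come from your ModelRiver dashboard), and the import path assumes the modular `llama-index-llms-openai` package:

```python
from llama_index.llms.openai import OpenAI

# Placeholder endpoint and key -- substitute your ModelRiver values.
# Retrieval, ingestion, and indexing code stay untouched.
llm = OpenAI(
    model="gpt-4o-mini",
    api_base="https://<your-modelriver-endpoint>/v1",
    api_key="<your-modelriver-api-key>",
)
```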
Get production infrastructure
Auto-failover, structured outputs, caching, and full request lifecycle visibility — all working instantly.
Ship with confidence
Deploy knowing that provider outages, rate limits, and response format issues won't hit your users.

Visual workflow builder
Configure failover, caching, and structured outputs — no code needed.
What makes ModelRiver different
Production infrastructure, not another framework.
Infrastructure vs framework
LlamaIndex is a retrieval framework. ModelRiver is production infrastructure. They solve different problems — and work well together.
Built-in vs bolt-on
Observability, failover, and structured outputs are core to ModelRiver — not third-party integrations you wire up yourself.
OpenAI-compatible
ModelRiver uses the OpenAI API format. If your LlamaIndex code uses any OpenAI-compatible LLM, you can route it through ModelRiver with a config change.
Learn more
Docs and next reads
Build a workflow
See how ModelRiver workflows are created and configured.
Debugging docs
Inspect failures, request logs, and production behavior.
LlamaIndex integration
Route LlamaIndex model calls through ModelRiver for failover, caching, and observability.
Structured outputs
How ModelRiver enforces response contracts across every provider and model.
FAQ
Is ModelRiver a full replacement for LlamaIndex?
No. LlamaIndex excels at document ingestion, indexing, and RAG pipelines. ModelRiver handles the production infrastructure layer — failover, observability, streaming, and structured outputs. Many teams use both together.
Can I still use LlamaIndex with ModelRiver?
Yes. ModelRiver is OpenAI-compatible. Route your LlamaIndex model calls through ModelRiver to gain auto-failover, caching, observability, and structured outputs without rewriting your retrieval logic.
Does LlamaIndex have built-in observability?
LlamaIndex provides callback-based instrumentation that integrates with Langfuse, Arize Phoenix, and W&B. It doesn't include a built-in observability dashboard. ModelRiver provides full request lifecycle visibility natively.
What does ModelRiver do that LlamaIndex doesn't?
ModelRiver provides production infrastructure that LlamaIndex does not include: automatic provider failover, built-in request lifecycle observability, real-time WebSocket streaming with client SDKs, exact-match caching, and enforced structured outputs across all providers.
Keep LlamaIndex. Add production infrastructure.
Ship your RAG pipeline with failover, observability, and streaming built in.
If LlamaIndex handles your retrieval but production infrastructure is still DIY, ModelRiver fills the gap.