Why Elixir + Phoenix Are Ideal for Building Scalable, Fault-Tolerant AI Gateways

The stack nobody talks about
Every AI gateway and orchestration platform in 2026 runs on one of three stacks: Python + FastAPI, TypeScript + Node, or Go. That's the default mental model. If you search "how to build an AI gateway," every tutorial, every open-source boilerplate, and every HackerNews thread assumes one of those three.
I'm going to make the case for a fourth option that almost nobody considers, and that I believe is architecturally superior for this specific class of problem: Elixir + Phoenix, running on the Erlang VM (BEAM).
This isn't a hot take for attention. I've spent over a year building a production AI gateway on this stack — handling multi-provider routing, failover chains, async job queues, WebSocket delivery, prompt caching, encrypted secrets management, rate limiting, webhook delivery, metered billing, and real-time observability. Every architectural decision in this post comes from production experience, not theory.
Let me walk through why BEAM is the most natural runtime for AI infrastructure, and why the Elixir ecosystem provides the exact primitives you need — often with less code and fewer moving parts than the alternatives.
The core problem: an AI gateway is a concurrent orchestration engine
Before comparing stacks, it helps to understand what an AI gateway actually does at runtime. It's not a simple proxy. For every incoming request, the system needs to:
- Authenticate and authorize the request against project-scoped API keys
- Load workflow configuration — which provider, which model, what system instructions, whether structured output is enabled, what backup providers exist
- Check rate limits at both the IP level and the project/organization level
- Check prompt cache (ETS in-memory first, then database fallback) to avoid redundant provider calls
- Run content moderation before forwarding to the provider
- Make the actual LLM API call — with timeout budgets, failure classification, and automatic failover to backup providers
- Log the request and response with full metadata: tokens, latency, provider used, cache status, failover attempts
- Meter usage for billing — report overage events to Stripe's metered billing API
- Deliver results — synchronously via HTTP response, or asynchronously via WebSocket channels and webhook delivery with HMAC-signed payloads
- Update caches — store the result in ETS for fast subsequent lookups
Every one of these steps involves I/O: database queries, HTTP calls to providers, HTTP calls to Stripe, WebSocket broadcasts, ETS reads/writes. And every one of them needs to happen concurrently — you can't block the entire server while waiting for OpenAI to respond to a single request.
This is where the BEAM changes the game.
Why BEAM is architecturally perfect for this
Lightweight processes for request isolation
Every request in an Elixir/Phoenix application runs in its own BEAM process. Not an OS thread. Not a goroutine. A BEAM process — a lightweight, isolated unit of execution with its own heap, its own garbage collection schedule, and its own mailbox.
The practical impact:
Process #1 waiting on OpenAI does not block process #2 or #3. There's no thread pool to exhaust, no event loop to block, no async/await ceremony to manage. Concurrency is the default, not something you opt into.
In a Python/FastAPI stack, achieving this requires asyncio with careful discipline — one blocking call in the wrong place stalls the event loop. In Node, you get non-blocking I/O by default, but the single-threaded model means CPU-bound work (JSON parsing large responses, computing HMAC signatures, SHA-256 hashing for cache keys) blocks everything. In Go, goroutines are lightweight but share memory, making isolation harder to guarantee.
BEAM processes are preemptively scheduled with soft real-time guarantees. A process that's doing heavy computation gets preempted after a reduction count — it can't starve other processes. This is critical for an AI gateway where some requests take 200ms (cache hits) and others take 15 seconds (complex multi-model chains).
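As a small illustration of the model — fanning out to several providers concurrently is just spawning processes. This sketch uses Task.async_stream with illustrative module names, not the production code:

```elixir
defmodule Gateway.Fanout do
  # Query several providers concurrently; each call runs in its own BEAM
  # process, so one slow provider never blocks the others. A provider that
  # exceeds the timeout is killed without affecting its siblings.
  def race_providers(request) do
    [Providers.OpenAI, Providers.Anthropic, Providers.Groq]
    |> Task.async_stream(fn mod -> mod.complete(request, []) end,
         timeout: 15_000,
         on_timeout: :kill_task)
    |> Enum.to_list()
  end
end
```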
Supervision trees: self-healing infrastructure
An AI gateway talks to unreliable external services all day. Providers go down. Webhook endpoints return 500s. Stripe's API occasionally hiccups. The DB connection pool temporarily exhausts under load spikes.
In most stacks, you handle this with try/catch blocks, retry libraries, and health check endpoints that you monitor in PagerDuty. In Elixir, you get OTP supervision trees — a battle-tested process management framework that's been keeping telecom systems alive since the 1980s.
Here's a simplified version of what the application supervision tree looks like in production:
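A minimal sketch of such a tree — module names like Gateway.Vault and Gateway.AwsCredentials are illustrative stand-ins, not the production names:

```elixir
defmodule Gateway.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # ETS table for AWS credentials must exist before the Repo boots,
      # so database URLs from AWS Secrets Manager are available in time
      Gateway.AwsCredentials,
      Gateway.Repo,
      Gateway.Vault,                 # Cloak encryption vault (GenServer)
      Gateway.WebSocketTokenStore,
      {Phoenix.PubSub, name: Gateway.PubSub},
      {Oban, Application.fetch_env!(:gateway, Oban)},
      GatewayWeb.Endpoint
    ]

    # :one_for_one — restart only the child that crashed
    Supervisor.start_link(children, strategy: :one_for_one, name: Gateway.Supervisor)
  end
end
```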
Each child is supervised. If the Vault process crashes because the encryption key loaded incorrectly, the supervisor restarts it. If the WebSocket token store crashes, it restarts. If the database connection pool dies and comes back, the Repo child restarts and reconnects.
The :one_for_one strategy means each child is independent — one crashing doesn't take down the others. The boot order is sequential and deterministic: the ETS table for AWS credentials initializes before the Repo starts, so database URLs fetched from AWS Secrets Manager are available immediately.
You don't get this in Python, Node, or Go without significant manual engineering. OTP supervision has been refined for 30+ years. It's not a library — it's a fundamental runtime capability.
ETS: in-process caching without Redis
Every AI gateway needs a fast cache. The obvious choice is Redis. But Redis introduces a network hop, a separate process to manage, and another failure point.
BEAM has ETS (Erlang Term Storage) — an in-memory key-value store that lives inside the VM, accessible from any process, with constant-time lookups and no serialization overhead.
Here's how prompt caching works:
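A sketch of the two-tier lookup — table name, tuple shape, and TTL are illustrative assumptions:

```elixir
defmodule Gateway.PromptCache do
  @table :prompt_cache
  @ttl_seconds 300

  # Tier one: ETS lookup keyed by workflow + SHA-256 of the request body.
  def fetch(workflow_id, request_body) do
    key = {workflow_id, :crypto.hash(:sha256, request_body)}

    case :ets.lookup(@table, key) do
      [{^key, response, inserted_at}] ->
        if fresh?(inserted_at), do: {:ok, response}, else: db_fallback(key)

      [] ->
        db_fallback(key)
    end
  end

  defp db_fallback(_key) do
    # Tier two: query the request-log table for a row with the same body
    # hash, same workflow, inserted within the cache window (query elided).
    :miss
  end

  defp fresh?(inserted_at),
    do: System.system_time(:second) - inserted_at < @ttl_seconds
end
```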
No Redis connection to manage. No serialization/deserialization. No network latency. The cache check is a single ETS lookup — sub-microsecond. If the ETS cache misses, it falls back to a database query that checks for a matching request log with the same request body hash, same workflow, and an insertion time within the cache window.
This two-tier approach means:
- Hot path: ETS returns cached responses in < 1ms
- Warm path: Database returns cached responses in ~5ms (from previous request logs)
- Cold path: Full provider call, 1-15 seconds
Redis would add ~0.5ms to every cache check. That doesn't sound like much until you're handling 10,000 requests per minute and every millisecond compounds.
The ecosystem: purpose-built tools, not glue
Oban: background job processing without a separate queue service
AI requests that run asynchronously need a reliable job queue. Most stacks reach for Celery (Python), BullMQ (Node), or a standalone message broker like RabbitMQ or SQS.
Elixir has Oban — a robust, PostgreSQL-backed job processing library that runs inside the same application. No external queue service. No separate worker process. Just another supervised child in the application tree.
Five named queues with independent concurrency limits. The ai_requests queue handles async LLM calls. The webhooks queue handles webhook delivery with retries. The billing queue reports usage to Stripe. The emails queue sends transactional emails. And a cron plugin runs periodic tasks like checking for subscription downgrades.
Each worker is pattern-matched on its job arguments:
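A sketch of one such worker — module names and argument keys are illustrative:

```elixir
defmodule Gateway.Workers.AiRequest do
  use Oban.Worker, queue: :ai_requests, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"workflow_id" => workflow_id, "prompt" => prompt}}) do
    # Pattern match on the job args; a shape mismatch fails the job loudly
    with {:ok, workflow} <- Gateway.Workflows.load(workflow_id),
         {:ok, result} <- Gateway.Providers.complete(workflow, prompt) do
      Gateway.Delivery.broadcast(workflow_id, result)
      :ok
    end
  end
end
```

Enqueueing is just an insert into PostgreSQL: build the args map, pipe it through the worker's new/1, and Oban.insert/1 it inside whatever transaction you're already in.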
Why this matters: the job queue, the cron scheduler, and the jobs' transactional guarantees all come from PostgreSQL. If the server crashes mid-job, the job isn't lost — it's still in the database, marked as in-progress, and will be picked up on restart. No Redis. No RabbitMQ. No SQS. One fewer infrastructure component to operate.
Comparison:
- Python + Celery requires Redis or RabbitMQ as a broker, plus a separate worker process. Two infrastructure components in addition to your app.
- Node + BullMQ requires Redis. One additional component.
- Go typically uses a standalone queue (SQS, NATS, or RabbitMQ). One or two additional components.
- Elixir + Oban uses PostgreSQL, which you already have for your application database. Zero additional components.
Phoenix Channels: real-time delivery without WebSocket libraries
Async AI requests need a way to deliver results back to clients. The standard approach: set up a WebSocket server, handle connection upgrades, manage authentication, implement heartbeats, handle reconnection.
Phoenix has Channels built in — a real-time abstraction on top of WebSockets with presence tracking, topic-based pub/sub, and authentication baked into the connection lifecycle.
When an async AI request completes, broadcasting the result is one line:
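Something like this — the topic naming scheme and payload shape are illustrative:

```elixir
# Push the completed result to every subscriber of this request's topic
GatewayWeb.Endpoint.broadcast("ai_request:#{request_id}", "result", %{
  status: "completed",
  response: response
})
```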
Every connected client on that channel receives the result. No polling. No long-lived HTTP connections. No SSE with reconnection quirks. Phoenix Channels handle connection management, heartbeats, and reconnection on the client side via the official JavaScript client.
This also powers CLI webhook delivery — CLI clients connect via WebSocket, and the server broadcasts webhook payloads to all connected CLI users for a project, with HMAC-SHA256 signed payloads for verification:
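A sketch of the signing side, using Erlang's :crypto for the MAC and a constant-time comparison for verification (module and function names are illustrative):

```elixir
defmodule Gateway.WebhookSignature do
  # Sign the serialized payload so clients can verify authenticity
  def sign(payload, secret) do
    :crypto.mac(:hmac, :sha256, secret, payload)
    |> Base.encode16(case: :lower)
  end

  # Constant-time comparison to avoid timing attacks
  def verify(payload, secret, signature) do
    Plug.Crypto.secure_compare(sign(payload, secret), signature)
  end
end
```

The CLI client recomputes the HMAC over the payload with its shared secret and compares it against the signature header before trusting the event.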
Cloak: application-level encryption
AI gateways store provider API keys — OpenAI keys, Anthropic keys, customer-provided keys for bring-your-own-key scenarios. These must be encrypted at rest.
The Elixir ecosystem has Cloak — an encryption library that integrates with Ecto (the database layer) to provide transparent field-level encryption using AES-256-GCM:
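In outline, the wiring looks like this — module names are illustrative; the cipher key itself comes from configuration, never from source:

```elixir
defmodule Gateway.Vault do
  use Cloak.Vault, otp_app: :gateway
end

# An Ecto type backed by the vault: fields of this type are encrypted
# with AES-256-GCM on write and decrypted transparently on read.
defmodule Gateway.Encrypted.Binary do
  use Cloak.Ecto.Binary, vault: Gateway.Vault
end

# In the schema:
#   field :api_key, Gateway.Encrypted.Binary
```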
The Vault starts as a supervised GenServer in the application tree. Database fields marked with Cloak.Ecto.Binary are automatically encrypted on write and decrypted on read. API keys are never stored as plaintext in the database, never logged, and never serialized to disk.
Key rotation is a runtime operation — update the key, restart the Vault process, and re-encrypt. No redeployment needed. The supervisor tree ensures the Vault is available before any process tries to access encrypted data.
Behaviours: polymorphic provider adapters
An AI gateway needs to talk to OpenAI, Anthropic, Google, Mistral, Cohere, DeepSeek, Groq, xAI, Qwen, and custom endpoints. Each provider has different API formats, authentication schemes, and response structures.
Elixir's behaviours provide compile-time contracts for this:
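Roughly like this — callback names follow the ones described below; the adapter body is elided:

```elixir
defmodule Gateway.Provider do
  @moduledoc "Contract every provider adapter must implement."

  @callback complete(request :: map(), opts :: keyword()) ::
              {:ok, map()} | {:error, term()}
  @callback models() :: [String.t()]
  @callback provider_name() :: String.t()
end

defmodule Gateway.Providers.OpenAI do
  @behaviour Gateway.Provider

  @impl true
  def complete(request, _opts) do
    # HTTP call to the provider's chat completions endpoint (elided)
    {:ok, %{provider: provider_name(), request: request}}
  end

  @impl true
  def models, do: ["gpt-4o", "gpt-4o-mini"]

  @impl true
  def provider_name, do: "openai"
end
```

The compiler warns if an adapter forgets a callback or gets an arity wrong — the contract is checked before the code ever runs.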
Each provider module implements this behaviour. The routing layer dispatches based on provider name. Adding a new provider is one module — implement complete/2, models/0, and provider_name/0. No interface boilerplate. No factory pattern. No dependency injection framework. Just a behaviour contract enforced at compile time.
Hammer: rate limiting without external state
AI gateways need multi-tier rate limiting: per-IP, per-project, per-API-key. Most stacks use Redis for distributed rate limiting.
The Elixir ecosystem has Hammer — a rate limiting library with pluggable backends. For single-node deployments, ETS is the backend. Zero external dependencies.
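The checks compose naturally with pattern matching. This sketch uses Hammer's classic check_rate/3 API; bucket names and limits are illustrative:

```elixir
# Per-IP and per-project windows, counters backed by ETS
def check_limits(ip, project_id) do
  with {:allow, _count} <- Hammer.check_rate("ip:#{ip}", 60_000, 100),
       {:allow, _count} <- Hammer.check_rate("project:#{project_id}", 60_000, 1_000) do
    :ok
  else
    {:deny, _limit} -> {:error, :rate_limited}
  end
end
```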
ETS handles the sliding window counters. No Redis. No Lua scripts. No distributed clock synchronization issues for single-node deployment.
The stack comparison: concrete trade-offs
Python + FastAPI
Strengths: Fastest path to a working prototype. The AI/ML ecosystem is Python-native — LangChain, LlamaIndex, Hugging Face, every provider SDK. If your team already knows Python, the learning curve is minimal.
Weaknesses for AI gateways:
- Concurrency is bolted on. asyncio works but requires discipline. One blocking call — a synchronous database query, a forgotten await, a library that doesn't support async — stalls the event loop. In an AI gateway making hundreds of concurrent provider calls, this is a constant source of bugs.
- No supervision. If a background task crashes, you need to detect it and restart it manually. Celery workers that die mid-job need external monitoring (Flower, supervisor processes) to restart.
- GIL limitations. CPU-bound work (HMAC computation, JSON parsing, cache key hashing) is limited by the Global Interpreter Lock. You can work around it with multiprocessing, but that adds complexity.
- Separate infrastructure. You need Redis for Celery, Redis for caching (or Memcached), and a separate WebSocket server (or a library like socket.io with its own event loop). That's three additional components.
TypeScript + Node (Express/Fastify/Hono)
Strengths: Excellent TypeScript ecosystem. Good async I/O model. Large talent pool. Vercel/Cloudflare Workers make deployment trivial for simpler use cases.
Weaknesses for AI gateways:
- Single-threaded CPU. JSON parsing a 50KB LLM response, computing SHA-256 cache keys, generating HMAC signatures — all of this blocks the event loop. worker_threads help but add complexity.
- No built-in job queue. You need BullMQ + Redis, or an external service like SQS. Every additional dependency is another failure point.
- WebSocket management. Libraries like ws or socket.io work but don't provide Phoenix-level abstractions. You manually manage connections, rooms, authentication, heartbeats, and reconnection. Phoenix Channels handle all of this declaratively.
- No supervision trees. If a background process crashes, you catch it with process.on('uncaughtException') and hope your PM2 or Docker restarts it quickly enough.
Go
Strengths: Excellent performance. Great concurrency model with goroutines. Strong standard library. Compiles to a single binary. Perfect for high-throughput proxies.
Weaknesses for AI gateways:
- Verbose for complex domain logic. AI gateway business logic — workflow loading, structured output merging, prompt caching with two-tier fallback, multi-step failover with attempt logging — produces a lot of code in Go. Error handling alone (if err != nil) can double the line count.
- No built-in application framework. You assemble everything from libraries: router, middleware, database layer, migrations, job queue, WebSocket server, rate limiter, encryption. Each has its own conventions. The cognitive overhead compounds.
- No OTP equivalent. Go has goroutines but no supervision trees. You write your own process management, health checking, and graceful shutdown. This is fine for simple services; it's significant work for a system with 15+ long-running supervised components.
- No built-in hot code reloading. In development, you restart the server for every change. Trivial for microservices, tedious for monolithic applications with complex startup sequences.
Elixir + Phoenix
Strengths:
- Concurrency is the default. Every request is isolated. No async/await ceremony. No event loop blocking. No GIL.
- OTP supervision trees. Self-healing process management refined for 30+ years.
- ETS. In-process caching without Redis.
- Oban. PostgreSQL-backed job queue without a separate broker service.
- Phoenix Channels. Real-time WebSocket delivery with presence tracking, built in.
- Cloak. Field-level AES-256-GCM encryption for secrets.
- Behaviours. Compile-time polymorphism for provider adapters.
- Pattern matching. Cleanest error handling of any mainstream language: destructure success/failure tuples at every call site.
- Telemetry. Built-in instrumentation framework for metrics and observability.
- Hot code reloading in development. Instant feedback loop.
Weaknesses:
- Smaller ecosystem. Fewer libraries than Python/Node/Go. No provider SDKs — you write HTTP adapters yourself (which, honestly, is fine for API gateways; you want control over the request).
- Smaller talent pool. Hiring is harder. Finding Elixir developers takes more effort than finding Python or Go developers.
- Learning curve. Functional programming, pattern matching, and OTP concepts take time to learn if your team comes from OOP backgrounds.
- Deployment. Elixir releases are well-documented but less "push-button" than Docker + Node or Go's single binary. Mix releases and Distillery/Burrito add a learning step.
What this looks like in practice: the execution pipeline
To make this concrete, here's the simplified execution flow for a single AI request in the production system:
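As a with-pipeline sketch — every module and function name here is a hypothetical stand-in, and the real pipeline has more steps and richer error handling:

```elixir
def execute(conn, params) do
  with {:ok, project}  <- Auth.verify_api_key(conn),
       {:ok, workflow} <- Workflows.load(project, params["workflow_id"]),
       :ok             <- RateLimiter.check(conn, project),
       # A cache hit short-circuits via the else clause; :miss continues
       :miss           <- PromptCache.fetch(workflow, params["input"]),
       :ok             <- Moderation.screen(params["input"]),
       {:ok, result}   <- Providers.complete_with_failover(workflow, params["input"]) do
    RequestLogs.record(project, workflow, result)
    Billing.meter(project, result.usage)
    PromptCache.store(workflow, params["input"], result)
    {:ok, result}
  end
end
```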
For async requests, the flow is:
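In sketch form — the controller validates, enqueues an Oban job, and returns immediately; delivery happens later from the worker (names are illustrative):

```elixir
def execute_async(workflow, input) do
  request_id = Ecto.UUID.generate()

  %{request_id: request_id, workflow_id: workflow.id, input: input}
  |> Workers.AiRequest.new()
  |> Oban.insert!()

  # The client subscribes to "ai_request:#{request_id}" over a Phoenix
  # Channel; when the worker finishes, it broadcasts the result and
  # dispatches HMAC-signed webhooks.
  {:accepted, request_id}
end
```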
Every step in this pipeline runs in its own BEAM process. The controller process, the Oban worker process, the PubSub broadcast process, the webhook delivery process — all isolated, all supervised, all concurrent.
Telemetry: observability built into the foundation
Phoenix ships with Telemetry — a lightweight instrumentation library that's integrated into every layer of the stack. Database queries, HTTP requests, channel operations, and custom application events all emit Telemetry events.
No external APM agent required. No monkey-patching. The BEAM VM itself reports memory usage, run queue lengths, and I/O metrics. You plug in any reporter — Prometheus, Datadog, custom dashboards — and the metrics flow automatically.
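Attaching a handler is a few lines. A sketch for logging slow Ecto queries — the event name follows the repo module, so [:gateway, :repo, :query] here is an assumption:

```elixir
require Logger

:telemetry.attach(
  "log-slow-queries",
  [:gateway, :repo, :query],
  fn _event, measurements, metadata, _config ->
    ms = System.convert_time_unit(measurements.total_time, :native, :millisecond)
    if ms > 100, do: Logger.warning("slow query (#{ms} ms): #{metadata.query}")
  end,
  nil
)
```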
The infrastructure simplification
Here's the final picture — the full infrastructure footprint for a production AI gateway:
With Python/Node/Go:
- Application server
- PostgreSQL
- Redis (caching + job queue broker)
- Message queue (RabbitMQ/SQS for reliable job processing)
- WebSocket server (separate process or library)
- Cron scheduler (cron jobs or CloudWatch Events)
- Rate limiter state (Redis-backed)
- Encryption service or vault
With Elixir + Phoenix:
- Application server (includes WebSocket server, job processor, cron scheduler, rate limiter, encryption vault, in-memory cache)
- PostgreSQL (for data + Oban job queue)
That's it. Two components. Every other capability lives inside the BEAM runtime as a supervised process. When you deploy, you deploy one thing. When you debug, you look at one set of logs. When something crashes, the supervision tree handles it before you wake up.
When not to use Elixir
I'd be dishonest if I didn't mention the cases where Elixir is the wrong choice:
- If your team has zero functional programming experience and you're on a 2-week deadline, the learning curve will cost you more than the architectural benefits save.
- If you need deep ML/AI library integration (model training, embedding generation, vector DB clients), the Python ecosystem is years ahead.
- If you're building only a thin proxy with no business logic — just forwarding requests — Go or even Cloudflare Workers will be simpler and faster.
- If hiring is your bottleneck, the Elixir talent pool is measurably smaller than Python, Node, or Go.
The sweet spot for Elixir is exactly the AI gateway use case: a system that's I/O-heavy, concurrent, requires fault tolerance, manages many long-lived connections, processes background jobs, and needs to stay up reliably without a complex infrastructure footprint.
Conclusion
At ModelRiver, the entire backend — API gateway, provider routing with multi-level failover, prompt caching, async job processing, WebSocket delivery, CLI tooling, webhook delivery, encrypted secrets management, metered billing, rate limiting, content moderation, telemetry, and a full observability layer — runs on Elixir + Phoenix with a single PostgreSQL database.
The stack doesn't get talked about enough in AI circles because the AI ecosystem is overwhelmingly Python-centric. But if you're building the infrastructure layer — the routing, orchestration, and reliability layer that sits between application code and LLM providers — the BEAM VM gives you primitives that other runtimes either lack or require bolting on through third-party services.
The best stack for building AI applications is Python. The best stack for building AI infrastructure might be Elixir.
If you want to see what this looks like in practice, ModelRiver's documentation walks through the system, or you can point your base_url to https://api.modelriver.com/v1 and make your first request.
