
Pinecone + ModelRiver

Generate embeddings through ModelRiver with automatic failover. Store and query vectors in Pinecone for production-scale RAG.

Overview

Pinecone is a fully managed vector database optimised for AI applications. Paired with ModelRiver's embedding workflows, it lets you build resilient RAG systems in which both embedding generation and LLM queries are protected by automatic failover.


Quick start

Install dependencies

Bash
pip install pinecone openai
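
The snippets below hardcode API keys for readability. In practice you would usually load them from the environment; a minimal sketch (the variable names here are assumptions, not names required by either SDK):

Python
import os

# Assumed environment variable names; use whatever fits your deployment.
MODELRIVER_API_KEY = os.environ["MODELRIVER_API_KEY"]
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]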

Setup

Python
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

# ModelRiver client for embeddings + chat
client = OpenAI(
    base_url="https://api.modelriver.com/v1",
    api_key="mr_live_YOUR_API_KEY",
)

# Pinecone client
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create index (one-time)
pc.create_index(
    name="knowledge-base",
    dimension=1536,  # Match your embedding model's dimensions
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("knowledge-base")
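
Note that create_index typically fails if an index with the same name already exists, so guard the call if you expect to rerun the setup. A minimal sketch, assuming the serverless Pinecone Python SDK (v3+) used above:

Python
# Only create the index if it doesn't exist yet, so the setup script is safe to rerun.
if "knowledge-base" not in pc.list_indexes().names():
    pc.create_index(
        name="knowledge-base",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )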

Ingest documents

Python
def embed_and_upsert(texts: list[str], ids: list[str], metadata: list[dict] | None = None):
    """Generate embeddings via ModelRiver and store in Pinecone."""
    response = client.embeddings.create(
        model="my-embedding-workflow",  # ModelRiver embedding workflow
        input=texts,
    )

    vectors = []
    for i, embedding in enumerate(response.data):
        vector = {
            "id": ids[i],
            "values": embedding.embedding,
            "metadata": metadata[i] if metadata else {"text": texts[i]},
        }
        vectors.append(vector)

    index.upsert(vectors=vectors)
    return len(vectors)


# Ingest documents
documents = [
    "ModelRiver routes AI requests across multiple providers.",
    "Workflows configure provider, model, and fallback settings.",
    "Structured outputs guarantee JSON schema compliance.",
    "Request Logs provide complete observability for every request.",
]

ids = [f"doc-{i}" for i in range(len(documents))]
metadata = [{"text": doc, "source": "docs"} for doc in documents]

embed_and_upsert(documents, ids, metadata)
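
To confirm the vectors landed, you can inspect the index stats. A minimal check (Pinecone upserts are eventually consistent, so the count may lag for a few seconds):

Python
# Vector counts can take a few seconds to reflect recent upserts.
stats = index.describe_index_stats()
print(stats.total_vector_count)  # expect 4 after the ingest above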

Query (RAG)

Python
def ask(question: str, top_k: int = 3) -> str:
    """Query Pinecone for context, then generate an answer via ModelRiver."""

    # 1. Embed the question
    query_embedding = client.embeddings.create(
        model="my-embedding-workflow",
        input=[question],
    ).data[0].embedding

    # 2. Search Pinecone
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)

    # 3. Build context
    context = "\n\n".join([match.metadata["text"] for match in results.matches])

    # 4. Generate answer via ModelRiver
    response = client.chat.completions.create(
        model="my-chat-workflow",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n\n{context}"},
            {"role": "user", "content": question},
        ],
    )

    return response.choices[0].message.content


answer = ask("How does ModelRiver handle failover?")
print(answer)
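
Each Pinecone match also carries a similarity score. If your corpus is noisy, you may want to drop weak matches before building the context. A sketch of a drop-in replacement for steps 2 and 3 inside ask() (the threshold is an arbitrary assumption to tune, not a recommended value):

Python
MIN_SCORE = 0.3  # assumed cutoff; tune for your data and similarity metric

# 2. Search Pinecone, then keep only matches above the score threshold
results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
relevant = [m for m in results.matches if m.score >= MIN_SCORE]

# 3. Build context from the filtered matches
context = "\n\n".join(m.metadata["text"] for m in relevant)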

Batch ingestion

Python
def batch_embed_and_upsert(texts: list[str], batch_size: int = 100):
    """Process large document sets in batches."""
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        ids = [f"doc-{i + j}" for j in range(len(batch))]
        metadata = [{"text": t} for t in batch]
        embed_and_upsert(batch, ids, metadata)
        print(f"Processed {i + len(batch)} / {len(texts)}")

Best practices

  1. Use a dedicated embedding workflow: Separate from your chat workflow for independent scaling
  2. Batch embeddings: Process documents in batches of 100 to reduce API calls
  3. Store text in metadata: Include the original text for retrieval without extra DB lookups
  4. Monitor embedding costs: Large ingestion jobs can generate significant token usage (see the sketch after this list)
  5. Match dimensions: Ensure your Pinecone index dimension matches your embedding model's output (see the sketch after this list)
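
A quick sanity check covering points 4 and 5 above. This is a sketch only: it assumes the client, pc, and index name from the Quick start.

Python
# Embed a short probe string to learn the model's output dimension and token usage.
probe = client.embeddings.create(model="my-embedding-workflow", input=["dimension probe"])

embedding_dim = len(probe.data[0].embedding)
index_dim = pc.describe_index("knowledge-base").dimension
assert embedding_dim == index_dim, f"Embedding dim {embedding_dim} != index dim {index_dim}"

# Embedding responses report token usage, which you can log to track ingestion cost.
print(f"Tokens used by this probe: {probe.usage.total_tokens}")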

Next steps