
Pinecone + ModelRiver

Generate embeddings through ModelRiver with automatic failover. Store and query vectors in Pinecone for production-scale RAG.

Overview

Pinecone is a fully managed vector database optimised for AI applications. Paired with ModelRiver's embedding workflows, it lets you build resilient RAG systems in which both embedding generation and LLM queries are protected by automatic failover.


Quick start

Install dependencies

Bash
pip install pinecone openai
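
The snippets below hardcode API keys for readability. In practice you would usually load them from the environment; a minimal sketch (the variable names here are assumptions, not names required by either SDK):

Python
import os

# Assumed environment variable names; use whatever fits your deployment.
MODELRIVER_API_KEY = os.environ["MODELRIVER_API_KEY"]
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]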

Setup

Python
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

# ModelRiver client for embeddings + chat
client = OpenAI(
    base_url="https://api.modelriver.com/v1",
    api_key="mr_live_YOUR_API_KEY",
)

# Pinecone client
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create index (one-time)
pc.create_index(
    name="knowledge-base",
    dimension=1536,  # Match your embedding model's dimensions
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("knowledge-base")
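
Note that create_index typically fails if an index with the same name already exists, so guard the call if you expect to rerun the setup. A minimal sketch, assuming the serverless Pinecone Python SDK (v3+) used above:

Python
# Only create the index if it doesn't exist yet, so the setup script is safe to rerun.
if "knowledge-base" not in pc.list_indexes().names():
    pc.create_index(
        name="knowledge-base",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )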

Ingest documents

Python
def embed_and_upsert(texts: list[str], ids: list[str], metadata: list[dict] | None = None):
    """Generate embeddings via ModelRiver and store in Pinecone."""
    response = client.embeddings.create(
        model="my-embedding-workflow",  # ModelRiver embedding workflow
        input=texts,
    )

    vectors = []
    for i, embedding in enumerate(response.data):
        vector = {
            "id": ids[i],
            "values": embedding.embedding,
            "metadata": metadata[i] if metadata else {"text": texts[i]},
        }
        vectors.append(vector)

    index.upsert(vectors=vectors)
    return len(vectors)


# Ingest documents
documents = [
    "ModelRiver routes AI requests across multiple providers.",
    "Workflows configure provider, model, and fallback settings.",
    "Structured outputs guarantee JSON schema compliance.",
    "Request Logs provide complete observability for every request.",
]

ids = [f"doc-{i}" for i in range(len(documents))]
metadata = [{"text": doc, "source": "docs"} for doc in documents]

embed_and_upsert(documents, ids, metadata)
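
To confirm the vectors landed, you can inspect the index stats. A minimal check (Pinecone upserts are eventually consistent, so the count may lag for a few seconds):

Python
# Vector counts can take a few seconds to reflect recent upserts.
stats = index.describe_index_stats()
print(stats.total_vector_count)  # expect 4 after the ingest above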

Query (RAG)

Python
def ask(question: str, top_k: int = 3) -> str:
    """Query Pinecone for context, then generate an answer via ModelRiver."""

    # 1. Embed the question
    query_embedding = client.embeddings.create(
        model="my-embedding-workflow",
        input=[question],
    ).data[0].embedding

    # 2. Search Pinecone
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)

    # 3. Build context
    context = "\n\n".join([match.metadata["text"] for match in results.matches])

    # 4. Generate answer via ModelRiver
    response = client.chat.completions.create(
        model="my-chat-workflow",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n\n{context}"},
            {"role": "user", "content": question},
        ],
    )

    return response.choices[0].message.content


answer = ask("How does ModelRiver handle failover?")
print(answer)
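
Each Pinecone match also carries a similarity score. If your corpus is noisy, you may want to drop weak matches before building the context. A sketch of a drop-in replacement for steps 2 and 3 inside ask() (the threshold is an arbitrary assumption to tune, not a recommended value):

Python
MIN_SCORE = 0.3  # assumed cutoff; tune for your data and similarity metric

# 2. Search Pinecone, then keep only matches above the score threshold
results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
relevant = [m for m in results.matches if m.score >= MIN_SCORE]

# 3. Build context from the filtered matches
context = "\n\n".join(m.metadata["text"] for m in relevant)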

Batch ingestion

Python
def batch_embed_and_upsert(texts: list[str], batch_size: int = 100):
    """Process large document sets in batches."""
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        ids = [f"doc-{i + j}" for j in range(len(batch))]
        metadata = [{"text": t} for t in batch]
        embed_and_upsert(batch, ids, metadata)
        print(f"Processed {i + len(batch)} / {len(texts)}")

Best practices

  1. Use a dedicated embedding workflow: Separate from your chat workflow for independent scaling
  2. Batch embeddings: Process documents in batches of 100 to reduce API calls
  3. Store text in metadata: Include the original text for retrieval without extra DB lookups
  4. Monitor embedding costs: Large ingestion jobs can generate significant token usage (see the sketch after this list)
  5. Match dimensions: Ensure your Pinecone index dimension matches your embedding model's output (see the sketch after this list)
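
A quick sanity check covering points 4 and 5 above. This is a sketch only: it assumes the client, pc, and index name from the Quick start.

Python
# Embed a short probe string to learn the model's output dimension and token usage.
probe = client.embeddings.create(model="my-embedding-workflow", input=["dimension probe"])

embedding_dim = len(probe.data[0].embedding)
index_dim = pc.describe_index("knowledge-base").dimension
assert embedding_dim == index_dim, f"Embedding dim {embedding_dim} != index dim {index_dim}"

# Embedding responses report token usage, which you can log to track ingestion cost.
print(f"Tokens used by this probe: {probe.usage.total_tokens}")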

Next steps