Overview
Pinecone is a fully managed vector database optimised for AI applications. Combined with ModelRiver's embedding workflows, it lets you build resilient RAG systems in which both embedding generation and LLM queries are protected by automatic failover.
Quick start
Install dependencies
Bash
pip install pinecone openai
Setup
Python
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

# ModelRiver client for embeddings + chat
client = OpenAI(
    base_url="https://api.modelriver.com/v1",
    api_key="mr_live_YOUR_API_KEY",
)

# Pinecone client
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create index (one-time)
pc.create_index(
    name="knowledge-base",
    dimension=1536,  # Match your embedding model's dimensions
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("knowledge-base")
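Re-running this setup will fail at create_index once the index already exists. A small guard keeps the setup idempotent; this is a sketch that assumes a recent pinecone SDK where list_indexes() exposes a names() helper:

Python
# Create the index only if it is not already present, so setup can be re-run safely
if "knowledge-base" not in pc.list_indexes().names():
    pc.create_index(
        name="knowledge-base",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )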
Ingest documents
Python
def embed_and_upsert(texts: list[str], ids: list[str], metadata: list[dict] | None = None):
    """Generate embeddings via ModelRiver and store in Pinecone."""
    response = client.embeddings.create(
        model="my-embedding-workflow",  # ModelRiver embedding workflow
        input=texts,
    )

    vectors = []
    for i, embedding in enumerate(response.data):
        vector = {
            "id": ids[i],
            "values": embedding.embedding,
            "metadata": metadata[i] if metadata else {"text": texts[i]},
        }
        vectors.append(vector)

    index.upsert(vectors=vectors)
    return len(vectors)

# Ingest documents
documents = [
    "ModelRiver routes AI requests across multiple providers.",
    "Workflows configure provider, model, and fallback settings.",
    "Structured outputs guarantee JSON schema compliance.",
    "Request Logs provide complete observability for every request.",
]

ids = [f"doc-{i}" for i in range(len(documents))]
metadata = [{"text": doc, "source": "docs"} for doc in documents]

embed_and_upsert(documents, ids, metadata)
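To confirm the documents landed, you can check the index's vector count with the SDK's describe_index_stats(). Note that serverless indexes are eventually consistent, so freshly upserted vectors may take a moment to appear in the stats:

Python
# Sanity check: total_vector_count should equal the number of ingested documents
stats = index.describe_index_stats()
print(stats.total_vector_count)  # expect 4 after the ingestion above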
Query (RAG)
Python
def ask(question: str, top_k: int = 3) -> str:
    """Query Pinecone for context, then generate an answer via ModelRiver."""

    # 1. Embed the question
    query_embedding = client.embeddings.create(
        model="my-embedding-workflow",
        input=[question],
    ).data[0].embedding

    # 2. Search Pinecone
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)

    # 3. Build context
    context = "\n\n".join([match.metadata["text"] for match in results.matches])

    # 4. Generate answer via ModelRiver
    response = client.chat.completions.create(
        model="my-chat-workflow",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n\n{context}"},
            {"role": "user", "content": question},
        ],
    )

    return response.choices[0].message.content

answer = ask("How does ModelRiver handle failover?")
print(answer)
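Pinecone queries also accept a metadata filter, which is useful once a single index holds documents from several sources. The sketch below restricts retrieval to the source tag written during ingestion, using Pinecone's standard metadata-filter syntax:

Python
# Embed the question as before, then retrieve only vectors tagged source == "docs"
query_embedding = client.embeddings.create(
    model="my-embedding-workflow",
    input=["How does ModelRiver handle failover?"],
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,
    filter={"source": {"$eq": "docs"}},
)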
Batch ingestion
Python
def batch_embed_and_upsert(texts: list[str], batch_size: int = 100):
    """Process large document sets in batches."""
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        ids = [f"doc-{i+j}" for j in range(len(batch))]
        metadata = [{"text": t} for t in batch]
        embed_and_upsert(batch, ids, metadata)
        print(f"Processed {i + len(batch)} / {len(texts)}")
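ModelRiver's automatic failover handles provider-side errors, but a long ingestion run can still hit transient network or rate-limit issues between your script and the APIs. A minimal retry wrapper around embed_and_upsert (a sketch, not part of either SDK) could look like this:

Python
import time

def embed_and_upsert_with_retry(batch, ids, metadata, max_retries: int = 3):
    """Retry a failed batch with exponential backoff before giving up."""
    for attempt in range(max_retries):
        try:
            return embed_and_upsert(batch, ids, metadata)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off 1s, then 2s between attempts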
Best practices
- Use a dedicated embedding workflow: Separate from your chat workflow for independent scaling
- Batch embeddings: Process documents in batches of 100 to reduce API calls
- Store text in metadata: Include the original text for retrieval without extra DB lookups
- Monitor embedding costs: Large ingestion jobs can generate significant token usage (see the usage-tracking sketch after this list)
- Match dimensions: Ensure your Pinecone index dimension matches your embedding model
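For the cost point above, the embeddings response includes a usage object when the upstream provider reports it, and accumulating it gives a rough per-job token count. A sketch, assuming ModelRiver passes usage data through its OpenAI-compatible responses:

Python
# Track token usage across an ingestion job (guards in case the gateway omits usage)
total_tokens = 0

def embed_with_usage(texts: list[str]) -> list[list[float]]:
    global total_tokens
    response = client.embeddings.create(model="my-embedding-workflow", input=texts)
    usage = getattr(response, "usage", None)
    if usage is not None:
        total_tokens += usage.total_tokens
    return [item.embedding for item in response.data]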
Next steps
- Weaviate integration: Open-source alternative
- LlamaIndex integration: Framework with built-in Pinecone support
- API reference: Endpoint documentation