Vector databases make semantic search possible — finding documents by meaning rather than exact keywords. Combined with LLMs, they power RAG (Retrieval-Augmented Generation) applications that answer questions from your own data. Here's the practical implementation.
## What Vector Search Solves
Keyword search finds documents containing the exact words; vector search finds documents with similar meaning.

Query: "how do I reset my password"

- Keyword search finds: "password reset instructions", "reset password page"
- Vector search also finds: "account recovery", "forgot credentials", "login issues"
For documentation, customer support, and knowledge bases: vector search returns dramatically more relevant results.
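Under the hood, "similar meaning" is geometric: texts are mapped to high-dimensional vectors, and closeness is typically scored with cosine similarity. A minimal sketch of the metric itself:

```typescript
// Cosine similarity between two vectors:
// 1 = same direction (very similar), 0 = orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

In practice the database computes this for you (pgvector's `<=>` operator below), but the intuition carries through the whole pipeline.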
## The Embedding Pipeline
```typescript
import OpenAI from 'openai'

const openai = new OpenAI()

async function embedText(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small', // 1536 dimensions, $0.02/1M tokens
    input: text,
  })
  return response.data[0].embedding
}

// For a Claude-based stack, Anthropic recommends Voyage AI embeddings:
// voyage-3-lite is fast and cheap for large-scale indexing
```
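For large-scale indexing, one request per chunk is slow; the OpenAI embeddings endpoint accepts an array of inputs, so you can batch. A small helper for the batching step (the name `toBatches` and the batch size of 100 are illustrative choices, not from the pipeline above):

```typescript
// Split items into fixed-size batches. Pass each batch as the `input`
// array of a single embeddings request to cut round trips dramatically.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}
```

Each batched response returns one embedding per input, in order, under `response.data`.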
## Storing Vectors in Postgres with pgvector
```sql
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with a vector column
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  metadata JSONB,
  embedding vector(1536) -- dimension matches your model
);

-- Index for fast similarity search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
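`lists = 100` is a reasonable default, not a universal one. The pgvector README suggests roughly `rows / 1000` lists for tables up to ~1M rows and `sqrt(rows)` beyond that, and recommends building the ivfflat index after the table has data. A tiny helper encoding that guidance (the function name is ours):

```typescript
// Heuristic from the pgvector README: lists ≈ rows/1000 up to ~1M rows,
// sqrt(rows) above that. Build the ivfflat index after bulk-loading.
function recommendedLists(rowCount: number): number {
  if (rowCount <= 1_000_000) return Math.max(1, Math.round(rowCount / 1000))
  return Math.round(Math.sqrt(rowCount))
}
```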
```typescript
// Insert a document with its embedding (db is a PrismaClient instance)
async function indexDocument(content: string, metadata: object) {
  const embedding = await embedText(content)
  await db.$executeRaw`
    INSERT INTO documents (content, metadata, embedding)
    VALUES (${content}, ${JSON.stringify(metadata)}::jsonb, ${JSON.stringify(embedding)}::vector)
  `
}
```
## Semantic Search Query
```typescript
async function semanticSearch(query: string, limit = 5) {
  const queryEmbedding = await embedText(query)
  const results = await db.$queryRaw<Array<{
    id: number
    content: string
    metadata: object
    similarity: number
  }>>`
    SELECT id, content, metadata,
           1 - (embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${limit}
  `
  // <=> is pgvector's cosine distance operator, so 1 - distance = similarity
  return results.filter(r => r.similarity > 0.7) // tune this threshold for your data
}
```
## RAG: Answering Questions from Your Data
```typescript
import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic()

async function answerFromDocs(question: string): Promise<string> {
  // 1. Find relevant documents
  const relevantDocs = await semanticSearch(question, 5)
  if (relevantDocs.length === 0) {
    return 'I couldn\'t find relevant information to answer that question.'
  }

  // 2. Build context from retrieved documents
  const context = relevantDocs
    .map((doc, i) => `[${i + 1}] ${doc.content}`)
    .join('\n\n')

  // 3. Ask Claude with grounding context
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024, // required by the Messages API
    system: 'Answer questions using only the provided context. If the context doesn\'t contain the answer, say so.',
    messages: [{
      role: 'user',
      content: `Context:\n${context}\n\nQuestion: ${question}`,
    }],
  })

  const block = response.content[0]
  return block.type === 'text' ? block.text : ''
}
```
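One practical wrinkle: five retrieved chunks of long documents can blow past your prompt budget. A rough guard is to cap the context by estimated token count before sending it (the ~4 characters per token ratio is a common English-text heuristic, not an exact tokenizer):

```typescript
// Trim a list of chunks to an approximate token budget, keeping the
// highest-ranked chunks (the input is assumed sorted by similarity).
function capContext(chunks: string[], maxTokens = 3000): string[] {
  const kept: string[] = []
  let tokens = 0
  for (const chunk of chunks) {
    const estimate = Math.ceil(chunk.length / 4) // ~4 chars per token
    if (tokens + estimate > maxTokens) break
    kept.push(chunk)
    tokens += estimate
  }
  return kept
}
```

Dropping the tail is usually safe because `semanticSearch` already returns results in descending similarity order.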
## Chunking Strategy
Document chunking significantly affects retrieval quality:
```typescript
function chunkDocument(text: string, chunkSize = 500, overlap = 50): string[] {
  const words = text.split(' ')
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '))
    if (i + chunkSize >= words.length) break
  }
  return chunks
}

// Index each chunk separately
for (const chunk of chunkDocument(document.content)) {
  await indexDocument(chunk, { sourceDocId: document.id })
}
```
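Fixed word windows can cut sentences in half, which hurts embedding quality at chunk boundaries. A common refinement (a sketch of our own, not part of the starter) is to split on sentence boundaries and pack sentences greedily up to the size limit:

```typescript
// Greedy sentence packing: split on sentence-ending punctuation, then
// accumulate whole sentences until a chunk would exceed maxWords.
function chunkBySentence(text: string, maxWords = 500): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+(\s|$)|[^.!?]+$/g) ?? [text]
  const chunks: string[] = []
  let current: string[] = []
  let count = 0
  for (const sentence of sentences) {
    const words = sentence.trim().split(/\s+/).length
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join(' '))
      current = []
      count = 0
    }
    current.push(sentence.trim())
    count += words
  }
  if (current.length > 0) chunks.push(current.join(' '))
  return chunks
}
```

The trade-off: no overlap between chunks, so pair it with slightly larger windows if answers tend to span sentences.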
## Managed Options
If you don't want to manage pgvector yourself:
- Pinecone: Fully managed, generous free tier
- Qdrant: Open-source, self-hostable or cloud
- Supabase Vector: pgvector on Supabase
- Neon: pgvector on Neon (same DB as your app)
The AI SaaS Starter at whoffagents.com includes a vector search module with pgvector + Prisma, embedding pipeline, semantic search, and RAG pattern pre-built. $99 one-time.