Embeddings and Vector Search: Prerequisite Knowledge to Understand RAG

AI Navigate Original / 4/27/2026

Key Points

  • Embeddings map text to vectors; text with similar meaning lands close together in vector space
  • Vector search beats keyword search on synonyms and concepts, but is weak on exact matches
  • Cosine similarity is the standard metric; many embedding models and vector DBs are available
  • RAG: chunk → embed → store → retrieve top K; pgvector has the lowest barrier to entry

What Is an Embedding

An embedding is a technique that converts text or words into a vector (a sequence of numbers) with hundreds to thousands of dimensions. Texts with similar meanings end up close together in vector space.

A simplified 3-dimensional example:

  • "dog" → [0.8, 0.1, -0.5]
  • "cat" → [0.7, 0.2, -0.4] (close to dog)
  • "car" → [-0.3, 0.6, 0.9] (far from dog)

Real models typically use somewhere around 768 to 3072 dimensions. For example, OpenAI's text-embedding-3-small produces 1536-dimensional vectors.

Difference from Keyword Search

Method          | Example query: "I want to buy a car"
Keyword search  | Documents containing "car" or "buy"
Vector search   | "automobile purchase" and "want a car" also count as similar

Vector search handles notation and spelling variants, synonyms, and concept-level matching well. It is weak where an exact keyword match is needed, such as product codes and proper nouns.
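
To make the keyword side of this comparison concrete, here is a minimal sketch of a naive keyword search. The function name `keyword_search` and the three sample documents are hypothetical, made up for illustration. Real keyword engines add stemming, ranking, and more, so treat this only as a toy model of literal term matching.

```python
def keyword_search(query_terms, documents):
    """Return documents containing at least one query term as a whole word."""
    hits = []
    for doc in documents:
        # Lowercase and split on whitespace: a deliberately naive tokenizer.
        words = set(doc.lower().split())
        if any(term.lower() in words for term in query_terms):
            hits.append(doc)
    return hits


# Hypothetical sample documents (not from any real dataset).
documents = [
    "want a car",
    "automobile purchase guide",
    "order status for part AB-1234",
]

# Literal matching finds "car" but misses the synonym "automobile".
car_hits = keyword_search(["car", "buy"], documents)

# Exact identifiers such as product codes are where literal matching works well.
code_hits = keyword_search(["AB-1234"], documents)

print(car_hits)
print(code_hits)
```

In this toy setup, the query for "car" and "buy" returns only "want a car" and skips "automobile purchase guide", which is the gap vector search is meant to close. The product code query is matched exactly.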

Computing Similarity

Two common ways to measure the similarity between two vectors:

  • Cosine similarity: how closely the directions of two vectors align. Ranges from -1 to 1. The most commonly used
  • Euclidean distance: the straight-line distance between two vectors. 0 means an exact match
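
Both measures can be sketched in a few lines of plain Python, applied to the 3-dimensional "dog", "cat", and "car" vectors from the earlier example. The function names and the `rank_by_cosine` helper are illustrative assumptions. In practice a library such as NumPy or a vector database would do this work.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors from the 3-dimensional example in this article.
vectors = {
    "dog": [0.8, 0.1, -0.5],
    "cat": [0.7, 0.2, -0.4],
    "car": [-0.3, 0.6, 0.9],
}


def rank_by_cosine(query, vectors):
    """Rank every other word by cosine similarity to the query word (hypothetical helper)."""
    q = vectors[query]
    scored = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for word, score in rank_by_cosine("dog", vectors):
    distance = euclidean_distance(vectors["dog"], vectors[word])
    print(f"dog vs {word}: cosine={score:.3f}, euclidean={distance:.3f}")
```

With these toy numbers, "cat" scores roughly 0.99 in cosine similarity to "dog" while "car" scores roughly -0.59, and the Euclidean distances (about 0.17 versus about 1.85) give the same ordering.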
