What Is an Embedding
An embedding is a technique that converts text, such as words or sentences, into a vector (a sequence of numbers) with hundreds to thousands of dimensions. Texts with similar meanings end up close together in the vector space.
Intuition
A simplified example in 3 dimensions:
- "dog" → [0.8, 0.1, -0.5]
- "cat" → [0.7, 0.2, -0.4] (close to dog)
- "car" → [-0.3, 0.6, 0.9] (far from dog)
Real models use far more dimensions, typically around 768 to 3072. For example, OpenAI's text-embedding-3-small produces 1536-dimensional vectors.
Difference from Keyword Search
| Method | Matches for the query "I want to buy a car" |
|---|---|
| Keyword search | Documents containing the keywords "car" or "buy" |
| Vector search | Also documents such as "automobile purchase" or "want a car" that are similar in meaning |
Vector search handles spelling variations, synonyms, and concept-level matching well. It is weaker when an exact keyword match is required, as with product codes or proper nouns.
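The contrast can be sketched with a toy corpus. The document titles, the hand-assigned 3-dimensional vectors, and the helper names `keyword_search` and `vector_search` are all made up for illustration; a real system would get the vectors from an embedding model. The keyword side is assumed to match when a document contains at least one query term as a whole word.

```python
import math

# Hypothetical mini-corpus. The vectors are hand-assigned for illustration
# only; they are NOT the output of any real embedding model.
DOCS = {
    "Tips to buy a used car": [0.90, 0.10, 0.10],
    "Automobile purchase guide": [0.85, 0.20, 0.05],
    "How to bake bread": [0.05, 0.10, 0.95],
}
# Hand-assigned vector standing in for the query "I want to buy a car".
QUERY_VECTOR = [0.88, 0.15, 0.08]


def keyword_search(query_terms, docs):
    """Return documents that contain at least one query term as a whole word."""
    terms = set(query_terms)
    return [text for text in docs if terms & set(text.lower().split())]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vector, docs, top_k=2):
    """Return the top_k documents ranked by cosine similarity to the query."""
    ranked = sorted(docs, key=lambda t: cosine(query_vector, docs[t]), reverse=True)
    return ranked[:top_k]


keyword_hits = keyword_search(["car", "buy"], DOCS)
vector_hits = vector_search(QUERY_VECTOR, DOCS)

print("keyword:", keyword_hits)
print("vector: ", vector_hits)
```

With these toy values, the keyword search misses "Automobile purchase guide" because it shares no literal word with the query terms, while the vector search ranks it among the top results.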
Computing Similarity
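Two common ways to measure the similarity between two vectors:
- Cosine similarity: how closely the directions of the two vectors align. Ranges from -1 to 1, where 1 means the same direction. The most widely used
- Euclidean distance: the straight-line distance between the two points. 0 means the vectors are identical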
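As a minimal sketch, both measures can be computed with the Python standard library on the 3-dimensional "dog", "cat", and "car" vectors from the example earlier. The function names are illustrative, and the code assumes non-zero vectors of equal length.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points (0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors from the 3-dimensional example in this article.
dog = [0.8, 0.1, -0.5]
cat = [0.7, 0.2, -0.4]
car = [-0.3, 0.6, 0.9]

print(f"cos(dog, cat)  = {cosine_similarity(dog, cat):.3f}")   # 0.990
print(f"cos(dog, car)  = {cosine_similarity(dog, car):.3f}")   # -0.592
print(f"dist(dog, cat) = {euclidean_distance(dog, cat):.3f}")  # 0.173
print(f"dist(dog, car) = {euclidean_distance(dog, car):.3f}")  # 1.849
```

Both measures agree here: "cat" is close to "dog" (high cosine similarity, small distance), while "car" is far away (negative cosine similarity, large distance).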



