Similarity Search
Similarity search finds the most relevant document chunks given a query vector. The choice of similarity metric affects retrieval quality.
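As a rough illustration of what "finding the most relevant chunks" means in practice, here is a minimal brute-force sketch using NumPy. It scores every chunk vector against the query with cosine similarity (covered in the next section) and keeps the best k. The function name `top_k_cosine` and the toy 3-dimensional vectors are made up for this example; real systems typically use a vector database or an approximate nearest neighbor index instead of scanning every row.

```python
import numpy as np

def top_k_cosine(query, chunk_vectors, k=3):
    """Return (indices, scores) of the k chunks most similar to the query.

    Hypothetical helper for illustration: a brute-force scan over all chunks.
    """
    q = np.asarray(query, dtype=float)
    m = np.asarray(chunk_vectors, dtype=float)
    # Cosine similarity of the query against every row of the matrix.
    scores = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    # Sort by descending score and keep the k best.
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy "embeddings" with made-up values, one row per document chunk.
chunks = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.1, 0.0])

indices, scores = top_k_cosine(query, chunks, k=2)
print(indices, scores)
```

With these toy values, chunk 0 points almost exactly along the query and ranks first, followed by chunk 2.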
Cosine Similarity
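Measures the cosine of the angle between two vectors, ignoring their magnitudes. The range is [-1, 1]: 1 means identical direction, 0 means orthogonal, and -1 means opposite direction. It is the most common metric for embeddings.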
cosine_similarity = (A·B) / (||A|| * ||B||)
Dot Product (Inner Product)
Sensitive to both magnitude and direction, so longer vectors produce larger scores. It works best when vectors are normalized (unit vectors), in which case it equals cosine similarity. Many embedding models already produce normalized vectors.
Euclidean Distance (L2)
Straight‑line distance between vectors. Smaller distance = more similar. Less common for dense embeddings but still valid.
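The three metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function names and the example vectors are chosen for this sketch. The vector `b` is deliberately `a` scaled by 2, which shows how each metric reacts to a change in magnitude alone.

```python
import numpy as np

def cosine_similarity(a, b):
    # (A·B) / (||A|| * ||B||): depends only on the angle between a and b.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot_product(a, b):
    # Grows with the lengths of the vectors as well as their alignment.
    return float(np.dot(a, b))

def euclidean_distance(a, b):
    # Straight-line (L2) distance; smaller means more similar.
    return float(np.linalg.norm(a - b))

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 4.0, 4.0])  # same direction as a, twice the length

cos_ab = cosine_similarity(a, b)   # about 1.0: identical direction
dot_ab = dot_product(a, b)         # 18.0: scaled up by the longer vector
l2_ab = euclidean_distance(a, b)   # 3.0: nonzero despite identical direction

print(cos_ab, dot_ab, l2_ab)
```

Cosine similarity treats `a` and `b` as a perfect match, while the dot product and Euclidean distance both register the difference in length.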
How to Choose?
- If your embedding model produces normalized vectors (e.g., OpenAI, many sentence‑transformers), cosine similarity and dot product give identical scores, so either works.
- Cosine similarity is the default for most RAG systems.
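The equivalence for normalized vectors can be checked numerically. The sketch below uses random vectors as stand-ins for real embeddings (an assumption made purely for illustration) and normalizes them by hand. It also checks a related standard identity that goes slightly beyond the points above: for unit vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so ranking by smallest distance gives the same order as ranking by largest cosine.

```python
import numpy as np

def normalize(v):
    # Scale a vector to unit length.
    return v / np.linalg.norm(v)

# Random stand-ins for embeddings; a fixed seed keeps the run repeatable.
rng = np.random.default_rng(0)
query = normalize(rng.normal(size=8))
docs = np.array([normalize(rng.normal(size=8)) for _ in range(5)])

dot = docs @ query
cos = dot / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
l2 = np.linalg.norm(docs - query, axis=1)

# For unit vectors the norms are 1, so dot product and cosine coincide.
same_scores = np.allclose(dot, cos)
# Squared L2 distance is a decreasing function of cosine similarity.
same_identity = np.allclose(l2 ** 2, 2 - 2 * cos)

rank_by_cos = np.argsort(-cos)  # most similar first
rank_by_l2 = np.argsort(l2)     # closest first

print(same_scores, same_identity, rank_by_cos, rank_by_l2)
```

If the vectors are not normalized, none of these equalities is guaranteed, and the choice of metric can change which chunks are retrieved.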
Two Minute Drill
- Cosine similarity measures angle between vectors.
- Dot product equals cosine similarity when vectors are normalized.
- Euclidean distance measures straight‑line distance.
- Cosine similarity is the most common choice for RAG.
