Hybrid Search
Vector search (dense retrieval) captures semantic similarity but can miss exact keyword matches. Keyword search (BM25) excels at exact terms but has no notion of meaning. Hybrid search runs both and merges their results, so a document can surface through either signal.
Hybrid search = vector similarity + keyword matching (BM25) combined using Reciprocal Rank Fusion (RRF).
Why Hybrid Search?
Some queries need exact terms (e.g., product code "XYZ-123"). Others need semantic meaning (e.g., "best laptop for programming"). Hybrid search handles both by merging the ranked results of a keyword retriever and a vector retriever.
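To make the keyword side concrete, here is a minimal sketch of BM25 scoring in plain Python. It is a simplified illustration, not the implementation any library uses: the function name `bm25_scores`, the whitespace tokenizer, the sample documents, and the parameter values `k1=1.5` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic Okapi BM25 (illustrative)."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical sample corpus.
docs = [
    "Replacement battery for model XYZ-123",
    "Lightweight notebook computer for software developers",
    "Battery care tips for portable devices",
]

exact_scores = bm25_scores("XYZ-123 battery", docs)
best = max(range(len(docs)), key=exact_scores.__getitem__)

# No document contains these words, so BM25 has nothing to match on.
semantic_scores = bm25_scores("laptop programming", docs)
```

The product-code query ranks the first document highest because it contains both query terms. The query "laptop programming" scores zero everywhere, even though the second document is about the same topic. That gap is what the vector retriever is meant to cover.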
Implementation with LangChain
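from langchain.retrievers import EnsembleRetriever
# In recent LangChain releases BM25Retriever lives in langchain_community
# and requires the rank_bm25 package to be installed.
from langchain_community.retrievers import BM25Retriever
# Assumes `documents` (a list of Document objects) and `vectorstore` already exist.
# Keyword retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(documents)
# Vector retriever
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
# Ensemble (hybrid)
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5],  # equal weight for the keyword and vector rankings
)
Reciprocal Rank Fusion (RRF)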
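RRF combines rankings from multiple retrievers without needing score normalization, because it uses only each document's rank position and ignores the raw scores. Formula: `score(d) = sum(1 / (k + rank_i(d)))`, summed over every retriever `i` that returned document `d`, where `rank_i(d)` is the document's 1-based position in that retriever's list and `k` is a smoothing constant (commonly 60, and unrelated to the `k` in `search_kwargs`). LangChain's `EnsembleRetriever` fuses results with a weighted form of RRF, multiplying each retriever's term by its entry in `weights`.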
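The formula above can be sketched in a few lines of plain Python. This is an illustrative version and not LangChain's source code: the function name `reciprocal_rank_fusion`, the document IDs, and the choice to call the smoothing constant `c` (to avoid confusion with a retriever's `k`) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, c=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    if weights is None:
        weights = [1.0] * len(rankings)  # plain (unweighted) RRF
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks are 1-based: the top result of each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings returned by the two retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking], weights=[0.5, 0.5])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

`doc_b` comes out on top because it ranks well in both lists, while `doc_c` and `doc_d`, each returned by only one retriever, fall to the bottom. Changing `weights` shifts the balance toward one retriever without any score normalization.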
When to Use Hybrid Search
- Documents contain codes, names, or proper nouns.
- Queries mix exact terms and semantic meaning.
- You need high recall across different query types.
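One way to check the recall point in the list above is to measure recall@k for each retriever on queries whose relevant documents are known. The helper below is a minimal sketch. The function name `recall_at_k` and all document IDs are hypothetical, and the rankings are made-up inputs for illustration, not measured results.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Hypothetical ground truth and retriever outputs for a single query.
relevant_ids = {"doc_a", "doc_d"}
keyword_only = ["doc_a", "doc_b", "doc_c"]
vector_only = ["doc_b", "doc_d", "doc_e"]
hybrid = ["doc_b", "doc_a", "doc_d"]

keyword_recall = recall_at_k(keyword_only, relevant_ids, k=3)
vector_recall = recall_at_k(vector_only, relevant_ids, k=3)
hybrid_recall = recall_at_k(hybrid, relevant_ids, k=3)
```

In this toy input each single retriever finds one of the two relevant documents, and the fused list finds both. Running the same comparison over a real query set indicates whether hybrid search is worth the extra retriever for your data.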
Two Minute Drill
- Hybrid search = vector search + keyword (BM25) search.
- LangChain's `EnsembleRetriever` combines retrievers.
- RRF (Reciprocal Rank Fusion) merges rankings.
- Use when queries need both exact and semantic matches.
