Reranking
Initial retrieval (e.g., top‑k = 20) is fast but noisy. Reranking uses a more accurate (but slower) model to reorder those candidates, significantly improving precision.
In short: retrieve many candidates (e.g., 20) with a fast retriever, then use a cross‑encoder to score and reorder them, keeping only the most relevant few (e.g., top 4).
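A minimal sketch of this two‑stage flow. The `cheap_score` and `expensive_score` functions below are hypothetical stand‑ins for a real bi‑encoder and cross‑encoder, just to show the control flow:

```python
# Toy two-stage retrieval: fast first pass, accurate rerank of the survivors.
# cheap_score / expensive_score stand in for a bi-encoder and a cross-encoder;
# a real system would call actual models here.

def cheap_score(query: str, doc: str) -> float:
    # Bi-encoder stand-in: crude word-overlap similarity.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def expensive_score(query: str, doc: str) -> float:
    # Cross-encoder stand-in: sees query and document *together*,
    # so it can reward things like an exact phrase match.
    score = cheap_score(query, doc)
    if query.lower() in doc.lower():
        score += 1.0
    return score

def retrieve_and_rerank(query: str, corpus: list[str],
                        top_k: int = 20, top_n: int = 4) -> list[str]:
    # Stage 1: fast retrieval of top_k candidates.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d),
                        reverse=True)[:top_k]
    # Stage 2: slower, more accurate rerank; keep only top_n.
    return sorted(candidates, key=lambda d: expensive_score(query, d),
                  reverse=True)[:top_n]
```

The shape is the same in production: only the scoring functions change.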
Cross‑Encoders vs Bi‑Encoders
Bi‑encoders (embedding models) are fast but encode the query and document independently. Cross‑encoders process the query and document together, giving higher accuracy at the cost of speed. Use a bi‑encoder for initial retrieval and a cross‑encoder for reranking.
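The cost asymmetry is why each model gets its own stage: a bi‑encoder embeds each text once (so document vectors can be precomputed), while a cross‑encoder needs one full forward pass per (query, document) pair. A rough count of model calls, as a toy illustration rather than a benchmark:

```python
def bi_encoder_calls(num_docs: int, num_queries: int) -> int:
    # One embedding pass per document (precomputed once) plus one per query;
    # comparing them afterwards is just cheap vector math.
    return num_docs + num_queries

def cross_encoder_calls(num_docs: int, num_queries: int) -> int:
    # One forward pass for every (query, document) pair -- nothing is reusable.
    return num_docs * num_queries
```

For 1,000 queries over a 100,000‑document corpus, the bi‑encoder needs about 101,000 model calls, while scoring every pair with a cross‑encoder would take 100,000,000. Cross‑encoding only the top 20 candidates per query costs just 20,000 calls.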
Implementation with LangChain
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# Load a cross-encoder and keep the 4 highest-scoring documents
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=4)
reranker = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_retriever  # any existing retriever, e.g. from a vector store
)
Popular Reranking Models
- BAAI/bge-reranker-base – good balance, free.
- Cohere Rerank – hosted API, high quality.
- cross-encoder/ms-marco-MiniLM-L-6-v2 – small, fast.
When to Use Reranking
When precision is critical (e.g., legal or medical Q&A). Reranking adds latency but dramatically improves answer quality.
Two Minute Drill
- Reranking reorders retrieved documents using a more accurate model.
- Cross‑encoders are slower but more accurate than bi‑encoders.
- Use `ContextualCompressionRetriever` with `CrossEncoderReranker`.
- Improves precision for critical applications.
Need more clarification?
Drop us an email at career@quipoinfotech.com
