
Reranking

Initial retrieval (e.g., top‑k = 20) is fast but often noisy. Reranking applies a more accurate (but slower) model to reorder the top results, significantly improving precision.

Reranking: retrieve many candidates (e.g., 20), then use a cross‑encoder to score them all and keep only the most relevant few (e.g., top 4).
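The two‑stage shape can be sketched in a few lines of plain Python. This is a minimal sketch, not a real implementation: simple word‑overlap scoring stands in for the bi‑encoder and cross‑encoder, and the corpus, queries, and function names are all made up for illustration.

```python
# Minimal retrieve-then-rerank sketch. Word-overlap scoring stands in for the
# real models here; the two-stage pipeline shape is the point, not the scorers.

def fast_retrieve(query, corpus, k=20):
    """Stage 1 (bi-encoder stand-in): cheap scoring over the whole corpus."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def rerank(query, candidates, score_fn, top_n=4):
    """Stage 2 (cross-encoder stand-in): accurate scoring over few candidates."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_n]

corpus = [
    "dogs chase cats in the park",
    "the cat sat on the mat",
    "quantum computing uses qubits",
    "cats are small domesticated animals",
]
candidates = fast_retrieve("cats park", corpus, k=3)       # wide, cheap pass
top = rerank("cats park", candidates,
             score_fn=lambda q, d: len(set(q.split()) & set(d.split())),
             top_n=2)                                      # narrow, careful pass
```

In a real system, `fast_retrieve` is a vector‑store query and `score_fn` is a cross‑encoder forward pass; the structure stays the same.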

Cross‑Encoders vs Bi‑Encoders

Bi‑encoders (embedding models) are fast because they encode the query and each document independently, so document embeddings can be precomputed and compared with a cheap similarity lookup. Cross‑encoders process the query and document together in a single forward pass, which is more accurate but far slower. Use a bi‑encoder for initial retrieval and a cross‑encoder for reranking.
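The cost asymmetry is easiest to see by counting model passes. A quick back‑of‑the‑envelope sketch (treating one model forward pass as the expensive unit of work; the function names are just for illustration):

```python
# Why cross-encoders are reserved for reranking: pass-count arithmetic.

def bi_encoder_passes(num_docs, num_queries):
    # Docs are embedded once (offline) and each query once;
    # similarity between embeddings is a cheap dot product.
    return num_docs + num_queries

def cross_encoder_passes(num_docs, num_queries):
    # Every (query, document) pair needs its own joint forward pass;
    # nothing can be precomputed.
    return num_docs * num_queries

assert bi_encoder_passes(1_000_000, 100) == 1_000_100
assert cross_encoder_passes(1_000_000, 100) == 100_000_000
# Reranking only the top 20 candidates per query keeps the cross-encoder cheap:
assert cross_encoder_passes(20, 100) == 2_000
```

This is why the pipeline runs the cross‑encoder only over the small candidate set the bi‑encoder already retrieved.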

Implementation with LangChain

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# CrossEncoderReranker expects a cross-encoder model object, not a name string
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=4)
reranker = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_retriever,  # your existing vector-store retriever
)

docs = reranker.invoke("your question here")  # returns the top_n reranked documents

Popular Reranking Models

  • BAAI/bge-reranker-base – good balance, free.
  • Cohere Rerank – API, high quality.
  • cross-encoder/ms-marco-MiniLM-L-6-v2 – small, fast.

When to Use Reranking

Use reranking when precision is critical (e.g., legal or medical Q&A). It adds latency but dramatically improves answer quality.


Two Minute Drill
  • Reranking reorders retrieved documents using a more accurate model.
  • Cross‑encoders are slower but more accurate than bi‑encoders.
  • Use `ContextualCompressionRetriever` with `CrossEncoderReranker`.
  • Improves precision for critical applications.

Need more clarification?

Drop us an email at career@quipoinfotech.com