Contextual Compression
Retrieved chunks often contain irrelevant sentences or filler. Contextual compression uses an LLM to extract only the most relevant parts of each chunk, reducing noise and token usage.
Compression = filter or summarise retrieved chunks to keep only what matters.
LLM‑Chain Extractor
Uses an LLM to extract relevant statements from each chunk.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# llm is an initialised LLM and retriever an existing base retriever
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

Embeddings Filter
Filters chunks by their embedding similarity to the query – any chunk scoring below the threshold is dropped, with no LLM call needed.
from langchain.retrievers.document_compressors import EmbeddingsFilter

# embeddings is the same embedding model used to build the vector store
compressor = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.75)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

When to Use Compression
- Long chunks with irrelevant text.
- Cost reduction (fewer tokens).
- Improving answer focus.
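To make the similarity-threshold idea concrete, here is a minimal hand-rolled sketch of the filtering step, with no LangChain involved. The toy `embed` function and its five-word vocabulary are invented for illustration; a real setup would use a proper embedding model.

```python
from math import sqrt

def embed(text):
    # Hypothetical bag-of-words embedding over a tiny fixed vocabulary.
    vocab = ["refund", "policy", "shipping", "weather", "recipe"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def embeddings_filter(query, chunks, threshold=0.75):
    # Keep only chunks whose similarity to the query meets the threshold.
    q = embed(query)
    return [c for c in chunks if cosine(q, embed(c)) >= threshold]

chunks = [
    "Our refund policy covers returns within 30 days.",
    "Today's weather is sunny with a light breeze.",
    "Shipping and refund questions go to support.",
]
kept = embeddings_filter("refund policy details", chunks)
```

Only the first chunk survives: it matches the query on both "refund" and "policy", while the third shares only one word and scores 0.5, below the threshold.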
Two Minute Drill
- Contextual compression removes irrelevant content from retrieved chunks.
- LLMChainExtractor uses an LLM to extract relevant sentences.
- EmbeddingsFilter drops low‑similarity chunks.
- Reduces token usage and improves answer quality.
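The extraction step summarised above can also be sketched without an LLM. In the real LLMChainExtractor an LLM judges which sentences are relevant; in this illustration-only stand-in, keyword overlap with the query plays that role.

```python
def extract_relevant(query, chunk):
    # Keep only sentences that share at least one word with the query.
    # A stand-in for the LLM's relevance judgement (illustration only).
    query_words = set(query.lower().split())
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    kept = [s for s in sentences if query_words & set(s.lower().split())]
    return ". ".join(kept) + "." if kept else ""

chunk = (
    "The museum opens at 9am. Tickets cost 12 euros. "
    "The cafeteria serves lunch until 2pm."
)
compressed = extract_relevant("tickets cost", chunk)
```

The chunk shrinks from three sentences to the single sentence that answers the query, which is exactly the token saving compression aims for.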
Need more clarification?
Drop us an email at career@quipoinfotech.com
