Metadata Filtering
Documents often have metadata (source, date, author, category). Metadata filtering narrows the search space before similarity search, improving relevance and reducing noise.
The filter is checked against each document's metadata, so only documents that match the criteria (for example, `year` equal to 2024) are considered for similarity ranking.
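The idea can be illustrated without any vector database. The following is a minimal sketch, assuming a toy corpus with hand-written two-dimensional vectors; the names `filtered_search`, `cosine`, and the sample documents are hypothetical and only show the order of operations (filter first, rank second).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query_vec, docs, metadata_filter, k=2):
    """Keep docs whose metadata matches every filter key, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return ranked[:k]

# Hypothetical toy corpus; real embeddings would come from an embedding model
docs = [
    {"text": "2024 revenue report", "vector": [1.0, 0.0], "metadata": {"year": 2024}},
    {"text": "2023 revenue report", "vector": [0.9, 0.1], "metadata": {"year": 2023}},
    {"text": "2024 hiring plan", "vector": [0.0, 1.0], "metadata": {"year": 2024}},
]

results = filtered_search([1.0, 0.0], docs, {"year": 2024})
print([d["text"] for d in results])
```

The 2023 report is close to the query vector, but it never reaches the ranking step because the filter removes it first.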
Adding Metadata During Indexing
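from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import Chroma
# PyPDFLoader parses PDFs (needs the pypdf package); TextLoader only reads plain text
loader = PyPDFLoader("report.pdf")
documents = loader.load()
# Attach metadata to every document before indexing
for doc in documents:
    doc.metadata["source"] = "report.pdf"
    doc.metadata["year"] = 2024
# `embeddings` must be an embeddings object created beforehand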
vectorstore = Chroma.from_documents(documents, embeddings)

Filtering During Retrieval
retriever = vectorstore.as_retriever(
    search_kwargs={"filter": {"year": 2024}}
)

Self‑Query Retriever
A self‑query retriever uses an LLM to split a natural-language query into a search string and a structured metadata filter, then applies both to the vector store.
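from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
# Describe each filterable field so the LLM knows what it can filter on
fields = [AttributeInfo(name="year", description="Year of the report", type="integer")]
# `llm` must be a language model object created beforehand
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_contents="reports", metadata_field_info=fields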
)

Benefits
- Reduces irrelevant results.
- Can improve speed, since similarity is computed over fewer candidates.
- Enables user‑friendly filters (e.g., "show me documents from 2023").
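To make the last benefit concrete, here is a minimal sketch of how a request such as "show me documents from 2023" could be turned into a search string plus a filter. It is a rule-based stand-in, not how `SelfQueryRetriever` works internally (that relies on an LLM); the function name `extract_year_filter` and the "from <year>" pattern are assumptions made for illustration.

```python
import re

def extract_year_filter(query):
    """Split a query into (search_text, metadata_filter).

    Rule-based stand-in for the LLM step: it only understands "from <year>".
    """
    match = re.search(r"\bfrom (\d{4})\b", query)
    if not match:
        # No recognizable year: search on the full text with no filter
        return query.strip(), {}
    # Remove the "from <year>" phrase and tidy up leftover whitespace
    search_text = (query[:match.start()] + query[match.end():]).strip()
    search_text = re.sub(r"\s+", " ", search_text)
    return search_text, {"year": int(match.group(1))}

search_text, metadata_filter = extract_year_filter("show me documents from 2023")
print(search_text, metadata_filter)
```

The resulting dictionary has the same shape as the `filter` value passed through `search_kwargs`, so it could be handed to a retriever in the same way.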
Two Minute Drill
- Metadata filtering limits search to relevant documents.
- Add metadata when creating documents.
- Use `filter` in `search_kwargs`.
- Self‑query retriever extracts filters from natural language.
