Parent Document Retrieval
Small chunks improve retrieval precision but can lose the surrounding context. Large chunks preserve context but can bring in noise. Parent Document Retrieval (also called Small‑to‑Large) addresses this trade-off: it searches over small child chunks but returns the larger parent chunk (or the whole document) for generation.
Index small chunks for retrieval; return larger parent chunks (or full document) to the LLM.
Implementation with LangChain
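from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Parents are the large chunks returned to the LLM; children are the small chunks that get embedded.
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=500)

# `vectorstore` is assumed to be an existing vector store; it indexes the child chunks.
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=InMemoryStore(),  # holds the parent chunks
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# `documents` is assumed to be a list of already loaded Document objects.
retriever.add_documents(documents)

How It Works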
1. Split documents into parent chunks (large) and child chunks (small).
2. Embed and index child chunks for retrieval.
3. Store parent chunks in a document store (docstore).
4. When a child chunk is retrieved, return its associated parent chunk to the LLM.
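The four steps above can be sketched without any library to make the mechanics visible. This is a minimal illustration, not LangChain's implementation: `ToyParentRetriever`, `split_text`, `embed`, and `cosine` are hypothetical names, the splitter cuts on word counts rather than characters, and the "embedding" is a plain bag-of-words vector standing in for a real embedding model.

```python
from __future__ import annotations

import re
from collections import Counter
from math import sqrt


def split_text(text: str, chunk_size: int) -> list[str]:
    """Naive splitter: fixed-size groups of whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyParentRetriever:
    def __init__(self, parent_size: int = 40, child_size: int = 10):
        self.parent_size = parent_size
        self.child_size = child_size
        self.docstore: dict[int, str] = {}          # step 3: parent_id -> parent text
        self.index: list[tuple[Counter, int]] = []  # step 2: (child vector, parent_id)

    def add_documents(self, documents: list[str]) -> None:
        for doc in documents:
            # Step 1: split into large parents, then split each parent into small children.
            for parent in split_text(doc, self.parent_size):
                parent_id = len(self.docstore)
                self.docstore[parent_id] = parent
                for child in split_text(parent, self.child_size):
                    # Only the child is "embedded"; it remembers which parent it came from.
                    self.index.append((embed(child), parent_id))

    def retrieve(self, query: str) -> str:
        # Step 4: find the best-matching child, then return its parent instead.
        q = embed(query)
        _, parent_id = max(self.index, key=lambda item: cosine(q, item[0]))
        return self.docstore[parent_id]
```

Calling `add_documents` on a list of strings and then `retrieve` with a query returns the text of the parent chunk, even though the similarity match was made against one of its much smaller children.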
Benefits
- High retrieval precision (small chunks).
- Rich context for generation (large chunks).
- Well suited to long documents, where a single small chunk rarely carries enough context on its own.
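One practical detail follows from returning parents: several retrieved child chunks often belong to the same parent, so the parent should be passed to the LLM only once. The sketch below shows one way to collapse ranked child hits into distinct parent IDs while keeping rank order. The function name `unique_parent_ids` and the metadata key `doc_id` are assumptions made for illustration; check your retriever's configuration for the key it actually uses.

```python
from __future__ import annotations


def unique_parent_ids(child_hits: list[dict], id_key: str = "doc_id") -> list[str]:
    """Collapse ranked child hits into distinct parent IDs, preserving rank order.

    Each hit is assumed to be a metadata dict carrying its parent's ID under `id_key`.
    """
    ordered: list[str] = []
    for hit in child_hits:
        parent_id = hit.get(id_key)
        # Skip hits without a parent reference and parents already collected.
        if parent_id is not None and parent_id not in ordered:
            ordered.append(parent_id)
    return ordered
```

The resulting IDs can then be looked up in the docstore, so the prompt contains each parent chunk at most once.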
Two Minute Drill
- Parent Document Retrieval = retrieve small, return large.
- Child chunks for retrieval, parent chunks for context.
- Use `ParentDocumentRetriever` from LangChain.
- Improves context without sacrificing retrieval precision.
