
Building Ground Truth

For reliable evaluation, you need a golden dataset (ground truth): a set of questions, expected answers, and relevant document IDs. This allows you to compute metrics like accuracy, recall, and precision.
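Once each question is annotated with its relevant chunk IDs, retrieval precision and recall reduce to set overlap. A minimal sketch (function name and IDs are illustrative, not from any library):

```python
def retrieval_metrics(retrieved_ids, relevant_ids):
    """Precision and recall of retrieved chunk IDs against annotated relevant IDs."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if the retriever returns ["doc1_chunk3", "doc9_chunk0"] and the annotation lists ["doc1_chunk3", "doc2_chunk1"], both precision and recall are 0.5.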

Ground truth = human‑annotated pairs of questions and correct answers (and relevant context).

How to Build a Golden Dataset

  1. Select a sample of documents from your knowledge base.
  2. Write questions that can be answered from these documents.
  3. Annotate the correct answer (and which document/chunk contains it).
  4. Aim for 50‑200 examples as a minimum for meaningful evaluation; larger systems warrant more.

Synthetic Data Generation

Use an LLM to generate question‑answer pairs from your documents, then human‑validate a subset.
prompt = f"""Generate a question and answer based on this document chunk:\n{chunk}\nFormat: Question: ... Answer: ..."""
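A sketch of the full generation loop, assuming a generate(prompt) callable that wraps whichever LLM client you use (that callable is hypothetical); only the parsing of the Question:/Answer: format is concrete:

```python
def parse_qa(text):
    """Parse 'Question: ... Answer: ...' output into a dict; returns None if malformed."""
    q_tag, a_tag = "Question:", "Answer:"
    if q_tag not in text or a_tag not in text:
        return None
    before, _, answer = text.partition(a_tag)
    question = before.split(q_tag, 1)[1].strip()
    return {"question": question, "answer": answer.strip()}

def synthesize_dataset(chunks, generate):
    """Generate candidate QA pairs from {chunk_id: text}; human-validate a subset before use."""
    dataset = []
    for chunk_id, chunk in chunks.items():
        prompt = (f"Generate a question and answer based on this document chunk:\n"
                  f"{chunk}\nFormat: Question: ... Answer: ...")
        pair = parse_qa(generate(prompt))
        if pair:  # skip malformed LLM outputs rather than failing the run
            pair["context_ids"] = [chunk_id]
            dataset.append(pair)
    return dataset
```

Because LLM output can deviate from the requested format, malformed responses are dropped rather than guessed at; the surviving pairs still need human validation before entering the golden dataset.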

Continuous Evaluation

As your system changes (new chunking, new LLM, new retriever), re‑run evaluation on the golden dataset to detect regressions. Automate this in CI/CD.
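A minimal CI-style regression gate, assuming a retrieve(question) function for your current pipeline (hypothetical) and a recall threshold you pick for your system:

```python
def regression_check(golden, retrieve, threshold=0.8):
    """Re-run retrieval over the golden dataset; raise if mean recall falls below threshold."""
    recalls = []
    for example in golden:
        retrieved = set(retrieve(example["question"]))
        relevant = set(example["context_ids"])
        recalls.append(len(retrieved & relevant) / len(relevant))
    mean_recall = sum(recalls) / len(recalls)
    if mean_recall < threshold:
        raise AssertionError(f"regression: mean recall {mean_recall:.2f} < {threshold}")
    return mean_recall
```

Wired into CI, a chunking or retriever change that silently degrades retrieval fails the build instead of reaching production.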

Format for Golden Dataset

[
  {
    "question": "What is RAG?",
    "answer": "Retrieval-Augmented Generation...",
    "context_ids": ["doc1_chunk3", "doc2_chunk1"]
  }
]
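A small loader sketch for this format, validating that every entry carries the three required fields before any evaluation run (function names are illustrative):

```python
import json

REQUIRED_KEYS = {"question", "answer", "context_ids"}

def validate_golden(dataset):
    """Raise ValueError if any entry lacks a required field; return the dataset otherwise."""
    for i, entry in enumerate(dataset):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {i} missing fields: {sorted(missing)}")
    return dataset

def load_golden(path):
    """Load and validate the golden dataset from a JSON file."""
    with open(path) as f:
        return validate_golden(json.load(f))
```

Failing fast on a malformed entry is cheaper than debugging a misleading metric after the fact.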


Two Minute Drill
  • Ground truth is essential for accurate evaluation.
  • Create a golden dataset with questions, answers, and relevant context.
  • Use synthetic generation + human validation.
  • Automate continuous evaluation to catch regressions.

Need more clarification?

Drop us an email at career@quipoinfotech.com