Building Ground Truth
For reliable evaluation, you need a golden dataset (ground truth): a set of questions, expected answers, and relevant document IDs. This allows you to compute metrics like accuracy, recall, and precision.
Ground truth = human‑annotated pairs of questions and correct answers (and relevant context).
How to Build a Golden Dataset
- Select a sample of documents from your knowledge base.
- Write questions that can be answered from these documents.
- Annotate the correct answer (and which document/chunk contains it).
- Aim for at least 50 examples, and ideally 200 or more, for meaningful evaluation.
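The annotation steps above can be captured with a small helper that builds entries in the dataset format used in this section and warns when the dataset is too small for reliable metrics (a sketch; the field names follow the JSON format shown below, the size threshold is the guideline above):

```python
import json

def make_example(question, answer, context_ids):
    """One golden-dataset entry: question, expected answer, relevant chunk IDs."""
    return {"question": question, "answer": answer, "context_ids": list(context_ids)}

def save_golden_dataset(examples, path, min_size=50):
    """Persist the dataset as JSON, warning if it is too small to be meaningful."""
    if len(examples) < min_size:
        print(f"Warning: only {len(examples)} examples; aim for {min_size}+")
    with open(path, "w") as f:
        json.dump(examples, f, indent=2)

examples = [
    make_example("What is RAG?", "Retrieval-Augmented Generation...", ["doc1_chunk3"]),
]
```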
Synthetic Data Generation
Use an LLM to generate question‑answer pairs from your documents, then human‑validate a subset.
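Because the model's reply follows a fixed "Question: ... Answer: ..." format, the pairs can be extracted with a small parser, and a random subset set aside for human review. A sketch (the LLM call itself is omitted; `fraction` is an assumed review rate):

```python
import random
import re

def parse_qa(reply):
    """Extract (question, answer) from a 'Question: ... Answer: ...' reply."""
    match = re.search(r"Question:\s*(.+?)\s*Answer:\s*(.+)", reply, re.DOTALL)
    if not match:
        return None  # malformed reply; skip or retry generation
    return match.group(1).strip(), match.group(2).strip()

def sample_for_review(pairs, fraction=0.2, seed=0):
    """Pick a random subset of generated pairs for human validation."""
    rng = random.Random(seed)
    k = max(1, int(len(pairs) * fraction))
    return rng.sample(pairs, k)

qa = parse_qa("Question: What is RAG? Answer: Retrieval-Augmented Generation.")
# qa == ("What is RAG?", "Retrieval-Augmented Generation.")
```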
prompt = f"""Generate a question and answer based on this document chunk:
{chunk}
Format: Question: ... Answer: ..."""
Continuous Evaluation
As your system changes (new chunking, new LLM, new retriever), re‑run evaluation on the golden dataset to detect regressions. Automate this in CI/CD.
Format for Golden Dataset
[
  {
    "question": "What is RAG?",
    "answer": "Retrieval-Augmented Generation...",
    "context_ids": ["doc1_chunk3", "doc2_chunk1"]
  }
]
Two Minute Drill
- Ground truth is essential for accurate evaluation.
- Create a golden dataset with questions, answers, and relevant context.
- Use synthetic generation + human validation.
- Automate continuous evaluation to catch regressions.
Need more clarification?
Drop us an email at career@quipoinfotech.com
