Project: Simple PDF Q&A

In this project, you will build a complete RAG system that answers questions from a PDF document. You will use LangChain, Chroma (vector database), and OpenAI.

Project 1: Upload a PDF, ask questions, get answers with citations.
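Before wiring up LangChain, the core retrieval idea behind RAG can be sketched in plain Python. The `embed` and `retrieve` names below are illustrative, not part of any library, and the bag-of-words "embedding" stands in for the dense model embeddings a real system would use:

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector
    # (real systems use dense vectors from an embedding model)
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank chunks by similarity to the query and return the top k
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The invoice total is due within 30 days of receipt.",
    "Our office is located in Berlin, Germany.",
    "Refunds are processed within 5 business days.",
]
print(retrieve("when is the invoice due", chunks))
# → ['The invoice total is due within 30 days of receipt.']
```

The real pipeline below does exactly this, with the vector math delegated to OpenAI embeddings and Chroma.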

Step 1: Install Dependencies

pip install langchain langchain-community langchain-openai chromadb pypdf

Step 2: Create the RAG Script

Create a file `pdf_qa.py`:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
import os

# Set your API key (better: export OPENAI_API_KEY in your shell instead of hard-coding it)
os.environ["OPENAI_API_KEY"] = "your-key-here"

# Load the PDF: one Document per page, with page numbers in the metadata
loader = PyPDFLoader("document.pdf")
documents = loader.load()

# Split pages into overlapping chunks so retrieval stays precise
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed the chunks and store them in an in-memory Chroma collection
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# Create the QA chain
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())

# Ask questions interactively; type "exit" to quit
while True:
    query = input("\nQuestion: ")
    if query.lower() == "exit":
        break
    answer = qa_chain.invoke({"query": query})["result"]
    print(f"Answer: {answer}")

Step 3: Run and Test

python pdf_qa.py
Place your PDF in the same folder as the script, named `document.pdf`, then ask questions about its content. Type `exit` to quit.
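By default, `RetrievalQA` returns only the answer text. To get the citations promised above, pass `return_source_documents=True` when building the chain and format the page metadata that `PyPDFLoader` attaches to each chunk. A minimal formatting helper (the `format_citations` name is our own; the `source`/`page` metadata keys are what `PyPDFLoader` typically produces):

```python
def format_citations(source_documents) -> str:
    # Each source document carries metadata like {"source": "document.pdf", "page": 3};
    # deduplicate pages while preserving retrieval order.
    seen, parts = set(), []
    for doc in source_documents:
        meta = doc.metadata if hasattr(doc, "metadata") else doc
        key = (meta.get("source", "?"), meta.get("page", "?"))
        if key not in seen:
            seen.add(key)
            parts.append(f"{key[0]} p.{key[1]}")
    return "Sources: " + ", ".join(parts)

# With the chain built as:
#   qa_chain = RetrievalQA.from_chain_type(
#       llm, retriever=vectorstore.as_retriever(), return_source_documents=True)
# the result dict has both "result" and "source_documents" keys:
#   out = qa_chain.invoke({"query": query})
#   print(out["result"], "\n" + format_citations(out["source_documents"]))
```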

What You Learned

  • Loading PDFs with `PyPDFLoader`.
  • Splitting text into chunks.
  • Creating embeddings and a vector store.
  • Building a retrieval QA chain.
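To see what `chunk_size` and `chunk_overlap` actually do, here is a simplified fixed-width splitter. LangChain's `RecursiveCharacterTextSplitter` is smarter (it prefers to break on paragraph and sentence boundaries), but the sliding-window arithmetic is the same idea:

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Step forward by (chunk_size - chunk_overlap) so consecutive chunks share
    # chunk_overlap characters; the overlap keeps text cut at a chunk boundary
    # intact in at least one chunk.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghijklmnopqrstuvwxyz"
print(split_fixed(text, chunk_size=10, chunk_overlap=4))
# → ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz', 'yz']
```

Note how each chunk repeats the last 4 characters of the previous one; that redundancy is the price paid so no sentence is lost to a boundary.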


Two Minute Drill
  • Load PDF → chunk → embed → store in Chroma.
  • Use `RetrievalQA` chain for Q&A.
  • Run interactively in terminal.
  • This is the foundation for any RAG system.

Need more clarification?

Drop us an email at career@quipoinfotech.com