Project: Simple PDF Q&A
In this project, you will build a complete RAG system that answers questions from a PDF document. You will use LangChain, Chroma (vector database), and OpenAI.
Project 1: Upload a PDF, ask questions, get answers with citations.
Step 1: Install Dependencies
pip install langchain chromadb pypdf openai streamlit

Step 2: Create the RAG Script
Create a file `pdf_qa.py`:
# Note: these import paths match classic LangChain; in newer releases the
# same classes live under langchain_community and langchain_openai.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
import os
os.environ["OPENAI_API_KEY"] = "your-key-here"  # safer: set this in your shell instead of hardcoding it
# Load PDF
loader = PyPDFLoader("document.pdf")
documents = loader.load()
# Chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
# Embed and store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
# Create QA chain
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
# Ask questions
while True:
    query = input("\nQuestion: ")
    if query.lower() == "exit":
        break
    answer = qa_chain.run(query)
    print(f"Answer: {answer}")

Step 3: Run and Test
python pdf_qa.py

Place your PDF as `document.pdf` in the same folder, then ask questions about its content. Type "exit" to quit.
What You Learned
- Loading PDFs with `PyPDFLoader`.
- Splitting text into chunks.
- Creating embeddings and a vector store.
- Building a retrieval QA chain.
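The splitter above uses `chunk_size=500` and `chunk_overlap=50`; conceptually that is a sliding window over the text. This stdlib-only sketch (a simplified, hypothetical `chunk_text`, not the real `RecursiveCharacterTextSplitter`, which also prefers paragraph and sentence boundaries) shows the mechanics:

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Naive fixed-size splitter with overlap -- a simplified stand-in
    for RecursiveCharacterTextSplitter (which also tries to break on
    paragraphs and sentences before falling back to raw characters)."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 1200 characters of sample text -> three chunks of 500, 500, 300 chars,
# where consecutive chunks share 50 characters of overlap.
sample = "".join(chr(97 + i % 26) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])           # [500, 500, 300]
print(chunks[0][-50:] == chunks[1][:50])  # True: the overlap
```

The overlap matters because a sentence cut in half at a chunk boundary appears whole in at least one chunk, so retrieval can still find it.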
Two Minute Drill
- Load PDF → chunk → embed → store in Chroma.
- Use `RetrievalQA` chain for Q&A.
- Run interactively in terminal.
- This is the foundation for any RAG system.
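The project brief promises answers with citations; in classic LangChain you can get the retrieved chunks back by passing `return_source_documents=True` to `RetrievalQA.from_chain_type` and reading `result["source_documents"]` (each chunk's page number sits in `metadata["page"]`, courtesy of `PyPDFLoader`). The underlying idea is just similarity search over (text, page) pairs, as in this stdlib-only sketch (the toy `embed` and its tiny vocabulary are illustrative stand-ins, not real embeddings):

```python
import math

# Toy corpus: (chunk_text, page_number) pairs, standing in for the chunks
# PyPDFLoader + the splitter would produce from document.pdf.
chunks = [
    ("The mitochondria is the powerhouse of the cell.", 1),
    ("Chroma stores embedding vectors for similarity search.", 2),
    ("RAG retrieves relevant chunks before asking the LLM.", 3),
]

# Tiny fixed vocabulary for the toy embedding (illustrative only; real
# embeddings come from a model such as OpenAIEmbeddings).
VOCAB = ["cell", "chroma", "rag", "retriev", "embedding", "llm"]

def embed(text):
    """Toy 'embedding': counts of vocabulary-word prefixes in the text."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [sum(w.startswith(v) for w in words) for v in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k chunks most similar to the query, keeping their pages."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c[0]), q), reverse=True)
    return ranked[:k]

# The page number travels with the chunk, which is all a citation needs.
for text, page in retrieve("How does RAG use an LLM?"):
    print(f"[page {page}] {text}")  # -> [page 3] RAG retrieves relevant chunks ...
```

Chroma and OpenAI embeddings replace the toy `embed` and `cosine` with real models and an indexed vector store, but the citation bookkeeping is the same: metadata rides along with each chunk from load to answer.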
Need more clarification?
Drop us an email at career@quipoinfotech.com
