Project: Simple PDF Q&A
In this project, you will build a complete RAG system that answers questions from a PDF document. You will use LangChain, Chroma (vector database), and OpenAI.
Project 1: Upload a PDF, ask questions, get answers with citations.
Step 1: Install Dependencies
pip install langchain chromadb pypdf openai streamlit

Step 2: Create the RAG Script
Create a file `pdf_qa.py`:
# Note: these import paths match classic LangChain; in newer releases the
# same classes live under langchain_community and langchain_openai.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
import os
os.environ["OPENAI_API_KEY"] = "your-key-here"  # safer: set this in your shell instead of hardcoding it
# Load PDF
loader = PyPDFLoader("document.pdf")
documents = loader.load()
# Chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
# Embed and store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
# Create QA chain
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
# Ask questions
while True:
    query = input("\nQuestion: ")
    if query.lower() == "exit":
        break
    answer = qa_chain.run(query)
    print(f"Answer: {answer}")

Step 3: Run and Test
python pdf_qa.py

Place your PDF as `document.pdf` in the same folder, then ask questions about its content. Type "exit" to quit.
What You Learned
- Loading PDFs with `PyPDFLoader`.
- Splitting text into chunks.
- Creating embeddings and a vector store.
- Building a retrieval QA chain.
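The splitter above uses `chunk_size=500` and `chunk_overlap=50`; conceptually that is a sliding window over the text. This stdlib-only sketch (a simplified, hypothetical `chunk_text`, not the real `RecursiveCharacterTextSplitter`, which also prefers paragraph and sentence boundaries) shows the mechanics:

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Naive fixed-size splitter with overlap -- a simplified stand-in
    for RecursiveCharacterTextSplitter (which also tries to break on
    paragraphs and sentences before falling back to raw characters)."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 1200 characters of sample text -> three chunks of 500, 500, 300 chars,
# where consecutive chunks share 50 characters of overlap.
sample = "".join(chr(97 + i % 26) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])           # [500, 500, 300]
print(chunks[0][-50:] == chunks[1][:50])  # True: the overlap
```

The overlap matters because a sentence cut in half at a chunk boundary appears whole in at least one chunk, so retrieval can still find it.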
Two Minute Drill
- Load PDF → chunk → embed → store in Chroma.
- Use `RetrievalQA` chain for Q&A.
- Run interactively in terminal.
- This is the foundation for any RAG system.
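The project brief promises answers with citations; in classic LangChain you can get the retrieved chunks back by passing `return_source_documents=True` to `RetrievalQA.from_chain_type` and reading `result["source_documents"]` (each chunk's page number sits in `metadata["page"]`, courtesy of `PyPDFLoader`). The underlying idea is just similarity search over (text, page) pairs, as in this stdlib-only sketch (the toy `embed` and its tiny vocabulary are illustrative stand-ins, not real embeddings):

```python
import math

# Toy corpus: (chunk_text, page_number) pairs, standing in for the chunks
# PyPDFLoader + the splitter would produce from document.pdf.
chunks = [
    ("The mitochondria is the powerhouse of the cell.", 1),
    ("Chroma stores embedding vectors for similarity search.", 2),
    ("RAG retrieves relevant chunks before asking the LLM.", 3),
]

# Tiny fixed vocabulary for the toy embedding (illustrative only; real
# embeddings come from a model such as OpenAIEmbeddings).
VOCAB = ["cell", "chroma", "rag", "retriev", "embedding", "llm"]

def embed(text):
    """Toy 'embedding': counts of vocabulary-word prefixes in the text."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [sum(w.startswith(v) for w in words) for v in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k chunks most similar to the query, keeping their pages."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c[0]), q), reverse=True)
    return ranked[:k]

# The page number travels with the chunk, which is all a citation needs.
for text, page in retrieve("How does RAG use an LLM?"):
    print(f"[page {page}] {text}")  # -> [page 3] RAG retrieves relevant chunks ...
```

Chroma and OpenAI embeddings replace the toy `embed` and `cosine` with real models and an indexed vector store, but the citation bookkeeping is the same: metadata rides along with each chunk from load to answer.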
Need more clarification?
Drop us an email at career@quipoinfotech.com
