Document Loading
The first step in any RAG pipeline is loading documents from various sources. LangChain provides document loaders for PDFs, text files, web pages, and more. This chapter covers the most common ones.
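Every loader in this chapter follows the same pattern: build a loader, call `load()`, and get back a list of document objects that carry text in `page_content` and source details in `metadata`. As a rough mental model only, the sketch below mimics that shape with the standard library. `SimpleDocument` and `load_text` are hypothetical names invented for illustration, not LangChain APIs.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str) -> list:
    """Read one text file and return it as a one-element list of documents."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("RAG starts with loading.", encoding="utf-8")
    docs = load_text(str(sample))

print(len(docs))               # number of documents returned
print(docs[0].page_content)    # the file's text
print(docs[0].metadata)        # where the text came from
```

Keeping the source in `metadata` is what later lets a RAG pipeline point an answer back to the file it came from.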
Loading PDFs
Use the `PyPDFLoader` to load PDF files.
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("document.pdf")
pages = loader.load()
print(len(pages))
Each page becomes a separate document object.
Loading Text Files
from langchain.document_loaders import TextLoader
loader = TextLoader("notes.txt")
documents = loader.load()
Loading Web Pages
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://example.com")
docs = loader.load()
Loading CSV Files
from langchain.document_loaders import CSVLoader
loader = CSVLoader("data.csv")
docs = loader.load()
Directory Loader (Multiple Files)
from langchain.document_loaders import DirectoryLoader
loader = DirectoryLoader("./docs/", glob="**/*.txt")
docs = loader.load()
Two Minute Drill
- Use `PyPDFLoader` for PDFs and `TextLoader` for text files.
- `WebBaseLoader` loads web pages.
- `CSVLoader` loads tabular data.
- `DirectoryLoader` loads many files at once.
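The drill above amounts to a lookup from source type to loader. The sketch below is one possible way to encode that choice. `LOADER_BY_SUFFIX` and `loader_name_for` are hypothetical helpers, and they return loader names as strings so the example runs without LangChain installed. In a real pipeline the strings could be swapped for the loader classes themselves.

```python
from pathlib import Path

# Hypothetical lookup table: file suffix -> name of the matching loader.
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".txt": "TextLoader",
    ".csv": "CSVLoader",
}


def loader_name_for(source: str) -> str:
    """Return the name of the loader that suits a URL, directory, or file."""
    if source.startswith(("http://", "https://")):
        return "WebBaseLoader"
    if Path(source).is_dir():
        return "DirectoryLoader"
    suffix = Path(source).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"No loader registered for {source!r}")
    return LOADER_BY_SUFFIX[suffix]


for src in ["document.pdf", "notes.txt", "data.csv", "https://example.com"]:
    print(src, "->", loader_name_for(src))
```

Raising an error for unknown suffixes is a deliberate choice in this sketch: silently skipping a file makes missing content hard to notice later in the pipeline.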
