Generation LLMs
The generation component in RAG is the LLM that produces the final answer based on the augmented prompt. You can use cloud LLMs (OpenAI, Anthropic, Cohere) or local LLMs (Ollama, LlamaCPP).
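The generation step can be sketched as assembling the retrieved chunks and the user question into one augmented prompt, then handing it to the model. Below is a minimal pure-Python sketch; the prompt template and the `fake_llm` stand-in are illustrative assumptions, not part of any library:

```python
def build_augmented_prompt(question, retrieved_chunks):
    # Join the retrieved chunks into a single context block.
    context = "\n\n".join(retrieved_chunks)
    # Instruct the model to answer only from the supplied context.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def fake_llm(prompt):
    # Stand-in for a real LLM call (e.g. llm.invoke(prompt) in LangChain).
    return "[model answer grounded in the supplied context]"

prompt = build_augmented_prompt(
    "What is RAG?",
    ["RAG combines retrieval with generation.",
     "Retrieved text grounds the answer in your documents."],
)
print(fake_llm(prompt))
```

In a real pipeline, `fake_llm` is replaced by any of the cloud or local LLMs shown in this section; only that one line changes.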
OpenAI Models
```python
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
```

Temperature 0 gives more deterministic answers.

Anthropic Claude
```python
from langchain.chat_models import ChatAnthropic

llm = ChatAnthropic(model="claude-3-haiku-20240307")
```

Local LLMs with Ollama
First install Ollama (ollama.ai) and pull a model, then load it through LangChain:

```shell
ollama pull llama3
```

```python
from langchain.llms import Ollama

llm = Ollama(model="llama3")
```

Choosing a Model
- Cost‑sensitive, simple Q&A: GPT‑3.5‑turbo.
- High accuracy, complex reasoning: GPT‑4 or Claude‑3.
- Privacy, no API cost: local models (Llama 3, Mistral).
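The rules of thumb above can be expressed as a small selection helper; the function name and the returned model identifiers are illustrative assumptions, not a standard API:

```python
def choose_model(needs_privacy=False, complex_reasoning=False):
    """Pick a model from the trade-offs above (illustrative sketch)."""
    if needs_privacy:
        # Local model: full control, no per-token API cost.
        return "llama3"  # served locally via Ollama
    if complex_reasoning:
        # Stronger (and pricier) cloud model for complex reasoning.
        return "gpt-4"
    # Default: cheap cloud model for simple, cost-sensitive Q&A.
    return "gpt-3.5-turbo"

print(choose_model())                    # cost-sensitive default
print(choose_model(needs_privacy=True))  # local model
```

In practice such a helper would feed the chosen model name into the `ChatOpenAI`, `ChatAnthropic`, or `Ollama` constructors shown earlier.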
Two Minute Drill
- Cloud LLMs: OpenAI, Anthropic, Cohere.
- Local LLMs: Ollama, LlamaCPP (privacy, no API cost).
- Use temperature 0 for factual Q&A.
- Local models require more RAM/GPU but give full control.
Need more clarification?
Drop us an email at career@quipoinfotech.com
