
Generation LLMs

The generation component in RAG is the LLM that produces the final answer from the augmented prompt (the user's question combined with the retrieved context). You can use cloud LLMs (OpenAI, Anthropic, Cohere) or local LLMs (Ollama, LlamaCPP).
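Before any of the models below are called, the retrieved chunks are folded into a single prompt. A minimal sketch of that augmentation step (the helper name and prompt template here are illustrative, not from any library):

```python
def build_augmented_prompt(question, retrieved_chunks):
    """Combine the user's question with retrieved context into one prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_augmented_prompt(
    "What is the capital of France?",
    ["France is a country in Europe.", "Paris is the capital of France."],
)
# `prompt` is what gets passed to the LLM, e.g. llm.invoke(prompt)
```

Any of the cloud or local models below can then be handed this prompt unchanged.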

OpenAI Models

from langchain_openai import ChatOpenAI  # the old langchain.chat_models path is deprecated

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
Setting temperature=0 makes the output more deterministic, which is usually what you want for factual Q&A.

Anthropic Claude

from langchain_anthropic import ChatAnthropic  # the old langchain.chat_models path is deprecated

llm = ChatAnthropic(model="claude-3-haiku-20240307")

Local LLMs with Ollama

First install Ollama (ollama.ai) and pull a model:
ollama pull llama3
Then use it from LangChain:
from langchain_community.llms import Ollama  # the old langchain.llms path is deprecated

llm = Ollama(model="llama3")
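Whichever provider you pick, the rest of the RAG pipeline stays the same, because these LangChain objects share the same call interface. A rough sketch of that idea using stand-in classes (the classes below are illustrative fakes, not LangChain's):

```python
class FakeCloudLLM:
    """Stand-in for a cloud model such as ChatOpenAI."""
    def invoke(self, prompt):
        return f"[cloud] answer to: {prompt}"

class FakeLocalLLM:
    """Stand-in for a local model such as Ollama."""
    def invoke(self, prompt):
        return f"[local] answer to: {prompt}"

def answer(llm, prompt):
    # The generation step only relies on .invoke(), so any backend works.
    return llm.invoke(prompt)

print(answer(FakeCloudLLM(), "What is RAG?"))
print(answer(FakeLocalLLM(), "What is RAG?"))
```

Swapping providers is then a one-line change where the `llm` object is constructed.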

Choosing a Model

  • Cost‑sensitive, simple Q&A: GPT‑3.5‑turbo.
  • High accuracy, complex reasoning: GPT‑4 or Claude‑3.
  • Privacy or no API cost: local models (Llama 3, Mistral).
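The trade-offs above can be sketched as a small selection helper; the model names and decision order here are illustrative defaults, not recommendations from any library:

```python
def choose_model(needs_privacy=False, complex_reasoning=False):
    """Pick a model family from the trade-offs above (illustrative)."""
    if needs_privacy:
        return "llama3"        # local via Ollama: private, no API cost
    if complex_reasoning:
        return "gpt-4"         # or a Claude 3 model
    return "gpt-3.5-turbo"     # cheap default for simple Q&A

print(choose_model())                        # gpt-3.5-turbo
print(choose_model(needs_privacy=True))      # llama3
print(choose_model(complex_reasoning=True))  # gpt-4
```

In practice many teams start with the cheap default and only upgrade when answer quality on their own evaluation set demands it.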


Two Minute Drill
  • Cloud LLMs: OpenAI, Anthropic, Cohere.
  • Local LLMs: Ollama, LlamaCPP (privacy, no API cost).
  • Use temperature 0 for factual Q&A.
  • Local models require more RAM/GPU but give full control.
