How LLMs Generate Text

Now that we understand tokenization, embeddings, attention, and transformers, we can see the complete text generation process step by step.

Step‑by‑Step Generation

1. Input prompt: The user types "The capital of France is".
2. Tokenization: The text is converted into token IDs.
3. Embedding + Positional Encoding: Each token ID is mapped to a vector, and position information is added.
4. Transformer decoder: The vectors pass through many layers of attention and feed‑forward networks.
5. Output logits: The model produces a raw score (logit) for every token in the vocabulary.
6. Probability distribution: Softmax converts the logits into probabilities. Temperature, if used, scales the logits before this step.
7. Sampling: The next token (e.g., "Paris") is picked from that distribution.
8. Append and repeat: "Paris" is added to the context, and steps 3‑7 repeat until a stop token is produced or the maximum length is reached.

Input: "The capital of France is"
→ Model predicts: "Paris" (probability 0.95)
New input: "The capital of France is Paris"
→ Model predicts: "." (probability 0.98)
→ Done.
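
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not how a real LLM is implemented: `toy_model`, `VOCAB`, and the `<eos>` stop token are hypothetical stand-ins, and the "model" is a set of hard-coded rules that returns logits based only on the last token. The sketch uses greedy selection to keep the output deterministic.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["The", "capital", "of", "France", "is", "Paris", ".", "<eos>"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def toy_model(token_ids):
    """Stand-in for the transformer: returns one logit per vocabulary entry.

    The rules are hard-coded on the last token, purely for illustration.
    """
    logits = [0.0] * len(VOCAB)
    last = VOCAB[token_ids[-1]]
    if last == "is":
        logits[TOKEN_ID["Paris"]] = 5.0
    elif last == "Paris":
        logits[TOKEN_ID["."]] = 5.0
    else:
        logits[TOKEN_ID["<eos>"]] = 5.0
    return logits


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_tokens, max_new_tokens=10):
    """Predict one token, append it, and repeat until <eos> or the length limit."""
    ids = [TOKEN_ID[t] for t in prompt_tokens]
    for _ in range(max_new_tokens):
        probs = softmax(toy_model(ids))  # the full context is passed in each step
        next_id = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
        if VOCAB[next_id] == "<eos>":
            break
        ids.append(next_id)
    return [VOCAB[i] for i in ids]


print(" ".join(generate(["The", "capital", "of", "France", "is"])))
# prints: The capital of France is Paris .
```

Note that `generate` hands the whole list of token IDs to the model on every iteration, which mirrors the "append and repeat" step.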

Temperature and Sampling Recap

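Temperature = 0 (greedy): always pick the highest‑probability token. Deterministic, but prone to repetitive output.
Temperature between 0 and 1: sharpens the distribution, so likely tokens become even more likely.
Temperature = 1: sample according to the model's original probabilities.
Temperature > 1: flattens the distribution, giving more randomness.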
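
A short sketch can make the effect of temperature concrete. It assumes the common formulation in which the logits are divided by the temperature before softmax; the function name `softmax_with_temperature` and the example logits are illustrative. Temperature 0 cannot be computed this way (it would divide by zero), so it is usually handled as a plain "take the maximum" and is rejected here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy (argmax)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running it shows the probability of the top token shrinking as the temperature rises, while the other tokens gain probability.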

Top‑p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p (e.g., 0.9), then samples the next token from that set only. This excludes the long tail of very low‑probability tokens.
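
Top‑p can be sketched as follows. This is one straightforward way to do it, assuming the probabilities are already computed; the helper names `top_p_filter` and `sample_top_p` and the example distribution are illustrative. The kept probabilities are renormalized so they sum to 1 again before sampling.

```python
import random


def top_p_filter(probs, p):
    """Return {token_index: renormalized probability} for the nucleus."""
    # Visit tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # stop once the kept mass reaches p
            break
    total = sum(probs[i] for i in nucleus)
    return {i: probs[i] / total for i in nucleus}


def sample_top_p(probs, p, rng=random):
    """Sample one token index from the nucleus."""
    nucleus = top_p_filter(probs, p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.04, 0.01]  # made-up distribution over five tokens
print(sorted(top_p_filter(probs, 0.9)))  # indices kept in the nucleus
print(sample_top_p(probs, 0.9))
```

With p = 0.9, the first three tokens (0.5 + 0.3 + 0.15 = 0.95) form the nucleus, and the two least likely tokens can never be sampled.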

Why Multiple Steps?

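The model has no internal state that persists across steps other than the context itself. Conceptually, each step processes the entire sequence again, so every new token is predicted from everything written so far, including the tokens the model just generated. This is computationally expensive, which is why practical implementations usually cache the attention keys and values of earlier tokens instead of recomputing them.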
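
To get a feel for the cost, the sketch below counts how many token positions pass through the model if every step naively re-processes the full context. The function name `positions_processed` is illustrative, and the count ignores any optimizations a real system would apply.

```python
def positions_processed(prompt_len, new_tokens):
    """Total token positions fed through the model when every step
    re-processes the whole context from scratch."""
    # Step 0 sees the prompt, step 1 sees the prompt plus one token, and so on.
    return sum(prompt_len + step for step in range(new_tokens))


# The example above: a 5-token prompt and 2 generated tokens ("Paris", ".").
print(positions_processed(5, 2))  # 5 + 6 = 11

# A longer case: 1000 prompt tokens, 500 generated tokens.
print(positions_processed(1000, 500))
```

The total grows roughly with the square of the output length, which is why long generations are costly when nothing is reused between steps.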

Two Minute Drill

Generation is iterative: predict one token, append, repeat.
Each step uses the full previous context.
Temperature and top‑p control randomness.
The model never sees the future – only past tokens.
