Max Tokens & Stop Sequences
Sometimes a model writes too much, or you want it to stop at a specific point. Two parameters control response length: max tokens and stop sequences.
Max Tokens
Max tokens limits the number of tokens (word pieces) in the generated response. In most APIs the limit counts only the newly generated tokens, not the prompt. Once the limit is reached, the model stops, even if it is mid-sentence.
- Set too low → incomplete sentences.
- Set too high → the model may run longer than needed, which wastes tokens and money.
- Typical values: 50–500 for short answers, 500–2000 for essays.
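The effect of the limit can be shown with a small simulation. This is only a sketch: `generate_with_limit` is a hypothetical helper, whitespace splitting stands in for a real subword tokenizer, and the `"length"` / `"stop"` labels mirror the finish reasons that some APIs report.

```python
def generate_with_limit(full_tokens, max_tokens):
    """Simulate a model that wants to emit `full_tokens` but is capped.

    Returns (tokens_emitted, finish_reason).
    """
    if len(full_tokens) > max_tokens:
        # The cap was hit before the model finished: output is truncated.
        return full_tokens[:max_tokens], "length"
    # The model finished on its own within the budget.
    return list(full_tokens), "stop"


# Assumption: whitespace splitting approximates tokenization here.
answer = "Paris is the capital of France and its largest city.".split()

short, reason_short = generate_with_limit(answer, max_tokens=4)
full, reason_full = generate_with_limit(answer, max_tokens=50)

print(" ".join(short), "|", reason_short)  # truncated mid-sentence
print(" ".join(full), "|", reason_full)    # complete answer
```

With a budget of 4 the answer is cut off after "Paris is the capital", while a budget of 50 lets the full sentence through.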
OpenAI API: max_tokens=150
Cohere: max_tokens=100
Stop Sequences
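Stop sequences tell the model to stop generating as soon as its output contains a specific string. Many APIs, OpenAI's among them, leave the stop string itself out of the returned text. This is especially useful for getting structured outputs.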
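To make the behavior concrete, here is a minimal client-side sketch that mimics what a stop sequence does to the output. `apply_stop_sequences` is a hypothetical helper written for illustration, not part of any API; real services apply the stop during generation, so no tokens are produced after it.

```python
def apply_stop_sequences(text, stop):
    """Cut `text` at the earliest occurrence of any string in `stop`.

    The stop string itself is dropped from the result.
    """
    cut = len(text)
    for s in stop:
        if not s:
            continue  # ignore empty strings, which would match at index 0
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Without a stop sequence the model keeps going and invents a new question.
raw = "Answer: 42\nQuestion: What is 6 x 7?\nAnswer: 42"
print(apply_stop_sequences(raw, ["\n", "###"]))
```

Here the newline stop keeps only the first line, "Answer: 42", and discards the extra question the model started to write.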
- Common stop sequences: ["\n", "User:", "Question:", "###"]
- In few‑shot prompting, you can use "\n\n" to stop at the end of an example.
stop=["###", "\n\n"]
Practical Tips
- Set max tokens to at least the length you expect, plus a buffer.
- Use stop sequences to enforce format (e.g., stop at newline after each bullet point).
- If the model stops mid‑sentence, increase max tokens.
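The first and last tips can be combined into a small retry loop. The sketch below assumes a hypothetical `generate(prompt, max_tokens)` callable that returns `(text, finish_reason)`, with `"length"` meaning the output was cut off by the limit; `fake_generate` is a stand-in for a real API call, and the 25% buffer and doubling strategy are illustrative choices.

```python
def generate_until_complete(generate, prompt, expected_tokens,
                            buffer=0.25, max_retries=2):
    """Start with expected length plus a buffer; double the limit on truncation.

    Returns (text, finish_reason, max_tokens_used_on_last_call).
    """
    max_tokens = int(expected_tokens * (1 + buffer))
    for attempt in range(max_retries + 1):
        text, finish_reason = generate(prompt, max_tokens)
        if finish_reason != "length" or attempt == max_retries:
            return text, finish_reason, max_tokens
        max_tokens *= 2  # output was cut off: retry with a larger budget


def fake_generate(prompt, max_tokens):
    """Stand-in model that 'wants' to emit 30 whitespace-separated tokens."""
    words = ["word"] * 30
    if max_tokens < len(words):
        return " ".join(words[:max_tokens]), "length"
    return " ".join(words), "stop"


text, reason, used = generate_until_complete(
    fake_generate, "Explain X", expected_tokens=16
)
print(reason, used)
```

The first call uses a budget of 20 tokens and is truncated, so the loop retries with 40 and the response completes. Each retry is billed as a fresh request, so a realistic initial estimate is cheaper than relying on retries.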
Two Minute Drill
- Max tokens limits the length of generated output.
- Stop sequences tell the model when to stop early.
- Both help control cost and output format.
- Use stop sequences in multi‑turn conversations to separate turns (e.g., stop at "User:" so the model does not write the user's next message).
