LLM Generation Parameters
When calling an LLM API, you can control generation behavior with parameters like temperature, top‑p, and max tokens. These affect creativity, randomness, and length.
Temperature
Controls randomness in sampling: lower values make output more deterministic (and potentially repetitive); higher values make it more varied and creative, but also more likely to drift or make mistakes.
- 0.0 – 0.2: factual tasks, code generation, translation.
- 0.5 – 0.8: general conversation, creative writing.
- 1.0 – 1.5: brainstorming, poetry (may become incoherent).
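Under the hood, temperature divides the model's logits before the softmax, flattening or sharpening the distribution. A minimal sketch (toy 3-token vocabulary; real models have ~100k tokens, and temperature 0 is conventionally treated as greedy decoding):

```python
import math
import random

def sample_with_temperature(logits, temperature, seed=None):
    """Sample a token index after scaling logits by 1/temperature."""
    if temperature <= 0:
        # Temperature 0: greedy decoding, always pick the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one sample from the resulting distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

At temperature 0.1 the highest-logit token is picked almost every time; at 1.5 the distribution is much flatter.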
Top‑p (Nucleus Sampling)
Instead of considering all tokens, consider the smallest set whose cumulative probability exceeds p. For example, top‑p = 0.9 means only the top 90% probability mass is considered. This filters out very unlikely tokens. Often used together with temperature.
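The filtering step can be sketched in a few lines. This is an illustration of the idea, not any provider's actual implementation; it takes an already-computed probability table and returns the renormalized nucleus:

```python
def nucleus_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p.

    probs: dict mapping token -> probability (assumed to sum to ~1).
    Returns the kept tokens with their probabilities renormalized.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, pr in ranked:
        kept.append((tok, pr))
        cum += pr
        if cum >= p:
            break  # nucleus is complete
    total = sum(pr for _, pr in kept)
    return {tok: pr / total for tok, pr in kept}
```

For example, with probabilities {"the": 0.5, "a": 0.3, "cat": 0.15, "zzz": 0.05} and p = 0.9, the token "zzz" is dropped and the remaining three are renormalized; sampling then proceeds over that smaller set.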
Max Tokens
The maximum number of tokens the model may generate in one response. In most APIs this limits only the output; the model's context window separately caps input plus output combined. Set it to avoid runaway generations and control cost.
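Conceptually, max tokens is just a cap on the generation loop. A toy sketch (next_token_fn stands in for the model; "&lt;eos&gt;" is an assumed end-of-sequence marker):

```python
def generate(next_token_fn, max_tokens):
    """Toy decoding loop: stop at max_tokens or an end-of-sequence token."""
    out = []
    for _ in range(max_tokens):
        tok = next_token_fn(out)  # model proposes the next token
        if tok == "<eos>":
            break  # model finished naturally
        out.append(tok)
    return out
```

A model that never emits the end token gets cut off at exactly max_tokens, which is why truncated responses often end mid-sentence.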
Other Parameters
- Frequency penalty: Penalizes a token in proportion to how many times it has already appeared, reducing verbatim repetition.
- Presence penalty: Applies a flat penalty to any token that has appeared at least once (regardless of count), encouraging the model to move to new topics.
- Stop sequences: List of strings at which generation stops (e.g., ["\n", "User:"]).
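The two penalties above combine into a single logit adjustment. A sketch of how OpenAI-style penalties are applied (per their documented formula: subtract frequency_penalty times the token's count, plus presence_penalty once if the count is nonzero):

```python
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Return logits adjusted for tokens already present in `generated`."""
    counts = Counter(generated)
    adjusted = dict(logits)
    for tok, c in counts.items():
        if tok in adjusted:
            # Frequency term grows with count; presence term is flat.
            adjusted[tok] -= frequency_penalty * c + presence_penalty
    return adjusted
```

Tokens never generated keep their original logits, so the penalties only ever make repetition less likely, not new tokens more likely.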
Example API Call (OpenAI)
import openai  # legacy (pre-1.0) openai SDK interface

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,        # moderate randomness
    max_tokens=150,         # cap on generated tokens
    top_p=0.9,              # nucleus sampling
    frequency_penalty=0.5,  # discourage repetition
)

Two Minute Drill
- Temperature controls randomness (0 = deterministic, 1 = creative).
- Top‑p filters unlikely tokens.
- Max tokens caps response length and cost.
- Frequency/presence penalties reduce repetition.
Need more clarification?
Drop us an email at career@quipoinfotech.com
