LLM Generation Parameters
When calling an LLM API, you can control generation behavior with parameters like temperature, top‑p, and max tokens. These affect creativity, randomness, and length.
Temperature
Controls randomness in sampling: lower values make output more deterministic (and potentially repetitive); higher values make it more varied and creative, but also more likely to drift or make mistakes.
- 0.0 – 0.2: factual tasks, code generation, translation.
- 0.5 – 0.8: general conversation, creative writing.
- 1.0 – 1.5: brainstorming, poetry (may become incoherent).
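Under the hood, temperature divides the model's logits before the softmax, flattening or sharpening the distribution. A minimal sketch (toy 3-token vocabulary; real models have ~100k tokens, and temperature 0 is conventionally treated as greedy decoding):

```python
import math
import random

def sample_with_temperature(logits, temperature, seed=None):
    """Sample a token index after scaling logits by 1/temperature."""
    if temperature <= 0:
        # Temperature 0: greedy decoding, always pick the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one sample from the resulting distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

At temperature 0.1 the highest-logit token is picked almost every time; at 1.5 the distribution is much flatter.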
Top‑p (Nucleus Sampling)
Instead of considering all tokens, consider the smallest set whose cumulative probability exceeds p. For example, top‑p = 0.9 means only the top 90% probability mass is considered. This filters out very unlikely tokens. Often used together with temperature.
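The filtering step can be sketched in a few lines. This is an illustration of the idea, not any provider's actual implementation; it takes an already-computed probability table and returns the renormalized nucleus:

```python
def nucleus_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p.

    probs: dict mapping token -> probability (assumed to sum to ~1).
    Returns the kept tokens with their probabilities renormalized.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, pr in ranked:
        kept.append((tok, pr))
        cum += pr
        if cum >= p:
            break  # nucleus is complete
    total = sum(pr for _, pr in kept)
    return {tok: pr / total for tok, pr in kept}
```

For example, with probabilities {"the": 0.5, "a": 0.3, "cat": 0.15, "zzz": 0.05} and p = 0.9, the token "zzz" is dropped and the remaining three are renormalized; sampling then proceeds over that smaller set.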
Max Tokens
The maximum number of tokens the model may generate in one response. In most APIs this limits only the output; the model's context window separately caps input plus output combined. Set it to avoid runaway generations and control cost.
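Conceptually, max tokens is just a cap on the generation loop. A toy sketch (next_token_fn stands in for the model; "&lt;eos&gt;" is an assumed end-of-sequence marker):

```python
def generate(next_token_fn, max_tokens):
    """Toy decoding loop: stop at max_tokens or an end-of-sequence token."""
    out = []
    for _ in range(max_tokens):
        tok = next_token_fn(out)  # model proposes the next token
        if tok == "<eos>":
            break  # model finished naturally
        out.append(tok)
    return out
```

A model that never emits the end token gets cut off at exactly max_tokens, which is why truncated responses often end mid-sentence.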
Other Parameters
- Frequency penalty: Penalizes a token in proportion to how many times it has already appeared, reducing verbatim repetition.
- Presence penalty: Applies a flat penalty to any token that has appeared at least once (regardless of count), encouraging the model to move to new topics.
- Stop sequences: List of strings at which generation stops (e.g., ["\n", "User:"]).
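The two penalties above combine into a single logit adjustment. A sketch of how OpenAI-style penalties are applied (per their documented formula: subtract frequency_penalty times the token's count, plus presence_penalty once if the count is nonzero):

```python
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Return logits adjusted for tokens already present in `generated`."""
    counts = Counter(generated)
    adjusted = dict(logits)
    for tok, c in counts.items():
        if tok in adjusted:
            # Frequency term grows with count; presence term is flat.
            adjusted[tok] -= frequency_penalty * c + presence_penalty
    return adjusted
```

Tokens never generated keep their original logits, so the penalties only ever make repetition less likely, not new tokens more likely.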
Example API Call (OpenAI)
import openai  # legacy (pre-1.0) openai SDK interface

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,        # moderate randomness
    max_tokens=150,         # cap on generated tokens
    top_p=0.9,              # nucleus sampling
    frequency_penalty=0.5,  # discourage repetition
)

Two Minute Drill
- Temperature controls randomness (0 = deterministic, 1 = creative).
- Top‑p filters unlikely tokens.
- Max tokens caps response length and cost.
- Frequency/presence penalties reduce repetition.
Need more clarification?
Drop us an email at career@quipoinfotech.com
