Fundamentals

Temperature

A parameter that controls the randomness and creativity of AI model outputs, ranging from deterministic (low) to creative (high).

Temperature is a parameter that controls how random or deterministic an AI model's outputs are. Before the model samples its next token, the raw scores (logits) are divided by the temperature: low values sharpen the probability distribution so the most likely token dominates, while high values flatten it so less common alternatives are chosen more often.
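As a minimal sketch (not any particular model's implementation), dividing the logits by the temperature before applying softmax shows how the parameter reshapes the distribution. The logit values here are made up for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to a probability distribution, scaled by temperature."""
    # T < 1 sharpens the distribution (the top token dominates);
    # T > 1 flattens it (less likely tokens gain probability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # hypothetical scores for three tokens
low = softmax_with_temperature(logits, 0.2)    # near-deterministic: top token ~99%
high = softmax_with_temperature(logits, 1.5)   # flatter: probability spreads out
```

At a temperature of 0.2 the first token takes almost all of the probability mass, while at 1.5 the same logits yield a much more even spread.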

Temperature scale:

- 0.0 (deterministic): always picks the most likely token, producing consistent but potentially repetitive output.
- 0.1-0.3 (very focused): slight variation; good for factual Q&A.
- 0.5-0.7 (balanced): a good mix of accuracy and natural variation; a common default.
- 0.8-1.0 (creative): more diverse outputs; good for brainstorming and creative writing.
- Above 1.0 (very creative): highly varied; may sacrifice accuracy for novelty.

When to use different temperatures:

- Factual Q&A: low (0.1-0.3), where accuracy matters most.
- Customer support: low-medium (0.3-0.5), consistent but natural.
- Content creation: medium-high (0.7-0.9), for creative variation.
- Brainstorming: high (0.9-1.0), for diverse ideas.
- Code generation: low (0.1-0.3), where precision matters.

For AI agent builders, temperature is an important configuration choice. Support agents typically use lower temperatures for consistent, reliable answers. Creative assistants use higher temperatures for varied, interesting outputs. Most platforms default to 0.7, which works well for general conversation.

Related parameters include: top_p (nucleus sampling — limits token choices to the most likely subset), top_k (limits to the k most likely tokens), and frequency/presence penalties (discourage repetition). Together with temperature, these parameters give fine-grained control over output style.
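As an illustration (not any specific provider's implementation), top_k and top_p can be sketched as filters applied to the token probability distribution before sampling. The example distribution is made up:

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of top tokens whose
    cumulative probability reaches p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in ranked:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    filtered = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

probs = [0.5, 0.3, 0.15, 0.05]       # hypothetical token probabilities
top2 = top_k_filter(probs, 2)        # only the two most likely tokens remain
nucleus = top_p_filter(probs, 0.9)   # smallest set covering 90% of the mass
```

With top_k = 2, only the first two tokens survive; with top_p = 0.9, the first three are kept, since 0.5 + 0.3 falls short of 0.9 and adding 0.15 crosses it. Temperature then controls how sharply the model samples within whatever set these filters allow.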
