Quiz Verified
Why does temperature scaling preserve top-1 predictions?
Posted Jun 26, 2026
Question: For multiclass logits z and a learned scalar temperature T>0, temperature scaling computes softmax(z/T). Why does this normally preserve the predicted class?
A) Dividing every logit by the same positive scalar preserves their ordering and therefore their argmax
B) Softmax probabilities are invariant to every positive scaling of their logits
C) Temperature scaling changes only the bias term of the final layer
D) The learned temperature is constrained to equal one whenever the classifier is accurate
Correct: A
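Explanation: Dividing every logit by the same positive scalar T does not change which logit is largest, so the top-1 class remains unchanged. However, it rescales the gaps between logits by 1/T, which changes the softmax probabilities and therefore the model's confidence and calibration: T>1 flattens the distribution, and T<1 sharpens it.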
Topic: advanced ML / calibration / temperature scaling
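
Example: The explanation above can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation; the helper names (softmax, temperature_scale, argmax) and the example logits are hypothetical and chosen only for illustration. In practice T would be learned on a held-out set, whereas here a few fixed values are tried by hand.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_scale(logits, T):
    # Temperature scaling as described in the question: softmax(z / T), with T > 0.
    if T <= 0:
        raise ValueError("temperature T must be positive")
    return softmax([z / T for z in logits])

def argmax(values):
    # Index of the largest value (first index on exact ties).
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical logits for a 3-class problem (illustrative values only).
logits = [2.0, 1.0, 0.1]

for T in (0.5, 1.0, 2.0):
    probs = temperature_scale(logits, T)
    # The predicted class stays the same; only the confidence changes.
    print(f"T={T}: predicted class={argmax(probs)}, confidence={max(probs):.3f}")
```

For these logits the predicted class is index 0 at every temperature tried, while the top-1 confidence falls as T grows. That is the behavior option A describes and the reason option B is wrong: the ordering is preserved, but the probabilities are not.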