Quiz Verified

Why does temperature scaling preserve top-1 predictions?

Anonymous

Posted Jun 26, 2026

Question: For multiclass logits z and a learned scalar temperature T > 0, temperature scaling computes softmax(z/T). Why does this normally preserve the predicted class?

A) Dividing every logit by the same positive scalar preserves their ordering and therefore their argmax
B) Softmax probabilities are invariant to every positive scaling of their logits
C) Temperature scaling changes only the bias term of the final layer
D) The learned temperature is constrained to equal one whenever the classifier is accurate

Correct: A

Explanation: Dividing every logit by the same positive scalar does not change which logit is largest, so the top-1 class remains unchanged. However, it does change the gaps between the scaled logits, which changes the softmax probabilities and therefore confidence and calibration.

Topic: advanced ML / calibration / temperature scaling
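The explanation can be checked numerically. The snippet below is a minimal sketch, not a reference implementation: the logit values and the helper names (softmax, temperature_scale, argmax) are made up for illustration, and it uses only the Python standard library.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scale(logits, temperature):
    """Return softmax(z / T) for a positive scalar temperature T."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return softmax([z / temperature for z in logits])


def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Illustrative logits for a 3-class problem (assumed values, not from the quiz).
logits = [2.0, 1.0, 0.1]

for temperature in (0.5, 1.0, 2.0):
    probs = temperature_scale(logits, temperature)
    # The predicted class stays the same; only the confidence moves.
    print(f"T={temperature}: top-1 class={argmax(probs)}, "
          f"confidence={max(probs):.3f}")
```

With these logits the top-1 class is the same at every temperature, while the top probability shrinks as T grows (T > 1 softens the distribution, T < 1 sharpens it). This is also why option B is wrong: the probabilities themselves are not invariant to scaling, only their ordering is.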

