Why does temperature scaling preserve top-1 predictions?

For multiclass logits z and a learned scalar temperature T > 0, temperature scaling computes softmax(z/T). Why does this normally preserve the top-1 predicted class, i.e. argmax softmax(z/T) = argmax z?
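
To make the setup concrete, here is a minimal NumPy sketch of what I mean. The helper names `softmax` and `temperature_scale` and the example logits are made up for illustration and are not taken from any particular library. The sketch only checks empirically that the argmax is unchanged for a few positive values of T.

```python
import numpy as np

def softmax(x):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def temperature_scale(logits, T):
    # Hypothetical helper: divide the logits by a positive scalar T, then apply softmax.
    if T <= 0:
        raise ValueError("T must be positive")
    return softmax(logits / T)

# Illustrative logits for two examples with three classes each (assumed values).
z = np.array([[2.0, 1.0, 0.1],
              [0.5, 2.5, -1.0]])

for T in (0.5, 1.0, 2.0, 10.0):
    probs = temperature_scale(z, T)
    # The predicted class matches the argmax of the raw logits for every T tried.
    assert np.array_equal(np.argmax(probs, axis=-1), np.argmax(z, axis=-1))
```

In this sketch the probabilities get flatter as T grows and sharper as T shrinks, but the ordering of the classes stays the same. I assume the "normally" covers edge cases such as exact ties or floating-point underflow at extreme T, which the example does not exercise.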