Quiz Verified
Why can ECE comparisons be unstable?
Posted Jun 26, 2026
Question: Two researchers evaluate the same classifier and obtain substantially different Expected Calibration Error values. Which explanation is most technically plausible?
A) ECE is a strictly proper scoring rule and therefore changes whenever accuracy is unchanged
B) ECE is uniquely determined by the confusion matrix, so the discrepancy proves one implementation is incorrect
C) ECE is invariant to bin edges but sensitive only to class prevalence
D) ECE depends on estimation choices such as binning, sample size, and whether only maximum confidence or all class probabilities are evaluated
Correct: D
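Explanation: In common implementations, ECE is a finite-sample, binned estimate rather than a uniquely defined population quantity. The number of bins, the binning scheme (equal-width or equal-mass), the confidence definition (top-label maximum probability or every class probability), and the sample size can each change the reported value materially. Binning can also hide miscalibration: a bin's gap is taken after averaging, so overconfident and underconfident predictions inside the same bin can cancel. The other options fail because ECE is not a proper scoring rule, cannot be computed from a confusion matrix (it needs the confidence scores), and does depend on bin edges.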
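A minimal sketch, assuming NumPy and synthetic data, of how these estimation choices move the number for one fixed classifier. Each estimator computes the usual binned gap, the sum over bins b of (|B_b| / n) * |acc(B_b) - conf(B_b)|. The helper names (`top_label_ece`, `classwise_ece`, `_binned_gap`) and the synthetic "overconfident" model are hypothetical illustrations, not any particular library's API or the researchers' setup.

```python
import numpy as np

def _binned_gap(conf, target, n_bins, scheme):
    """Hypothetical helper: weighted mean over bins of |mean(target) - mean(conf)|."""
    n = len(conf)
    if scheme == "width":
        # Equal-width bins on [0, 1]; digitizing against the interior edges gives ids 0..n_bins-1.
        interior = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
        ids = np.digitize(conf, interior, right=True)
    elif scheme == "mass":
        # Equal-mass bins: sort by confidence and cut into near-equal chunks.
        ids = np.empty(n, dtype=int)
        for b, chunk in enumerate(np.array_split(np.argsort(conf, kind="stable"), n_bins)):
            ids[chunk] = b
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    ece = 0.0
    for b in range(n_bins):
        mask = ids == b
        if mask.any():
            # mask.mean() is the fraction of samples falling in bin b.
            ece += mask.mean() * abs(target[mask].mean() - conf[mask].mean())
    return float(ece)

def top_label_ece(probs, labels, n_bins=15, scheme="width"):
    """ECE on the maximum-confidence prediction only."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    return _binned_gap(conf, correct, n_bins, scheme)

def classwise_ece(probs, labels, n_bins=15, scheme="width"):
    """Mean over classes of the binned gap between p_k and the indicator 1[y == k]."""
    return float(np.mean([
        _binned_gap(probs[:, c], (labels == c).astype(float), n_bins, scheme)
        for c in range(probs.shape[1])
    ]))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic, overconfident 3-class model (an assumption for illustration):
# labels are drawn from a softer (temperature 2) version of the reported probabilities.
rng = np.random.default_rng(0)
n, k = 2000, 3
logits = 2.0 * rng.normal(size=(n, k))
probs = softmax(logits)
true_p = softmax(logits / 2.0)
u = rng.random(n)[:, None]
labels = np.minimum((u > np.cumsum(true_p, axis=1)).sum(axis=1), k - 1)

# Same classifier, same labels; only the estimation choices vary.
results = {}
for m in (200, n):  # sample size: first 200 rows vs. all rows
    for scheme in ("width", "mass"):
        for n_bins in (5, 15, 50):
            results[(m, scheme, n_bins, "top")] = top_label_ece(probs[:m], labels[:m], n_bins, scheme)
            results[(m, scheme, n_bins, "classwise")] = classwise_ece(probs[:m], labels[:m], n_bins, scheme)

for key, value in results.items():
    print(key, round(value, 4))
```

On data like this the printed values typically differ from row to row even though nothing about the classifier changed, which is the kind of discrepancy the question describes. Reporting the bin count, binning scheme, confidence definition, and sample size alongside any ECE value makes comparisons easier to interpret.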
Topic: advanced ML / calibration metrics / ECE