Why can ECE comparisons be unstable?
Two researchers evaluate the same classifier and obtain substantially different Expected Calibration Error values. Which explanation is most technically plausible?
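One candidate explanation can be checked directly: the usual binned ECE estimator depends on how predictions are binned. The sketch below is a minimal illustration, not any particular library's implementation. The function name `expected_calibration_error`, the equal-width binning, and the synthetic data are all assumptions made for the example. It scores one fixed set of predictions with different bin counts.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE with equal-width bins on [0, 1] (hypothetical helper)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; bins are right-closed, so 0.0 falls in the first bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between empirical accuracy and mean confidence in this bin,
        # weighted by the fraction of samples that fall in the bin.
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap
    return ece


# Synthetic, perfectly calibrated predictions (assumed data, fixed seed):
# each prediction is correct with probability equal to its confidence.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=2000)
hits = rng.random(2000) < conf

# Same model, same predictions; only the evaluator's bin count changes.
results = {k: expected_calibration_error(conf, hits, n_bins=k) for k in (5, 15, 50)}
for k, value in results.items():
    print(f"n_bins={k:>2}  ECE={value:.4f}")
```

Because the estimate moves with the bin count alone, two ECE values are comparable only when the number of bins, the binning scheme (equal-width or equal-mass), and the evaluation set size are reported alongside them.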