How can label smoothing affect knowledge distillation?
A teacher network is trained using substantial label smoothing and is later used for knowledge distillation. Which effect is most consistent with published findings?
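The two ingredients in this question can be sketched in a few lines. This is a minimal illustration that assumes a single example with K classes stored as plain Python lists, and all function names are hypothetical. Label smoothing here mixes the one-hot target with a uniform distribution using a weight epsilon. The distillation loss follows one common formulation, the temperature-softened KL divergence scaled by T² from Hinton et al. (2015). The sketch shows only the mechanics and does not by itself settle which effect the published findings report.

```python
import math
from typing import List


def smooth_labels(label: int, num_classes: int, epsilon: float) -> List[float]:
    """Label smoothing: mix a one-hot target with the uniform distribution.

    The true class gets 1 - epsilon + epsilon / K; every other class gets epsilon / K.
    """
    off = epsilon / num_classes
    return [1.0 - epsilon + off if k == label else off for k in range(num_classes)]


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, stabilised by subtracting the max logit."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: List[float], probs: List[float]) -> float:
    """Cross-entropy H(target, probs) for one example (used to train the teacher)."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float) -> float:
    """Soft-target loss: KL(teacher_T || student_T), scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


# Hypothetical usage: teacher trained on smoothed targets, then distilled into a student.
teacher_logits = [2.0, 0.5, 0.1, -1.0]
teacher_target = smooth_labels(0, 4, epsilon=0.1)
teacher_loss = cross_entropy(teacher_target, softmax(teacher_logits))
student_loss = distillation_loss([1.0, 0.2, 0.0, -0.5], teacher_logits, temperature=4.0)
```

Raising epsilon pulls the teacher's training targets toward uniform, while raising the temperature softens both the teacher and student distributions before they are compared.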