How can label smoothing affect knowledge distillation?

A teacher network is trained with substantial label smoothing and is later used to distill a student network. Which effect is most consistent with published findings?
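
For reference, the two ingredients named in the question can be sketched as follows. This is a minimal illustration under assumptions, not any particular paper's setup: the function names (`smooth_labels`, `distillation_loss`), the smoothing weight `alpha`, and the `temperature` parameter are all chosen here for illustration, and the loss shown is the common soft-target cross-entropy form without any additional scaling or hard-label term.

```python
import math

def smooth_labels(true_class, num_classes, alpha):
    """Label-smoothed target: mix the one-hot label with a uniform distribution.

    The true class gets (1 - alpha) + alpha / K; every other class gets alpha / K.
    """
    uniform = alpha / num_classes
    return [(1.0 - alpha) + uniform if k == true_class else uniform
            for k in range(num_classes)]

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]

def softmax(logits, temperature=1.0):
    """Softmax probabilities at the given temperature."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student against the teacher's softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return -sum(t * ls for t, ls in zip(p_teacher, log_p_student))

# Hypothetical example values, chosen only to exercise the functions.
targets = smooth_labels(true_class=0, num_classes=4, alpha=0.1)
loss = distillation_loss([2.0, 0.5, 0.1, -1.0], [3.0, 1.0, 0.2, -0.5], temperature=2.0)
```

In this sketch the teacher would be trained against `targets` instead of one-hot labels, and the student would then be trained to minimize `distillation_loss` against the teacher's logits. The question concerns how the first choice changes the information the student receives through the second.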