Quiz Verified

How does AdamW differ from Adam with L2 regularization?

Anonymous

Posted Jun 23, 2026

Question: What is the defining distinction between AdamW and adding an L2 penalty to the objective optimized by Adam?

A) AdamW applies L2 regularization only to parameters whose second-moment estimate exceeds a threshold
B) AdamW incorporates the regularization gradient into Adam's adaptive moment normalization
C) AdamW removes the first-moment estimate when applying parameter shrinkage
D) AdamW applies weight decay separately from the adaptive loss-gradient update

Correct: D

Explanation: In AdamW, parameter shrinkage is decoupled from the gradient of the training objective. With an L2 penalty inside the loss, the regularization gradient passes through Adam's coordinate-wise adaptive scaling, so it is generally not equivalent to ordinary weight decay.

Topic: advanced ML / optimization / AdamW
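The distinction in the explanation can be made concrete with a small toy run. The following is a minimal sketch in plain Python, not any library's implementation: the helper names `adam_step` and `run`, the alternating-sign gradients, and all hyperparameter values are assumptions chosen for illustration. Two parameters start at the same value but see loss gradients of very different magnitude, so any difference in how much they shrink comes from how the regularization is applied.

```python
import math


def adam_step(theta, grads, m, v, t, lr, l2=0.0, decoupled_wd=0.0,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step on plain Python lists; updates theta, m, v in place."""
    for i in range(len(theta)):
        # L2 penalty inside the loss: its gradient l2 * theta joins the loss
        # gradient and therefore feeds both moment estimates.
        g = grads[i] + l2 * theta[i]
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v[i] / (1 - beta2 ** t)   # bias-corrected second moment
        # Decoupled weight decay: shrink the parameter directly, bypassing
        # the division by sqrt(v_hat).
        theta[i] -= lr * decoupled_wd * theta[i]
        # Adaptive update from the (possibly L2-augmented) gradient.
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def run(l2=0.0, decoupled_wd=0.0, steps=200, lr=1e-2):
    """Toy setup (hypothetical): two equal weights, loss gradients that
    alternate in sign with amplitudes 10.0 and 0.1."""
    theta = [1.0, 1.0]
    m, v = [0.0, 0.0], [0.0, 0.0]
    amplitude = [10.0, 0.1]
    for t in range(1, steps + 1):
        sign = 1.0 if t % 2 else -1.0
        grads = [sign * a for a in amplitude]
        adam_step(theta, grads, m, v, t, lr, l2=l2, decoupled_wd=decoupled_wd)
    return theta


with_l2 = run(l2=0.1)            # Adam with the penalty inside the loss
with_wd = run(decoupled_wd=0.1)  # AdamW-style decoupled decay
print("Adam + L2 penalty:", with_l2)
print("AdamW-style decay:", with_wd)
```

In this toy run the decoupled variant shrinks both weights by essentially the same amount, because the decay term never passes through the division by the second-moment estimate. With the penalty inside the loss, the weight with the small-magnitude gradient shrinks far more than the one with the large-magnitude gradient, which is the coordinate-wise scaling effect the explanation refers to.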
