Quiz Verified
Which feature-selection workflow causes leakage?
Posted Jun 24, 2026
Question: A dataset contains 50,000 features and 500 observations. Which evaluation procedure creates the most direct selection leakage?
A) Performing feature selection independently inside every training fold of nested cross-validation
B) Tuning the number of selected features in the inner loop and evaluating in untouched outer folds
C) Fitting the final selector and model on the complete training dataset after evaluation is finished
D) Selecting the 100 features most associated with the target using all observations before cross-validation
Correct: D
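Explanation: Option D uses the labels of every observation, including those that later serve as validation folds, to decide which features are retained. Even though the predictive model itself is refit inside each fold, the feature space has already been chosen using information from the held-out observations, so the cross-validated score is optimistically biased. With 50,000 features and only 500 observations, many features will look associated with the target by chance alone, which makes that bias large. Options A and B confine selection to training data, and option C fits the final selector and model only after evaluation is finished, so it cannot affect the reported estimate.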
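The effect described in the explanation can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the quiz author's procedure: the data are pure noise with random labels (so any honest estimate should sit near chance), the sizes are scaled down to 50 observations and 5,000 features so it runs quickly, the selector ranks features by the absolute difference in class means, and the model is a simple nearest-centroid classifier. The function names `top_k_features`, `centroid_predict`, and `cv_accuracy` are hypothetical helpers written for this example.

```python
import random


def top_k_features(X, y, rows, k):
    """Rank features by |mean(class 1) - mean(class 0)| using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]

    def gap(j):
        mean_pos = sum(X[i][j] for i in pos) / len(pos)
        mean_neg = sum(X[i][j] for i in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def centroid_predict(X, y, train, test, feats):
    """Nearest-centroid classifier fit on `train`, restricted to `feats`."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroids[label] = [
            sum(X[i][j] for i in members) / len(members) for j in feats
        ]
    preds = []
    for i in test:
        dist = {
            label: sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroids[label]))
            for label in (0, 1)
        }
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds


def cv_accuracy(X, y, folds, k, leaky):
    """K-fold accuracy. leaky=True selects features once on ALL rows (option D);
    leaky=False repeats the selection inside each training fold (option A)."""
    all_rows = [i for fold in folds for i in fold]
    global_feats = top_k_features(X, y, all_rows, k) if leaky else None
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in all_rows if i not in held_out]
        feats = global_feats if leaky else top_k_features(X, y, train, k)
        preds = centroid_predict(X, y, train, fold, feats)
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(all_rows)


# Assumed toy setup: pure-noise features and balanced random labels,
# so there is no real signal for any model to find.
rng = random.Random(0)
n_obs, n_feat, k, n_folds = 50, 5000, 20, 5
X = [[rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_obs)]
y = [i % 2 for i in range(n_obs)]
rng.shuffle(y)

idx = list(range(n_obs))
rng.shuffle(idx)
folds = [idx[f::n_folds] for f in range(n_folds)]

leaky_acc = cv_accuracy(X, y, folds, k, leaky=True)
honest_acc = cv_accuracy(X, y, folds, k, leaky=False)
print(f"selection before CV (leaky): {leaky_acc:.2f}")
print(f"selection inside each fold:  {honest_acc:.2f}")
```

Because the labels are unrelated to the features, the in-fold estimate should hover around chance, while the estimate with selection done up front is expected to come out clearly higher even though nothing real has been learned. The gap between the two numbers is the selection leakage.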
Topic: advanced ML / data leakage / feature selection