Return by Death
In the series Re:Zero - Starting Life in Another World, the main character Natsuki Subaru dies countless times, utilizing his "Return by Death" ability to return to the past. With each iteration, he uses knowledge of future events to adjust his actions and achieve an optimal outcome.
Similarly, gradient descent is an iterative optimization process. Like Subaru's repeated attempts, the algorithm starts with poor initial parameters and "dies" (produces high error). Through repeated iterations, it uses knowledge of past errors (gradients) to adjust its parameters, converging toward an optimal prediction model.
Implement PolynomialRegression, a polynomial regression model of degree d trained via gradient descent that iteratively learns from its errors to achieve optimal predictions.

Constructor
PolynomialRegression(degree, learning_rate, max_iterations, tolerance)
- degree is the polynomial degree d. The model fits a polynomial of the form ŷ = θ₀ + θ₁·x' + θ₂·(x')^2 + ... + θ_d·(x')^d, where x' is the normalized input.
- learning_rate is the step size used during gradient descent parameter updates.
- max_iterations is the maximum number of gradient descent steps to perform.
- tolerance is the early stopping threshold. If the change in loss between two consecutive iterations is less than tolerance, training stops.
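One possible constructor shape, as a Python sketch (attribute names such as theta, mu, sigma, and _losses are illustrative choices, not mandated by the problem):

```python
class PolynomialRegression:
    def __init__(self, degree, learning_rate, max_iterations, tolerance):
        self.degree = degree                  # polynomial degree d
        self.learning_rate = learning_rate    # gradient descent step size
        self.max_iterations = max_iterations  # cap on gradient descent steps
        self.tolerance = tolerance            # early-stopping threshold on loss change
        self.theta = [0.0] * (degree + 1)     # parameters θ₀..θ_d, initialized to 0
        self.mu = 0.0                         # training mean (set in fit)
        self.sigma = 1.0                      # training standard deviation (set in fit)
        self._losses = []                     # MSE per iteration, for loss_history()
```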
Methods
fit(X, y)
- X is a container of training input values (scalars). X[i] is the input value for sample i.
- y is a container of target values (continuous). y[i] is the target for sample i.
- Before training, normalize the input feature x to have mean 0 and standard deviation 1 across the training set. Store the mean and standard deviation so that future predictions can be transformed into the same space.
- Construct polynomial features internally: for each training sample with normalized input x', generate features [1, x', (x')^2, ..., (x')^d], as sketched below.
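A sketch of the fit-time preprocessing under these rules (pure Python; the _gradient_descent helper is my own name and is sketched under Training Rules below):

```python
def fit(self, X, y):
    X = [float(v) for v in X]
    n = len(X)
    # Normalize inputs to mean 0, std 1; keep the statistics for predict.
    self.mu = sum(X) / n
    self.sigma = (sum((v - self.mu) ** 2 for v in X) / n) ** 0.5
    if self.sigma == 0:  # all inputs identical
        Xn = [0.0] * n
    else:
        Xn = [(v - self.mu) / self.sigma for v in X]
    # Polynomial features [1, x', (x')^2, ..., (x')^d] for each sample.
    Phi = [[xn ** j for j in range(self.degree + 1)] for xn in Xn]
    self._gradient_descent(Phi, list(y))
```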
predict(x)
- x is a scalar input value representing one sample.
- Normalize x using the mean and standard deviation computed during fit.
- Construct polynomial features [1, x', (x')^2, ..., (x')^d], where x' is the normalized input.
- Return the predicted value ŷ = θ₀ + θ₁·x' + θ₂·(x')^2 + ... + θ_d·(x')^d.
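A matching predict sketch:

```python
def predict(self, x):
    # Transform the query into the same normalized space as training.
    xn = 0.0 if self.sigma == 0 else (x - self.mu) / self.sigma
    # Dot product of θ with [1, x', (x')^2, ..., (x')^d].
    return sum(t * xn ** j for j, t in enumerate(self.theta))
```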
loss_history()
- Returns the list of MSE loss values, one per iteration performed during training (including the initial loss before any update).
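With the losses recorded during training, the accessor reduces to (assuming the _losses attribute from the constructor sketch):

```python
def loss_history(self):
    return self._losses  # initial loss first, then one entry per update
```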
Training Rules
The model learns a parameter vector θ = [θ₀, θ₁, θ₂, ..., θ_d], where d+1 is the total number of parameters.
Initialize all parameters to 0.
At each iteration:
- Compute predictions for all training samples using the current parameters.
- Calculate the MSE loss.
- Compute gradients for all parameters with respect to the MSE loss.
- Update all parameters simultaneously using gradient descent.
Do not update parameters one at a time — all updates in a single iteration use the gradients computed from the same set of predictions.
Training stops when either max_iterations is reached or the absolute change in loss between two consecutive iterations is less than tolerance.
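A sketch of this loop (the _gradient_descent helper called from the fit sketch above; the 2/N gradient factor follows from the MSE derivative given in the next section):

```python
def _gradient_descent(self, Phi, y):
    n = len(y)
    preds = [sum(t * f for t, f in zip(self.theta, row)) for row in Phi]
    self._losses = [sum((p - yi) ** 2 for p, yi in zip(preds, y)) / n]  # initial loss
    for _ in range(self.max_iterations):
        # Gradients for every θⱼ from the SAME set of predictions.
        errors = [p - yi for p, yi in zip(preds, y)]
        grads = [2 / n * sum(e * row[j] for e, row in zip(errors, Phi))
                 for j in range(self.degree + 1)]
        # Simultaneous update of all parameters.
        self.theta = [t - self.learning_rate * g for t, g in zip(self.theta, grads)]
        preds = [sum(t * f for t, f in zip(self.theta, row)) for row in Phi]
        self._losses.append(sum((p - yi) ** 2 for p, yi in zip(preds, y)) / n)
        # Early stop on the absolute change between consecutive losses.
        if abs(self._losses[-2] - self._losses[-1]) < self.tolerance:
            break
```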
Gradient Descent Updates
For each parameter θⱼ where j ∈ {0, 1, 2, ..., d}, the gradient is:

∂L/∂θⱼ = (2/N) · Σᵢ (ŷᵢ − yᵢ) · (x'ᵢ)ʲ

where x'ᵢ is the normalized input for sample i, ŷᵢ is the prediction for sample i, and (x'ᵢ)⁰ = 1.

The update rule is:

θⱼ ← θⱼ − learning_rate · ∂L/∂θⱼ
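The same step in vectorized form, as a NumPy sketch (assuming Phi is the N×(d+1) feature matrix and theta, y are NumPy arrays):

```python
import numpy as np

def gd_step(theta, Phi, y, learning_rate):
    """One simultaneous gradient descent update for all parameters."""
    errors = Phi @ theta - y                    # ŷᵢ − yᵢ for every sample
    grads = (2.0 / len(y)) * (Phi.T @ errors)   # ∂L/∂θⱼ for j = 0..d
    return theta - learning_rate * grads
```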
Loss Function
For predictions ŷᵢ and targets yᵢ over N samples:

MSE = (1/N) · Σᵢ (ŷᵢ − yᵢ)²
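For instance, with predictions ŷ = [1.0, 3.0] and targets y = [0.0, 1.0]:

MSE = ((1 − 0)² + (3 − 1)²) / 2 = (1 + 4) / 2 = 2.5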
Input Normalization
Compute across the training set:

μ = (1/N) · Σᵢ xᵢ
σ = √( (1/N) · Σᵢ (xᵢ − μ)² )

Replace each input value with:

x' = (x − μ) / σ

If σ = 0 (i.e., all inputs are identical), set all normalized values to 0.
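A quick worked example: for inputs [2, 4, 6], μ = 4 and σ = √(((−2)² + 0² + 2²)/3) = √(8/3) ≈ 1.633, so the normalized values are approximately [−1.225, 0, 1.225].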
Notes
- You may not use any polynomial regression libraries or utilities. Compute all gradients and updates manually.
- You may not use the closed-form (normal equation) solution.
- Polynomial features can grow rapidly in magnitude. Normalization is critical to prevent numerical instability.
- Answers within ±0.01 of the expected value will be marked as correct.
Input Format
The first line contains 5 values, in order:
- N: the number of training samples
- d: the polynomial degree
- learning_rate: the gradient descent step size
- max_iterations: the maximum number of iterations
- tolerance: the early stopping threshold
The next N lines each contain two floating-point numbers x y, representing one training sample.
The next line contains a single integer Q, the number of prediction queries.
The final Q lines each contain a single floating-point number, the input value for one prediction query.
Output Format
Output Q floating-point numbers, one per line, each rounded to 4 decimal places. These are the predicted target values for each query sample.
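A minimal I/O driver tying the pieces together (a sketch assuming the PolynomialRegression class above; reads stdin, prints one prediction per line):

```python
import sys

def main():
    tokens = iter(sys.stdin.read().split())
    n = int(next(tokens))
    d = int(next(tokens))
    lr = float(next(tokens))
    max_iters = int(next(tokens))
    tol = float(next(tokens))
    X, y = [], []
    for _ in range(n):
        X.append(float(next(tokens)))
        y.append(float(next(tokens)))
    model = PolynomialRegression(d, lr, max_iters, tol)
    model.fit(X, y)
    q = int(next(tokens))
    for _ in range(q):
        print(f"{model.predict(float(next(tokens))):.4f}")

if __name__ == "__main__":
    main()
```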
Examples
Example 1

Input:
5 2 0.01 10000 1e-6
1.0 2.5
2.0 7.0
3.0 14.5
4.0 25.0
5.0 38.5
3
1.5
3.5
6.0

Output:
4.2500
19.2500
59.5000

Example 2

Input:
5 1 0.01 10000 1e-6
1.0 3.0
2.0 5.0
3.0 7.0
4.0 9.0
5.0 11.0
3
1.5
3.5
6.0

Output:
3.9974
7.9949
12.9917