Implement LinearRegression, a multiple linear regression model trained via gradient descent.

Constructor
LinearRegression(learning_rate, max_iterations, tolerance)
- `learning_rate` is the step size used during gradient descent weight updates.
- `max_iterations` is the maximum number of gradient descent steps to perform.
- `tolerance` is the early stopping threshold. If the change in loss between two consecutive iterations is less than `tolerance`, training stops.
Methods
fit(X, y)
- `X` is a 2-dimensional container of training samples. `X[i]` is the i-th training sample; `X[i][j]` is the value of feature j for sample i.
- `y` is a container of target values (continuous, not binary). `y[i]` is the target for sample i.
- Before training, normalize each feature to have mean 0 and standard deviation 1 across the training set. Store the mean and standard deviation per feature so that future predictions can be transformed into the same space.
predict(x)
- `x` is a feature vector representing one sample; `x[j]` corresponds to feature j.
- Normalize `x` using the means and standard deviations computed during `fit`, then return the predicted value.
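A minimal sketch of the prediction step. The function signature is illustrative: it assumes the trained weights `w`, bias `b`, and the per-feature `means` and `stds` stored during `fit` are passed in explicitly.

```python
def predict(x, w, b, means, stds):
    # Normalize the query sample with the statistics stored during fit;
    # a constant feature (std of 0) is mapped to 0, matching training.
    xn = [(x[j] - means[j]) / stds[j] if stds[j] != 0 else 0.0
          for j in range(len(x))]
    # Linear model: dot product of weights and normalized features, plus bias.
    return sum(wj * xj for wj, xj in zip(w, xn)) + b
```

Reusing the training-time statistics (rather than recomputing them from the query) is what keeps prediction in the same feature space the weights were learned in.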
loss_history()
- Returns the list of MSE loss values, one per iteration performed during training (including the initial loss before any update).
Training Rules
The model learns a weight vector w and a bias term b such that the prediction for sample x is:

    ŷ = w · x + b = Σⱼ wⱼ · xⱼ + b
Initialize all weights and the bias to 0.
At each iteration, compute the MSE loss, then update every weight and the bias simultaneously using the standard gradient descent update rules for linear regression. Do not update weights one at a time — all updates in a single iteration use the gradients computed from the same set of predictions.
Training stops when either max_iterations is reached or the absolute change in loss between two consecutive iterations is less than tolerance.
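One way the training loop can look, as a sketch rather than a required structure. It assumes `X` is already normalized and uses the MSE gradients ∂L/∂wⱼ = (2/N) Σᵢ (ŷᵢ − yᵢ) X[i][j] with the conventional factor of 2 (some formulations fold it into the learning rate).

```python
def fit_loop(X, y, learning_rate, max_iterations, tolerance):
    n, d = len(X), len(X[0])
    w = [0.0] * d          # weights initialized to 0
    b = 0.0                # bias initialized to 0
    losses = []
    for _ in range(max_iterations):
        # Predictions for every sample with the current parameters.
        preds = [sum(w[j] * X[i][j] for j in range(d)) + b for i in range(n)]
        losses.append(sum((preds[i] - y[i]) ** 2 for i in range(n)) / n)
        # Early stopping on the absolute change in loss.
        if len(losses) > 1 and abs(losses[-1] - losses[-2]) < tolerance:
            break
        # MSE gradients, all computed from the same set of predictions.
        grad_w = [2.0 / n * sum((preds[i] - y[i]) * X[i][j] for i in range(n))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(preds[i] - y[i] for i in range(n))
        # Simultaneous update of every weight and the bias.
        w = [w[j] - learning_rate * grad_w[j] for j in range(d)]
        b -= learning_rate * grad_b
    return w, b, losses
```

Note that the loss is appended before the parameter update, so the first entry in `losses` is the initial loss with all-zero parameters, as `loss_history` requires.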
Loss Function
For predictions ŷᵢ and targets yᵢ over N samples:

    MSE = (1/N) · Σᵢ (ŷᵢ − yᵢ)²
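As code, assuming predictions and targets are plain Python sequences of equal length:

```python
def mse(preds, targets):
    # Mean squared error: the average squared difference between
    # predictions and targets.
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
```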
Feature Normalization
For each feature j, compute across the training set of N samples:

    μⱼ = (1/N) · Σᵢ X[i][j]
    σⱼ = sqrt( (1/N) · Σᵢ (X[i][j] − μⱼ)² )

Replace each feature value with:

    X'[i][j] = (X[i][j] − μⱼ) / σⱼ

If σⱼ = 0 for some feature j (i.e., the feature is constant across all samples), set all normalized values for that feature to 0.
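A sketch of the normalization pass. It assumes the population standard deviation (dividing by N, matching the 1/N in the loss); the helper name and return shape are illustrative.

```python
def normalize_fit(X):
    # Compute per-feature mean and population standard deviation,
    # then return a normalized copy plus the stored statistics.
    n, d = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(d)]
    stds = [(sum((X[i][j] - means[j]) ** 2 for i in range(n)) / n) ** 0.5
            for j in range(d)]
    # Constant features (std of 0) are mapped to 0 instead of dividing by 0.
    Xn = [[(X[i][j] - means[j]) / stds[j] if stds[j] != 0 else 0.0
           for j in range(d)] for i in range(n)]
    return Xn, means, stds
```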
Notes
- You may not use any linear algebra libraries or regression utilities. Compute all gradients and updates manually.
- You may not use the closed-form (normal equation) solution.
- Answers within ±0.01 of the expected value will be marked as correct.
Input Format
The first line contains 5 values, in order:
- `N`: the number of training samples
- `D`: the number of features per sample
- `learning_rate`
- `max_iterations`
- `tolerance`
The next N lines each contain D floating-point numbers, representing the training matrix X.
The next line contains N floating-point numbers, representing the target vector y.
The next line contains a single integer Q, the number of prediction queries.
The final Q lines each contain D floating-point numbers, representing the feature vectors to predict on.
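A parsing sketch for this format, treating the whole input as one whitespace-separated token stream (the function name is illustrative):

```python
def parse_input(text):
    # Consume tokens in the order the input format specifies.
    it = iter(text.split())
    n, d = int(next(it)), int(next(it))
    learning_rate = float(next(it))
    max_iterations = int(next(it))
    tolerance = float(next(it))
    X = [[float(next(it)) for _ in range(d)] for _ in range(n)]
    y = [float(next(it)) for _ in range(n)]
    q = int(next(it))
    queries = [[float(next(it)) for _ in range(d)] for _ in range(q)]
    return n, d, learning_rate, max_iterations, tolerance, X, y, q, queries
```

Because the stream is consumed token by token, this works whether the numbers are separated by spaces or newlines.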
Output Format
Output Q floating-point numbers, one per line, each rounded to 4 decimal places. These are the predicted target values for each query sample.
Sample Input
5 1 0.1 10000 0.000001
1.0
2.0
3.0
4.0
5.0
5.0 7.0 9.0 11.0 13.0
3
1.5
6.0
0.0
Sample Output
5.9992
14.9980
2.9996