Implement LinearRegression, a multiple linear regression model trained via gradient descent.

Constructor
LinearRegression(learning_rate, max_iterations, tolerance)
- `learning_rate` is the step size used during gradient descent weight updates.
- `max_iterations` is the maximum number of gradient descent steps to perform.
- `tolerance` is the early stopping threshold. If the change in loss between two consecutive iterations is less than `tolerance`, training stops.
Methods
fit(X, y)
- `X` is a 2-dimensional container of training samples. `X[i]` is the i-th training sample; `X[i][j]` is the value of feature j for sample i.
- `y` is a container of target values (continuous, not binary). `y[i]` is the target for sample i.
- Before training, normalize each feature to have mean 0 and standard deviation 1 across the training set. Store the mean and standard deviation per feature so that future predictions can be transformed into the same space.
predict(x)
- `x` is a feature vector representing one sample; `x[j]` corresponds to feature j.
- Normalize `x` using the means and standard deviations computed during `fit`, then return the predicted value.
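A minimal sketch of the prediction step. The function signature is illustrative: it assumes the trained weights `w`, bias `b`, and the per-feature `means` and `stds` stored during `fit` are passed in explicitly.

```python
def predict(x, w, b, means, stds):
    # Normalize the query sample with the statistics stored during fit;
    # a constant feature (std of 0) is mapped to 0, matching training.
    xn = [(x[j] - means[j]) / stds[j] if stds[j] != 0 else 0.0
          for j in range(len(x))]
    # Linear model: dot product of weights and normalized features, plus bias.
    return sum(wj * xj for wj, xj in zip(w, xn)) + b
```

Reusing the training-time statistics (rather than recomputing them from the query) is what keeps prediction in the same feature space the weights were learned in.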
loss_history()
- Returns the list of MSE loss values, one per iteration performed during training (including the initial loss before any update).
Training Rules
The model learns a weight vector w and a bias term b such that the prediction for sample x is:

    ŷ = w · x + b = Σⱼ wⱼ · xⱼ + b
Initialize all weights and the bias to 0.
At each iteration, compute the MSE loss, then update every weight and the bias simultaneously using the standard gradient descent update rules for linear regression. Do not update weights one at a time — all updates in a single iteration use the gradients computed from the same set of predictions.
Training stops when either max_iterations is reached or the absolute change in loss between two consecutive iterations is less than tolerance.
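One way the training loop can look, as a sketch rather than a required structure. It assumes `X` is already normalized and uses the MSE gradients ∂L/∂wⱼ = (2/N) Σᵢ (ŷᵢ − yᵢ) X[i][j] with the conventional factor of 2 (some formulations fold it into the learning rate).

```python
def fit_loop(X, y, learning_rate, max_iterations, tolerance):
    n, d = len(X), len(X[0])
    w = [0.0] * d          # weights initialized to 0
    b = 0.0                # bias initialized to 0
    losses = []
    for _ in range(max_iterations):
        # Predictions for every sample with the current parameters.
        preds = [sum(w[j] * X[i][j] for j in range(d)) + b for i in range(n)]
        losses.append(sum((preds[i] - y[i]) ** 2 for i in range(n)) / n)
        # Early stopping on the absolute change in loss.
        if len(losses) > 1 and abs(losses[-1] - losses[-2]) < tolerance:
            break
        # MSE gradients, all computed from the same set of predictions.
        grad_w = [2.0 / n * sum((preds[i] - y[i]) * X[i][j] for i in range(n))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(preds[i] - y[i] for i in range(n))
        # Simultaneous update of every weight and the bias.
        w = [w[j] - learning_rate * grad_w[j] for j in range(d)]
        b -= learning_rate * grad_b
    return w, b, losses
```

Note that the loss is appended before the parameter update, so the first entry in `losses` is the initial loss with all-zero parameters, as `loss_history` requires.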
Loss Function
For predictions ŷᵢ and targets yᵢ over N samples:

    MSE = (1/N) · Σᵢ (ŷᵢ − yᵢ)²
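As code, assuming predictions and targets are plain Python sequences of equal length:

```python
def mse(preds, targets):
    # Mean squared error: the average squared difference between
    # predictions and targets.
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
```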
Feature Normalization
For each feature j, compute across the training set of N samples:

    μⱼ = (1/N) · Σᵢ X[i][j]
    σⱼ = sqrt( (1/N) · Σᵢ (X[i][j] − μⱼ)² )

Replace each feature value with:

    X'[i][j] = (X[i][j] − μⱼ) / σⱼ

If σⱼ = 0 for some feature j (i.e., the feature is constant across all samples), set all normalized values for that feature to 0.
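A sketch of the normalization pass. It assumes the population standard deviation (dividing by N, matching the 1/N in the loss); the helper name and return shape are illustrative.

```python
def normalize_fit(X):
    # Compute per-feature mean and population standard deviation,
    # then return a normalized copy plus the stored statistics.
    n, d = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(d)]
    stds = [(sum((X[i][j] - means[j]) ** 2 for i in range(n)) / n) ** 0.5
            for j in range(d)]
    # Constant features (std of 0) are mapped to 0 instead of dividing by 0.
    Xn = [[(X[i][j] - means[j]) / stds[j] if stds[j] != 0 else 0.0
           for j in range(d)] for i in range(n)]
    return Xn, means, stds
```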
Notes
- You may not use any linear algebra libraries or regression utilities. Compute all gradients and updates manually.
- You may not use the closed-form (normal equation) solution.
- Answers within ±0.01 of the expected value will be marked as correct.
Input Format
The first line contains 5 values, in order:
- `N`: the number of training samples
- `D`: the number of features per sample
- `learning_rate`
- `max_iterations`
- `tolerance`
The next N lines each contain D floating-point numbers, representing the training matrix X.
The next line contains N floating-point numbers, representing the target vector y.
The next line contains a single integer Q, the number of prediction queries.
The final Q lines each contain D floating-point numbers, representing the feature vectors to predict on.
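A parsing sketch for this format, treating the whole input as one whitespace-separated token stream (the function name is illustrative):

```python
def parse_input(text):
    # Consume tokens in the order the input format specifies.
    it = iter(text.split())
    n, d = int(next(it)), int(next(it))
    learning_rate = float(next(it))
    max_iterations = int(next(it))
    tolerance = float(next(it))
    X = [[float(next(it)) for _ in range(d)] for _ in range(n)]
    y = [float(next(it)) for _ in range(n)]
    q = int(next(it))
    queries = [[float(next(it)) for _ in range(d)] for _ in range(q)]
    return n, d, learning_rate, max_iterations, tolerance, X, y, q, queries
```

Because the stream is consumed token by token, this works whether the numbers are separated by spaces or newlines.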
Output Format
Output Q floating-point numbers, one per line, each rounded to 4 decimal places. These are the predicted target values for each query sample.
Sample Input
5 1 0.1 10000 0.000001
1.0
2.0
3.0
4.0
5.0
5.0 7.0 9.0 11.0 13.0
3
1.5
6.0
0.0
Sample Output
5.9992
14.9980
2.9996