The NOexistenceN of you AND me
The NOexistenceN of you AND me is a visual novel developed by Fontainebleau and published by Nino Games. The game revolves around navigating through hundreds of decisions to respond to the "existence" of a mysterious girl who suddenly appeared in your room.
Implement DecisionTree, a binary decision tree classifier.
"As long as you trust, hope, and need her, Lilith will always 'exist' here. Forever... only 'existing' for you."

Constructor
DecisionTree(max_depth, min_samples)
- max_depth is the maximum depth of the decision tree. The root node has depth 0, and each split increases the depth by 1. Once a node's depth == max_depth, it becomes a leaf.
- min_samples is the minimum number of training samples required to split a node. When a node contains fewer than min_samples samples, it becomes a leaf.
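The problem is language-agnostic; as one concrete reference point, here is a possible skeleton in Python (the Node class and its field names are my own illustration, not part of the specification):

```python
class Node:
    """One tree node: a leaf with a fixed prediction, or an internal
    node with a split feature, a threshold, and two children."""
    def __init__(self, prediction=None, feature=None, threshold=None,
                 left=None, right=None):
        self.prediction = prediction  # 0 or 1 for leaves, None otherwise
        self.feature = feature        # split feature index j
        self.threshold = threshold    # split threshold t
        self.left = left              # subtree for X[i][j] <= t
        self.right = right            # subtree for X[i][j] > t


class DecisionTree:
    def __init__(self, max_depth, min_samples):
        self.max_depth = max_depth      # nodes at this depth become leaves
        self.min_samples = min_samples  # nodes with fewer samples become leaves
        self.root = None                # built by fit()
```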
Methods
fit(X, y)
- X is a 2-dimensional container of training samples. You may assume it is in the format of the most conventional container in your programming language of choice. X[i] is the i-th training sample, and X[i][j] is the value of feature j for sample i.
- y is a container of binary labels, in the same conventional format. y[i] is the label for sample i.
- Each label is 0 or 1.
predict(x)
- x is a feature vector representing one sample. x[j] corresponds to feature j.
- x has the same dimensions as a single training sample.
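Prediction is then a plain root-to-leaf walk. A minimal sketch of the method, assuming the Node/DecisionTree skeleton above:

```python
def predict(self, x):  # method of DecisionTree
    node = self.root
    # Walk from the root to a leaf, routing by the <= / > rule.
    while node.prediction is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction
```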
Training Rules
Your tree must be built greedily, top-down, using binary splits on continuous features only.
At each internal node, choose a feature index j and a threshold t, and split the node's samples into:
- left: X[i][j] <= t
- right: X[i][j] > t
Both sides must be non-empty.
Splits are chosen to minimize weighted Gini impurity.
Gini Impurity
For a set of samples S, let:
- p0 is the fraction of samples in S with label 0
- p1 is the fraction of samples in S with label 1
Then Gini(S) = 1 - p0^2 - p1^2.
For a candidate split of S into L and R, the split score is (|L| / |S|) * Gini(L) + (|R| / |S|) * Gini(R).
Choose the split with the smallest score.
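In code, both quantities reduce to label counts. A sketch in Python (the helper names gini and split_score are mine):

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    p0 = 1.0 - p1
    return 1.0 - p0 * p0 - p1 * p1


def split_score(left_labels, right_labels):
    """Weighted Gini impurity of a candidate split."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) + \
           (len(right_labels) / n) * gini(right_labels)
```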
Candidate Thresholds
For a fixed feature j at a node, consider thresholds t of the form:
t = (a + b) / 2, where a < b are adjacent values after sorting the samples by X[i][j]
(Equivalently: try thresholds halfway between adjacent distinct feature values. Note: t is the exact arithmetic mean between these adjacent values.)
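Generating the candidates is a sort over the distinct values of one feature, followed by midpoints of adjacent pairs; a sketch:

```python
def candidate_thresholds(X, j):
    """Midpoints between adjacent distinct values of feature j."""
    values = sorted(set(x[j] for x in X))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]
```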
Stopping Conditions
A node becomes a leaf if any of the following holds:
- all labels in the node are identical
- depth == max_depth
- number of samples < min_samples
- no valid split exists (i.e., every possible split makes one side empty)
A leaf predicts the majority label of its samples.
If the leaf has a tie (equal number of 0 and 1), predict 0.
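The leaf rule, including the tie toward 0, is a strict majority test; a sketch:

```python
def leaf_prediction(labels):
    """Majority label of a leaf; a tie goes to 0."""
    ones = sum(labels)
    return 1 if ones > len(labels) - ones else 0
```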
Tie-Breaking (Determinism)
If multiple splits have the same best score, choose the split with:
- smallest feature index j
- then smallest threshold t
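If you iterate features in increasing index order and thresholds in increasing order, and only accept a strictly better score, this tie-breaking falls out automatically. A sketch reusing the helpers above:

```python
def best_split(X, y):
    """Return (score, feature, threshold) of the best split, or None."""
    best = None
    for j in range(len(X[0])):                # features in increasing order
        for t in candidate_thresholds(X, j):  # thresholds in increasing order
            left = [label for x, label in zip(X, y) if x[j] <= t]
            right = [label for x, label in zip(X, y) if x[j] > t]
            score = split_score(left, right)
            if best is None or score < best[0]:  # strict '<' keeps the first
                best = (score, j, t)
    return best
```

Because every candidate threshold lies strictly between two observed values, both sides of each candidate split are automatically non-empty; best_split returns None exactly when every feature is constant at the node, which is the "no valid split" stopping condition.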
Notes
You may NOT use any external libraries that implement decision trees or related training logic.
You must explicitly create the tree nodes and compute split scores from scratch.
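Putting the rules together, the recursive build might look like the following sketch (methods meant to extend the DecisionTree skeleton above; _build is a hypothetical helper name):

```python
def fit(self, X, y):  # methods of DecisionTree
    self.root = self._build(X, y, depth=0)

def _build(self, X, y, depth):
    # Leaf conditions: pure labels, depth limit, or too few samples.
    if len(set(y)) == 1 or depth == self.max_depth or len(y) < self.min_samples:
        return Node(prediction=leaf_prediction(y))
    split = best_split(X, y)
    if split is None:  # no valid split: every feature is constant here
        return Node(prediction=leaf_prediction(y))
    _, j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    return Node(feature=j, threshold=t,
                left=self._build([X[i] for i in li], [y[i] for i in li], depth + 1),
                right=self._build([X[i] for i in ri], [y[i] for i in ri], depth + 1))
```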
Input Format
The first line of the input will contain 4 integers, in order:
- N: the number of samples in the training data
- D: the number of features in each sample
- max_depth
- min_samples
The next N lines will contain D integers. Each line represents one training sample.
The next line will contain N integers, each of which is either 0 or 1.
The next line will contain a single integer Q, the number of test prediction queries to perform.
The final Q lines will contain D integers each. Each line is a test vector whose label you predict from the training data. Note that these queries are independent: the result of one prediction does not affect another.
Output Format
Output Q integers, all of which are either 0 or 1. These are the predicted labels for the query samples.
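A minimal driver tying the two formats together, assuming the sketches above are assembled into one file:

```python
import sys

def main():
    it = iter(sys.stdin.read().split())
    n, d, max_depth, min_samples = (int(next(it)) for _ in range(4))
    X = [[int(next(it)) for _ in range(d)] for _ in range(n)]
    y = [int(next(it)) for _ in range(n)]
    tree = DecisionTree(max_depth, min_samples)
    tree.fit(X, y)
    q = int(next(it))
    queries = [[int(next(it)) for _ in range(d)] for _ in range(q)]
    print("\n".join(str(tree.predict(x)) for x in queries))

if __name__ == "__main__":
    main()
```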
Constraints
- max_depth
- min_samples
Sample Input
4 2 2 1
0 0
0 1
1 0
1 1
0 0 1 1
4
0 0
1 0
0 2
2 0
Sample Output
0
1
0
1