The NOexistenceN of you AND me is a visual novel developed by Fontainebleau and published by Nino Games. The game revolves around navigating through hundreds of decisions to respond to the "existence" of a mysterious girl who suddenly appeared in your room.
Implement DecisionTree, a binary decision tree classifier.
"As long as you trust, hope, and need her, Lilith will always 'exist' here. Forever... only 'existing' for you."

DecisionTree(max_depth, min_samples)max_depthis the maximum depth of the decision tree. The root node has depth0, and each split increases the depth by1. Once a node'sdepth == max_depth, it becomes a leaf.min_samplesis the minimum number of training samples required to split a node. When a node contains fewer thanmin_samplessamples, it becomes a leaf.
fit(X, y)Xis a 2-dimensional container of training samples. You may assume it is in the format of the most conventional container in your programming language of choice.X[i]is thei-th training sample.X[i][j]is the outcome of featurejfor samplei.
yis a container of binary labels. You may assume it is in the format of the most conventional container in your programming language of choice.y[i]is the label for samplei.- Each label is
0or1.
predict(x)xis a feature vector representing one sample.x[j]corresponds to featurej.- This will have the same dimensions as a single training sample.
Your tree must be built greedily, top-down, using binary splits on continuous features only.
At each internal node, choose a feature index j and a threshold t, and split the node's samples into:
- left:
X[i][j] <= t - right:
X[i][j] > tBoth sides must be non-empty.
Splits are chosen to minimize weighted Gini impurity.
For a set of samples S, let:
p0be the fraction of samples inSwith label0p1be the fraction of samples inSwith label1
Then .
For a candidate split into L and R, the split score is .
Choose the split with the smallest score.
For a fixed feature j at a node, consider thresholds t of the form:
t = (a + b) / 2, wherea < bare adjacent values after sorting the samples byX[i][j]
(Equivalently: try thresholds halfway between adjacent distinct feature values. Note: t is the exact arithmetic mean between these adjacent values.)
A node becomes a leaf if any of the following holds:
all labels in the node are identical
depth == max_depthnumber of samples
< min_samplesno valid split exists (i.e., every possible split makes one side empty)
A leaf predicts the majority label of its samples.
If the leaf has a tie (equal number of 0 and 1), predict 0.
If multiple splits have the same best score, choose the split with:
smallest feature index
jthen smallest threshold
t
You may NOT use any external libraries that implement decision trees or related training logic.
- You must explicitly create the tree nodes and compute split scores from scratch.
The first line of the input will contain 4 integers, in order:
N: the number of samples in the training dataD: the number of features in each samplemax_depthmin_samples
The next N lines will contain D integers. Each line represents one training sample.
The next line will contain N integers, each of which is either 0 or 1.
The next line will contain a single integer Q, an integer representing the number of test prediction queries you will perform.
The final Q lines will contain D integers. Each line represents a test vector you are predicting the outcome of based off the training data. Note that these queries are independent, meaning the result of one prediction should not affect another.
Output Q integers, all of which are either 0 or 1. These are the predicted labels for the query samples.
-
max_depth -
min_samples
4 2 2 1
0 0
0 1
1 0
1 1
0 0 1 1
4
0 0
1 0
0 2
2 00
1
0
1