The current Decision Tree grow() implementation does not prune sibling pure leaf nodes that have the same class outcome and stem from a common ancestor node. Pruning would involve replacing the Split node with a single pure leaf node. See image. The problematic logic can be found here: https://github.com/RubixML/ML/blob/master/src/Graph/Trees/DecisionTree.php#L188
Instead of terminating after the Split node is added to the stack, we could detect when a split is pure, i.e. both of its partitions contain only a single, identical class, and immediately replace the Split node with a leaf node.
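The check could be sketched roughly as follows. This is an illustrative Python sketch, not the library's actual PHP implementation; the Leaf, Split, and make_node names are assumptions made for the example:

```python
class Leaf:
    """A terminal node that always predicts a single outcome."""
    def __init__(self, outcome):
        self.outcome = outcome

class Split:
    """An internal node that routes samples by comparing a column to a value."""
    def __init__(self, column, value, left_labels, right_labels):
        self.column = column
        self.value = value
        self.left_labels = left_labels
        self.right_labels = right_labels

def make_node(column, value, left_labels, right_labels):
    # If both partitions are pure and agree on the class outcome, the
    # split is redundant: return a single pure Leaf instead of a Split
    # whose two children would both predict the same class.
    left, right = set(left_labels), set(right_labels)
    if len(left) == 1 and left == right:
        return Leaf(left_labels[0])
    return Split(column, value, left_labels, right_labels)
```

In a grow loop, calling a check like this at the point where the Split would otherwise be pushed onto the stack means the redundant subtree is never materialized at all, rather than being pruned after the fact.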
Here is the test that generated this Graphviz visual (except the number of bins was set to 3 instead of 5) https://github.com/RubixML/ML/blob/master/tests/Classifiers/ClassificationTreeTest.php#L194.
This should speed up inference by reducing the number of splits that need to be evaluated, as well as reduce the memory and storage cost of trained Decision Tree models. Affects Classification/Regression Trees, Extra Trees, Gradient Boost, Logit Boost, Random Forest, and AdaBoost.