forked from jurra/pymurtree
Typically, models are trained after tuning their hyperparameters, for example with a grid search over the parameter space. Scikit-learn provides utility functions for this (e.g. `GridSearchCV`), and pymurtree should be able to work with them.
This requires the implementation of two earlier issues:
- implement the sklearn estimator interface
- check for identical or different data in the `fit` method. Possibly, the solver for each dataset could be kept in memory, depending on the order in which the grid search iterates. If it runs as: for each split of the data, for each parameter setting, then the cache can be reused efficiently. If it runs as: for each parameter setting, for each split of the data, then the cache would be discarded after each run, which motivates storing a solver per distinct dataset.
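The per-dataset caching idea above could be sketched roughly as follows. Everything here is hypothetical (the `CachedSolverMixin` name, the `_get_solver` helper, and the `factory` callback are not part of pymurtree); it only illustrates keying a solver cache on the dataset contents so that repeated `fit` calls on the same split reuse one solver. Note that `GridSearchCV` clones the estimator for every candidate, so the cache would have to live at class or module level to survive cloning:

```python
import hashlib
import numpy as np


class CachedSolverMixin:
    """Hypothetical sketch: keep one solver per distinct (X, y) dataset.

    The cache is class-level on purpose: sklearn's GridSearchCV clones the
    estimator for each parameter setting, so an instance-level cache would
    never be hit twice.
    """

    _solver_cache = {}  # maps dataset fingerprint -> solver object

    @staticmethod
    def _dataset_key(X, y):
        # Fingerprint the raw bytes of both arrays; identical data
        # (same values, same dtype) yields the same key.
        h = hashlib.sha256()
        h.update(np.ascontiguousarray(X).tobytes())
        h.update(np.ascontiguousarray(y).tobytes())
        return h.hexdigest()

    def _get_solver(self, X, y, factory):
        # Reuse the cached solver for data we have already seen;
        # otherwise build a fresh one via the supplied factory.
        key = self._dataset_key(X, y)
        if key not in self._solver_cache:
            self._solver_cache[key] = factory(X, y)
        return self._solver_cache[key]
```

With a cv-split-outer loop this hits the cache once per parameter setting after the first; with a parameter-outer loop every split's solver stays resident, trading memory for recomputation.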
```python
import pymurtree
import numpy
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

x = numpy.array([[0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1],
                 [1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 0, 1],
                 [0, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1],
                 [1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 1, 0],
                 [1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0]])
y = numpy.array([5, 5, 4, 4, 5,
                 4, 4, 5, 5, 4,
                 4, 4, 5, 5, 5])

model = pymurtree.OptimalDecisionTreeClassifier(max_depth=3, verbose=False)
parameters = {
    "max_num_nodes": list(range(0, 8))
}

## To see how this is expected to work, compare with sklearn.tree.DecisionTreeClassifier
# model = DecisionTreeClassifier(max_depth=3)
# parameters = {
#     "max_leaf_nodes": list(range(2, 9))
# }

tuning_model = GridSearchCV(
    model, param_grid=parameters, scoring="accuracy", cv=5, verbose=0
)
tuning_model.fit(x, y)

model = pymurtree.OptimalDecisionTreeClassifier(**tuning_model.best_params_)
# model = DecisionTreeClassifier(**tuning_model.best_params_)
model.fit(x, y)
```