Allow grid search for hyperparameter tuning #36

Typically, a model's hyperparameters are tuned before the final model is trained. A common approach is a grid search over candidate parameter values, and scikit-learn provides utility functions for this (for example, `GridSearchCV`). pymurtree should work with these utilities.
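A grid search evaluates every combination of the listed parameter values. The sketch below expands a parameter grid into candidate settings with `itertools.product`, in the spirit of `sklearn.model_selection.ParameterGrid`; `expand_grid` is a hypothetical helper, not part of sklearn or pymurtree.

```python
from itertools import product


def expand_grid(param_grid):
    """Return one dict per combination of the listed parameter values."""
    names = sorted(param_grid)  # fixed key order keeps the output deterministic
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Example grid: 2 depths x 3 node limits = 6 candidate settings
candidates = expand_grid({"max_depth": [2, 3], "max_num_nodes": [3, 5, 7]})
print(len(candidates))  # 6
print(candidates[0])    # {'max_depth': 2, 'max_num_nodes': 3}
```

Each candidate is then scored with cross-validation, so with `cv=5` this grid costs 30 fits, plus one refit on the full data when `refit=True` (the `GridSearchCV` default).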

This requires implementing two previous issues:
1. Implement the sklearn estimator interface.
2. Check in the `fit` method whether the data is the same as in a previous call. Depending on the order in which the grid search runs, the solver for each dataset could be kept in memory:
   - If it loops over the data splits in the outer loop and over the parameter settings in the inner loop, the cache can be reused efficiently.
   - If it loops over the parameter settings in the outer loop and over the data splits in the inner loop, the cache would be discarded after each run, which motivates storing a separate solver for each distinct dataset.
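The trade-off in item 2 can be made concrete by counting how often a solver has to be built. This is a pure simulation under stated assumptions: building a solver is modelled as inserting a placeholder object, and `count_solver_builds` is a hypothetical helper. It compares a single cached solver under both loop orders with a store that keeps one solver per training split.

```python
from itertools import product


def count_solver_builds(n_splits, n_params, splits_outer, keep_all):
    """Count how often a solver is built during one simulated grid search.

    keep_all=False: a single cached solver, replaced whenever the data changes.
    keep_all=True:  one stored solver per distinct training split.
    """
    if splits_outer:
        order = product(range(n_splits), range(n_params))
    else:
        # Parameters outer, splits inner; swap each pair to (split, param)
        order = ((split, param) for param, split in product(range(n_params), range(n_splits)))

    store = {}  # split index -> placeholder solver
    builds = 0
    for split, _param in order:
        if split not in store:
            builds += 1
            if not keep_all:
                store.clear()  # single-slot cache: drop the previous solver
            store[split] = object()  # placeholder for a real solver
    return builds


# 5 folds and 8 values of max_num_nodes, matching the example in this issue
print(count_solver_builds(5, 8, splits_outer=True, keep_all=False))   # 5
print(count_solver_builds(5, 8, splits_outer=False, keep_all=False))  # 40
print(count_solver_builds(5, 8, splits_outer=False, keep_all=True))   # 5
```

Keeping one solver per distinct dataset makes the cost independent of the loop order, at the price of holding all of those solvers in memory at once.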


```python
import numpy
import pymurtree
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Small binary dataset: 15 samples, 4 binary features, class labels 4 and 5
x = numpy.array([[0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1],
                 [1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 0, 1],
                 [0, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1],
                 [1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 1, 0],
                 [1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0]])
y = numpy.array([5, 5, 4, 4, 5,
                 4, 4, 5, 5, 4,
                 4, 4, 5, 5, 5])

# Fixed settings go to the constructor; the grid lists only the tuned parameter
model = pymurtree.OptimalDecisionTreeClassifier(max_depth=3, verbose=False)
parameters = {
    "max_num_nodes": list(range(0, 8))
}

## To see how this is expected to work, compare with sklearn.tree.DecisionTreeClassifier
#model = DecisionTreeClassifier(max_depth=3)
#parameters = {
#    "max_leaf_nodes": list(range(2, 9))
#}

# 5-fold cross-validated accuracy for every value in the grid
tuning_model = GridSearchCV(
    model, param_grid=parameters, scoring="accuracy", cv=5, verbose=0
)
tuning_model.fit(x, y)

# best_params_ contains only the tuned parameter, so repeat the fixed settings
model = pymurtree.OptimalDecisionTreeClassifier(max_depth=3, verbose=False, **tuning_model.best_params_)
#model = DecisionTreeClassifier(max_depth=3, **tuning_model.best_params_)

model.fit(x, y)
```
