allow gridsearch for hyper tuning #36

@kjgm

Models are typically trained after their hyperparameters have been tuned. A common approach is a grid search over the parameter values, for which scikit-learn provides utilities such as GridSearchCV. pymurtree should work with these utilities.

This requires implementing two earlier issues:

  1. Implement the sklearn estimator interface.
  2. Check in the fit method whether the data is the same as in the previous call. Depending on the order in which the grid search runs, the solver for each dataset may need to be kept in memory. If it iterates over the data splits in the outer loop and over the parameter settings in the inner loop, a single cache can be reused efficiently. If it iterates over the parameter settings in the outer loop and over the data splits in the inner loop, the cache would be discarded after every run, which motivates storing one solver per distinct dataset.
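
The caching trade-off in point 2 can be illustrated with a small, dependency-free simulation. This is only a sketch under assumptions: `SolverCache` and `count_builds` are hypothetical names, the "solver" is a placeholder object, and the datasets are toy tuples. It does not reflect how pymurtree stores its solver. Which order GridSearchCV actually uses, and how `n_jobs` parallelism interacts with it, would need to be checked against scikit-learn.

```python
import hashlib


class SolverCache:
    """Hypothetical store of solvers keyed by a fingerprint of the training data."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._solvers = {}  # dicts keep insertion order, used here as LRU order
        self.builds = 0     # number of times a solver had to be built from scratch

    def get(self, x, y):
        key = hashlib.sha256(repr((x, y)).encode()).hexdigest()
        if key in self._solvers:
            solver = self._solvers.pop(key)  # hit: re-inserted below as most recent
        else:
            solver = object()                # miss: stand-in for building a real solver
            self.builds += 1
        self._solvers[key] = solver
        while len(self._solvers) > self.max_entries:
            del self._solvers[next(iter(self._solvers))]  # evict least recently used
        return solver


def count_builds(order, max_entries, n_splits=5, n_settings=8):
    """Count solver builds for a given loop order and cache capacity."""
    splits = [((i, i + 1), (i % 2,)) for i in range(n_splits)]  # toy (x, y) training sets
    settings = range(n_settings)
    if order == "split_outer":
        runs = [(s, p) for s in splits for p in settings]
    else:  # "setting_outer"
        runs = [(s, p) for p in settings for s in splits]
    cache = SolverCache(max_entries)
    for (x, y), _setting in runs:
        cache.get(x, y)
    return cache.builds


print(count_builds("split_outer", max_entries=1))    # one cached solver is enough
print(count_builds("setting_outer", max_entries=1))  # every run rebuilds the solver
print(count_builds("setting_outer", max_entries=5))  # one solver per split avoids rebuilds
```

With 5 splits and 8 settings, a single-entry cache builds 5 solvers when splits form the outer loop but 40 when settings form the outer loop. Keeping one solver per split brings the second case back to 5 builds, at the cost of holding 5 solvers in memory.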

A sketch of the intended usage:

import pymurtree
import numpy
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

x = numpy.array([[0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1], 
                 [1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 0, 1],
                 [0, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1],
                 [1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 1, 0],
                 [1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0]])
y = numpy.array([5, 5, 4, 4, 5,
                 4, 4, 5, 5, 4,
                 4, 4, 5, 5, 5]) 

# Fixed settings stay on the estimator; the grid only varies max_num_nodes.
model = pymurtree.OptimalDecisionTreeClassifier(max_depth=3, verbose=False)
parameters = {
    "max_num_nodes": list(range(0, 8)),
}

# To see how this is expected to work, compare with sklearn.tree.DecisionTreeClassifier:
# model = DecisionTreeClassifier(max_depth=3)
# parameters = {
#     "max_leaf_nodes": list(range(2, 9)),
# }

tuning_model = GridSearchCV(
    model, param_grid=parameters, scoring="accuracy", cv=5, verbose=0
)
tuning_model.fit(x, y)
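# Rebuild the model with the best grid values, keeping the fixed settings used during the search.
# (With the default refit=True, tuning_model.best_estimator_ is already refitted on x and y.)
model = pymurtree.OptimalDecisionTreeClassifier(
    max_depth=3, verbose=False, **tuning_model.best_params_
)
# model = DecisionTreeClassifier(max_depth=3, **tuning_model.best_params_)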

model.fit(x, y)
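
For the first prerequisite (the sklearn estimator interface), the contract that a grid search leans on can be sketched without any dependencies. `SketchTreeClassifier` is a hypothetical stand-in that just predicts the majority label. It is not pymurtree's class, and `max_num_nodes=7` as a default is an assumption for illustration. In practice, inheriting from `sklearn.base.BaseEstimator` and `sklearn.base.ClassifierMixin` provides `get_params`, `set_params` and `score`, and is likely the simpler route.

```python
from collections import Counter


class SketchTreeClassifier:
    """Hypothetical stand-in estimator; it only predicts the majority label."""

    def __init__(self, max_depth=3, max_num_nodes=7):
        # Store constructor arguments unchanged under the same names, so that
        # a copy can be rebuilt from get_params(); validate in fit instead.
        self.max_depth = max_depth
        self.max_num_nodes = max_num_nodes

    def get_params(self, deep=True):
        return {"max_depth": self.max_depth, "max_num_nodes": self.max_num_nodes}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows chaining

    def fit(self, X, y):
        # Placeholder for the real solver call; learned state gets a trailing underscore.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]

    def score(self, X, y):
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


# What a grid search does per candidate, in spirit: copy the template's
# parameters into a fresh estimator, override one value, then fit and score.
template = SketchTreeClassifier(max_depth=3)
candidate = SketchTreeClassifier(**template.get_params()).set_params(max_num_nodes=5)
accuracy = candidate.fit([[0], [1], [1]], [5, 5, 4]).score([[0], [1]], [5, 4])
```

The key design point is that `__init__` does nothing except store its arguments. That is what lets a fresh, unfitted copy be created for every parameter setting and data split.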
