
Hyperparameter optimization #264

@lars-reimann

Description

Is your feature request related to a problem?

Finding appropriate values for hyperparameters by hand is tedious. There should be automation that tries different combinations of values.

Desired solution

  1. For every hyperparameter of type T of a model, it should also be possible to pass a Choice[T] (see feat: add Choice class for possible values of hyperparameter #325). Example:
# Before
class KNearestNeighbors(Classifier):
  def __init__(self, number_of_neighbors: int) -> None:
    ...

# After
class KNearestNeighbors(Classifier):
  def __init__(self, number_of_neighbors: int | Choice[int]) -> None:
    ...

# Usage
KNearestNeighbors(number_of_neighbors=Choice(1, 10, 100))
  2. Adjust the getters (Getters for hyperparameters of models #260) accordingly.
  3. When a user tries to call fit on a model that contains a Choice at any level (choices can be nested), raise an exception. Also point to the correct method (see 4. and 5.).
  4. Add a new method fit_by_exhaustive_search to Classifier and subclasses with parameter:
    • optimization_metric: The metric to use to find the best model. It should have type ClassifierMetric, which is an enum with one value for each classifier metric we have available:
    class ClassifierMetric(Enum):
        ACCURACY = "accuracy"
        PRECISION = "precision"
        RECALL = "recall"
        F1_SCORE = "f1_score"
    The parameter should be required.
  5. Add a new method fit_by_exhaustive_search to Regressor and subclasses with parameter:
    • optimization_metric: The metric to use to find the best model. It should have type RegressorMetric, which is an enum with one value for each regressor metric we have available:
    class RegressorMetric(Enum):
        MEAN_SQUARED_ERROR = "mean_squared_error"
        MEAN_ABSOLUTE_ERROR = "mean_absolute_error"
    The parameter should be required.
  6. Both methods should collect the Choices inside the model and its children and, for each possible combination of values, create a model without choices, fit it, and compute the chosen metric on it. They should keep track of the best (fitted) model according to the metric and return it at the end. scikit-learn's GridSearchCV can be useful for this.
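The exhaustive-search step can be sketched as a plain Cartesian product over the collected candidate values. Everything here is a simplified stand-in: `create_model`, `choices`, and the callable `optimization_metric` are illustrative assumptions, not the library's API — the real method would live on Classifier/Regressor and could delegate to scikit-learn's GridSearchCV instead:

```python
from itertools import product


def fit_by_exhaustive_search(create_model, choices, optimization_metric):
    """Return the best fitted model over every combination of Choice values.

    create_model: builds and fits a model from one concrete setting (a dict).
    choices: maps hyperparameter names to their collected candidate values.
    optimization_metric: scores a fitted model; higher is better.
    """
    names = list(choices)
    best_model, best_score = None, float("-inf")
    for values in product(*(choices[n] for n in names)):
        # One concrete model without any choices, fitted for this combination.
        model = create_model(dict(zip(names, values)))
        score = optimization_metric(model)
        if score > best_score:
            best_model, best_score = model, score
    return best_model


# Toy usage: "fitting" just records the setting; the metric prefers 10 neighbors.
best = fit_by_exhaustive_search(
    create_model=lambda setting: setting,
    choices={"number_of_neighbors": [1, 10, 100]},
    optimization_metric=lambda m: -abs(m["number_of_neighbors"] - 10),
)
print(best)  # → {'number_of_neighbors': 10}
```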

Metadata

Labels

released (Included in a release)

Status

✔️ Done