This repository was archived by the owner on Jul 16, 2021. It is now read-only.

Model trait APIs #124

@theotherphil

Continuing the discussion on model traits started in the comments for #120.

Suggestions for possible changes:

Separate the notions of modelling method and model

(Or model and fit, or trainer and model, or some other names to be decided.)

The new functions would have signatures along the lines of:

Trainer::train : training data -> Model
Trainer::predict : (data, Model) -> Prediction
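As a rough illustration, these two signatures might translate to Rust as below. This is only a sketch: the trait layout, the use of associated types, and the `MeanTrainer` / `MeanModel` names are assumptions made up for this example, not existing library code.

```rust
/// Hypothetical trait for a modelling method: the algorithm plus its
/// hyperparameters, but no fitted state.
pub trait Trainer {
    type Input;
    type Output;
    /// The fitted state produced by training.
    type Model;

    /// training data -> Model
    fn train(&self, inputs: &[Self::Input], targets: &[Self::Output]) -> Self::Model;

    /// (data, Model) -> Prediction
    fn predict(&self, model: &Self::Model, input: &Self::Input) -> Self::Output;
}

/// Toy trainer (illustrative only): always predicts the mean training target.
pub struct MeanTrainer;

/// Fitted state for `MeanTrainer`. It only exists once `train` has run.
pub struct MeanModel {
    pub mean: f64,
}

impl Trainer for MeanTrainer {
    type Input = f64;
    type Output = f64;
    type Model = MeanModel;

    fn train(&self, _inputs: &[f64], targets: &[f64]) -> MeanModel {
        // Guard against division by zero on empty training data.
        let n = targets.len().max(1) as f64;
        MeanModel {
            mean: targets.iter().sum::<f64>() / n,
        }
    }

    fn predict(&self, model: &MeanModel, _input: &f64) -> f64 {
        model.mean
    }
}
```

Because `predict` needs a `MeanModel`, and the only way to obtain one here is through `train`, predicting with an untrained model no longer type-checks.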

Advantages:

  • Removes the possibility of calling predict on an untrained model.
  • Clearly delineates the model fit data from the training algorithm. This will be useful when implementing serialisation.

Drawbacks:

  • API could be less intuitive for people familiar with popular current machine learning libraries.

Further questions:

  • How should online training be dealt with? What about updating models to take account of new data without completely retraining (e.g. updating class distributions in the leaves of a random forest)?
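One possible answer, sketched under assumptions: a separate trait for trainers that can fold new data into an existing model in place. The `OnlineTrainer`, `CountTrainer` and `ClassCounts` names are hypothetical; the class-count model is a stand-in for something like the class distribution stored in a single leaf.

```rust
/// Hypothetical extension trait: trainers that can update a fitted model
/// with new samples without retraining from scratch.
pub trait OnlineTrainer {
    type Sample;
    type Model;

    fn update(&self, model: &mut Self::Model, samples: &[Self::Sample]);
}

/// Toy model (illustrative only): how many times each class label was seen.
pub struct ClassCounts {
    pub counts: Vec<usize>,
}

/// Toy trainer that knows the number of classes up front.
pub struct CountTrainer {
    pub n_classes: usize,
}

impl CountTrainer {
    /// A model that has seen no data yet.
    pub fn empty_model(&self) -> ClassCounts {
        ClassCounts {
            counts: vec![0; self.n_classes],
        }
    }
}

impl OnlineTrainer for CountTrainer {
    type Sample = usize; // a class label in 0..n_classes
    type Model = ClassCounts;

    fn update(&self, model: &mut ClassCounts, samples: &[usize]) {
        for &label in samples {
            // Labels outside 0..n_classes are silently ignored in this sketch.
            if let Some(count) = model.counts.get_mut(label) {
                *count += 1;
            }
        }
    }
}
```

Whether `update` should mutate in place or return a new model is itself a design question; in-place mutation is used here only because it is the shorter sketch.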

Associated types

Currently SupModel<T, U> takes its input and output types as type parameters. Should they be associated types instead?

Advantages:

  • Fits more closely with the typical meanings of type parameters vs associated types in Rust (I think!). In general, the users of a model don't get to choose the types it acts on; these are determined by the model itself. Some models do give the user some say over the types used, but in that case they should be type parameters of the specific model rather than of the trait.
  • Shrinks signatures of functions which are generic over models. f<M, T, U> where M: SupModel<T, U> becomes f<M: SupModel>.
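To make the comparison concrete, here is a sketch of both styles side by side. `SupModelParam` and `SupModelAssoc` are simplified, hypothetical stand-ins with a single `predict` method; they are not the library's actual `SupModel` definition, and `Doubler` is a made-up model.

```rust
/// Type-parameter style: input and output types are chosen at the use site.
pub trait SupModelParam<T, U> {
    fn predict(&self, input: &T) -> U;
}

/// Associated-type style: the implementing model fixes its own types.
pub trait SupModelAssoc {
    type Input;
    type Output;
    fn predict(&self, input: &Self::Input) -> Self::Output;
}

/// Toy model (illustrative only) that doubles its input.
pub struct Doubler;

impl SupModelParam<f64, f64> for Doubler {
    fn predict(&self, input: &f64) -> f64 {
        2.0 * *input
    }
}

impl SupModelAssoc for Doubler {
    type Input = f64;
    type Output = f64;
    fn predict(&self, input: &f64) -> f64 {
        2.0 * *input
    }
}

/// Generic over a model in the type-parameter style: three type parameters.
pub fn predict_all_param<M, T, U>(model: &M, inputs: &[T]) -> Vec<U>
where
    M: SupModelParam<T, U>,
{
    inputs.iter().map(|x| model.predict(x)).collect()
}

/// Generic over a model in the associated-type style: one type parameter.
pub fn predict_all_assoc<M: SupModelAssoc>(model: &M, inputs: &[M::Input]) -> Vec<M::Output> {
    inputs.iter().map(|x| model.predict(x)).collect()
}
```

One difference worth noting: with type parameters a single type can implement the trait several times (say, for both `f32` and `f64` inputs), whereas with associated types it can implement it only once. That may or may not count as a drawback here.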

Drawbacks:

  • I can't think of any off the top of my head, but I may well be missing something.

Other bits and pieces

  • Algorithms using randomness should always let the user provide a seed. Otherwise regression testing becomes impossible.
  • Should we consider the algebraic traits used in HLearn? This might give us efficiency wins in some cases, but might also scare away potential users.
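On the seeding point, a minimal sketch of what "let the user provide a seed" could look like. To stay dependency-free it uses a tiny hand-written xorshift generator; a real implementation would more likely use a seedable generator from an RNG crate. `XorShift64` and `RandomSubsetTrainer` are hypothetical names for this example.

```rust
/// Tiny xorshift64 generator, used only to keep this sketch self-contained.
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    pub fn from_seed(seed: u64) -> Self {
        // xorshift gets stuck at zero, so substitute a fixed non-zero state.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        XorShift64 { state }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Hypothetical randomised trainer that stores a user-supplied seed.
pub struct RandomSubsetTrainer {
    pub seed: u64,
}

impl RandomSubsetTrainer {
    /// Draw `k` indices from `0..n` with replacement.
    /// The result depends only on `seed`, `n` and `k`.
    /// (The modulo introduces a slight bias, which is ignored in this sketch.)
    pub fn sample_indices(&self, n: usize, k: usize) -> Vec<usize> {
        assert!(n > 0, "cannot sample from an empty range");
        let mut rng = XorShift64::from_seed(self.seed);
        (0..k)
            .map(|_| (rng.next_u64() % n as u64) as usize)
            .collect()
    }
}
```

Two trainers built with the same seed draw the same indices, which is what makes a regression test on the fitted model possible.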

Caveat: I have very little Rust experience, and so don't really know what I'm talking about!
