GTO: Document SemVer practices for ML models #231

Semantic versioning (SemVer) is the widely accepted way to version code. How should ML artifacts, such as models, be versioned?
A Data Scientist asked me this some time ago. Since everyone is free to version artifacts however they like, offering a suggested convention seems worthwhile.

I have formulated a convention for models that seems reasonable, though I am not sure how useful it will be:

#### Patch
Seen as a black box, the model is unchanged: it takes the same inputs and produces the same kind of outputs, only with different numbers.

Typical scenario 1: the model has been retrained on more recent data
Typical scenario 2: the hyperparameters have changed
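A toy illustration of the patch case, assuming a hypothetical `MeanModel` class as a stand-in for a real model. Retraining on more recent data changes the numbers the model outputs, but the call signature and output type stay the same, so the calling code needs no changes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MeanModel:
    """Hypothetical toy model: always predicts the mean of its training targets."""

    version: str
    mean: float = 0.0

    def fit(self, targets: List[float]) -> "MeanModel":
        self.mean = sum(targets) / len(targets)
        return self

    def predict(self, inputs: List[float]) -> List[float]:
        # Same signature and output type in every version; only the values differ.
        return [self.mean for _ in inputs]


# 1.0.0 was trained on older data; 1.0.1 was retrained on more recent data (patch bump).
old = MeanModel("1.0.0").fit([1.0, 2.0, 3.0])
new = MeanModel("1.0.1").fit([2.0, 3.0, 4.0])

print(old.predict([0.5, 0.7]))  # [2.0, 2.0]
print(new.predict([0.5, 0.7]))  # [3.0, 3.0]
```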


#### Minor
Existing calling code keeps working, but callers may want to take advantage of additional outputs or functionality

Typical scenario 1: the model now has `predict_proba()` in addition to `predict()`
Typical scenario 2: the model now outputs JSON with an additional field, `confidence_interval`, alongside `predicted_values`
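The second minor scenario can be checked mechanically. This is a minimal sketch, assuming model responses are flat JSON objects; the payload values and the helper names `is_backward_compatible` and `old_client` are hypothetical. A new version is backward compatible if every field an old client reads is still present, so a client written for `predicted_values` keeps working after `confidence_interval` is added.

```python
import json

# Hypothetical responses from version 1.0.0 and version 1.1.0 of the same model.
response_old = json.loads('{"predicted_values": [0.4, 0.9]}')
response_new = json.loads(
    '{"predicted_values": [0.4, 0.9], "confidence_interval": [[0.3, 0.5], [0.8, 1.0]]}'
)


def is_backward_compatible(old: dict, new: dict) -> bool:
    """True if every field of the old response is still present in the new one.

    Only field names are compared; a stricter check would also compare value types.
    """
    return set(old) <= set(new)


def old_client(response: dict) -> list:
    # Client code written against 1.0.0: it only reads `predicted_values`.
    return response["predicted_values"]


print(is_backward_compatible(response_old, response_new))  # True
print(old_client(response_new))  # [0.4, 0.9]
```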


#### Major
The code that calls the model in order to serve it must be revisited (breaking change)

Typical scenario 1: the model's API has changed
Typical scenario 2: the model expects a different input data format
Typical scenario 3: the model relies on different libraries, so the venv (or even the OS-level libraries) must be rebuilt
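The three rules can be encoded as a small decision function. This is a minimal sketch, assuming a hypothetical `ModelInterface` record that summarizes what calling code depends on (methods, input fields, output fields, library dependencies); it is not part of GTO or MLEM. It conservatively treats any dependency change as major, following scenario 3, and treats removing a method or output field as major, since existing callers would break.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ModelInterface:
    """Hypothetical summary of what the calling code depends on."""

    methods: FrozenSet[str]        # e.g. {"predict", "predict_proba"}
    input_schema: Tuple[str, ...]  # expected input fields, in order
    output_fields: FrozenSet[str]  # e.g. {"predicted_values"}
    dependencies: FrozenSet[str]   # libraries the serving environment needs


def required_bump(old: ModelInterface, new: ModelInterface) -> str:
    # Major: callers must be revisited (method/output removed, input format or libraries changed).
    if (
        not old.methods <= new.methods
        or old.input_schema != new.input_schema
        or not old.output_fields <= new.output_fields
        or old.dependencies != new.dependencies
    ):
        return "major"
    # Minor: everything old still works, but there is something new to use.
    if new.methods != old.methods or new.output_fields != old.output_fields:
        return "minor"
    # Patch: same black box, only the numbers it outputs differ.
    return "patch"


def bump(version: str, level: str) -> str:
    """Increment a MAJOR.MINOR.PATCH string at the given level."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


# Hypothetical example interfaces.
base = ModelInterface(
    methods=frozenset({"predict"}),
    input_schema=("age", "income"),
    output_fields=frozenset({"predicted_values"}),
    dependencies=frozenset({"scikit-learn"}),
)
with_proba = replace(base, methods=frozenset({"predict", "predict_proba"}))
new_input = replace(base, input_schema=("age", "income", "zip_code"))

print(bump("1.2.3", required_bump(base, base)))        # 1.2.4 (retrained only)
print(bump("1.2.3", required_bump(base, with_proba)))  # 1.3.0
print(bump("1.2.3", required_bump(base, new_input)))   # 2.0.0
```

Deciding whether a dependency change really forces a rebuild of the environment still needs human judgment; the function only makes the convention explicit.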

_Originally posted by @francesco086 in https://github.com/iterative/mlem.ai/pull/199#discussion_r1018328705_
      
🧵 See the thread for more opinions on this
