This repository was archived by the owner on Apr 23, 2025. It is now read-only.

GTO: Document SemVer practices for ML models #231

@aguschin


Semantic versioning is the accepted way to version code. How should artifacts be versioned?
A Data Scientist asked me this some time ago. Since everyone is free to do whatever they want, perhaps offering a hint would not hurt.

I formulated a reasonable convention for models; I am not sure whether it is of any use:

Patch

As a black box, the model behaves as before; it only outputs different numbers.

Typical scenario 1: the model has been trained on more recent data
Typical scenario 2: the hyper-parameters have changed

Minor

Callers may want to take advantage of additional outputs or additional functionality.

Typical scenario 1: the model now has predict_proba() in addition to predict()
Typical scenario 2: the model now outputs JSON with an additional confidence_interval field alongside predicted_values
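
A minimal sketch of why scenario 1 counts as a minor change: the serving code written for the older model keeps working, and it can opt in to the new method when it is present. The class names `ModelV1_0` and `ModelV1_1`, the `serve` function, and the `probabilities` key are hypothetical and only illustrate the idea.

```python
class ModelV1_0:
    """Hypothetical model at version 1.0.x: exposes predict() only."""

    def predict(self, rows):
        return [1 if sum(r) > 0 else 0 for r in rows]


class ModelV1_1(ModelV1_0):
    """Hypothetical model at version 1.1.x: adds predict_proba(), keeps predict()."""

    def predict_proba(self, rows):
        return [[0.2, 0.8] if sum(r) > 0 else [0.9, 0.1] for r in rows]


def serve(model, rows):
    """Serving code that works with 1.0.x and uses the 1.1.x extra if available."""
    result = {"predicted_values": model.predict(rows)}
    if hasattr(model, "predict_proba"):
        # Optional: only newer minor versions provide this method.
        result["probabilities"] = model.predict_proba(rows)
    return result
```

Serving code that never looks for `predict_proba()` behaves the same with both versions, which is what separates a minor change from a major one.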

Major

The code that calls the model to serve it needs to be revisited (breaking change).

Typical scenario 1: the model API has changed
Typical scenario 2: the model expects a different input data format
Typical scenario 3: the model relies on different libraries, so the venv (or even OS-level libraries) must be rebuilt
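
The convention above could be sketched as a small helper that maps the kind of change to a version bump. This is only an illustration of the proposal, not part of GTO: the scenario labels in `SCENARIO_TO_BUMP` and the `next_version` function are hypothetical names, and versions are assumed to look like `v1.4.2`.

```python
from enum import Enum


class Bump(Enum):
    PATCH = "patch"
    MINOR = "minor"
    MAJOR = "major"


# Hypothetical labels, one per scenario listed in the convention above.
SCENARIO_TO_BUMP = {
    "retrained_on_newer_data": Bump.PATCH,
    "hyperparameters_changed": Bump.PATCH,
    "new_method_added": Bump.MINOR,
    "new_output_field_added": Bump.MINOR,
    "api_changed": Bump.MAJOR,
    "input_format_changed": Bump.MAJOR,
    "dependencies_changed": Bump.MAJOR,
}


def next_version(current, changes):
    """Return the next version string given the scenarios that apply."""
    if not changes:
        raise ValueError("at least one change is required to bump a version")
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    bumps = {SCENARIO_TO_BUMP[change] for change in changes}
    # The most severe change decides the bump; lower components reset to 0.
    if Bump.MAJOR in bumps:
        return f"v{major + 1}.0.0"
    if Bump.MINOR in bumps:
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```

For example, a model at `v1.4.2` that was retrained and also gained a new method would move to `v1.5.0`, because the minor change outranks the patch change.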

Originally posted by @francesco086 in #199 (comment)

🧵 See the thread for more opinions on this

Labels: A: docs (Area: user documentation (gatsby-theme-iterative)); p2-nice-to-have (Less of a priority at the moment. We don't usually deal with this immediately.); type: discussion