Model card design #29

@BenjaminBossan

I would like to suggest that we go back to the drawing board to discuss the model cards feature. Below you can find my opinions but of course feel free to disagree.

User

What does the user of this feature want to achieve?

As a user...

  1. I want to have a convenient way of creating a model card of my trained sklearn model (or sklearn-compatible model).
  2. I don't want to make big changes to my existing code to be able to use model cards.
  3. I don't want to be required to learn new concepts for creating a model card.
  4. I want to output different model card formats (perhaps?).
  5. I want to have arbitrary sections/figures/tables etc. in the model card (perhaps?).

Supported metrics

Ideally, we want to be able to use as much as possible of what sklearn already provides. These things come to my mind:

  1. Scalar metrics (accuracy, f1, r2, etc.). Do we want to support user-defined scalar metrics?
  2. Confusion matrix.
  3. CV results (hyperparameter search, cross_validate)
  4. classification_report
  5. Some kind of feature importance.
  6. Model visualizations?

Note that it's absolutely possible to start with a subset of those features and add more later on.
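
To make the list above concrete, here is a rough sketch of how most of these items can already be computed with plain sklearn. The iris data, the SVC model, and the `results` dict are only illustrative assumptions and not part of any proposed API; permutation importance is just one possible choice for "some kind of feature importance".

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
)
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.svm import SVC

# Toy data and model, chosen only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Hypothetical container: a plain dict of everything a card might report.
results = {
    # 1. scalar metrics
    "accuracy": accuracy_score(y_test, y_pred),
    "f1_macro": f1_score(y_test, y_pred, average="macro"),
    # 2. confusion matrix (n_classes x n_classes array)
    "confusion_matrix": confusion_matrix(y_test, y_pred, labels=clf.classes_),
    # 3. cross-validation results (dict of arrays, one entry per fold)
    "cv_results": cross_validate(SVC(random_state=0), X_train, y_train, cv=5),
    # 4. classification report as a nested dict instead of a string
    "classification_report": classification_report(
        y_test, y_pred, output_dict=True
    ),
    # 5. model-agnostic feature importance (one value per feature)
    "feature_importance": permutation_importance(
        clf, X_test, y_test, n_repeats=5, random_state=0
    ).importances_mean,
}
```

Everything here is either a scalar, an array, or a nested dict, so a card implementation would mostly need to decide how to render those three kinds of values.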

API

Here I believe adoption will be better if users can use their existing code bases/scripts and only need to add a few lines to be able to create model cards.

Below are some suggestions I collected:

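# Suggestion 1: attach a sklearn display object to an existing card.
# `card` is the proposed model card object; its API is not defined yet.
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.svm import SVC

clf = SVC(random_state=0)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
cm = confusion_matrix(y_test, predictions, labels=clf.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=clf.classes_)
card.add_inspection('Confusion Matrix', disp)

# Suggestion 2: compute results with sklearn, then pass them to a factory function.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
clf_report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

# everything above is just vanilla sklearn, below comes model card stuff
# (skops.card.create_model_card is a proposed function, not an existing one)
model_card = skops.card.create_model_card(accuracy=accuracy, clf_report=clf_report, conf_matrix=conf_matrix, description=...)
model_card.save(...)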

Implementation

It could be advantageous if we use a more generic representation of a "card". This will allow us to more easily change:

  • the card format (if there is no fixed specification)
  • the storage back end (disk, remote)
  • the output format (markdown, rst, json/yaml?)
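
As a minimal sketch of what such a generic representation could look like: the card only holds a title and named sections, while rendering and storage are separate, swappable functions. All names here (`Card`, `render_markdown`, `render_json`, `RENDERERS`, `save_to_disk`) are hypothetical and only meant to illustrate the separation, not to propose a final design.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict


@dataclass
class Card:
    """Format-agnostic card: a title plus named sections in insertion order."""

    title: str
    sections: Dict[str, str] = field(default_factory=dict)

    def add(self, name: str, content: str) -> "Card":
        """Add or overwrite a section and return the card to allow chaining."""
        self.sections[name] = content
        return self


def render_markdown(card: Card) -> str:
    """Render the card as markdown with one second-level heading per section."""
    lines = [f"# {card.title}", ""]
    for name, content in card.sections.items():
        lines += [f"## {name}", "", content, ""]
    return "\n".join(lines)


def render_json(card: Card) -> str:
    """Render the same card as a JSON document."""
    return json.dumps({"title": card.title, "sections": card.sections}, indent=2)


# Output formats are looked up by name, so adding rst or yaml later
# would not require touching Card itself.
RENDERERS: Dict[str, Callable[[Card], str]] = {
    "markdown": render_markdown,
    "json": render_json,
}


def save_to_disk(text: str, path: str) -> None:
    """One possible storage back end; a remote one could share this signature."""
    Path(path).write_text(text, encoding="utf-8")


# Section contents below are placeholders, not real results.
card = Card("SVC on iris").add("Description", "Toy example.")
card.add("Metrics", "accuracy: (placeholder)")
markdown_text = RENDERERS["markdown"](card)
json_text = RENDERERS["json"](card)
```

With this split, changing the output format or the storage back end means swapping one function, at the cost of maintaining the representation ourselves.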

On the other hand, we could make things easier by directly relying on Nate's model card library, making this feature a thin wrapper around it. This will result in less work here but tightly couples this feature to that specific implementation.
