I would like to suggest that we go back to the drawing board to discuss the model cards feature. Below you can find my opinions but of course feel free to disagree.
User
What does the user of this feature want to achieve?
As a user...
- I want to have a convenient way of creating a model card of my trained sklearn model (or sklearn-compatible model).
- I don't want to make big changes to my existing code to be able to use model cards.
- I don't want to be required to learn new concepts for creating a model card.
- I want to output different model card formats (perhaps?).
- I want to have arbitrary sections/figures/tables etc. in the model card (perhaps?).
Supported metrics
Ideally, we want to be able to use as much as possible of what sklearn already provides. These things come to my mind:
- Scalar metrics (accuracy, f1, r2, etc.). Do we want to support user-defined scalar metrics?
- Confusion matrix.
- CV results (hyper-parameter search, cross_validate)
- classification_report
- Some kind of feature importance.
- Model visualizations?
Note that it's absolutely possible to start with a subset of those features and add more later on.
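To make the list above concrete, here is a minimal sketch of how each item could be produced with plain sklearn calls and gathered in one place. The iris dataset, the SVC model, and the `results` dictionary are illustrative assumptions on my side, not part of any proposed API, and permutation importance is just one possible reading of "some kind of feature importance".

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
)
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Hypothetical container: a plain dict, only to show what a card would need to hold.
results = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1_macro": f1_score(y_test, y_pred, average="macro"),
    "confusion_matrix": confusion_matrix(y_test, y_pred, labels=clf.classes_),
    "classification_report": classification_report(y_test, y_pred),
    "cv_results": cross_validate(SVC(random_state=0), X_train, y_train, cv=3),
    "feature_importance": permutation_importance(
        clf, X_test, y_test, n_repeats=5, random_state=0
    ).importances_mean,
}

# Print the type of each collected result.
for name, value in results.items():
    print(name, type(value).__name__)
```

The collected values already span several types (floats, arrays, a string, a dict of arrays), which is worth keeping in mind when deciding what a card has to accept.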
API
Here I believe adoption will be better if users can use their existing code bases/scripts and only need to add a few lines to be able to create model cards.
Below are some suggestions I collected:
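# Suggestion 1: attach sklearn display objects to an existing card.
# X_train, y_train, X_test, y_test and card are assumed to exist already.
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.svm import SVC

clf = SVC(random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
cm = confusion_matrix(y_test, predictions, labels=clf.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
card.add_inspection('Confusion Matrix', disp)

# Suggestion 2: pass precomputed results to a factory function.
# model is any fitted-on-demand sklearn(-compatible) estimator.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
clf_report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
# everything above is just vanilla sklearn, below comes model card stuff
model_card = skops.card.create_model_card(accuracy=accuracy, clf_report=clf_report, conf_matrix=conf_matrix, description=...)
model_card.save(...)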
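For the keyword-argument style (passing precomputed results to a factory function), a rough sketch of what could sit behind such a call is given below. Everything in it is hypothetical: `create_model_card_sketch`, the `SketchCard` class, and its `render`/`save` methods are names made up for illustration, and the markdown layout is only one possibility. Scalars are collected into one table, anything else gets its own section.

```python
import numbers
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchCard:
    """Hypothetical card: a description plus scalar metrics and free-form sections."""

    description: str
    metrics: dict = field(default_factory=dict)
    sections: dict = field(default_factory=dict)

    def render(self) -> str:
        lines = ["# Model card", "", self.description, ""]
        if self.metrics:
            lines += ["## Metrics", "", "| metric | value |", "| --- | --- |"]
            lines += [f"| {name} | {value:.3f} |" for name, value in self.metrics.items()]
            lines.append("")
        for title, body in self.sections.items():
            lines += [f"## {title}", "", str(body), ""]
        return "\n".join(lines)

    def save(self, path) -> None:
        Path(path).write_text(self.render(), encoding="utf-8")


def create_model_card_sketch(description: str, **results) -> SketchCard:
    """Sort keyword results into scalar metrics and free-form sections."""
    card = SketchCard(description=description)
    for name, value in results.items():
        if isinstance(value, numbers.Real) and not isinstance(value, bool):
            card.metrics[name] = float(value)
        else:
            card.sections[name] = value
    return card


# Usage: the two extra lines a user would add after their existing sklearn code.
# The accuracy value and report text are placeholders, not real results.
card = create_model_card_sketch("SVC on toy data", accuracy=0.93, clf_report="precision ...")
card.save(Path(tempfile.gettempdir()) / "model_card_sketch.md")
```

The point of the sketch is that the user-facing surface stays at two calls, while the decision of how each value type is displayed lives inside the card.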
Implementation
It could be advantageous if we use a more generic representation of a "card". This will allow us to more easily change:
- the card format (if there is no fixed specification)
- the storage back end (disk, remote)
- the output formats (markdown, rst, json/yaml?)
On the other hand, we could make things easier by directly relying on Nate's model card library, making this feature a thin wrapper around it. This will result in less work here but tightly couples this feature to that specific implementation.
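To illustrate the generic-representation option, here is a minimal sketch of a card kept as plain data, with output format and storage back end plugged in separately. `GenericCard`, `Section`, `RENDERERS`, and `export` are hypothetical names for this sketch only, and it says nothing about how Nate's library is structured.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class Section:
    title: str
    body: str


@dataclass
class GenericCard:
    """Format-agnostic card: pure data, no knowledge of output format or storage."""

    title: str
    sections: List[Section] = field(default_factory=list)


def to_markdown(card: GenericCard) -> str:
    parts = [f"# {card.title}"]
    parts += [f"## {s.title}\n\n{s.body}" for s in card.sections]
    return "\n\n".join(parts) + "\n"


def to_json(card: GenericCard) -> str:
    return json.dumps(asdict(card), indent=2)


# Renderers are looked up by name, so adding rst or yaml means adding one entry.
RENDERERS: Dict[str, Callable[[GenericCard], str]] = {
    "markdown": to_markdown,
    "json": to_json,
}


def export(card: GenericCard, fmt: str, store: Callable[[str], None]) -> None:
    """Render the card as fmt and hand the text to a storage callable (disk, remote, ...)."""
    store(RENDERERS[fmt](card))


# The metric value below is a placeholder, not a real result.
card = GenericCard("SVC classifier", [Section("Metrics", "accuracy: 0.93")])
collected = []
export(card, "markdown", collected.append)  # an in-memory "back end" for the demo
export(card, "json", collected.append)
```

With this split, wrapping an external model card library would amount to one more renderer, which keeps the coupling to that library in a single place.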