Merged
84 commits
18ef31b
permutation importance
merveenoyan Sep 16, 2022
2ff7a5c
Update examples/plot_model_card.py
merveenoyan Sep 29, 2022
5b8d4c9
Update examples/plot_model_card.py
merveenoyan Nov 7, 2022
ae99b89
Merge branch 'main' into feature_importance
merveenoyan Nov 7, 2022
e80359d
added test and got rid of pandas
merveenoyan Nov 7, 2022
4ef3549
change import
merveenoyan Nov 7, 2022
133cc2e
fixes
merveenoyan Nov 7, 2022
1c448bc
fixes
merveenoyan Nov 7, 2022
95ad03b
updated docs & more
merveenoyan Nov 7, 2022
2bc714f
docs
merveenoyan Nov 7, 2022
b457fb9
added another test, updated docs, will add to model card rst
merveenoyan Nov 8, 2022
7228e83
removed unnecessary files
merveenoyan Nov 8, 2022
9471c76
added importance to model card guide
merveenoyan Nov 8, 2022
48c656d
moved filepaths to tempfile
merveenoyan Nov 8, 2022
a514060
moved filepaths to tempfile
merveenoyan Nov 8, 2022
ac699c5
test windows fix
merveenoyan Nov 8, 2022
76e5b0e
added types
merveenoyan Nov 8, 2022
baccd1b
Update skops/card/_model_card.py
merveenoyan Nov 9, 2022
d3d0c1c
Update skops/card/_model_card.py
merveenoyan Nov 9, 2022
2489693
added matplotlib mock and mock test
merveenoyan Nov 10, 2022
a7ba718
fixed test
merveenoyan Nov 17, 2022
1735a01
forgot to commit this lol
merveenoyan Nov 17, 2022
8dd9692
change type
merveenoyan Nov 17, 2022
1462ac2
Merge branch 'main' into feature_importance
merveenoyan Nov 22, 2022
510d41b
added error and tests
merveenoyan Nov 22, 2022
583c16d
Merge branch 'feature_importance' of github.com:merveenoyan/skops int…
merveenoyan Nov 22, 2022
2828ae2
Merge branch 'main' into feature_importance
merveenoyan Nov 22, 2022
97ebde9
fix for windows tests
merveenoyan Nov 22, 2022
ce75f12
merger
merveenoyan Nov 22, 2022
dd6e7aa
fix for windows tests
merveenoyan Nov 22, 2022
fcc16dd
fix for windows tests
merveenoyan Nov 22, 2022
7fbaf89
fix for windows tests
merveenoyan Nov 22, 2022
a3435b8
fix for windows tests
merveenoyan Nov 23, 2022
abba95b
Merge branch 'main' into feature_importance
merveenoyan Nov 24, 2022
d0adddf
swapped with path
merveenoyan Nov 24, 2022
003544f
added import_or_raise
merveenoyan Nov 29, 2022
1da4747
changed pre-commit config
merveenoyan Nov 29, 2022
280c602
changed pre-commit config
merveenoyan Nov 29, 2022
1521444
black
merveenoyan Nov 29, 2022
61d1784
fixed test
merveenoyan Nov 29, 2022
35b5196
Merge branch 'main' into feature_importance
merveenoyan Nov 29, 2022
5e5d6b3
trigger CI
merveenoyan Nov 29, 2022
2bf6c81
minor fix after merge conflict
merveenoyan Nov 29, 2022
0465163
changed test
merveenoyan Nov 29, 2022
8bf590e
latest version
merveenoyan Dec 14, 2022
64996fb
fix, but no idea why
adrinjalali Dec 14, 2022
298f6fb
Merge branch 'main' into feature_importance
merveenoyan Dec 15, 2022
e03d30f
minor try
merveenoyan Jan 6, 2023
c9b8813
trigger ci
merveenoyan Jan 6, 2023
a71cdac
revert
merveenoyan Jan 6, 2023
9e63a42
mypy fix
merveenoyan Jan 16, 2023
ec82e47
merge main
merveenoyan Jan 16, 2023
e33e254
fixed test and fixture
merveenoyan Jan 17, 2023
da185f5
Merge branch 'main' into feature_importance
merveenoyan Jan 17, 2023
d66be43
fix
merveenoyan Jan 17, 2023
b183d9c
Merge branch 'feature_importance' of github.com:merveenoyan/skops int…
merveenoyan Jan 17, 2023
ff9c7bf
removed redundant fixture
merveenoyan Jan 17, 2023
3a93880
Merge branch 'main' into feature_importance
merveenoyan Jan 20, 2023
bdeabc4
trigger black
merveenoyan Jan 20, 2023
e999a6f
trigger black
merveenoyan Jan 20, 2023
c9219d5
trigger black
merveenoyan Jan 20, 2023
bb864a6
trigger isort
merveenoyan Jan 20, 2023
9113aad
Update skops/card/_model_card.py
merveenoyan Jan 23, 2023
5e80732
Update skops/card/_model_card.py
merveenoyan Jan 23, 2023
322ed4d
Update skops/card/_model_card.py
merveenoyan Jan 23, 2023
e5afe26
Update skops/card/_model_card.py
merveenoyan Jan 23, 2023
95189d8
Update skops/card/_model_card.py
merveenoyan Jan 23, 2023
55b968d
Update skops/card/_model_card.py
merveenoyan Jan 23, 2023
6e3fd2b
removed test, nits and more
merveenoyan Jan 23, 2023
9d98003
Update skops/card/_model_card.py
merveenoyan Jan 23, 2023
17e0253
Update skops/card/_model_card.py
merveenoyan Jan 23, 2023
2b4df99
Merge branch 'main' into feature_importance
merveenoyan Jan 24, 2023
cf18588
iterated
merveenoyan Jan 25, 2023
7a02cd4
added print to debug on ubuntu
merveenoyan Jan 29, 2023
602b8d2
more debugging
merveenoyan Jan 29, 2023
c23f63f
more debugging
merveenoyan Jan 29, 2023
5407351
Merge branch 'skops-dev:main' into feature_importance
merveenoyan Jan 29, 2023
e046f88
removed debug
merveenoyan Jan 29, 2023
a16d83f
removed debugging line from github workflow
merveenoyan Jan 29, 2023
fce9b14
removed mypy ignores
merveenoyan Jan 30, 2023
1281274
Merge branch 'main' into feature_importance
merveenoyan Jan 30, 2023
01a5d78
removed mypy ignores
merveenoyan Jan 30, 2023
77f9f8c
removed mypy ignores
merveenoyan Jan 30, 2023
7c70656
merge local
merveenoyan Jan 30, 2023
5 changes: 4 additions & 1 deletion docs/model_card.rst
@@ -79,7 +79,10 @@ plots, save them on disk and then add them to the card by passing the path name
to the :meth:`.Card.add_plot` method. For tables, you can pass either
dictionaries with the key being the header and the values being list of row
entries, or a pandas ``DataFrame``; use the :meth:`.Card.add_table` method for
this.
this. To add permutation importance results, pass the output of
:func:`sklearn.inspection.permutation_importance` to
:meth:`.Card.add_permutation_importances`; it creates a boxplot and adds it to
the card. For multiple importance plots, pass a distinct file name and title each time.

To add content to an existing subsection, or create a new subsection, use a
``"/"`` to indicate the subsection. E.g. let's assume you would like to add a
9 changes: 9 additions & 0 deletions examples/plot_model_card.py
@@ -20,6 +20,7 @@
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.experimental import enable_halving_search_cv # noqa
from sklearn.inspection import permutation_importance
from sklearn.metrics import (
ConfusionMatrixDisplay,
accuracy_score,
@@ -153,6 +154,14 @@
**{"Model description/Evaluation Results/Confusion Matrix": "confusion_matrix.png"}
)

importances = permutation_importance(model, X_test, y_test, n_repeats=10)
model_card.add_permutation_importances(
importances,
X_test.columns,
plot_file=Path(local_repo) / "importance.png",
plot_name="Permutation Importance",
)

cv_results = model.cv_results_
clf_report = classification_report(
y_test, y_pred, output_dict=True, target_names=["malignant", "benign"]
58 changes: 56 additions & 2 deletions skops/card/_model_card.py
@@ -17,6 +17,7 @@

from skops.card._templates import CONTENT_PLACEHOLDER, SKOPS_TEMPLATE, Templates
from skops.io import load
from skops.utils.importutils import import_or_raise

# Repr attributes can be used to control the behavior of repr
aRepr = Repr()
@@ -206,7 +207,7 @@ def split_subsection_names(key: str) -> list[str]:


def _getting_started_code(
file_name: str, model_format: Literal["pickle", "skops"], indent=" "
file_name: str, model_format: Literal["pickle", "skops"], indent: str = " "
) -> list[str]:
# get lines of code required to load the model
lines = [
@@ -1085,11 +1086,64 @@ def add_metrics(
"You can find the details about evaluation process and "
"the evaluation results."
)

self._metrics.update(kwargs)
self._add_metrics(section, self._metrics, description=description)
return self

def add_permutation_importances(
self,
permutation_importances,
columns: Sequence[str],
plot_file: str = "permutation_importances.png",
plot_name: str = "Permutation Importances",
overwrite: bool = False,
) -> "Card":
"""Plots permutation importance and saves it to model card.

Parameters
----------
permutation_importances : sklearn.utils.Bunch
Output of :func:`sklearn.inspection.permutation_importance`.

columns : str, list or pandas.Index
Column names of the data used to generate importances.

plot_file : str
Filename for the plot.

plot_name : str
Name of the plot.

overwrite : bool (default=False)
Whether to overwrite the permutation importance plot file, if a plot by that
name already exists.

Returns
-------
self : object
Card object.
"""
plt = import_or_raise("matplotlib.pyplot", "permutation importance")

if Path(plot_file).exists() and overwrite is False:
raise ValueError(
f"{str(plot_file)} already exists. Set `overwrite` to `True` or pass a"
" different filename for the plot."
)
sorted_importances_idx = permutation_importances.importances_mean.argsort()
_, ax = plt.subplots()
ax.boxplot(
x=permutation_importances.importances[sorted_importances_idx].T,
labels=columns[sorted_importances_idx],
vert=False,
)
ax.set_title(plot_name)
ax.set_xlabel("Decrease in Score")
plt.savefig(plot_file)
Collaborator:

Not sure if we should care or not, but I wonder if we should check if a file by that name already exists and maybe warn the user? If they re-create the same plot, I guess it's annoying to get a warning. But what if they already happen to have a permutation_importances.png in the folder? Then we would just overwrite that. Really not sure...

Member:

Yes, we should have a flag where, by default, we raise if the file exists and only overwrite if the flag is True: overwrite=True/False

self.add_plot(**{plot_name: plot_file})

return self

def _add_metrics(
self,
section: str,
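As a rough illustration of the ordering step in `add_permutation_importances` above, the sketch below sorts features by their mean importance before they would be handed to a boxplot. It uses plain Python lists rather than NumPy and matplotlib, and the feature names and scores are made-up values, not results from this PR. Note that the method in the diff indexes `columns` directly with the sorted index array, which works for a `pandas.Index` or a NumPy array; a plain list needs a per-element lookup like the one shown here.

```python
from statistics import mean

# Hypothetical stand-in for `permutation_importances.importances`:
# one row per feature, one entry per repeat (values are made up).
importances = [
    [0.02, 0.01, 0.03],  # "sepal width"
    [0.40, 0.35, 0.45],  # "petal length"
    [0.10, 0.12, 0.08],  # "sepal length"
]
columns = ["sepal width", "petal length", "sepal length"]

# Plain-Python counterpart of `importances_mean.argsort()`:
# feature indices in ascending order of mean importance.
sorted_idx = sorted(range(len(importances)), key=lambda i: mean(importances[i]))

# Reorder rows and labels together so every box keeps its own label.
sorted_rows = [importances[i] for i in sorted_idx]
sorted_labels = [columns[i] for i in sorted_idx]

print(sorted_labels)
```

Because a horizontal boxplot (`vert=False`) places the first box at the bottom, ascending order should put the feature with the largest mean importance at the top of the plot.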
97 changes: 96 additions & 1 deletion skops/card/tests/test_card.py
@@ -5,13 +5,14 @@
import textwrap
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pytest
import sklearn
from huggingface_hub import CardData, metadata_load
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.neighbors import KNeighborsClassifier

from skops import hub_utils
@@ -403,6 +404,96 @@ def test_add_twice(self, model_card):
assert text1 == text2


def test_permutation_importances(
iris_estimator, iris_data, model_card, destination_path
):
X, y = iris_data
result = permutation_importance(
iris_estimator, X, y, n_repeats=10, random_state=42, n_jobs=2
)

model_card.add_permutation_importances(
result,
X.columns,
Path(destination_path) / "importance.png",
"Permutation Importance",
)
temp_path = Path(destination_path) / "importance.png"
assert f"![Permutation Importance]({temp_path}" in model_card.render()


def test_multiple_permutation_importances(
iris_estimator, iris_data, model_card, destination_path
):
X, y = iris_data
result = permutation_importance(
iris_estimator, X, y, n_repeats=10, random_state=42, n_jobs=2
)
model_card.add_permutation_importances(
result, X.columns, plot_file=Path(destination_path) / "importance.png"
)
f1 = make_scorer(f1_score, average="micro")
result = permutation_importance(
iris_estimator, X, y, scoring=f1, n_repeats=10, random_state=42, n_jobs=2
)
model_card.add_permutation_importances(
result,
X.columns,
plot_file=Path(destination_path) / "f1_importance.png",
plot_name="Permutation Importance on f1",
)
# check for default one
temp_path = Path(destination_path) / "importance.png"
assert f"![Permutation Importances]({temp_path}" in model_card.render()
# check for F1
temp_path_f1 = Path(destination_path) / "f1_importance.png"
assert f"![Permutation Importance on f1]({temp_path_f1}" in model_card.render()


def test_duplicate_permutation_importances(
iris_estimator, iris_data, model_card, destination_path
):
X, y = iris_data
result = permutation_importance(
iris_estimator, X, y, n_repeats=10, random_state=42, n_jobs=2
)
plot_path = os.path.join(destination_path, "importance.png")
model_card.add_permutation_importances(result, X.columns, plot_file=plot_path)
with pytest.raises(
ValueError,
match=(
"already exists. Set `overwrite` to `True` or pass a"
" different filename for the plot."
),
):
model_card.add_permutation_importances(
result,
X.columns,
plot_file=plot_path,
plot_name="Permutation Importance on f1",
)


def test_duplicate_permutation_importances_overwrite(
iris_estimator, iris_data, model_card, destination_path
):
X, y = iris_data
result = permutation_importance(
iris_estimator, X, y, n_repeats=10, random_state=42, n_jobs=2
)
plot_path = os.path.join(destination_path, "importance.png")
model_card.add_permutation_importances(result, X.columns, plot_file=plot_path)

model_card.add_permutation_importances(
result,
X.columns,
plot_file=plot_path,
plot_name="Permutation Importance on f1",
overwrite=True,
)
assert f"![Permutation Importance on f1]({plot_path}" in model_card.render()


class TestAddGetStartedCode:
"""Tests for getting started code"""

@@ -856,13 +947,17 @@ def test_delete_empty_key_subsection_raises(self, model_card):

class TestAddPlot:
def test_add_plot(self, destination_path, model_card):
import matplotlib.pyplot as plt

plt.plot([4, 5, 6, 7])
plt.savefig(Path(destination_path) / "fig1.png")
model_card = model_card.add_plot(fig1="fig1.png")
plot_content = model_card.select("fig1").content.format()
assert plot_content == "![fig1](fig1.png)"

def test_add_plot_to_existing_section(self, destination_path, model_card):
import matplotlib.pyplot as plt

plt.plot([4, 5, 6, 7])
plt.savefig(Path(destination_path) / "fig1.png")
model_card = model_card.add_plot(**{"Model description/Figure 1": "fig1.png"})
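The duplicate-file tests above exercise the `overwrite` flag that came out of the review discussion. A minimal, stdlib-only sketch of that guard, separated from the plotting code, might look like the following. `check_plot_target` is a hypothetical helper name used only for illustration and is not part of skops.

```python
import tempfile
from pathlib import Path


def check_plot_target(plot_file, overwrite=False):
    """Refuse to reuse an existing file unless `overwrite` is True.

    Hypothetical helper mirroring the check in the diff; not a skops API.
    """
    path = Path(plot_file)
    if path.exists() and not overwrite:
        raise ValueError(
            f"{path} already exists. Set `overwrite` to `True` or pass a"
            " different filename for the plot."
        )
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "importance.png"

    # First call: nothing on disk yet, so the path is accepted.
    check_plot_target(target)
    target.write_bytes(b"")  # pretend a plot was saved here

    # Second call with the default flag: the existing file is protected.
    try:
        check_plot_target(target)
        raised = False
    except ValueError:
        raised = True

    # Explicit opt-in: the same path is accepted again.
    allowed = check_plot_target(target, overwrite=True) == target
```

Raising by default keeps an unrelated `permutation_importances.png` from being clobbered silently, at the cost of requiring `overwrite=True` when a plot is regenerated on purpose.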
1 change: 0 additions & 1 deletion skops/card/tests/test_parser.py
@@ -90,7 +90,6 @@ def test_example_model_cards(tmp_path, file_name):
path = Path(os.getcwd()) / "skops" / "card" / "tests" / "examples"
file0 = path / file_name
diff = (path / file_name).with_suffix(".md.diff")

parsed_card = parse_modelcard(file0)
file1 = tmp_path / "readme-parsed.md"
parsed_card.save(file1)
29 changes: 28 additions & 1 deletion skops/conftest.py
@@ -1,3 +1,4 @@
import builtins
from unittest.mock import patch

import pytest
@@ -7,7 +8,8 @@
def pandas_not_installed():
# patch import so that it raises an ImportError when trying to import
# pandas. This works because pandas is only imported lazily.
orig_import = __import__

orig_import = builtins.__import__

def mock_import(name, *args, **kwargs):
if name == "pandas":
@@ -16,3 +18,28 @@ def mock_import(name, *args, **kwargs):

with patch("builtins.__import__", side_effect=mock_import):
yield


@pytest.fixture
def matplotlib_not_installed():
# patch import so that it raises an ImportError when trying to import
# matplotlib. This works because matplotlib is only imported lazily.

# ugly way of removing matplotlib from cached imports
import sys

for key in list(sys.modules.keys()):
if key.startswith("matplotlib"):
del sys.modules[key]

orig_import = builtins.__import__

def mock_import(name, *args, **kwargs):
if name == "matplotlib":
raise ImportError
return orig_import(name, *args, **kwargs)

with patch("builtins.__import__", side_effect=mock_import):
yield

import matplotlib # noqa
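The fixtures above simulate a missing package by patching `builtins.__import__`. The sketch below shows the same idea outside pytest, as a small context-manager factory. `block_import` is a hypothetical name, and the stdlib module `colorsys` is used only as a harmless stand-in for an optional dependency.

```python
import builtins
from unittest.mock import patch


def block_import(blocked):
    """Return a patcher that makes `import <blocked>` raise ImportError."""
    orig_import = builtins.__import__  # keep the real importer for everything else

    def mock_import(name, *args, **kwargs):
        if name == blocked:
            raise ImportError(f"simulated: {blocked} is not installed")
        return orig_import(name, *args, **kwargs)

    return patch("builtins.__import__", side_effect=mock_import)


# Inside the context manager the import statement fails as if the
# package were absent.
with block_import("colorsys"):
    try:
        import colorsys  # noqa: F401

        blocked_inside = False
    except ImportError:
        blocked_inside = True

# Outside the context manager the import works again.
import colorsys  # noqa: E402

importable_after = hasattr(colorsys, "rgb_to_hsv")
```

One caveat worth keeping in mind: `importlib.import_module` does not go through `builtins.__import__` and returns already-imported modules straight from `sys.modules`, which may be why the `matplotlib_not_installed` fixture also clears the cached `matplotlib` entries first.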
4 changes: 2 additions & 2 deletions skops/hub_utils/_hf_hub.py
@@ -206,7 +206,7 @@ def _create_config(
"text-regression",
],
data,
model_format: Literal[ # type: ignore
model_format: Literal[
"skops",
"pickle",
"auto",
@@ -337,7 +337,7 @@ def init(
"text-regression",
],
data,
model_format: Literal[ # type: ignore
model_format: Literal[
"skops",
"pickle",
"auto",
29 changes: 29 additions & 0 deletions skops/utils/importutils.py
@@ -0,0 +1,29 @@
from importlib import import_module


def import_or_raise(module, feature_name):
"""Raise error if a given library is not present in the environment.

Parameters
----------
module: str
Name of the module.

feature_name: str
Name of the feature the module is required for.

Raises
------
ModuleNotFoundError
Is raised if a given module is not present in the environment
"""
try:
module = import_module(module)
except ImportError as e:
package = module.split(".")[0]
raise ModuleNotFoundError(
Collaborator:

For my understanding: Why not a simple ImportError?

f"{feature_name.capitalize()} requires {package} to be installed. In order"
f" to use {feature_name}, you need to install the package in your current"
" python environment."
) from e
return module
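To show how a helper like this behaves at the call site, here is a self-contained sketch that re-creates the function from the diff and calls it once with a stdlib module and once with a package name assumed not to exist (`surely_not_a_real_package_xyz` is made up for this example). On the reviewer's question: `ModuleNotFoundError` is a subclass of `ImportError`, so callers that catch `ImportError` still catch it. Whether that was the author's reason is not stated in the thread.

```python
from importlib import import_module


def import_or_raise(module, feature_name):
    """Import `module`, or raise an error that names the feature needing it."""
    try:
        return import_module(module)
    except ImportError as e:
        # Report the top-level package, e.g. "matplotlib" for "matplotlib.pyplot".
        package = module.split(".")[0]
        raise ModuleNotFoundError(
            f"{feature_name.capitalize()} requires {package} to be installed. In order"
            f" to use {feature_name}, you need to install the package in your current"
            " python environment."
        ) from e


# An available module is returned as-is.
json_module = import_or_raise("json", "demo feature")

# A missing package produces the feature-specific message.
try:
    import_or_raise("surely_not_a_real_package_xyz.pyplot", "demo feature")
    message = None
except ImportError as err:  # also catches ModuleNotFoundError
    message = str(err)

print(message)
```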
16 changes: 16 additions & 0 deletions skops/utils/tests/test_importutils.py
@@ -0,0 +1,16 @@
import pytest

from skops.utils.importutils import import_or_raise


@pytest.mark.usefixtures("matplotlib_not_installed")
def test_import_or_raise():
with pytest.raises(
ModuleNotFoundError,
match=(
"Permutation importance requires matplotlib to be installed. In order"
" to use permutation importance, you need to install the package in"
" your current python environment."
),
):
import_or_raise("matplotlib", "permutation importance")