Skip to content

DOC california housing example#308

Merged
adrinjalali merged 11 commits intoskops-dev:mainfrom
BenjaminBossan:DOC-california-housing-example
Mar 23, 2023
Merged

DOC california housing example#308
adrinjalali merged 11 commits intoskops-dev:mainfrom
BenjaminBossan:DOC-california-housing-example

Conversation

@BenjaminBossan
Copy link
Copy Markdown
Collaborator

To quote:

The goal of this exercise is to go through a semi-realistic data science
and machine learning task and develop a practical solution for it. We
will learn about the following topics:

  • Perform exploratory data analysis
  • Do some non-trivial feature engineering
  • Explain how the feature engineering informs the choice of machine
    learning model
    and vice versa
  • Show how to make use of a couple of advanced scikit-learn features
    and explain why we use them - Create a model card that provides
    useful information about the model
  • Share the model by uploading it to the Hugging Face Hub

To quote:

The goal of this exercise is to go through a semi-realistic data science
and machine learning task and develop a practical solution for it. We
will learn about the following topics:

- Perform *exploratory data analysis*
- Do some non-trivial *feature engineering*
- Explain how the feature engineering informs the *choice of machine
  learning model* and vice versa
- Show how to make use of a couple of *advanced scikit-learn* features
  and explain why we use them - Create a *model card* that provides
  useful information about the model
- Share the model by uploading it to the *Hugging Face Hub*
@BenjaminBossan
Copy link
Copy Markdown
Collaborator Author

@skops-dev/maintainers ready for review

It was unfortunately quite some work to convert this from ipynb to this percent-py format. Hopefully, I didn't miss anything.

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to ship this, it's pretty nice!

Comment thread examples/plot_model_card.py Outdated
importances,
X_test.columns,
plot_file=Path(local_repo) / "importance.png",
plot_file=str(Path(local_repo) / "importance.png"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change needed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_permutation_importances expects plot_file to be str.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we be fixing that instead?

Comment thread examples/plot_california_housing.py
Comment thread examples/plot_california_housing.py
Comment thread examples/plot_california_housing.py Outdated
@adrinjalali adrinjalali added this to the v0.6 milestone Mar 21, 2023
Copy link
Copy Markdown
Collaborator Author

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied to your comments, please review again.

Comment thread examples/plot_model_card.py Outdated
importances,
X_test.columns,
plot_file=Path(local_repo) / "importance.png",
plot_file=str(Path(local_repo) / "importance.png"),
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_permutation_importances expects plot_file to be str.

Comment thread examples/plot_california_housing.py
Comment thread examples/plot_california_housing.py
Comment thread examples/plot_california_housing.py Outdated
Previously, it was annotated as only taking str.
@adrinjalali
Copy link
Copy Markdown
Member

RTD failing?

@BenjaminBossan
Copy link
Copy Markdown
Collaborator Author

For posterity, a quick summary of what caused the bug:

When sphinx-gallery runs the examples, it seems to use a single process for all. Because of that, sklearnex's patch_sklearn() was active when running the new example. This resulted in the new example using the patched KNeighborRegressor, even though it was not supposed to.

KNeighborRegressor from sklearnex has a bug, namely that it has a classes_ attribute, even though it is a regressor. Consequently, DecisionBoundaryDisplay fails because it makes a check hasattr(estimator, "classes_") to determine if the estimator is a classifier. This will result in the observed failure down the line.

When running the new example in isolation, the bug does not occur because patching is not happening.

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the debugging @BenjaminBossan , this was nasty to deal with.

cc @ahuber21, @napetrov, your example is causing very hard to debug issues. We're removing relevant pieces from the example.

Comment thread examples/plot_intelex.py Outdated
Comment on lines 247 to 248
# in faster inference times, since loading a persisted model will always load
# the objects exactly as they were saved.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment section should also be removed, I think we should remove all mentions of patch_sklearn() since it should never been used.

@ahuber21
Copy link
Copy Markdown
Contributor

As an intermediate check, you could also call unpatch_sklearn() to restore the original state in the sphinx-gallery process. The part of the example using patch_sklearn() was just left there to clarify the behavior of loading unpatched models, since it caused some discussion before. Sorry for the trouble!

@adrinjalali adrinjalali changed the title Doc california housing example DOC california housing example Mar 23, 2023
@adrinjalali adrinjalali merged commit 3490842 into skops-dev:main Mar 23, 2023
@BenjaminBossan BenjaminBossan deleted the DOC-california-housing-example branch March 23, 2023 14:07
@napetrov
Copy link
Copy Markdown

@BenjaminBossan - thanks for reporting - fixing this here - uxlfoundation/scikit-learn-intelex#1224

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants