Integrate skops serialization format #174
merveenoyan merged 6 commits into huggingface:main from merveenoyan:integrate_skops
merveenoyan left a comment
Left some notes @adrinjalali @BenjaminBossan
            config.get("sklearn", {}).get("model", {}).get("file", DEFAULT_FILENAME)
        )
        self.model_format = (
            config.get("sklearn", {}).get("model_format", "pickle")
FYI, this assumes the file is pickle when no format is given, because there are no skops-format models without a config. The repos without a config exist only to keep supporting vanilla sklearn models that weren't uploaded with skops.
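The fallback described in this comment can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the helper name `resolve_model_settings` and the value given to `DEFAULT_FILENAME` are assumptions made for the example; only the nested `.get` lookups and the "pickle" default come from the diff.

```python
# Hypothetical value; the real DEFAULT_FILENAME is defined elsewhere in the codebase.
DEFAULT_FILENAME = "sklearn_model.joblib"


def resolve_model_settings(config):
    """Return (model_file, model_format) from a parsed config dict.

    Hypothetical helper, written only to illustrate the defaults.
    """
    sklearn_cfg = config.get("sklearn", {})
    model_file = sklearn_cfg.get("model", {}).get("file", DEFAULT_FILENAME)
    # A missing "model_format" key is treated as pickle, since repos
    # without a config are assumed to hold vanilla sklearn pickles.
    model_format = sklearn_cfg.get("model_format", "pickle")
    return model_file, model_format


# No config at all: default file name, pickle format.
print(resolve_model_settings({}))

# Config written for a skops-format model (file name is illustrative).
print(
    resolve_model_settings(
        {"sklearn": {"model": {"file": "model.skops"}, "model_format": "skops"}}
    )
)
```

Because every lookup has a default, an empty or partial config never raises; it simply resolves to the backwards-compatible pickle path.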
    "skops-tests/iris-sklearn-latest-logistic_regression-without-config",
    "skops-tests/iris-sklearn-latest-hist_gradient_boosting-with-config",
    "skops-tests/iris-sklearn-latest-hist_gradient_boosting-without-config",
    "skops-tests/iris-sklearn-1.0-logistic_regression-with-config-skops",
I added a skops test case for every pickle test case with config. Do you think I should have reduced them? GitHub Actions never works on these tests anyway, so the only real limitation is local development. @BenjaminBossan @adrinjalali
It would be nice if we could eventually get GitHub Actions to work on these, in which case the number of cases might matter. Otherwise, I don't have an opinion on that.
BenjaminBossan left a comment
Overall, looks very good, thanks! I have some minor comments only.
    self.model = joblib.load(
        open(Path(cached_folder) / self.model_file, "rb")
    )
    if len(record) > 0:
Should this whole check not be outside of the if self.model_format == "pickle": condition? Otherwise, the warning is only recorded when the format is pickle.
You're right, I just addressed it.
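The restructuring suggested in this thread could look roughly like the sketch below: warnings are recorded while loading in either format, and the `len(record) > 0` check sits after the format branch instead of inside it. The function name `load_and_collect_warnings` and the injected `loaders` mapping are hypothetical, used here so the example runs without joblib or skops installed; the real code calls the actual loaders directly.

```python
import warnings


def load_and_collect_warnings(model_format, path, loaders):
    """Load a model and return it with any warnings raised while loading.

    `loaders` maps a format name to a callable taking a path. It is a
    hypothetical stand-in for the real joblib and skops loaders.
    """
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        if model_format == "pickle":
            model = loaders["pickle"](path)
        elif model_format == "skops":
            model = loaders["skops"](path)
        else:
            raise ValueError(f"Unknown model format: {model_format!r}")

    # The check lives outside the format branch, so warnings are
    # collected for both pickle and skops files.
    messages = []
    if len(record) > 0:
        messages = [str(w.message) for w in record]
    return model, messages


def fake_loader(path):
    # Stand-in loader that mimics a version-mismatch warning.
    warnings.warn("model was saved with a different sklearn version")
    return {"loaded_from": path}


model, messages = load_and_collect_warnings(
    "skops", "model.skops", {"pickle": fake_loader, "skops": fake_loader}
)
print(messages)
```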
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer, StandardScaler
    from skops import hub_utils
    from skops import hub_utils, io
We wanted to use import skops.io as sio rather than from skops import io.
    repo_name = REPO_NAMES[task_name].format(
        version=version, est_name=est_name, w_or_wo="without"
    )
    if serialization_format == "pickle":
Could you explain why this is only done for "pickle"?
It doesn't make sense to push a skops-format file without a config, based on the assumption that if someone pushes a skops model, the config should be there. The without-config case is only kept to maintain backwards compatibility for vanilla sklearn models that are serialized using pickle and pushed without skops. This way, there are fewer test cases too.
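The resulting test matrix can be sketched as below. The template string and the "-skops" suffix are inferred from the repo names quoted in this thread, and the function name `expected_repo_names` is hypothetical; the real script looks names up in `REPO_NAMES[task_name]`, whose exact contents are not shown here.

```python
# Assumed template, reconstructed from the repo names quoted in this thread.
REPO_TEMPLATE = "skops-tests/iris-sklearn-{version}-{est_name}-{w_or_wo}-config"


def expected_repo_names(versions, est_names, formats=("pickle", "skops")):
    """List the test repos: skops-format models always come with a config."""
    names = []
    for version in versions:
        for est_name in est_names:
            for serialization_format in formats:
                with_config = REPO_TEMPLATE.format(
                    version=version, est_name=est_name, w_or_wo="with"
                )
                if serialization_format == "skops":
                    # Assumed suffix, as in "...-with-config-skops".
                    with_config += "-skops"
                names.append(with_config)

                # Only pickle gets a "without config" repo, for backwards
                # compatibility with vanilla sklearn uploads.
                if serialization_format == "pickle":
                    names.append(
                        REPO_TEMPLATE.format(
                            version=version, est_name=est_name, w_or_wo="without"
                        )
                    )
    return names


for name in expected_repo_names(["1.0"], ["logistic_regression"]):
    print(name)
```

With this layout each estimator and version pair yields three repos instead of four, since the skops-without-config combination is never created.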
merveenoyan left a comment
@BenjaminBossan I addressed all your comments
@adrinjalali @BenjaminBossan All of the tests pass except for the single one with methodcaller and skops. If you think this is going to be fixed on the skops side, I don't think we should swap it.
Addressed comments and applied style changes.
However, even if it gets merged, it might take a while before we have our next skops release, since we just had one very recently.
@BenjaminBossan should I merge?
It's not deployed yet. We're migrating some stuff to support the private hub. Should be deployed tomorrow.