MNT: Suggestion how to simplify serialization of funcs#320
MNT: Suggestion how to simplify serialization of funcs#320BenjaminBossan wants to merge 2 commits intoskops-dev:mainfrom
Conversation
Description It looks like the state we save for functions contains unnecessary information (the class name, which is not used) and duplicate information (the module name appears twice). This change gets rid of that unnecessary information. This also allows to remove the need for ufunc_get_state, which can now be replaced by function_get_state. Comment Maybe I'm missing something and there is a reason for the previous structure, but if there is, there should be a test to check whatever is covered by that, or at the very least a comment to explain. Btw. can't remember who wrote this, could be my fault :)
|
@skops-dev/maintainers ready for review |
| @@ -200,13 +201,9 @@ def _construct(self): | |||
| # get_state method for them here. The load is the same as other functions. | |||
| def ufunc_get_state(obj: Any, save_context: SaveContext) -> dict[str, Any]: | |||
There was a problem hiding this comment.
did you want to remove this then?
There was a problem hiding this comment.
ah yes, good catch, done
|
We finally got here. This simplification means users with a new skops version won't be able to load a model dumped with an older skops version. So we need to increase the file format version with this, and somehow route loading to the right methods. The "easiest" way would be to have a separate folder/module for each version, I'm not sure how we can do better. |
|
Aha, yes, totally right.
I assume that is to avoid cluttering the code with It would be nice if we could automatically dispatch not only on type but also protocol. Then we could have different versions for, say, |
The comparison here is with adding a version to the dispatcher sounds like a very good idea to me, do you think you could come up with a prototype for that? |
Hmm, yes and no. Of course, we're positioning ourselves to replace pickle for this purpose. But if a user wants to load a 5 year old sklearn model dumped with skops, they would either be on an unsupported sklearn version or they would load with a new sklearn version, which gives no guarantees that the model still works. I think it would provide tremendous value if we could give long guarantees, longer than sklearn would. But I think we should allow ourselves a "breaking" release (skops 1.0) as we should expect the format to still change quite a lot.
Not sure, given the problems I mentioned above, I'll think about it. |
Taking the changes from skops-dev#320 to how functions are persisted and adding them here. Therefore, this PR supersedes skops-dev#320. That change is added here because it is the perfect test case for the update route of the skops protocol.
|
Closed in favor of #322 |
Description
It looks like the state we save for functions contains unnecessary information (the class name, which is not used) and duplicate information (the module name appears twice). This change gets rid of that unnecessary information.
This also allows to remove the need for
ufunc_get_state, which can now be replaced byfunction_get_state.Comment
Maybe I'm missing something and there is a reason for the previous structure, but if there is, there should be a test to check whatever is covered by that, or at the very least a comment to explain.
Btw. can't remember who wrote this, could be my fault :)