Allow pickle to fall back to dask_serialize #7567
    try:
        serialize = dask_serialize.dispatch(type(obj))
        deserialize = dask_deserialize.dispatch(type(obj))
        return deserialize, serialize(obj)
    except TypeError:
I assume we should also handle other serialization families here, e.g. cuda, shouldn't we? Any suggestions on how to do this in an elegant way?
I'm not sure how widespread the use of families truly is. Open to suggestions.
A possible way of dealing with this would be a custom dispatcher that controls this specific behavior and isn't entangled with the other serialization logic. Thoughts?
In the existing test suite there are only a few types that are actually not pickle-able. For instance, h5py forbids pickle by default; most other objects are pickle-able. This raises the question of whether this Pickler object should first attempt to pickle the data ordinarily before falling back to dask_serialize.
Unit Test Results: 24 files ±0, 24 suites ±0, total runtime 10h 19m 47s (+3m 43s). Results for commit 354308b, compared against base commit 9e8876d.
In general I think that this is fine. Pickle should be fine in almost all cases, and the serialization families idea was premature (pretty much everything is in the [...]). I think that we can be pretty lax here.
I chose the "try standard pickle first" approach, so that we only use the dispatching if pickle doesn't work, i.e. in most cases we just use plain pickle. I'll move forward with this once CI is green-ish. If there are issues with this, we can follow up later, I think.
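For illustration, the "try standard pickle first" idea could be sketched roughly as below. This is not the code from this PR: `Handle`, `rebuild_handle`, `reduce_handle`, and `FALLBACK_REDUCERS` are hypothetical names, and a per-pickler `dispatch_table` from the standard library stands in for the `dask_serialize` dispatch so that the example runs without distributed installed.

```python
import copyreg
import io
import pickle


class Handle:
    """Toy stand-in for an object that forbids pickling (as h5py objects do)."""

    def __init__(self, path):
        self.path = path

    def __reduce_ex__(self, protocol):
        raise TypeError("Handle objects cannot be pickled")


def rebuild_handle(path):
    return Handle(path)


def reduce_handle(handle):
    # Hypothetical fallback reducer; plays the role of a dask_serialize handler.
    return rebuild_handle, (handle.path,)


# Hypothetical table of fallback reducers, keyed by exact type.
FALLBACK_REDUCERS = {Handle: reduce_handle}


def dumps(obj):
    """Try plain pickle first; only consult the fallback reducers on failure."""
    try:
        return pickle.dumps(obj)
    except Exception:
        buf = io.BytesIO()
        pickler = pickle.Pickler(buf)
        # Entries in a pickler's dispatch_table take priority over __reduce_ex__.
        pickler.dispatch_table = {**copyreg.dispatch_table, **FALLBACK_REDUCERS}
        pickler.dump(obj)
        return buf.getvalue()
```

With this sketch, ordinary objects never touch the fallback table, while something like `pickle.loads(dumps([Handle("/data/file.h5"), 42]))` only succeeds through the second attempt. The cost is that an unpicklable object is traversed twice.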
This is a requirement for HLG serialization via pickle, see #7564
To allow graphs to be shipped via pickle, we need to implement a custom Pickler class, since otherwise user arguments to graphs may end up not being serializable. One example is h5py objects, which intentionally disallow pickling. All of these cases are already handled by our dask_(de)serialize dispatch, so the new pickler simply uses the dask serializer if one is available and falls back to ordinary pickle otherwise.
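A minimal, self-contained sketch of such a pickler, assuming Python 3.8+ for `Pickler.reducer_override`. The `_HANDLERS` registry and `lookup` helper are hypothetical stand-ins for the `dask_serialize` / `dask_deserialize` dispatch, and `Handle`, `dump_handle`, and `load_handle` are made up for the example; none of this is taken from the PR itself.

```python
import io
import pickle

# Hypothetical registry standing in for the dask_serialize / dask_deserialize
# dispatch: type -> (serialize, deserialize).
_HANDLERS = {}


def lookup(cls):
    """Return the handlers registered for cls or a base class, else raise TypeError."""
    for base in cls.__mro__:
        if base in _HANDLERS:
            return _HANDLERS[base]
    raise TypeError(f"no serializer registered for {cls.__name__}")


class DispatchPickler(pickle.Pickler):
    """Pickler that prefers registered handlers over ordinary pickling."""

    def reducer_override(self, obj):
        try:
            serialize, deserialize = lookup(type(obj))
        except TypeError:
            # No handler registered: let pickle treat the object as usual.
            return NotImplemented
        # Reduce tuple: a callable plus the argument tuple used to rebuild obj.
        return deserialize, (serialize(obj),)


def dumps(obj):
    buf = io.BytesIO()
    DispatchPickler(buf).dump(obj)
    return buf.getvalue()


class Handle:
    """Toy object that refuses ordinary pickling, as h5py objects do."""

    def __init__(self, path):
        self.path = path

    def __reduce_ex__(self, protocol):
        raise TypeError("Handle objects cannot be pickled")


def dump_handle(handle):
    return handle.path


def load_handle(path):
    return Handle(path)


_HANDLERS[Handle] = (dump_handle, load_handle)
```

The output is an ordinary pickle stream, so plain `pickle.loads(dumps({"file": Handle("data.h5")}))` restores the object without a custom unpickler. Returning `NotImplemented` from `reducer_override` is what hands everything without a handler back to regular pickle.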