[WIP] Fine grained serialization #4897
Closed
Warning: this is very much a work in progress.
This PR implements fine-grained serialization by serializing only non-msgpack-serializable objects.
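For illustration, here is a minimal stdlib-only sketch of the idea: walk a task message, leave msgpack-native values inline, and wrap only the objects msgpack cannot encode. The `Serialized` wrapper and `fine_grained_serialize` helper below are hypothetical stand-ins, not this PR's actual code.

```python
import pickle

# Types msgpack can encode natively (a simplification for illustration).
MSGPACK_NATIVE = (type(None), bool, int, float, str, bytes)

class Serialized:
    """Hypothetical wrapper marking a value that had to be pickled."""
    def __init__(self, payload: bytes):
        self.payload = payload

def fine_grained_serialize(obj):
    """Recursively keep msgpack-native values as-is; wrap everything else."""
    if isinstance(obj, MSGPACK_NATIVE):
        return obj
    if isinstance(obj, (list, tuple)):
        return [fine_grained_serialize(v) for v in obj]
    if isinstance(obj, dict):
        return {k: fine_grained_serialize(v) for k, v in obj.items()}
    return Serialized(pickle.dumps(obj))

task = {"func": len, "args": [[1, 2, 3], "plain string"]}
msg = fine_grained_serialize(task)
# Only the function is wrapped; the arguments stay inspectable.
assert isinstance(msg["func"], Serialized)
assert msg["args"] == [[1, 2, 3], "plain string"]
```

Because the plain arguments survive as ordinary msgpack values, the Scheduler can read or rewrite them without deserializing the whole task.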
Motivation
In `main` we serialize: `to_serialize()` on both the function and all its arguments, then `dumps_function()` on the function and `pickle.dumps()` on its arguments. This means that once serialized, we cannot access or modify the function arguments, which can be a problem: #4673.

It also means that we have two separate code paths for nested and non-nested tasks from the Scheduler to the Worker.
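A small sketch of why the current coarse-grained approach is limiting. The `coarse_serialize` helper below is a hypothetical stand-in for the `dumps_function()` / `pickle.dumps()` pair; once the arguments become a single pickle blob, nothing can be inspected without a full round trip.

```python
import pickle

def coarse_serialize(func, args):
    """Hypothetical stand-in for the current approach:
    pickle the function and all arguments as opaque blobs."""
    return pickle.dumps(func), pickle.dumps(args)

func_blob, args_blob = coarse_serialize(max, ([3, 1, 2],))

# The scheduler now holds only bytes: it cannot inspect or rewrite an
# individual argument without deserializing and reserializing everything.
assert isinstance(args_blob, bytes)
args = pickle.loads(args_blob)  # full round trip just to look inside
assert args == ([3, 1, 2],)
```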
The Protocol
Notice
- Serialization now uses msgpack (see `msgpack_persist_lists()`).
- `{"x": None}` will result in a task just containing `None`. This is a potential problem for the Scheduler, which we have to handle.

Related issues:

- Serialize objects within tasks #4673
- tuples & lists in MsgPack serialization #4575
- `dask.dataframe.read_csv('./filepath/*.csv')` returning tuple dask#7777
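The `{"x": None}` ambiguity can be illustrated with a stdlib-only sketch. Assume (hypothetically; this is not the PR's actual code) a pass that replaces non-msgpack-serializable values with `None`: a genuine `None` then becomes indistinguishable from a stripped object.

```python
def strip_unserializable(d):
    """Hypothetical pass replacing non-msgpack-serializable values with None."""
    native = (type(None), bool, int, float, str, bytes)
    return {k: (v if isinstance(v, native) else None) for k, v in d.items()}

stripped = strip_unserializable({"x": None, "y": object()})

# A genuine None and a stripped object now look identical, so the
# Scheduler cannot tell them apart without extra bookkeeping.
assert stripped == {"x": None, "y": None}
```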