Use memoryview in unpack_frames
#3980
Merged
As part of `unpack_frames`, we slice out each frame we'd like to extract (see code snippet below).

distributed/distributed/protocol/utils.py
Line 135 in 8a0e4b6
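To make the copy-versus-view distinction concrete, here is a minimal, hypothetical sketch. It is not the code at the line referenced above. It assumes a made-up buffer layout: a native unsigned 64-bit frame count, then one such length per frame, then the frame payloads. `pack_frames_sketch` and `unpack_frames_sketch` are illustrative names only. Slicing a `bytes` input copies each frame. Coercing the input to a `memoryview` first makes each slice a zero-copy view, and `struct.unpack_from` accepts either form.

```python
import struct

# Hypothetical layout (an assumption for illustration, not necessarily the
# wire format used by distributed): a native unsigned 64-bit frame count,
# one native unsigned 64-bit length per frame, then the frame payloads.
FMT = "Q"
FMT_SIZE = struct.calcsize(FMT)


def pack_frames_sketch(frames):
    """Hypothetical helper: build a buffer in the layout described above."""
    header = struct.pack(f"{1 + len(frames)}{FMT}", len(frames), *map(len, frames))
    return header + b"".join(frames)


def unpack_frames_sketch(b, use_memoryview=True):
    """Split a packed buffer back into frames.

    With use_memoryview=True the input is coerced to a memoryview, so each
    frame is a zero-copy view onto ``b``. Otherwise slicing ``bytes`` copies
    every frame into a new ``bytes`` object.
    """
    if use_memoryview:
        b = memoryview(b)
    # struct.unpack_from accepts any bytes-like object, including memoryview.
    (n_frames,) = struct.unpack_from(FMT, b)
    lengths = struct.unpack_from(f"{n_frames}{FMT}", b, FMT_SIZE)
    frames = []
    start = FMT_SIZE * (1 + n_frames)
    for length in lengths:
        end = start + length
        frames.append(b[start:end])  # a view or a copy, depending on the type of b
        start = end
    return frames


frames = [b"abc", b"", b"hello"]
packed = pack_frames_sketch(frames)
views = unpack_frames_sketch(packed)
print([bytes(v) for v in views])  # [b'abc', b'', b'hello']
print(views[0].obj is packed)     # True: the view shares packed's buffer
```

When the input is `bytes`, the resulting views are read-only. Code that needs mutable frames still has to copy once (for example with `bytearray(frame)`), but that single copy replaces, rather than adds to, the copy that slicing `bytes` would otherwise make up front.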
However, this slicing makes a copy, which increases memory usage and creates a notable bottleneck when unpacking frames. Closer inspection of `unpack_frames` shows that these copies dominate the time spent in that function and account for roughly half of the time spent in `deserialize_bytes`. Also, since `deserialize_bytes` typically works with a `bytes` object, these frames end up being `bytes` objects, which we later need to copy again to produce mutable frames (see PR #3967 and related context). In other words, performing a copy in `unpack_frames` is wasted effort.

To fix this issue, we coerce the input of `unpack_frames` to a `memoryview`. Slicing then merely produces views onto the underlying data, which is essentially free. This avoids the copy and alleviates the bottleneck. It also just works with most Python calls (like `struct.unpack_from`), since they accept any bytes-like object and therefore work on `memoryview`s.

The benchmark below uses `deserialize_bytes`, part of the unspilling code path, which calls into `unpack_frames`. This change speeds up the unspilling code path by ~50%.

Before:
After: