@jakirkham

Ensure that `bytes` and `bytearray` serialization is handled correctly for each type. Also add a fast path for the common case where only a single frame of the right type is provided. This builds on the work in PR ( #4004 ) to improve serialization further, and it makes serialization of these types more efficient. For example, compare `bytearray` serialization before and after this change.

Before:

In [1]: from distributed.protocol import serialize, deserialize

In [2]: b = 1_000_000 * bytearray(b"abc")

In [3]: %timeit deserialize(*serialize(b))
137 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

After:

In [1]: from distributed.protocol import serialize, deserialize

In [2]: b = 1_000_000 * bytearray(b"abc")

In [3]: %timeit deserialize(*serialize(b))
6.37 µs ± 51 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Handle these two separately to ensure we are creating the right types in
each respective case.

Make sure that `bytes` and `bytearray` types are deserialized correctly
even if the frames are of a different type or more frames are involved.
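
The idea described above can be illustrated with a minimal sketch. This is not the code from this PR: `join_frames_as` is a hypothetical helper written only to show the two cases, a fast path that returns a single frame unchanged when it already has the requested type, and a general path that joins the frames and converts the result.

```python
def join_frames_as(frames, out_type):
    """Hypothetical helper (not part of distributed): rebuild an object of
    `out_type` (bytes or bytearray) from a list of bytes-like frames."""
    if len(frames) == 1 and type(frames[0]) is out_type:
        # Fast path: one frame that already has the requested type,
        # so it can be returned as-is without copying.
        return frames[0]
    # General path: frames may be several, or of another bytes-like type
    # (bytes, bytearray, memoryview); join them, then convert.
    return out_type(b"".join(frames))


frame = bytearray(b"abc")
same = join_frames_as([frame], bytearray)            # fast path, no copy
as_bytes = join_frames_as([frame], bytes)            # type differs, converted
joined = join_frames_as([b"ab", memoryview(b"c")], bytearray)  # several frames
```

Skipping the copy in the single-frame case is the kind of saving that could explain a timing difference like the one shown above, though the actual change may differ in detail.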
@mrocklin mrocklin merged commit 4311caf into dask:master Aug 3, 2020
@jakirkham jakirkham deleted the improve_bytes_serialization branch August 3, 2020 17:26