Skip to content

Conversation

@jakirkham
Copy link
Member

@jakirkham jakirkham commented Jul 14, 2020

As the frames we receive are typically mutable, non-bytes objects like bytearrays or NumPy ndarrays, coercing to bytes at this stage triggers a copy of all frames. As we are going to toss those copied versions anyways when joining them into a larger bytes object, this ends up being wasteful with memory. Fortunately bytes.join(...) accepts any and all bytes-like objects. So instead just pass them all through as-is to bytes.join(...), which is free and doesn't require a copy. Should cutdown on the memory usage in this part of the code.

As the frames we receive are typically mutable, non-`bytes` objects like
`bytearray`s or NumPy `ndarray`s, coercing to `bytes` at this stage
triggers a copy of all frames. As we are going to toss those copied
versions anyways when joining them into a larger `bytes` object, this
ends up being wasteful with memory. Fortunately `bytes.join(...)`
accepts any and all `bytes`-like objects. So instead just pass them all
through as-is to `bytes.join(...)`, which is free and doesn't require a
copy.  Should cutdown on the memory usage in this part of the code.
@jakirkham
Copy link
Member Author

jakirkham commented Jul 14, 2020

FWIW here's a little example showing this works.

In [1]: import numpy as np                                                      

In [2]: b"".join([ 
   ...:     b"abc", 
   ...:     bytearray(b"def"), 
   ...:     memoryview(b"ghi"), 
   ...:     np.arange(106, 109, dtype="u1") 
   ...: ])                                                                      
Out[2]: b'abcdefghijkl'

This was raised in Python issue ( https://bugs.python.org/issue15958 ) and implemented in commit ( python/cpython@cfc22b4 ) as part of Python 3.4+.

@quasiben
Copy link
Member

I kicked the CI here so test should be passing now

@mrocklin mrocklin merged commit 1d92125 into dask:master Jul 14, 2020
@jakirkham jakirkham deleted the avoid_copy_merge_frames branch July 14, 2020 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants