Conversation


@ogrisel ogrisel commented Dec 18, 2017

This is a work-in-progress integration of an experimental cloudpickle branch that allows no-copy dump and load of nested numpy arrays via a bytelist API.

This relies on: cloudpipe/cloudpickle#138

In particular, this should help make the workers more stable when dealing with large numpy arrays or pandas data frames: spilling to disk (and loading spilled data structures back) should no longer incur large temporary buffer allocations.

There are still broken tests (for instance, I just noticed that pickling arrays of objects is broken), and the `_BytelistFile` helper class is not properly tested, but I wanted to do a full run on CI and communicate the final goal of my work on cloudpickle to the other cloudpickle and dask developers.

@mrocklin

I'm glad to see this work happen. Hopefully the Dask test suite can provide some useful feedback!


ogrisel commented Dec 18, 2017

I have to work on other things in the coming days but plan to resume work ASAP.


ogrisel commented Mar 18, 2019

Closing this, as cloudpipe/cloudpickle#138 was closed in favor of PEP 574 (pickle protocol 5 with out-of-band buffers) in upstream Python and numpy.
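For reference, a minimal sketch of the PEP 574 mechanism that superseded this branch: pickle protocol 5 lets large buffers be handed to a `buffer_callback` out-of-band instead of being copied into the pickle byte stream, which is what avoids the large temporary allocations this PR targeted. This assumes Python >= 3.8 and a numpy version with protocol-5 support; it illustrates the standard-library API, not the closed bytelist branch.

```python
import pickle
import numpy as np

# A large array whose data we want to move without copying it
# into the pickle byte stream.
arr = np.arange(1_000_000, dtype=np.float64)

buffers = []
# With protocol 5 (PEP 574), the array's data is passed to
# buffer_callback as PickleBuffer objects instead of being
# serialized inline.
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The payload is tiny (just metadata); the actual data lives in
# `buffers` and can be written to disk or a socket separately.
restored = pickle.loads(payload, buffers=buffers)
assert np.array_equal(arr, restored)
```

A distributed worker spilling to disk can write `payload` and each buffer to files directly, then reassemble them on load without an intermediate in-memory copy of the whole stream.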

@ogrisel ogrisel closed this Mar 18, 2019
@ogrisel ogrisel deleted the cloudpickle_dump_load_bytelist branch March 18, 2019 09:21