Question: would the Dask/Distributed community be interested in an improved memory-spilling model that fixes the shortcomings of the current one but makes use of proxy-object wrappers?
In Dask-CUDA we have introduced a new approach to memory spilling that handles object aliasing and JIT memory un-spilling: rapidsai/dask-cuda#451
The result is memory spilling that:
- Avoids double counting: Double Counting and Issues w/Spilling #4186
- Avoids spilling of the same object multiple times.
- Avoids memory spikes because of incorrect memory tally.
- Implements just-in-time un-spilling: Delay deserialization of Data in workers until actual usage. #3998
- Supports communication of spilled data, so that GPU data doesn't have to be un-spilled just to be spilled again as part of communication: [FEA] Allow communicating spilled data rapidsai/dask-cuda#342
- A first step toward partially spilled objects, such as spilling individual columns of a data frame: Ability to have output dask_cudf.DataFrame not necessarily all in memory BlazingDB/blazingsql#1128
The current implementation in Dask-CUDA handles CUDA device objects, but it is possible to generalize the approach to also handle spilling to disk.
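For illustration, here is a minimal hypothetical sketch of the core idea (not the actual Dask-CUDA implementation; all names are made up): a proxy that spills its wrapped object to disk and un-spills it just in time, on first attribute access:

```python
import os
import pickle
import tempfile


class DiskProxy:
    """Hypothetical sketch: wrap an object, spill it to disk on demand,
    and un-spill it lazily on first attribute access (JIT un-spilling)."""

    def __init__(self, obj):
        self._obj = obj
        self._path = None

    def spill(self):
        """Serialize the wrapped object to a temporary file and drop
        the in-memory reference."""
        if self._obj is not None:
            fd, self._path = tempfile.mkstemp()
            with os.fdopen(fd, "wb") as f:
                pickle.dump(self._obj, f)
            self._obj = None  # free the in-memory copy

    def _unspill(self):
        """Load the object back from disk if it is currently spilled."""
        if self._obj is None:
            with open(self._path, "rb") as f:
                self._obj = pickle.load(f)
            os.remove(self._path)
            self._path = None
        return self._obj

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself,
        # i.e. anything belonging to the wrapped object - this is the
        # point where un-spilling happens just in time.
        return getattr(self._unspill(), name)
```

Because the worker only ever holds the proxy, the same wrapped object is counted (and spilled) once, no matter how many task keys alias it.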
The disadvantage of this approach is that the proxy objects get exposed to the users. The inputs to a task might be wrapped in a proxy object, which doesn't mimic the proxied object perfectly. E.g.:
```python
# Type checking using isinstance() works as expected, but direct type checking doesn't:
>>> import numpy as np
>>> from dask_cuda.proxy_object import asproxy
>>> x = np.arange(3)
>>> isinstance(asproxy(x), type(x))
True
>>> type(asproxy(x)) is type(x)
False
```

Because of this, the approach shouldn't be enabled by default. But do you think that the Dask community would be interested in a generalization of this approach? Or is the proxy-object hurdle too much of an issue?
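For readers curious how this behavior arises, here is a hedged, self-contained sketch (hypothetical class name, not the Dask-CUDA code): exposing the wrapped object's class via a `__class__` property makes `isinstance()` succeed, while `type()` still reveals the proxy:

```python
class TransparentProxy:
    """Hypothetical sketch of proxy transparency: isinstance() consults
    __class__, so reporting the wrapped object's class makes isinstance
    checks pass, but type() bypasses __class__ and exposes the proxy."""

    def __init__(self, obj):
        # Bypass our own attribute machinery while setting up.
        object.__setattr__(self, "_obj", obj)

    @property
    def __class__(self):
        # Report the wrapped object's class for isinstance() checks.
        return type(self._obj)

    def __getattr__(self, name):
        # Forward everything else to the wrapped object.
        return getattr(self._obj, name)
```

This is the fundamental limit of the technique: any code doing exact `type(...) is ...` comparisons sees the proxy, which is why opt-in rather than default behavior seems safer.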
cc. @mrocklin, @jrbourbeau, @quasiben