[Question] A new approach to memory spilling #4568

@madsbk

Description

Question: would the Dask/Distributed community be interested in an improved memory-spilling model that fixes the shortcomings of the current one but makes use of proxy object wrappers?

In Dask-CUDA we have introduced a new approach to memory spilling that handles object aliasing and JIT (just-in-time) memory un-spilling: rapidsai/dask-cuda#451

The result is memory spilling that handles object aliasing and un-spills objects just-in-time, i.e. only when they are actually accessed.

The current implementation in Dask-CUDA handles CUDA device objects, but it is possible to generalize the approach to also handle spilling to disk.
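To make the idea concrete, here is a minimal sketch of JIT un-spilling with a proxy object. All names here are hypothetical illustrations, not the actual dask_cuda ProxyObject API: the proxy serializes its target on `spill()` and only deserializes it again when the object is actually accessed.

```python
# Hypothetical sketch of JIT un-spilling via a proxy wrapper.
# Not the dask_cuda implementation; names are made up for illustration.
import pickle


class SpillableProxy:
    def __init__(self, obj):
        self._obj = obj        # in-memory object, or None when spilled
        self._spilled = None   # serialized bytes while spilled

    def spill(self):
        # Serialize the wrapped object and drop the in-memory reference.
        if self._obj is not None:
            self._spilled = pickle.dumps(self._obj)
            self._obj = None

    def _unspill(self):
        # JIT: deserialize only on first access after a spill.
        if self._obj is None:
            self._obj = pickle.loads(self._spilled)
            self._spilled = None
        return self._obj

    def __getattr__(self, name):
        # Any attribute access on the proxy triggers un-spilling.
        return getattr(self._unspill(), name)


p = SpillableProxy([1, 2, 3])
p.spill()                  # data now lives only in serialized form
assert p._obj is None
assert p.count(2) == 1     # attribute access un-spills just-in-time
assert p._obj is not None
```

In a real implementation the serialized bytes would live on disk (or the object would move between device and host memory), and a shared registry would ensure that aliases of the same object spill and un-spill together.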

The disadvantage of this approach is that the proxy objects get exposed to users. The inputs to a task might be wrapped in a proxy object, which doesn't mimic the proxied object perfectly. E.g.:

    # Type checking using isinstance() works as expected, but direct type checking doesn't:
    >>> import numpy as np
    >>> from dask_cuda.proxy_object import asproxy
    >>> x = np.arange(3)
    >>> isinstance(asproxy(x), type(x))
    True
    >>> type(asproxy(x)) is type(x)
    False
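For intuition, this asymmetry is what you get when a proxy overrides `__class__`: `isinstance()` falls back to the spoofed `__class__`, while `type()` reads the real type slot and bypasses it. A minimal sketch (hypothetical names, not the dask_cuda implementation):

```python
# Hypothetical sketch of why isinstance() can match the proxied type
# while a direct type() check does not.
import numpy as np


class Proxy:
    def __init__(self, obj):
        self._obj = obj

    @property
    def __class__(self):
        # Report the proxied object's class, so isinstance() matches.
        return type(self._obj)

    def __getattr__(self, name):
        # Forward everything else to the wrapped object.
        return getattr(self._obj, name)


x = np.arange(3)
p = Proxy(x)
assert isinstance(p, np.ndarray)    # True: isinstance() honors __class__
assert type(p) is not np.ndarray    # type() reads the real type slot
```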

Because of this, the approach shouldn't be enabled by default. But do you think the Dask community would be interested in a generalization of this approach, or is the proxy object hurdle too much of an issue?

cc. @mrocklin, @jrbourbeau, @quasiben
