migrate from dill to cloudpickle for advanced serialization #7870

@jrwalk

Description

The optional use of dill for serialization in PythonVirtualenvOperator could be replaced with cloudpickle. This should be a mostly drop-in replacement.

Use case / motivation

Currently, the PythonVirtualenvOperator can optionally use dill instead of the standard-library pickle to serialize more advanced types. However, most major distributed compute frameworks have moved to cloudpickle. As a result, using dill in Airflow can add a redundant dependency when tasks call out to other distributed compute (e.g., farming compute-heavy work out to a remote dask cluster), and it can interfere with how those tools serialize tasks.
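
For context, the gap that dill and cloudpickle fill is that stock pickle stores functions by reference (module plus qualified name), so lambdas and closures cannot be serialized. The sketch below illustrates this. `stdlib_can_pickle`, `make_scaler`, and `cloudpickle_roundtrip` are hypothetical helpers, not Airflow code, and the cloudpickle part only runs if the third-party `cloudpickle` package is installed.

```python
import pickle


def stdlib_can_pickle(obj) -> bool:
    """Return True if the standard-library pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # pickle saves functions by qualified name; lambdas and closures
        # cannot be looked up again by that name, so they fail here.
        return False
    return True


def make_scaler(factor):
    # Hypothetical example: returns a closure capturing `factor`.
    return lambda x: x * factor


try:
    import cloudpickle  # third-party: pip install cloudpickle
except ImportError:  # keep the sketch runnable without it
    cloudpickle = None


def cloudpickle_roundtrip(fn, arg):
    """Serialize fn by value with cloudpickle, restore it with plain pickle, call it."""
    if cloudpickle is None:
        raise RuntimeError("cloudpickle is not installed")
    payload = cloudpickle.dumps(fn)
    # The result is ordinary pickle bytes, loadable with pickle.loads
    # as long as cloudpickle is importable on the loading side.
    restored = pickle.loads(payload)
    return restored(arg)
```

With this sketch, `stdlib_can_pickle([1, 2, 3])` is `True` while `stdlib_can_pickle(make_scaler(3))` is `False`; with cloudpickle installed, `cloudpickle_roundtrip(make_scaler(3), 4)` returns `12`.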

Since both dill and cloudpickle are largely drop-in replacements for pickle, the migration should be fairly minor.
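
Because pickle, dill, and cloudpickle all expose the same `dump`/`load` (and `dumps`/`loads`) functions, swapping one for another can come down to choosing which module to import. The sketch below assumes, as a simplification, that the operator writes the callable's arguments to a file-like object and reads them back inside the virtualenv. `SUPPORTED_SERIALIZERS`, `load_serializer`, `write_args`, and `read_args` are hypothetical names, not Airflow's API.

```python
import importlib
import io
from types import ModuleType
from typing import Any, BinaryIO, Dict, Tuple

# Hypothetical allow-list: each module offers pickle-style dump/load functions.
SUPPORTED_SERIALIZERS = ("pickle", "dill", "cloudpickle")


def load_serializer(name: str = "pickle") -> ModuleType:
    """Import a serializer module by name (dill/cloudpickle must be installed)."""
    if name not in SUPPORTED_SERIALIZERS:
        raise ValueError(f"unsupported serializer: {name!r}")
    return importlib.import_module(name)


def write_args(serializer: ModuleType, args: tuple, kwargs: dict, stream: BinaryIO) -> None:
    # Parent-process side: persist the callable's arguments.
    serializer.dump({"args": args, "kwargs": kwargs}, stream)


def read_args(serializer: ModuleType, stream: BinaryIO) -> Tuple[tuple, Dict[str, Any]]:
    # Virtualenv side: the same serializer must be importable there too.
    payload = serializer.load(stream)
    return payload["args"], payload["kwargs"]


if __name__ == "__main__":
    ser = load_serializer("pickle")  # "cloudpickle" would slot in at this one call
    buf = io.BytesIO()
    write_args(ser, (1, 2), {"key": "value"}, buf)
    buf.seek(0)
    print(read_args(ser, buf))  # ((1, 2), {'key': 'value'})
```

The one caveat this design makes visible is that whichever library writes the file must also be installed in the target virtualenv to read it back.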

Related Issues

kubeflow/pipelines#1387

dask/distributed#3606

piskvorky/gensim#558 (comment)

uqfoundation/multiprocess#22 (comment)
