Fix persist error due to pickling #104

@noemifrisina

Right now, binning multiple sequences fails at dask.persist because h5py objects (open HDF5 file handles) can't be pickled.
Essentially, we're passing a memory-mapped file handle as part of the metadata to dask.persist, and the task graph that contains it cannot be serialized ("TypeError: h5py objects cannot be pickled" in the trace below).
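
To illustrate the failure mode without needing h5py or a cluster, here is a minimal sketch. `FakeHandle` is a hypothetical stand-in that refuses pickling the way `h5py` does, and `count_bytes_from_path` is an illustrative task, not code from tristan. It shows that an open handle can't be serialized as a task argument, while a plain path string can, with the file opened inside the task instead. This is a common pattern, not a confirmed fix for this issue.

```python
import pickle
import tempfile
from pathlib import Path


class FakeHandle:
    """Hypothetical stand-in for an open h5py.File: it refuses to be pickled."""

    def __init__(self, path):
        self.path = str(path)
        self._fh = open(self.path, "rb")

    def __reduce_ex__(self, protocol):
        # Mirrors the message h5py raises in the trace below.
        raise TypeError("h5py objects cannot be pickled")

    def read(self):
        self._fh.seek(0)
        return self._fh.read()

    def close(self):
        self._fh.close()


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


def count_bytes_from_path(path):
    """Illustrative task: receives a plain string and opens the file itself."""
    handle = FakeHandle(path)
    try:
        return len(handle.read())
    finally:
        handle.close()


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "events.bin"
    data_path.write_bytes(b"\x00" * 16)

    handle = FakeHandle(data_path)
    handle_ok = is_picklable(handle)         # open handle as a task argument
    path_ok = is_picklable(str(data_path))   # path string as a task argument
    n_bytes = count_bytes_from_path(str(data_path))
    handle.close()

print(handle_ok, path_ok, n_bytes)
```

If the handle really is reaching the graph through the metadata, the equivalent change would be to carry the file path (and dataset name) there and open the file inside the function that each task runs.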

Things tried:

  • Passing different serializers/deserializers (ones that shouldn't use pickle) to the Client and/or dask.persist. It always fell back to the default.
  • Passing only the event_id and time_bins columns to compute_with_progress.
  • Creating a tmp_collection dataframe inside compute_with_progress with only the relevant columns and calling persist on that.
  • Using dask arrays in compute_with_progress to try to avoid the open file handle.
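
Since none of the attempts above removed the handle from the graph, it may help to find out which task actually holds it. The sketch below is a hypothetical diagnostic, not part of tristan or dask: `find_unpicklable` tries to serialize each task of a graph-like mapping on its own and reports the ones that fail. It prefers cloudpickle when installed (plain pickle also rejects local lambdas, which is what the first AttributeError in the trace is about) and falls back to the standard library otherwise. The toy graph and the `Unpicklable` class are illustrative only.

```python
import pickle

try:
    # cloudpickle can serialize lambdas and local functions.
    import cloudpickle as pickler
except ImportError:
    # Assumption: falling back to pickle is good enough for a first look.
    pickler = pickle


class Unpicklable:
    """Hypothetical stand-in for an object that refuses pickling, like h5py's."""

    def __reduce_ex__(self, protocol):
        raise TypeError("h5py objects cannot be pickled")


def find_unpicklable(graph):
    """Map each key whose task cannot be serialized to the error it raised."""
    failures = {}
    for key, task in graph.items():
        try:
            pickler.dumps(task)
        except Exception as exc:
            failures[key] = f"{type(exc).__name__}: {exc}"
    return failures


# Toy graph in the classic dask style: key -> (callable, *arguments).
toy_graph = {
    "load-0": (sum, [1, 2, 3]),
    "make_images-0": (len, Unpicklable()),
}

failures = find_unpicklable(toy_graph)
print(failures)
```

On a real collection, something like `find_unpicklable(dict(collection.__dask_graph__()))` could be run just before the dask.persist call. That usage is an assumption and has not been tried against this code base, and the task representation may differ between dask versions.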

As it's been a few months since the last time these tools were needed, it's possible something changed in the way dask handles persist/compute.

Full error trace for info:

2023-09-21 14:08:57,792 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f4026517450>
 0. make_images-c26a593aeb9229ba8092420f91c84cbf
>.
Traceback (most recent call last):
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't pickle local object '_take_dask_array_from_numpy.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object '_take_dask_array_from_numpy.<locals>.<lambda>'

During handling of the above exception, another exception occurred:
erialize.py", line 75, in pickle_dumps
    frames[0] = pickle.dumps(
                ^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dls/science/users/uhz96441/tristan-env/bin/images", line 33, in <module>
    sys.exit(load_entry_point('tristan', 'console_scripts', 'images')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/command_line/images.py", line 774, in main
    args.func(args)
  File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/command_line/images.py", line 505, in multiple_sequences_cli
    compute_with_progress(events_data)
  File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/__init__.py", line 32, in compute_with_progress
    (collection,) = dask.persist(collection)
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhz96441/.local/lib/python3.11/site-packages/dask/base.py", line 917, in persist
    results = client.persist(
              ^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/client.py", line 3566, in persist
    futures = self._graph_to_futures(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/client.py", line 3146, in _graph_to_futures
    header, frames = serialize(ToPickle(dsk), on_error="raise")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 374, in serialize
    raise TypeError(msg, str(x)[:10000]) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f4026517450>\n 0. make_images-c26a593aeb9229ba8092420f91c84cbf\n>')
Traceback (most recent call last):
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
Traceback (most recent call last):
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't pickle local object '_take_dask_array_from_numpy.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object '_take_dask_array_from_numpy.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 352, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 75, in pickle_dumps
    frames[0] = pickle.dumps(
                ^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dls/science/users/uhz96441/tristan-env/bin/images", line 33, in <module>
    sys.exit(load_entry_point('tristan', 'console_scripts', 'images')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/command_line/images.py", line 774, in main
    args.func(args)
  File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/command_line/images.py", line 505, in multiple_sequences_cli
    compute_with_progress(events_data)
  File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/__init__.py", line 32, in compute_with_progress
    (collection,) = dask.persist(collection)
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhz96441/.local/lib/python3.11/site-packages/dask/base.py", line 917, in persist
    results = client.persist(
              ^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/client.py", line 3566, in persist
    futures = self._graph_to_futures(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/client.py", line 3146, in _graph_to_futures
    header, frames = serialize(ToPickle(dsk), on_error="raise")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 374, in serialize
    raise TypeError(msg, str(x)[:10000]) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f4026517450>\n 0. make_images-c26a593aeb9229ba8092420f91c84cbf\n>')
