Closed
Description
Binning multiple sequences currently fails at dask.persist, because h5py objects (open HDF5 files) can't be pickled.
Essentially, the metadata we pass to dask.persist includes a memory-mapped file handle, and that handle cannot be serialized along with the task graph.
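The failure mode can be illustrated without dask or h5py. The sketch below is a standard-library analogy only, not the tristan code: a plain binary file stands in for the HDF5 file, and the names `can_pickle`, `read_events`, `demo` and `events.bin` are hypothetical. It shows that metadata embedding an open handle is rejected by pickle, while metadata carrying only the path round-trips and can reopen the file where the data is needed.

```python
import pickle
import tempfile
from pathlib import Path


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


def read_events(metadata):
    """Open the file named in the metadata only when the data is needed."""
    with open(metadata["path"], "rb") as handle:
        return handle.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "events.bin"  # hypothetical stand-in for an HDF5 file
        path.write_bytes(b"\x00\x01\x02")

        with open(path, "rb") as handle:
            # Metadata that embeds an open handle cannot be pickled.
            with_handle = can_pickle({"path": str(path), "handle": handle})

        # Metadata that carries only the path round-trips cleanly,
        # and the file is reopened on the receiving side.
        path_only = pickle.loads(pickle.dumps({"path": str(path)}))
        return with_handle, read_events(path_only)


if __name__ == "__main__":
    print(demo())
```

Whether the same "pass the path, open inside the task" pattern fits compute_with_progress depends on where the h5py object enters the graph, which the trace alone does not show.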
Things tried:
- passing different serializers/deserializers (ones that shouldn't use pickle) to the Client and/or dask.persist; dask always fell back on the default
- passing only the event_id and time_bins column to compute_with_progress
- creating a tmp_collection dataframe inside compute_with_progress with only the relevant columns and calling persist on it
- using dask arrays in compute_with_progress to try and avoid the open file handle
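Since none of these attempts removed the offending object, it may help to pinpoint exactly which element of a nested structure pickle rejects. The helper below is a hypothetical diagnostic, not part of dask or tristan; it assumes the structure of interest can be reduced to nested dicts, lists and tuples, and it uses only the standard library.

```python
import pickle


def find_unpicklable(obj, path="root"):
    """Yield (path, type name) for each innermost item that pickle rejects."""
    try:
        pickle.dumps(obj)
        return  # the whole subtree pickles, nothing to report
    except Exception:
        pass

    # Descend into the containers we know how to walk.
    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    else:
        children = []

    found = False
    for child_path, child in children:
        for hit in find_unpicklable(child, child_path):
            found = True
            yield hit

    # No child is to blame, so this object itself is the culprit.
    if not found:
        yield path, type(obj).__name__


if __name__ == "__main__":
    # Hypothetical graph-like structure containing a lambda.
    graph = {"make_images": (len, [1, 2, lambda x: x]), "bins": [0, 1]}
    for location, kind in find_unpicklable(graph):
        print(location, kind)
```

In the trace below, both a local lambda and an h5py object are reported as unpicklable, so a walk like this over the inputs to dask.persist could show which of them actually reaches the graph.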
As it's been a few months since the last time these tools were needed, it's possible something changed in the way dask handles persist/compute.
Full error trace for info:
2023-09-21 14:08:57,792 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f4026517450>
0. make_images-c26a593aeb9229ba8092420f91c84cbf
>.
Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 81, in dumps
result = cloudpickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
return Pickler.dump(self, obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 63, in dumps
result = pickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't pickle local object '_take_dask_array_from_numpy.<locals>.<lambda>'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 68, in dumps
pickler.dump(x)
AttributeError: Can't pickle local object '_take_dask_array_from_numpy.<locals>.<lambda>'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 352, in serialize
header, frames = dumps(x, context=context) if wants_context else dumps(x)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 75, in pickle_dumps
frames[0] = pickle.dumps(
^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 81, in dumps
result = cloudpickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
return Pickler.dump(self, obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/bin/images", line 33, in <module>
sys.exit(load_entry_point('tristan', 'console_scripts', 'images')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/command_line/images.py", line 774, in main
args.func(args)
File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/command_line/images.py", line 505, in multiple_sequences_cli
compute_with_progress(events_data)
File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/__init__.py", line 32, in compute_with_progress
(collection,) = dask.persist(collection)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/uhz96441/.local/lib/python3.11/site-packages/dask/base.py", line 917, in persist
results = client.persist(
^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/client.py", line 3566, in persist
futures = self._graph_to_futures(
^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/client.py", line 3146, in _graph_to_futures
header, frames = serialize(ToPickle(dsk), on_error="raise")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 374, in serialize
raise TypeError(msg, str(x)[:10000]) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f4026517450>\n 0. make_images-c26a593aeb9229ba8092420f91c84cbf\n>')