
Exception while computing hashjoin-p2p #8301

@mrocklin

I'm running Query 7 of the TPC-H benchmark at scale factor 100 and hitting the following error during the shuffle:

```
  File "/opt/coiled/env/lib/python3.11/site-packages/distributed/shuffle/_merge.py", line 183, in merge_unpack
    right = ext.get_output_partition(
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/coiled/env/lib/python3.11/site-packages/distributed/shuffle/_worker_plugin.py", line 431, in get_output_partition
    return sync(
           ^^^^^
  File "/opt/coiled/env/lib/python3.11/site-packages/distributed/utils.py", line 433, in sync
    raise error
  File "/opt/coiled/env/lib/python3.11/site-packages/distributed/utils.py", line 407, in f
    result = yield future
             ^^^^^^^^^^^^
  File "/opt/coiled/env/lib/python3.11/site-packages/tornado/gen.py", line 767, in run
    value = future.result()
            ^^^^^^^^^^^^^^^
  File "/opt/coiled/env/lib/python3.11/site-packages/distributed/shuffle/_core.py", line 272, in get_output_partition
    await self._ensure_output_worker(partition_id, key)
  File "/opt/coiled/env/lib/python3.11/site-packages/distributed/shuffle/_core.py", line 235, in _ensure_output_worker
    assigned_worker = self._get_assigned_worker(i)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/coiled/env/lib/python3.11/site-packages/distributed/shuffle/_shuffle.py", line 525, in _get_assigned_worker
    return self.worker_for[id]
           ~~~~~~~~~~~~~~~^^^^
  File "/opt/coiled/env/lib/python3.11/site-packages/pandas/core/series.py", line 1040, in __getitem__
    return self._get_value(key)
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/coiled/env/lib/python3.11/site-packages/pandas/core/series.py", line 1156, in _get_value
    loc = self.index.get_loc(label)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/coiled/env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3797, in get_loc
    raise KeyError(key) from err

KeyError(38)
```
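The last frames show where it fails: `_get_assigned_worker` indexes `self.worker_for`, which is a pandas Series, with the output partition id, and pandas raises `KeyError` because 38 is not in that Series' index. Below is a minimal sketch of that failure mode, assuming `worker_for` maps output partition ids to worker addresses. The worker addresses and the `get_assigned_worker` helper are hypothetical stand-ins, not distributed's actual code.

```python
import pandas as pd

# Hypothetical stand-in for the `worker_for` Series in the traceback:
# output partition id -> assigned worker address. Ids 0..37 are present,
# 38 is deliberately missing.
worker_for = pd.Series(
    {i: f"tcp://worker-{i % 4}:8786" for i in range(38)},
    name="worker",
)


def get_assigned_worker(partition_id: int) -> str:
    # Label-based lookup, like `self.worker_for[id]`; with an integer index,
    # pandas resolves the label through `Index.get_loc` and raises KeyError
    # if it is absent.
    return worker_for[partition_id]


print(get_assigned_worker(0))   # tcp://worker-0:8786

try:
    get_assigned_worker(38)
except KeyError as err:
    print(repr(err))            # KeyError(38)
```

The sketch only shows that the exception means "partition id not present in the mapping"; it says nothing about why partition 38 was missing from the real `worker_for` during this run.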

cc @hendrikmakait

https://cloud.coiled.io/clusters/298981/information?account=dask-benchmarks
