Reduce memory footprint of P2P shuffling #8157
Conversation
distributed/shuffle/_arrow.py
Outdated
```python
with pa.OSFile(str(path), mode="rb") as f:
    size = f.seek(0, whence=2)
    f.seek(0)
    while f.tell() < size:
        sr = pa.RecordBatchStreamReader(f)
        shard = sr.read_all()
        arrs = [pa.concat_arrays(column.chunks) for column in shard.columns]
        shard = pa.table(data=arrs, schema=schema)
        shards.append(shard)
```
By interleaving disk reads and deserialization, we reduced the size of the individual buffers that get created.
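To make the effect concrete, here is a hedged sketch contrasting the two access patterns (the file name and the standalone setup are illustrative, not taken from this PR):

```python
import pyarrow as pa

# Before: read the whole file in one go, then deserialize. This produces
# a single allocation roughly the size of the entire file.
with pa.OSFile("shards.arrow", mode="rb") as f:
    data = f.read()

# After: interleave reads and deserialization, so each allocation is
# bounded by the size of one IPC stream rather than the whole file.
with pa.OSFile("shards.arrow", mode="rb") as f:
    size = f.seek(0, whence=2)
    f.seek(0)
    while f.tell() < size:
        table = pa.RecordBatchStreamReader(f).read_all()
```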
distributed/shuffle/_arrow.py
Outdated
```python
while f.tell() < size:
    sr = pa.RecordBatchStreamReader(f)
    shard = sr.read_all()
    arrs = [pa.concat_arrays(column.chunks) for column in shard.columns]
```
From what I understand, the RecordBatchStreamReader creates one buffer per record batch. On main, this is a problem when we convert the pa.Table consisting of all those batches into a pd.DataFrame: the conversion frees buffers on a per-column basis, which effectively means that none of the buffers from any record batch are freed until we have converted the last column. To avoid this, we force a copy for each column directly after reading it with pa.concat_arrays. This way, we (should) end up with one buffer per column per batch.
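A tiny self-contained illustration of the copy that pa.concat_arrays forces (assuming only that pyarrow is installed; the data is made up):

```python
import pyarrow as pa

# A column that arrived in three record batches is backed by three buffers.
chunked = pa.chunked_array([[1, 2], [3, 4], [5, 6]])
assert len(chunked.chunks) == 3

# Concatenating the chunks materializes one contiguous array with a fresh
# buffer, so the small per-batch buffers can be freed right away instead
# of lingering until the last column has been converted to pandas.
combined = pa.concat_arrays(chunked.chunks)
assert len(combined) == 6
```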
Similarly, pa.Table.combine_chunks proceeds on a per-column basis causing a spike in temporary memory usage (see #8128).
cc @phofl in case you have some thoughts on this
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

21 files ±0   21 suites ±0   10h 58m 4s ⏱️ +22m 54s

For more details on these failures, see this check.

Results for commit f23c1aa. ± Comparison against base commit e350c99.

♻️ This comment has been updated with latest results.
fjetter left a comment
Did a very rough review. So far LGTM, but I'll want to test-drive this before merging. Will come back ASAP.
```diff
-def convert_partition(data: bytes, meta: pd.DataFrame) -> pd.DataFrame:
+def convert_shards(shards: list[pa.Table], meta: pd.DataFrame) -> pd.DataFrame:
```
(disclaimer: still in early review) I once tried to move tables around instead of bytes but that messed up the event loop. We should check this before merging
The increase of the minimal pyarrow version is something we have to do more carefully. Otherwise, this will silently cause the default shuffle method to fall back to tasks unless users upgrade their pyarrow version. The very least we should do is raise a warning if pyarrow is installed but the version is too low; safer would likely be to raise hard in this case. I doubt anybody would want to use pyarrow and shuffle a dataframe, but not use p2p because the version is too old.
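For illustration, a hedged sketch of what such a check could look like (the function and constant names are hypothetical, not the actual dask implementation):

```python
import warnings

from packaging.version import parse

MINVERSION = "12.0.0"  # assumed minimum; see the diff below


def pick_default_shuffle_method() -> str:
    try:
        import pyarrow as pa
    except ModuleNotFoundError:
        return "tasks"  # pyarrow absent: falling back silently is expected
    if parse(pa.__version__) < parse(MINVERSION):
        # pyarrow present but too old: warn loudly (or raise) instead of
        # silently falling back to task-based shuffling.
        warnings.warn(
            f"P2P shuffling requires pyarrow>={MINVERSION}, found "
            f"{pa.__version__}; falling back to task-based shuffling."
        )
        return "tasks"
    return "p2p"
```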
An A/B test would obviously be nice.
Running those today |
```diff
+    Raises a ModuleNotFoundError if pyarrow is not installed or an
+    ImportError if the installed version is not recent enough.
     """
-    # First version to introduce Table.sort_by
-    minversion = "7.0.0"
+    # First version that supports concatenating extension arrays (apache/arrow#14463)
+    minversion = "12.0.0"
     try:
         import pyarrow as pa
-    except ImportError:
-        raise RuntimeError(f"P2P shuffling requires pyarrow>={minversion}")
+    except ModuleNotFoundError:
+        raise ModuleNotFoundError(f"P2P shuffling requires pyarrow>={minversion}")
     if parse(pa.__version__) < parse(minversion):
-        raise RuntimeError(
+        raise ImportError(
```
@fjetter: Together with dask/dask#10496, get_default_shuffle_method should raise if pyarrow is outdated and choose tasks if it's not installed.
(testing it manually)
distributed/shuffle/_arrow.py
Outdated
```python
batch_size = parse_bytes("1 MiB")
batch = []
shards = []
schema = pa.Schema.from_pandas(meta, preserve_index=True)
```
Not using pyarrow_schema_dispatch here because it doesn't support preserve_index yet.
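A small example of the preserve_index behavior referred to here (assumes pandas and pyarrow; the frame is illustrative):

```python
import pandas as pd
import pyarrow as pa

meta = pd.DataFrame({"x": pd.Series(dtype="int64")})

# preserve_index=True forces the index into the schema as a real column,
# so round-tripping shards through Arrow keeps the index intact.
schema = pa.Schema.from_pandas(meta, preserve_index=True)
print(schema.names)  # ['x', '__index_level_0__']
```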
```python
def read_from_disk(path: Path, meta: pd.DataFrame) -> tuple[Any, int]:
    import pyarrow as pa

    batch_size = parse_bytes("1 MiB")
```
This is fragile and I don't really like it, but for now it seems to do the job. We will have to spend more time on performance optimization and understanding memory (de)allocation here to make this more robust.
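As a rough illustration of the batching this refers to, a hedged sketch (the helper name is hypothetical; the 1 MiB threshold mirrors the constant above):

```python
import pyarrow as pa
from dask.utils import parse_bytes


def batch_tables(tables, batch_size=parse_bytes("1 MiB")):
    """Concatenate consecutive shards until each batch reaches ~batch_size."""
    batches, current, current_size = [], [], 0
    for table in tables:
        current.append(table)
        current_size += table.nbytes
        if current_size >= batch_size:
            # Trade a bounded amount of copying for fewer, larger allocations.
            batches.append(pa.concat_tables(current))
            current, current_size = [], 0
    if current:
        batches.append(pa.concat_tables(current))
    return batches
```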
A/B test results (https://github.com/coiled/benchmarks/actions/runs/6120630936): Runtime performance takes a minor hit on some tests, but average and peak memory improve significantly. I'm confident that we'll get runtime down again through further performance optimization and batching on the write side.
I think the hit is ok when looking at the memory improvement.
At the very least …
I'm currently running a CI test on my fork against the dask/dask sibling branch to verify this works as expected.
There may actually be a related failure: https://github.com/fjetter/distributed/actions/runs/6160365960/job/16717162228. I have to check whether #8110 is included in this (I guess it is).
Ok, I was able to track this CancelledError down somewhat... The important message is that this is not an actual computation deadlock. The above "async instruction was cancelled" message is expected if a worker closes while a task is being executed: the state machine task is cancelled, but the thread still keeps running unnoticed. The test actually reaches a … I think the CancelledError is actually a red herring, since it is triggered by the test hitting a timeout while the shuffle plugin is closing.
I strongly suspect that test failure is unrelated, but I will spend some more time trying to hunt this down... 🤞
Found the cause of why this was blocking; see #8184. This is an unrelated fix, and we should be able to proceed here.



Closes #8015
Supersedes #8128
Blocked by dask/dask#10493
pre-commit run --all-files