Failure to spill breaks available resources #6703

crusaderky · 2022-07-09T22:20:09Z

Blocked by and incorporates Partial matches for worker state machine instructions #6704
Follow-up: Failure to pickle/unpickle on flight breaks the worker state machine #6705

A task finishes its computation successfully, returning an output that is individually larger than 60% of memory_limit.
The output is spilled to disk immediately; however, it fails to pickle (this is not the same as an OSError, which the SpillBuffer handles transparently).
The task is marked as having status=error.
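To make the distinction between the two failure categories concrete, here is a minimal sketch using only the standard library. The class `Unpicklable` and the function `classify_spill` are hypothetical names for illustration; this is not how SpillBuffer is implemented, only a demonstration that a pickling failure surfaces as a non-OSError exception.

```python
import pickle


class Unpicklable:
    """Hypothetical task output whose serialization always fails."""

    def __reduce__(self):
        raise TypeError("this object refuses to be pickled")


def classify_spill(value):
    """Toy stand-in for a spill attempt; returns a label for the outcome."""
    try:
        pickle.dumps(value)
    except OSError:
        # Disk-level failure (e.g. disk full): a separate category
        return "disk-error"
    except Exception:
        # Serialization failure: the case described above
        return "pickle-error"
    return "ok"
```

In this sketch, `classify_spill(Unpicklable())` takes the second `except` branch, because the `TypeError` raised from `__reduce__` propagates out of `pickle.dumps`.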

This PR fixes a bug where, if the task was using resources, the resources were returned twice, causing available_resources to exceed total_resources.
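The double return can be illustrated with a toy ledger. This is a sketch under stated assumptions, not the actual WorkerState code: `ToyResourceLedger` and its methods are hypothetical, the total of `{"R": 1}` is assumed, and the points at which the two releases happen are guesses, since the description does not say where they occur.

```python
class ToyResourceLedger:
    """Toy model of per-worker resource accounting (not the real WorkerState)."""

    def __init__(self, total):
        self.total_resources = dict(total)
        self.available_resources = dict(total)
        self._held = {}  # task key -> resources that task currently holds

    def acquire(self, key, needed):
        for name, amount in needed.items():
            self.available_resources[name] -= amount
        self._held[key] = dict(needed)

    def release_unguarded(self, needed):
        # Buggy pattern: nothing records that the resources were already returned
        for name, amount in needed.items():
            self.available_resources[name] += amount

    def release(self, key):
        # Guarded pattern: pop() makes a second release for the same key a no-op
        for name, amount in self._held.pop(key, {}).items():
            self.available_resources[name] += amount


buggy = ToyResourceLedger({"R": 1})
buggy.acquire("x", {"R": 1})
buggy.release_unguarded({"R": 1})  # assumed: first release, when execution ends
buggy.release_unguarded({"R": 1})  # assumed: second release, on the error path

fixed = ToyResourceLedger({"R": 1})
fixed.acquire("x", {"R": 1})
fixed.release("x")
fixed.release("x")  # second call changes nothing
```

After the two unguarded releases, `buggy.available_resources` exceeds `buggy.total_resources`, while the guarded ledger ends balanced.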

crusaderky · 2022-07-09T22:38:32Z

distributed/tests/test_worker_memory.py

+        ExecuteSuccessEvent.dummy("x", None, stimulus_id="s1")
+    )
+    assert instructions == [TaskErredMsg.match(key="x", stimulus_id="s1")]
+    assert ws.tasks["x"].state == "error"


Without the change to the WorkerState in this PR, this test was failing on teardown with available_resources={R: 2}.
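The fixture's teardown checks are not shown in this thread. As a rough idea of what such a check could look like, here is a hypothetical helper (`resource_mismatches` is an invented name) that reports every resource whose available count differs from its total once no task is running.

```python
def resource_mismatches(total, available):
    """Map each unbalanced resource name to (available, total).

    Assumes no task is running, so every resource should be fully returned.
    """
    return {
        name: (available.get(name), limit)
        for name, limit in total.items()
        if available.get(name) != limit
    }
```

A teardown step could then assert that the result is empty and print the mapping otherwise, which would flag the over-counted `R` described above.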

github-actions · 2022-07-09T23:25:35Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0 · 15 suites ±0 · 6h 21m 43s ⏱️ +1m 21s
2 955 tests +5: 2 866 ✔️ +2, 87 💤 +1, 2 ❌ +2
21 921 runs +49: 20 885 ✔️ +35, 1 033 💤 +11, 3 ❌ +3

For more details on these failures, see this check.

Results for commit e76dfb4. ± Comparison against base commit 6765e6e.

hendrikmakait

LGTM. I would like to see the nit regarding the test docs addressed, but feel free to skip it if this adds too much overhead to getting this change in.

hendrikmakait · 2022-07-11T11:21:36Z

distributed/tests/test_worker_memory.py

+
+@pytest.mark.xfail(reason="https://github.com/dask/distributed/issues/6705")
+def test_workerstate_fail_to_pickle_flight(ws):
+    """Same as test_workerstate_fail_to_pickle_execute_1, but the task was


nit: I'm personally not a fan of these one-directional references to other tests. I'd suggest either making the reference bi-directional or copying the description of the performed test over to this docstring. In my experience, these one-directional references have a tendency to get out of sync as tests change over time.

crusaderky mentioned this pull request Jul 9, 2022

Failure to pickle/unpickle on flight breaks the worker state machine #6705

Open

crusaderky force-pushed the fail_to_spill branch from 5fec6df to e76dfb4 Compare July 9, 2022 22:36

crusaderky requested a review from hendrikmakait July 9, 2022 22:38

crusaderky self-assigned this Jul 9, 2022

crusaderky marked this pull request as ready for review July 9, 2022 22:38

crusaderky added 2 commits July 11, 2022 10:06

Partial matches for worker state machine instructions

c5d0f4e

Partial matches for worker state machine instructions

1bf1407

crusaderky force-pushed the fail_to_spill branch from e76dfb4 to 1bf1407 Compare July 11, 2022 09:07

crusaderky added 2 commits July 11, 2022 11:55

Merge branch 'main' into fail_to_spill

52c6bcc

fix

34edf24

hendrikmakait approved these changes Jul 11, 2022

xrefs

6d90f12

crusaderky merged commit d2912c6 into dask:main Jul 11, 2022

crusaderky deleted the fail_to_spill branch July 11, 2022 11:40
