Skip to content

Conversation

@crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Jul 9, 2022

A task finishes its computation successfully, returning an output that is individually larger than 60% of memory_limit.
The task is spilled immediately to disk; however it fails to pickle (this is not the same as OSError, which is handled transparently by the SpillBuffer).
The task is marked as having status=error.

This PR fixes a bug where, if the task was using resources, the resources are returned twice, causing available_resources to become higher than total_resources.

ExecuteSuccessEvent.dummy("x", None, stimulus_id="s1")
)
assert instructions == [TaskErredMsg.match(key="x", stimulus_id="s1")]
assert ws.tasks["x"].state == "error"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the change to the WorkerState in this PR, this test was failing on teardown with available_resources={R: 2}.

@crusaderky crusaderky requested a review from hendrikmakait July 9, 2022 22:38
@crusaderky crusaderky self-assigned this Jul 9, 2022
@crusaderky crusaderky marked this pull request as ready for review July 9, 2022 22:38
@github-actions
Copy link
Contributor

github-actions bot commented Jul 9, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±  0         15 suites  ±0   6h 21m 43s ⏱️ + 1m 21s
  2 955 tests +  5    2 866 ✔️ +  2       87 💤 +  1  2 +2 
21 921 runs  +49  20 885 ✔️ +35  1 033 💤 +11  3 +3 

For more details on these failures, see this check.

Results for commit e76dfb4. ± Comparison against base commit 6765e6e.

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I would like to see the nit regarding the test docs addresses, but feel free to skip if this adds too much overhead to getting this change in.


@pytest.mark.xfail(reason="https://github.com/dask/distributed/issues/6705")
def test_workerstate_fail_to_pickle_flight(ws):
"""Same as test_workerstate_fail_to_pickle_execute_1, but the task was
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm personally not a fan of these one-directional references to other tests. I'd suggest either making the reference bi-directional or copying the description of the performed test over to this docstring. From my experience, these on-directional references have a tendency to get out of sync as tests change over time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@crusaderky crusaderky merged commit d2912c6 into dask:main Jul 11, 2022
@crusaderky crusaderky deleted the fail_to_spill branch July 11, 2022 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants