Increase worker.suspicious_counter threshold #5228
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This stress test spawns and kills workers continuously. The way the worker suspicious counter is implemented, the
ValueError: Could not find dependentexception is almost guaranteed to be provoked eventually.If anyhing, I interpret the failure of this test a success since the logic of this suspicious counter was broken in the past (tasks were too eagerly released to give the counter any chance to be ever increased)
I chose to set this threshold privately and opted against exposing it as a dask configuration for now. In #5046 I ended up removing it because there was no sane way to actually trigger this condition. this stress test is the first time I'm seeing this but this stress test is also a good case for why this is a bad mechanism.
In the past I encountered this bad_dep handler in cases where the remote raised an exception trying to serialize the data, for instance. I would prefer dealing with this case by handling that exception instead of resorting to a spurious suspicious counter.
Anyhow, fixing this requires more effort than I currently can invest in this and this seems like a reasonably compromise
@madsbk thanks for spotting this!
black distributed/flake8 distributed/isort distributed