@fjetter fjetter commented Aug 18, 2021

This stress test spawns and kills workers continuously. Given how the worker's suspicious counter is implemented, the "ValueError: Could not find dependent" exception is almost guaranteed to be triggered eventually.

If anything, I interpret the failure of this test as a success, since the logic of this suspicious counter was broken in the past (tasks were released too eagerly for the counter to ever be incremented).
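To illustrate why continuous worker churn eventually trips the counter, here is a minimal toy sketch of a per-dependency suspicious counter. It is not the code in distributed: `DependencyTracker`, `record_missing`, `simulate_churn`, and the threshold value of 5 are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical threshold for illustration only; the PR sets the real value
# privately in the worker.
SUSPICIOUS_THRESHOLD = 5


class DependencyTracker:
    """Toy model of a per-dependency suspicious counter (not distributed's code)."""

    def __init__(self, threshold=SUSPICIOUS_THRESHOLD):
        self.threshold = threshold
        self.suspicious_count = defaultdict(int)

    def record_missing(self, key):
        """Record one failed fetch of `key`; raise once the count exceeds the threshold."""
        self.suspicious_count[key] += 1
        if self.suspicious_count[key] > self.threshold:
            raise ValueError(f"Could not find dependent {key}")


def simulate_churn(tracker, key, failed_fetches):
    """Return the attempt number at which the error fires, or None if it never does."""
    for attempt in range(1, failed_fetches + 1):
        try:
            tracker.record_missing(key)
        except ValueError:
            return attempt
    return None
```

With a threshold of 5, `simulate_churn(DependencyTracker(5), "x", 100)` returns 6: as long as workers keep dying before a fetch succeeds, the count only grows, so the error is a matter of time rather than a sign that the data is truly unrecoverable.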

I chose to set this threshold privately and opted against exposing it as a dask configuration for now. In #5046 I ended up removing it because there was no sane way to actually trigger this condition. This stress test is the first time I'm seeing it triggered, but it is also a good case for why this is a bad mechanism.
In the past, I encountered this bad_dep handler in cases where the remote raised an exception while trying to serialize the data, for instance. I would prefer to deal with this case by handling that exception directly instead of resorting to a spurious suspicious counter.
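As a rough sketch of what handling that exception directly could look like, the toy example below classifies a failed fetch explicitly instead of counting attempts. Everything here is an assumption for illustration: `RemoteSerializationError`, `serve_data`, and `gather_dep` are hypothetical names, and a plain dict plus `pickle` stand in for the remote worker and its serialization.

```python
import pickle


class RemoteSerializationError(Exception):
    """Hypothetical error: the remote held the data but could not serialize it."""


def serve_data(data, key):
    """Toy stand-in for the remote side: look up `key` and serialize its value."""
    value = data[key]  # raises KeyError if the data is gone (e.g. worker restarted)
    try:
        return pickle.dumps(value)
    except Exception as exc:
        # Surface the real cause instead of letting the caller guess from retries.
        raise RemoteSerializationError(f"Could not serialize {key!r}") from exc


def gather_dep(data, key):
    """Classify the outcome explicitly rather than counting failed attempts."""
    try:
        return "ok", pickle.loads(serve_data(data, key))
    except KeyError:
        # Data is missing on this remote: a retry elsewhere may succeed.
        return "missing", None
    except RemoteSerializationError as exc:
        # Retrying cannot help: report the error for this dependency right away.
        return "error", exc
```

In this sketch a missing key stays retryable, while a serialization failure is reported immediately with its original cause attached, so no threshold is needed to tell the two cases apart.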

Anyhow, fixing this properly requires more effort than I can currently invest, and this seems like a reasonable compromise.

@madsbk thanks for spotting this!

@fjetter fjetter force-pushed the worker_suspicious_counter_threshold branch from 5d67fed to 67c38bb September 8, 2021 12:47
@fjetter fjetter merged commit 4355e9d into dask:main Sep 8, 2021
@fjetter fjetter deleted the worker_suspicious_counter_threshold branch September 8, 2021 14:24