Cluster dump inspection improvements #6015
base: main
Conversation
Add print statements to show progress, an option to run in a background thread, and more easily cross-referenceable names for worker directories. YAML dumping is still so slow! Why?
Why does pending data have so much in it?? xref dask#5960 (comment)
the `wants_what` can be extremely long
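On the "why is YAML dumping so slow" question, one common culprit worth ruling out (my assumption, not something established in this PR): PyYAML uses its pure-Python emitter unless you explicitly pass the libyaml-backed dumper, which is typically much faster on large documents.

```python
import yaml

# Fall back gracefully when libyaml isn't installed.
try:
    from yaml import CSafeDumper as FastDumper
except ImportError:
    from yaml import SafeDumper as FastDumper

# Synthetic stand-in for a large cluster dump (names are illustrative only).
data = {"workers": {f"tcp://10.0.0.{i}": {"log": ["msg"] * 100} for i in range(50)}}

# Passing Dumper= switches to the C emitter when libyaml is available.
text = yaml.dump(data, Dumper=FastDumper)
```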
```python
return {
    addr: _worker_story(keys, wlog, datetimes=True)
    for addr, worker_dump in self.dump["workers"].items()
    if isinstance(worker_dump, dict) and (wlog := worker_dump.get("log"))
}
```
First use of the walrus operator in dask/distributed I believe :-)
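For readers unfamiliar with `:=`, here is a standalone toy example (the names are made up, not this PR's code) of the pattern used above: the walrus operator binds a value inside a comprehension's filter so it can be reused in the value expression. Requires Python 3.8+.

```python
raw = {"a": {"log": [1, 2]}, "b": {}, "c": {"log": []}}

# (log := entry.get("log")) binds the lookup result and filters on its
# truthiness in one step, avoiding a second .get() call per entry.
stories = {name: log for name, entry in raw.items() if (log := entry.get("log"))}

print(stories)  # {'a': [1, 2]}
```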
```diff
-def worker_story(keys: set, log: Iterable) -> list:
+def worker_story(keys: set, log: Iterable, datetimes: bool = False) -> list:
```
It's probably worth adding the same parameter to scheduler_story above.
distributed/cluster_dump.py
```python
# Compact smaller keys into a general dict
scheduler_state = self._compact_state(self.dump[context], scheduler_expand_keys)
for i, (name, _logs) in enumerate(scheduler_state.items()):
```
Nitty suggestion that saves the i+1 lower down.
```diff
-for i, (name, _logs) in enumerate(scheduler_state.items()):
+for i, (name, _logs) in enumerate(scheduler_state.items(), 1):
```
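The suggestion uses `enumerate`'s second argument, which sets the starting index and removes the need for `i + 1` when printing 1-based progress:

```python
items = ["scheduler_info", "transition_log", "log"]  # example names only

for i, name in enumerate(items, 1):  # start counting at 1 instead of 0
    print(f"{i}/{len(items)}: {name}")
```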
distributed/cluster_dump.py
```python
        worker_id = info["id"]
    except KeyError:
        continue
for i, (addr, info) in enumerate(workers.items()):
```
Nitty suggestion that saves the i+1 lower down.
```diff
-for i, (addr, info) in enumerate(workers.items()):
+for i, (addr, info) in enumerate(workers.items(), 1):
```
distributed/cluster_dump.py
```python
return dict(stories)
stories = self.worker_stories(*key_or_stimulus_id)
for i, (addr, story) in enumerate(stories.items()):
```
```diff
-for i, (addr, story) in enumerate(stories.items()):
+for i, (addr, story) in enumerate(stories.items(), 1):
```
@@ -198,22 +207,43 @@ def scheduler_story(self, *key_or_stimulus_id: str) -> dict:
If scheduler_story is modified to take a datetime bool argument, it would be nice if it was set to True in this call.
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0  15 suites ±0  ⏱️ 6h 52m 5s +41m 45s

For more details on these failures, see this check. Results for commit c3cbee3. ± Comparison against base commit 43d7aa4.

♻️ This comment has been updated with latest results.
Force-pushed from a1ce116 to c3cbee3
Various quality-of-life improvements to the cluster dump inspection process
- Print progress in `to_yamls`, since it can be pretty slow and it's nice to see what's going on
- Option to run `to_yamls` in a background thread, so you can start inspecting the `DumpArtefact` right away
- Name the worker directories written by `to_yamls` by address (more easily cross-referenceable)
- Refactor `worker_story` to just return each worker's story separately. Trying to combine them all by event type is ultimately less useful
- Add `worker_stories_to_yamls` to dump all the worker stories to separate YAML files (matching your `to_yamls` directories)
- … `to_yamls` methods for easier reading and cross-referencing to logs
- … `expand_keys` lists
- Use YAML literal block style (`|`) for multi-line strings so `\n` actually becomes a newline, instead of printing as an escaped character. This makes tracebacks much easier to read.
- `pre-commit run --all-files`
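The block-style change for multi-line strings can be reproduced with a standard PyYAML recipe (this is the common idiom, not necessarily the exact code in the PR): register a `str` representer that emits literal block style (`|`) whenever the string contains a newline.

```python
import yaml

def str_presenter(dumper, data):
    # Emit multi-line strings in literal block style (|) so embedded
    # newlines render as real line breaks instead of "\n" escapes.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)

yaml.add_representer(str, str_presenter)

print(yaml.dump({"traceback": "Traceback (most recent call last):\n  ...\nValueError\n"}))
```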