Cluster dump inspection improvements #6015
base: main
Conversation
Add print statements to show progress, an option to run in a background thread, and more easily cross-referenceable names for worker directories. YAML dumping is still so slow! Why?
Why does pending data have so much in it?? xref dask#5960 (comment)
the `wants_what` can be extremely long
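On the "why is YAML dumping so slow" question, one common culprit worth ruling out (my assumption, not something established in this PR): PyYAML uses its pure-Python emitter unless you explicitly pass the libyaml-backed dumper, which is typically much faster on large documents.

```python
import yaml

# Fall back gracefully when libyaml isn't installed.
try:
    from yaml import CSafeDumper as FastDumper
except ImportError:
    from yaml import SafeDumper as FastDumper

# Synthetic stand-in for a large cluster dump (names are illustrative only).
data = {"workers": {f"tcp://10.0.0.{i}": {"log": ["msg"] * 100} for i in range(50)}}

# Passing Dumper= switches to the C emitter when libyaml is available.
text = yaml.dump(data, Dumper=FastDumper)
```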
```python
return {
    addr: _worker_story(keys, wlog, datetimes=True)
    for addr, worker_dump in self.dump["workers"].items()
    if isinstance(worker_dump, dict) and (wlog := worker_dump.get("log"))
}
```
First use of the walrus operator in dask/distributed I believe :-)
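For readers unfamiliar with `:=`, here is a standalone toy example (the names are made up, not this PR's code) of the pattern used above: the walrus operator binds a value inside a comprehension's filter so it can be reused in the value expression. Requires Python 3.8+.

```python
raw = {"a": {"log": [1, 2]}, "b": {}, "c": {"log": []}}

# (log := entry.get("log")) binds the lookup result and filters on its
# truthiness in one step, avoiding a second .get() call per entry.
stories = {name: log for name, entry in raw.items() if (log := entry.get("log"))}

print(stories)  # {'a': [1, 2]}
```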
```diff
-def worker_story(keys: set, log: Iterable) -> list:
+def worker_story(keys: set, log: Iterable, datetimes: bool = False) -> list:
```
It's probably worth adding the same parameter to scheduler_story above.
distributed/cluster_dump.py
```python
# Compact smaller keys into a general dict
scheduler_state = self._compact_state(self.dump[context], scheduler_expand_keys)
for i, (name, _logs) in enumerate(scheduler_state.items()):
```
Nitty suggestion that saves the i+1 lower down.
```diff
-for i, (name, _logs) in enumerate(scheduler_state.items()):
+for i, (name, _logs) in enumerate(scheduler_state.items(), 1):
```
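The suggestion uses `enumerate`'s second argument, which sets the starting index and removes the need for `i + 1` when printing 1-based progress:

```python
items = ["scheduler_info", "transition_log", "log"]  # example names only

for i, name in enumerate(items, 1):  # start counting at 1 instead of 0
    print(f"{i}/{len(items)}: {name}")
```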
distributed/cluster_dump.py
```python
        worker_id = info["id"]
    except KeyError:
        continue
for i, (addr, info) in enumerate(workers.items()):
```
Nitty suggestion that saves the i+1 lower down.
```diff
-for i, (addr, info) in enumerate(workers.items()):
+for i, (addr, info) in enumerate(workers.items(), 1):
```
distributed/cluster_dump.py
```python
return dict(stories)
stories = self.worker_stories(*key_or_stimulus_id)
for i, (addr, story) in enumerate(stories.items()):
```
```diff
-for i, (addr, story) in enumerate(stories.items()):
+for i, (addr, story) in enumerate(stories.items(), 1):
```
@@ -198,22 +207,43 @@ def scheduler_story(self, *key_or_stimulus_id: str) -> dict:
If scheduler_story is modified to take a datetime bool argument, it would be nice if it was set to True in this call.
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0  15 suites ±0  ⏱️ 6h 52m 5s +41m 45s

For more details on these failures, see this check. Results for commit c3cbee3. ± Comparison against base commit 43d7aa4.

♻️ This comment has been updated with latest results.
Force-pushed from a1ce116 to c3cbee3
Various quality-of-life improvements to the cluster dump inspection process
- Print progress in `to_yamls`, since it can be pretty slow and it's nice to see what's going on
- Option to run `to_yamls` in a background thread, so you can start inspecting the `DumpArtefact` right away
- Name the worker directories written by `to_yamls` by address (more easily cross-referenceable)
- Refactor `worker_story` to just return each worker's story separately. Trying to combine them all by event type is ultimately less useful
- Add `worker_stories_to_yamls` to dump all the worker stories to separate YAML files (matching your `to_yamls` directories)
- … `to_yamls` methods for easier reading and cross-referencing to logs
- … `expand_keys` lists
- Use YAML literal block style (`|`) for multi-line strings so `\n` actually becomes a newline, instead of printing as an escaped character. This makes tracebacks much easier to read.
- `pre-commit run --all-files`
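The block-style change for multi-line strings can be reproduced with a standard PyYAML recipe (this is the common idiom, not necessarily the exact code in the PR): register a `str` representer that emits literal block style (`|`) whenever the string contains a newline.

```python
import yaml

def str_presenter(dumper, data):
    # Emit multi-line strings in literal block style (|) so embedded
    # newlines render as real line breaks instead of "\n" escapes.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)

yaml.add_representer(str, str_presenter)

print(yaml.dump({"traceback": "Traceback (most recent call last):\n  ...\nValueError\n"}))
```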