Skip to content

Conversation

@gjoseph92
Copy link
Collaborator

Various quality-of-life improvements to the cluster dump inspection process

  • Add print statements to to_yamls, since it can be pretty slow and it's nice to see what's going on
  • Add option to run to_yamls in a background thread, so you can start inspecting the DumpArtefact right away
  • Name worker subdirectories in to_yamls by address (more easily cross-referenceable)
  • Fix worker_story to just return each worker's story separately. Trying to combine them all by event type is ultimately less useful
  • Add worker_stories_to_yamls to dump all the worker stories to separate YAML files (matching your to_yamls directories)
  • Format event log timestamps as datetimes in to_yamls methods for easier reading and cross-referencing to logs
  • Add more things by default to the expand_keys lists
  • When dumping logs, use YAML block literals (|) so \n actually becomes a newline, instead of printing as an escaped character. This makes tracebacks much easier to read.
  • Tests added / passed
  • Passes pre-commit run --all-files

@gjoseph92 gjoseph92 requested a review from sjperkins March 29, 2022 00:42
@github-actions
Copy link
Contributor

Unit Test Results

       12 files  ±0         12 suites  ±0   6h 32m 55s ⏱️ + 20m 57s
  2 674 tests ±0    2 588 ✔️  -   3    83 💤 +1    3 +  2 
15 948 runs  ±0  15 074 ✔️  - 15  861 💤 +3  13 +12 

For more details on these failures, see this check.

Results for commit 4b3e0c2. ± Comparison against base commit e8c0669.

return {
addr: _worker_story(keys, wlog, datetimes=True)
for addr, worker_dump in self.dump["workers"].items()
if isinstance(worker_dump, dict) and (wlog := worker_dump.get("log"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First use of the walrus operator in dask/distributed I believe :-)



def worker_story(keys: set, log: Iterable) -> list:
def worker_story(keys: set, log: Iterable, datetimes: bool = False) -> list:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth adding the same parameter to scheduler_story above.


# Compact smaller keys into a general dict
scheduler_state = self._compact_state(self.dump[context], scheduler_expand_keys)
for i, (name, _logs) in enumerate(scheduler_state.items()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitty suggestion that saves the i+1 lower down.

Suggested change
for i, (name, _logs) in enumerate(scheduler_state.items()):
for i, (name, _logs) in enumerate(scheduler_state.items(), 1):

worker_id = info["id"]
except KeyError:
continue
for i, (addr, info) in enumerate(workers.items()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitty suggestion that saves the i+1 lower down.

Suggested change
for i, (addr, info) in enumerate(workers.items()):
for i, (addr, info) in enumerate(workers.items(), 1):


return dict(stories)
stories = self.worker_stories(*key_or_stimulus_id)
for i, (addr, story) in enumerate(stories.items()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i, (addr, story) in enumerate(stories.items()):
for i, (addr, story) in enumerate(stories.items(), 1):

@@ -198,22 +207,43 @@ def scheduler_story(self, *key_or_stimulus_id: str) -> dict:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If scheduler_story is modified to take a datetime bool argument, it would be nice if it was set to True in this call.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 19, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±0         15 suites  ±0   6h 52m 5s ⏱️ + 41m 45s
  3 166 tests ±0    3 076 ✔️  -   5    83 💤 ±0    7 +  5 
23 426 runs  +2  22 504 ✔️  - 17  909 💤 +8  13 +11 

For more details on these failures, see this check.

Results for commit c3cbee3. ± Comparison against base commit 43d7aa4.

♻️ This comment has been updated with latest results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants