Simplify update graph #8047
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
19 files - 1
19 suites - 1
10h 16m 35s ⏱️ - 1h 26m 17s
For more details on these failures, see this check.
Results for commit d5340c0. ± Comparison against base commit d6758bd.
This pull request removes 1 and adds 7 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results.
    pre_stringified_keys,
) = self.materialize_graph(graph)
Previously I passed the non-stringified keys through so they would be available when dealing with annotations at a later point. It's much better to get this out of the way and handle it here.
Once/if stringification is removed, this should simplify things with annotations a lot.
if internal_priority is None:
    # Removing all non-local keys before calling order()
    dsk_keys = set(dsk)  # intersection() of sets is much faster than dict_keys
    stripped_deps = {
        k: v.intersection(dsk_keys)
        for k, v in dependencies.items()
        if k in dsk_keys
    }
    internal_priority = dask.order.order(dsk, dependencies=stripped_deps)
This is just a preparatory step. In the follow-up PR I will rename the variables. This change is only about the logical grouping: doing the static/non-mutating operations first instead of grouping the operations by content.
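The dependency-stripping step in the snippet above can be sketched in plain Python. This is a minimal illustration, not the scheduler's code: the helper name `strip_nonlocal_deps` and the toy `dsk` and `dependencies` values are made up for the example, and the call to `dask.order.order` is left out so the sketch runs without dask.

```python
def strip_nonlocal_deps(dsk, dependencies):
    """Keep only dependency edges that point at keys of ``dsk`` itself.

    Hypothetical helper for illustration; it mirrors the dict
    comprehension in the diff above.
    """
    dsk_keys = set(dsk)  # a real set, so intersection() is cheap
    return {
        k: v.intersection(dsk_keys)
        for k, v in dependencies.items()
        if k in dsk_keys
    }


# Toy graph: "a" and "x" are not part of this update, so edges to them
# are removed before the graph would be handed to an ordering function.
dsk = {"b": ("inc", "a"), "c": ("add", "b", "x")}
dependencies = {"a": set(), "b": {"a"}, "c": {"b", "x"}}

stripped = strip_nonlocal_deps(dsk, dependencies)
```

Only keys present in `dsk` survive, and each surviving dependency set is restricted to those same keys.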
annotations_for_plugin: defaultdict[str, dict[str, Any]] = defaultdict(dict)
for key in keys_with_annotations:
    ts = self.tasks[key]
    for annot, value in ts.annotations.items():
        annotations_for_plugin[annot][key] = value
I'm inclined to remove passing annotations to plugins. If a scheduler plugin wants to know which annotations are there, it should just look them up on the task itself. I wanted to separate this change from the refactoring, so I decided to keep a slightly simplified version in
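To make the two options concrete, here is a small sketch of the regrouping done in the snippet above next to a direct per-task lookup. `FakeTaskState` and `group_annotations` are hypothetical stand-ins written for this example; they are not the scheduler's `TaskState` or any distributed API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeTaskState:
    # Hypothetical stand-in: only the attributes the sketch needs.
    key: str
    annotations: dict[str, Any] = field(default_factory=dict)


def group_annotations(tasks, keys_with_annotations):
    """Invert per-task annotations into {annotation: {key: value}}."""
    grouped: defaultdict[str, dict[str, Any]] = defaultdict(dict)
    for key in keys_with_annotations:
        ts = tasks[key]
        for annot, value in ts.annotations.items():
            grouped[annot][key] = value
    return dict(grouped)


tasks = {
    "x": FakeTaskState("x", {"priority": 10, "retries": 3}),
    "y": FakeTaskState("y", {"priority": 1}),
    "z": FakeTaskState("z"),
}

# Option 1: the regrouped mapping that is handed to plugins.
grouped = group_annotations(tasks, ["x", "y"])

# Option 2: a plugin reads the annotation straight from the task.
retries_for_x = tasks["x"].annotations["retries"]
```

Both routes expose the same values; the second avoids building the intermediate mapping.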
Done. Tests should pass now.
hendrikmakait
left a comment
Generally LGTM with a couple of nits or questions for clarification that might lead to small code changes.
Co-authored-by: Hendrik Makait <hendrik.makait@gmail.com>
Working my way towards #7980
This makes a couple of methods static. This is mostly cosmetic, to signal that they do not in fact rely on the scheduler and are not supposed to mutate any state.
The second thing is that this simplifies the way annotations are handled, which is currently quite confusing. I haven't run any benchmarks, but I'm pretty sure this is also faster than before. However, nothing here is flagged by #7980.
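As a rough illustration of the first point, the sketch below shows how a `@staticmethod` signals that a step does not depend on instance state. `MiniScheduler` and its method names are invented for this example and do not correspond to the actual scheduler methods touched by this PR.

```python
class MiniScheduler:
    """Hypothetical toy class, not the distributed Scheduler."""

    def __init__(self):
        self.tasks = {}

    @staticmethod
    def find_keys_with_annotations(annotations_by_key):
        # No ``self`` parameter: this step cannot read or mutate
        # scheduler state, which is what the decorator communicates.
        return {key for key, annots in annotations_by_key.items() if annots}

    def update_graph(self, annotations_by_key):
        # Static, non-mutating work first ...
        annotated = self.find_keys_with_annotations(annotations_by_key)
        # ... then the part that actually changes state.
        self.tasks.update(annotations_by_key)
        return annotated


scheduler = MiniScheduler()
annotated = scheduler.update_graph({"x": {"priority": 1}, "y": {}})
```

The static helper can also be called and tested without constructing a scheduler at all, for example `MiniScheduler.find_keys_with_annotations({"x": {"priority": 1}})`.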