-
-
Notifications
You must be signed in to change notification settings - Fork 748
Respect layer Annotations in graph_to_futures #4288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2101d22 to
74419e0
Compare
|
Hrm, another thought that came up is checking with dask.annotate(retries=10):
x.compute() |
Support for adding compute-time annotations seems sensible. Expanding your snippet above: x = ... # Create Dask collection here w/o annotations
with dask.annotate(retries=10):
# Add compute-time annotations
x.compute()We'll want to make sure that compute-time annotations take precedent over any overlapping annotations which may already exist on HLG layers. |
dask/dask#6889 and #4279 do currently handle transmission of arbitrary annotations to the scheduler (and SchedulerPlugins) To me this looks like a special case handling of the existing distributed Client.{submit,compute,persist} taxonomy. This is handled here in #4279 and tested here. I think the advantage of this PR's approach is that it saves the scheduler work unpacking the existing taxonomy out of the transmitted annotations. Also, one thing that didn't immediately occur to me is that this approach will provide backward compatibility for dictionary graphs. I've merged this branch into dask/dask#6889 -- hope this OK. Let me add support for resources and resources and loose restrictions in that PR. I'll work on that now and open up both the dask and distributed PR's for review this evening (UTC + 02h00). |
Ah right. My original approach to this was "let's just handle the annotations that we know about now, which we can do simply. However, you're right in that users may have needs here with their own code, as in with SchedulerPlugins. I was hoping to find a simpler solution than your existing PRs, but let me take another look with this use case in mind. Thanks for pushing back here. |
Could you critique the current complexity in more detail? I think part of it may stem from the current Layer transmission mechanisms. My impression is that this is currently in flux. |
The code is fine. I gave it a more thorough pass just now. I was a bit pedantic in some cases where minor code complexity arose just to make a point. I think that I'm probably more picky than most when it comes to reducing indirection in code. My original point wasn't about the shape of your code in particular. It was more about the machinery involved in packing/unpacking things. I think that you're trying to do a more thorough job than I was trying to do here, and so your solution was understandably more complex. I do think that what is in this PR is simpler, mostly because it only relies on systems that feel pretty solid today. |
As someone who has been eagerly anticipating the use-cases a general annotation framework can enable I'd very much like a general rather than limited/specific solution to be implemented now. In the last year it seems there's been a lot of focus from a number of core contributors on this difficult problem and I feel the project is pretty close to a workable general-purpose solution acceptable to everyone. I think if the project were to stop now with a limited solution that it may be significantly more difficult to generalise later on when everyone has moved on to other problems. A general purpose solution will necessarily be more complex than a limited/specific solution but I think the use-cases it enables will justify the additional complexity and I think with the focus this feature is getting now that there will never be a better time to design/implement the more complex solution. |
This is undoubtedly true, but I think it will suffer a similar problem to that experienced with Client.{submit, persist, compute} kwargs: expanding the taxonomy or collection scheduling functionality relies on adding more args/kwargs to It may be worth revisiting the points originally discussed here
Expert users can now write custom SchedulerPlugins which implement collection specific scheduling behaviour based on annotations without affecting downstream projects. If the plugins are very effective, they can then be added to the scheduler by default, or their functionality integrated into the scheduler. |
|
I'm totally fine going with the general approach. My comment about this being simpler because it relies on solid systems today was more of a statement of "This approach is only simpler because I'm solving a weaker problem" rather than "Let's go with this because it's simpler" |
|
I have no specific desire to merge in this PR over yours. I do have some mild requests on yours though. |
I do appreciate the feedback: your comments have improved the PRs. I think I've addressed most of your existing requests, except for one point which I think requires input from others who've worked on the Layer hierarchy. |
cc @sjperkins this is what I was thinking. Thoughts?