Respect layer Annotations in graph_to_futures #4288

mrocklin · 2020-11-30T17:51:54Z

cc @sjperkins this is what I was thinking. Thoughts?

distributed/client.py

mrocklin · 2020-11-30T21:50:33Z

Hrm, another thought that came up is checking dask.config.get("annotations") when we call compute/persist/_graph_to_futures. This would support workflows like

with dask.annotate(retries=10):
    x.compute()

jrbourbeau · 2020-12-01T05:14:14Z

another thought that came up is checking dask.config.get("annotations") when we call compute/persist/_graph_to_futures

Support for adding compute-time annotations seems sensible. Expanding your snippet above:

x = ... # Create Dask collection here w/o annotations
with dask.annotate(retries=10):
    # Add compute-time annotations
    x.compute()

We'll want to make sure that compute-time annotations take precedence over any overlapping annotations which may already exist on HLG layers.
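The precedence rule could be sketched as a plain dictionary merge in which compute-time values are applied last. The helper name `merge_annotations` is hypothetical and is not part of dask or distributed.

```python
def merge_annotations(layer_annotations, compute_annotations):
    """Combine annotations so that compute-time values win on overlap.

    Hypothetical helper for illustration only.
    """
    merged = dict(layer_annotations or {})
    # Applied last, so compute-time annotations override layer annotations.
    merged.update(compute_annotations or {})
    return merged


layer = {"retries": 2, "priority": 1}
merged = merge_annotations(layer, {"retries": 10})
```

Annotations that exist only on the layer (`priority` here) survive the merge, and the layer's own dictionary is left unmodified.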

sjperkins · 2020-12-01T12:33:03Z

cc @sjperkins this is what I was thinking. Thoughts?

dask/dask#6889 and #4279 do currently handle transmission of arbitrary annotations to the scheduler (and SchedulerPlugins)

To me this looks like a special case handling of the existing distributed Client.{submit,compute,persist} taxonomy. This is handled here in #4279 and tested here.

I think the advantage of this PR's approach is that it saves the scheduler work unpacking the existing taxonomy out of the transmitted annotations. Also, one thing that didn't immediately occur to me is that this approach will provide backward compatibility for dictionary graphs.
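A rough sketch of that idea, assuming each layer can be described by its keys plus an optional annotations mapping. The function name and the input shape are assumptions made for illustration; this is not the code in distributed/client.py.

```python
def expand_layer_annotations(layers):
    """Expand per-layer annotations into per-key mappings on the client side.

    `layers` maps a layer name to a `(keys, annotations)` pair, where
    `annotations` may be None. This shape is an assumption for the sketch.
    """
    per_key = {}
    for keys, annotations in layers.values():
        if not annotations:
            # Only inspect layers that actually carry annotations.
            continue
        for name, value in annotations.items():
            per_key.setdefault(name, {}).update((key, value) for key in keys)
    return per_key


layers = {
    "a": (["a-0", "a-1"], {"retries": 3}),
    "b": (["b-0"], None),
}
expanded = expand_layer_annotations(layers)
```

A plain dictionary graph has no layers, so it would expand to an empty mapping and any explicitly passed keyword arguments would continue to apply unchanged.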

I've merged this branch into dask/dask#6889; I hope this is OK. Let me add support for resources and loose restrictions in that PR. I'll work on that now and open up both the dask and distributed PRs for review this evening (UTC+02:00).

mrocklin · 2020-12-01T15:48:59Z

dask/dask#6889 and #4279 do currently handle transmission of arbitrary annotations to the scheduler (and SchedulerPlugins)

Ah right. My original approach to this was "let's just handle the annotations that we know about now, which we can do simply. I don't see a need for full arbitrary annotations today."

However, you're right that users may have needs here with their own code, as with SchedulerPlugins.

I was hoping to find a simpler solution than your existing PRs, but let me take another look with this use case in mind. Thanks for pushing back here.

sjperkins · 2020-12-01T15:55:07Z

I was hoping to find a simpler solution than your existing PRs, but let me take another look with this use case in mind. Thanks for pushing back here.

Could you critique the current complexity in more detail? I think part of it may stem from the current Layer transmission mechanisms. My impression is that this is currently in flux.

mrocklin · 2020-12-02T05:16:19Z

Could you critique the current complexity in more detail? I think part of it may stem from the current Layer transmission mechanisms. My impression is that this is currently in flux.

The code is fine. I gave it a more thorough pass just now. I was a bit pedantic in some cases where minor code complexity arose just to make a point. I think that I'm probably more picky than most when it comes to reducing indirection in code.

My original point wasn't about the shape of your code in particular. It was more about the machinery involved in packing/unpacking things. I think that you're trying to do a more thorough job than I was trying to do here, and so your solution was understandably more complex.

I do think that what is in this PR is simpler, mostly because it only relies on systems that feel pretty solid today.

dhirschfeld · 2020-12-02T11:02:02Z

My original approach to this was "let's just handle the annotations that we know about now, which we can do simply. I don't see a need for full arbitrary annotations today"

As someone who has been eagerly anticipating the use-cases a general annotation framework can enable, I'd very much like a general rather than limited/specific solution to be implemented now.

In the last year it seems there's been a lot of focus from a number of core contributors on this difficult problem, and I feel the project is pretty close to a workable general-purpose solution acceptable to everyone. I think that if the project were to stop now with a limited solution, it may be significantly more difficult to generalise later on when everyone has moved on to other problems.

A general-purpose solution will necessarily be more complex than a limited/specific solution, but I think the use-cases it enables will justify the additional complexity. With the focus this feature is getting now, I think there will never be a better time to design and implement the more complex solution.

sjperkins · 2020-12-02T14:28:12Z

I do think that what is in this PR is simpler, mostly because it only relies on systems that feel pretty solid today.

This is undoubtedly true, but I think it will suffer a similar problem to that experienced with Client.{submit, persist, compute} kwargs: expanding the taxonomy or collection scheduling functionality relies on adding more args/kwargs to update-graph ops, with potential knock-on effects to every downstream project. I think the complexity surrounding those issues is far greater than the code complexity introduced by annotations.

It may be worth revisiting the points originally discussed here

Modifying existing scheduler behaviour is risky for the existing dask user base.
Encoding annotations in the graph provides hints to upstream schedulers that they may respect, but are under no obligation to do so.
As it currently stands, users may
- Create graphs via the collections (Array, Dataframe, Bag) and trust the scheduler to schedule work optimally.
- Optimally schedule work imperatively via the distributed Client interface.
Neither approach seems entirely satisfactory: Collections are great abstractions which are disappointing to discard for the optimal compute/data placement afforded by the Client interface.
Annotations therefore bridge the gap between the two paradigms by providing an interface or glue layer.

Expert users can now write custom SchedulerPlugins which implement collection specific scheduling behaviour based on annotations without affecting downstream projects.

If the plugins are very effective, they can then be added to the scheduler by default, or their functionality integrated into the scheduler.
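To illustrate the plugin idea above, here is a plugin-like class that reacts to a custom annotation. It deliberately does not subclass distributed's `SchedulerPlugin`: the `update_graph` hook signature shown is a simplified assumption (the real signature varies between distributed versions), and the `affinity` annotation name is hypothetical.

```python
from collections import defaultdict


class AffinityPlugin:
    """Hypothetical plugin-like class that groups task keys by annotation.

    Assumes annotations arrive as {annotation_name: {key: value}}.
    """

    def __init__(self):
        self.keys_by_group = defaultdict(set)

    def update_graph(self, scheduler, annotations=None, **kwargs):
        # Read only the custom "affinity" annotation and ignore all others.
        for key, group in (annotations or {}).get("affinity", {}).items():
            self.keys_by_group[group].add(key)


plugin = AffinityPlugin()
plugin.update_graph(
    scheduler=None,  # unused in this sketch
    annotations={
        "affinity": {"x-0": "gpu", "x-1": "gpu", "y-0": "cpu"},
        "retries": {"x-0": 3},
    },
)
```

Because the plugin only reads the annotation it understands, graphs without that annotation pass through untouched, which matches the point that annotations are hints rather than obligations.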

mrocklin · 2020-12-02T15:03:24Z

I'm totally fine going with the general approach. My comment about this being simpler because it relies on solid systems today was more a statement of "This approach is only simpler because I'm solving a weaker problem" than "Let's go with this because it's simpler."

mrocklin · 2020-12-02T15:03:59Z

I have no specific desire to merge in this PR over yours. I do have some mild requests on yours though.

sjperkins · 2020-12-02T15:08:25Z

I have no specific desire to merge in this PR over yours. I do have some mild requests on yours though.

I do appreciate the feedback: your comments have improved the PRs. I think I've addressed most of your existing requests, except for one point which I think requires input from others who've worked on the Layer hierarchy.

This was referenced Nov 30, 2020

Transmit Layer Annotations to Scheduler dask/dask#6889

Merged

Transmit Layer annotations to scheduler #4279

Merged

Respect layer Annotations in graph_to_futures

74419e0

mrocklin force-pushed the annotations branch from 2101d22 to 74419e0 Compare November 30, 2020 17:54

jrbourbeau reviewed Nov 30, 2020

distributed/client.py

Only inspect layers with annotations

0245f43

jrbourbeau mentioned this pull request Nov 30, 2020

Ensure shuffle layers have annotations dask/dask#6912

Closed

2 tasks

Merge branch 'master' of https://github.com/dask/distributed into ann…

b2bc758

…otations

mrocklin closed this Dec 10, 2020

mrocklin deleted the annotations branch January 4, 2021 17:03
