-
-
Notifications
You must be signed in to change notification settings - Fork 748
Extract SchedulerState from Scheduler #5839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e5ce10c to
3e0057e
Compare
|
@jakirkham , do you think this is the time that we should introduce a CI run which cythonizes? (cc @quasiben ) |
|
We already do 🙂 distributed/.github/workflows/tests.yaml Lines 84 to 87 in 60d82c2
Edit: Work was done in a couple PRs. Most notably PR ( #4764 ), which fixed a few lingering issues. |
|
Well OK! It's not too clear from the run names. |
3e0057e to
8f53e13
Compare
|
Yeah that's fair. Maybe there is a way to get GH Actions to show this? I'm not really sure, but if anyone has suggestions on how to do this, can give it a try. |
9c928c9 to
bfe66da
Compare
Unit Test Results 12 files ±0 12 suites ±0 7h 16m 3s ⏱️ + 9m 31s For more details on these failures, see this check. Results for commit 4f02566. ± Comparison against base commit fb8484e. ♻️ This comment has been updated with latest results. |
0830e10 to
ffec8f4
Compare
0c0b47a to
79686a4
Compare
79686a4 to
62f57ba
Compare
|
Fixed a few other small test failures that cropped up (changing from Also resolved the Started working on splitting out |
|
Shouldn't we push for merge now before splitting the files? |
|
Sounds ok to me. Would help minimize the merge conflicts we run into by getting some of this in sooner 🙂 Edit: Marked as ready for review. Dropped WIP from the PR title. Also updated OP to note this doesn't move things into a separate module, but merely separates into different objects. Planning on continuing to work on that step though 😉 |
|
Thoughts @dask/maintenance? 🙂 |
fjetter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit on the fence with this refactoring. On the one side, decoupling server code and state is great and moves us with the scheduler in a similar direction as we intend to go with the worker (see #5736).
On the other side, this refactoring draws the line a bit too arbitrary for my taste. I don't want us to split methods and state up by "cythonized" and "not-cythonized" but I feel this is what is happening here.
We are accessing the Scheduler.state 270 times! I didn't count how many of these actually mutate but many do. This violates principles like encapsulation or "Separation of Concern". These are obviously not principles we have to follow but I believe we should. I am afraid that this split, particularly when one side is cythonized, will make our lives much harder.
| @@ -4015,31 +4164,136 @@ def __init__( | |||
| self.rpc.allow_offload = False | |||
| self.status = Status.undefined | |||
|
|
|||
| #################### | |||
| # state properties # | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer us to be much more conservative with API exposure. This PR already removes a few underscores and I would rather see the entire state to be closed off unless there is any concrete reason for it to be public.
| def aliases(self): | ||
| return self.state.aliases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should aliases and host_info not rather be server attributes than state attributes?
| @property | ||
| def extensions(self): | ||
| return self.state.extensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like for plugins, I'm wondering how the extensions now are supposed to work (are extensions even something we do publicly? this is merely an internal thing to do, isn't it? Users would use plugins I assume)
| def get_task_duration(self, ts: TaskState) -> double: | ||
| return self.state.get_task_duration(ts) | ||
|
|
||
| def get_comm_cost(self, *args, **kwargs): | ||
| return self.state.get_comm_cost(*args, **kwargs) | ||
|
|
||
| def check_idle_saturated(self, *args, **kwargs): | ||
| return self.state.check_idle_saturated(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These all are only accessed by stealing. Why not let stealing access scheduler.state instead?
|
|
||
| def check_idle_saturated(self, *args, **kwargs): | ||
| return self.state.check_idle_saturated(*args, **kwargs) | ||
|
|
||
| def update_graph_hlg( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire update_graph_* functions appear to be much close to state than server. Why not move these to SchedulerState?
|
Those concerns are reasonable Florian and I share them as well, but I fear we are putting the cart before the horse. The main issue is the Scheduler has been rather monolithic for a long time.
The first step at least as we saw it was to break apart the inheritance structure, which I think you can now see was a fair bit of work on its own. The next step would be to get this into its own file so we are not trying to compile the whole Scheduler (kind of slow, touches a bunch of complex code that doesn't necessarily benefit from Cythonization, etc.). I've already started this, but would rather put it up after this PR lands. Then we can tug further at these various threads to get However each of these passes is going to touch a fair bit of code, which likely are touched by other PRs. As a result to get this done, I think we are going to need to do a series of PRs that are more incremental to avoid too much churn at once. None of them will give us the warm fuzzy feeling that things are done, but each will move us incrementally in that direction. To that end, it would help to have some feedback on this in light of this being a step on that path. How do we feel about the fact the two objects are split here? What things would we like to keep in mind for the next PRs? Etc.? |
|
Thanks. I appreciate that this is not an all-in-one PR and we have to do this incrementally. I think we're not too far apart.
Indeed, this was an important step and this alone is probably reason enough for this PR to be merged
I think this is the point where we disagree. I'm more concerned about the overall state of the code, its maintainability and architecture than the cythonization attempt. I would prefer us cleaning up the interfaces between the two classes and finish the refactor before we invest any more time in cythonization. We agreed on "not dropping Cython" under the condition that we can isolate the code to a dedicated module such that "other work" may not be impacted as much by the cythonization as it is right now. With this PR this would amount to about half the code of |
|
I feel like the alignment between us is actually pretty good. This PR is trying to do the minimum, so that we can answer exactly those interface questions; i.e., many decisions on whether a thing is in the state or not were simply taken from the previous superclass model. Now that we separate the state, we can make a more formal decision on where each thing should be. So I suggest we don't develop further in this PR, but add each of @fjetter 's points as to-do in a the next PR. The third PR would move the state into a different file to show our intent at encapsulation, and I would aim for the state file to contain 20% of the code, rather than the other way around. |
|
Generally agree with Martin's points above. Also don't think we are doing any additional Cythonization here (and that is not my current priority either though likely would be an interest in the medium term). So also seeing pretty good alignment. The one thing I've noticed on an ergonomic side of things is that fixing merge conflicts has been very slow (for example diff resolution software has been struggling with the size of the |
|
Code count in this branch:
So we are talking ~30% of the code to be factored into a different file, depending how many functions and smaller classes except SchedulerState class go with |
I did a very rough split of what I thought should be done by the
That's mostly where my concern is. We're still unclear about what functions or smaller classes should be part of the state and what shouldn't. I would love to talk about about what the final interface of a SchedulerState should look like. This would give me a better understanding of where we're going |
|
I agree that if indeed it is ~most of the code, then the split is not worthwhile, and deciding where the demarkation line is, is an important necessary step before trying to implement the split. Still, does this affect the current PR? |
|
So, I'm not sure that I support this PR. Let me write down a couple of pros and cons:
Historically we've used main as a testing ground to figure out whether or not we should do Cythonization. This was because we were pretty sure that we wanted to do Cythonization, and because it was hard to manage the merge conflcts. I think at this point, after a year of doing this, I think that we can call this a mistake. We should not use main to experiment with Cythonization today. This PR feels like it's changing around main, which comes with the costs mentioned above, in order to answer the "should we do cythonization" question. I think that I'm mildly against merging it in. I think that we need to find out ways to answer that question without causing churn in main. Thoughts? |
|
Did @fjetter have time to go through this PR as well ? |
|
I believe that he's somewhat saturated currently, so I wouldn't be surprised if not. |
|
Ok well when we met last we discussed Florian would take a look at this PR. So I think we are a bit confused on what is going on here. Maybe we need to have another meeting? Would suggest we do something next week (personally am pretty saturated this week). |
|
What do you think, @jakirkham , do you still have any energy to give to this, aside from all the sunken costs? Personally, I am surprised that the scheduler as a bottleneck doesn't come up more often (yes, I know there have been improvements elsewhere); but when it IS - is there any other plausible acceleration on the horizon, or is cython the only game in town? pyston? To be sure, rewriting with something else is probably much more work than we have been considering here. I'd be happy to discuss this question in another meeting, if we have any chance of coming to consensus. |
I'm sure that if Florian has time that he'll chime in. I think that currently he's really focused on removing deadlocks and other arguably more critical issues though.
Sure, I'm happy to meet any time.
To be clear, I'm still enthusiastic about the Cythonization work, we just need to find a way to explore this without operating directly in Do you all disagree? If so, I'd love to engage on that higher level question. |
|
I had another look over this PR. I'm really torn but I'm still not convinced about this change and would prefer us to go in another direction by dropping Cython from main and reverting any complications introduced to make Cython work, i.e. revert the I would really love to see us separating scheduling logic from server code. I believe such a modular software architecture has many advantages over the current, rather monolithic one. This PR does not propose a modular architecture. I entirely understand that such an architecture can not be achieved with a single PR and this should be considered an increment. However, at the same time this PR draws a relatively arbitrary line to define
Before this modular architecture is achieved, this intermediary state will cause pain since until then it is just another indirection in an already complex and complicated system. For me to support this PR, I would need confidence that the direction this is pursuing is correct. Right now, I'm lacking this confidence and am more concerned about the cost and possible churn this introduces. I already mentioned that I consider this PR a slight improvement over main in a few areas. Particularly, because it removes an otherwise very confusing inheritance from the |
Refresh & follow-up to @martindurant's PR ( #5176 )
Continues the worker to pull out
SchedulerStateas a separate object used by (though no longer a class inherited by)Scheduler.Note: This does not separate
SchedulerStateinto a new module, but this sets things up to make that step easier.pre-commit run --all-files