Drop custom __eq__ from Status
#4270
This appears to eat up a good chunk of time in places where the `Status` needs to be checked. So try dropping this method in favor of the default implementation.
cc @Carreau
If we drop the special `__eq__` method, these tests are no longer relevant, as they would fail instead. So this drops them for now, should we decide to go ahead with this change.
For context this
From a search through the Dask org, it looks like strings are still used in dask-cloudprovider and dask-jobqueue.
What sort of places are these?
```python
"""
If other object instance is string, we compare with the values, but we
actually want to make sure the value compared with is in the list of
possible Status, this avoids comparison with non-existing status.
"""
```
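For comparison, here is a minimal sketch of the default behavior once the custom method is dropped, assuming `Status` is a plain `Enum` subclass (the members below are a hypothetical subset). Members then compare by identity, so a legacy string such as `"closed"` silently compares unequal instead of being validated. Code that still holds strings can convert them first: `Status(value)` looks a member up by value and raises `ValueError` for values that are not valid statuses, which preserves the docstring's goal of rejecting non-existent statuses. The helper `is_closed` is hypothetical.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical subset of members, for illustration only.
    running = "running"
    closed = "closed"


def is_closed(status) -> bool:
    # Hypothetical downstream helper that may still receive legacy strings.
    # Status(...) returns a member unchanged, looks strings up by value,
    # and raises ValueError for values that are not valid statuses.
    return Status(status) is Status.closed


legacy = "closed"
print(Status.closed == legacy)  # False: default Enum equality ignores strings
print(is_closed(legacy))        # True
```

A `str`-mixin Enum (`class Status(str, Enum)`) would instead compare equal to its string values, but that is a different design from the one discussed here.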
Given the need for some backwards compatibility, I wonder if we might first try the following:
```python
def __eq__(self, other):
    try:
        return self.value == other.value
    except AttributeError:
        ...
```
My experience is that try/except tends to be quite a bit faster than if/else when the `try` branch succeeds, and this also avoids the type check (which should be fine).
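A self-contained sketch of the suggestion above, assuming a plain `Enum`. The `except` branch is elided in the comment, so the string fallback below is a hypothetical choice, not the proposed code. Defining `__eq__` in a class body sets `__hash__` to `None`, so the sketch restores `Enum.__hash__` to keep members usable in sets and as dict keys.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical subset of members, for illustration only.
    running = "running"
    closed = "closed"

    def __eq__(self, other):
        try:
            # Fast path: `other` is a Status (or anything with a `.value`).
            return self.value == other.value
        except AttributeError:
            # Hypothetical fallback for legacy string comparisons; the
            # original comment elides this branch.
            if isinstance(other, str):
                return self.value == other
            return NotImplemented

    # Defining __eq__ sets __hash__ to None; restore Enum's hash.
    __hash__ = Enum.__hash__
```

Because only `.value` is compared, any object with an equal `value` attribute (for example a member of another Enum) also compares equal; that is the skipped type check the comment considers acceptable.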
We can also migrate over all of the other projects if that will provide a large boost. It isn't hard to do.
This should be easily fixable in dask-jobqueue, if my original comment in #3853 (comment) is right. I have not done anything about it (and am probably not planning to in the next few days) though. Don't hesitate to ping me if you open a PR in dask-jobqueue!
See dask/dask-jobqueue#476
And dask/dask-cloudprovider#200. Also #4280 for the future.
Thanks for filing all of these, Matt! 😄
Looks like those PRs have been merged. Thanks all! 😄 Marking this as ready for review 🙂
Is anything else needed here? Or are we good to merge?
@jacobtomlinson @lesteve @guillaumeeb it would be convenient for the Scheduler performance work to drop backwards compatibility for string-valued statuses in the core dask.distributed package. This would break interoperability with versions of dask-cloudprovider and dask-jobqueue that predate the PRs sent in last week. Are you comfortable releasing soonish and dealing with the inevitable user complaints?
It's worth noting the Enum object has been around since distributed version 2.19.0 (1408ab5). I don't think we are planning on releasing Distributed soon, but I could be wrong about that.
Yeah, the enum has been around, but we didn't originally do a good job of actually making the change consistently throughout the subprojects until now. This puts the users and maintainers of the subprojects in a suboptimal position.
Yep, I understand. Just sharing info for the sake of transparency.
In our profiling of the scheduler, we identified two main transitions that take a good chunk of time: `transition_processing_memory` and `transition_waiting_processing`. The former takes slightly longer than the latter, but both are easily 2x slower than any transition that follows them. Both of them directly or indirectly call `check_idle_saturated`. While this is not necessarily the worst bottleneck for either of them, it does stick out on the call graph, and optimizing it stands a good chance of improving the runtime of both transitions at once. Additionally, `check_idle_saturated` includes a fair bit of code that simply crunches numbers and does not touch Python objects as much. It also isn't as dependent on the Cythonization of other classes as other functions in the profile are, which makes it easier to Cythonize this piece of code without touching too much other code.

Here we go through and annotate the local variables with types. We also assign non-local values accessed through attributes to local variables, which we type. Additionally, we changed default argument values to be friendlier to C-style types. Initially we tried to type `self.idle`. However, as [`self.idle` is a `sortedcontainers.SortedSet`](https://github.com/dask/distributed/blob/9460e3fe1e0bcdb2daf6ebafe5335d536fa4f492/distributed/scheduler.py#L1292), this didn't work (we need an actual Python `set` to type it), so we left it untyped.

It's worth adding that, as a good chunk of the time in `check_idle_saturated` is spent just doing `ws.status == Status.closed`, we still need PR #4270 to cut down on the time spent in this method.

* Combine assignments and separate them from `if` blocks
* Change `occ` default to not be `None`

  To make it easier to type `occ` later, define its default as a non-`None` value that is also clearly bogus (namely `-1.0`). That way we can still replace this value when it hasn't otherwise been set, while giving it a type that includes the default value.
* Annotate `check_idle_saturated` for Cythonization
* Hackily type `set`s through local assignment

  For now, just to see what is possible, assign these attributes to local variables typed as `set`s. This allows Cython to use the corresponding Python C APIs with these objects, optimizing how they are handled. It also temporarily saves us from having to Cythonize this class more generally while exploring optimizations here.
* Drop typing of `self.idle`

  As `self.idle` is actually a `SortedSet` rather than a `set`, we can't type it, so revert the typing of `idle`. Leave all other `set`-based typing in place so other optimizations can be applied where possible. Also keep the assignment of `self.idle` to `idle`, as this generates the attribute-access code only once and we need it in both branches anyway.
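Below is a plain-Python sketch of the ideas in these commits, not the actual `check_idle_saturated` from distributed: `WorkerState`, `SchedulerSketch`, and the idle/saturated rule are simplified, hypothetical stand-ins. It shows attribute lookups hoisted into locals (the values Cython could then type), the `-1.0` sentinel that keeps `occ` a float, and the `ws.status == Status.closed` check that #4270 targets.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical subset of states, for illustration only.
    running = "running"
    closed = "closed"


@dataclass(eq=False)  # keep identity hashing so workers can live in sets
class WorkerState:
    # Hypothetical stand-in for the scheduler's per-worker state.
    nthreads: int
    occupancy: float
    processing: set = field(default_factory=set)
    status: Status = Status.running


class SchedulerSketch:
    def __init__(self):
        self.idle = set()
        self.saturated = set()
        self.total_nthreads = 0
        self.total_occupancy = 0.0

    def check_idle_saturated(self, ws: WorkerState, occ: float = -1.0) -> None:
        # Hoist attribute lookups into locals once; under Cython these
        # locals could then be given C types.
        total_nthreads: int = self.total_nthreads
        # Default Enum equality: the comparison #4270 aims to make cheaper.
        if total_nthreads == 0 or ws.status == Status.closed:
            return
        # `-1.0` is a clearly bogus sentinel that keeps `occ` a float.
        if occ < 0:
            occ = ws.occupancy
        nc: int = ws.nthreads
        p: int = len(ws.processing)
        avg: float = self.total_occupancy / total_nthreads
        idle = self.idle
        saturated = self.saturated
        # Simplified, illustrative rule (assumption): idle if the worker has
        # spare threads or its per-thread occupancy is well below average.
        if p < nc or occ / nc < avg / 2:
            idle.add(ws)
            saturated.discard(ws)
        else:
            idle.discard(ws)
            saturated.add(ws)
```

Here `self.idle` is a plain `set` for simplicity, whereas the commits note the real one is a `SortedSet`, which is why it could not be typed.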
Sorry for the late reply; this does look good to me. If you are a bit worried, you can make one release where you turn this warning on by default. It will raise in users' code, but they will still be able to turn the error off with an "ignore" warning filter.
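One way to picture the suggested transition, as a hedged sketch: a hypothetical `Status.__eq__` that still accepts strings but emits a `DeprecationWarning`, plus the standard `warnings` filters users would apply. The `"error"` filter turns the warning into an exception (roughly what turning it on by default would feel like), and an `"ignore"` filter silences it. The message text and member names are assumptions.

```python
import warnings
from enum import Enum


class Status(Enum):
    # Hypothetical subset of members, for illustration only.
    running = "running"
    closed = "closed"

    def __eq__(self, other):
        if isinstance(other, str):
            # Hypothetical deprecation path for legacy string comparisons.
            warnings.warn(
                "Comparing Status with a string is deprecated; use Status members",
                DeprecationWarning,
                stacklevel=2,
            )
            return self.value == other
        if isinstance(other, Status):
            return self is other
        return NotImplemented

    # Defining __eq__ would otherwise set __hash__ to None.
    __hash__ = Enum.__hash__


# Escalate the warning to an error (e.g. in a test suite) ...
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        Status.closed == "closed"
    except DeprecationWarning as exc:
        print("raised:", exc)

# ... or opt out with an "ignore" filter.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    print(Status.closed == "closed")  # True
```

`DeprecationWarning` is hidden by default outside `__main__`; a library that wants the warning visible by default might choose `FutureWarning` instead.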
dask-cloudprovider 0.5.1 is now on PyPI. Thanks @mrocklin for pushing this forward.
Dask-Jobqueue 0.7.2 has also been released. Since it sounds like we are planning to do a release of Dask + Distributed soon, maybe we can merge this after that release? That would allow for more uptake of the downstream projects' releases before this change shows up. Thoughts?
That seems like a good idea. Thank you for being flexible with the timing on this one. I know you'd like to have it out of your profiles.
Now that Dask + Distributed have been released and a week has passed, I'm curious: are we comfortable merging this change, or would we like to wait longer?
Thoughts, @mrocklin? 🙂
Sure. Let's go for it.
This appears to eat up a good chunk of time in places where the `Status` needs to be checked. So try dropping this method in favor of the default implementation. That said, I'm sure this is here for some reason, so I'm marking this as WIP so we can discuss whether removing it is OK or whether we need to figure out an alternative solution.