-
-
Notifications
You must be signed in to change notification settings - Fork 748
Annotate ClientState for Cythonization
#4290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Avoid building a `tuple` and simply inject the `client_key` directly into the `str`.
|
From my perspective this looks pretty lightweight. I'm pleasantly surprised :) |
|
Thanks! This is basically done on my end if we are ok with it. Wasn't sure to what extent the |
|
cc @quasiben (for vis) |
When you say "outside the scheduler" do you mean
|
|
Sorry I mean does anything outside of this This applies to other |
|
There are several uses of TaskState and WorkerState in the dashboard code.
Here is an example file, I recommend grepping for "ts."
https://github.com/dask/distributed/blob/master/distributed/dashboard/components/scheduler.py
…On Tue, Dec 1, 2020 at 10:15 AM jakirkham ***@***.***> wrote:
Sorry I mean does anything outside of this scheduler.py file do from ...
import ClientState and then do something with a ClientState's attributes.
Like does cs.client_key (or any other attribute) get used outside of this
file.
This applies to other *State objects as well. So am curious both in terms
of this specific type and then generally the others.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4290 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTBQSRTUJLILKBAVT4LSSUXCPANCNFSM4UIWPYFQ>
.
|
|
Ok thanks for that data point. We might want to adjust this approach for them. To the second part, is |
|
In general scheduler state is used outside of scheduler.py. Scheduler logic is spread to other files like stealing.py or locks.py intentionally to try to keep the core somewhat lightweight. Is it possible to draw the Cython box a big bigger around multiple files? It is probably doable to keep things contained within this repository though |
|
Ok thanks that helps. I'll keep that in mind going forward. Also will push some changes to address that here. We could use Cython in more places. I don't think we need to yet, but it could happen. Stealing is a good example of where we may consider this eventually. Independently we might want to start tracking what is intended to be Cythonized and how to enable it. Initially was thinking we could add an optional flag to Sure. Was more wondering whether we need to be cognizant of breakage. We are not entering that territory yet (and may not at all). Just trying to understand the constraints a bit more since we are discussing them. |
|
Have pushed a couple of commits to add
The advantage of using If we prefer something more succinct while still remaining rather Pythonic, Cython's Using Out of all of the options slightly prefer the |
|
This seems like a sensible path forward. I'll be curious to see if this has an impact when we hit TaskState and WorkerState, which I suspect will be more significant and more work. |
ClientState for CythonizationClientState for Cythonization
|
Ok this is ready for review. Please let me know if you have any other questions 🙂 |
|
This looks good to me. Thanks @jakirkham |
| except ImportError: | ||
| from ctypes import ( | ||
| c_double as double, | ||
| c_ssize_t as Py_hash_t, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, this is reasonable as Python does the same thing itself. Though it shouldn't matter whether we get this right in the pure Python case as this doesn't have any effect on the implementation besides documentation. Just noting this to provide background as to how this was decided.
Analogous to PR ( #4290 ) except instead of `ClientState`, this is annotating `WorkerState` and usages thereof. This is a bit longer simply because of how many attributes `WorkerState` has. That said, this just uses the same strategy here as was used for `ClientState`. * Coerce `bandwidth` to `float` Typically `bandwidth` seems to be a `float` elsewhere in the code. So coerce it to a `float` here as well to align with that practice. * Use `ws` variable name for `WorkerState` objects * Name `WorkerState` variable distinctly in closure Avoids confusion with the variable name used outside this closure. Also bypasses any potential leakage from the external scope. * Assign selected `WorkerState` to variable * Annotate `WorkerState` for Cythonization * Annotate all `WorkerState` variables * Prefix `WorkerState` attributes with `_` Since these are effectively private given they are only available in Cython/C, go ahead and mark them as such. We can then add Python `@property`s on top of them to handle Python access to the original attribute names with minimal disruption. * Add Python-level `property`s for attributes * Run `black` * Add some `property.setter`s Includes `setter`s for `occupancy` and `nbytes`. * Create `list` from generator This ensures Cython still uses `WorkerState` to annotate the variable iterated over. Otherwise it constructs a generator with its own scope where this is ignored. * Relax `_name` to `object` Apparently `_name` is sometimes an `int`. So relax typing around it to just an `object`.
From profiling the
Scheduler, we know that there are a series oftransition_*functions that take the bulk of the time. These functions and functions they call primarily work with the various*Stateobjects. In these functions they spend time accessing their attributes, hashing them, etc. At the Python level we have optimized these operations about as much as we can. Already we use__slots__on the*Stateobjects to speed up access. Also we pre-compute theirhashand store it during construction to speed up laterhashing of these objects. That said, working with the*Stateobjects still takes a fair bit of time.Thus it seems reasonable to try and optimize these objects with Cython. Analogous to PR ( #4283 ) and PR ( #4289 ) we try to leverage Cython's pure Python mode to minimize disruption. Just to get a taste for these kinds of changes, we pick the simplest
*Statetype to start,ClientState. This only has a handful of attributes and while it does see a fair bit of usage it is not as heavily used asWorkerStateorTaskState.In order to benefit from annotating
classes, we need to use one feature from Cython,cclass, to decorate theclassin question. This has the same effect as usingcdef classin Cython. We then define and type attributes using type annotations, which Cython converts into C level attributes in astructwith very fast access. By default these act likecdefattributes, so are only visible in C/Cython code when compiled (there is no affect in pure Python). We can make them visible in Python by either using Cython'sdeclare, a.pxdfile, or rolling our own@properties(which is also Cython'sdeclaredoes for us). Have forgone this atm as it doesn't appearClientStateis used externally. This also helped with the next part. Additionally have typed a few method variables used byClientState.When trying to running with these changes to
ClientState, we found any spots where aClientStatewas assigned to a variable and annotated that variable accordingly. This was done by picking up on conventioncstypically meantClientState. SometimescmeantClientState, but those have been changed tocsfor clarity. Finally in running the profile anything we might have missed was easy to pick up as there would be aClientStateattribute access error (since attributes are not visible to Python currently). So any other variable needing typing was caught and fixed that way.After getting a sense of what kinds of issues would crop up and how to fix them, this went pretty smoothly. The code changes also remained fairly light. Some changes to
ClientState. A fewimports with fallbacks. Otherwise just typing aClientStatevariable when it came up. This should help us prepare for the other*Stateobjects as well. Would be interested to get others takes on these changes. 🙂