RFC: Use a const initializer for GIL_COUNT if possible#3177
RFC: Use a const initializer for GIL_COUNT if possible#3177
GIL_COUNT if possible#3177Conversation
I think we should apply this change even if we cannot measure an improvement, as long as we do not measure a regression. The reason is that the effects will most likely be hard to measure as you point, mostly due to branch prediction hiding the costs of the branch. But it still one branch less and hence less pressure on the global resources backing branch prediction in the first place. Meaning that effects would most likely only be measurable when these global resources are oversubscribed. I also think the change is nicely localized and carries a reasonable maintenance cost which we will loose eventually as our MSRV passes 1.59 (which are targetting 1.56 for PyO3 0.20 at the moment, but nothing is set in stone yet). |
|
Agreed, I think this is a worthwhile improvement - even if immeasurable now and requires a conditional flag, with a future Rust version this will have no maintainance cost and be strictly better. |
Since `Vec::with_capacity` is not a const function, it now does not allocate any initial memory.
|
LGTM bors r+ |
|
Build succeeded! The publicly hosted instance of bors-ng is deprecated and will go away soon. If you want to self-host your own instance, instructions are here. If you want to switch to GitHub's built-in merge queue, visit their help page. |
Normally
LocalKey::try_withneeds to check for the initialization flag to lazily initialize the TLS. This behaves badly with a compiler optimization and it seems that multiple calls togil_is_acquired()cannot be correctly eliminated. Rust 1.59 added aconst { ... }initializer (a special form only allowed here for now) inthread_local!which removes the initialization flag, allowing those kind of optimizations.I should note that the performance impact is probably minimal, because the check branch is extremely well predicted and the optimization is only possible when every PyO3 code that leads to
gil_is_acquired()is inlined, a pretty uncommon situation. I couldn't demonstrate any consistent improvement or regression from my machine, which performance seems to be flaky as well. But at least we would have one less branch there. I'll leave this as an RFC so that someone else can prove or disprove that this is indeed beneficial.