Skip to content

RFC: Use a const initializer for GIL_COUNT if possible#3177

Merged
bors[bot] merged 2 commits intoPyO3:mainfrom
lifthrasiir:gil-count-const-init
May 24, 2023
Merged

RFC: Use a const initializer for GIL_COUNT if possible#3177
bors[bot] merged 2 commits intoPyO3:mainfrom
lifthrasiir:gil-count-const-init

Conversation

@lifthrasiir
Copy link
Copy Markdown
Contributor

Normally LocalKey::try_with needs to check for the initialization flag to lazily initialize the TLS. This behaves badly with a compiler optimization and it seems that multiple calls to gil_is_acquired() cannot be correctly eliminated. Rust 1.59 added a const { ... } initializer (a special form only allowed here for now) in thread_local! which removes the initialization flag, allowing those kind of optimizations.

I should note that the performance impact is probably minimal, because the check branch is extremely well predicted and the optimization is only possible when every PyO3 code that leads to gil_is_acquired() is inlined, a pretty uncommon situation. I couldn't demonstrate any consistent improvement or regression from my machine, which performance seems to be flaky as well. But at least we would have one less branch there. I'll leave this as an RFC so that someone else can prove or disprove that this is indeed beneficial.

@adamreichold adamreichold added the CI-skip-changelog Skip checking changelog entry label May 23, 2023
@adamreichold
Copy link
Copy Markdown
Member

I should note that the performance impact is probably minimal, because the check branch is extremely well predicted and the optimization is only possible when every PyO3 code that leads to gil_is_acquired() is inlined, a pretty uncommon situation. I couldn't demonstrate any consistent improvement or regression from my machine, which performance seems to be flaky as well. But at least we would have one less branch there. I'll leave this as an RFC so that someone else can prove or disprove that this is indeed beneficial.

I think we should apply this change even if we cannot measure an improvement, as long as we do not measure a regression. The reason is that the effects will most likely be hard to measure as you point, mostly due to branch prediction hiding the costs of the branch. But it still one branch less and hence less pressure on the global resources backing branch prediction in the first place. Meaning that effects would most likely only be measurable when these global resources are oversubscribed.

I also think the change is nicely localized and carries a reasonable maintenance cost which we will loose eventually as our MSRV passes 1.59 (which are targetting 1.56 for PyO3 0.20 at the moment, but nothing is set in stone yet).

@davidhewitt
Copy link
Copy Markdown
Member

Agreed, I think this is a worthwhile improvement - even if immeasurable now and requires a conditional flag, with a future Rust version this will have no maintainance cost and be strictly better.

Since `Vec::with_capacity` is not a const function, it now does not
allocate any initial memory.
@adamreichold
Copy link
Copy Markdown
Member

LGTM

bors r+

@bors
Copy link
Copy Markdown
Contributor

bors bot commented May 24, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit dfc667f into PyO3:main May 24, 2023
@lifthrasiir lifthrasiir deleted the gil-count-const-init branch May 24, 2023 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI-skip-changelog Skip checking changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants