Remove the anon query modifier#154122
Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Remove the `anon` query modifier
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (692d51a): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.5%, secondary -1.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 3.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 483.892s -> 479.954s (-0.81%) |
|
Having established a baseline, let's now try applying @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Remove the `anon` query modifier
|
I think |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (72b77f1): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.9%, secondary 7.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 481.595s -> 480.03s (-0.32%) |
1feb7fe to
aee5aef
Compare
|
This should now be ready for review. r? nnethercote |
This was previously not possible because `check_representability` was `anon`.
According to its comment, this query was only using `anon` to save a little bit of work for a dep graph node that is expected to have no dependencies. Benchmarks indicate that the perf loss, if any, is small enough to be justified by the fact that we can now remove support for `anon` queries. Adding `no_hash` appears to give marginally better perf results.
We still have some anon-task machinery for `DepKind::TraitSelect` tasks, but there are no longer any queries that use the `anon` modifier.
|
I updated the dev guide (plus a FIXME to migrate modifier docs into The @bors r=nnethercote rollup=never |
This comment has been minimized.
This comment has been minimized.
Remove the `anon` query modifier Prior experiments: - #152268 - #153996 [Zulip thread: *Removing the `anon` query modifier*](https://rust-lang.zulipchat.com/#narrow/channel/582699-t-compiler.2Fquery-system/topic/Removing.20the.20.60anon.60.20query.20modifier/with/580760962) --- There are currently three queries that use the `anon` modifier: - `check_representability` - `check_representability_adt_ty` - `erase_and_anonymize_regions_ty` It seems that none of them are using `anon` in an essential way. According to comments and tests, the *representability* queries mainly care about not being eligible for forcing (despite having a recoverable key type), so that if a cycle does occur then the entire cycle will be on the query stack. Replacing `anon` with a new `no_force` modifier gives a modest perf improvement. The `erase_and_anonymize_regions_ty` query appears to be using `anon` to reduce the dep-graph overhead of a query that is expected to not call any other queries (and thus have no dependencies). Replacing `anon` with either `no_hash` or nothing appears to give only a very small perf hit on `cargo` benchmarks, which is justified by the fact that it lets us remove a lot of machinery for anonymous queries. We still need to retain some of the machinery for anonymous *tasks*, because the non-query task `DepKind::TraitSelect` still uses it. --- I have some ideas for a follow-up that will reduce dep-graph overhead by replacing *all* zero-dependency nodes with a singleton node, but I want to keep that separate in case it causes unexpected issues and needs to be bisected or reverted.
|
❗ You can only retry pull requests that are approved and have a previously failed auto build. Hint: There is currently a pending auto build on this PR. To cancel it, run |
|
@bors yield |
|
Auto build was cancelled. Cancelled workflows: The next pull request likely to be tested is #154122. |
This comment has been minimized.
This comment has been minimized.
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 20f19f4 (parent) -> 562dee4 (this PR) Test differencesShow 126 test diffs126 doctest diffs were found. These are ignored, as they are noisy. Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 562dee4820c458d823175268e41601d4c060588a --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (562dee4): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowOur benchmarks found a performance regression caused by this PR. Next Steps:
@rustbot label: +perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 496.727s -> 485.484s (-2.26%) |
|
Small regressions to I'm experimenting with #154162 to undo those small regressions, but that work will have to be judged on its own perf/complexity merits. |
Prior to rust-lang#154122 it wasn't used on all paths, so we only computed it when necessary -- sometimes in `check_if_ensure_can_skip_execution`, sometimes in one of two places in `execute_job_incr` -- and pass around `Option<DepNode>` to allow this. But now it's always used, so this commit makes us compute it earlier, which simplifies things. - `check_if_ensure_can_skip_execution` can be made simpler, returning a bool and eliminating the need for `EnsureCanSkip`. - `execute_job_incr` no longer uses two slightly different methods to create a `DepNode` (`get_or_insert_with` vs `unwrap_or_else`).
Prior to rust-lang#154122 it wasn't used on all paths, so we only computed it when necessary -- sometimes in `check_if_ensure_can_skip_execution`, sometimes in one of two places in `execute_job_incr` -- and pass around `Option<DepNode>` to allow this. But now it's always used, so this commit makes us compute it earlier, which simplifies things. - `check_if_ensure_can_skip_execution` can be made simpler, returning a bool and eliminating the need for `EnsureCanSkip`. - `execute_job_incr` no longer uses two slightly different methods to create a `DepNode` (`get_or_insert_with` vs `unwrap_or_else`).
Prior to rust-lang#154122 it wasn't used on all paths, so we only computed it when necessary -- sometimes in `check_if_ensure_can_skip_execution`, sometimes in one of two places in `execute_job_incr` -- and pass around `Option<DepNode>` to allow this. But now it's always used, so this commit makes us compute it earlier, which simplifies things. - `check_if_ensure_can_skip_execution` can be made simpler, returning a bool and eliminating the need for `EnsureCanSkip`. - `execute_job_incr` no longer uses two slightly different methods to create a `DepNode` (`get_or_insert_with` vs `unwrap_or_else`).
|
perf trirage: Marking as triaged based on the comment above. @rustbot label: +perf-regression-triaged |
View all comments
Prior experiments:
erase_and_anonymize_regions_tyfromanontono_hash#152268no_forcequery modifier forcheck_representability#153996Zulip thread: Removing the
anonquery modifierThere are currently three queries that use the
anonmodifier:check_representabilitycheck_representability_adt_tyerase_and_anonymize_regions_tyIt seems that none of them are using
anonin an essential way.According to comments and tests, the representability queries mainly care about not being eligible for forcing (despite having a recoverable key type), so that if a cycle does occur then the entire cycle will be on the query stack. Replacing
anonwith a newno_forcemodifier gives a modest perf improvement.The
erase_and_anonymize_regions_tyquery appears to be usinganonto reduce the dep-graph overhead of a query that is expected to not call any other queries (and thus have no dependencies). Replacinganonwith eitherno_hashor nothing appears to give only a very small perf hit oncargobenchmarks, which is justified by the fact that it lets us remove a lot of machinery for anonymous queries.We still need to retain some of the machinery for anonymous tasks, because the non-query task
DepKind::TraitSelectstill uses it.I have some ideas for a follow-up that will reduce dep-graph overhead by replacing all zero-dependency nodes with a singleton node, but I want to keep that separate in case it causes unexpected issues and needs to be bisected or reverted.