Perf Tests: Wild attempt at improving test determinism #47400
Closed
dmsnell wants to merge 2 commits into
Size Change: 0 B. Total Size: 1.31 MB.
22b558d to 479eba1
Flaky tests detected in 479eba14f9855c0cd7e3387aa5174bb725a18009. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4108976751
025b93e to e5c613d
e5c613d to 9c5b722
Closing since the introduction of #47889 greatly improved test determinism.
Status
Please ignore this PR. It's for testing and exploration.
- --js-flags="--predictable --predictable_gc_schedule --single_threaded" works but didn't noticeably impact the test results
- --deterministic-mode leads to test failures after longer-than-30s delays
- --deterministic-mode also gets cut after 6 hours by GitHub Actions' test deadlines
What?
Attempt to add Chrome flags to our test suites that might improve test reliability.
Why?
Because our tests are reporting performance metrics that are wrong: the noise between runs is larger than the real differences between branches.
How?
In this patch we're tossing in Chrome CLI args in an attempt to run things in a more deterministic manner, as well as trying to improve the timer precision, which is normally reduced as a mitigation against speculative execution attacks.
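As a rough sketch of how such arguments could be assembled, the snippet below builds a Chrome argument list from the flags named in the Status notes of this PR. The helper `buildChromeArgs` is hypothetical and not part of the repository, and the idea that the test harness accepts a Puppeteer-style `args` array is an assumption. No timer-precision switch is included because the PR text does not name one.

```typescript
// Flags copied from the Status notes of this PR.
const V8_FLAGS: string[] = [
  '--predictable',
  '--predictable_gc_schedule',
  '--single_threaded',
];

// Hypothetical helper: builds the Chrome argument list for one test run.
function buildChromeArgs(v8Flags: string[], deterministicMode: boolean): string[] {
  const args: string[] = [];
  if (v8Flags.length > 0) {
    // V8 flags travel inside a single --js-flags switch, separated by spaces.
    // No shell quoting is needed because the value is one argv entry.
    args.push(`--js-flags=${v8Flags.join(' ')}`);
  }
  if (deterministicMode) {
    // Per the Status notes, this switch led to test failures and timeouts.
    args.push('--deterministic-mode');
  }
  return args;
}

const chromeArgs: string[] = buildChromeArgs(V8_FLAGS, false);
// Assumption: a Puppeteer-style launcher would receive this list through
// its `args` launch option, e.g. puppeteer.launch({ args: chromeArgs }).
```

Keeping the list in one place would make it easy to toggle individual flags between CI runs while experimenting.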
The goal is to be able to run the performance tests CI workflow on a branch against itself or against another branch with a no-code change (such as a Markdown doc update) and end up with performance test results that are close enough to each other to be effectively equal.
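One way to make "effectively equal" concrete is to compare the medians of two runs against a relative tolerance. This is only an illustrative sketch: the function names and the tolerance value are hypothetical, and the PR does not say which statistic or threshold the perf workflow uses.

```typescript
// Median of a list of timing samples (e.g. milliseconds).
function median(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error('median requires at least one sample');
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical check: two runs count as "effectively equal" when their
// medians differ by no more than `tolerance`, expressed as a fraction of
// the larger median (0.02 means 2%).
function effectivelyEqual(runA: number[], runB: number[], tolerance: number): boolean {
  const a = median(runA);
  const b = median(runB);
  const scale = Math.max(Math.abs(a), Math.abs(b));
  if (scale === 0) {
    return true; // both medians are zero
  }
  return Math.abs(a - b) / scale <= tolerance;
}
```

Running a branch against itself and feeding both sets of readings to a check like this would show whether a given flag combination brings the noise under the chosen threshold.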
Currently the variation in the test results exceeds any actual variation between the branches, and yet the statistics give us confidence that random noise is unlikely to account for the differences in the readings that we measure.
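The problem can be illustrated with Welch's t statistic, which divides the difference between two run means by their combined standard error. If each run is internally consistent but the whole run is shifted (for example by the CI machine it landed on), the statistic comes out large even when the code is identical. The sketch below is a generic implementation, not the analysis the perf workflow actually performs.

```typescript
function mean(xs: number[]): number {
  if (xs.length === 0) {
    throw new Error('mean requires at least one sample');
  }
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs: number[]): number {
  if (xs.length < 2) {
    throw new Error('variance requires at least two samples');
  }
  const m = mean(xs);
  const squared = xs.reduce((sum, x) => sum + (x - m) * (x - m), 0);
  return squared / (xs.length - 1);
}

// Welch's t statistic for two independent samples with possibly unequal
// variances. A large absolute value suggests the means really differ.
// If both samples have zero variance the standard error is zero and the
// result is not finite.
function welchT(a: number[], b: number[]): number {
  const standardError = Math.sqrt(
    sampleVariance(a) / a.length + sampleVariance(b) / b.length
  );
  return (mean(a) - mean(b)) / standardError;
}
```

Applied to two runs of the same branch, a statistic far from zero would point to a systematic offset between runs rather than to a real performance change.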
Hopefully we can experiment with some command-line arguments, find the ones that help with test reliability, and add those to the repository.
https://peter.sh/experiments/chromium-command-line-switches/
https://chromium.googlesource.com/v8/v8/+/master/src/flags/flag-definitions.h#188
Testing Instructions
This is the test. Please ignore the PR for now.