Refactor performance tests artifacts handling#48684
Size Change: 0 B Total Size: 1.34 MB ℹ️ View Unchanged
Flaky tests detected in 7c86fa4. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4415097223
| }, | ||
| { | ||
| file: 'front-end-classic-theme-performance-results.json', | ||
| file: 'front-end-classic-theme.results.json', |
There was a problem hiding this comment.
is there a need to remove this? it was my assumption that we might end up with more test results in the future and this could help make it obvious which result belongs to which test.
it's fine to change it, and we can handle it later of course, but it has been helpful to me at least seeing all the information in the filename
There was a problem hiding this comment.
The idea here was that if CI zips it into the performance-results folder, then it doesn't need a .performance-results.js suffix. On second thought, though, there's no zipping when running locally, and everything lands in the artifacts folder, so it would be better to keep that suffix AND add it to the raw result JSON files as well. I'll make an update.
| `packages/e2e-tests/specs/performance/${ testSuite }.test.results.json` | ||
| ) | ||
| await runShellScript( | ||
| `npm run test:performance -- ${ testSuite } --wordpress-artifacts-path=${ artifactsPath } --results-filename=${ resultsFilename }`, |
There was a problem hiding this comment.
if we're relying on the ENV is it necessary to pass this here or could the performance test read the same ENV?
There was a problem hiding this comment.
Yeah, I thought it would, but it looks like only the top-level process (node cli.js perf) receives the ENV from CI. The runShellScript( "npm run ..." ) call spawns a separate process that doesn't receive the same ENV, so I needed to find a way to pass it down. Maybe there's a better way to share the env, but at my current level of understanding, this made the most sense.
There was a problem hiding this comment.
we can manually pass through any ENV we want in runShellScript
gutenberg/bin/plugin/lib/utils.js
Lines 17 to 26 in 6434e0b
@youknowriad would it make sense to pass env: { ...process.env, …, ...env } as a default? instead of env: { …, ...env }? would that be dangerous?
There was a problem hiding this comment.
we can manually pass through any ENV we want in runShellScript
Yeah, I've just noticed that! 😄 I've already made some changes in that regard in a53065b. I'm not passing the whole parent env though, just the relevant bits. I'm curious whether passing the whole thing would be safe.
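To make the two options in this thread concrete, here is a minimal sketch of passing only selected parent variables to a child process. It is not the actual runShellScript from bin/plugin/lib/utils.js; buildChildEnv, runShellScriptSketch, and the allow-list contents are hypothetical names chosen for illustration. It relies on the fact that the env option of child_process.exec replaces the child's environment rather than extending it.

```javascript
const childProcess = require( 'child_process' );

// Hypothetical helper: copy only allow-listed variables from the parent
// environment, then let explicitly passed overrides win.
function buildChildEnv( parentEnv, allowList, overrides = {} ) {
	const env = {};
	for ( const name of allowList ) {
		if ( parentEnv[ name ] !== undefined ) {
			env[ name ] = parentEnv[ name ];
		}
	}
	return { ...env, ...overrides };
}

// Hypothetical wrapper (an assumption, not the Gutenberg implementation).
// The allow-list below is illustrative only.
function runShellScriptSketch( script, cwd, env = {} ) {
	const childEnv = buildChildEnv(
		process.env,
		[ 'PATH', 'HOME', 'WP_ARTIFACTS_PATH' ],
		env
	);
	return new Promise( ( resolve, reject ) => {
		childProcess.exec( script, { cwd, env: childEnv }, ( error, stdout ) =>
			error ? reject( error ) : resolve( stdout )
		);
	} );
}
```

Passing { ...process.env, ...env } instead would forward everything, including any secrets the CI exposes to the parent process, which is the trade-off raised above.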
| async function runPerformanceTests( branches, options ) { | ||
| const runningInCI = !! process.env.CI || !! options.ci; | ||
| const TEST_ROUNDS = options.rounds || 1; | ||
| const artifactsPath = process.env.WP_ARTIFACTS_PATH || ''; |
There was a problem hiding this comment.
to double-check, if this is empty, it will print to the current directory? or should we pass './' as the default? or something like __dirname?
There was a problem hiding this comment.
If this is empty, it will be resolved down the stream (per job) when the puppeteer env is set up (when running npm run test:performance):
gutenberg/packages/scripts/config/jest-environment-puppeteer/index.js
Lines 45 to 49 in b3ab0ee
Note that the GITHUB_WORKSPACE var is not available here when running in CI (as mentioned here), so it will always resolve to the cwd. This caused the artifacts to be saved alongside the env folders that we create during the perf tests setup, which made it necessary to copy them later to the workspace's ./__test-results folder so that they're available to the CI.
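The fallback described here can be sketched roughly as follows. The referenced lines from jest-environment-puppeteer are not reproduced in this thread, so the function name resolveArtifactsPath and the default folder name 'artifacts' are assumptions for illustration, not the actual code.

```javascript
const path = require( 'path' );

// Hypothetical sketch: use WP_ARTIFACTS_PATH when it is set and non-empty,
// otherwise fall back to an "artifacts" folder under the working directory.
// path.resolve() returns an absolute WP_ARTIFACTS_PATH unchanged and
// resolves a relative one against cwd.
function resolveArtifactsPath( env, cwd ) {
	return path.resolve( cwd, env.WP_ARTIFACTS_PATH || 'artifacts' );
}
```

With this shape, an empty string behaves the same as an unset variable, which matches the '' default used in runPerformanceTests.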
There was a problem hiding this comment.
Note that the GITHUB_WORKSPACE var is not available here when running in CI (...)
| '/.wp-env.json' | ||
| // Create the config file for the current env. | ||
| fs.writeFileSync( | ||
| path.join( environmentDirectory, '.wp-env.json' ), |
There was a problem hiding this comment.
similar question here about removing performance from the name. when I download these files they lose some inherent linkage to the performance CI job that created them.
There was a problem hiding this comment.
It was the same before - we were copying the .wp-env.performance.json and renaming it to .wp-env.json, so in this regard nothing has changed. I assumed the default .wp-env.json name is used for the wp-env start script to pick it up without any extra arguments.
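For readers following along, writing the config directly might look something like the sketch below. The function name writeEnvConfig and its parameters are hypothetical; only the .wp-env.json filename, the fs.writeFileSync call, and the path.join( environmentDirectory, '.wp-env.json' ) target come from the diff. The plugins and themes keys are standard wp-env config fields, but the exact contents written by this PR are not shown here.

```javascript
const fs = require( 'fs' );
const path = require( 'path' );

// Hypothetical sketch: write a per-environment wp-env config with absolute
// paths instead of copying and amending a boilerplate file.
function writeEnvConfig( environmentDirectory, pluginPath, themes ) {
	const config = {
		// Absolute path so the config is readable regardless of the cwd.
		plugins: [ path.resolve( pluginPath ) ],
		themes,
	};
	const configPath = path.join( environmentDirectory, '.wp-env.json' );
	fs.writeFileSync( configPath, JSON.stringify( config, null, 2 ), 'utf8' );
	return configPath;
}
```

Keeping the default .wp-env.json name means wp-env start can pick the file up without extra arguments, as assumed in the comment above.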
| rawResults[ i ] = {}; | ||
| for ( const branch of branches ) { | ||
| const runKey = `${ branch }_${ testSuite }_run-${ i }`; | ||
| const runKey = `${ testSuite }_${ branch }_run-${ i }`; |
There was a problem hiding this comment.
is there context behind this swap? not the most important thing, but it stands out in this PR where we're working on the path changes.
I guess there's a claim that grouping by test suite is more favorable than grouping by branch?
There was a problem hiding this comment.
I've swapped it because it's the actual order in which we iterate through those tests and how we compare the results (suite A: branch X vs. branch Y, suite B: branch X vs. branch Y, etc.). So both the log info and results filenames will now be compiled in this fashion:
[Screenshots: runner info (current) vs. runner info (new); results upload (current) vs. results upload (new)]
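As a rough illustration of the new ordering, the sketch below builds the run keys in the same suite-first order as the loop in the diff. buildRunKeys is a hypothetical helper, and starting the round counter at 1 is an assumption, since the diff does not show where i starts.

```javascript
// Hypothetical sketch: keys are generated suite -> round -> branch, and the
// suite name comes first in each key so that sorted output groups by suite.
function buildRunKeys( testSuites, branches, rounds ) {
	const keys = [];
	for ( const testSuite of testSuites ) {
		// Assumption: rounds are numbered from 1.
		for ( let i = 1; i <= rounds; i++ ) {
			for ( const branch of branches ) {
				keys.push( `${ testSuite }_${ branch }_run-${ i }` );
			}
		}
	}
	return keys;
}
```

Calling buildRunKeys( [ 'post-editor' ], [ 'trunk', 'my-branch' ], 2 ) yields keys that all start with post-editor, so the files for one suite sit together in a sorted listing.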
There was a problem hiding this comment.
As a side note, I think we should also replace trunk in the reference measurements with commit SHA, because trunk alone doesn't say much. Also, the summary should point to what is the reference and what is subject of the comparison, and display a 3rd column (delta) to show whether we're slowing down or speeding up, for example:
>> Comparing trunk (13a14ca) with the current branch (32b4c12)
┌──────────────────────┬─────────────────────────┬────────────────────────┬─────────────────┐
│ (index) │ 13a14ca (reference) │ 32b4c12 (subject) │ Δ │
├──────────────────────┼─────────────────────────┼────────────────────────┼─────────────────┤
│ serverResponse │ '235.12 ms' │ '233.46 ms' │ -0.5% │
│ firstPaint │ '251.88 ms' │ '186.88 ms' │ +18.2% │
│ domContentLoaded │ '685.96 ms' │ '651.18 ms' │ +0.5% │
│ loaded │ '686.72 ms' │ '652.04 ms' │ x.xx% │
│ firstContentfulPaint │ '8436.28 ms' │ '8057 ms' │ x.xx% │
│ firstBlock │ '9095.38 ms' │ '8713.32 ms' │ x.xx% │
│ type │ '42.98 ms' │ '43.03 ms' │ x.xx% │
│ minType │ '41.9 ms' │ '41.92 ms' │ x.xx% │
│ maxType │ '45.71 ms' │ '46.52 ms' │ x.xx% │
│ typeContainer │ '14.81 ms' │ '14.99 ms' │ x.xx% │
│ minTypeContainer │ '13.35 ms' │ '13.69 ms' │ x.xx% │
│ maxTypeContainer │ '18.87 ms' │ '17.7 ms' │ x.xx% │
│ focus │ '40.99 ms' │ '44.78 ms' │ x.xx% │
│ minFocus │ '35.64 ms' │ '35.16 ms' │ x.xx% │
│ maxFocus │ '64.35 ms' │ '71.31 ms' │ x.xx% │
│ inserterOpen │ '33.11 ms' │ '32.39 ms' │ x.xx% │
│ minInserterOpen │ '27.96 ms' │ '27.73 ms' │ x.xx% │
│ maxInserterOpen │ '52.56 ms' │ '52.9 ms' │ x.xx% │
│ inserterSearch │ '13.53 ms' │ '12.29 ms' │ x.xx% │
│ minInserterSearch │ '5.76 ms' │ '6.11 ms' │ x.xx% │
│ maxInserterSearch │ '19.31 ms' │ '18.43 ms' │ x.xx% │
│ inserterHover │ '26.31 ms' │ '26.4 ms' │ x.xx% │
│ minInserterHover │ '22.51 ms' │ '22.38 ms' │ x.xx% │
│ maxInserterHover │ '47.75 ms' │ '46.44 ms' │ x.xx% │
│ listViewOpen │ '157.32 ms' │ '159.09 ms' │ x.xx% │
│ minListViewOpen │ '142.89 ms' │ '142.65 ms' │ x.xx% │
│ maxListViewOpen │ '235.52 ms' │ '253.54 ms' │ x.xx% │
└──────────────────────┴─────────────────────────┴────────────────────────┴─────────────────┘
What do you think?
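For reference, one way such a Δ column could be computed is sketched below. formatDelta is a hypothetical helper, and the sign convention (positive means the subject is slower than the reference) is an assumption, since the mock table above does not define one and its percentages are placeholders.

```javascript
// Hypothetical sketch: relative change of the subject against the reference,
// formatted as a signed percentage with one decimal place.
function formatDelta( referenceMs, subjectMs ) {
	// Guard against a missing or zero reference, where a ratio is undefined.
	if ( ! Number.isFinite( referenceMs ) || referenceMs === 0 ) {
		return 'n/a';
	}
	const delta = ( ( subjectMs - referenceMs ) / referenceMs ) * 100;
	const sign = delta > 0 ? '+' : '';
	return `${ sign }${ delta.toFixed( 1 ) }%`;
}
```

This only reports a ratio of two numbers; it says nothing about whether the difference is outside the run-to-run variation.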
There was a problem hiding this comment.
I think we should also replace trunk in the reference measurements with commit SHA
if we can display both that also has value. I think different needs use them differently. for most purposes trunk does a better job communicating than 13a14ca because we recognize that when a CI job runs it's running against the latest trunk at the time the CI job ran. we can always lookup what 13a14ca is, but that involves additional steps when trunk is pretty bare and simple.
what is the reference and what is subject of the comparison
for most runs this matters, but the script supports multiple "branches" beyond two, and in those cases I'm not sure one is the reference. if we include this we will be making a decision that one of the "branches" is the reference, likely the first one. (branches is in quotes because it's really just a git ref).
and display a 3rd column (delta) to show whether we're slowing down or speeding up, for example:
no strong opinion here on what we should do, but I can share my personal hesitancy with this due to the fact that we're still hitting some wild variation across tests (even with the recent fixes), and percentages always make changes look simpler than they are. I like making the easy-to-misuse things harder to use, whereas simply reporting ms is just that, a fact reported without any interpretation, because the interpretation is itself extremely difficult to automate.
…r e2e env variables are set. Remove the GITHUB_WORKSPACE path variable as it's not reachable for the child process in CI.
| 'test/emptytheme' | ||
| ), | ||
| 'https://downloads.wordpress.org/theme/twentytwentyone.1.7.zip', | ||
| 'https://downloads.wordpress.org/theme/twentytwentythree.1.0.zip', |
There was a problem hiding this comment.
are these things we want to hard-code here in this seemingly independent function? I could easily imagine someone wanting to update the themes and then overlooking these lines…
There was a problem hiding this comment.
This is only carried over from the .wp-env.performance.json boilerplate file. Hard-coding those here was a solution to an issue described in #49063. In my mind, it was an intermediate solution because those themes should be installed within the tests that use them so that when running tests in isolation (e.g. via npm run test:performance theme-tests), same themes are used as in CI or when we're running locally via the CLI. I would consider this a blocker - I plan to address this in a follow-up PR.
There was a problem hiding this comment.
I would consider this a blocker - I plan to address this in a follow-up PR.
if you meant "I would not consider this a blocker" then that sounds good. if not, then I'm confused 😄
There was a problem hiding this comment.
Ah, good catch! Missed the important 'not' there 😅
Really love where you've taken this. What do you see as continued blockers to get this in, or do you think it's ready to go?
@dmsnell, I don't see any blockers at this point. In a follow-up PR which I'm already working on, I plan to make the builds reusable for subsequent jobs (at least locally) and parallelize them (builds) to make things faster.
dmsnell left a comment
There was a problem hiding this comment.
Thanks for the work. I don't see any reason to block this.
Let's update our setup following WordPress/gutenberg#48684




What?
TL;DR:
./artifacts folder regardless of the triggering environment: CI via CLI / locally via CLI / locally via npm, etc.
current-branch vs. trunk comparison, as they provide useful information,
.wp-env.performance.json boilerplate.
Save all the performance result files to the main artifacts folder
Instead of storing files around the source code, let's use the already available WP_ARTIFACTS_PATH directory.
The process.env.WP_ARTIFACTS_PATH has been primarily used for storing the failed test artifacts (screenshots, HTML snapshots), but we can also use it for storing any other artifacts, like performance results or temporary trace files. Those files are currently stored alongside the source code and then copied to an intermediate ./__test-results/ folder, which is then picked up by the results upload CI action. After this PR, those files will be saved directly to the WP_ARTIFACTS_PATH folder, so we won't need to:
Once created, they're ready to be picked up by the upload action because the CI will define the WP_ARTIFACTS_PATH variable.
Additionally, the final (calculated) results file that we currently save to the enigmatic path (before uploading them to codehealth), will now be saved to the WP_ARTIFACTS_PATH as well and zipped with the other (raw) results.
Finally, if the CI job fails, the results upload step will not occur. However, if there are any results that have already been created, they will still be uploaded along with the failure artifacts.
Archive performance results for each comparison step
Currently, we are archiving performance results only for the Compare performance with trunk step. Now, the results will be uploaded for every comparison step as they provide useful information:
In case of a failure, the results upload action will not be dispatched. If any result files are written during a failed job, though, they will be uploaded within the failure artifacts upload, along with the screenshot and HTML snapshot of the failed test.
Sort results and test logs primarily by the suite name
This aligns with the test running order (test suite -> branch). See the comparison below of the current vs. new results upload content.
I've updated the test log to reflect the above and added the test rounds info that was missing:
Write the env-specific configs directly
For the performance tests, we're creating "environment" folders for each branch we want to test against. Those folders need to contain env-specific wp-env config, which we're currently copying from a boilerplate config that we copy over and amend with the target env config values. There's no point keeping this boilerplate and doing the extra copy step, because we can write this file directly. Also, this way we can provide absolute paths which makes it more readable.
Testing Instructions
CI:
Locally:
wp-env is stopped,
bin/plugin/cli.js perf refactor/perf-test-results-path trunk,
./artifacts folder.
npm run test:performance test-editor
To test if the failure artifacts are still uploaded as expected:
expect(false).toBe(true),
bin/plugin/cli.js perf refactor/perf-test-results-path trunk again,
./artifacts as well.