Adding the isa outerloop jobs#24630
|
Unsure if there is additional work needed in AzDO to enable this, but it looks like |
| <TestEnvironment Include="jitstress2_jitstressregs0x80" JitStress="2" JitStressRegs="0x80" /> | ||
| <TestEnvironment Include="jitstress2_jitstressregs0x1000" JitStress="2" JitStressRegs="0x1000" /> | ||
| <TestEnvironment Include="tailcallstress" TailcallStress="1" /> | ||
| <TestEnvironment Include="jitsse2only" EnableAVX="0" EnableSSE3_4="0" /> |
These removed jobs are now covered by the above, which is more expansive. Where there are fewer flags specified, it is because the Enable{ISA}=0 flags are hierarchical in nature.
- These "scenario names" only looked to be referenced from the netci.groovy and the ci-trigger-phrases.md documentation (the latter of which should probably be updated for AzDO).
For example, jitstressisas_nosse3_4 is the exact replacement for jitsse2only since EnableSSE3_4=0 also implicitly sets EnableAVX=0.
Likewise, jitstressisas_nohwintrinsicx86 is the replacement for jitnox86hwintrinsic; it needs fewer flags because most ISAs are implicitly disabled by EnableSSE=0.
We also no longer need EnableIncompleteISAClass for most jobs since the x86 intrinsics are "feature complete" at this point (and we have no incomplete ISAs for x86).
| - jitstressisas_nosimd | ||
| ${{ if and(eq(parameters.testGroup, 'outerloop-jitstressisas'), or(eq(parameters.archType, 'x64'), eq(parameters.archType, 'x86'))) }}: | ||
| scenarios: | ||
| - jitstressisas_noaes |
There is no point in wasting ARM/ARM64 machines running the x86/x64 configuration switches. Likewise, I see no reason to run the ARM/ARM64 specific switches on x86/x64 once we start ramping those up.
This is the only major error I've seen in your YAML: you redefined scenarios here
| - jitstressisas_nosse41 | ||
| - jitstressisas_nosse42 | ||
| - jitstressisas_nossse3 | ||
| ${{ if eq(parameters.testGroup, 'outerloop-jitstressregs') }}: |
As a note, it is likely worthwhile to add an x86/x64 specific outerloop-jitstressregs-noavx scenario. AVX can impact register allocation because most SIMD instructions change from ins op1, src (where op1 is both a dst and a src) to ins dst, src, src. This applies to both the scalar SIMD instructions, such as those used for normal floating-point arithmetic, and the vector SIMD instructions.
Most other isa knobs don't have any significant impact other than the number of instructions emitted, so they may not be particularly interesting to combine with the other stress scenarios.
Agree - that scenario would be very useful (and also agree that exploding to cover all the ISA configurations doesn't provide a lot of incremental value).
I'll open a separate PR to cover that scenario, after this one goes in.
|
This LGTM, but @sandreenko or @echesakovMSFT (with more YAML experience than I) should look. |
|
The YAML looks fine. As for the test infra, they definitely know better, but I've added the definition to get to test this. |
|
/azp list |
|
CI/CD Pipelines for this repository: |
|
@hoyosjs, do you know why the other |
|
They are not set to be PR validation builds, so azp won't list them here. |
CarolEidt
left a comment
This LGTM generally, and I concur with the intent, but I'm not all that familiar with this.
|
/azp run coreclr-outerloop-jitstressisas |
|
Azure Pipelines failed to run 1 pipeline(s). |
That seems incorrect. They are outerloop (like these new jobs) and so shouldn't be run in PRs by default. However, they are useful for explicit triggering in certain types of PRs and that was done not infrequently under the Jenkins setup. #24555 is a good example of where I needed to run those jobs and that currently requires manually locating the queue and knowing how git internally represents PR branches so you can explicitly kick off the group. |
Looks like it doesn't like the second scenarios grouping. I'll see if I need to explicitly filter the previous entry to just arm/arm64 (and therefore duplicate those four jobs under each scenarios grouping) or if I can move the x86/x64 specific check under the main scenarios listing... |
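A minimal sketch of the second option mentioned above, keeping a single scenarios key and moving the architecture check inside the list. It assumes Azure Pipelines conditional insertion into a sequence; the scenario names are only the ones quoted in this thread, and the shared list is abbreviated. When both the test-group condition and the x86/x64 condition are true, two separate scenarios keys would collide, which is presumably what the failed run complained about.

```yaml
# Sketch only, not the final YAML from this PR.
${{ if eq(parameters.testGroup, 'outerloop-jitstressisas') }}:
  scenarios:
  # Shared scenarios (abbreviated; run on every architecture).
  - jitstressisas_nosimd
  # x86/x64-only scenarios are spliced into the same list, so the
  # 'scenarios' key is defined exactly once.
  - ${{ if or(eq(parameters.archType, 'x64'), eq(parameters.archType, 'x86')) }}:
    - jitstressisas_noaes
    - jitstressisas_nosse41
    - jitstressisas_nosse42
    - jitstressisas_nossse3
```

The PR did not end up using this shape: per the later comments and commit list, the group was split into jitstress-isas-x86 and jitstress-isas-arm, with the shared jobs duplicated in each listing.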
|
/azp run coreclr-outerloop-jitstressisas |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
(i fully expect the |
echesakov
left a comment
Do we need to run jitstressisas on arm32 machines, musls, rhel? If not, maybe it would make sense to limit the testing platforms in a similar way to how it's done for gcstress scenarios.
Changes to .yml and testenvironment.proj look good - I left only one nit comment.
My suggestion is to make sure that all the needed COMPlus_ environment variables are set correctly by looking at the logs on mc.dot.net.
sandreenko
left a comment
Could you add a separator between jitstress and isa? I also don't mind if you use its full name, like coreclr-outerloop-jitstress-isa or coreclr-outerloop-jitstress-instruction-sets.
We do not want to have any additional jobs in PR validation, because when they were people unintentionally triggered them via |
It would be nice, but there's currently a less-than-ideal behavior in azp where the default run command triggers all the legs and that's a waste of resources. Once it gets fixed, we can enable them for manual triggering. |
Is there an existing ask on AzDO to allow jobs to be individually triggered but not triggered via the regular run command? |
Yes, dnceng is mediating this |
|
@tannergooding Yes. @chcosta Is tracking it, hopefully with the highest urgency possible. |
The four jobs that are currently not limited to just x86/x64 could have some impact on these platforms (nosimd impacts S.N.Vector code, for example; which ARM32 supports). However, if it is a question of machine resources, I think we could disable them for those machines as most changes should be caught by other testing groups. |
|
@sandreenko, do you also want a spacing in the scenario names? That is, should |
Yes, I would like it. It is hard for me to parse these names without a space between. |
I like this idea |
Done. |
| - Windows_NT_arm64 | ||
| ${{ if eq(variables['Build.DefinitionName'], 'coreclr-outerloop-jitstress-isas-x86') }}: | ||
| - Linux_x64 | ||
| - OSX_x64 |
Linux and OSX cover the same calling conventions and general codegen, but the OSX machines have caught a number of bugs because some of the machines in the pool have AVX but not AVX2 support.
This has been impactful for a number of cases for the older System.Numerics.Vector code when refactoring the JIT.
| - tailcallstress | ||
| ${{ if eq(parameters.testGroup, 'outerloop-jitstress-isas-arm') }}: | ||
| scenarios: | ||
| - jitstress_isas_incompletehwintrinsic |
The first four jobs are shared between both x86 and ARM and so they don't have an architecture moniker in the name.
They were duplicated in each listing.
This list will expand quite a bit once the work on the ARM64 HWIntrinsics starts up. But for now, it should be "good enough" for the experimental HWIntrinsics we currently have and the pre-existing System.Numerics.Vector support.
| - Linux_x64 | ||
| - Windows_NT_x64 | ||
| - Windows_NT_x86 | ||
| ${{ if eq(variables['Build.DefinitionName'], 'coreclr-outerloop-jitstress-isas-arm') }}: |
In this case then we just need to add and rename new build definitions.
Do you mean add a new coreclr-outerloop-jitstress-isas-arm entry in AzDO and rename the coreclr-outerloop-jitstressisas job to coreclr-outerloop-jitstress-isas-x86?
@hoyosjs, do we have a document describing how to create the new build-definitions for coreclr? (Do we just create a new definition and point it to the azure-pipelines.yml; do we clone an existing job and just rename it; are there any additional steps required for coreclr in particular; etc...)
Indeed, clone, set the names correctly, and in the triggers please disable the PR validation ones.
Thanks. Done.
What cadence do we want these scheduled to run and how should they be staggered as compared to the other outerloop jobs? It looks like:
- outerloop is never scheduled, but is part of PR validation
- outerloop-jitstress is 22:00 Mon through Sun
- outerloop-jitstressregs is 2:00 Sat and Sun
- outerloop-gcstress0x3-gcstress0xc is 5:00 Sat and Sun
- outerloop-jitstress2-jitstressregs is 9:00 Sat and Sun
- outerloop-gcstress-extra is 13:00 Sat and Sun
- outerloop-r2r-extra is 18:00 Sat and Sun
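For reference, a hedged sketch of how one of the slots listed above (2:00 Sat and Sun) could be written as a YAML scheduled trigger, in case the schedule is kept in the pipeline file rather than in the build definition. This is an assumption for illustration: the thread does not say where the existing schedules live, which time zone the listed times use, or which slot the new jobs should take. The display name is a placeholder.

```yaml
# Hypothetical sketch: not taken from this PR.
schedules:
- cron: "0 2 * * Sat,Sun"        # minute hour day-of-month month day-of-week; evaluated in UTC
  displayName: Weekend outerloop run (placeholder)
  branches:
    include:
    - master
  always: true                   # run even when there are no new commits since the last scheduled run
```

Staggering against the other outerloop jobs would then just be a matter of choosing an hour that does not collide with the times listed above.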
Unless I've missed something, I don't believe Arm32 has support for S.N.Vector |
Looks like I was not remembering correctly. I've removed the arm32 platform from the |
|
/azp list |
|
CI/CD Pipelines for this repository: |
|
Ok, I think everything is now configured correctly and jobs are now running. We probably want to have these new build-definitions scheduled for running (as with most of the other outerloop jobs): #24630 (comment); but otherwise I think this is good for final review (provided all jobs finish running as expected). |
sandreenko
left a comment
LGTM, thanks for adding this.
Could you please also mark what you did in https://github.com/dotnet/coreclr/issues/24358?
Done |
|
Looks like at least one bug to look at for I'll see if I can repro locally (against current master) and log a bug. CC. @CarolEidt. |
|
Looks like the failure is in https://source.dot.net/#System.Private.CoreLib/shared/System/SpanHelpers.Char.cs,217 (the only other option that would match the I'd guess the failure is likely due to the |
|
Failure was actually in the constructor call here: https://source.dot.net/#System.Private.CoreLib/shared/System/SpanHelpers.Char.cs,274 We were checking all the config flags except I've got fixes for both ready and will put them up shortly. |
|
Going to merge this first, however, since the jobs are running, doing the right thing, and the failures are not introduced by this PR. I'll run the jobs as part of the PR with the fix to validate everything passes there. |
|
Fix for the found job failures is here: #24649 |
* Adding the isa outerloop jobs
* Don't redefine scenarios for the jitstressisas group
* Splitting jitstress-isas into jitstress-isas-arm and jitstress-isas-x86
* Fixing the azure-pipelines.yml to include platforms:
* Removing Linux_arm as a platform for the jitstress-isas-arm group
* Ensure that the platforms for the jitstress-isas jobs are listed in both places required
* Removing Windows_NT_arm from the jitstress-isas-arm platforms

Commit migrated from dotnet/coreclr@9b0a199
This resolves https://github.com/dotnet/coreclr/issues/21339
CC. @sandreenko, @CarolEidt, @BruceForstall, @echesakovMSFT