Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Adding the isa outerloop jobs#24630

Merged
tannergooding merged 7 commits into
dotnet:masterfrom
tannergooding:jitstressisa
May 18, 2019
Merged

Adding the isa outerloop jobs#24630
tannergooding merged 7 commits into
dotnet:masterfrom
tannergooding:jitstressisa

Conversation

@tannergooding
Copy link
Copy Markdown
Member

@tannergooding
Copy link
Copy Markdown
Member Author

Unsure if there is additional work needed in AzDO to enable this, but it looks like ci.dot.net doesn't even resolve anymore and not having these makes it difficult to properly validate changes like #24555

<TestEnvironment Include="jitstress2_jitstressregs0x80" JitStress="2" JitStressRegs="0x80" />
<TestEnvironment Include="jitstress2_jitstressregs0x1000" JitStress="2" JitStressRegs="0x1000" />
<TestEnvironment Include="tailcallstress" TailcallStress="1" />
<TestEnvironment Include="jitsse2only" EnableAVX="0" EnableSSE3_4="0" />
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These removed jobs are now covered by the above, which is more expansive. Where there are fewer flags specified, it is because the Enable{ISA}=0 flags are hierarchical in nature.

  • These "scenario names" only looked to be referenced from the netci.groovy and the ci-trigger-phrases.md documentation (the latter of which should probably be updated for AzDO).

For example, jitstressisas_nosse3_4 is the exact replacement for jitsse2only since EnableSSE3_4=0 also implicitly sets EnableAVX=0.

Likewise, jitstressisas_nohwintrinsicx86 is the replacement for jitnox86hwintrinsic but most ISAs are implicitly disabled by EnableSSE=0

Copy link
Copy Markdown
Member Author

@tannergooding tannergooding May 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also no longer need EnableIncompleteISAClass for most jobs since the x86 intrinsics are "feature complete" at this point (and we have no incomplete isas for x86).

Comment thread eng/test-job.yml Outdated
- jitstressisas_nosimd
${{ if and(eq(parameters.testGroup, 'outerloop-jitstressisas'), or(eq(parameters.archType, 'x64'), eq(parameters.archType, 'x86'))) }}:
scenarios:
- jitstressisas_noaes
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no point in wasting ARM/ARM64 machines running the x86/x64 configuration switches. Likewise, I see no reason to run the ARM/ARM64 specific switches on x86/x64 once we start ramping those up.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only major error I've seen in your YAML: you redefined scenarios here

Comment thread eng/test-job.yml
- jitstressisas_nosse41
- jitstressisas_nosse42
- jitstressisas_nossse3
${{ if eq(parameters.testGroup, 'outerloop-jitstressregs') }}:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note, it is likely worthwhile to add an x86/x64 specific outerloop-jistressregs-noavx scenario. AVX can impact register allocation scenarios because most SIMD instructions change from ins op1, src (where op1 is both a dst and src) to ins dst, src, src. This applies to both the scalar SIMD instructions, like used for normal floating-point arithmetic and vector SIMD instructions.

Most other isa knobs don't have any significant impact other than the number of instructions emitted, so they may not be particularly interesting to combine with the other stress scenarios.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree - that scenario would be very useful (and also agree that exploding to cover all the ISA configurations doesn't provide a lot of incremental value).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open a separate PR to cover that scenario, after this ones goes in.

@BruceForstall
Copy link
Copy Markdown

This LGTM, but @sandreenko or @echesakovMSFT (with more YAML experience than I) should look.

@hoyosjs
Copy link
Copy Markdown
Member

hoyosjs commented May 17, 2019

The YAML looks fine. As for the test infra, they definitely know better, but I've added the definition to get to test this.

@hoyosjs
Copy link
Copy Markdown
Member

hoyosjs commented May 17, 2019

/azp list

@azure-pipelines
Copy link
Copy Markdown

@tannergooding
Copy link
Copy Markdown
Member Author

@hoyosjs, do you know why the other coreclr-outerloop (jitstressregs and the gcstress ones) don't show up in the list command?

@hoyosjs
Copy link
Copy Markdown
Member

hoyosjs commented May 17, 2019

They are not set to be PR validation builds, so azp won't list them here.

Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM generally, and I concur with the intent, but I'm not all that familiar with this.

@hoyosjs
Copy link
Copy Markdown
Member

hoyosjs commented May 17, 2019

/azp run coreclr-outerloop-jitstressisas

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines failed to run 1 pipeline(s).

@tannergooding
Copy link
Copy Markdown
Member Author

They are not set to be PR validation builds, so azp won't list them here.

That seems incorrect. They are outerloop (like these new jobs) and so shouldn't be run in PRs by default.

However, they are useful for explicit triggering in certain types of PRs and that was done not infrequently under the Jenkins setup.

#24555 is a good example of where I needed to run those jobs and that currently requires manually locating the queue and knowing how git internally represents PR branches so you can explicitly kick off the group.

@tannergooding
Copy link
Copy Markdown
Member Author

tannergooding commented May 17, 2019

Azure Pipelines failed to run 1 pipeline(s).

Looks like it doesn't like the second scenarios grouping. I'll see if I need to explicitly filter the previous entry to just arm/arm64 (and therefore duplicate those four jobs under each scenarios grouping) or if I can move the x86/x64 specific check under the main scenarios listing...

Comment thread eng/test-job.yml Outdated
- jitstressisas_nosimd
${{ if and(eq(parameters.testGroup, 'outerloop-jitstressisas'), or(eq(parameters.archType, 'x64'), eq(parameters.archType, 'x86'))) }}:
scenarios:
- jitstressisas_noaes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only major error I've seen in your YAML: you redefined scenarios here

Comment thread eng/test-job.yml Outdated
@tannergooding
Copy link
Copy Markdown
Member Author

/azp run coreclr-outerloop-jitstressisas

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@tannergooding
Copy link
Copy Markdown
Member Author

(i fully expect the jitstressisas_noavx job to fail due to the issue being resolved in #24555)

Copy link
Copy Markdown

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to run jitstressisas on arm32 machines, musls, rhel? If not, maybe it would makes sense to limit the testing platforms in the similar way as it's done for gcstress scenarios.

Changes to .yml and testenvironment.proj look good - I left only one nit comment.

My suggestion is to make sure that all the needed COMPlus_-env. variables are set correctly by looking at the logs on mc.dot.net.

Comment thread eng/test-job.yml Outdated
Copy link
Copy Markdown

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a separator between jitstress and isa? I also don't mind if you use its full name, like coreclr-outerloop-jitstress-isa or coreclr-outerloop-jitstress-iinstruction-sets

@sandreenko
Copy link
Copy Markdown

@hoyosjs, do you know why the other coreclr-outerloop (jitstressregs and the gcstress ones) don't show up in the list command?

We do not want to have any additional jobs in PR validation, because when they were people unintentionally triggered them via /a_z_p run, if you want to trigger additional testing you can click do to azp portal, choose a job, clock queue, instead of default “master” type “refs/pull/YOUR_PR_NUMBER/head" in the branch box, click “Queue”.

@hoyosjs
Copy link
Copy Markdown
Member

hoyosjs commented May 17, 2019

They are not set to be PR validation builds, so azp won't list them here.

That seems incorrect. They are outerloop (like these new jobs) and so shouldn't be run in PRs by default.

However, they are useful for explicit triggering in certain types of PRs and that was done not infrequently under the Jenkins setup.

#24555 is a good example of where I needed to run those jobs and that currently requires manually locating the queue and knowing how git internally represents PR branches so you can explicitly kick off the group.

It would be nice, but there's currently a less-that-ideal behavior in azp where the default run command triggers all the legs and that's a waste of resources. Once it gets fixed, we can enable them for manual triggering.

@tannergooding
Copy link
Copy Markdown
Member Author

because when they were people unintentionally triggered them

Is there an existing ask on AzDO to allow jobs to be individually triggered but not triggered via the regular run command?

@hoyosjs
Copy link
Copy Markdown
Member

hoyosjs commented May 17, 2019

because when they were people unintentionally triggered them

Is there an existing ask on AzDO to allow jobs to be individually triggered but not triggered via the regular run command?

Yes, dnceng is mediating this

@RussKeldorph
Copy link
Copy Markdown

@tannergooding Yes. @chcosta Is tracking it, hopefully with the highest urgency possible.

@tannergooding
Copy link
Copy Markdown
Member Author

Do we need to run jitstressisas on arm32 machines, musls, rhel?

The four jobs that are currently not limited to just x86/x64 could have some impact on these platforms (nosimd impacts S.N.Vector code, for example; which ARM32 supports).

However, if it is a question of machine resources, I think we could disable them for those machines as most changes should be caught by other testing groups.

@tannergooding
Copy link
Copy Markdown
Member Author

@sandreenko, do you also want a spacing in the scenario names? That is, should jitstressisas_nosimd also be changed to jitstress_isas_nosimd?

@sandreenko
Copy link
Copy Markdown

@sandreenko, do you also want a spacing in the scenario names? That is, should jitstressisas_nosimd also be changed to jitstress_isas_nosimd?

Yes, I would like it. It is hard for me to parse these names without a space between.

@echesakov
Copy link
Copy Markdown

Do we need to run jitstressisas on arm32 machines, musls, rhel?

I could also help mitigate this by splitting it into jitstress-isas-x86 and jitstress-isas-arm which are filtered to fewer platforms each? Most changes aren't cross cutting and you can always run both sets in the case they are.
Thoughts?

I like this idea

@tannergooding
Copy link
Copy Markdown
Member Author

I like this idea

Done.

Comment thread azure-pipelines.yml
- Windows_NT_arm64
${{ if eq(variables['Build.DefinitionName'], 'coreclr-outerloop-jitstress-isas-x86') }}:
- Linux_x64
- OSX_x64
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linux and OSX cover the same calling conventions and general codegen, but the OSX machines have caught a number of bugs due to a number of the machines in the pool having AVX but not AVX2 support.

This has been impactful for a number of cases for the older System.Numerics.Vector code when refactoring the JIT.

Comment thread eng/test-job.yml
- tailcallstress
${{ if eq(parameters.testGroup, 'outerloop-jitstress-isas-arm') }}:
scenarios:
- jitstress_isas_incompletehwintrinsic
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first four jobs are shared between both x86 and ARM and so they don't have an architecture moniker in the name.

They were duplicated in each listing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list will expand quite a bit once the work on the ARM64 HWIntrinsics starts up. But for now, it should be "good enough" for the experimental HWIntrinsics we currently have and the pre-existing System.Numerics.Vector support.

Comment thread azure-pipelines.yml
- Linux_x64
- Windows_NT_x64
- Windows_NT_x86
${{ if eq(variables['Build.DefinitionName'], 'coreclr-outerloop-jitstress-isas-arm') }}:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case then we just need to add and rename new build definitions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean add a new coreclr-outerloop-jitstress-isas-arm entry in AzDO and rename the coreclr-outterloop-jitstressisas job to coreclr-outerloop-jitstress-isas-x86?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hoyosjs, do we have a document describing how to create the new build-definitions for coreclr? (do we just create a new definition and point it to the azure-pipelines.yml; do we clone an existing job and just rename it; is there any additional steps required for coreclr in particular; etc...)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, clone, set the names correctly, and in the triggers please disable the PR validation ones.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

What cadence do we want these scheduled to run and how should they be staggered as compared to the other outerloop jobs? It looks like:

  • outerloop is never scheduled, but is part of PR validation
  • outerloop-jitstress is 22:00 Mon through Sun
  • outerloop-jitstressregs is 2:00 Sat and Sun
  • outerloop-gcstress0x3-gcstress0xc is 5:00 Sat and Sun
  • outerloop-jitstress2-jitstressregs is 9:00 Sat and Sun
  • outerloop-gcstress-extra is 13:00 Sat and Sun
  • outerloop-r2r-extra is 18:00 Sat and Sun

@CarolEidt
Copy link
Copy Markdown

The four jobs that are currently not limited to just x86/x64 could have some impact on these platforms (nosimd impacts S.N.Vector code, for example; which ARM32 supports).

Unless I've missed something, I don't believe Arm32 has support for S.N.Vector

@tannergooding
Copy link
Copy Markdown
Member Author

Unless I've missed something, I don't believe Arm32 has support for S.N.Vector

Looks like I was not remembering correctly. I've removed the arm32 platform from the isas-arm group.

@tannergooding
Copy link
Copy Markdown
Member Author

/azp list

@azure-pipelines
Copy link
Copy Markdown

CI/CD Pipelines for this repository:
coreclr-outerloop
coreclr-ci
dotnet-coreclr-nullablefeature

@tannergooding
Copy link
Copy Markdown
Member Author

Ok, I think everything is now configured correctly and jobs are now running. We probably want to have these new build-definitions scheduled for running (as with most of the other outerloop jobs): #24630 (comment); but otherwise I think this is good for final review (provided all jobs finish running as expected).

Copy link
Copy Markdown

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding this.

Could you please also mark what you did in https://github.com/dotnet/coreclr/issues/24358?

@tannergooding
Copy link
Copy Markdown
Member Author

Could you please also mark what you did in #24358?

Done

@tannergooding
Copy link
Copy Markdown
Member Author

Looks like at least one bug to look at for nohwintrinsic and nosse3_4:

Assert failure(PID 2496 [0x000009c0], Thread: 1528 [0x05f8]): Assertion failed 'tree->IsLocal() || (tree->OperGet() == GT_RET_EXPR) || (tree->OperGet() == GT_CALL) || ((tree->gtOper == GT_ADDR) && varTypeIsSIMD(tree->gtGetOp1()))' in 'System.SpanHelpers:IndexOf(byref,ushort,int):int' (IL size 339)

    File: f:\\workspace\\_work\\1\\s\\src\\jit\\simd.cpp Line: 1112
    Image: C:\\dotnetbuild\\work\\4e7e96c9-6d3c-4d41-bf75-f7ce7f7707ac\\Payload\\CoreRun.exe

I'll see if I can repro locally (against current master) and log a bug.

CC. @CarolEidt.

@tannergooding
Copy link
Copy Markdown
Member Author

Looks like the failure is in https://source.dot.net/#System.Private.CoreLib/shared/System/SpanHelpers.Char.cs,217 (the only other option that would match the (byref, ushort, int):int signature is https://source.dot.net/#System.Private.CoreLib/shared/System/SpanHelpers.T.cs,124, but that doesn't use any SIMD code).

I'd guess the failure is likely do to the Unsafe.Read<Vector<ushort>>(pCh) call that is happening, since that is the only Vector byref (but we'll see after things finish building locally).

@tannergooding
Copy link
Copy Markdown
Member Author

Failure was actually in the constructor call here: https://source.dot.net/#System.Private.CoreLib/shared/System/SpanHelpers.Char.cs,274

We were checking all the config flags except EnableSSE3_4 when determining whether the max vector size is 32 or 16 and were incorrectly returning 32. We are also in a similar situation with EnableHWIntrinsic, but the fix for that is slightly different...

I've got fixes for both ready and will put them up shortly.

@tannergooding
Copy link
Copy Markdown
Member Author

Going to merge this first, however, since the jobs are running, doing the right thing, and the failures are not introduced by this PR.

I'll run the jobs as part of the PR with the fix to validate everything passes there.

@tannergooding
Copy link
Copy Markdown
Member Author

Fix for the found job failures is here: #24649

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Adding the isa outerloop jobs

* Don't redefine scenarios for the jitstressisas group

* Splitting jitstress-isas into jitstress-isas-arm and jitstress-isas-x86

* Fixing the azure-pipelines.yml to include platforms:

* Removing Linux_arm as a platform for the jitstress-isas-arm group

* Ensure that the platforms for the jitstress-isas jobs are listed in both places required

* Removing Windows_NT_arm from the jitstress-isas-arm platforms


Commit migrated from dotnet/coreclr@9b0a199
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add CI jobs for each X86 instruction set architecture

7 participants