@SingleAccretion
Contributor

In #76491, the Jit became a little more eager to optimize code that uses vectors as a whole. Unfortunately, this regressed Matrix4x4.CreateScale on some platforms, because its code is written in a way that favors promotion over whole-register access.

This change fixes the regression by using a style that is more in line with what the Jit expects of vectors. It also optimizes the related overloads by avoiding the relatively expensive get_Identity property call (which is not even inlined, because it loads a value type field).
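To make the get_Identity point concrete, here is a minimal sketch using only the public Matrix4x4 API. It is not the diff from this PR: the class and method names (`ScaleSketch`, `ViaIdentity`, `ViaConstructor`) are hypothetical, and the whole-vector row writes inside the library are not shown. It only contrasts the "copy Identity, then patch fields" shape with writing every element directly.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration only; not the code changed by this PR.
public static class ScaleSketch
{
    // Starts from a copy of Matrix4x4.Identity (a property call)
    // and then overwrites the three diagonal scale fields.
    public static Matrix4x4 ViaIdentity(Vector3 scales)
    {
        Matrix4x4 result = Matrix4x4.Identity;
        result.M11 = scales.X;
        result.M22 = scales.Y;
        result.M33 = scales.Z;
        return result;
    }

    // Writes all sixteen elements directly, so no Identity call is needed.
    public static Matrix4x4 ViaConstructor(Vector3 scales)
    {
        return new Matrix4x4(
            scales.X, 0f, 0f, 0f,
            0f, scales.Y, 0f, 0f,
            0f, 0f, scales.Z, 0f,
            0f, 0f, 0f, 1f);
    }
}
```

Both methods return the same matrix for the same input. Which one is faster depends on the runtime version and platform, so the benchmark numbers below remain the only performance data here.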

Linux x64 benchmark results:

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateScaleFromVectorBenchmark | Job-AFMBXB | Root/corerun | 12.575 ns | 0.1031 ns | 0.0861 ns | 12.555 ns | 12.452 ns | 12.770 ns | 0.74 | 0.01 | - | NA |
| CreateScaleFromVectorBenchmark | Job-KPUHXS | /crb/corerun | 16.988 ns | 0.1283 ns | 0.1138 ns | 16.959 ns | 16.789 ns | 17.189 ns | 1.00 | 0.00 | - | NA |
| CreateScaleFromVectorWithCenterBenchmark | Job-AFMBXB | Root/corerun | 9.548 ns | 0.0628 ns | 0.0588 ns | 9.529 ns | 9.476 ns | 9.663 ns | 0.44 | 0.02 | - | NA |
| CreateScaleFromVectorWithCenterBenchmark | Job-KPUHXS | /crb/corerun | 21.963 ns | 0.9062 ns | 1.0435 ns | 21.291 ns | 21.155 ns | 24.310 ns | 1.00 | 0.00 | - | NA |
| CreateScaleFromScalarBenchmark | Job-AFMBXB | Root/corerun | 9.321 ns | 0.0701 ns | 0.0655 ns | 9.329 ns | 9.207 ns | 9.442 ns | 0.54 | 0.00 | - | NA |
| CreateScaleFromScalarBenchmark | Job-KPUHXS | /crb/corerun | 17.138 ns | 0.0886 ns | 0.0828 ns | 17.154 ns | 16.954 ns | 17.255 ns | 1.00 | 0.00 | - | NA |
| CreateScaleFromScalarWithCenterBenchmark | Job-AFMBXB | Root/corerun | 12.807 ns | 0.1044 ns | 0.0977 ns | 12.789 ns | 12.689 ns | 12.999 ns | 0.74 | 0.01 | - | NA |
| CreateScaleFromScalarWithCenterBenchmark | Job-KPUHXS | /crb/corerun | 17.203 ns | 0.0963 ns | 0.0901 ns | 17.216 ns | 16.980 ns | 17.342 ns | 1.00 | 0.00 | - | NA |
| CreateScaleFromScalarXYZBenchmark | Job-AFMBXB | Root/corerun | 9.454 ns | 0.0688 ns | 0.0609 ns | 9.433 ns | 9.389 ns | 9.577 ns | 0.46 | 0.00 | - | NA |
| CreateScaleFromScalarXYZBenchmark | Job-KPUHXS | /crb/corerun | 20.481 ns | 0.1567 ns | 0.1466 ns | 20.431 ns | 20.296 ns | 20.769 ns | 1.00 | 0.00 | - | NA |
| CreateScaleFromScalarXYZWithCenterBenchmark | Job-AFMBXB | Root/corerun | 9.434 ns | 0.0380 ns | 0.0297 ns | 9.438 ns | 9.384 ns | 9.477 ns | 0.55 | 0.00 | - | NA |
| CreateScaleFromScalarXYZWithCenterBenchmark | Job-KPUHXS | /crb/corerun | 17.160 ns | 0.1791 ns | 0.1588 ns | 17.171 ns | 16.901 ns | 17.433 ns | 1.00 | 0.00 | - | NA |

Here, Root/corerun is this change and /crb/corerun is main.

Fixes #78977.

@ghost added the area-CodeGen-coreclr and community-contribution labels on Dec 2, 2022
@ghost

ghost commented Dec 2, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: SingleAccretion
Assignees: -
Labels: area-CodeGen-coreclr

Milestone: -

@SingleAccretion added the area-System.Numerics label and removed the area-CodeGen-coreclr label on Dec 2, 2022
@SingleAccretion marked this pull request as ready for review on December 2, 2022, 23:11
@ghost

ghost commented Dec 2, 2022

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: SingleAccretion
Assignees: -
Labels: area-System.Numerics, community-contribution

Milestone: -

@EgorBo merged commit c782569 into dotnet:main on Dec 5, 2022
@EgorBo
Member

EgorBo commented Dec 5, 2022

Thanks for the workaround!

@SingleAccretion deleted the Speed-Up-Matrix4x4-CreateScale branch on December 7, 2022, 17:27
@ghost locked the conversation as resolved and limited it to collaborators on Jan 6, 2023
Development

Successfully merging this pull request may close these issues.

Regressions in System.Numerics.Tests.Perf_Matrix4x4