JIT: Correct some AVX-512 instruction timings #126555
saucecontrol wants to merge 2 commits into dotnet:main
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR updates the CoreCLR JIT’s xarch perf-score model to better reflect Skylake-X (uops.info) throughput/latency for several AVX-512-related instructions, improving instruction cost estimation used by JIT heuristics.
Changes:
- Adjusts latency selection logic for several SIMD move/convert/shift instructions to account for wider (YMM/ZMM) operation sizes.
- Refines gather instruction throughput/latency modeling across 128/256/512-bit widths.
- Adds a new latency constant (`PERFSCORE_LATENCY_17C`) for updated gather modeling.
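As a rough illustration of the width-dependent modeling described in the list above, the sketch below looks up a throughput/latency pair by operand size. It is not the code in `emitxarch.cpp`: the `Timing` struct, `kGatherTimings`, `widthIndex`, and `gatherTiming` are hypothetical names, and the cycle counts are placeholders rather than uops.info measurements.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a perf-score entry:
// reciprocal throughput and latency, both in cycles.
struct Timing
{
    float throughput;
    float latency;
};

// Placeholder numbers for illustration only; not uops.info data.
constexpr std::array<Timing, 3> kGatherTimings = {{
    {4.0f, 10.0f}, // 128-bit (XMM)
    {5.0f, 12.0f}, // 256-bit (YMM)
    {9.0f, 17.0f}, // 512-bit (ZMM)
}};

// Map an operand size in bytes (16, 32, 64) to its table row.
constexpr std::size_t widthIndex(unsigned sizeInBytes)
{
    return (sizeInBytes >= 64) ? 2 : (sizeInBytes >= 32) ? 1 : 0;
}

// Return the modeled timing for a gather of the given width.
constexpr Timing gatherTiming(unsigned sizeInBytes)
{
    return kGatherTimings[widthIndex(sizeInBytes)];
}
```

Keeping one row per width makes it hard to forget the 512-bit case, since a missing row is visible in the table rather than hidden in a default branch.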
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/coreclr/jit/emitxarch.cpp` | Updates instruction throughput/latency modeling for specific SIMD and gather instructions based on operand size. |
| `src/coreclr/jit/emit.h` | Adds `PERFSCORE_LATENCY_17C` used by updated timing model. |
tannergooding left a comment
This is mostly just ensuring that TYP_SIMD64 uses the appropriate timings, as many of these were entered prior to that existing and so were assigning the TYP_SIMD16 timings instead.
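The kind of mistake described in the comment above can be sketched as follows. This is an assumed, simplified pattern and not the actual JIT code: `SimdType`, `latencyBefore`, `latencyAfter`, and both cycle constants are hypothetical and exist only for illustration.

```cpp
// Hypothetical SIMD type tags, loosely mirroring TYP_SIMD16/32/64.
enum class SimdType { Simd16, Simd32, Simd64 };

// Placeholder cycle counts; not taken from uops.info.
constexpr float kNarrowLatency = 1.0f;
constexpr float kWideLatency   = 3.0f;

// Pattern written before 64-byte vectors existed: only Simd32 is
// treated as wide, so Simd64 silently receives the Simd16 value.
constexpr float latencyBefore(SimdType t)
{
    return (t == SimdType::Simd32) ? kWideLatency : kNarrowLatency;
}

// Corrected pattern: anything wider than Simd16 takes the wide timing.
constexpr float latencyAfter(SimdType t)
{
    return (t == SimdType::Simd16) ? kNarrowLatency : kWideLatency;
}
```

Testing for the narrow case and defaulting to the wide one means a vector width added later inherits the wide timing rather than the 16-byte one.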

I think everything here was covered by #127109?

Ah, possibly. I had thought this was already merged when I did my PR 😅 I do think there's some throughput info that's still "off" for SIMD32/SIMD64, namely for the rare cases the wider variant takes more cycles, but it should all be generally correct now.

Looks like it was indeed all handled.

Verified using Skylake-X timings from uops.info