JIT: Reduce Vector256/512 Sum to a shared per-lane reduction on x64 #127329

Merged: tannergooding merged 9 commits into main from copilot/fix-vector256-intrinsics-regression on Apr 25, 2026

Copilot AI commented Apr 23, 2026

  • Update the TYP_DOUBLE vpermilpd immediate in gtNewSimdSumNode from 0b0001 to 0b01010101 so the two elements of each 128-bit lane are swapped correctly for V128/V256/V512
  • Collapse V512 GetLower128 call onto a single line for jit-format
  • Baseline build succeeded
  • All 12,944 System.Runtime.Intrinsics.Tests pass
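
The immediate change in the first bullet can be illustrated with a scalar model of `vpermilpd`. This is only a sketch, not the JIT code: `permilpd` and `addSwapped` are hypothetical helper names, and the model assumes the documented behaviour that bit `i` of the immediate selects, for destination element `i`, the low (0) or high (1) double of that element's own 128-bit lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of VPERMILPD with an 8-bit immediate (hypothetical helper,
// not part of the JIT). N is the number of doubles: 2 (V128), 4 (V256), 8 (V512).
template <std::size_t N>
std::array<double, N> permilpd(const std::array<double, N>& src, std::uint8_t imm)
{
    static_assert(N == 2 || N == 4 || N == 8, "expected a V128, V256 or V512 of doubles");

    std::array<double, N> dst{};
    for (std::size_t i = 0; i < N; ++i)
    {
        // Index of the first double in the 128-bit lane that holds element i.
        std::size_t laneBase = i & ~std::size_t{1};
        // Bit i of the immediate picks the low (0) or high (1) double of that lane.
        std::size_t pick = (imm >> i) & 1u;
        dst[i] = src[laneBase + pick];
    }
    return dst;
}

// One step of a per-lane horizontal sum: add each element to the element
// that the permute places in the same position (hypothetical helper).
template <std::size_t N>
std::array<double, N> addSwapped(const std::array<double, N>& v, std::uint8_t imm)
{
    std::array<double, N> swapped = permilpd(v, imm);
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i)
    {
        out[i] = v[i] + swapped[i];
    }
    return out;
}
```

Under this model, for the V256 input `{1, 2, 3, 4}` the immediate `0b0001` swaps only the lowest lane and gives `{3, 3, 6, 7}`, while `0b01010101` swaps every lane and gives `{3, 3, 7, 7}`. For a V128 the two immediates behave the same, since only bits 0 and 1 are read.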

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/f98b46df-b011-4898-8d7f-4edea22a5662

Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 23, 2026 17:02
Comment thread src/coreclr/jit/gentree.cpp Outdated
github-actions bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 23, 2026
@dotnet-policy-service

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI changed the title from "[WIP] Fix Vector256 explicit intrinsics performance regression on .NET 10" to "JIT: Reduce Vector256/512 Sum to a single V128 horizontal reduction on x64" Apr 23, 2026
Copilot AI requested a review from tannergooding April 23, 2026 17:27
Copilot AI requested review from Copilot and removed request for Copilot April 23, 2026 17:56
Copilot AI changed the title from "JIT: Reduce Vector256/512 Sum to a single V128 horizontal reduction on x64" to "JIT: Reduce Vector256/512 Sum to a shared per-lane reduction on x64" Apr 23, 2026
Comment thread src/coreclr/jit/gentree.cpp Outdated
Comment thread src/coreclr/jit/gentree.cpp Outdated
Comment thread src/coreclr/jit/gentree.cpp
Copilot AI requested review from Copilot and tannergooding and removed request for Copilot April 23, 2026 18:36
Copilot AI requested a review from tannergooding April 23, 2026 19:53
Comment thread src/coreclr/jit/gentree.cpp Outdated
Copilot AI requested review from Copilot and removed request for Copilot April 23, 2026 20:19
Copilot AI requested a review from tannergooding April 23, 2026 20:21
Comment thread src/coreclr/jit/gentree.cpp Outdated
Copilot AI requested review from Copilot and removed request for Copilot April 24, 2026 12:50
Comment thread src/coreclr/jit/gentree.cpp Outdated
Copilot AI requested review from Copilot and removed request for Copilot April 24, 2026 21:08
Copilot AI requested a review from tannergooding April 24, 2026 21:09
@tannergooding tannergooding enabled auto-merge (squash) April 24, 2026 21:09
@tannergooding tannergooding merged commit f886264 into main Apr 25, 2026
135 checks passed
@tannergooding tannergooding deleted the copilot/fix-vector256-intrinsics-regression branch April 25, 2026 00:26

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Development

Successfully merging this pull request may close these issues.

Vector256 explicit intrinsics 71% slower on .NET 10 vs .NET 8 on AVX-512 hardware

5 participants