Add CMake option to enable saturation checker for ConvSymKernelAvx2 #24220

Merged
yihonglyu merged 15 commits into main from yilyu/dbg-ConvSymKernelAvx2-saturate
Apr 28, 2025

@yihonglyu commented Mar 27, 2025

Description

This PR adds a new CMake option: onnxruntime_ENABLE_CONVSYMKERNELAVX2_SAT_CHECKER. When enabled, this option activates a saturation checker for the VPMADDUBSW instruction used in the ConvSymKernelAvx2 path.

The checker works by calling a helper function before each VPMADDUBSW instruction. The helper recomputes the same multiply-add in C++ with intrinsics, using higher-precision intermediates (int32_t), and detects any result that falls outside the int16_t range (greater than INT16_MAX or less than INT16_MIN), which is where VPMADDUBSW would saturate.
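The idea behind the check can be illustrated with a scalar model. The sketch below is not the helper added by this PR (which uses intrinsics); the names `MaddubsLaneWide`, `WouldSaturateInt16`, and `AnyLaneSaturates` are hypothetical and chosen only for illustration. It models one 16-bit output lane of VPMADDUBSW, which multiplies two unsigned bytes by two signed bytes and adds the products, and evaluates the sum at int32_t precision so that an out-of-range value can be seen before it would be clamped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Scalar model of one 16-bit output lane of VPMADDUBSW (hypothetical helper,
// not part of the PR). The two unsigned-by-signed byte products are summed at
// int32_t precision, i.e. before the instruction's signed saturation applies.
inline std::int32_t MaddubsLaneWide(std::uint8_t a0, std::int8_t b0,
                                    std::uint8_t a1, std::int8_t b1) {
    return static_cast<std::int32_t>(a0) * static_cast<std::int32_t>(b0) +
           static_cast<std::int32_t>(a1) * static_cast<std::int32_t>(b1);
}

// True when the wide result does not fit in int16_t, meaning the real
// instruction would clamp it to INT16_MAX or INT16_MIN.
inline bool WouldSaturateInt16(std::int32_t wide) {
    return wide > std::numeric_limits<std::int16_t>::max() ||
           wide < std::numeric_limits<std::int16_t>::min();
}

// Scans n bytes of each operand (n must be even) and reports whether any
// adjacent pair would saturate. Hypothetical name, for illustration only.
inline bool AnyLaneSaturates(const std::uint8_t* a, const std::int8_t* b,
                             std::size_t n) {
    assert(n % 2 == 0);
    for (std::size_t i = 0; i < n; i += 2) {
        if (WouldSaturateInt16(MaddubsLaneWide(a[i], b[i], a[i + 1], b[i + 1]))) {
            return true;
        }
    }
    return false;
}
```

As a worked case, two pairs of (255, 127) give 255 × 127 + 255 × 127 = 64770, which exceeds INT16_MAX (32767), so the hardware result would be clamped and the accumulated value would be wrong.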

By default, the checker logs a warning only once per inference session. However, the logic can be easily extended to print more frequently if needed. Developers can also reuse this pattern to implement similar saturation checks for other instructions.
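One way to get warn-once behavior is an atomic flag that is set by the first report. The sketch below is an assumption about how such a guard could look, not the PR's actual code: the class `SaturationReporter`, its methods, and the message text are hypothetical, and tying one instance to one inference session is left to the caller.

```cpp
#include <atomic>
#include <cstdio>

// Hypothetical warn-once guard (not the PR's implementation). The first call
// to ReportOnce prints a warning; later calls are silent until Reset is called.
class SaturationReporter {
public:
    // Returns true only for the call that actually emitted the warning.
    bool ReportOnce(const char* kernel_name) {
        // exchange() returns the previous value, so exactly one caller sees false.
        if (warned_.exchange(true, std::memory_order_relaxed)) {
            return false;
        }
        std::fprintf(stderr, "Warning: int16 saturation detected in %s\n", kernel_name);
        return true;
    }

    // Re-arms the guard, e.g. at the start of a new session (assumed usage).
    void Reset() { warned_.store(false, std::memory_order_relaxed); }

private:
    std::atomic<bool> warned_{false};
};
```

Printing more often, as the description suggests is possible, would amount to replacing the boolean with a counter or removing the guard.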

Motivation and Context

On some models running on AVX2 (instead of AVX-VNNI), we've observed accuracy degradation caused by saturation in vectorized instructions such as VPMADDUBSW, which clamps its 16-bit results. This saturation checker helps debug those cases by reporting potential overflow in intermediate computations.

@yihonglyu force-pushed the yilyu/dbg-ConvSymKernelAvx2-saturate branch from f57282d to a22b284 April 3, 2025 01:24
@yihonglyu changed the title from "Yilyu/dbg conv sym kernel avx2 saturate" to "Add ConvSymKernelAvx2 assembly saturation checker" Apr 3, 2025
@yihonglyu marked this pull request as ready for review April 3, 2025 23:02
@yihonglyu requested a review from a team as a code owner April 3, 2025 23:02
@yihonglyu force-pushed the yilyu/dbg-ConvSymKernelAvx2-saturate branch from 7bdc0a2 to 40f9145 April 12, 2025 05:29
@yihonglyu changed the title from "Add ConvSymKernelAvx2 assembly saturation checker" to "Add CMake option to enable saturation checker for ConvSymKernelAvx2" Apr 15, 2025

Copilot AI left a comment

Copilot reviewed 5 out of 9 changed files in this pull request and generated no comments.

Files not reviewed (4)
  • cmake/CMakeLists.txt: Language not supported
  • cmake/onnxruntime_mlas.cmake: Language not supported
  • onnxruntime/core/mlas/lib/amd64/ConvSymKernelAvx2.asm: Language not supported
  • onnxruntime/core/mlas/lib/x86_64/ConvSymKernelAvx2.S: Language not supported

@yihonglyu merged commit 7e0ee2b into main Apr 28, 2025
84 of 89 checks passed
@yihonglyu deleted the yilyu/dbg-ConvSymKernelAvx2-saturate branch April 28, 2025 06:46
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request May 12, 2025
…icrosoft#24220)


3 participants