
Standalone operator Convolution_backwards for MLIR#4292

Merged
causten merged 14 commits into develop from lw/conv_backwards_mlir on Sep 20, 2025
Conversation

@lakhinderwalia
Contributor

@lakhinderwalia lakhinderwalia commented Sep 10, 2025

Motivation

Support Operator Convolution_backwards via MLIR

Technical Details

This support is necessary when MIOpen is unavailable, e.g. on Windows platforms.

Changelog Category

    • Added: New functionality.

@lakhinderwalia lakhinderwalia self-assigned this Sep 10, 2025
@causten causten requested a review from Copilot September 10, 2025 13:31
@causten
Collaborator

causten commented Sep 10, 2025

Can you add some unit tests?


Copilot AI left a comment


Pull Request Overview

This PR adds MLIR support for the convolution_backwards operator to provide an alternative implementation when MIOPEN is not available (e.g., on Windows platforms).

  • Adds mapping for convolution_backwards operator to MLIR dialect operation
  • Implements predicate function to determine when convolution_backwards can use MLIR
  • Integrates the new backward convolution matcher into the MLIR fusion pipeline
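
The first bullet can be pictured as a name lookup. This is an illustrative sketch, not the PR's actual source: the table entries, the `migraphx.` dialect prefix, and the function name are assumptions about the shape of the mapping in src/targets/gpu/mlir.cpp.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: map a MIGraphX operator name to its MLIR dialect
// counterpart; this PR's change amounts to adding one such entry for
// convolution_backwards. Returns an empty string for unmapped ops.
std::string mlir_op_name(const std::string& migraphx_op)
{
    static const std::unordered_map<std::string, std::string> table = {
        {"convolution", "migraphx.convolution"},
        {"convolution_backwards", "migraphx.convolution_backwards"}, // added by this PR
        {"dot", "migraphx.dot"},
    };
    auto it = table.find(migraphx_op);
    return it == table.end() ? std::string{} : it->second;
}
```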

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/targets/gpu/mlir.cpp Maps convolution_backwards operator name to MLIR dialect operation
src/targets/gpu/fuse_mlir.cpp Adds predicate function and matcher for backward convolution operations in MLIR


@pfultz2
Collaborator

pfultz2 commented Sep 10, 2025

We do need to limit this to 2D convolution_backwards because I think that's the only one supported right now in MLIR.
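
A minimal sketch of the rank check this restriction implies, assuming NCHW-style inputs where rank 4 means two spatial dimensions; the function name is hypothetical, not the PR's predicate:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check: convolution_backwards is eligible for MLIR only when
// the input is 4-D (N, C, H, W), i.e. a 2-D convolution.
bool is_2d_conv_backwards(const std::vector<std::size_t>& input_lens)
{
    return input_lens.size() == 4;
}
```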

@codecov

codecov bot commented Sep 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4292   +/-   ##
========================================
  Coverage    92.22%   92.22%           
========================================
  Files          557      557           
  Lines        25924    25924           
========================================
  Hits         23908    23908           
  Misses        2016     2016           

@lakhinderwalia lakhinderwalia requested a review from a team as a code owner September 11, 2025 21:11
@pfultz2
Collaborator

pfultz2 commented Sep 12, 2025

You need to add tests for fuse_mlir as well.

Collaborator

@TedThemistokleous TedThemistokleous left a comment


Overall looks good. I have one ask and a question.

Ask

  • Add a test for batch size greater than one.

Question: what do we do for datatypes like fp8/int8 and smaller? Or is that not supported/intended for the backwards convolution, since that should resolve to quant_convolution?
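
A test for batch size greater than one would mainly assert the output shape: the batch dimension carries through unchanged while the spatial dims follow the transposed-convolution arithmetic. A sketch under assumed layouts, x = (N, C_in, H, W) and w = (C_in, C_out, kH, kW), with dilation and groups omitted; names are illustrative, not MIGraphX API:

```cpp
#include <cstddef>
#include <vector>

// Output lengths of convolution_backwards (transposed convolution):
// out = (in - 1) * stride - 2 * pad + kernel, per spatial dim.
std::vector<std::size_t> conv_backwards_output_lens(
    const std::vector<std::size_t>& x, // N, C_in, H, W
    const std::vector<std::size_t>& w, // C_in, C_out, kH, kW
    std::size_t stride,
    std::size_t pad)
{
    auto len = [&](std::size_t in, std::size_t k) {
        return (in - 1) * stride - 2 * pad + k;
    };
    // Batch dim x[0] passes through; a batch>1 test checks exactly that.
    return {x[0], w[1], len(x[2], w[2]), len(x[3], w[3])};
}
```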

@lakhinderwalia
Contributor Author

Overall looks good. I have one ask and a question.

Ask

  • Add a test for batch size greater than one.

Question: what do we do for datatypes like fp8/int8 and smaller? Or is that not supported/intended for the backwards convolution, since that should resolve to quant_convolution?

This PR is to bypass MIOpen when we cannot use it. The current MLIR support, as reflected in this PR, covers only 2D kernels and only the float types that have been added: fp32, fp16, bf16. There is no MLIR support for fp8 or int8 right now.
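
Sketched as a predicate, the datatype restriction described above might look as follows. The type-name strings mirror MIGraphX's shape type-name convention but are illustrative; this is not the PR's code.

```cpp
#include <string>

// Per the comment above: MLIR convolution_backwards support covers only
// fp32, fp16, and bf16 (no fp8/int8). Names here are assumptions.
bool mlir_conv_backwards_type_supported(const std::string& t)
{
    return t == "float_type" or t == "half_type" or t == "bf16_type";
}
```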

@TedThemistokleous
Collaborator

@lakhinderwalia, link the updated test to this PR when you upload it.

Collaborator

@pfultz2 pfultz2 left a comment


Overall looks good, just need to add fuse_mlir tests. You can see an example of how we check the env variable in the test in #4261.

@lakhinderwalia
Contributor Author

Overall looks good, just need to add fuse_mlir tests. You can see an example of how we check the env variable in the test in #4261.

It is not a case of a simple environment variable here. I was looking for an example that handles more of get_mode(), and that is what I wanted clarification on. Not sure if we want to replicate all that complexity in a simple test case, @pfultz2. Thanks.

@pfultz2
Collaborator

pfultz2 commented Sep 18, 2025

Overall looks good, just need to add fuse_mlir tests. You can see an example of how we check the env variable in the test in #4261.

It is not a case of a simple environment variable here. I was looking for an example that handles more of get_mode(), and that is what I wanted clarification on. Not sure if we want to replicate all that complexity in a simple test case, @pfultz2. Thanks.

convolution_backwards should be added to the specific ops in the Jenkins file so it gets tested on the CI, and then you can check it against the tests to be disabled. That's the way we currently do it.
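
The gating described here can be pictured as membership in a comma-separated op list supplied by the CI configuration. This is a sketch of the idea only; the parsing and any variable names are assumptions, not the actual fuse_mlir code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated list such as "dot,convolution_backwards".
std::vector<std::string> split_csv(const std::string& s)
{
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, ','))
        if(not item.empty())
            out.push_back(item);
    return out;
}

// Hypothetical gate: an op is MLIR-enabled when it appears in the list.
bool op_enabled(const std::string& op, const std::string& csv)
{
    for(const auto& name : split_csv(csv))
        if(name == op)
            return true;
    return false;
}
```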

@causten causten dismissed pfultz2’s stale review September 19, 2025 17:06

Requested changes have been applied

@migraphx-bot
Collaborator

Test    Batch    Rate new (5bec67)    Rate old (43d3f3)    Diff
torchvision-resnet50 64 3,173.58 3,157.73 0.50%
torchvision-resnet50_fp16 64 6,610.52 6,587.32 0.35%
torchvision-densenet121 32 2,445.70 2,437.35 0.34%
torchvision-densenet121_fp16 32 4,132.47 4,122.32 0.25%
torchvision-inceptionv3 32 1,672.88 1,665.47 0.44%
torchvision-inceptionv3_fp16 32 2,594.22 2,589.37 0.19%
cadene-inceptionv4 16 797.71 794.82 0.36%
cadene-resnext64x4 16 806.68 801.96 0.59%
slim-mobilenet 64 8,233.95 8,209.01 0.30%
slim-nasnetalarge 64 222.87 221.76 0.50%
slim-resnet50v2 64 3,304.88 3,295.13 0.30%
bert-mrpc-onnx 8 1,143.42 1,133.98 0.83%
bert-mrpc-tf 1 484.76 490.52 -1.17%
pytorch-examples-wlang-gru 1 313.52 317.20 -1.16%
pytorch-examples-wlang-lstm 1 431.06 439.21 -1.86%
torchvision-resnet50_1 1 799.36 802.49 -0.39%
cadene-dpn92_1 1 453.65 436.73 3.88% 🔆
cadene-resnext101_1 1 369.64 368.11 0.41%
onnx-taau-downsample 1 398.97 398.29 0.17%
dlrm-criteoterabyte 1 32.05 31.92 0.39%
dlrm-criteoterabyte_fp16 1 51.06 51.00 0.11%
agentmodel 1 9,570.39 9,837.19 -2.71%
unet_fp16 2 58.92 58.80 0.20%
resnet50v1_fp16 1 988.06 995.41 -0.74%
resnet50v1_int8 1 1,003.53 995.73 0.78%
bert_base_cased_fp16 64 1,104.20 1,100.48 0.34%
bert_large_uncased_fp16 32 345.70 344.24 0.42%
bert_large_fp16 1 198.46 197.60 0.44%
distilgpt2_fp16 16 2,087.76 2,076.97 0.52%
yolov5s 1 586.29 587.20 -0.15%
tinyllama 1 43.97 43.80 0.38%
vicuna-fastchat 1 45.33 45.12 0.46%
whisper-tiny-encoder 1 411.04 409.76 0.31%
whisper-tiny-decoder 1 415.35 414.17 0.28%
llama2_7b 1 19.16 19.12 0.17%
qwen1.5-7b 1 23.52 23.43 0.38%
phi3-3.8b 1 26.69 26.64 0.17%
mask-rcnn 1 12.10 12.18 -0.65%
llama3-8b 1 21.72 21.67 0.23%
whisper-large-encoder 1 10.22 10.17 0.48%
whisper-large-decoder 1 100.27 100.09 0.19%
mistral-7b 1 23.73 23.65 0.36%
FLUX.1-schnell 1 720.79 728.36 -1.04%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

❌ bert-mrpc-tf: ERROR - check error output
2025-09-19 12:05:27.610617: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
graph = load_tf_graph(model_name)
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
graph_def.ParseFromString(f.read())
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
self._preread_check()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')


     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

     ✅ llama2_7b: PASSED: MIGraphX meets tolerance

     ✅ qwen1.5-7b: PASSED: MIGraphX meets tolerance

     ✅ phi3-3.8b: PASSED: MIGraphX meets tolerance

🔴mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ llama3-8b: PASSED: MIGraphX meets tolerance

     ✅ whisper-large-decoder: PASSED: MIGraphX meets tolerance

     ✅ mistral-7b: PASSED: MIGraphX meets tolerance

     ✅ FLUX.1-schnell: PASSED: MIGraphX meets tolerance

@causten causten merged commit 38fdc6b into develop Sep 20, 2025
37 of 38 checks passed
@causten causten deleted the lw/conv_backwards_mlir branch September 20, 2025 04:34

8 participants