
Standalone operator Convolution_backwards for MLIR#4292

Merged
causten merged 14 commits into develop from lw/conv_backwards_mlir on Sep 20, 2025
Conversation

@lakhinderwalia
Contributor

@lakhinderwalia lakhinderwalia commented Sep 10, 2025

Motivation

Support Operator Convolution_backwards via MLIR

Technical Details

This support is necessary when MIOpen is unavailable, e.g. on Windows platforms.

Changelog Category

    • Added: New functionality.

@lakhinderwalia lakhinderwalia self-assigned this Sep 10, 2025
@causten causten requested a review from Copilot September 10, 2025 13:31
@causten
Collaborator

causten commented Sep 10, 2025

Can you add some unit tests?


Copilot AI left a comment


Pull Request Overview

This PR adds MLIR support for the convolution_backwards operator to provide an alternative implementation when MIOPEN is not available (e.g., on Windows platforms).

  • Adds mapping for convolution_backwards operator to MLIR dialect operation
  • Implements predicate function to determine when convolution_backwards can use MLIR
  • Integrates the new backward convolution matcher into the MLIR fusion pipeline
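
The first bullet can be pictured as a name lookup. This is an illustrative sketch, not the PR's actual source: the table entries, the `migraphx.` dialect prefix, and the function name are assumptions about the shape of the mapping in src/targets/gpu/mlir.cpp.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: map a MIGraphX operator name to its MLIR dialect
// counterpart; this PR's change amounts to adding one such entry for
// convolution_backwards. Returns an empty string for unmapped ops.
std::string mlir_op_name(const std::string& migraphx_op)
{
    static const std::unordered_map<std::string, std::string> table = {
        {"convolution", "migraphx.convolution"},
        {"convolution_backwards", "migraphx.convolution_backwards"}, // added by this PR
        {"dot", "migraphx.dot"},
    };
    auto it = table.find(migraphx_op);
    return it == table.end() ? std::string{} : it->second;
}
```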

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/targets/gpu/mlir.cpp Maps convolution_backwards operator name to MLIR dialect operation
src/targets/gpu/fuse_mlir.cpp Adds predicate function and matcher for backward convolution operations in MLIR


@pfultz2
Collaborator

pfultz2 commented Sep 10, 2025

We do need to limit this to 2D convolution_backwards because I think that's the only one supported right now in MLIR.
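
A minimal sketch of the rank check this restriction implies, assuming NCHW-style inputs where rank 4 means two spatial dimensions; the function name is hypothetical, not the PR's predicate:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check: convolution_backwards is eligible for MLIR only when
// the input is 4-D (N, C, H, W), i.e. a 2-D convolution.
bool is_2d_conv_backwards(const std::vector<std::size_t>& input_lens)
{
    return input_lens.size() == 4;
}
```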

@codecov

codecov bot commented Sep 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4292   +/-   ##
========================================
  Coverage    92.22%   92.22%           
========================================
  Files          557      557           
  Lines        25924    25924           
========================================
  Hits         23908    23908           
  Misses        2016     2016           

@lakhinderwalia lakhinderwalia requested a review from a team as a code owner September 11, 2025 21:11
@pfultz2
Collaborator

pfultz2 commented Sep 12, 2025

You need to add tests for fuse_mlir as well.

Collaborator

@TedThemistokleous TedThemistokleous left a comment


Overall looks good. I have one ask and a question.

Ask

  • Add a test for batch size greater than one.

Question: what do we do for datatypes like fp8/int8 and smaller? Or is that not supported/intended for the backwards convolution, since that should resolve to quant_convolution?
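
A test for batch size greater than one would mainly assert the output shape: the batch dimension carries through unchanged while the spatial dims follow the transposed-convolution arithmetic. A sketch under assumed layouts, x = (N, C_in, H, W) and w = (C_in, C_out, kH, kW), with dilation and groups omitted; names are illustrative, not MIGraphX API:

```cpp
#include <cstddef>
#include <vector>

// Output lengths of convolution_backwards (transposed convolution):
// out = (in - 1) * stride - 2 * pad + kernel, per spatial dim.
std::vector<std::size_t> conv_backwards_output_lens(
    const std::vector<std::size_t>& x, // N, C_in, H, W
    const std::vector<std::size_t>& w, // C_in, C_out, kH, kW
    std::size_t stride,
    std::size_t pad)
{
    auto len = [&](std::size_t in, std::size_t k) {
        return (in - 1) * stride - 2 * pad + k;
    };
    // Batch dim x[0] passes through; a batch>1 test checks exactly that.
    return {x[0], w[1], len(x[2], w[2]), len(x[3], w[3])};
}
```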

@lakhinderwalia
Contributor Author

Overall looks good. I have one ask and a question.

Ask

  • Add a test for batch size greater than one.

Question: what do we do for datatypes like fp8/int8 and smaller? Or is that not supported/intended for the backwards convolution, since that should resolve to quant_convolution?

This PR is to bypass MIOpen when we cannot use it. The current MLIR support, as reflected in this PR, covers only 2D kernels and only the float types that have been added: fp32, fp16, bf16. There is no MLIR support for fp8 or int8 right now.
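
Sketched as a predicate, the datatype restriction described above might look as follows. The type-name strings mirror MIGraphX's shape type-name convention but are illustrative; this is not the PR's code.

```cpp
#include <string>

// Per the comment above: MLIR convolution_backwards support covers only
// fp32, fp16, and bf16 (no fp8/int8). Names here are assumptions.
bool mlir_conv_backwards_type_supported(const std::string& t)
{
    return t == "float_type" or t == "half_type" or t == "bf16_type";
}
```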

@TedThemistokleous
Collaborator

@lakhinderwalia, link the updated test to this PR when you upload it.

Collaborator

@pfultz2 pfultz2 left a comment


Overall looks good, just need to add fuse_mlir tests. You can see an example of how we check the env variable in the test in #4261.

@lakhinderwalia
Contributor Author

Overall looks good, just need to add fuse_mlir tests. You can see an example of how we check the env variable in the test in #4261.

It is not a case of a simple environment variable here. I was looking for an example that handles more of get_mode(), and that is what I wanted clarification on. Not sure if we want to replicate all that complexity in a simple test case, @pfultz2. Thanks.

@pfultz2
Collaborator

pfultz2 commented Sep 18, 2025

Overall looks good, just need to add fuse_mlir tests. You can see an example of how we check the env variable in the test in #4261.

It is not a case of a simple environment variable here. I was looking for an example that handles more of get_mode(), and that is what I wanted clarification on. Not sure if we want to replicate all that complexity in a simple test case, @pfultz2. Thanks.

convolution_backwards should be added to the specific ops in the Jenkins file so it gets tested on the CI, and then you can check it against the tests to be disabled. That's the way we currently do it.
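
The gating described here can be pictured as membership in a comma-separated op list supplied by the CI configuration. This is a sketch of the idea only; the parsing and any variable names are assumptions, not the actual fuse_mlir code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated list such as "dot,convolution_backwards".
std::vector<std::string> split_csv(const std::string& s)
{
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, ','))
        if(not item.empty())
            out.push_back(item);
    return out;
}

// Hypothetical gate: an op is MLIR-enabled when it appears in the list.
bool op_enabled(const std::string& op, const std::string& csv)
{
    for(const auto& name : split_csv(csv))
        if(name == op)
            return true;
    return false;
}
```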

@causten causten dismissed pfultz2’s stale review September 19, 2025 17:06

Requested changes have been applied

@migraphx-bot
Collaborator

Test    Batch    Rate new (5bec67)    Rate old (43d3f3)    Diff
torchvision-resnet50 64 3,173.58 3,157.73 0.50%
torchvision-resnet50_fp16 64 6,610.52 6,587.32 0.35%
torchvision-densenet121 32 2,445.70 2,437.35 0.34%
torchvision-densenet121_fp16 32 4,132.47 4,122.32 0.25%
torchvision-inceptionv3 32 1,672.88 1,665.47 0.44%
torchvision-inceptionv3_fp16 32 2,594.22 2,589.37 0.19%
cadene-inceptionv4 16 797.71 794.82 0.36%
cadene-resnext64x4 16 806.68 801.96 0.59%
slim-mobilenet 64 8,233.95 8,209.01 0.30%
slim-nasnetalarge 64 222.87 221.76 0.50%
slim-resnet50v2 64 3,304.88 3,295.13 0.30%
bert-mrpc-onnx 8 1,143.42 1,133.98 0.83%
bert-mrpc-tf 1 484.76 490.52 -1.17%
pytorch-examples-wlang-gru 1 313.52 317.20 -1.16%
pytorch-examples-wlang-lstm 1 431.06 439.21 -1.86%
torchvision-resnet50_1 1 799.36 802.49 -0.39%
cadene-dpn92_1 1 453.65 436.73 3.88% 🔆
cadene-resnext101_1 1 369.64 368.11 0.41%
onnx-taau-downsample 1 398.97 398.29 0.17%
dlrm-criteoterabyte 1 32.05 31.92 0.39%
dlrm-criteoterabyte_fp16 1 51.06 51.00 0.11%
agentmodel 1 9,570.39 9,837.19 -2.71%
unet_fp16 2 58.92 58.80 0.20%
resnet50v1_fp16 1 988.06 995.41 -0.74%
resnet50v1_int8 1 1,003.53 995.73 0.78%
bert_base_cased_fp16 64 1,104.20 1,100.48 0.34%
bert_large_uncased_fp16 32 345.70 344.24 0.42%
bert_large_fp16 1 198.46 197.60 0.44%
distilgpt2_fp16 16 2,087.76 2,076.97 0.52%
yolov5s 1 586.29 587.20 -0.15%
tinyllama 1 43.97 43.80 0.38%
vicuna-fastchat 1 45.33 45.12 0.46%
whisper-tiny-encoder 1 411.04 409.76 0.31%
whisper-tiny-decoder 1 415.35 414.17 0.28%
llama2_7b 1 19.16 19.12 0.17%
qwen1.5-7b 1 23.52 23.43 0.38%
phi3-3.8b 1 26.69 26.64 0.17%
mask-rcnn 1 12.10 12.18 -0.65%
llama3-8b 1 21.72 21.67 0.23%
whisper-large-encoder 1 10.22 10.17 0.48%
whisper-large-decoder 1 100.27 100.09 0.19%
mistral-7b 1 23.73 23.65 0.36%
FLUX.1-schnell 1 720.79 728.36 -1.04%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

❌ bert-mrpc-tf: ERROR - check error output
2025-09-19 12:05:27.610617: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
graph = load_tf_graph(model_name)
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
graph_def.ParseFromString(f.read())
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
self._preread_check()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')


     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

     ✅ llama2_7b: PASSED: MIGraphX meets tolerance

     ✅ qwen1.5-7b: PASSED: MIGraphX meets tolerance

     ✅ phi3-3.8b: PASSED: MIGraphX meets tolerance

🔴mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ llama3-8b: PASSED: MIGraphX meets tolerance

     ✅ whisper-large-decoder: PASSED: MIGraphX meets tolerance

     ✅ mistral-7b: PASSED: MIGraphX meets tolerance

     ✅ FLUX.1-schnell: PASSED: MIGraphX meets tolerance

@causten causten merged commit 38fdc6b into develop Sep 20, 2025
37 of 38 checks passed
@causten causten deleted the lw/conv_backwards_mlir branch September 20, 2025 04:34

8 participants