[CK tests] Extend conv GPU reference #3539

johannes-graner · 2026-01-09T10:33:35Z

Proposed changes

This PR extends GPU reference implementation support for convolution operations with elementwise fusion and output operations. The changes enable GPU-accelerated reference implementations for tests involving scale, bias, batchnorm, clamp, and bilinear operations across forward, backward data, and backward weight convolutions.

Key improvements:

Extended naive_conv_fwd_gpu, naive_conv_bwd_data_gpu, and naive_conv_bwd_weight_gpu to support elementwise operations, batchnorm, clamp, scale, and bilinear fusion
Reduced test execution times across 13 test suites, with per-suite speedups between 1.2x and 18.3x
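To illustrate what fusing an elementwise output operation into a naive reference convolution means, here is a minimal host-side sketch in plain C++. It is not the code from this PR: `naive_conv1d_fwd`, `Scale`, and `Clamp` are hypothetical names chosen for the example, and the convolution is reduced to one dimension, a single channel, stride 1, and no padding. The real reference kernels run on the GPU and handle the full grouped N-dimensional case.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical elementwise output operations (illustrative only,
// not the functors used by the library).
struct Scale {
    float factor;
    float operator()(float acc) const { return factor * acc; }
};

struct Clamp {
    float lo;
    float hi;
    float operator()(float acc) const { return std::min(std::max(acc, lo), hi); }
};

// Naive 1D forward convolution: single channel, stride 1, no padding.
// The output operation is applied to each accumulated value before it is
// stored, so no second pass over the output tensor is needed.
template <typename OutOp>
std::vector<float> naive_conv1d_fwd(const std::vector<float>& in,
                                    const std::vector<float>& wei,
                                    OutOp out_op)
{
    if (wei.empty() || in.size() < wei.size()) {
        return {};
    }
    const std::size_t out_len = in.size() - wei.size() + 1;
    std::vector<float> out(out_len);
    for (std::size_t o = 0; o < out_len; ++o) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < wei.size(); ++k) {
            acc += in[o + k] * wei[k];
        }
        out[o] = out_op(acc); // fused epilogue
    }
    return out;
}
```

Calling `naive_conv1d_fwd(in, wei, Scale{2.0f})` or `naive_conv1d_fwd(in, wei, Clamp{0.0f, 6.0f})` selects the epilogue at compile time, which is the general pattern a templated reference implementation can use to cover several fused test variants with one kernel.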

Performance impact:

Test Name	Before (s)	After (s)	Speedup
test_convnd_fwd	99.0	5.4	18.3x
test_convnd_bwd_data	41.0	13.0	3.2x
test_grouped_conv_bwd_data_scale	36.0	29.0	1.2x
test_grouped_convnd_fwd_clamp	352.0	211.0	1.7x
test_grouped_convnd_fwd_scale	108.0	36.0	3.0x
test_grouped_convnd_fwd_bias_bnorm_clamp	81.0	18.0	4.5x
test_grouped_convnd_fwd_bias_clamp	291.0	235.0	1.2x
test_grouped_convnd_fwd_gk_bias_bnorm_clamp	53.0	15.0	3.5x
test_grouped_convnd_fwd_gk_bias_clamp	290.0	228.0	1.3x
test_grouped_convnd_fwd_bilinear	140.0	41.0	3.4x
test_grouped_convnd_fwd_scaleadd_ab	171.0	22.0	7.8x
test_grouped_conv_bwd_data_bilinear	4.9	3.3	1.5x
test_grouped_convnd_bwd_weight_bilinear	6.7	2.5	2.7x

Together, these changes reduce the total execution time of the 13 tests from 1674 seconds to 859 seconds, saving approximately 14 minutes (about 1.9x overall).

Checklist

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which helps the maintainers understand the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

The implementation focuses on extending the GPU reference path to match the functionality available in the CPU reference path.
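One way to think about "matching" two reference paths is that a fused epilogue must give the same result as running the plain convolution and applying the operation in a separate pass. The sketch below shows this for a bilinear-style epilogue with an auxiliary D tensor. All names (`Bilinear`, `conv1d_plain`, `conv1d_bilinear_fused`, `all_close`) are hypothetical, the formula `alpha * conv + beta * d` is an assumption about what the bilinear operation computes, and the code is a host-side 1D simplification rather than the GPU implementation in this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed bilinear epilogue: alpha * conv_result + beta * d.
struct Bilinear {
    float alpha;
    float beta;
    float operator()(float conv, float d) const { return alpha * conv + beta * d; }
};

// Plain 1D convolution: single channel, stride 1, no padding.
std::vector<float> conv1d_plain(const std::vector<float>& in,
                                const std::vector<float>& wei)
{
    if (wei.empty() || in.size() < wei.size()) {
        return {};
    }
    std::vector<float> out(in.size() - wei.size() + 1);
    for (std::size_t o = 0; o < out.size(); ++o) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < wei.size(); ++k) {
            acc += in[o + k] * wei[k];
        }
        out[o] = acc;
    }
    return out;
}

// Same convolution with the epilogue applied inside the output loop.
// Returns an empty vector if d does not match the output length.
std::vector<float> conv1d_bilinear_fused(const std::vector<float>& in,
                                         const std::vector<float>& wei,
                                         const std::vector<float>& d,
                                         Bilinear op)
{
    if (wei.empty() || in.size() < wei.size()) {
        return {};
    }
    const std::size_t out_len = in.size() - wei.size() + 1;
    if (d.size() != out_len) {
        return {};
    }
    std::vector<float> out(out_len);
    for (std::size_t o = 0; o < out_len; ++o) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < wei.size(); ++k) {
            acc += in[o + k] * wei[k];
        }
        out[o] = op(acc, d[o]);
    }
    return out;
}

// Elementwise comparison with absolute and relative tolerance.
bool all_close(const std::vector<float>& a, const std::vector<float>& b,
               float atol, float rtol)
{
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > atol + rtol * std::fabs(b[i])) {
            return false;
        }
    }
    return true;
}
```

A test built on this idea would compute the output once per path and pass only if `all_close` holds for the chosen tolerances.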

Further speedups are possible by also running verification and tensor initialization on the GPU. This PR is already large, so those improvements are deferred.

johannes-graner added 14 commits January 7, 2026 09:31

test_convnd_fwd

c7da77d

test_convnd_bwd_data

e00ef08

test_conv_bwd_data_scale

2f83bac

test_grouped_convnd_fwd_clamp

2f9b366

test_grouped_convnd_fwd_scale

0c106d2

multiple A/B tensors and D tensor for fwd GPU ref

9e95a2a

test_grouped_convnd_fwd_scaleadd_ab

7004943

test_grouped_convnd_fwd_bias_clamp

3298801

test_grouped_convnd_fwd_bilinear

2e36ef8

test_grouped_convnd_fwd_gk_bias_clamp

2992269

Extend GPU reference to enable batchnorm epilogue

e2f75fa

test_grouped_convnd_fwd{,_gk}_bias_bnorm_clamp

6da4576

test_grouped_conv_bwd_data_bilinear

64cf835

test_grouped_convnd_bwd_weight_bilinear

1556359

johannes-graner requested review from a team, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway and vidyasagar-amd as code owners January 9, 2026 10:33

johannes-graner requested a review from tenpercent as a code owner January 9, 2026 10:33

johannes-graner and others added 2 commits January 9, 2026 11:38

Merge branch 'develop' into jograner/extend-gpu-reference

9c2a899

Add missing template instantiation

0d5b27d
