Update CUDA TopK kernel registration to opset 24 with BFloat16 support#27735
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
/azp run Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).
@copilot, TopK-24 supports bfloat16 according to https://github.com/onnx/onnx/blob/main/docs/Operators.md#TopK
Added BFloat16 support for CUDA TopK opset 24 in commit
@copilot, could you add a test case for bfloat16 TopK-24?
Added three BFloat16 test cases in commit
All tests use the … operations
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).
### Description

Extends CUDA ReduceMax and ReduceMin kernel registrations from opset 18 to opset 20.

- **`reduction_ops.cc`**: Added `REGISTER_KERNEL_VERSIONED_RANGE_AXES_INPUT_TYPED` macro for versioned ranges requiring `InputMemoryType(OrtMemTypeCPUInput, 1)`. Split both operators from a 2-way registration (1–17, 18+) to a 3-way registration (1–17, 18–19, 20+).
- **`cuda_execution_provider.cc`**: Capped the opset 18 forward declarations and `BuildKernelCreateInfo` entries to the versioned range 18–19. Added opset 20 non-versioned entries for both operators.

Type coverage is maintained as-is: ReduceMax (float, double, MLFloat16, int32_t, int64_t); ReduceMin additionally covers int8_t and uint8_t.

### Motivation and Context

ReduceMax and ReduceMin CUDA registrations stopped at opset 18, while the latest ONNX release defines opset 20. Models exported with opset 19–20 could fail to find a matching CUDA kernel for these ops. Follows the same pattern used in #27735 (TopK) and other opset-gap PRs tracked in #27729.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>