Update CUDA TopK kernel registration to opset 24 with BFloat16 support#27735
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
/azp run Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).
@copilot, TopK-24 supports bfloat16 according to https://github.com/onnx/onnx/blob/main/docs/Operators.md#TopK
Added BFloat16 support for CUDA TopK opset 24 in commit
@copilot, could you add a test case for bfloat16 TopK-24?
Added three BFloat16 test cases in commit
All tests use the … operations
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).
### Description

Extends CUDA ReduceMax and ReduceMin kernel registrations from opset 18 to opset 20.

- **`reduction_ops.cc`**: Added `REGISTER_KERNEL_VERSIONED_RANGE_AXES_INPUT_TYPED` macro for versioned ranges requiring `InputMemoryType(OrtMemTypeCPUInput, 1)`. Split both operators from a 2-way registration (1–17, 18+) to a 3-way registration (1–17, 18–19, 20+).
- **`cuda_execution_provider.cc`**: Capped the opset 18 forward declarations and `BuildKernelCreateInfo` entries to the versioned range 18–19. Added opset 20 non-versioned entries for both operators.

Type coverage is maintained as-is: ReduceMax (float, double, MLFloat16, int32_t, int64_t); ReduceMin additionally covers int8_t and uint8_t.

### Motivation and Context

ReduceMax and ReduceMin CUDA registrations stopped at opset 18, while the latest ONNX release defines opset 20. Models exported with opset 19–20 could fail to find a matching CUDA kernel for these ops. Follows the same pattern used in #27735 (TopK) and other opset-gap PRs tracked in #27729.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>