Fill CUDA Cast operator opset gap: extend registration from opset 23 to 25#27744
Merged
Conversation
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot created this pull request from a session on behalf of tianleiwu on March 18, 2026 18:08.
Contributor
/azp run Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).
…ange for CUDA EP
std::vector<bool> is bit-packed and cannot be converted to gsl::span. Use a C-style bool array with explicit span construction, matching the pattern used by existing bool tests in the same file.
Co-authored-by: Copilot <copilot@github.com>
justinchuby
approved these changes
May 5, 2026
Contributor
I think there are also issues with Reshape: #28368
tianleiwu
approved these changes
May 5, 2026
Contributor
tianleiwu
left a comment
Summary
Clean mechanical opset gap-fill extending CUDA Cast kernel registrations from opset 23 to opset 25. The implementation correctly versions existing opset 23 entries as (23, 24) and adds non-versioned opset 25 entries, consistent with how the CPU EP already handles Cast at these opset versions.
Positives:
- Complete contiguous opset coverage from opset 6 through 25 in the REGISTER_KERNEL_TYPED macro.
- All 16 type specializations (13 standard + 2 Float8 + 1 Float4) properly registered for both the versioned (23, 24) and non-versioned opset 25 blocks.
- Good test coverage: standard types, Float8, and Float4 at opset 25 with proper CUDA-only guards and compute capability checks.
- Combining opsets 23 and 24 into one versioned registration is valid since the kernel implementation is identical between them.
LGTM.
titaiwangms
approved these changes
May 5, 2026
Description
Extends CUDA Cast kernel registration to cover opset 25 (latest ONNX spec). The existing non-versioned opset 23 registration is capped to VERSIONED (23, 24), and a new non-versioned opset 25 registration is added for all type specializations.
cast_op.cc:
- REGISTER_KERNEL_TYPED(T): opset 23 → VERSIONED (23, 24), added non-versioned opset 25
- REGISTER_KERNEL_TYPED_23 → REGISTER_KERNEL_TYPED_23_TO_24 (VERSIONED)
- Added REGISTER_KERNEL_TYPED_25 macro (non-versioned)
- SPECIALIZE_IMPL_19_TO_23 → SPECIALIZE_IMPL_19_TO_25, covering Float8 types through opset 25

cuda_execution_provider.cc:
- BuildKernelCreateInfo: same pattern; capped 23 to (23, 24), added opset 25 block

Motivation and Context
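The cast_op.cc change follows ONNX Runtime's usual kernel-registration macro pattern. A hedged sketch of the shape of the edit (the KernelDefBuilder chain and "T1" type-constraint name below are approximate, not copied from the PR):

```cpp
// Before: a single non-versioned registration at opset 23.
// After: the old entry is capped at (23, 24) and an opset-25 entry is added.
ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_EX(
    Cast, kOnnxDomain, 23, 24, T, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        .TypeConstraint("T1", DataTypeImpl::GetTensorType<T>()),
    Cast<T>);

ONNX_OPERATOR_TYPED_KERNEL_EX(
    Cast, kOnnxDomain, 25, T, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        .TypeConstraint("T1", DataTypeImpl::GetTensorType<T>()),
    Cast<T>);
```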
CUDA Cast operator was registered up to opset 23, but ONNX spec defines Cast through opset 25. This gap can cause kernel lookup failures when running models exported at opset 25. Part of the broader CUDA opset gap-filling effort tracked in #27729.