
Fill CUDA Cast operator opset gap: extend registration from opset 23 to 25#27744

Merged
tianleiwu merged 8 commits into main from copilot/update-cast-cuda-operator
May 5, 2026

Copilot AI commented Mar 18, 2026

Description

Extends CUDA Cast kernel registration to cover opset 25 (the latest Cast version in the ONNX spec). The existing non-versioned opset 23 registration is capped to VERSIONED (23, 24), and a new non-versioned opset 25 registration is added for all type specializations.

cast_op.cc:

  • REGISTER_KERNEL_TYPED(T): opset 23 → VERSIONED (23, 24), added non-versioned opset 25
  • Renamed REGISTER_KERNEL_TYPED_23 → REGISTER_KERNEL_TYPED_23_TO_24 (VERSIONED)
  • Added REGISTER_KERNEL_TYPED_25 macro (non-versioned)
  • Renamed SPECIALIZE_IMPL_19_TO_23 → SPECIALIZE_IMPL_19_TO_25, covering Float8 types through opset 25
  • Updated Float4E2M1x2 registration to use new versioned/non-versioned macros

cuda_execution_provider.cc:

  • Forward declarations: all opset 23 Cast entries → VERSIONED (23, 24), added opset 25 non-versioned entries (all 16 types: 13 standard + 2 Float8 + 1 Float4)
  • BuildKernelCreateInfo: same pattern; capped opset 23 to (23, 24) and added an opset 25 block
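As a toy illustration of the counts above, the sketch below expands a 16-entry type list into one versioned (23, 24) block and one non-versioned opset 25 block. It is not ONNX Runtime code: `ToyEntry`, `kCastTypes`, and `BuildCastEntries` are hypothetical, and the specific type names are an assumed reading of the "13 standard + 2 Float8 + 1 Float4" breakdown.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// Hypothetical stand-in for one BuildKernelCreateInfo entry.
struct ToyEntry {
  std::string type;
  int start;  // first opset covered
  int end;    // last opset covered; INT_MAX marks a non-versioned entry
};

// Assumed type list: 13 standard types, 2 Float8 types, 1 Float4 type.
const std::array<const char*, 16> kCastTypes = {
    "float",   "double",   "MLFloat16", "BFloat16",
    "int8_t",  "int16_t",  "int32_t",   "int64_t",
    "uint8_t", "uint16_t", "uint32_t",  "uint64_t",
    "bool",
    "Float8E4M3FN", "Float8E5M2",
    "Float4E2M1x2"};

// Emits a versioned (23, 24) block followed by a non-versioned opset 25 block,
// mirroring the "cap 23 to (23, 24), add 25" pattern.
std::vector<ToyEntry> BuildCastEntries() {
  std::vector<ToyEntry> entries;
  for (const char* t : kCastTypes) entries.push_back({t, 23, 24});
  for (const char* t : kCastTypes) entries.push_back({t, 25, INT_MAX});
  for (const auto& e : entries) assert(e.start <= e.end);  // sanity-check ranges
  return entries;
}
```

Under these assumptions the table holds 32 entries, two per type; the actual provider file declares each entry individually rather than generating them in a loop.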

Motivation and Context

The CUDA Cast operator was registered only up to opset 23, but the ONNX spec defines Cast through opset 25. This gap can cause kernel lookup failures when running models exported at opset 25. Part of the broader CUDA opset gap-filling effort tracked in #27729.
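To see why a missing registration turns into a lookup failure, here is a simplified model of kernel matching. It assumes a rule like the one in ONNX Runtime's kernel registry, where a versioned entry covers its whole [start, end] range but a non-versioned entry only matches a node whose resolved schema version equals its start opset. `Registration`, `Matches`, and `FindKernel` are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified registry entry (not the ONNX Runtime type).
struct Registration {
  std::string kernel;
  int start;  // first opset covered
  int end;    // INT_MAX marks a non-versioned (open-ended) entry
};

// Assumed matching rule: a versioned entry covers [start, end], while a
// non-versioned entry matches only a node whose schema version equals start.
bool Matches(const Registration& r, int node_since_version) {
  assert(r.start <= r.end);
  if (r.start == node_since_version) return true;
  return r.end != INT_MAX && r.start <= node_since_version &&
         node_since_version <= r.end;
}

// Returns the first matching kernel, or std::nullopt if lookup fails.
std::optional<std::string> FindKernel(const std::vector<Registration>& regs,
                                      int node_since_version) {
  for (const auto& r : regs) {
    if (Matches(r, node_since_version)) return r.kernel;
  }
  return std::nullopt;
}
```

With only a non-versioned opset 23 entry, `FindKernel` returns `std::nullopt` for a Cast node resolved at opset 25; with the (23, 24) and 25 entries, nodes at 23, 24, and 25 all find a kernel.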

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@tianleiwu

/azp run Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI and others added 2 commits March 18, 2026 16:40
…ange for CUDA EP

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@tianleiwu tianleiwu marked this pull request as ready for review March 18, 2026 23:48

@github-actions github-actions Bot left a comment

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/cuda/tensor/cast_op.cc Outdated
Comment thread onnxruntime/core/providers/cuda/tensor/cast_op.cc Outdated
std::vector<bool> is bit-packed and cannot be converted to gsl::span.
Use a C-style bool array with explicit span construction, matching the
pattern used by existing bool tests in the same file.
Co-authored-by: Copilot <copilot@github.com>
@justinchuby

I think there are also issues with reshape: #28368

@tianleiwu tianleiwu enabled auto-merge (squash) May 5, 2026 19:56

@tianleiwu tianleiwu left a comment

Summary

Clean mechanical opset gap-fill extending CUDA Cast kernel registrations from opset 23 to opset 25. The implementation correctly versions existing opset 23 entries as (23, 24) and adds non-versioned opset 25 entries, consistent with how the CPU EP already handles Cast at these opset versions.

Positives:

  • Complete contiguous opset coverage from opset 6 through 25 in the REGISTER_KERNEL_TYPED macro.
  • All 16 type specializations (13 standard + 2 Float8 + 1 Float4) properly registered for both versioned (23,24) and non-versioned opset 25 blocks.
  • Good test coverage: standard types, Float8, and Float4 at opset 25 with proper CUDA-only guards and compute capability checks.
  • Combining opsets 23 and 24 into one versioned registration is valid since the kernel implementation is identical between them.
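The "complete contiguous opset coverage" property can be checked mechanically. A hedged sketch: `CoversContiguously` is a hypothetical helper that treats each registration as a closed opset range and requires the ranges to tile [lo, hi] with no gaps or overlaps. It models the non-versioned opset 25 entry as the single version [25, 25], assuming an open-ended registration is matched only at its starting opset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Closed opset range [first, second] covered by one registration.
using OpsetRange = std::pair<int, int>;

// Returns true if the ranges together cover every opset in [lo, hi]
// exactly once: no gaps and no overlaps.
bool CoversContiguously(std::vector<OpsetRange> ranges, int lo, int hi) {
  if (ranges.empty() || lo > hi) return false;
  std::sort(ranges.begin(), ranges.end());
  if (ranges.front().first != lo || ranges.back().second != hi) return false;
  for (std::size_t i = 0; i < ranges.size(); ++i) {
    assert(ranges[i].first <= ranges[i].second);
    // Each range must begin immediately after the previous one ends.
    if (i > 0 && ranges[i].first != ranges[i - 1].second + 1) return false;
  }
  return true;
}
```

For example, a list ending with the old open-ended opset 23 entry, modeled as [23, 23], fails the check for [6, 25], while one ending in (23, 24) and [25, 25] passes, provided the earlier ranges abut.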

LGTM.

Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc
@tianleiwu tianleiwu merged commit ee5158e into main May 5, 2026
89 checks passed
@tianleiwu tianleiwu deleted the copilot/update-cast-cuda-operator branch May 5, 2026 22:13