Fix CUDA EP: add opset 24 kernel registrations for Reshape and Cast #28368

Closed
justinchuby wants to merge 1 commit into main from fix-cuda-opset24-reshape-cast

@justinchuby
Contributor

ONNX opset 24 bumped Reshape and Cast (adding the float8e8m0 type). The ORT CUDA EP only had opset 23 registrations, so on opset 24 models these ops fell back to CPUExecutionProvider, producing ~280 memcpy nodes.

Fix: Version opset 23 registrations to (23, 23) and add non-versioned opset 24 registrations. Same kernel code.
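To illustrate why a non-versioned opset 23 registration would not serve an opset 24 node, here is a minimal, self-contained C++ sketch. It is a simplified model, not ORT's registration macros or registry code. `KernelReg`, `Matches`, `HasKernel`, `RegistryBeforeFix`, and `RegistryAfterFix` are hypothetical names. The matching rule is an assumption: a kernel matches when its start version equals the op schema's since-version, or when it is a closed range containing that version.

```cpp
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// Sentinel for an open-ended ("non-versioned") registration.
constexpr int kOpenEnded = INT_MAX;

// Hypothetical stand-in for a kernel registration: an op name plus the
// inclusive opset range [start, end] that the kernel claims to cover.
struct KernelReg {
  std::string op;
  int start;
  int end;
};

// Assumed matching rule (a simplification for illustration):
//  - exact match on the schema's since-version, or
//  - a closed range [start, end] that contains the since-version.
// An open-ended registration with an older start version does NOT match.
bool Matches(const KernelReg& reg, const std::string& op, int schema_since) {
  assert(reg.start <= reg.end);
  if (reg.op != op) return false;
  if (reg.start == schema_since) return true;
  return reg.start < schema_since && reg.end != kOpenEnded &&
         reg.end >= schema_since;
}

// Returns true if any registration in the registry can serve the node.
bool HasKernel(const std::vector<KernelReg>& registry, const std::string& op,
               int schema_since) {
  for (const auto& reg : registry) {
    if (Matches(reg, op, schema_since)) return true;
  }
  return false;
}

// Before the fix: only non-versioned opset 23 registrations.
std::vector<KernelReg> RegistryBeforeFix() {
  return {{"Reshape", 23, kOpenEnded}, {"Cast", 23, kOpenEnded}};
}

// After the fix: opset 23 closed to (23, 23), plus non-versioned opset 24.
std::vector<KernelReg> RegistryAfterFix() {
  return {{"Reshape", 23, 23},
          {"Reshape", 24, kOpenEnded},
          {"Cast", 23, 23},
          {"Cast", 24, kOpenEnded}};
}
```

Under this assumed rule, `HasKernel(RegistryBeforeFix(), "Reshape", 24)` is false, so the node would have to be placed on another provider, while `HasKernel(RegistryAfterFix(), "Reshape", 24)` is true and opset 23 models still match through the closed (23, 23) entries.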

Result: 282 memcpy → 4 memcpy for opset 24 models.

Tested with Gemma4 E2B-it (2B, opset 24) on H200.

ONNX opset 24 bumped Reshape and Cast (added float8e8m0 type support).
ORT CUDA EP only had opset 23 registrations, so these ops fell to
CPUExecutionProvider on opset 24 models, producing ~280
MemcpyFromHost/MemcpyToHost nodes.

Version existing opset 23 registrations to (23, 23) and add new
non-versioned opset 24 registrations. Same kernel implementations.

Result: 282 memcpy → 4 memcpy for opset 24 models on CUDA EP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
@tianleiwu
Contributor

Overlaps with #27742 and #27744.

@justinchuby
Contributor Author

Will close. LMK when they can be merged.
