[Build] Link PYTORCH_ROCM_ARCH specified archs #15

Merged
amirakb89 merged 1 commit into ROCm:main from rjrock:fix/link_offload_archs on Mar 27, 2026

@rjrock rjrock commented Mar 23, 2026

Motivation

In the vLLM CI we see the error message

'Failed: CUDA error /app/DeepEP/csrc/kernels/launch_hip.cuh:71 'invalid kernel file''

when using deep_ep. Although we specify gfx950 in the env var PYTORCH_ROCM_ARCH, the gfx950 kernels are not linked into the shared object.

Technical Details

With this change, the offload architectures listed in PYTORCH_ROCM_ARCH are explicitly linked into the shared object file. Previously, whatever architecture was discovered at runtime was linked.
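
For illustration only, here is a minimal sketch of the idea, not the actual diff in this PR. It assumes PYTORCH_ROCM_ARCH holds a semicolon-separated list of targets (for example `gfx942;gfx950`) and that each target is passed to the HIP compiler driver as an `--offload-arch=` flag. The helper name `offload_arch_flags` and the `default_archs` parameter are hypothetical.

```python
import os


def offload_arch_flags(environ=None, default_archs=()):
    """Hypothetical helper: turn PYTORCH_ROCM_ARCH into --offload-arch flags."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTORCH_ROCM_ARCH", "")

    # Assumption: targets are separated by ';', e.g. "gfx942;gfx950".
    archs = [arch.strip() for arch in raw.split(";") if arch.strip()]

    # Fall back to caller-supplied defaults when the variable is unset or empty.
    if not archs:
        archs = list(default_archs)

    # Drop duplicates while keeping the order in which targets were listed.
    unique_archs = list(dict.fromkeys(archs))
    return [f"--offload-arch={arch}" for arch in unique_archs]
```

In a build script, the returned list could be appended to both the compile and the link arguments of the extension, so that the shared object carries code objects for every requested target rather than only the GPU detected on the machine. Where exactly the flags belong depends on the build setup, which this sketch does not assume.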

@amirakb89 amirakb89 requested review from amirakb89 March 23, 2026 21:08

@amirakb89 amirakb89 left a comment

LGTM.

@amirakb89 amirakb89 requested review from itej89 and liligwu March 23, 2026 21:18

rjrock commented Mar 26, 2026

@liligwu could you take a look? I used this DeepEP fork with the previously failing test in vLLM CI and it passed.

@liligwu liligwu left a comment

LGTM

@amirakb89 amirakb89 merged commit 5d90af8 into ROCm:main Mar 27, 2026
@rjrock rjrock deleted the fix/link_offload_archs branch March 27, 2026 18:33