Feature/amdgpu warp shfl ops#5

Draft
diptorupd wants to merge 2 commits into ROCm:amd-integration from diptorupd:feature/amdgpu_warp_shfl_ops

@diptorupd
Collaborator

Add AMDGPU wave64 warp shuffle lowering for MI325X (gfx942/CDNA3):

  • New op: cuda_shfl_xor_sync_f32
  • AMDGPU lowering for shfl_xor/down/sync (i32 and f32) via llvm.amdgcn.ds.bpermute
  • shfl_xor_f32 raises NotImplementedError on CUDA (NVPTX lowering not implemented)

8 wave64 tests validating butterfly reductions and broadcasts:

  • test_shfl_xor_f32_wave64_butterfly (integer values)
  • test_shfl_xor_f32_wave64_butterfly_fractional (IEEE 754 bitcast verification)
  • test_shfl_xor_i32_wave64_butterfly
  • test_shfl_down_f32/i32_wave64_butterfly
  • test_shfl_sync_f32/i32_wave64 (broadcast)
  • test_shfl_xor_f32_wave64_asymmetric (wrap-around)
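
For readers unfamiliar with the pattern, here is a minimal host-side sketch of what a wave64 XOR-shuffle butterfly reduction computes. It is a pure-Python model, not the lowering added in this PR: the helper names (`bpermute`, `shfl_xor_f32`, `butterfly_sum`) are hypothetical, and it assumes the backward-permute behaviour where each lane reads the 32-bit value held by the lane at `byte_address // 4`, with f32 values bitcast to i32 bit patterns for the exchange.

```python
import struct

WAVE_SIZE = 64  # wave64, as targeted by the PR (gfx942)


def f32_to_bits(x):
    """Bitcast an f32 value to its 32-bit integer pattern (IEEE 754)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b):
    """Bitcast a 32-bit integer pattern back to an f32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def bpermute(byte_addrs, data):
    """Model of a backward permute: lane i reads data[byte_addrs[i] // 4].

    Assumption: the lane index wraps modulo the wave size.
    """
    return [data[(addr // 4) % WAVE_SIZE] for addr in byte_addrs]


def shfl_xor_f32(values, mask):
    """Each lane receives the f32 value held by lane (lane_id ^ mask)."""
    bits = [f32_to_bits(float(v)) for v in values]
    addrs = [(lane ^ mask) * 4 for lane in range(WAVE_SIZE)]
    return [bits_to_f32(b) for b in bpermute(addrs, bits)]


def butterfly_sum(values):
    """All-lanes sum via log2(64) = 6 XOR-shuffle steps."""
    vals = [float(v) for v in values]
    offset = WAVE_SIZE // 2
    while offset >= 1:
        partner = shfl_xor_f32(vals, offset)
        vals = [a + b for a, b in zip(vals, partner)]
        offset //= 2
    return vals


if __name__ == "__main__":
    totals = butterfly_sum(range(WAVE_SIZE))
    print(totals[0], totals[WAVE_SIZE - 1])
```

With lane `i` holding the value `i`, every lane ends up with 2016.0 (the sum of 0 through 63) after the six steps. Using fractional inputs such as `i * 0.5` exercises the bitcast path in the same way the `_fractional` test name suggests.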

Phase 0 of the monolith-warp-butterfly plan did not touch any quadrants
source; this commit only ignores the container bootstrap marker file
(.dev-installed) so the working tree stays clean after the first run of
the dev container.

The wheel produced from this commit with
  QUADRANTS_CMAKE_ARGS="-DQD_WITH_VULKAN:BOOL=ON -DQD_WITH_AMDGPU:BOOL=ON \
                       -DQD_WITH_CUDA:BOOL=ON -DQD_BUILD_TESTS:BOOL=OFF"
is what generated the Genesis-side `baselines/phase0/` numbers. Phase 1a
branches off of here.

Made-with: Cursor
@jamesETsmith
Collaborator

How similar is this to Genesis-Embodied-AI#510?

@diptorupd diptorupd marked this pull request as draft April 22, 2026 16:44
@gpinkert gpinkert mentioned this pull request Apr 23, 2026