
Conversation

@DDEle (Contributor) commented Oct 13, 2025

Motivation

Optimize the HDim=48 cases of the CK FMHA backward pass.

Technical Details

Update to ROCm/composable_kernel@95bdc74
Update to ROCm/composable_kernel@2d1c9e2
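
For reference, a rough sketch of how such a submodule bump can be reproduced locally is shown below. The 3rdparty/composable_kernel path is an assumption about where the submodule is checked out in this repository; adjust as needed.

# Hypothetical sketch of the CK submodule bump (submodule path is an assumption).
git submodule update --init 3rdparty/composable_kernel
cd 3rdparty/composable_kernel
git fetch origin && git checkout 2d1c9e2   # pin to the CK commit referenced above
cd ../..
git add 3rdparty/composable_kernel
git commit -m "Bump composable_kernel for FMHA BWD HDim=48 optimizations"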

Test Plan

MAX_JOBS=$(nproc) pytest op_tests/test_mha.py -v
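
When iterating on only the HDim=48 paths, a narrowed variant of the command above may help; the -k filter expression is an assumption about how op_tests/test_mha.py parametrizes its test IDs.

# Hedged variant: restrict the run to head-dim-48 cases. The "-k 48" filter
# assumes the head dimension appears in the generated test IDs.
MAX_JOBS=$(nproc) pytest op_tests/test_mha.py -v -k "48"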

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings October 13, 2025 07:16

Copilot AI left a comment


Pull Request Overview

This PR updates the composable_kernel submodule to pull in FMHA (Fused Multi-Head Attention) backward-pass optimizations targeting D48 (head dimension 48) configurations on GFX950 hardware.

  • Updates the composable_kernel submodule commit to include the FMHA BWD optimizations
  • Targets the D48 head-dimension size on the GFX950 GPU architecture
  • Focuses on backward-pass performance improvements for attention mechanisms


@DDEle requested a review from slippedJim October 13, 2025 09:18
@DDEle (Contributor, Author) commented Oct 16, 2025

It seems that test_gemm_a8w8_blockscale_mi350 fails (coredumps) with high probability with this CK update, while there is only a small probability of a coredump on the current aiter main branch (with the currently linked CK version).

Another pattern of this failure is that the problem appears only on the first run; test_gemm_a8w8_blockscale_mi350 runs smoothly in subsequent runs (where the JIT cache already exists).
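
A rough way to reproduce the first-run-only pattern is sketched below; both the cache path and the test file name (op_tests/test_gemm_a8w8_blockscale.py) are assumptions about this repository's layout, not confirmed specifics.

# Hedged repro sketch: the coredump reportedly occurs only on a cold JIT cache.
# Both the cache directory and the test file name below are assumptions.
rm -rf /path/to/aiter/jit/cache                     # cold start (hypothetical cache path)
pytest op_tests/test_gemm_a8w8_blockscale.py -v     # first run: coredump likely
pytest op_tests/test_gemm_a8w8_blockscale.py -v     # warm cache: expected to pass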

@valarLip (Collaborator) commented:

Test failed: op_tests/test_mha.py ?

@valarLip self-assigned this Oct 17, 2025
@valarLip merged commit 26aaefc into main Oct 21, 2025 (16 of 19 checks passed)
@valarLip deleted the ck-fmha-bwd-d48 branch October 21, 2025 07:20