[ROCm-Profiler] Add lds stall preset #211

Open
diptorupd wants to merge 2 commits into ROCm:amd-integration from diptorupd:feature/lds-stall-preset-in-profiler

@diptorupd
Collaborator

Adds one more preset to the rocm_profiler to capture LDS stall cycles.

Copilot AI left a comment

Pull request overview

Adds a new ROCm profiler counter preset aimed at diagnosing LDS/VMEM stall behavior, and introduces automatic derived stall-metric reporting when the required PMCs are present in the collected counter CSVs.

Changes:

  • Added a new lds_stall 3-pass rocprofv3 preset to capture LDS/VMEM stall + latency counters under gfx942 constraints.
  • Added automatic post-processing to print an LDS/VMEM stall analysis table when stall counters are detected.
  • Updated the FA2 prefill ROCm benchmark documentation to include the new preset and the derived metrics it prints.
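
The detect-then-print behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the counter names `LDS_STALL_CYCLES` and `BUSY_CYCLES`, the helper names `sum_counters` and `stall_table`, and the `Counter_Name`/`Counter_Value` column layout are all assumptions made for the example. The actual PMC list and derived metrics live in `rocm_profiler/rocm_profiler.py`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical counter names, for illustration only; the PR's real PMC list may differ.
REQUIRED = ("LDS_STALL_CYCLES", "BUSY_CYCLES")


def sum_counters(csv_text):
    """Sum Counter_Value per Counter_Name (column names are assumed)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Counter_Name"]] += float(row["Counter_Value"])
    return dict(totals)


def stall_table(totals):
    """Return (metric, value) rows if every required counter is present, else None."""
    if not all(name in totals for name in REQUIRED):
        return None  # stall counters were not collected; print nothing
    busy = totals["BUSY_CYCLES"]
    stall = totals["LDS_STALL_CYCLES"]
    pct = 100.0 * stall / busy if busy else 0.0
    return [("LDS stall cycles", stall), ("Busy cycles", busy), ("LDS stall %", pct)]


# Tiny in-memory stand-in for a collected counter CSV (two dispatches of one kernel).
sample = """Kernel_Name,Counter_Name,Counter_Value
k,LDS_STALL_CYCLES,250
k,BUSY_CYCLES,1000
k,LDS_STALL_CYCLES,150
k,BUSY_CYCLES,1000
"""

table = stall_table(sum_counters(sample))
if table is not None:
    for name, value in table:
        print(f"{name:<20}{value:>12.1f}")
```

Returning `None` when a required counter is missing keeps the table opt-in: runs that use other presets would produce no extra output.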

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| rocm_profiler/rocm_profiler.py | Adds lds_stall preset and implements stall table detection + printing after profiling. |
| benchmarks/rocm_benchmarks/bench_fa2_prefill.py | Documents the new lds_stall preset and its derived metrics in the usage header. |

Comment thread rocm_profiler/rocm_profiler.py Outdated
Comment thread rocm_profiler/rocm_profiler.py Outdated
@diptorupd requested a review from rtmadduri on April 1, 2026, 18:21