@PJAvinash PJAvinash commented Dec 8, 2025

Details

Navi4 performance tuning

  • Single and Multinode

Work item: "Internal"

What were the changes?

  • Minor improvements to tuning-table isolation between architectures
  • Updated the LL/Simple protocol selection ranges for single-node runs
  • Enabled the LL protocol for gfx1201
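
As a rough illustration of the changes listed above, here is a minimal sketch of a per-architecture tuning table that picks LL or Simple by message size. It is not the RCCL implementation: `Protocol`, `TuningEntry`, `tuningTable`, `selectProtocol`, the `gfx-other` key, and every threshold value are hypothetical names and placeholder numbers chosen for the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical protocol identifiers; not RCCL's actual enum.
enum class Protocol { LL, Simple };

// Hypothetical per-architecture tuning entry.
struct TuningEntry {
    bool llEnabled;          // whether LL may be selected on this architecture
    std::size_t llMaxBytes;  // messages up to this size use LL (placeholder)
};

// One entry per architecture, so retuning one architecture cannot change
// the behaviour of another. Thresholds are illustrative, not measured.
inline const std::unordered_map<std::string, TuningEntry>& tuningTable() {
    static const std::unordered_map<std::string, TuningEntry> table = {
        {"gfx1201", TuningEntry{true, 64 * 1024}},
        {"gfx-other", TuningEntry{false, 0}},  // hypothetical arch without LL
    };
    return table;
}

// Returns LL for small messages on architectures where it is enabled,
// and falls back to Simple otherwise (including unknown architectures).
inline Protocol selectProtocol(const std::string& arch, std::size_t bytes) {
    const auto& table = tuningTable();
    const auto it = table.find(arch);
    if (it == table.end()) {
        return Protocol::Simple;
    }
    const TuningEntry& entry = it->second;
    if (entry.llEnabled && bytes <= entry.llMaxBytes) {
        return Protocol::LL;
    }
    return Protocol::Simple;
}
```

Keying the table by architecture string keeps each architecture's ranges independent, which is the isolation property the first bullet refers to. A real table would likely also distinguish collective type and node count.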

Why were the changes made?

  • To tune protocol selection for single-node runs, because the default choice is not optimal

How was the outcome achieved?
Experimentation on single-node and two-node systems with 8 GPUs per node

Additional Documentation:

  • This PR applies only if the HIP/HSA uncached memory allocation changes are available
  • The existing tuning table for gfx1201 may need further tuning once clusters with more nodes are available

Approval Checklist

@PJAvinash PJAvinash self-assigned this Dec 8, 2025
@PJAvinash PJAvinash added the ci:extended, ci:regression-detection (Run through all collectives and data types to identify any performance issues), and ci:code-coverage labels Dec 8, 2025
@PJAvinash PJAvinash changed the title from "Navi4 ll tuning" to "Navi4 LL enablement and tuning" Dec 8, 2025
@ROCmMathLibrariesBot

regression-detection run on commit 7fa02b7

Artifacts - Results

Contributor

@corey-derochie-amd corey-derochie-amd left a comment

lgtm
@wenkaidu @alex-breslow-amd do you think we need the extra INFO logging? I guess more communication is often better, but just want your opinions.

@ROCmMathLibrariesBot

regression-detection run on commit f5c9723

Artifacts - Results

@PJAvinash PJAvinash merged commit 9545ae0 into ROCm:develop Jan 5, 2026
3 of 8 checks passed