chore(docs): Update minimum UFM version and congestion config guidance #1463

Open
hasayesh wants to merge 5 commits into NVIDIA:main from hasayesh:update_ib_runbook

@hasayesh
Contributor

@hasayesh hasayesh commented May 7, 2026

…B runbook (#1459)

Updates docs/playbooks/ib_runbook.md to set the minimum supported UFM release to 6.23.1-6, update the example ufm_release_version, note that InfiniBand switches must be updated accordingly, and link the NVIDIA UFM Enterprise REST API Guide v6.23.1.
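A deployment script could enforce the new minimum release before proceeding. The following is a minimal sketch, not part of the runbook. It assumes release strings always look like `6.23.1-6` (three dot-separated numbers plus an optional `-<build>` suffix); the function names `parse_ufm_release` and `meets_minimum` are hypothetical.

```python
import re

# Minimum supported release, as stated in this PR description.
MIN_UFM_RELEASE = "6.23.1-6"

# Assumed shape: MAJOR.MINOR.PATCH with an optional "-BUILD" suffix.
_RELEASE_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?")


def parse_ufm_release(version: str) -> tuple[int, int, int, int]:
    """Turn a string such as '6.23.1-6' into (6, 23, 1, 6).

    A missing build suffix is treated as build 0 (an assumption).
    """
    match = _RELEASE_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized UFM release string: {version!r}")
    major, minor, patch, build = match.groups()
    return (int(major), int(minor), int(patch), int(build or 0))


def meets_minimum(version: str, minimum: str = MIN_UFM_RELEASE) -> bool:
    """Compare numerically, so that 6.9.x sorts below 6.23.x."""
    return parse_ufm_release(version) >= parse_ufm_release(minimum)
```

Comparing parsed integer tuples rather than raw strings avoids the lexicographic trap where `"6.9.0"` would sort above `"6.23.1"`.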

Description

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@hasayesh hasayesh requested a review from Coco-Ben as a code owner May 7, 2026 00:39
@copy-pr-bot

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hasayesh hasayesh requested a review from ajf May 7, 2026 00:40
@ajf
Collaborator

ajf commented May 7, 2026

@hasayesh this is missing a commit signature. Please add one and force-push; then we can merge.

…B runbook (NVIDIA#1459)

<!-- Describe what this PR does -->

Adds initial OpenSM parameters (`max_op_vls`, `ar_tree_asymmetric_flow`) to `docs/playbooks/ib_runbook.md` under UFM static configuration, using `$UFM_HOME/ufm/files/conf/opensm/opensm.conf`. Includes wording for matching existing deployments to this guideline and restarting OpenSM/UFM after edits.

<!-- Check one that best describes this PR -->
- [ ] **Add** - New feature or capability
- [ ] **Change** - Changes in existing functionality
- [ ] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [x] **Internal** - Internal changes (refactoring, tests, docs, etc.)

<!-- If applicable, provide GitHub Issue. -->

- [ ] This PR contains breaking changes

<!-- If checked above, describe the breaking changes and migration steps
-->

<!-- How was this tested? Check all that apply -->
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed
- [x] No testing required (docs, internal refactor, etc.)

<!-- Any additional context, deployment notes, or reviewer guidance -->

---------

Signed-off-by: Hamid Asayesh <hasayesh@nvidia.com>
@hasayesh hasayesh force-pushed the update_ib_runbook branch from 9be6599 to fde407d Compare May 7, 2026 21:55
@hasayesh
Contributor Author

hasayesh commented May 7, 2026

@ajf I force-pushed a signed tip, and the Commits tab now shows Verified on the relevant commits. Is that OK now? (The build failure is unrelated to this change.)

<!-- Describe what this PR does -->

Updates docs/playbooks/ib_runbook.md to set the minimum supported UFM release to `6.23.1-6`, update the example `ufm_release_version`, note that InfiniBand switches must be updated accordingly, and link the NVIDIA UFM Enterprise REST API Guide v6.23.1.

<!-- Check one that best describes this PR -->
- [ ] **Add** - New feature or capability
- [ ] **Change** - Changes in existing functionality
- [ ] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [x] **Internal** - Internal changes (refactoring, tests, docs, etc.)

<!-- If applicable, provide GitHub Issue. -->

- [ ] This PR contains breaking changes

<!-- If checked above, describe the breaking changes and migration steps
-->

<!-- How was this tested? Check all that apply -->
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed
- [x] No testing required (docs, internal refactor, etc.)

<!-- Any additional context, deployment notes, or reviewer guidance -->

---------

Signed-off-by: Hamid Asayesh <hasayesh@nvidia.com>
@github-actions

github-actions Bot commented May 8, 2026

@ajf ajf changed the title from "chore(docs): Document initial OpenSM settings for UFM congestion in I…" to "chore(docs): Update minimum UFM version and congestion config guidance" May 8, 2026
@ajf ajf enabled auto-merge (squash) May 8, 2026 17:56
3 participants