Skip to content

feat: Update mbridge with cache support#1187

Merged
terrykong merged 1 commit intomainfrom
zhiyul/cache_in_mbridge
Sep 26, 2025
Merged

feat: Update mbridge with cache support#1187
terrykong merged 1 commit intomainfrom
zhiyul/cache_in_mbridge

Conversation

@ZhiyuLi-Nvidia
Copy link
Copy Markdown
Contributor

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia commented Sep 22, 2025

What does this PR do ?

This is to add cache support in mbridge submodule, which mitigates the dsv3 refitting regression:

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Chores
    • Updated a third-party submodule to the latest revision to keep dependencies current.
    • No changes to app functionality, UI, or public APIs.
    • No configuration or migration required for end-users.
    • This update is intended to maintain alignment with upstream without altering behavior.
    • Expect no impact on performance, features, or compatibility.

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia requested a review from a team as a code owner September 22, 2025 20:53
@github-actions
Copy link
Copy Markdown

✅ Submodule Fast-Forward Check Results

Check based on commit: 0f35de4 (PR #1187 from zhiyul/cache_in_mbridge)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@ZhiyuLi-Nvidia
Copy link
Copy Markdown
Contributor Author

@terrykong @guyueh1 could you take a review?

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 22, 2025

📝 Walkthrough

Walkthrough

Updated the Git submodule reference for 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge to a new commit SHA. No code or API changes in this repository.

Changes

Cohort / File(s) Summary
Submodule pointer update
3rdparty/Megatron-Bridge-workspace/Megatron-Bridge
Advance submodule from abd52c8 to 85a37ff; no interface or public API changes noted.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Test Results For Major Changes ❓ Inconclusive Based on the PR summary, this is a submodule bump that introduces cache support in Megatron-Bridge with claimed performance improvements and links to Weights & Biases runs; however, there are no explicit test results, numerics/convergence checks, or detailed before/after metrics with configuration captured in the provided description beyond high-level timing claims. Since this change likely affects performance and possibly numerics via caching behavior, the PR should document concrete benchmarks (setup, model/config, hardware, dataset, and exact metrics) and any convergence validation. Given the absence of such testing details in the provided context, the requirement to include test results for major/performance-impacting changes is not met. Please update the PR description to include: (1) links or inline summaries of benchmark runs with exact configs (model, batch size, sequence length, hardware, software versions), (2) before/after timing or throughput tables for the affected paths (e.g., dsv3 refit and moonshort 16B) and (3) a brief numerics/convergence check showing no regression. If such details already exist in the PR body or linked Megatron-Bridge PR, surface them explicitly here to satisfy the check.
✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat: Update mbridge with cache support" is concise, follows commit-style conventions, and accurately summarizes the primary change (updating the mbridge/Megatron-Bridge submodule to add cache support) described in the PR summary and raw changes. It is specific and clear enough for a teammate scanning history and does not include noise or ambiguous wording.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch zhiyul/cache_in_mbridge

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (1)

1-1: Document/cache toggle and rollback plan.

Cache support in the submodule can change behavior/perf. Please add a brief CHANGELOG/release note and mention any flags/env vars to disable it for rollback if needed; link the W&B runs for reproducibility.

I can draft the release-note snippet if helpful.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42aa41b and 0f35de4.

📒 Files selected for processing (1)
  • 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (2)
3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (2)

1-1: Submodule pin verified — no action required. 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge is pinned to 85a37ffdf02edc07c0a7ac97cb9abcafcd0ac0ed (merge commit of PR #621) and that commit is the head of branch yifu/nemo-rl-use-chunkpatch-ds; .gitmodules has no branch configured, so the SHA pin is explicit and reproducible.


1-1: Ensure CI fetches submodules recursively (actions/checkout: submodules: true/recursive, fetch-depth: 0)
No .github/workflows/* files were found during verification — confirm your CI (GitHub Actions or other) is configured to fetch submodules (actions/checkout with submodules: true/recursive and fetch-depth: 0) or update CI accordingly.

@github-actions
Copy link
Copy Markdown

✅ Submodule Fast-Forward Check Results

Check based on commit: 583a283 (PR #1187 from zhiyul/cache_in_mbridge)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@ZhiyuLi-Nvidia
Copy link
Copy Markdown
Contributor Author

@yaoyu-33 could you take a review as well?

@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Sep 25, 2025
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
@github-actions
Copy link
Copy Markdown

✅ Submodule Fast-Forward Check Results

Check based on commit: a822dcd (PR #1187 from zhiyul/cache_in_mbridge)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Sep 26, 2025
@terrykong terrykong enabled auto-merge (squash) September 26, 2025 04:35
@terrykong terrykong merged commit f521459 into main Sep 26, 2025
55 of 58 checks passed
@terrykong terrykong deleted the zhiyul/cache_in_mbridge branch September 26, 2025 09:53
terrykong pushed a commit that referenced this pull request Sep 26, 2025
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests r0.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants