Conversation

Signed-off-by: Chenhan Yu <chenhany@nvidia.com>

Force-pushed 44e6e0b to 757c915 (Compare)

📝 Walkthrough

Introduces new Hugging Face checkpoint utilities (…).

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive) · ✅ Passed checks (1 passed)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@modelopt/torch/export/plugins/hf_checkpoint_utils.py`:
- Around line 91-118: The current code only reads the first shard file to
populate multimodal_state_dict using multimodal_keys_in_shard; instead, iterate
the safetensors_index["weight_map"] to find all keys whose weight_map value
exists and whose key startswith ("multi_modal_projector", "vision_model"), group
those keys by their shard filename, then for each shard
(Path(hf_checkpoint_path)/shard_name) open it with safe_open and load only the
mapped keys into multimodal_state_dict (use f.get_tensor(key)); ensure you
handle duplicate shard filenames, preserve progress reporting (tqdm) and fall
back gracefully if a mapped shard file is missing.
In `@modelopt/torch/export/unified_export_megatron.py`:
- Around line 228-245: The save_pretrained_extra_modules function may write
model.safetensors into a non-existent save_directory; ensure the directory
exists before any writes by creating the directory (e.g., call
os.makedirs(save_directory, exist_ok=True) or
Path(save_directory).mkdir(parents=True, exist_ok=True)) at the start of
save_pretrained_extra_modules, and use os.path.join(save_directory,
"model.safetensors") when calling save_file instead of string concatenation;
keep the _hf_extra_config.save_pretrained call and torch.distributed.barrier
as-is.
- Around line 347-348: Guard the call to copy_remote_code so it only runs for
local directories: check that pretrained_model_name_or_path is a local directory
(e.g., Path(pretrained_model_name_or_path).is_dir()) before calling
copy_remote_code(save_directory) in the is_last_stage_main_rank &&
self._hf_config branch, or alternatively wrap copy_remote_code(...) in a
try/except ValueError to skip non-local HF hub model IDs like
"meta-llama/Llama-2-7b"; update the block with this guard around
copy_remote_code to avoid crashes.
- Around line 337-342: The code attempts to merge multimodal components into
layer_state_dicts[0] but _get_state_dict stores layers keyed by
layer.layer_number (1-based), so accessing 0 will KeyError; update the merge to
target layer_state_dicts[1] (or better: compute first_layer_key = 1 and use
that) and guard by ensuring that key exists (create an empty dict if missing)
before calling update; adjust the block around is_first_stage_main_rank and
self.is_multimodal to call
load_multimodal_components(pretrained_model_name_or_path) and merge into
layer_state_dicts[first_layer_key] rather than layer_state_dicts[0].
🧹 Nitpick comments (5)

modelopt/torch/export/plugins/mcore_custom.py (2)

277-328: Per-layer save/index assembly looks correct for the non-multimodal path. The writing loop iterates over dict keys (1-based layer numbers) and the index-assembly loop iterates `range(total_layers)` with a `+1` offset, so filenames match. The TODO on line 311 about replacing the global barrier is noted. One minor concern: if `layer_state_dicts` is ever empty or has gaps (i.e. a layer index is missing), rank 0 will crash with `FileNotFoundError` when reading the per-layer `.json` in lines 319-324. Consider adding a guard or logging if a shard file is missing.

303-309: Prefer the `Path` `/` operator over string concatenation for path construction. Lines 303 and 309 use `save_directory + "/" + meta_filename` and similar patterns. While consistent with the older `save_safetensors` function above, using `Path(save_directory) / meta_filename` is more robust (handles trailing slashes, OS differences). Not blocking, but worth aligning with the `Path` usage already present elsewhere in the codebase.

modelopt/torch/export/plugins/vllm_fakequant_megatron.py (1)

75-86: Dead guard: `pretrained_model_name_or_path is not None` is always `True` now. Since `pretrained_model_name_or_path` is now required (`str | os.PathLike`), the `is not None` check on line 79 is always `True`, making the assert equivalent to `assert not self.is_multimodal`. Consider simplifying for clarity:

Suggested simplification:

```diff
-        assert not (self.is_multimodal and pretrained_model_name_or_path is not None), (
-            "Exporting weights in bf16 and amax values is not supported for multimodal models "
-            "when pretrained_model_name_or_path is not None"
-        )
+        assert not self.is_multimodal, (
+            "Exporting weights in bf16 and amax values is not supported for multimodal models"
+        )
```

modelopt/torch/export/plugins/hf_checkpoint_utils.py (1)

81-89: Use `logging` instead of `print`. Multiple `print()` calls throughout this module (lines 81, 92, 102, 111, 117, 120, 122) should use `logging.info`/`logging.warning` for consistency with library conventions and to allow callers to control verbosity. Also applies to: 92, 102, 111-112, 117, 120, 122.

modelopt/torch/export/unified_export_megatron.py (1)

388-418: Per-layer state dict tracking via shared reference: the last layer accumulates the final norm and output layer. The logic on lines 405-407 assigns `self._state_dict` by reference to `_layer_state_dicts[layer.layer_number]`, then resets `self._state_dict` for all layers except the last. This means subsequent writes (`final_layernorm`, `output_layer` on lines 410-418) mutate the last layer's entry in-place. This is correct but subtle; a comment noting this intentional aliasing would help future readers.
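As a small illustration of the two module-level conventions suggested in the nitpicks above (a `logging` logger instead of `print`, and `pathlib` joins instead of string concatenation); the helper name and filenames here are made up for the sketch:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)  # library convention: callers control verbosity

def meta_path(save_directory: str, meta_filename: str) -> Path:
    # Path's "/" operator handles trailing slashes and OS separators,
    # unlike save_directory + "/" + meta_filename.
    path = Path(save_directory) / meta_filename
    logger.info("metadata path resolved to %s", path)
    return path

print(meta_path("ckpt/", "model.safetensors.index.json").as_posix())
```

Note how the trailing slash in `"ckpt/"` does not produce a doubled separator, which the `+ "/" +` pattern would.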
```python
    elif safetensors_index_file.is_file():
        print(f"Loading multimodal components from sharded model: {hf_checkpoint_path}")
        with open(safetensors_index_file) as f:
            safetensors_index = json.load(f)

        # For multimodal models, vision_model and multi_modal_projector are in the first shard
        all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
        first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"

        # Load multimodal components from the first shard file
        safetensors_filepath = Path(hf_checkpoint_path) / first_shard_file
        print(f"Loading multimodal components from {first_shard_file}")

        with safe_open(safetensors_filepath, framework="pt") as f:
            shard_keys = list(f.keys())
            multimodal_keys_in_shard = [
                k for k in shard_keys if k.startswith(("multi_modal_projector", "vision_model"))
            ]

            if multimodal_keys_in_shard:
                print(
                    f"Found {len(multimodal_keys_in_shard)} multimodal tensors in {first_shard_file}"
                )
                for key in tqdm(multimodal_keys_in_shard, desc="Loading multimodal tensors"):
                    multimodal_state_dict[key] = f.get_tensor(key)
            else:
                print(f"No multimodal components found in {first_shard_file}")
```
Multimodal components may span multiple shards; only loading the first shard is fragile.
The code assumes all multimodal tensors reside in the first shard (lines 96-98), but model.safetensors.index.json already provides the weight_map that maps each key to its exact shard file. If a model's multimodal weights are split across shards, this silently returns an incomplete state dict.
Suggested fix: use the weight_map to load from the correct shards

```diff
 elif safetensors_index_file.is_file():
     print(f"Loading multimodal components from sharded model: {hf_checkpoint_path}")
     with open(safetensors_index_file) as f:
         safetensors_index = json.load(f)
-    # For multimodal models, vision_model and multi_modal_projector are in the first shard
-    all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
-    first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"
-
-    # Load multimodal components from the first shard file
-    safetensors_filepath = Path(hf_checkpoint_path) / first_shard_file
-    print(f"Loading multimodal components from {first_shard_file}")
-
-    with safe_open(safetensors_filepath, framework="pt") as f:
-        shard_keys = list(f.keys())
-        multimodal_keys_in_shard = [
-            k for k in shard_keys if k.startswith(("multi_modal_projector", "vision_model"))
-        ]
-
-        if multimodal_keys_in_shard:
-            print(
-                f"Found {len(multimodal_keys_in_shard)} multimodal tensors in {first_shard_file}"
-            )
-            for key in tqdm(multimodal_keys_in_shard, desc="Loading multimodal tensors"):
-                multimodal_state_dict[key] = f.get_tensor(key)
-        else:
-            print(f"No multimodal components found in {first_shard_file}")
+    # Find shards that contain multimodal keys using the weight_map
+    multimodal_shard_keys = {}
+    for key, shard_file in safetensors_index["weight_map"].items():
+        if key.startswith(("multi_modal_projector", "vision_model")):
+            multimodal_shard_keys.setdefault(shard_file, []).append(key)
+
+    for shard_file, keys in multimodal_shard_keys.items():
+        safetensors_filepath = Path(hf_checkpoint_path) / shard_file
+        print(f"Loading {len(keys)} multimodal tensors from {shard_file}")
+        with safe_open(safetensors_filepath, framework="pt") as f:
+            for key in tqdm(keys, desc=f"Loading from {shard_file}"):
+                multimodal_state_dict[key] = f.get_tensor(key)
+
+    if not multimodal_shard_keys:
+        print("No multimodal components found in weight_map")
```
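The `setdefault`-based grouping in the suggested fix can be exercised on a toy `weight_map`; the tensor and shard names below are hypothetical, standing in for a real model.safetensors.index.json:

```python
# Toy weight_map as found in a sharded model.safetensors.index.json
weight_map = {
    "vision_model.patch_embed.weight": "model-00001-of-00003.safetensors",
    "multi_modal_projector.linear.weight": "model-00002-of-00003.safetensors",
    "language_model.layers.0.weight": "model-00001-of-00003.safetensors",
}

# Group multimodal keys by the shard file that actually holds them
multimodal_shard_keys = {}
for key, shard_file in weight_map.items():
    if key.startswith(("multi_modal_projector", "vision_model")):
        multimodal_shard_keys.setdefault(shard_file, []).append(key)

print(multimodal_shard_keys)
```

Note that the language-model key is skipped and the two multimodal keys land in two different shards, which is exactly the case the first-shard-only code would miss.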
```python
    def save_pretrained_extra_modules(
        self,
        save_directory: str | os.PathLike,
    ):
        """Save a EAGLE or Medusa checkpoints which can be deployed by vLLM and TensorRT-LLM."""
        # We use the last PP rank to write the config because
        # medusa_heads and eagle_module only exist in the last stage.
        pp_rank = get_pipeline_model_parallel_rank()
        pp_size = get_pipeline_model_parallel_world_size()
        is_last_stage_main_rank = pp_rank == pp_size - 1

        state_dict = self.extra_state_dict

        if is_last_stage_main_rank and self._hf_extra_config is not None:
            self._hf_extra_config.save_pretrained(save_directory)
            save_file(state_dict, save_directory + "/model.safetensors", metadata={"format": "pt"})

        torch.distributed.barrier()
```
save_pretrained_extra_modules does not create save_directory.
Unlike save_pretrained (where _hf_config.save_pretrained(save_directory) implicitly creates the directory), save_pretrained_extra_modules writes directly to save_directory + "/model.safetensors" without ensuring the directory exists. This will fail with FileNotFoundError if export_dir doesn't already exist.
Proposed fix:

```diff
 def save_pretrained_extra_modules(
     self,
     save_directory: str | os.PathLike,
 ):
     """Save a EAGLE or Medusa checkpoints which can be deployed by vLLM and TensorRT-LLM."""
+    os.makedirs(save_directory, exist_ok=True)
     # We use the last PP rank to write the config because
```
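The directory-creation fix can be sketched in isolation. `tempfile` and the plain `open` call here are stand-ins for illustration, not the module's actual safetensors `save_file`:

```python
import os
import tempfile

base = tempfile.mkdtemp()
# A nested path that does not exist yet, like a fresh export_dir
save_directory = os.path.join(base, "export", "extra_modules")

# Safe even if the directory already exists; creates parents as needed
os.makedirs(save_directory, exist_ok=True)

# os.path.join instead of save_directory + "/model.safetensors"
out_path = os.path.join(save_directory, "model.safetensors")
with open(out_path, "wb") as f:  # stand-in for save_file(state_dict, out_path, ...)
    f.write(b"")

print(os.path.isfile(out_path))  # True
```

Without the `os.makedirs` call, the `open` (or `save_file`) would raise `FileNotFoundError` because the parent directory is missing.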
```python
        # Add multimodal components to state_dict. Since only support decoder model quantization,
        # no changes will be made to the multimodal components. We copy the multimodal components
        # from the pretrained model directly to the state_dict to avoid implementing the export logic.
        if is_first_stage_main_rank and self.is_multimodal:
            multimodal_state_dict = load_multimodal_components(pretrained_model_name_or_path)
            layer_state_dicts[0].update(multimodal_state_dict)
```
🧩 Analysis chain

🏁 Script executed:

```shell
# Check the file structure and read the relevant sections
cat -n modelopt/torch/export/unified_export_megatron.py | sed -n '375n;375,420p'
```

Repository: NVIDIA/Model-Optimizer · Length of output: 2165

🏁 Script executed:

```shell
# Search for layer_number assignments to understand the numbering scheme
rg 'layer\.layer_number|layer_number\s*=' --context=3
```

Repository: NVIDIA/Model-Optimizer · Length of output: 9299

🏁 Script executed:

```shell
# Check if there's any initialization of layer_state_dicts with key 0
rg 'layer_state_dicts\[0\]' modelopt/torch/export/unified_export_megatron.py
```

Repository: NVIDIA/Model-Optimizer · Length of output: 128

🏁 Script executed:

```shell
# Look at the property definition for layer_state_dicts
rg -A5 'def layer_state_dicts' modelopt/torch/export/unified_export_megatron.py
```

Repository: NVIDIA/Model-Optimizer · Length of output: 233
layer_state_dicts[0] will raise KeyError — layer indices are 1-based.
The _get_state_dict method stores layer state dicts using layer.layer_number as keys (line 405), which is 1-based and ranges from 1 to num_layers. Accessing key 0 on line 342 will crash at runtime for any multimodal model.
Proposed fix: merge into the first layer's dict (key 1)

```diff
     if is_first_stage_main_rank and self.is_multimodal:
         multimodal_state_dict = load_multimodal_components(pretrained_model_name_or_path)
-        layer_state_dicts[0].update(multimodal_state_dict)
+        layer_state_dicts[1].update(multimodal_state_dict)
```
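The 1-based keying described above, and the guarded merge the prompt suggests, can be demonstrated with plain dicts (the layer and tensor names here are made up):

```python
# Simulate _get_state_dict keying layers by layer.layer_number (1-based)
num_layers = 3
layer_state_dicts = {
    n: {f"decoder.layers.{n - 1}.weight": n} for n in range(1, num_layers + 1)
}

multimodal_state_dict = {"vision_model.embed.weight": 0}

first_layer_key = 1
# setdefault guards against the key being absent on this rank
layer_state_dicts.setdefault(first_layer_key, {}).update(multimodal_state_dict)

print(0 in layer_state_dicts)                               # False: keys start at 1
print("vision_model.embed.weight" in layer_state_dicts[1])  # True after the merge
```

Accessing `layer_state_dicts[0]` here would raise `KeyError`, mirroring the crash flagged in the review.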
```python
        if is_last_stage_main_rank and self._hf_config is not None:
            copy_remote_code(pretrained_model_name_or_path, save_directory)
```
copy_remote_code will crash when pretrained_model_name_or_path is a HuggingFace hub model ID.
copy_remote_code raises ValueError if the path is not a directory, but pretrained_model_name_or_path can be a HF hub model ID (e.g., "meta-llama/Llama-2-7b"). Guard with an is_dir() check or handle the non-local-path case:
Proposed fix:

```diff
     if is_last_stage_main_rank and self._hf_config is not None:
-        copy_remote_code(pretrained_model_name_or_path, save_directory)
+        if Path(pretrained_model_name_or_path).is_dir():
+            copy_remote_code(pretrained_model_name_or_path, save_directory)
```
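A minimal sketch of the proposed `is_dir()` guard; the helper name below is hypothetical, and the check is only a heuristic (a hub ID resolves to `False` simply because no such local directory exists):

```python
from pathlib import Path

def looks_like_local_checkpoint(name_or_path: str) -> bool:
    """Local checkpoints are directories on disk; hub IDs such as
    "meta-llama/Llama-2-7b" normally are not."""
    return Path(name_or_path).is_dir()

print(looks_like_local_checkpoint("."))                      # an existing directory
print(looks_like_local_checkpoint("meta-llama/Llama-2-7b"))  # not a local path
```

Gating `copy_remote_code` on this check skips hub model IDs instead of crashing with `ValueError`.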
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #881   +/-   ##
=======================================
  Coverage   73.73%   73.73%
=======================================
  Files         199      199
  Lines       21165    21165
=======================================
  Hits        15606    15606
  Misses       5559     5559
```

☔ View full report in Codecov by Sentry.
```python
        all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
        first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"

        # Load multimodal components from the first shard file
```
qq, is the multimodal part always in the first shard? Could it span the first multiple shards?
Good question. We will leave this to @yueshen2016 to improve.
```diff
         json.dump(config_dict, f, indent=4)

-    save_safetensors(state_dict, save_directory)
+    # save_safetensors(state_dict, save_directory)
```
In case you want to remove this.
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Signed-off-by: Hung-Yueh <hungyueh.chiang@gmail.com>
## What does this PR do?

**Type of change:** Bug fix

**Overview:**

1. Fixing megatron ignore module has an additional `.` in the suffix
2. Change megatron export to save per layer as a safetensor (avoid ghost safetensors)

## Usage

```python
# Add a code snippet demonstrating how to use this
```

## Testing

## Before your PR is "Ready for review"

- Make sure you read and follow Contributor guidelines and your commits are signed.
- Is this change backward compatible?: Yes/No
- Did you write any new necessary tests?: Yes/No
- Did you add or update any necessary documentation?: Yes/No
- Did you update Changelog?: Yes/No

## Additional Information

## Summary by CodeRabbit

### Release Notes

* **New Features**
  * Export workflow now supports additional model components (EAGLE/Medusa modules)
  * Per-layer model state organization for improved checkpoint management
* **Bug Fixes**
  * More robust Hugging Face configuration, tokenizer, and image processor preservation
  * Enhanced multimodal component extraction and loading
* **Refactor**
  * Optimized model export process with improved per-layer safetensors handling