
Chenhany/megatron export per layer#881

Merged
ChenhanYu merged 1 commit into main from chenhany/megatron_export_per_layer
Feb 12, 2026

Conversation

@ChenhanYu ChenhanYu commented Feb 12, 2026

What does this PR do?

Type of change: Bug fix

Overview:

  1. Fix the Megatron ignore module names having an extra `.` in the suffix
  2. Change the Megatron export to save each layer as its own safetensors file (avoiding ghost safetensors)
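As a rough illustration of item 2, the index-assembly step can be mocked with the standard library alone. All file and function names below are hypothetical, not the actual ModelOpt API: each layer writes its own shard plus a small metadata JSON, and a final step merges those into one model.safetensors.index.json.

```python
import json
import tempfile
from pathlib import Path

def write_layer_shard_metadata(save_dir, layer_idx, tensor_names):
    """Record which tensors live in this layer's shard (hypothetical helper)."""
    shard_name = f"model-{layer_idx:05d}.safetensors"
    meta = dict.fromkeys(tensor_names, shard_name)
    (Path(save_dir) / f"model-{layer_idx:05d}.json").write_text(json.dumps(meta))

def build_index(save_dir, total_layers):
    """Merge per-layer metadata into one model.safetensors.index.json (the rank-0 step)."""
    weight_map = {}
    for i in range(1, total_layers + 1):  # layer numbers are 1-based
        meta = json.loads((Path(save_dir) / f"model-{i:05d}.json").read_text())
        weight_map.update(meta)
    index = {"metadata": {}, "weight_map": weight_map}
    (Path(save_dir) / "model.safetensors.index.json").write_text(json.dumps(index, indent=2))
    return index

with tempfile.TemporaryDirectory() as d:
    write_layer_shard_metadata(d, 1, ["decoder.layers.0.mlp.weight"])
    write_layer_shard_metadata(d, 2, ["decoder.layers.1.mlp.weight"])
    index = build_index(d, total_layers=2)
    print(sorted(index["weight_map"].values()))
    # → ['model-00001.safetensors', 'model-00002.safetensors']
```

Because every layer becomes its own shard, a crashed or partially written export leaves no half-filled monolithic file behind, which is the "ghost safetensors" problem the PR mentions.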

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features

    • Export workflow now supports additional model components (EAGLE/Medusa modules)
    • Per-layer model state organization for improved checkpoint management
  • Bug Fixes

    • More robust Hugging Face configuration, tokenizer, and image processor preservation
    • Enhanced multimodal component extraction and loading
  • Refactor

    • Optimized model export process with improved per-layer safetensors handling

@ChenhanYu ChenhanYu requested a review from a team as a code owner February 12, 2026 16:31
@ChenhanYu ChenhanYu requested review from Edwardf0t1, jenchen13, meenchen and yueshen2016 and removed request for Edwardf0t1 February 12, 2026 16:31
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
@ChenhanYu ChenhanYu force-pushed the chenhany/megatron_export_per_layer branch from 44e6e0b to 757c915 Compare February 12, 2026 17:17

coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

Introduces new Hugging Face checkpoint utilities (copy_remote_code and load_multimodal_components), adds per-layer safetensors saving functionality, refactors the export flow to track layer state dicts separately, and makes pretrained_model_name_or_path a required parameter across export functions. The unified exporter now supports saving extra modules (EAGLE/Medusa) and streamlines multimodal component loading.

Changes

  • HF Checkpoint Utilities (modelopt/torch/export/plugins/hf_checkpoint_utils.py): New module with two functions: copy_remote_code validates and copies .py files from a directory, and load_multimodal_components loads multimodal tensors from single or sharded safetensors files with progress tracking and error handling.
  • Per-Layer Safetensors Saving (modelopt/torch/export/plugins/mcore_custom.py): New function save_safetensors_by_layer_index saves per-layer state dicts into separate safetensors shards, creates per-layer metadata, and aggregates it into a single model.safetensors.index.json after a distributed barrier.
  • vLLM Megatron Exporter (modelopt/torch/export/plugins/vllm_fakequant_megatron.py): pretrained_model_name_or_path is now a required parameter (Optional and the default None removed) in the save_pretrained and export_mcore_gpt_to_hf_vllm_fq signatures.
  • Unified Export Refactoring (modelopt/torch/export/unified_export_megatron.py): Adds per-layer state dict tracking via _layer_state_dicts and a layer_state_dicts property; introduces save_pretrained_extra_modules for EAGLE/Medusa modules; reworks multimodal loading to use the new utility; switches to per-layer safetensors saving; makes pretrained_model_name_or_path required; streamlines the HF config/tokenizer/processor saving blocks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 72.22%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title 'Chenhany/megatron export per layer' is vague and does not clearly convey the main purpose of the changes, which involve refactoring safetensors saving logic, introducing multimodal component utilities, and restructuring the export flow. Consider a more descriptive title like 'Refactor Megatron export to use per-layer safetensors saving with multimodal support'.
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@modelopt/torch/export/plugins/hf_checkpoint_utils.py`:
- Around line 91-118: The current code only reads the first shard file to
populate multimodal_state_dict using multimodal_keys_in_shard; instead, iterate
the safetensors_index["weight_map"] to find all keys whose weight_map value
exists and whose key startswith ("multi_modal_projector", "vision_model"), group
those keys by their shard filename, then for each shard
(Path(hf_checkpoint_path)/shard_name) open it with safe_open and load only the
mapped keys into multimodal_state_dict (use f.get_tensor(key)); ensure you
handle duplicate shard filenames, preserve progress reporting (tqdm) and fall
back gracefully if a mapped shard file is missing.

In `@modelopt/torch/export/unified_export_megatron.py`:
- Around line 228-245: The save_pretrained_extra_modules function may write
model.safetensors into a non-existent save_directory; ensure the directory
exists before any writes by creating the directory (e.g., call
os.makedirs(save_directory, exist_ok=True) or
Path(save_directory).mkdir(parents=True, exist_ok=True)) at the start of
save_pretrained_extra_modules, and use os.path.join(save_directory,
"model.safetensors") when calling save_file instead of string concatenation;
keep the _hf_extra_config.save_pretrained call and torch.distributed.barrier
as-is.
- Around line 347-348: Guard the call to copy_remote_code so it only runs for
local directories: check that pretrained_model_name_or_path is a local directory
(e.g., Path(pretrained_model_name_or_path).is_dir()) before calling
copy_remote_code(save_directory) in the is_last_stage_main_rank &&
self._hf_config branch, or alternatively wrap copy_remote_code(...) in a
try/except ValueError to skip non-local HF hub model IDs like
"meta-llama/Llama-2-7b"; update the block with this guard around
copy_remote_code to avoid crashes.
- Around line 337-342: The code attempts to merge multimodal components into
layer_state_dicts[0] but _get_state_dict stores layers keyed by
layer.layer_number (1-based), so accessing 0 will KeyError; update the merge to
target layer_state_dicts[1] (or better: compute first_layer_key = 1 and use
that) and guard by ensuring that key exists (create an empty dict if missing)
before calling update; adjust the block around is_first_stage_main_rank and
self.is_multimodal to call
load_multimodal_components(pretrained_model_name_or_path) and merge into
layer_state_dicts[first_layer_key] rather than layer_state_dicts[0].
🧹 Nitpick comments (5)
modelopt/torch/export/plugins/mcore_custom.py (2)

277-328: Per-layer save/index assembly looks correct for the non-multimodal path.

The writing loop iterates over dict keys (1-based layer numbers) and the index-assembly loop iterates range(total_layers) with +1 offset, so filenames match. The TODO on line 311 about replacing the global barrier is noted.

One minor concern: if layer_state_dicts is ever empty or has gaps (i.e. a layer index is missing), rank 0 will crash with FileNotFoundError when reading the per-layer .json in lines 319-324. Consider adding a guard or logging if a shard file is missing.


303-309: Prefer Path / operator over string concatenation for path construction.

Lines 303 and 309 use save_directory + "/" + meta_filename and similar patterns. While consistent with the older save_safetensors function above, using Path(save_directory) / meta_filename is more robust (handles trailing slashes, OS differences). Not blocking, but worth aligning with the Path usage already present elsewhere in the codebase.
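A quick standalone check of the Path behavior mentioned here (the paths are made up):

```python
from pathlib import Path

# The "/" operator absorbs a trailing slash that naive "+" concatenation would keep:
meta_path = Path("save_dir/") / "model-00001.json"
assert meta_path == Path("save_dir") / "model-00001.json"
print(meta_path.name)  # → model-00001.json
```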

modelopt/torch/export/plugins/vllm_fakequant_megatron.py (1)

75-86: Dead guard: pretrained_model_name_or_path is not None is always True now.

Since pretrained_model_name_or_path is now required (str | os.PathLike), the is not None check on line 79 is always True, making the assert equivalent to assert not self.is_multimodal. Consider simplifying for clarity:

Suggested simplification
-        assert not (self.is_multimodal and pretrained_model_name_or_path is not None), (
-            "Exporting weights in bf16 and amax values is not supported for multimodal models "
-            "when pretrained_model_name_or_path is not None"
-        )
+        assert not self.is_multimodal, (
+            "Exporting weights in bf16 and amax values is not supported for multimodal models"
+        )
modelopt/torch/export/plugins/hf_checkpoint_utils.py (1)

81-89: Use logging instead of print for status and warning messages.

Multiple print() calls throughout this module (lines 81, 92, 102, 111, 117, 120, 122) should use logging.info / logging.warning for consistency with library conventions and to allow callers to control verbosity.

Also applies to: 92-92, 102-102, 111-112, 117-117, 120-120, 122-122

modelopt/torch/export/unified_export_megatron.py (1)

388-418: Per-layer state dict tracking via shared reference — last layer accumulates final norm and output layer.

The logic on lines 405-407 assigns self._state_dict by reference to _layer_state_dicts[layer.layer_number], then resets self._state_dict for all layers except the last. This means subsequent writes (final_layernorm, output_layer on lines 410-418) mutate the last layer's entry in-place. This is correct but subtle — a comment noting this intentional aliasing would help future readers.
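The intentional aliasing can be shown with plain dicts; this is a standalone sketch whose names only mirror the exporter's attributes:

```python
layer_state_dicts = {}
state_dict = {}

# For the last layer, the exporter keeps a shared reference rather than a copy...
layer_state_dicts[3] = state_dict

# ...so later writes (final_layernorm, output_layer) mutate that entry in place.
state_dict["final_layernorm.weight"] = "tensor"
print(layer_state_dicts[3]["final_layernorm.weight"])  # → tensor
```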

Comment on lines +91 to +118
```python
elif safetensors_index_file.is_file():
    print(f"Loading multimodal components from sharded model: {hf_checkpoint_path}")
    with open(safetensors_index_file) as f:
        safetensors_index = json.load(f)

    # For multimodal models, vision_model and multi_modal_projector are in the first shard
    all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
    first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"

    # Load multimodal components from the first shard file
    safetensors_filepath = Path(hf_checkpoint_path) / first_shard_file
    print(f"Loading multimodal components from {first_shard_file}")

    with safe_open(safetensors_filepath, framework="pt") as f:
        shard_keys = list(f.keys())
        multimodal_keys_in_shard = [
            k for k in shard_keys if k.startswith(("multi_modal_projector", "vision_model"))
        ]

        if multimodal_keys_in_shard:
            print(
                f"Found {len(multimodal_keys_in_shard)} multimodal tensors in {first_shard_file}"
            )
            for key in tqdm(multimodal_keys_in_shard, desc="Loading multimodal tensors"):
                multimodal_state_dict[key] = f.get_tensor(key)
        else:
            print(f"No multimodal components found in {first_shard_file}")
```


⚠️ Potential issue | 🟠 Major

Multimodal components may span multiple shards — only loading the first shard is fragile.

The code assumes all multimodal tensors reside in the first shard (line 96-98), but model.safetensors.index.json already provides the weight_map that maps each key to its exact shard file. If a model's multimodal weights are split across shards, this silently returns an incomplete state dict.

Suggested fix: use the weight_map to load from the correct shards
     elif safetensors_index_file.is_file():
         print(f"Loading multimodal components from sharded model: {hf_checkpoint_path}")
         with open(safetensors_index_file) as f:
             safetensors_index = json.load(f)
 
-        # For multimodal models, vision_model and multi_modal_projector are in the first shard
-        all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
-        first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"
-
-        # Load multimodal components from the first shard file
-        safetensors_filepath = Path(hf_checkpoint_path) / first_shard_file
-        print(f"Loading multimodal components from {first_shard_file}")
-
-        with safe_open(safetensors_filepath, framework="pt") as f:
-            shard_keys = list(f.keys())
-            multimodal_keys_in_shard = [
-                k for k in shard_keys if k.startswith(("multi_modal_projector", "vision_model"))
-            ]
-
-            if multimodal_keys_in_shard:
-                print(
-                    f"Found {len(multimodal_keys_in_shard)} multimodal tensors in {first_shard_file}"
-                )
-                for key in tqdm(multimodal_keys_in_shard, desc="Loading multimodal tensors"):
-                    multimodal_state_dict[key] = f.get_tensor(key)
-            else:
-                print(f"No multimodal components found in {first_shard_file}")
+        # Find shards that contain multimodal keys using the weight_map
+        multimodal_shard_keys = {}
+        for key, shard_file in safetensors_index["weight_map"].items():
+            if key.startswith(("multi_modal_projector", "vision_model")):
+                multimodal_shard_keys.setdefault(shard_file, []).append(key)
+
+        for shard_file, keys in multimodal_shard_keys.items():
+            safetensors_filepath = Path(hf_checkpoint_path) / shard_file
+            print(f"Loading {len(keys)} multimodal tensors from {shard_file}")
+            with safe_open(safetensors_filepath, framework="pt") as f:
+                for key in tqdm(keys, desc=f"Loading from {shard_file}"):
+                    multimodal_state_dict[key] = f.get_tensor(key)
+
+        if not multimodal_shard_keys:
+            print("No multimodal components found in weight_map")
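The grouping step in the suggested fix can be exercised on its own, without safetensors installed; the weight_map below is an invented example:

```python
def group_multimodal_keys(weight_map):
    """Group multimodal tensor keys by the shard file that holds them."""
    shards = {}
    for key, shard_file in weight_map.items():
        if key.startswith(("multi_modal_projector", "vision_model")):
            shards.setdefault(shard_file, []).append(key)
    return shards

weight_map = {
    "vision_model.encoder.weight": "model-00001-of-00003.safetensors",
    "multi_modal_projector.linear.weight": "model-00002-of-00003.safetensors",
    "decoder.layers.0.mlp.weight": "model-00003-of-00003.safetensors",
}
# Multimodal keys in two different shards are both found; decoder weights are skipped.
print(group_multimodal_keys(weight_map))
```

With this grouping, each shard is opened at most once, and multimodal weights spread across several shards are no longer silently dropped.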

Comment on lines +228 to +245
```python
def save_pretrained_extra_modules(
    self,
    save_directory: str | os.PathLike,
):
    """Save a EAGLE or Medusa checkpoints which can be deployed by vLLM and TensorRT-LLM."""
    # We use the last PP rank to write the config because
    # medusa_heads and eagle_module only exist in the last stage.
    pp_rank = get_pipeline_model_parallel_rank()
    pp_size = get_pipeline_model_parallel_world_size()
    is_last_stage_main_rank = pp_rank == pp_size - 1

    state_dict = self.extra_state_dict

    if is_last_stage_main_rank and self._hf_extra_config is not None:
        self._hf_extra_config.save_pretrained(save_directory)
        save_file(state_dict, save_directory + "/model.safetensors", metadata={"format": "pt"})

    torch.distributed.barrier()
```

⚠️ Potential issue | 🟠 Major

save_pretrained_extra_modules does not create save_directory.

Unlike save_pretrained (where _hf_config.save_pretrained(save_directory) implicitly creates the directory), save_pretrained_extra_modules writes directly to save_directory + "/model.safetensors" without ensuring the directory exists. This will fail with FileNotFoundError if export_dir doesn't already exist.

Proposed fix
     def save_pretrained_extra_modules(
         self,
         save_directory: str | os.PathLike,
     ):
         """Save a EAGLE or Medusa checkpoints which can be deployed by vLLM and TensorRT-LLM."""
+        os.makedirs(save_directory, exist_ok=True)
         # We use the last PP rank to write the config because
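The directory guard in the proposed fix relies only on standard-library behavior, which can be checked in isolation (the paths are illustrative):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    save_directory = os.path.join(root, "export", "extra_modules")
    os.makedirs(save_directory, exist_ok=True)  # creates missing parents, no error if absent
    os.makedirs(save_directory, exist_ok=True)  # idempotent on a second call
    target = os.path.join(save_directory, "model.safetensors")
    open(target, "wb").close()                  # the write now succeeds
    created = os.path.isfile(target)
print(created)  # → True
```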

Comment on lines +337 to +342
```python
# Add multimodal components to state_dict. Since only support decoder model quantization,
# no changes will be made to the multimodal components. We copy the multimodal components
# from the pretrained model directly to the state_dict to avoid implementing the export logic.
if is_first_stage_main_rank and self.is_multimodal:
    multimodal_state_dict = load_multimodal_components(pretrained_model_name_or_path)
    layer_state_dicts[0].update(multimodal_state_dict)
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check the file structure and read the relevant sections
cat -n modelopt/torch/export/unified_export_megatron.py | sed -n '375n;375,420p'

Repository: NVIDIA/Model-Optimizer

Length of output: 2165


🏁 Script executed:

# Search for layer_number assignments to understand the numbering scheme
rg 'layer\.layer_number|layer_number\s*=' --context=3

Repository: NVIDIA/Model-Optimizer

Length of output: 9299


🏁 Script executed:

# Check if there's any initialization of layer_state_dicts with key 0
rg 'layer_state_dicts\[0\]' modelopt/torch/export/unified_export_megatron.py

Repository: NVIDIA/Model-Optimizer

Length of output: 128


🏁 Script executed:

# Look at the property definition for layer_state_dicts
rg -A5 'def layer_state_dicts' modelopt/torch/export/unified_export_megatron.py

Repository: NVIDIA/Model-Optimizer

Length of output: 233


layer_state_dicts[0] will raise KeyError — layer indices are 1-based.

The _get_state_dict method stores layer state dicts using layer.layer_number as keys (line 405), which is 1-based and ranges from 1 to num_layers. Accessing key 0 on line 342 will crash at runtime for any multimodal model.

Proposed fix: merge into the first layer's dict (key 1)
         if is_first_stage_main_rank and self.is_multimodal:
             multimodal_state_dict = load_multimodal_components(pretrained_model_name_or_path)
-            layer_state_dicts[0].update(multimodal_state_dict)
+            layer_state_dicts[1].update(multimodal_state_dict)

Comment on lines +347 to +348
```python
if is_last_stage_main_rank and self._hf_config is not None:
    copy_remote_code(pretrained_model_name_or_path, save_directory)
```

⚠️ Potential issue | 🟠 Major

copy_remote_code will crash when pretrained_model_name_or_path is a HuggingFace hub model ID.

copy_remote_code raises ValueError if the path is not a directory, but pretrained_model_name_or_path can be a HF hub model ID (e.g., "meta-llama/Llama-2-7b"). Guard with an is_dir() check or handle the non-local-path case:

Proposed fix
         if is_last_stage_main_rank and self._hf_config is not None:
-            copy_remote_code(pretrained_model_name_or_path, save_directory)
+            if Path(pretrained_model_name_or_path).is_dir():
+                copy_remote_code(pretrained_model_name_or_path, save_directory)


codecov bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.73%. Comparing base (3e95d9f) to head (757c915).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #881   +/-   ##
=======================================
  Coverage   73.73%   73.73%           
=======================================
  Files         199      199           
  Lines       21165    21165           
=======================================
  Hits        15606    15606           
  Misses       5559     5559           


```python
all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"

# Load multimodal components from the first shard file
```
Contributor


qq, is multimodal always in the first shard? Could it be in the first multiple shards?

Collaborator Author


Good question. We will leave this to @yueshen2016 to improve.

```python
json.dump(config_dict, f, indent=4)

save_safetensors(state_dict, save_directory)
# save_safetensors(state_dict, save_directory)
```
Contributor


In case you want to remove this.

@ChenhanYu ChenhanYu merged commit f9d9a71 into main Feb 12, 2026
37 checks passed
@ChenhanYu ChenhanYu deleted the chenhany/megatron_export_per_layer branch February 12, 2026 23:51
hychiang-git pushed a commit to eigen-ai-labs/Model-Optimizer-public that referenced this pull request Feb 16, 2026
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Signed-off-by: Hung-Yueh <hungyueh.chiang@gmail.com>
kevalmorabia97 pushed a commit that referenced this pull request Feb 20, 2026