Conversation

Signed-off-by: Chenhan Yu <chenhany@nvidia.com>

Force-pushed 44e6e0b to 757c915 (Compare)

📝 Walkthrough

Introduces new Hugging Face checkpoint utilities (…).

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive) · ✅ Passed checks (1 passed)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@modelopt/torch/export/plugins/hf_checkpoint_utils.py`:
- Around line 91-118: The current code only reads the first shard file to
populate multimodal_state_dict using multimodal_keys_in_shard; instead, iterate
the safetensors_index["weight_map"] to find all keys whose weight_map value
exists and whose key startswith ("multi_modal_projector", "vision_model"), group
those keys by their shard filename, then for each shard
(Path(hf_checkpoint_path)/shard_name) open it with safe_open and load only the
mapped keys into multimodal_state_dict (use f.get_tensor(key)); ensure you
handle duplicate shard filenames, preserve progress reporting (tqdm) and fall
back gracefully if a mapped shard file is missing.
In `@modelopt/torch/export/unified_export_megatron.py`:
- Around line 228-245: The save_pretrained_extra_modules function may write
model.safetensors into a non-existent save_directory; ensure the directory
exists before any writes by creating the directory (e.g., call
os.makedirs(save_directory, exist_ok=True) or
Path(save_directory).mkdir(parents=True, exist_ok=True)) at the start of
save_pretrained_extra_modules, and use os.path.join(save_directory,
"model.safetensors") when calling save_file instead of string concatenation;
keep the _hf_extra_config.save_pretrained call and torch.distributed.barrier
as-is.
- Around line 347-348: Guard the call to copy_remote_code so it only runs for
local directories: check that pretrained_model_name_or_path is a local directory
(e.g., Path(pretrained_model_name_or_path).is_dir()) before calling
copy_remote_code(save_directory) in the is_last_stage_main_rank &&
self._hf_config branch, or alternatively wrap copy_remote_code(...) in a
try/except ValueError to skip non-local HF hub model IDs like
"meta-llama/Llama-2-7b"; update the block with this guard around
copy_remote_code to avoid crashes.
- Around line 337-342: The code attempts to merge multimodal components into
layer_state_dicts[0] but _get_state_dict stores layers keyed by
layer.layer_number (1-based), so accessing 0 will KeyError; update the merge to
target layer_state_dicts[1] (or better: compute first_layer_key = 1 and use
that) and guard by ensuring that key exists (create an empty dict if missing)
before calling update; adjust the block around is_first_stage_main_rank and
self.is_multimodal to call
load_multimodal_components(pretrained_model_name_or_path) and merge into
layer_state_dicts[first_layer_key] rather than layer_state_dicts[0].
🧹 Nitpick comments (5)

modelopt/torch/export/plugins/mcore_custom.py (2)

277-328: Per-layer save/index assembly looks correct for the non-multimodal path. The writing loop iterates over dict keys (1-based layer numbers) and the index-assembly loop iterates `range(total_layers)` with a `+1` offset, so filenames match. The TODO on line 311 about replacing the global barrier is noted. One minor concern: if `layer_state_dicts` is ever empty or has gaps (i.e. a layer index is missing), rank 0 will crash with `FileNotFoundError` when reading the per-layer `.json` in lines 319-324. Consider adding a guard or logging if a shard file is missing.

303-309: Prefer the `Path` `/` operator over string concatenation for path construction. Lines 303 and 309 use `save_directory + "/" + meta_filename` and similar patterns. While consistent with the older `save_safetensors` function above, using `Path(save_directory) / meta_filename` is more robust (handles trailing slashes, OS differences). Not blocking, but worth aligning with the `Path` usage already present elsewhere in the codebase.

modelopt/torch/export/plugins/vllm_fakequant_megatron.py (1)

75-86: Dead guard: `pretrained_model_name_or_path is not None` is always `True` now. Since `pretrained_model_name_or_path` is now required (`str | os.PathLike`), the `is not None` check on line 79 is always `True`, making the assert equivalent to `assert not self.is_multimodal`. Consider simplifying for clarity:

Suggested simplification:

```diff
-        assert not (self.is_multimodal and pretrained_model_name_or_path is not None), (
-            "Exporting weights in bf16 and amax values is not supported for multimodal models "
-            "when pretrained_model_name_or_path is not None"
-        )
+        assert not self.is_multimodal, (
+            "Exporting weights in bf16 and amax values is not supported for multimodal models"
+        )
```

modelopt/torch/export/plugins/hf_checkpoint_utils.py (1)

81-89: Use `logging` instead of `print`. Multiple `print()` calls throughout this module (lines 81, 92, 102, 111, 117, 120, 122) should use `logging.info`/`logging.warning` for consistency with library conventions and to allow callers to control verbosity. Also applies to: 92, 102, 111-112, 117, 120, 122.

modelopt/torch/export/unified_export_megatron.py (1)

388-418: Per-layer state dict tracking via shared reference: the last layer accumulates the final norm and output layer. The logic on lines 405-407 assigns `self._state_dict` by reference to `_layer_state_dicts[layer.layer_number]`, then resets `self._state_dict` for all layers except the last. This means subsequent writes (`final_layernorm`, `output_layer` on lines 410-418) mutate the last layer's entry in-place. This is correct but subtle; a comment noting this intentional aliasing would help future readers.
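As a small illustration of the two module-level conventions suggested in the nitpicks above (a `logging` logger instead of `print`, and `pathlib` joins instead of string concatenation); the helper name and filenames here are made up for the sketch:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)  # library convention: callers control verbosity

def meta_path(save_directory: str, meta_filename: str) -> Path:
    # Path's "/" operator handles trailing slashes and OS separators,
    # unlike save_directory + "/" + meta_filename.
    path = Path(save_directory) / meta_filename
    logger.info("metadata path resolved to %s", path)
    return path

print(meta_path("ckpt/", "model.safetensors.index.json").as_posix())
```

Note how the trailing slash in `"ckpt/"` does not produce a doubled separator, which the `+ "/" +` pattern would.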
```python
    elif safetensors_index_file.is_file():
        print(f"Loading multimodal components from sharded model: {hf_checkpoint_path}")
        with open(safetensors_index_file) as f:
            safetensors_index = json.load(f)

        # For multimodal models, vision_model and multi_modal_projector are in the first shard
        all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
        first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"

        # Load multimodal components from the first shard file
        safetensors_filepath = Path(hf_checkpoint_path) / first_shard_file
        print(f"Loading multimodal components from {first_shard_file}")

        with safe_open(safetensors_filepath, framework="pt") as f:
            shard_keys = list(f.keys())
            multimodal_keys_in_shard = [
                k for k in shard_keys if k.startswith(("multi_modal_projector", "vision_model"))
            ]

            if multimodal_keys_in_shard:
                print(
                    f"Found {len(multimodal_keys_in_shard)} multimodal tensors in {first_shard_file}"
                )
                for key in tqdm(multimodal_keys_in_shard, desc="Loading multimodal tensors"):
                    multimodal_state_dict[key] = f.get_tensor(key)
            else:
                print(f"No multimodal components found in {first_shard_file}")
```
Multimodal components may span multiple shards; only loading the first shard is fragile.
The code assumes all multimodal tensors reside in the first shard (lines 96-98), but model.safetensors.index.json already provides the weight_map that maps each key to its exact shard file. If a model's multimodal weights are split across shards, this silently returns an incomplete state dict.
Suggested fix: use the weight_map to load from the correct shards

```diff
 elif safetensors_index_file.is_file():
     print(f"Loading multimodal components from sharded model: {hf_checkpoint_path}")
     with open(safetensors_index_file) as f:
         safetensors_index = json.load(f)
-    # For multimodal models, vision_model and multi_modal_projector are in the first shard
-    all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
-    first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"
-
-    # Load multimodal components from the first shard file
-    safetensors_filepath = Path(hf_checkpoint_path) / first_shard_file
-    print(f"Loading multimodal components from {first_shard_file}")
-
-    with safe_open(safetensors_filepath, framework="pt") as f:
-        shard_keys = list(f.keys())
-        multimodal_keys_in_shard = [
-            k for k in shard_keys if k.startswith(("multi_modal_projector", "vision_model"))
-        ]
-
-        if multimodal_keys_in_shard:
-            print(
-                f"Found {len(multimodal_keys_in_shard)} multimodal tensors in {first_shard_file}"
-            )
-            for key in tqdm(multimodal_keys_in_shard, desc="Loading multimodal tensors"):
-                multimodal_state_dict[key] = f.get_tensor(key)
-        else:
-            print(f"No multimodal components found in {first_shard_file}")
+    # Find shards that contain multimodal keys using the weight_map
+    multimodal_shard_keys = {}
+    for key, shard_file in safetensors_index["weight_map"].items():
+        if key.startswith(("multi_modal_projector", "vision_model")):
+            multimodal_shard_keys.setdefault(shard_file, []).append(key)
+
+    for shard_file, keys in multimodal_shard_keys.items():
+        safetensors_filepath = Path(hf_checkpoint_path) / shard_file
+        print(f"Loading {len(keys)} multimodal tensors from {shard_file}")
+        with safe_open(safetensors_filepath, framework="pt") as f:
+            for key in tqdm(keys, desc=f"Loading from {shard_file}"):
+                multimodal_state_dict[key] = f.get_tensor(key)
+
+    if not multimodal_shard_keys:
+        print("No multimodal components found in weight_map")
```
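The `setdefault`-based grouping in the suggested fix can be exercised on a toy `weight_map`; the tensor and shard names below are hypothetical, standing in for a real model.safetensors.index.json:

```python
# Toy weight_map as found in a sharded model.safetensors.index.json
weight_map = {
    "vision_model.patch_embed.weight": "model-00001-of-00003.safetensors",
    "multi_modal_projector.linear.weight": "model-00002-of-00003.safetensors",
    "language_model.layers.0.weight": "model-00001-of-00003.safetensors",
}

# Group multimodal keys by the shard file that actually holds them
multimodal_shard_keys = {}
for key, shard_file in weight_map.items():
    if key.startswith(("multi_modal_projector", "vision_model")):
        multimodal_shard_keys.setdefault(shard_file, []).append(key)

print(multimodal_shard_keys)
```

Note that the language-model key is skipped and the two multimodal keys land in two different shards, which is exactly the case the first-shard-only code would miss.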
```python
    def save_pretrained_extra_modules(
        self,
        save_directory: str | os.PathLike,
    ):
        """Save a EAGLE or Medusa checkpoints which can be deployed by vLLM and TensorRT-LLM."""
        # We use the last PP rank to write the config because
        # medusa_heads and eagle_module only exist in the last stage.
        pp_rank = get_pipeline_model_parallel_rank()
        pp_size = get_pipeline_model_parallel_world_size()
        is_last_stage_main_rank = pp_rank == pp_size - 1

        state_dict = self.extra_state_dict

        if is_last_stage_main_rank and self._hf_extra_config is not None:
            self._hf_extra_config.save_pretrained(save_directory)
            save_file(state_dict, save_directory + "/model.safetensors", metadata={"format": "pt"})

        torch.distributed.barrier()
```
save_pretrained_extra_modules does not create save_directory.
Unlike save_pretrained (where _hf_config.save_pretrained(save_directory) implicitly creates the directory), save_pretrained_extra_modules writes directly to save_directory + "/model.safetensors" without ensuring the directory exists. This will fail with FileNotFoundError if export_dir doesn't already exist.
Proposed fix:

```diff
 def save_pretrained_extra_modules(
     self,
     save_directory: str | os.PathLike,
 ):
     """Save a EAGLE or Medusa checkpoints which can be deployed by vLLM and TensorRT-LLM."""
+    os.makedirs(save_directory, exist_ok=True)
     # We use the last PP rank to write the config because
```
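The directory-creation fix can be sketched in isolation. `tempfile` and the plain `open` call here are stand-ins for illustration, not the module's actual safetensors `save_file`:

```python
import os
import tempfile

base = tempfile.mkdtemp()
# A nested path that does not exist yet, like a fresh export_dir
save_directory = os.path.join(base, "export", "extra_modules")

# Safe even if the directory already exists; creates parents as needed
os.makedirs(save_directory, exist_ok=True)

# os.path.join instead of save_directory + "/model.safetensors"
out_path = os.path.join(save_directory, "model.safetensors")
with open(out_path, "wb") as f:  # stand-in for save_file(state_dict, out_path, ...)
    f.write(b"")

print(os.path.isfile(out_path))  # True
```

Without the `os.makedirs` call, the `open` (or `save_file`) would raise `FileNotFoundError` because the parent directory is missing.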
```python
        # Add multimodal components to state_dict. Since only support decoder model quantization,
        # no changes will be made to the multimodal components. We copy the multimodal components
        # from the pretrained model directly to the state_dict to avoid implementing the export logic.
        if is_first_stage_main_rank and self.is_multimodal:
            multimodal_state_dict = load_multimodal_components(pretrained_model_name_or_path)
            layer_state_dicts[0].update(multimodal_state_dict)
```
🧩 Analysis chain

🏁 Script executed:

```shell
# Check the file structure and read the relevant sections
cat -n modelopt/torch/export/unified_export_megatron.py | sed -n '375n;375,420p'
```

Repository: NVIDIA/Model-Optimizer · Length of output: 2165

🏁 Script executed:

```shell
# Search for layer_number assignments to understand the numbering scheme
rg 'layer\.layer_number|layer_number\s*=' --context=3
```

Repository: NVIDIA/Model-Optimizer · Length of output: 9299

🏁 Script executed:

```shell
# Check if there's any initialization of layer_state_dicts with key 0
rg 'layer_state_dicts\[0\]' modelopt/torch/export/unified_export_megatron.py
```

Repository: NVIDIA/Model-Optimizer · Length of output: 128

🏁 Script executed:

```shell
# Look at the property definition for layer_state_dicts
rg -A5 'def layer_state_dicts' modelopt/torch/export/unified_export_megatron.py
```

Repository: NVIDIA/Model-Optimizer · Length of output: 233
layer_state_dicts[0] will raise KeyError — layer indices are 1-based.
The _get_state_dict method stores layer state dicts using layer.layer_number as keys (line 405), which is 1-based and ranges from 1 to num_layers. Accessing key 0 on line 342 will crash at runtime for any multimodal model.
Proposed fix: merge into the first layer's dict (key 1)

```diff
     if is_first_stage_main_rank and self.is_multimodal:
         multimodal_state_dict = load_multimodal_components(pretrained_model_name_or_path)
-        layer_state_dicts[0].update(multimodal_state_dict)
+        layer_state_dicts[1].update(multimodal_state_dict)
```
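The 1-based keying described above, and the guarded merge the prompt suggests, can be demonstrated with plain dicts (the layer and tensor names here are made up):

```python
# Simulate _get_state_dict keying layers by layer.layer_number (1-based)
num_layers = 3
layer_state_dicts = {
    n: {f"decoder.layers.{n - 1}.weight": n} for n in range(1, num_layers + 1)
}

multimodal_state_dict = {"vision_model.embed.weight": 0}

first_layer_key = 1
# setdefault guards against the key being absent on this rank
layer_state_dicts.setdefault(first_layer_key, {}).update(multimodal_state_dict)

print(0 in layer_state_dicts)                               # False: keys start at 1
print("vision_model.embed.weight" in layer_state_dicts[1])  # True after the merge
```

Accessing `layer_state_dicts[0]` here would raise `KeyError`, mirroring the crash flagged in the review.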
```python
        if is_last_stage_main_rank and self._hf_config is not None:
            copy_remote_code(pretrained_model_name_or_path, save_directory)
```
copy_remote_code will crash when pretrained_model_name_or_path is a HuggingFace hub model ID.
copy_remote_code raises ValueError if the path is not a directory, but pretrained_model_name_or_path can be a HF hub model ID (e.g., "meta-llama/Llama-2-7b"). Guard with an is_dir() check or handle the non-local-path case:
Proposed fix:

```diff
     if is_last_stage_main_rank and self._hf_config is not None:
-        copy_remote_code(pretrained_model_name_or_path, save_directory)
+        if Path(pretrained_model_name_or_path).is_dir():
+            copy_remote_code(pretrained_model_name_or_path, save_directory)
```
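A minimal sketch of the proposed `is_dir()` guard; the helper name below is hypothetical, and the check is only a heuristic (a hub ID resolves to `False` simply because no such local directory exists):

```python
from pathlib import Path

def looks_like_local_checkpoint(name_or_path: str) -> bool:
    """Local checkpoints are directories on disk; hub IDs such as
    "meta-llama/Llama-2-7b" normally are not."""
    return Path(name_or_path).is_dir()

print(looks_like_local_checkpoint("."))                      # an existing directory
print(looks_like_local_checkpoint("meta-llama/Llama-2-7b"))  # not a local path
```

Gating `copy_remote_code` on this check skips hub model IDs instead of crashing with `ValueError`.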
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #881   +/-   ##
=======================================
  Coverage   73.73%   73.73%
=======================================
  Files         199      199
  Lines       21165    21165
=======================================
  Hits        15606    15606
  Misses       5559     5559
```

☔ View full report in Codecov by Sentry.
```python
        all_shard_files = sorted(set(safetensors_index["weight_map"].values()))
        first_shard_file = all_shard_files[0]  # e.g., "model-00001-of-00050.safetensors"

        # Load multimodal components from the first shard file
```
qq, is the multimodal part always in the first shard? Could it span the first multiple shards?
Good question. We will leave this to @yueshen2016 to improve.
```diff
         json.dump(config_dict, f, indent=4)

-    save_safetensors(state_dict, save_directory)
+    # save_safetensors(state_dict, save_directory)
```
In case you want to remove this.
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Signed-off-by: Hung-Yueh <hungyueh.chiang@gmail.com>
## What does this PR do?

**Type of change:** Bug fix

**Overview:**

1. Fixing megatron ignore module has an additional `.` in the suffix
2. Change megatron export to save per layer as a safetensor (avoid ghost safetensors)

## Usage

```python
# Add a code snippet demonstrating how to use this
```

## Testing

## Before your PR is "Ready for review"

- Make sure you read and follow Contributor guidelines and your commits are signed.
- Is this change backward compatible?: Yes/No
- Did you write any new necessary tests?: Yes/No
- Did you add or update any necessary documentation?: Yes/No
- Did you update Changelog?: Yes/No

## Additional Information

## Summary by CodeRabbit

### Release Notes

* **New Features**
  * Export workflow now supports additional model components (EAGLE/Medusa modules)
  * Per-layer model state organization for improved checkpoint management
* **Bug Fixes**
  * More robust Hugging Face configuration, tokenizer, and image processor preservation
  * Enhanced multimodal component extraction and loading
* **Refactor**
  * Optimized model export process with improved per-layer safetensors handling