Merged
Conversation
…ividual layers _reduce_attrs returns ModuleList objects as single items; extending layers with them meant AC code never found self_attn/mlp on a ModuleList and silently skipped all checkpointing. Flatten any ModuleList results so layers contains individual decoder layers, matching the heuristic path. Signed-off-by: khazic <khazzz1c@gmail.com>
codecov/patch flagged #1941 at 14.28% (1/7 diff lines hit): every existing test mocks _extract_model_layers, so the new _extend_layers helper and the two modified call sites were unexecuted. Add six tests over the real function covering: class-keyed single FQN (GPT2), string-keyed arm (NemotronH name match), multi-FQN (Qwen2.5-VL), non-ModuleList element kept as a single entry (ModuleDict post-PP-split shape), and both ModuleList/ModuleDict fallback branches as regression guards. Uses Cls.__new__ + nn.Module.__init__ to produce instances whose type(model) matches the exact class in MODEL_CLS_TO_LAYERS (identity lookup — subclasses miss the dict) without HF's config-dependent __init__. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
5b2bafb to
390e3a8
Compare
Fix E741 and apply ruff format on the test file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Contributor
Author
|
/claude review |
Contributor
Author
|
/ok to test 523bd78 |
#1941 flattens each ModuleList returned from _reduce_attrs, so _extract_model_layers now yields individual decoder modules instead of the containing ModuleLists. Update the two fallback/None-safety assertions added in #1859 to isinstance-check the inner nn.Linear. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Contributor
Author
|
/ok to test ba3e9d7 |
akoumpa
approved these changes
Apr 21, 2026
linnanwang
pushed a commit
that referenced
this pull request
Apr 24, 2026
* fix: flatten ModuleList in _extract_model_layers so AC applies to individual layers _reduce_attrs returns ModuleList objects as single items; extending layers with them meant AC code never found self_attn/mlp on a ModuleList and silently skipped all checkpointing. Flatten any ModuleList results so layers contains individual decoder layers, matching the heuristic path. Signed-off-by: khazic <khazzz1c@gmail.com> * test: cover _extract_model_layers flatten branches codecov/patch flagged #1941 at 14.28% (1/7 diff lines hit): every existing test mocks _extract_model_layers, so the new _extend_layers helper and the two modified call sites were unexecuted. Add six tests over the real function covering: class-keyed single FQN (GPT2), string-keyed arm (NemotronH name match), multi-FQN (Qwen2.5-VL), non-ModuleList element kept as a single entry (ModuleDict post-PP-split shape), and both ModuleList/ModuleDict fallback branches as regression guards. Uses Cls.__new__ + nn.Module.__init__ to produce instances whose type(model) matches the exact class in MODEL_CLS_TO_LAYERS (identity lookup — subclasses miss the dict) without HF's config-dependent __init__. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * style: ruff format + rename ambiguous `l` loop var Fix E741 and apply ruff format on the test file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com> * test(qwen3_5): expect flattened per-layer modules after #1941 #1941 flattens each ModuleList returned from _reduce_attrs, so _extract_model_layers now yields individual decoder modules instead of the containing ModuleLists. Update the two fallback/None-safety assertions added in #1859 to isinstance-check the inner nn.Linear. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com> --------- Signed-off-by: khazic <khazzz1c@gmail.com> Signed-off-by: HuiyingLi <willwin.lee@gmail.com> Co-authored-by: khazic <khazzz1c@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Changelog
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information