Missing weights not initialized properly (#35437, PR #35913)

sambhavnoobcoder wants to merge 17 commits into huggingface:main from …

Conversation
Hi @sambhavnoobcoder, I see some failing tests in the CI! I think the cause is that in some cases, models have tied weights, meaning that the input embeddings and output projection are identical. In these cases, only one of those tensors may exist in the …
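To illustrate the tied-weights situation described above, here is a minimal standalone sketch (a toy pair of layers, not code from the PR): tying the head to the embedding makes both parameters share one underlying tensor, so a checkpoint only needs to store it once.

```python
import torch.nn as nn

# Toy example (not from the PR): tie an output projection to the input
# embedding so both parameter names point at the same underlying tensor.
embed = nn.Embedding(10, 4)
head = nn.Linear(4, 10, bias=False)
head.weight = embed.weight  # weight tying

# Both parameters now share storage, so a checkpoint can store just one copy.
print(head.weight.data_ptr() == embed.weight.data_ptr())  # True
```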
@Rocketknight1 Thank you for pointing to the issue with tied weights. I've modified the initialization logic to detect tied parameters using …
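One general way to detect tied parameters — a sketch with an assumed helper name, not necessarily the approach this PR took — is to group parameter names by their underlying storage pointer:

```python
import torch.nn as nn

# Hypothetical helper (not the PR's code): group parameter names that share
# the same underlying storage, i.e. tied weights.
def find_tied_parameter_groups(model: nn.Module):
    by_ptr = {}
    for name, param in model.named_parameters(remove_duplicate=False):
        by_ptr.setdefault(param.data_ptr(), []).append(name)
    return [names for names in by_ptr.values() if len(names) > 1]

class TinyLM(nn.Module):
    def __init__(self, vocab=10, dim=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lm_head = nn.Linear(dim, vocab, bias=False)
        self.lm_head.weight = self.embed.weight  # tied weights

model = TinyLM()
print(find_tied_parameter_groups(model))  # [['embed.weight', 'lm_head.weight']]
```

`remove_duplicate=False` matters here: by default `named_parameters` deduplicates shared tensors, which would hide exactly the tying we want to find.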
Hi @sambhavnoobcoder, debugging issues like this can be tricky. I suggest the following approach: …
This is quite advanced debugging, but it's unfortunately necessary sometimes. Good luck!
ArthurZucker left a comment:
This is important IMO! But it does not need a special additional test file!
I don't have my head in this part of the code, so @Cyrilvallez, if you want to have a look! 🤗
```python
# After loading weights, initialize missing ones properly
missing_keys = set(model.state_dict().keys()) - set(loaded_keys)
```
Why don't we use the `missing_keys` defined above?
Okay, addressed this in commit 566028f.
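For context, the diff line quoted above computes the set difference between the model's parameters and what the checkpoint provided. In isolation (with hypothetical layer names, not the transformers loading code):

```python
import torch.nn as nn

# Standalone illustration of the quoted line: "missing" keys are model
# parameters with no counterpart in the loaded checkpoint.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
loaded_keys = ["0.weight", "0.bias"]  # pretend the checkpoint only covered layer 0
missing_keys = set(model.state_dict().keys()) - set(loaded_keys)
print(sorted(missing_keys))  # ['1.bias', '1.weight']
```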
As for the tests in a separate file: the separate tests now live in `test_modeling_utils.py` only, as of commit 54dfcb4. No changes were required to the tests after the refactor.

```python
import torch
from transformers import AutoModel, AutoConfig


def test_real_model_initialization():
    """Test initialization with a real model by adding a new classification layer."""
    # Create base model and save
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.save_pretrained("./test-model")

    # Modify config to add new classification layer
    config = AutoConfig.from_pretrained("./test-model")

    class BertWithClassification(type(model)):
        def __init__(self, config):
            super().__init__(config)
            # Add a new classification layer
            self.classifier = torch.nn.Linear(config.hidden_size, 3)  # 3 classes

        def _init_weights(self, module):
            super()._init_weights(module)
            if isinstance(module, torch.nn.Linear):
                module.weight.data.normal_(mean=0.0, std=0.02)
                if module.bias is not None:
                    module.bias.data.zero_()

    # Load with new architecture - should initialize new layer properly
    new_model = BertWithClassification.from_pretrained("./test-model", config=config)

    # Verify no NaN values in new layer
    assert not torch.isnan(new_model.classifier.weight).any(), "NaN found in classifier weights"
    assert not torch.isnan(new_model.classifier.bias).any(), "NaN found in classifier bias"

    # Verify base model weights are preserved
    for name, param in new_model.named_parameters():
        if "classifier" not in name:  # Skip the new layer
            orig_param = model.get_parameter(name)
            assert torch.equal(param, orig_param), f"Original weights not preserved for {name}"

    print("Real model test passed successfully!")
    return new_model


if __name__ == "__main__":
    test_real_model_initialization()
```

Thanks for the reviews. I'll make any other changes required as well @ArthurZucker @Cyrilvallez
Hey @sambhavnoobcoder! The logic of …
Hey @Cyrilvallez, …
BTW, I saw the large refactor, and the changes coming are truly awesome and much needed. However, since you have a better perspective on the upcoming changes than I do, I would appreciate it if you could look at my PR and tell me whether I need to make any more changes to future-proof it against those changes.
Hey, sorry for the late reply, we are actively working on improving the loading logic these days!
Any update on this PR? Still having issues with fresh weights added to existing modules not initialising correctly. The only solution was to revert to an older transformers version, as disabling fast init does not fix things.
Hey @Avelina9X! I can look into it if you provide a min repro of the issue. BTW, …
Hi @Cyrilvallez, thanks for the response, but I managed to fix the issue. My use case is VLM training, so I have a composite model where I have also injected additional layer types into the base language model. It turns out the new "smart apply" weight-init system was at fault here, since the weight inits of the outer VLM get overwritten by the inner language model, meaning the new layers had no valid initialisation scheme in the newer transformers versions. I fixed this by simply overriding the …

For reference, for anyone who comes across a similar problem, I solved the issue by adding this to my VLM's pretrained model class:

```python
@torch.no_grad()
def initialize_weights(self):
    # Ugly hack, but prevents the new smart apply system from breaking composite model inits
    self.apply(self._initialize_weights)
```

I do realise I should ideally change the …
Closing this since that logic has evolved considerably since the original issue, and it's not relevant anymore! Feel free to reopen if you think there is still something to address! 🤗
@Cyrilvallez You might face this when using a custom number of classes with `id2label`: the classification head does not get properly initialized (e.g. values like 1e34). Even though DETA is deprecated, this is just a code snippet to illustrate. cc @qubvel @yonigozlan
Problem Statement
When using `from_pretrained()` to load a model with new parameters that weren't in the original saved model, these new parameters were not being properly initialized according to the model's `_init_weights()` method. Instead, they remained in their default PyTorch initialization state, sometimes resulting in NaN values.

Root Cause Analysis
The issue was identified in the `_load_pretrained_model` method, where missing weights weren't being properly initialized when `_fast_init=True` (the default behavior). This caused inconsistent behavior between direct model initialization and loading via `from_pretrained()`.

Solution
Modified the `_load_pretrained_model` method to properly initialize missing weights using the model's `_init_weights()` method, regardless of the `_fast_init` setting. The solution maintains backward compatibility while ensuring consistent initialization behavior.

Implementation Details
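A simplified sketch of this approach, using assumed names and a generic `init_fn` standing in for the model's `_init_weights` (this is an illustration, not the actual transformers source):

```python
import torch.nn as nn

# Simplified sketch: after loading a checkpoint, re-run weight initialization
# only on the modules that own parameters missing from the checkpoint.
def init_missing_weights(model, missing_keys, init_fn):
    owners = {key.rsplit(".", 1)[0] for key in missing_keys if "." in key}
    for name, module in model.named_modules():
        if name in owners:
            init_fn(module)

def init_fn(module):
    # Stand-in for a model's _init_weights
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.bias is not None:
            module.bias.data.zero_()

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
missing = {"1.weight", "1.bias"}  # pretend layer 1 was absent from the checkpoint
init_missing_weights(model, missing, init_fn)
print(model[1].bias.abs().sum().item())  # 0.0 — the missing bias was re-initialized
```

Scoping the re-initialization to the owning modules is what preserves the loaded weights: modules fully covered by the checkpoint are never touched.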
Testing Strategy
Created a comprehensive test suite verifying:

- `_fast_init` …

Test Results
Related Issues