NotImplementedError: aten::equal on meta tensors during multi-GPU model init with transformers >= 5.4.0 #1765

@sharonyu-115

Description

Describe the bug

Issue discovered while working on NVIDIA-NeMo/RL#2212

When loading an HF model with tie_word_embeddings=True (e.g., Qwen/Qwen3-0.6B) on multi-GPU, model initialization crashes with:

NotImplementedError: aten::equal: attempted to run this operator with Meta tensors,
but there was no fake impl or Meta kernel registered.

The crash occurs because _build_model wraps the entire _init_model call — including HF's from_pretrained — inside an init_empty_weights() context (meta device). This means that by the time HF's _finalize_model_loading calls tie_weights(missing_keys=...), the model parameters are still meta tensors. Transformers v5.4.0 added a torch.equal() call inside tie_weights to compare tied parameter values (HF PR #44497), and torch.equal does not support meta tensors.
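The core incompatibility can be reproduced outside transformers entirely. This sketch (assuming a recent PyTorch build, where aten::equal has no meta kernel) triggers the same error with two meta tensors:

```python
import torch

# Meta tensors have shape/dtype but no data, mimicking parameters
# created under init_empty_weights().
a = torch.empty(2, device="meta")
b = torch.empty(2, device="meta")

try:
    torch.equal(a, b)  # aten::equal must read values, so this fails
except NotImplementedError as e:
    print(type(e).__name__)  # same error class as in the traceback above
```

This is why the crash surfaces only once tie_weights compares parameter values: shape-only operations on meta tensors work fine, but any op that must inspect data does not.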

Call chain

_build_model (auto_model.py:359)
  with [no_init_weights(), init_empty_weights()]:     ← meta device context wraps everything
    _init_model (model_init.py:396)
      _from_pretrained_parent_class (auto_model.py:205)
        HF AutoModelForCausalLM.from_pretrained
          model.__init__()                             ← meta tensors created here
          _load_pretrained_model()                     ← weights loaded, but STILL META (inside init_empty_weights)
          _finalize_model_loading (modeling_utils.py:4290)
            tie_weights(missing_keys=...)
              torch.equal(source_param, target_param)  ← CRASH: meta tensors don't support this

Steps/Code to reproduce bug

Run the existing qwen3_0p6b_hellaswag.yaml SFT recipe on multiple GPUs.

automodel examples/llm_finetune/qwen/qwen3_0p6b_hellaswag.yaml --nproc-per-node 2

Impact: any model with tie_word_embeddings=True in its config.json will hit this crash when loaded via the HF fallback path (i.e., not a custom-registered model) on multi-GPU.
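One possible mitigation, sketched below, is to skip the value comparison while parameters are still on the meta device and defer the real check until weights are materialized. This is an illustration only, not transformers' actual fix; safe_params_equal is a hypothetical helper mirroring the comparison tie_weights performs:

```python
import torch

def safe_params_equal(source: torch.Tensor, target: torch.Tensor) -> bool:
    # Meta tensors carry no data, so torch.equal cannot compare them.
    # Treat them as tentatively equal; the meaningful comparison can only
    # happen after the weights are loaded onto a real device.
    if source.is_meta or target.is_meta:
        return True
    return torch.equal(source, target)
```

An alternative would be for _build_model to close the init_empty_weights() context before from_pretrained runs its finalization step, so tie_weights only ever sees materialized parameters.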

Additional context

Error log from my reproduction with automodel SFT on dfw: /lustre/fsw/portfolios/coreai/users/shuangy/src/NeMo-RL/nemo-rl/meta_tensor_issue_reproduce.log
