NotImplementedError: aten::equal on meta tensors during multi-GPU model init with transformers >= 5.4.0 #1765

@sharonyu-115

Description

Describe the bug

Issue discovered while working on NVIDIA-NeMo/RL#2212

When loading an HF model with tie_word_embeddings=True (e.g., Qwen/Qwen3-0.6B) on multi-GPU, model initialization crashes with:

NotImplementedError: aten::equal: attempted to run this operator with Meta tensors,
but there was no fake impl or Meta kernel registered.

The crash occurs because _build_model wraps the entire _init_model call — including HF's from_pretrained — inside an init_empty_weights() context (meta device). This means that by the time HF's _finalize_model_loading calls tie_weights(missing_keys=...), the model parameters are still meta tensors. Transformers v5.4.0 added a torch.equal() call inside tie_weights to compare tied parameter values (HF PR #44497), and torch.equal does not support meta tensors.
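The core incompatibility can be reproduced outside transformers entirely. This sketch (assuming a recent PyTorch build, where aten::equal has no meta kernel) triggers the same error with two meta tensors:

```python
import torch

# Meta tensors have shape/dtype but no data, mimicking parameters
# created under init_empty_weights().
a = torch.empty(2, device="meta")
b = torch.empty(2, device="meta")

try:
    torch.equal(a, b)  # aten::equal must read values, so this fails
except NotImplementedError as e:
    print(type(e).__name__)  # same error class as in the traceback above
```

This is why the crash surfaces only once tie_weights compares parameter values: shape-only operations on meta tensors work fine, but any op that must inspect data does not.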

Call chain

_build_model (auto_model.py:359)
  with [no_init_weights(), init_empty_weights()]:     ← meta device context wraps everything
    _init_model (model_init.py:396)
      _from_pretrained_parent_class (auto_model.py:205)
        HF AutoModelForCausalLM.from_pretrained
          model.__init__()                             ← meta tensors created here
          _load_pretrained_model()                     ← weights loaded, but STILL META (inside init_empty_weights)
          _finalize_model_loading (modeling_utils.py:4290)
            tie_weights(missing_keys=...)
              torch.equal(source_param, target_param)  ← CRASH: meta tensors don't support this

Steps/Code to reproduce bug

Run the existing qwen3_0p6b_hellaswag.yaml SFT recipe on multiple GPUs.

automodel examples/llm_finetune/qwen/qwen3_0p6b_hellaswag.yaml --nproc-per-node 2

Impact: any model with tie_word_embeddings=True in its config.json will hit this crash when loaded via the HF fallback path (i.e., not a custom-registered model) on multi-GPU.
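One possible mitigation, sketched below, is to skip the value comparison while parameters are still on the meta device and defer the real check until weights are materialized. This is an illustration only, not transformers' actual fix; safe_params_equal is a hypothetical helper mirroring the comparison tie_weights performs:

```python
import torch

def safe_params_equal(source: torch.Tensor, target: torch.Tensor) -> bool:
    # Meta tensors carry no data, so torch.equal cannot compare them.
    # Treat them as tentatively equal; the meaningful comparison can only
    # happen after the weights are loaded onto a real device.
    if source.is_meta or target.is_meta:
        return True
    return torch.equal(source, target)
```

An alternative would be for _build_model to close the init_empty_weights() context before from_pretrained runs its finalization step, so tie_weights only ever sees materialized parameters.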

Additional context

Error log from my reproduction with automodel SFT on dfw: /lustre/fsw/portfolios/coreai/users/shuangy/src/NeMo-RL/nemo-rl/meta_tensor_issue_reproduce.log
