Cast multimodal forward_kwargs to compute dtype for bf16/fp16 training #5073

Open

akshan-main wants to merge 1 commit into huggingface:main from akshan-main:fix_grpo_vlm_pixel_dtype
Conversation

akshan-main commented Feb 11, 2026

What does this PR do?

When training VLMs with bf16=True or fp16=True, the pixel_values returned by the processor stay float32 after _prepare_inputs (the existing dtype casting there is DeepSpeed-specific).
If the vision encoder weights are bfloat16/float16, this can crash in torch.layer_norm with:

RuntimeError: expected scalar type BFloat16 but found Float

This is the next failure reported in the #4451 thread after the prompt-format TypeError.
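
A minimal standalone sketch of the mismatch (not code from this PR), assuming a bare layer_norm call with float32 activations and bfloat16 parameters; the exact error text can vary by PyTorch version and backend:

import torch
import torch.nn.functional as F

# Processor output left in float32, vision-encoder LayerNorm parameters loaded in bf16.
pixel_values = torch.randn(2, 1024, dtype=torch.float32)
ln_weight = torch.randn(1024, dtype=torch.bfloat16)
ln_bias = torch.zeros(1024, dtype=torch.bfloat16)

try:
    F.layer_norm(pixel_values, (1024,), ln_weight, ln_bias)
except RuntimeError as err:
    print(err)  # e.g. "expected scalar type BFloat16 but found Float"
else:
    print("no error on this PyTorch build")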

Note: This is fixed by casting floating-point tensors in forward_kwargs to the compute dtype when bf16=True or fp16=True, which is consistent with how the trainer already handles model dtype casting. If the model is loaded in bf16 via torch_dtype without setting the training flag, this path won't trigger, but neither does the existing model casting.

Changes made

  • In the multimodal path, cast only floating-point tensors in forward_kwargs to the active compute dtype (bf16/fp16), as sketched below.
  • Leave non-floating tensors (for example image_grid_thw) unchanged.

  • No prompt-format behavior changes (proposed in #5064 and #5067).
  • No reward-function behavior changes (proposed in #5064).
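
A minimal sketch of the described casting behavior, assuming a standalone helper (cast_forward_kwargs is a hypothetical name for illustration, not the trainer's actual code):

import torch

def cast_forward_kwargs(forward_kwargs, compute_dtype):
    # Cast only floating-point tensors; leave integer tensors (e.g. image_grid_thw) untouched.
    if compute_dtype is None:
        return forward_kwargs
    return {
        k: v.to(compute_dtype) if isinstance(v, torch.Tensor) and torch.is_floating_point(v) else v
        for k, v in forward_kwargs.items()
    }

forward_kwargs = {
    "pixel_values": torch.randn(1, 3, 224, 224),    # float32 from the processor
    "image_grid_thw": torch.tensor([[1, 14, 14]]),  # int64 grid metadata
}
casted = cast_forward_kwargs(forward_kwargs, torch.bfloat16)  # as if bf16=True
print(casted["pixel_values"].dtype)    # torch.bfloat16
print(casted["image_grid_thw"].dtype)  # torch.int64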

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@qgallouedec @kashif @albertvillanova

akshan-main (Author) commented Feb 11, 2026

@qgallouedec hey mate! Can this be reviewed? Let me know if you want any changes.

akshan-main (Author) commented:

@codex review

Comment on lines 2290 to 2322
def test_forward_kwargs_dtype_casting():
    forward_kwargs = {
        "pixel_values": torch.randn(1, 3, 224, 224, dtype=torch.float32),
        "image_grid_thw": torch.tensor([[1, 14, 14]]),
    }

    for bf16, fp16, expected_dtype in [
        (True, False, torch.bfloat16),
        (False, True, torch.float16),
        (False, False, None),
    ]:
        if bf16:
            compute_dtype = torch.bfloat16
        elif fp16:
            compute_dtype = torch.float16
        else:
            compute_dtype = None

        if compute_dtype is not None:
            result = {
                k: v.to(compute_dtype) if isinstance(v, torch.Tensor) and torch.is_floating_point(v) else v
                for k, v in forward_kwargs.items()
            }
        else:
            result = forward_kwargs

        if expected_dtype is not None:
            assert result["pixel_values"].dtype == expected_dtype
        else:
            assert result["pixel_values"].dtype == torch.float32
        assert result["image_grid_thw"].dtype == torch.int64


Member

what are we testing here? just that the .to() works? if so I think it's out of the scope of the trl tests

akshan-main (Author) commented Feb 16, 2026

You're right, I'll drop the test.

akshan-main force-pushed the fix_grpo_vlm_pixel_dtype branch from 697ba32 to 68b0c28 on February 16, 2026 at 17:27