
[WIP] [Flaubert] Refactor output tracing to decorator-based interface#44116

Open
dtiourine wants to merge 1 commit into huggingface:main from dtiourine:refactor/flaubert-output-tracing

Conversation

@dtiourine

Migrate Flaubert to the `@capture_outputs` and `@can_return_tuple` decorator pattern for output handling, as part of #43979.

What does this PR do?

  • Add `_can_record_outputs = {"attentions": MultiHeadAttention}` on `FlaubertPreTrainedModel`
  • Apply `@capture_outputs` to `FlaubertModel.forward`
  • Apply `@can_return_tuple` to all 6 wrapper model forwards
  • Remove `output_attentions`, `output_hidden_states`, and `return_dict` from all forward signatures
  • Remove parameter-resolution boilerplate and the manual `hidden_states`/`attentions` collection loops
  • Standardize `MultiHeadAttention.forward` to always return `(attn_output, attn_weights)`
  • Pass `return_dict=True` from all wrapper models to `self.transformer()` so the backbone always returns a dict internally
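As a rough illustration of what the `@can_return_tuple` pattern buys, here is a minimal pure-Python sketch. The names `SimpleOutput` and `TinyModel` and the decorator body are assumptions for illustration only; the real transformers decorator also consults config defaults and handles keyword-only arguments:

```python
import functools
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SimpleOutput:
    """Hypothetical stand-in for a transformers ModelOutput."""
    last_hidden_state: tuple
    attentions: Optional[tuple] = None

    def to_tuple(self):
        # Drop None fields, mirroring how ModelOutput flattens to a tuple
        return tuple(
            getattr(self, f.name) for f in fields(self)
            if getattr(self, f.name) is not None
        )

def can_return_tuple(forward):
    """Sketch: if the caller passes return_dict=False, convert the
    dataclass output back into a plain tuple, as the legacy API did."""
    @functools.wraps(forward)
    def wrapper(self, *args, return_dict=True, **kwargs):
        output = forward(self, *args, **kwargs)
        return output if return_dict else output.to_tuple()
    return wrapper

class TinyModel:
    @can_return_tuple
    def forward(self, x):
        return SimpleOutput(last_hidden_state=(x,), attentions=((0.5,),))

model = TinyModel()
print(type(model.forward(1)).__name__)      # SimpleOutput
print(model.forward(1, return_dict=False))  # ((1,), ((0.5,),))
```

The point of the refactor is that this conversion lives once in the decorator, so each wrapper `forward` no longer needs its own `return_dict` branching.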

Notes

  • `# Copied from XLM` markers: several classes/methods are marked as copied from XLM. This PR modifies them but does not update XLM itself, since XLM appears to be migrated separately.

Known test failures

3 tests fail because `output_hidden_states` is not yet supported:

  • `test_attention_outputs`
  • `test_hidden_states_output`
  • `test_retain_grad_hidden_states_attentions`

Flaubert doesn't have a unified layer class (e.g. a `FlaubertLayer`). Instead, the per-layer logic is inlined in `FlaubertModel.forward`, using separate `ModuleList`s for attention, FFN, and layer norms, so I couldn't find a single module to hook for `hidden_states` in `_can_record_outputs`.

For now I have only included `{"attentions": MultiHeadAttention}`. I'd appreciate feedback on the best approach here: for example, I considered introducing a `FlaubertLayer`, but wasn't sure whether that is out of scope for this refactor or whether there is another preferred pattern.
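For context, the capture mechanism can be sketched as: the decorator records one entry per submodule instance of the class declared in `_can_record_outputs`. The stand-in below (all class and variable names are hypothetical, and the real decorator uses PyTorch forward hooks rather than an explicit `isinstance` check in the loop) also shows why `hidden_states` has no hook point in Flaubert: the hidden state after each layer exists only as a local variable, not as the output of a dedicated module.

```python
class MultiHeadAttention:
    """Dummy stand-in for Flaubert's attention module."""
    def forward(self, x):
        attn_output, attn_weights = x, (0.1, 0.9)  # fake weights
        return attn_output, attn_weights

class FlaubertLikeModel:
    # Output field name -> module class whose results feed that field
    _can_record_outputs = {"attentions": MultiHeadAttention}

    def __init__(self, n_layers=2):
        self.attn_layers = [MultiHeadAttention() for _ in range(n_layers)]

    def forward(self, x):
        recorded = {name: [] for name in self._can_record_outputs}
        for attn in self.attn_layers:
            x, weights = attn.forward(x)
            # The real decorator does this via a forward hook registered
            # on every instance of the declared class; shown inline here.
            if isinstance(attn, self._can_record_outputs["attentions"]):
                recorded["attentions"].append(weights)
        # Nothing to hook for hidden_states: the per-layer hidden state
        # only exists as the local variable `x`, never as a module output.
        return x, tuple(recorded["attentions"])

model = FlaubertLikeModel()
hidden, attentions = model.forward("tokens")
print(len(attentions))  # 2
```

A `FlaubertLayer` wrapper would give the decorator a module boundary to hook for `hidden_states`, at the cost of touching checkpoint module paths.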

Contributes to #43979 (Flaubert portion)

- Add @capture_outputs to FlaubertModel and @can_return_tuple to wrapper models
- Remove manual output_attentions/output_hidden_states/return_dict handling
- Always return (attn_output, attn_weights) from MultiHeadAttention
- Only record attentions via _can_record_outputs (hidden_states needs FlaubertLayer)
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: flaubert
