
[WIP] [Flaubert] Refactor output tracing to decorator-based interface#44116

Open
dtiourine wants to merge 1 commit into huggingface:main from dtiourine:refactor/flaubert-output-tracing

Conversation

@dtiourine

Migrate Flaubert to the `@capture_outputs` and `@can_return_tuple` decorator pattern for output handling, as part of #43979.

What does this PR do?

  • Add `_can_record_outputs = {"attentions": MultiHeadAttention}` on `FlaubertPreTrainedModel`
  • Apply `@capture_outputs` to `FlaubertModel.forward`
  • Apply `@can_return_tuple` to all 6 wrapper model forwards
  • Remove `output_attentions`, `output_hidden_states`, and `return_dict` from all forward signatures
  • Remove parameter-resolution boilerplate and the manual `hidden_states`/`attentions` collection loops
  • Standardize `MultiHeadAttention.forward` to always return `(attn_output, attn_weights)`
  • Pass `return_dict=True` from all wrapper models to `self.transformer()` so the backbone always returns a dict internally
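As a rough illustration of what the `@can_return_tuple` pattern buys, here is a minimal pure-Python sketch. The names `SimpleOutput` and `TinyModel` and the decorator body are assumptions for illustration only; the real transformers decorator also consults config defaults and handles keyword-only arguments:

```python
import functools
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SimpleOutput:
    """Hypothetical stand-in for a transformers ModelOutput."""
    last_hidden_state: tuple
    attentions: Optional[tuple] = None

    def to_tuple(self):
        # Drop None fields, mirroring how ModelOutput flattens to a tuple
        return tuple(
            getattr(self, f.name) for f in fields(self)
            if getattr(self, f.name) is not None
        )

def can_return_tuple(forward):
    """Sketch: if the caller passes return_dict=False, convert the
    dataclass output back into a plain tuple, as the legacy API did."""
    @functools.wraps(forward)
    def wrapper(self, *args, return_dict=True, **kwargs):
        output = forward(self, *args, **kwargs)
        return output if return_dict else output.to_tuple()
    return wrapper

class TinyModel:
    @can_return_tuple
    def forward(self, x):
        return SimpleOutput(last_hidden_state=(x,), attentions=((0.5,),))

model = TinyModel()
print(type(model.forward(1)).__name__)      # SimpleOutput
print(model.forward(1, return_dict=False))  # ((1,), ((0.5,),))
```

The point of the refactor is that this conversion lives once in the decorator, so each wrapper `forward` no longer needs its own `return_dict` branching.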

Notes

  • `# Copied from XLM` markers: several classes/methods are marked as copied from XLM. This PR modifies them but does not update XLM itself, since XLM appears to be migrated separately.

Known test failures

3 tests fail because `output_hidden_states` is not yet supported:

  • `test_attention_outputs`
  • `test_hidden_states_output`
  • `test_retain_grad_hidden_states_attentions`

Flaubert doesn't have a unified layer class (e.g. a `FlaubertLayer`). Instead, the per-layer logic is inlined in `FlaubertModel.forward`, using separate `ModuleList`s for attention, FFN, and layer norms, so I couldn't find a single module to hook for `hidden_states` in `_can_record_outputs`.

For now I have only included `{"attentions": MultiHeadAttention}`. I'd appreciate feedback on the best approach here: for example, I considered introducing a `FlaubertLayer`, but wasn't sure whether that is out of scope for this refactor or whether there is another preferred pattern.
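For context, the capture mechanism can be sketched as: the decorator records one entry per submodule instance of the class declared in `_can_record_outputs`. The stand-in below (all class and variable names are hypothetical, and the real decorator uses PyTorch forward hooks rather than an explicit `isinstance` check in the loop) also shows why `hidden_states` has no hook point in Flaubert: the hidden state after each layer exists only as a local variable, not as the output of a dedicated module.

```python
class MultiHeadAttention:
    """Dummy stand-in for Flaubert's attention module."""
    def forward(self, x):
        attn_output, attn_weights = x, (0.1, 0.9)  # fake weights
        return attn_output, attn_weights

class FlaubertLikeModel:
    # Output field name -> module class whose results feed that field
    _can_record_outputs = {"attentions": MultiHeadAttention}

    def __init__(self, n_layers=2):
        self.attn_layers = [MultiHeadAttention() for _ in range(n_layers)]

    def forward(self, x):
        recorded = {name: [] for name in self._can_record_outputs}
        for attn in self.attn_layers:
            x, weights = attn.forward(x)
            # The real decorator does this via a forward hook registered
            # on every instance of the declared class; shown inline here.
            if isinstance(attn, self._can_record_outputs["attentions"]):
                recorded["attentions"].append(weights)
        # Nothing to hook for hidden_states: the per-layer hidden state
        # only exists as the local variable `x`, never as a module output.
        return x, tuple(recorded["attentions"])

model = FlaubertLikeModel()
hidden, attentions = model.forward("tokens")
print(len(attentions))  # 2
```

A `FlaubertLayer` wrapper would give the decorator a module boundary to hook for `hidden_states`, at the cost of touching checkpoint module paths.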

Contributes to #43979 (Flaubert portion)

- Add @capture_outputs to FlaubertModel and @can_return_tuple to wrapper models
- Remove manual output_attentions/output_hidden_states/return_dict handling
- Always return (attn_output, attn_weights) from MultiHeadAttention
- Only record attentions via _can_record_outputs (hidden_states needs FlaubertLayer)
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: flaubert
