
[GPT2] Refactor output tracing to use capture_outputs/can_return_tuple decorators #44059

Open
lakprigan wants to merge 1 commit into huggingface:main from lakprigan:refactor-gpt2-output-tracing

Conversation

@lakprigan

Summary

Migrates GPT2 to the standardized output collection interface as part of #43979.

  • Added _can_record_outputs to GPT2PreTrainedModel (including cross_attentions via OutputRecorder targeting the crossattention submodule)
  • Added @capture_outputs on GPT2Model.forward()
  • Added @can_return_tuple on all wrapper model forwards (GPT2LMHeadModel, GPT2DoubleHeadsModel, GPT2ForSequenceClassification, GPT2ForTokenClassification, GPT2ForQuestionAnswering)
  • Removed manual output_attentions, output_hidden_states, and return_dict handling from all forward methods
  • GPT2Block.forward() now returns a single torch.Tensor instead of a tuple
  • GPT2Attention.forward() always returns (attn_output, attn_weights) — hooks capture weights when needed

Net reduction: ~89 lines of boilerplate removed (44 insertions, 133 deletions).
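
To illustrate the mechanism these decorators rely on, here is a small, self-contained sketch of hook-based output capture. It is a conceptual toy, not the transformers implementation: the real capture_outputs decorator, _can_record_outputs contract, and OutputRecorder class live in the library and have richer signatures (field/index selection, cross-attention targeting via submodule names, etc.). It only shows the core idea of recording per-layer tensors with forward hooks instead of threading output_attentions through every forward().

```python
# Conceptual sketch only, NOT the transformers implementation.
import functools

import torch
from torch import nn


class ToyAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # Always compute and return the weights; callers may ignore them.
        weights = torch.softmax(x @ x.transpose(-1, -2), dim=-1)
        return self.proj(weights @ x), weights


def capture_outputs(forward):
    """Register temporary hooks on submodules listed in _can_record_outputs."""
    @functools.wraps(forward)
    def wrapper(self, *args, **kwargs):
        records = {name: [] for name in self._can_record_outputs}
        handles = []
        for name, target_cls in self._can_record_outputs.items():
            for module in self.modules():
                if isinstance(module, target_cls):
                    def hook(_mod, _inp, out, _name=name):
                        records[_name].append(out[1])  # grab the attn weights
                    handles.append(module.register_forward_hook(hook))
        try:
            hidden = forward(self, *args, **kwargs)
        finally:
            for handle in handles:
                handle.remove()
        return {"last_hidden_state": hidden,
                "attentions": tuple(records["attentions"])}
    return wrapper


class ToyModel(nn.Module):
    # Maps output field names to the submodule class whose output feeds them.
    _can_record_outputs = {"attentions": ToyAttention}

    def __init__(self, dim=8, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(ToyAttention(dim) for _ in range(n_layers))

    @capture_outputs
    def forward(self, x):
        for layer in self.layers:
            x, _ = layer(x)  # per-layer weights are collected by the hooks
        return x


out = ToyModel()(torch.randn(1, 4, 8))
print(out["last_hidden_state"].shape, len(out["attentions"]))  # (1, 4, 8), 2
```

In the actual PR, the wrapper models additionally use @can_return_tuple (sketched after the References section) so that callers passing return_dict=False still receive plain tuples.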

Testing

All 136 non-compile GPT2 tests pass. The 2 torch.compile test failures (test_generate_compilation_all_outputs, test_generate_compile_model_forward_fullgraph) are pre-existing environment-specific issues (arm64 torch inductor) and also fail on unmodified main.

References

Used llama, nllb_moe, and t5gemma as reference implementations for the cross-attention OutputRecorder pattern.
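
As a companion to the sketch above, here is roughly what the return_dict handling removed by @can_return_tuple standardizes. Again a toy, assumption-laden sketch: the real decorator in transformers also consults the model config for defaults and works with the full ModelOutput classes.

```python
# Toy sketch of a can_return_tuple-style decorator; not the transformers code.
import functools
from dataclasses import dataclass, fields
from typing import Optional

import torch


@dataclass
class ToyCausalLMOutput:
    loss: Optional[torch.Tensor]
    logits: torch.Tensor

    def to_tuple(self):
        # Drop None fields, mirroring the spirit of ModelOutput.to_tuple().
        return tuple(v for v in (getattr(self, f.name) for f in fields(self))
                     if v is not None)


def can_return_tuple(forward):
    """Return the dataclass as-is, or as a plain tuple when return_dict=False."""
    @functools.wraps(forward)
    def wrapper(self, *args, return_dict=True, **kwargs):
        output = forward(self, *args, **kwargs)
        return output if return_dict else output.to_tuple()
    return wrapper


class ToyLMHead:
    @can_return_tuple
    def forward(self, logits):
        # The forward body no longer branches on return_dict at all.
        return ToyCausalLMOutput(loss=None, logits=logits)


head = ToyLMHead()
print(type(head.forward(torch.zeros(1, 2))).__name__)            # ToyCausalLMOutput
print(type(head.forward(torch.zeros(1, 2), return_dict=False)))  # <class 'tuple'>
```

Centralizing this in one decorator is what lets the five GPT2 wrapper heads drop their per-forward return_dict branches.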

[GPT2] Refactor output tracing to use capture_outputs/can_return_tuple decorators

Part of huggingface#43979. Migrates GPT2 to the standardized output collection
interface, removing ~89 lines of manual output_attentions,
output_hidden_states, and return_dict boilerplate.

Changes:
- Add _can_record_outputs to GPT2PreTrainedModel (including
  cross_attentions via OutputRecorder)
- Add @capture_outputs on GPT2Model.forward()
- Add @can_return_tuple on all wrapper model forwards
- GPT2Block returns a single tensor instead of a tuple
- GPT2Attention always returns (attn_output, attn_weights)
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt2

@lakprigan
Author

@molbap This GPT2 refactor is ready for review per #43979. All 136 non-compile tests pass; the 2 torch.compile failures are pre-existing on main (arm64 inductor).
