
Refactor GPT-Neo to use @capture_outputs and @can_return_tuple decorators#44068

Open
mtthw13 wants to merge 2 commits into huggingface:main from mtthw13:refactor-gpt-neo-capture-outputs

Conversation


@mtthw13 mtthw13 commented Feb 17, 2026

Replaces manual output_attentions/output_hidden_states/return_dict boilerplate in GPT-Neo with the hook-based decorator system.

Changes:

  • Added _can_record_outputs = {"hidden_states": GPTNeoBlock, "attentions": GPTNeoAttention} on GPTNeoPreTrainedModel
  • Added @capture_outputs + @merge_with_config_defaults on GPTNeoModel.forward
  • Added @can_return_tuple on GPTNeoForCausalLM, GPTNeoForSequenceClassification, GPTNeoForTokenClassification, GPTNeoForQuestionAnswering
  • Dropped output_attentions, output_hidden_states, return_dict from all forward signatures
  • Removed parameter resolution lines and manual collection loops
  • GPTNeoBlock now returns hidden_states directly (not a tuple)
  • Attention always returns (attn_output, attn_weights)
  • Removed unused imports: BaseModelOutputWithPastAndCrossAttentions, CausalLMOutputWithCrossAttentions
  • Updated test: removed output_attentions=True from direct GPTNeoSelfAttention call in test_local_attn_probs
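The `_can_record_outputs` mapping drives a hook-based mechanism: instead of each layer threading its attentions/hidden states back up through return tuples, forward hooks are registered on the listed module classes and the outputs are collected as a side effect. A minimal, framework-free sketch of that idea (the `Block` class, dummy computation, and this `capture_outputs` helper are illustrative stand-ins, not the actual transformers implementation):

```python
# Illustrative sketch of hook-based output capture, mimicking the shape of
# torch's register_forward_hook in plain Python (not the real library code).

class Block:
    """Stand-in for a recordable submodule such as GPTNeoBlock."""
    def __init__(self):
        self._hooks = []

    def register_forward_hook(self, fn):
        self._hooks.append(fn)

    def __call__(self, x):
        out = x + 1  # dummy computation standing in for the block's forward
        for hook in self._hooks:
            hook(self, (x,), out)
        return out

def capture_outputs(modules, record_cls):
    """Attach a collecting hook to every module of record_cls.

    Returns the list that the hooks append to, so the caller can read the
    captured per-layer outputs after running the model.
    """
    collected = []
    for m in modules:
        if isinstance(m, record_cls):
            m.register_forward_hook(lambda mod, inp, out: collected.append(out))
    return collected

blocks = [Block(), Block(), Block()]
hidden_states = capture_outputs(blocks, Block)
x = 0
for b in blocks:
    x = b(x)
print(hidden_states)  # [1, 2, 3]
```

Because collection happens in the hooks, the blocks themselves can return their main output directly, which is why `GPTNeoBlock` no longer needs to return a tuple.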

Tests: All 111 GPT-Neo model tests pass.
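For the head models, a `@can_return_tuple`-style decorator replaces the per-model `return_dict` plumbing: the decorator pops `return_dict` off the call and converts the dataclass output to a tuple when it is `False`. A hedged sketch under assumed names (`TinyOutput` and `TinyModel` are hypothetical; the real `ModelOutput` has more machinery):

```python
# Illustrative sketch of a @can_return_tuple-style decorator.
import functools
from dataclasses import dataclass, fields
from typing import Optional, Tuple

@dataclass
class TinyOutput:
    """Stand-in for a transformers ModelOutput dataclass."""
    logits: Tuple[int, ...]
    hidden_states: Optional[Tuple[int, ...]] = None

    def to_tuple(self):
        # Keep only the fields that were actually populated.
        return tuple(v for f in fields(self)
                     if (v := getattr(self, f.name)) is not None)

def can_return_tuple(forward):
    """Intercept return_dict so forward itself never has to handle it."""
    @functools.wraps(forward)
    def wrapper(self, *args, return_dict=True, **kwargs):
        output = forward(self, *args, **kwargs)
        return output if return_dict else output.to_tuple()
    return wrapper

class TinyModel:
    @can_return_tuple
    def forward(self, x):
        # The undecorated forward always builds the dataclass output.
        return TinyOutput(logits=(x * 2,))

model = TinyModel()
as_dataclass = model.forward(3)                 # TinyOutput(logits=(6,), ...)
as_tuple = model.forward(3, return_dict=False)  # ((6,),)
```

This is why `return_dict` can be dropped from every forward signature: each forward unconditionally returns the dataclass, and the decorator handles the legacy tuple path.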

@mtthw13
Author

mtthw13 commented Feb 17, 2026

Fixes #43979.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_neo
