
Refactor GPT-Neo to use @capture_outputs and @can_return_tuple decorators#44068

Open
mtthw13 wants to merge 2 commits into huggingface:main from mtthw13:refactor-gpt-neo-capture-outputs

Conversation


@mtthw13 mtthw13 commented Feb 17, 2026

Replaces manual output_attentions/output_hidden_states/return_dict boilerplate in GPT-Neo with the hook-based decorator system.

Changes:

  • Added _can_record_outputs = {"hidden_states": GPTNeoBlock, "attentions": GPTNeoAttention} on GPTNeoPreTrainedModel
  • Added @capture_outputs + @merge_with_config_defaults on GPTNeoModel.forward
  • Added @can_return_tuple on GPTNeoForCausalLM, GPTNeoForSequenceClassification, GPTNeoForTokenClassification, GPTNeoForQuestionAnswering
  • Dropped output_attentions, output_hidden_states, return_dict from all forward signatures
  • Removed parameter resolution lines and manual collection loops
  • GPTNeoBlock now returns hidden_states directly (not a tuple)
  • Attention always returns (attn_output, attn_weights)
  • Removed unused imports: BaseModelOutputWithPastAndCrossAttentions, CausalLMOutputWithCrossAttentions
  • Updated test: removed output_attentions=True from direct GPTNeoSelfAttention call in test_local_attn_probs
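The `_can_record_outputs` mapping drives a hook-based mechanism: instead of each layer threading its attentions/hidden states back up through return tuples, forward hooks are registered on the listed module classes and the outputs are collected as a side effect. A minimal, framework-free sketch of that idea (the `Block` class, dummy computation, and this `capture_outputs` helper are illustrative stand-ins, not the actual transformers implementation):

```python
# Illustrative sketch of hook-based output capture, mimicking the shape of
# torch's register_forward_hook in plain Python (not the real library code).

class Block:
    """Stand-in for a recordable submodule such as GPTNeoBlock."""
    def __init__(self):
        self._hooks = []

    def register_forward_hook(self, fn):
        self._hooks.append(fn)

    def __call__(self, x):
        out = x + 1  # dummy computation standing in for the block's forward
        for hook in self._hooks:
            hook(self, (x,), out)
        return out

def capture_outputs(modules, record_cls):
    """Attach a collecting hook to every module of record_cls.

    Returns the list that the hooks append to, so the caller can read the
    captured per-layer outputs after running the model.
    """
    collected = []
    for m in modules:
        if isinstance(m, record_cls):
            m.register_forward_hook(lambda mod, inp, out: collected.append(out))
    return collected

blocks = [Block(), Block(), Block()]
hidden_states = capture_outputs(blocks, Block)
x = 0
for b in blocks:
    x = b(x)
print(hidden_states)  # [1, 2, 3]
```

Because collection happens in the hooks, the blocks themselves can return their main output directly, which is why `GPTNeoBlock` no longer needs to return a tuple.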

Tests: All 111 GPT-Neo model tests pass.
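For the head models, a `@can_return_tuple`-style decorator replaces the per-model `return_dict` plumbing: the decorator pops `return_dict` off the call and converts the dataclass output to a tuple when it is `False`. A hedged sketch under assumed names (`TinyOutput` and `TinyModel` are hypothetical; the real `ModelOutput` has more machinery):

```python
# Illustrative sketch of a @can_return_tuple-style decorator.
import functools
from dataclasses import dataclass, fields
from typing import Optional, Tuple

@dataclass
class TinyOutput:
    """Stand-in for a transformers ModelOutput dataclass."""
    logits: Tuple[int, ...]
    hidden_states: Optional[Tuple[int, ...]] = None

    def to_tuple(self):
        # Keep only the fields that were actually populated.
        return tuple(v for f in fields(self)
                     if (v := getattr(self, f.name)) is not None)

def can_return_tuple(forward):
    """Intercept return_dict so forward itself never has to handle it."""
    @functools.wraps(forward)
    def wrapper(self, *args, return_dict=True, **kwargs):
        output = forward(self, *args, **kwargs)
        return output if return_dict else output.to_tuple()
    return wrapper

class TinyModel:
    @can_return_tuple
    def forward(self, x):
        # The undecorated forward always builds the dataclass output.
        return TinyOutput(logits=(x * 2,))

model = TinyModel()
as_dataclass = model.forward(3)                 # TinyOutput(logits=(6,), ...)
as_tuple = model.forward(3, return_dict=False)  # ((6,),)
```

This is why `return_dict` can be dropped from every forward signature: each forward unconditionally returns the dataclass, and the decorator handles the legacy tuple path.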

@mtthw13
Author

mtthw13 commented Feb 17, 2026

Fixes #43979.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_neo
