Refactor GPT2-based models to standardized output collection interface #44015
akashadsare wants to merge 1 commit into huggingface:main
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: decision_transformer, gpt2, gpt_bigcode
Pull request overview
This PR migrates GPT-2-family PyTorch model implementations (GPT-2, GPTBigCode, DecisionTransformer’s GPT-2 backbone) to the standardized output-collection interface based on @capture_outputs / @can_return_tuple and per-model _can_record_outputs metadata.
Changes:
- Replaced manual hidden-state/attention collection loops with hook-based output capturing.
- Updated `forward` signatures to use `**kwargs: Unpack[TransformersKwargs]` and rely on decorators for `return_dict` handling (a minimal sketch of this pattern follows the list).
- Added `_can_record_outputs` mappings for GPT-2, GPTBigCode, and DecisionTransformer components.
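As a rough illustration of the signature pattern described above, the sketch below shows the decorator-plus-kwargs shape. The decorator and type names are the ones quoted in this PR, while the argument list is abbreviated and hypothetical rather than copied from the diff.

```python
# Hedged sketch of the signature pattern described above, not the PR's actual diff.
# `can_return_tuple`, `capture_outputs`, `TransformersKwargs`, and `Unpack` are the
# names quoted in this PR; the arguments shown are abbreviated for illustration.
@can_return_tuple
@capture_outputs
def forward(self, input_ids=None, attention_mask=None, **kwargs: Unpack[TransformersKwargs]):
    # return_dict handling is delegated to the decorators; hidden states and
    # attentions are collected by hooks, so no manual accumulation loops are needed.
    ...
```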
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py` | Adds `_can_record_outputs` and moves GPTBigCode to decorator-based hidden-state/attention capture. |
| `src/transformers/models/gpt2/modeling_gpt2.py` | Refactors GPT-2 and its heads to decorator-based output capture and standardized kwargs. |
| `src/transformers/models/decision_transformer/modeling_decision_transformer.py` | Updates DecisionTransformer's GPT-2 backbone and top-level model to the new capture interface and standardized kwargs. |
| "attentions": OutputRecorder(GPTBigCodeAttention, index=1), | ||
| "cross_attentions": OutputRecorder(GPTBigCodeAttention, index=1), |
Both `attentions` and `cross_attentions` are configured to record from all `GPTBigCodeAttention` modules with no `layer_name` filter. This will mix self-attention and cross-attention weights into both outputs. Use `layer_name` filters (e.g. `attn` vs `crossattention`), similar to the GPT-2 setup, so each key captures only the intended modules.
| "attentions": OutputRecorder(GPTBigCodeAttention, index=1), | |
| "cross_attentions": OutputRecorder(GPTBigCodeAttention, index=1), | |
| "attentions": OutputRecorder(GPTBigCodeAttention, index=1, layer_name="attn"), | |
| "cross_attentions": OutputRecorder(GPTBigCodeAttention, index=1, layer_name="crossattention"), |
| from ...modeling_utils import (
|     ALL_ATTENTION_FUNCTIONS,
|     OutputRecorder,
|     PreTrainedModel,
|     can_return_tuple,
|     capture_outputs,
| )
| from ...processing_utils import Unpack
`can_return_tuple` and `capture_outputs` are imported from `modeling_utils`, but those symbols are not defined there (only `OutputRecorder` is re-exported). This will raise an `ImportError` at import time. Import `can_return_tuple` from `...utils` and `capture_outputs` from `...utils.output_capturing` (and keep `OutputRecorder` consistent with the other model files).
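For illustration, the corrected import block might look roughly like the following; the module paths are taken from the comment above rather than verified against the source tree, so treat this as a sketch.

```python
# Sketch of the import layout suggested in the comment above; the module paths are
# quoted from that comment and should be checked against the current source tree.
from ...modeling_utils import (
    ALL_ATTENTION_FUNCTIONS,
    OutputRecorder,
    PreTrainedModel,
)
from ...processing_utils import Unpack
from ...utils import can_return_tuple
from ...utils.output_capturing import capture_outputs
```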
| output_attentions=output_attentions,
| )
|
| hidden_states = outputs[0]
`DecisionTransformerGPT2Block.forward()` now returns a tensor, but this loop still treats the return value as a tuple (`hidden_states = outputs[0]`). That will index into the batch dimension and corrupt the hidden states. Assign the block output directly (and consider renaming `outputs` to `hidden_states` for clarity).
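A minimal sketch of the fix being described, assuming the usual GPT-2 layer loop; `self.h` and the forwarded arguments are assumptions for illustration, not copied from the diff.

```python
# Sketch of the fix described above; `self.h` and the forwarded arguments are
# assumed from the usual GPT-2 layout, not taken from the PR diff.
for block in self.h:
    # The block now returns the hidden-state tensor directly, so assign it as-is.
    # Indexing with [0] here would slice off the first batch element instead.
    hidden_states = block(
        hidden_states,
        attention_mask=attention_mask,
        **kwargs,
    )
```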
| "hidden_states": "encoder", | ||
| "attentions": "encoder", |
This `_can_record_outputs` mapping targets the submodule named `encoder`, which causes hooks to record the entire `BaseModelOutput...` object returned by the encoder (for `attentions`, this gets appended into the collected attention list). This will produce malformed `attentions`/`hidden_states` outputs when `output_attentions`/`output_hidden_states` are enabled. If the intent is just to expose these keys at the top level while relying on the encoder's own `_can_record_outputs`, set the values to an empty list (so no hook is installed) but keep the keys present.
| "hidden_states": "encoder", | |
| "attentions": "encoder", | |
| "hidden_states": [], | |
| "attentions": [], |
This PR migrates GPT-2 and its derivatives (GPTBigCode, Decision Transformer) to the new standardized output collection interface using the `@capture_outputs` and `@can_return_tuple` decorators.
By leveraging the `_can_record_outputs` metadata in the model classes, we remove the need for manual loops and boilerplate code to collect hidden states and attention weights in the `forward` pass. This improves maintainability and ensures consistency with the library's evolving standards.
Key Changes:
- Standardized Collection: Replaced manual `output_hidden_states` and `output_attentions` logic with decorators.
- Hook-based Capture: Defined `_can_record_outputs` for the GPT-2, GPTBigCode, and DecisionTransformer components (see the sketch after this list).
- Attention Capture: Utilized `OutputRecorder` to target the second element (index 1) of attention outputs.
- Improved Signatures: Updated `forward` signatures to use `Unpack[TransformersKwargs]`.
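As an illustration of this pattern, a `_can_record_outputs` mapping for the GPT-2 backbone could look roughly like the sketch below. The class names are the standard ones from `modeling_gpt2.py`, and the `layer_name` filters follow the review discussion above; the exact entries in the PR's diff may differ.

```python
# Rough sketch of a _can_record_outputs mapping in the style this PR adopts;
# the exact entries in the PR's diff may differ.
_can_record_outputs = {
    # Record each block's output tensor as a hidden state.
    "hidden_states": GPT2Block,
    # Record element 1 of the attention module's return value, filtered by the
    # submodule name so self- and cross-attention weights are not mixed together.
    "attentions": OutputRecorder(GPT2Attention, index=1, layer_name="attn"),
    "cross_attentions": OutputRecorder(GPT2Attention, index=1, layer_name="crossattention"),
}
```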
Validation:
- Verified GPT-2 modeling tests: `pytest tests/models/gpt2/test_modeling_gpt2.py` (all tests passed).
- Ensured repo consistency via `make fix-repo`.