
[Auto] Refactor output tracing across model backends (cluster-43979-11): merged 7 of 10 PRs #29

Open

evalstate wants to merge 22 commits into main from merge-cluster-cluster-43979-11-20260424104629

Conversation

@evalstate (Owner)

Cluster: cluster-43979-11
Source repo cluster inspected for huggingface/transformers issue huggingface#43979.

Merged PRs:

Skipped PRs:

Failed PRs:

Notes:

Next steps:

  • Review resolved conflicts in the merged model files for semantic correctness.
  • Run targeted model tests for ResNet, RegNet, CvT, FNet, MobileNetV2, DeBERTa v2, EfficientNet, SpeechT5, and GPT-J (a sketch follows this list).
  • Run `make style` and `make typing` or `make check-repo` before any PR-ready handoff.
  • Revisit VITS only with a corrected patch that uses the current `output_capturing` API.
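
As a starting point for the test step above, a minimal sketch, assuming the standard `tests/models/<name>` directory layout of the transformers repository (the directory names here are assumptions inferred from the model list):

```python
# Minimal sketch of the targeted test run; pytest.main mirrors the CLI,
# so this is equivalent to `pytest tests/models/resnet ... -q`.
import pytest

models = [
    "resnet", "regnet", "cvt", "fnet", "mobilenet_v2",
    "deberta_v2", "efficientnet", "speecht5", "gptj",
]
pytest.main([f"tests/models/{m}" for m in models] + ["-q"])
```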

beelapranay and others added 22 commits February 14, 2026 13:04
…tuple

Migrate the GPT-J model to use the new standardized output collection
decorators, replacing manual accumulation of hidden states and attention
weights with hook-based capturing.

Changes:
- Add `_can_record_outputs` to `GPTJPreTrainedModel` mapping hidden_states
  to GPTJBlock and attentions to GPTJAttention
- Add `@capture_outputs` and `@merge_with_config_defaults` to
  `GPTJModel.forward()`
- Add `@can_return_tuple` to all task head models (ForCausalLM,
  ForSequenceClassification, ForQuestionAnswering)
- Remove `output_attentions`, `output_hidden_states`, and `return_dict`
  parameters from all forward signatures
- Remove manual accumulator loops and return_dict branching
- Simplify GPTJBlock to return plain `torch.Tensor` instead of tuple
- Update attention forward signatures to always return
  `(attn_output, attn_weights)` without conditional logic

Resolves huggingface#43979
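
For orientation, a runnable toy version of the hook-based capture pattern this commit describes. The `_can_record_outputs` mapping mirrors the commit message; the real decorators (`@capture_outputs`, `@merge_with_config_defaults`, `@can_return_tuple`) are elided, so this sketches the mechanism rather than the actual transformers API:

```python
# Toy illustration: forward hooks record per-layer outputs, replacing
# manual accumulator lists inside the model's forward loop.
import torch
from torch import nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    def forward(self, x):
        # Returns a plain tensor, as in the simplified GPTJBlock.
        return torch.relu(self.linear(x))

class TinyModel(nn.Module):
    # Map output names to the module classes whose forward results are
    # captured by hooks (mirrors `_can_record_outputs` from the commit).
    _can_record_outputs = {"hidden_states": Block}

    def __init__(self, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(Block() for _ in range(n_blocks))

    def forward(self, x):
        captured = {name: [] for name in self._can_record_outputs}
        handles = [
            mod.register_forward_hook(
                lambda m, inp, out, n=name: captured[n].append(out)
            )
            for name, cls in self._can_record_outputs.items()
            for mod in self.modules()
            if isinstance(mod, cls)
        ]
        try:
            for block in self.blocks:
                # No accumulator lists or return_dict branching here:
                # the hooks do the recording.
                x = block(x)
        finally:
            for h in handles:
                h.remove()
        return x, {k: tuple(v) for k, v in captured.items()}

out, recorded = TinyModel()(torch.randn(2, 8))
print(len(recorded["hidden_states"]))  # 3, one entry per Block
```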
The CodeGenBlock is a documented copy of GPTJBlock. This syncs it
to match the updated signature after removing the output_attentions
parameter and simplifying the return type to a plain torch.Tensor.

Generated via `python utils/check_copies.py --fix_and_overwrite`.
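
For context, a toy version of what `python utils/check_copies.py --fix_and_overwrite` does for a `# Copied from ... with GPTJ->CodeGen` directive: it re-materializes the copied body from the source symbol and applies the identifier substitution. The snippet below is illustrative only; the real script also locates the marked body, detects drift, and rewrites the file in place:

```python
# Simplified stand-in for the check_copies identifier substitution;
# the source string here is a made-up stub, not the real GPTJBlock.
import re

gptj_block_source = '''class GPTJBlock(nn.Module):
    def forward(self, hidden_states):
        return self.mlp(self.attn(hidden_states))
'''

def apply_copy_rule(source_code: str, rule: str) -> str:
    # A "with GPTJ->CodeGen" rule is a plain old->new identifier rewrite.
    old, new = rule.split("->")
    return re.sub(re.escape(old), new, source_code)

print(apply_copy_rule(gptj_block_source, "GPTJ->CodeGen"))
# Prints the same body with GPTJBlock renamed to CodeGenBlock.
```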
The previous commit auto-synced CodeGenBlock.forward() with the
refactored GPTJBlock, but CodeGenModel still passes output_attentions
to CodeGenBlock and expects a tuple return. Since the CodeGen model
has not yet been refactored to use the new decorators, restore
CodeGenBlock's original forward() signature and remove the
'# Copied from' directive to decouple it from GPTJBlock until
CodeGen gets its own output-tracing refactor.
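
A sketch of the restored legacy-style forward that CodeGenModel still expects, with `output_attentions` kept as a parameter and a conditional tuple return. The attention math below is a toy stand-in, not the actual CodeGen implementation:

```python
# Toy block with the pre-refactor contract: callers pass output_attentions
# and branch on the length of the returned tuple.
import torch
from torch import nn

class LegacyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    def forward(self, hidden_states, output_attentions=False):
        # Stand-in "attention": similarity over the sequence dimension.
        attn_weights = torch.softmax(
            hidden_states @ hidden_states.transpose(-1, -2), dim=-1
        )
        hidden_states = self.linear(attn_weights @ hidden_states)
        outputs = (hidden_states,)
        if output_attentions:
            # The conditional tuple shape is exactly what the refactor
            # removed from GPTJBlock and what CodeGenModel still relies on.
            outputs += (attn_weights,)
        return outputs

outputs = LegacyBlock()(torch.randn(2, 8), output_attentions=True)
print(len(outputs))  # 2: (hidden_states, attn_weights)
```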
…er-cluster-43979-11-20260424104629

# Conflicts:
#	src/transformers/models/regnet/modeling_regnet.py
#	src/transformers/models/resnet/modeling_resnet.py
…er-cluster-43979-11-20260424104629

# Conflicts:
#	src/transformers/models/cvt/modeling_cvt.py
#	src/transformers/models/fnet/modeling_fnet.py
…er-cluster-43979-11-20260424104629

# Conflicts:
#	src/transformers/models/mobilenet_v2/modeling_mobilenet_v2.py
…er-cluster-43979-11-20260424104629

# Conflicts:
#	src/transformers/models/deberta_v2/modeling_deberta_v2.py
…er-cluster-43979-11-20260424104629

# Conflicts:
#	src/transformers/models/efficientnet/modeling_efficientnet.py
…er-cluster-43979-11-20260424104629

# Conflicts:
#	src/transformers/models/speecht5/modeling_speecht5.py
…er-cluster-43979-11-20260424104629

# Conflicts:
#	src/transformers/models/gptj/modeling_gptj.py