[Auto] Merge output tracing refactors across selected models (cluster-43979-11): merged 5 of 10 PRs#17

Closed
evalstate wants to merge 18 commits into main from merge-cluster-cluster-43979-11-20260423223633

Conversation

@evalstate
Owner

Automated cluster merge for cluster-43979-11 against main.

Merged PRs:

Skipped PRs:

Failed PRs:

Notes:

Next steps:

  • Optionally run make style or make fix-repo to normalize imports and keep generated code consistent before any further handoff.
  • Run targeted tests for the merged model families: ResNet/RegNet/RT-DETR ResNet, MobileNetV2, DeBERTa v2, EfficientNet, GPT-J/CodeGen (see the sketch after this list).
  • If more of the cluster is to be salvaged, SpeechT5 would need a genuine manual forward-port against the current cache_position-based decoder code.
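
A minimal sketch of how those targeted test runs could be driven from Python. The directory names assume the usual `tests/models/<name>/` layout in huggingface/transformers and are not taken from this PR; verify them against your checkout.

```python
# Hedged sketch: invoke pytest on the test suites for the merged model
# families. Paths assume the standard tests/models/<name>/ layout in
# huggingface/transformers; adjust if your checkout differs.
import pytest

MERGED_FAMILIES = [
    "resnet", "regnet", "rt_detr",   # RT-DETR ResNet assumed to live under rt_detr
    "mobilenet_v2", "deberta_v2",
    "efficientnet", "gptj", "codegen",
]

if __name__ == "__main__":
    targets = [f"tests/models/{name}" for name in MERGED_FAMILIES]
    raise SystemExit(pytest.main(targets + ["-q"]))
```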

Paritosh Dwivedi and others added 18 commits February 15, 2026 12:56
…tuple

Migrate the GPT-J model to use the new standardized output collection
decorators, replacing manual accumulation of hidden states and attention
weights with hook-based capturing.

Changes:
- Add `_can_record_outputs` to `GPTJPreTrainedModel` mapping hidden_states
  to GPTJBlock and attentions to GPTJAttention
- Add `@capture_outputs` and `@merge_with_config_defaults` to
  `GPTJModel.forward()`
- Add `@can_return_tuple` to all task head models (ForCausalLM,
  ForSequenceClassification, ForQuestionAnswering)
- Remove `output_attentions`, `output_hidden_states`, and `return_dict`
  parameters from all forward signatures
- Remove manual accumulator loops and return_dict branching
- Simplify GPTJBlock to return plain `torch.Tensor` instead of tuple
- Update attention forward signatures to always return
  `(attn_output, attn_weights)` without conditional logic

Resolves huggingface#43979
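
For reference, here is a self-contained sketch of the hook-based capture this commit describes. The `capture_outputs` decorator and `_can_record_outputs` attribute names come from the message above, but the implementation below is illustrative, not the transformers source.

```python
# Minimal sketch of hook-based output capturing: the decorator reads the
# class-level _can_record_outputs table, installs forward hooks on matching
# submodules, and attaches the recorded streams to the model output, so the
# forward body needs no manual accumulator loops.
import functools

import torch
import torch.nn as nn


def capture_outputs(forward):
    @functools.wraps(forward)
    def wrapper(self, *args, **kwargs):
        records = {key: [] for key in self._can_record_outputs}
        handles = []
        for key, module_cls in self._can_record_outputs.items():
            for module in self.modules():
                if isinstance(module, module_cls):
                    handles.append(module.register_forward_hook(
                        lambda mod, inputs, output, key=key: records[key].append(output)
                    ))
        try:
            output = forward(self, *args, **kwargs)
        finally:
            for handle in handles:
                handle.remove()
        output.update({key: tuple(vals) for key, vals in records.items()})
        return output
    return wrapper


class Block(nn.Module):  # stand-in for GPTJBlock: returns a plain Tensor
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.lin(x))


class TinyModel(nn.Module):  # stand-in for GPTJModel
    _can_record_outputs = {"hidden_states": Block}

    def __init__(self, dim=8, n_layers=2):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(n_layers))

    @capture_outputs
    def forward(self, x):
        for block in self.blocks:
            x = block(x)  # no manual accumulation: hooks do the recording
        return {"last_hidden_state": x}


out = TinyModel()(torch.randn(1, 4, 8))
assert len(out["hidden_states"]) == 2  # one entry per Block forward pass
```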
The CodeGenBlock is a documented copy of GPTJBlock. This syncs it
to match the updated signature after removing the output_attentions
parameter and simplifying the return type to a plain torch.Tensor.

Generated via `python utils/check_copies.py --fix_and_overwrite`.
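
For context, the "Copied from" mechanism works via a marker comment that `utils/check_copies.py` scans; with `--fix_and_overwrite` it rewrites the copy to match its source, applying the stated rename. An illustrative (not verbatim) excerpt:

```python
import torch.nn as nn


# Copied from transformers.models.gptj.modeling_gptj.GPTJBlock with GPTJ->CodeGen
class CodeGenBlock(nn.Module):
    # check_copies.py keeps this body in sync with GPTJBlock, applying the
    # GPTJ->CodeGen rename; hand edits that diverge from the source are
    # flagged (or overwritten) by the consistency check.
    ...
```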
The previous commit auto-synced CodeGenBlock.forward() with the
refactored GPTJBlock, but CodeGenModel still passes output_attentions
to CodeGenBlock and expects a tuple return. Since the CodeGen model
has not been refactored to use the new decorators yet, restore
CodeGenBlock's original forward() signature and remove the
'# Copied from' directive to decouple it from GPTJBlock until
CodeGen gets its own output tracing refactor.
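
A simplified, self-contained stand-in (not the actual CodeGen source) for the restored contract: the block keeps the conditional tuple return that the un-refactored CodeGenModel still expects.

```python
import torch
import torch.nn as nn


class CodeGenBlockLike(nn.Module):
    """Stand-in keeping the pre-refactor tuple-returning contract."""

    def __init__(self, dim=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)

    def forward(self, hidden_states, output_attentions=False):
        attn_output, attn_weights = self.attn(hidden_states, hidden_states, hidden_states)
        outputs = (attn_output + hidden_states,)  # residual connection
        if output_attentions:
            outputs = outputs + (attn_weights,)
        return outputs  # tuple, as the un-refactored caller expects


hidden = torch.randn(1, 4, 8)
out = CodeGenBlockLike()(hidden, output_attentions=True)
assert len(out) == 2  # (hidden_states, attn_weights)
```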
…23633

# Conflicts:
#	src/transformers/models/regnet/modeling_regnet.py
#	src/transformers/models/resnet/modeling_resnet.py
…23633

# Conflicts:
#	src/transformers/models/mobilenet_v2/modeling_mobilenet_v2.py
…23633

# Conflicts:
#	src/transformers/models/deberta_v2/modeling_deberta_v2.py
…23633

# Conflicts:
#	src/transformers/models/efficientnet/modeling_efficientnet.py
…23633

# Conflicts:
#	src/transformers/models/gptj/modeling_gptj.py

evalstate closed this on Apr 24, 2026