…eturn_tuple decorators
…tuple

Migrate the GPT-J model to use the new standardized output collection decorators, replacing manual accumulation of hidden states and attention weights with hook-based capturing.

Changes:
- Add `_can_record_outputs` to `GPTJPreTrainedModel`, mapping hidden_states to GPTJBlock and attentions to GPTJAttention
- Add `@capture_outputs` and `@merge_with_config_defaults` to `GPTJModel.forward()`
- Add `@can_return_tuple` to all task head models (ForCausalLM, ForSequenceClassification, ForQuestionAnswering)
- Remove `output_attentions`, `output_hidden_states`, and `return_dict` parameters from all forward signatures
- Remove manual accumulator loops and return_dict branching
- Simplify GPTJBlock to return a plain `torch.Tensor` instead of a tuple
- Update attention forward signatures to always return `(attn_output, attn_weights)` without conditional logic

Resolves huggingface#43979
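The hook-based capture pattern this commit describes can be sketched in plain Python. Everything below (`Collector`, `maybe_record`, the toy `Block`/`Model` classes) is an illustrative stand-in, not the actual transformers API; only the `_can_record_outputs` mapping idea comes from the commit:

```python
# Minimal sketch of hook-based output capture: each submodule records its
# outputs into a shared collector, so the forward loop no longer accumulates
# tuples manually. Illustrative only -- not the real transformers code.

class Collector:
    def __init__(self, can_record):
        # can_record maps an output name to the block class whose outputs
        # should be captured, mirroring the `_can_record_outputs` idea.
        self.can_record = can_record
        self.captured = {name: [] for name in can_record}

    def maybe_record(self, module, outputs):
        for name, cls in self.can_record.items():
            if isinstance(module, cls):
                self.captured[name].append(outputs[name])


class Block:
    def forward(self, hidden, collector):
        attn_weights = [h * 2 for h in hidden]  # stand-in "attention weights"
        hidden = [h + 1 for h in hidden]        # stand-in block computation
        collector.maybe_record(
            self, {"hidden_states": hidden, "attentions": attn_weights}
        )
        return hidden  # plain output, no conditional tuple


class Model:
    # Which outputs are recordable, and from which block class.
    _can_record_outputs = {"hidden_states": Block, "attentions": Block}

    def __init__(self, n_layers=2):
        self.blocks = [Block() for _ in range(n_layers)]

    def forward(self, hidden):
        collector = Collector(self._can_record_outputs)
        for block in self.blocks:
            hidden = block.forward(hidden, collector)
        return hidden, collector.captured


model = Model()
last_hidden, captured = model.forward([0, 1])
print(last_hidden)                  # [2, 3]
print(len(captured["attentions"]))  # 2
```

The key point the sketch shows: once recording is centralized in the collector, each block can return a single value and the per-output bookkeeping disappears from every forward loop.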
The CodeGenBlock is a documented copy of GPTJBlock. This syncs it to match the updated signature after removing the `output_attentions` parameter and simplifying the return type to a plain `torch.Tensor`. Generated via `python utils/check_copies.py --fix_and_overwrite`.
The previous commit auto-synced CodeGenBlock.forward() with the refactored GPTJBlock, but CodeGenModel still passes output_attentions to CodeGenBlock and expects a tuple return. Since the CodeGen model has not been refactored to use the new decorators yet, restore CodeGenBlock's original forward() signature and remove the '# Copied from' directive to decouple it from GPTJBlock until CodeGen gets its own output tracing refactor.
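The signature mismatch being reverted here can be illustrated with a toy contrast between the two styles (pure Python; both classes and their internals are illustrative stand-ins, not the actual CodeGen or GPT-J code):

```python
# Toy contrast between the legacy tuple-returning block signature (what
# CodeGenBlock is restored to) and the simplified plain-return signature
# (what the refactored GPTJBlock uses). Illustrative only.

class LegacyBlock:
    """Pre-refactor style: conditional tuple return driven by a flag."""
    def forward(self, hidden, output_attentions=False):
        attn_weights = [h * 2 for h in hidden]
        hidden = [h + 1 for h in hidden]
        outputs = (hidden,)
        if output_attentions:
            outputs += (attn_weights,)
        return outputs  # caller must unpack a variable-length tuple


class SimplifiedBlock:
    """Post-refactor style: always return the plain hidden states;
    attention weights are surfaced via hooks, not the return value."""
    def forward(self, hidden):
        hidden = [h + 1 for h in hidden]
        return hidden


legacy = LegacyBlock().forward([1, 2], output_attentions=True)
simple = SimplifiedBlock().forward([1, 2])
print(legacy)  # ([2, 3], [2, 4])
print(simple)  # [2, 3]
```

This is why the auto-sync broke CodeGen: a caller written for the legacy tuple contract cannot consume the simplified return without being refactored itself.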
…23633

# Conflicts:
#	src/transformers/models/regnet/modeling_regnet.py
#	src/transformers/models/resnet/modeling_resnet.py

…23633

# Conflicts:
#	src/transformers/models/mobilenet_v2/modeling_mobilenet_v2.py

…23633

# Conflicts:
#	src/transformers/models/deberta_v2/modeling_deberta_v2.py

…23633

# Conflicts:
#	src/transformers/models/efficientnet/modeling_efficientnet.py

…23633

# Conflicts:
#	src/transformers/models/gptj/modeling_gptj.py
Trace for this mergeability run: https://huggingface.co/datasets/evalstate/transformers-merge-experiments/blob/main/2604232336-Ocse9d__dev__codex.jsonl
Automated cluster merge for cluster-43979-11 against main.

Merged PRs:
Skipped PRs:
Failed PRs:

Notes:
Refactor resnet to use @capture_outputs/@can_return_tuple output tracing huggingface/transformers#44019 also overlaps with ResNet, but [ResNet] Refactor output tracing to decorator-based interface huggingface/transformers#44007 was broader because it covered ResNet, RegNet, and RT-DETR ResNet. Ran `python -m compileall` on the merged model files; compilation succeeded.

Next steps:
Run `make style` or `make fix-repo` to normalize imports and generated-code consistency before any further handoff.
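A self-contained version of the byte-compile check mentioned in the notes might look like the following (the file here is a throwaway written on the fly, standing in for the merged model files):

```shell
# Byte-compile a Python file the way the merge notes describe; a clean exit
# from `python -m compileall` means the file at least parses and compiles.
tmpfile=$(mktemp /tmp/model_check_XXXXXX.py)
echo "hidden_states = [1, 2, 3]" > "$tmpfile"
python -m compileall -q "$tmpfile" && echo "compile OK"
```

In the real workflow the path would instead be the touched files, e.g. `src/transformers/models/gptj/modeling_gptj.py`; note that compiling only proves syntax, so `make style` and the repo consistency checks are still needed.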