Refactor GPT-J to use standardized output tracing (#43979) #44066

Open
jayavelubalaji-ai wants to merge 3 commits into huggingface:main from jayavelubalaji-ai:43979/refactor-gptj-output-tracing

Conversation

@jayavelubalaji-ai

Migrate GPT-J from manual boilerplate output collection to the new decorator-based output tracing system:

  • Add _can_record_outputs to GPTJPreTrainedModel
  • Add @capture_outputs and @merge_with_config_defaults to GPTJModel.forward
  • Add @can_return_tuple to GPTJForCausalLM, GPTJForSequenceClassification, and GPTJForQuestionAnswering forwards
  • Simplify GPTJBlock.forward to return hidden_states directly
  • Remove output_attentions, output_hidden_states, return_dict params from signatures (now handled by decorators)
  • Propagate changes to CodeGen via # Copied from annotation

Net reduction of ~70 lines of boilerplate code.
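The `return_dict` handling that `@can_return_tuple` centralizes can be pictured with a toy sketch. This is not the transformers implementation; `ToyCausalLMOutput` and this minimal decorator are invented here purely to illustrate the pattern the PR describes (the forward always builds a structured output, and the decorator downgrades it to a tuple on request):

```python
from dataclasses import dataclass, fields
from functools import wraps
from typing import Optional

def can_return_tuple(forward):
    """Toy sketch: the wrapped forward always returns a structured output
    object; the decorator converts it to a plain tuple when the caller
    passes return_dict=False, so the forward body never branches on it."""
    @wraps(forward)
    def wrapper(*args, return_dict=True, **kwargs):
        output = forward(*args, **kwargs)
        return output if return_dict else output.to_tuple()
    return wrapper

@dataclass
class ToyCausalLMOutput:
    # Hypothetical stand-in for a ModelOutput-style dataclass.
    logits: Optional[int] = None
    loss: Optional[float] = None

    def to_tuple(self):
        # Keep only the fields that were actually populated.
        return tuple(v for f in fields(self) if (v := getattr(self, f.name)) is not None)

@can_return_tuple
def forward(input_ids):
    # The body builds the rich output unconditionally.
    return ToyCausalLMOutput(logits=input_ids * 2)
```

Calling `forward(3)` yields the dataclass, while `forward(3, return_dict=False)` yields `(6,)`; the `return_dict` parameter disappears from every forward signature because the decorator owns it.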

What does this PR do?

This PR migrates the GPT-J model from manual boilerplate output collection to the new standardized decorator-based output tracing system introduced in #43979.

Before: Each forward method manually resolved output_attentions, output_hidden_states, and return_dict from config defaults, maintained accumulator lists (all_hidden_states, all_self_attentions), and conditionally appended outputs in the decoder loop.

After: Decorators (@capture_outputs, @can_return_tuple, @merge_with_config_defaults) and PyTorch forward hooks handle all output collection automatically. GPTJBlock.forward returns only hidden_states (a single tensor) instead of a tuple, and wrapper model forwards use attribute access on the output object.
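The accumulator-free decoder loop can be sketched in miniature. The `Block`/`Model` classes and the decorator below are toy stand-ins invented for illustration; the real system uses PyTorch forward hooks rather than the temporary monkey-patching shown here, but the division of labor is the same: the loop stays clean and the wrapper records intermediate states.

```python
from functools import wraps

class Block:
    """Stand-in for the refactored GPTJBlock: forward returns the new
    hidden state directly, not a (hidden_states, attentions) tuple."""
    def forward(self, hidden_states):
        return hidden_states + 1

def capture_outputs(forward):
    """Toy capture decorator: wrap each block's forward for the duration
    of the call and record every intermediate hidden state, so the model
    forward carries no all_hidden_states accumulator list."""
    @wraps(forward)
    def wrapper(self, hidden_states, output_hidden_states=False):
        recorded, originals = [], []
        if output_hidden_states:
            for block in self.blocks:
                originals.append((block, block.forward))
                def hooked(h, _orig=block.forward):
                    out = _orig(h)
                    recorded.append(out)  # side channel, like a forward hook
                    return out
                block.forward = hooked
        try:
            last_hidden_state = forward(self, hidden_states)
        finally:
            for block, orig in originals:  # always restore the originals
                block.forward = orig
        return last_hidden_state, tuple(recorded) or None
    return wrapper

class Model:
    def __init__(self, num_blocks=3):
        self.blocks = [Block() for _ in range(num_blocks)]

    @capture_outputs
    def forward(self, hidden_states):
        # No `if output_hidden_states:` branches anywhere in the loop.
        for block in self.blocks:
            hidden_states = block.forward(hidden_states)
        return hidden_states
```

With three blocks, `Model().forward(0, output_hidden_states=True)` returns the final state `3` alongside the recorded intermediates `(1, 2, 3)`, while a plain call records nothing.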

Changes to CodeGenBlock were auto-propagated via make fix-repo through the existing # Copied from transformers.models.gptj.modeling_gptj.GPTJBlock annotation.

Tests: all 107 GPT-J model tests pass; 139 are skipped as GPU/Hub-dependent, which is expected on a CPU-only machine.

Fixes #43979 (partial: GPT-J model only)

jayavelubalaji-ai and others added 3 commits February 17, 2026 00:08
Migrate GPT-J from manual boilerplate output collection to the new decorator-based output tracing system
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: codegen, gptj

@jayavelubalaji-ai
Author

Hi @ArthurZucker, could you please review this?


Development

Successfully merging this pull request may close these issues.

Call to contributions: refactor output tracing in transformers
