
[GPT-J] Refactor output tracing to use capture_outputs/can_return_tuple decorators#44084

Closed
Zephyr-Blessed wants to merge 1 commit into huggingface:main from Zephyr-Blessed:refactor/gptj-output-tracing

Conversation

@Zephyr-Blessed

What does this PR do?

Refactors the GPT-J model to use the new capture_outputs and can_return_tuple decorators for output tracing, following the same pattern as #44046 (CodeGen).

Changes:

  • Added @capture_outputs decorator on GPTJModel.forward
  • Added @can_return_tuple decorator on GPTJForCausalLM.forward, GPTJForSequenceClassification.forward, and GPTJForQuestionAnswering.forward
  • Added _can_record_outputs class attribute on GPTJPreTrainedModel
  • Removed output_attentions, output_hidden_states, and return_dict parameters from forward signatures (handled by decorators)
  • Removed manual all_hidden_states/all_self_attentions collection loops
  • Removed gradient checkpointing use_cache warning (handled by decorator)
  • Simplified decoder block to return just hidden_states instead of tuple
  • Wrapper models now use outputs.last_hidden_state instead of outputs[0]

Fixes #43979 (GPT-J part)
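The decorator pattern described above can be illustrated with a minimal, self-contained sketch. Note this is a hypothetical reimplementation for explanation only: the `can_return_tuple` below, the `ModelOutput` dataclass, and `ToyModel` are simplified stand-ins, not the actual decorators or classes in transformers.

```python
from dataclasses import dataclass, fields
from functools import wraps


def can_return_tuple(forward):
    """Sketch of a `can_return_tuple`-style decorator: it pops `return_dict`
    from the call's kwargs so the wrapped forward() no longer needs the
    parameter in its signature, and converts the structured output to a
    plain tuple when `return_dict=False` is requested."""
    @wraps(forward)
    def wrapper(self, *args, **kwargs):
        return_dict = kwargs.pop("return_dict", True)
        output = forward(self, *args, **kwargs)
        if not return_dict:
            # Flatten the dataclass fields into a tuple, matching the
            # legacy tuple-return behaviour.
            return tuple(getattr(output, f.name) for f in fields(output))
        return output
    return wrapper


@dataclass
class ModelOutput:
    last_hidden_state: list
    hidden_states: tuple = None


class ToyModel:
    @can_return_tuple
    def forward(self, x):
        # forward() itself only builds the structured output; the decorator
        # handles the tuple conversion, as in the refactor above.
        return ModelOutput(last_hidden_state=[v * 2 for v in x])


model = ToyModel()
structured = model.forward([1, 2])                     # ModelOutput instance
as_tuple = model.forward([1, 2], return_dict=False)    # plain tuple
```

This is also why wrapper models can switch from `outputs[0]` to `outputs.last_hidden_state`: the structured object is now the canonical return value, and the tuple form is derived from it only on request.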

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gptj

@Zephyr-Blessed
Author

Closing: GPT-J was already claimed. Sorry for the duplicate!



Development

Successfully merging this pull request may close these issues.

Call to contributions: refactor output tracing in transformers
