
Remove many output_attentions and other traced outputs on 100+ models #43590

Merged
vasqu merged 191 commits into main from update_all_decorators on Mar 12, 2026
Conversation

@molbap (Contributor) commented Jan 29, 2026

What does this PR do?

New model additions often still follow the old standards instead of using check_model_inputs / can_return_tuple; pointing this out is a recurring first review comment and something that can easily slip through. This PR does a wide scan to remove all remaining occurrences systematically.

Background

Every model used to manually resolve output_attentions, output_hidden_states, and return_dict in each forward, then collect intermediate outputs in a loop, then convert to tuple at the end. That's ~30 lines of boilerplate per model, reimplemented everywhere with subtle inconsistencies.
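The removed boilerplate followed roughly this shape (a self-contained sketch with toy stand-ins for the config, layers, and output class; names are illustrative, not taken from any specific model):

```python
from dataclasses import dataclass
from typing import Optional

# Toy stand-ins (illustrative only) for the config, output container, and layer.
@dataclass
class Config:
    output_attentions: bool = False
    output_hidden_states: bool = False
    use_return_dict: bool = True

@dataclass
class BaseModelOutput:
    last_hidden_state: object = None
    hidden_states: Optional[tuple] = None
    attentions: Optional[tuple] = None

class Layer:
    def __call__(self, x, output_attentions=False):
        # Pretend forward: the "hidden state" is a float, the "weights" a tag.
        return x + 1.0, ("attn" if output_attentions else None)

class OldStyleModel:
    def __init__(self, config, n_layers=2):
        self.config = config
        self.layers = [Layer() for _ in range(n_layers)]

    def forward(self, hidden_states, output_attentions=None,
                output_hidden_states=None, return_dict=None):
        # 1. Resolve every flag against the config -- repeated in every model.
        output_attentions = (output_attentions if output_attentions is not None
                             else self.config.output_attentions)
        output_hidden_states = (output_hidden_states if output_hidden_states is not None
                                else self.config.output_hidden_states)
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # 2. Collect intermediates by hand inside the layer loop.
        all_hidden_states = () if output_hidden_states else None
        all_attentions = () if output_attentions else None
        for layer in self.layers:
            if output_hidden_states:
                all_hidden_states += (hidden_states,)
            hidden_states, attn = layer(hidden_states, output_attentions=output_attentions)
            if output_attentions:
                all_attentions += (attn,)
        if output_hidden_states:
            all_hidden_states += (hidden_states,)

        # 3. Convert to a plain tuple when return_dict is False.
        if not return_dict:
            return tuple(v for v in (hidden_states, all_hidden_states, all_attentions)
                         if v is not None)
        return BaseModelOutput(hidden_states, all_hidden_states, all_attentions)

model = OldStyleModel(Config())
out = model.forward(0.0, output_hidden_states=True)
print(out.last_hidden_state)   # 2.0
print(len(out.hidden_states))  # 3
```

All three numbered steps are model-independent, which is exactly why they could be factored into decorators.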

Two decorators now handle this:

  • @capture_outputs goes on the base model forward (the one with the layer loop). It reads output_attentions/output_hidden_states from kwargs or config, installs hooks on modules listed in _can_record_outputs, collects intermediate outputs automatically, injects them into the ModelOutput, and handles return_dict. The model just needs to declare which module classes produce which outputs (e.g. _can_record_outputs = {"hidden_states": DecoderLayer, "attentions": Attention}).

  • @can_return_tuple goes on wrapper forwards (ForCausalLM, ForSequenceClassification, VLM wrappers) that only need return_dict conversion. Wrapper models should not use @capture_outputs to avoid nested hook chains.
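The return_dict half of this can be sketched with a toy decorator (illustrative only; the real decorator in transformers does more, e.g. resolving the flag from the config as well as kwargs):

```python
import dataclasses
import functools

def can_return_tuple_sketch(forward):
    """Toy version of a return_dict-handling decorator: the wrapped forward
    always builds a dataclass output; the decorator converts it to a plain
    tuple (dropping None fields) when the caller passes return_dict=False."""
    @functools.wraps(forward)
    def wrapper(self, *args, return_dict=True, **kwargs):
        output = forward(self, *args, **kwargs)
        if not return_dict:
            return tuple(v for v in dataclasses.astuple(output) if v is not None)
        return output
    return wrapper

@dataclasses.dataclass
class CausalLMOutput:
    logits: object = None
    loss: object = None

class ToyForCausalLM:
    @can_return_tuple_sketch
    def forward(self, x):
        # The forward body no longer mentions return_dict at all.
        return CausalLMOutput(logits=x * 2)

model = ToyForCausalLM()
print(model.forward(3))                     # CausalLMOutput(logits=6, loss=None)
print(model.forward(3, return_dict=False))  # (6,)
```

The point of keeping this separate from the capture decorator is that wrapper forwards only need this tuple conversion, not the hook machinery.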

What changes per model

  • output_attentions, output_hidden_states, return_dict dropped from forward signatures, replaced by **kwargs: Unpack[TransformersKwargs]
  • Explicit parameter resolution lines removed
  • Manual all_hidden_states += (hidden_states,) collection loops removed
  • Decoder layers return a single tensor instead of a tuple
  • Attention modules always return (attn_output, attn_weights) — the if not output_attentions: attn_weights = None guard is removed since hooks capture directly from the module output
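The hook mechanism behind the last point can be sketched in plain Python (no torch; in the real implementation these are nn.Module forward hooks that the decorator installs on the classes listed in _can_record_outputs, and all names below are illustrative):

```python
class Module:
    """Minimal module with torch-style forward hooks (sketch only)."""
    def __init__(self):
        self._hooks = []
    def register_forward_hook(self, fn):
        self._hooks.append(fn)
    def __call__(self, *args, **kwargs):
        out = self.forward(*args, **kwargs)
        for fn in self._hooks:
            fn(self, args, out)
        return out

class Attention(Module):
    def forward(self, x):
        # Always return (attn_output, attn_weights): no
        # `if not output_attentions: attn_weights = None` guard needed,
        # because hooks capture the weights straight from this output.
        return x + 1, f"weights@{x}"

class Model(Module):
    # Declare which module class produces which recorded output.
    _can_record_outputs = {"attentions": Attention}
    def __init__(self):
        super().__init__()
        self.layers = [Attention(), Attention()]
    def forward(self, x, output_attentions=False):
        captured = []
        if output_attentions:
            for layer in self.layers:  # install hooks only when asked
                layer.register_forward_hook(
                    lambda mod, args, out: captured.append(out[1]))
        for layer in self.layers:
            x, _ = layer(x)  # layer itself never branches on the flag
        return {"last_hidden_state": x,
                "attentions": tuple(captured) if output_attentions else None}

m = Model()
out = m.forward(0, output_attentions=True)
print(out["attentions"])  # ('weights@0', 'weights@1')
```

This is why decoder layers can return a single tensor and attention modules can drop the guard: the flag now only controls whether hooks are installed, not what each module computes and returns.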

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/roberta", "models/roberta_prelayernorm", "models/roc_bert", "models/sam", "models/speech_to_text", "models/splinter", "models/stablelm", "models/time_series_transformer", "models/timesfm2_5", "models/timm_wrapper", "models/video_llama_3", "models/video_llava", "models/videomae"]
quantizations: []

@github-actions

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 68de84a6 workflow commit (merge commit)
PR fdaa667f branch commit (from PR)
main 535f289d base commit (on main)

Model CI Report

5 new failed tests from this PR 😭

  • time_series_transformer:
    tests/models/time_series_transformer/test_modeling_time_series_transformer.py::TimeSeriesTransformerModelTest::test_forward_signature (✅ ⟹ ❌)

  • video_llama_3:
    tests/models/video_llama_3/test_modeling_video_llama_3.py::VideoLlama3IntegrationTest::test_small_model_integration_test (❌ ⟹ ❌)
    tests/models/video_llama_3/test_modeling_video_llama_3.py::VideoLlama3IntegrationTest::test_small_model_integration_test_batch (❌ ⟹ ❌)
    tests/models/video_llama_3/test_modeling_video_llama_3.py::VideoLlama3IntegrationTest::test_small_model_integration_test_batch_different_resolutions (❌ ⟹ ❌)
    tests/models/video_llama_3/test_modeling_video_llama_3.py::VideoLlama3IntegrationTest::test_small_model_integration_test_batch_wo_image (❌ ⟹ ❌)

@vasqu (Contributor) commented Mar 12, 2026

run-slow: vipllava,vit_mae,vit_msn,vitpose_backbone,vivit,vjepa2,voxtral_realtime,xglm,xlm_roberta,xlm_roberta_xl,xlstm,xmod,yolos,zamba,qwen2_5_omni,qwen2_audio,qwen2_vl,qwen3_5

@github-actions

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen2_5_omni", "models/qwen2_audio", "models/qwen2_vl", "models/qwen3_5", "models/vipllava", "models/vit_mae", "models/vit_msn", "models/vitpose_backbone", "models/vivit", "models/vjepa2", "models/voxtral_realtime", "models/xglm", "models/xlm_roberta", "models/xlm_roberta_xl", "models/xlstm", "models/xmod", "models/yolos", "models/zamba"]
quantizations: []

@github-actions

CI Results

Workflow Run ⚙️


✅ No failing test specific to this PR 🎉 👏 !

@vasqu (Contributor) commented Mar 12, 2026

run-slow: qwen3_5_moe,qwen3_omni_moe,qwen3_vl,qwen3_vl_moe

@github-actions

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_5_moe", "models/qwen3_omni_moe", "models/qwen3_vl", "models/qwen3_vl_moe"]
quantizations: []

@vasqu vasqu mentioned this pull request Mar 12, 2026
5 tasks
@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, align, altclip, apertus, aria, audio_spectrogram_transformer, audioflamingo3, autoformer, aya_vision, bamba, bart, beit, bert, bert_generation, big_bird, bigbird_pegasus

@vasqu vasqu enabled auto-merge March 12, 2026 17:15
@vasqu vasqu disabled auto-merge March 12, 2026 17:23
@github-actions

CI Results

Workflow Run ⚙️


⚠️ Model CI failed to report results

The test failure analysis could not be completed. Please check the workflow run for details.

@vasqu vasqu enabled auto-merge March 12, 2026 18:44
@vasqu vasqu added this pull request to the merge queue Mar 12, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 12, 2026
@vasqu (Contributor) commented Mar 12, 2026

Force merging: the remaining failure is a flaky test, and this PR is also needed for a model addition (and several other refactors).

@vasqu vasqu merged commit e2d4ac0 into main Mar 12, 2026
29 checks passed
@vasqu vasqu deleted the update_all_decorators branch March 12, 2026 19:08
@vasqu vasqu mentioned this pull request Mar 12, 2026
2 tasks