Remove many output_attentions and other traced outputs on 100+ models #43590
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

This comment contains models: ["models/roberta", "models/roberta_prelayernorm", "models/roc_bert", "models/sam", "models/speech_to_text", "models/splinter", "models/stablelm", "models/time_series_transformer", "models/timesfm2_5", "models/timm_wrapper", "models/video_llama_3", "models/video_llava", "models/videomae"]
Model CI Report: ❌ 5 new failed tests from this PR 😭
run-slow: vipllava,vit_mae,vit_msn,vitpose_backbone,vivit,vjepa2,voxtral_realtime,xglm,xlm_roberta,xlm_roberta_xl,xlstm,xmod,yolos,zamba,qwen2_5_omni,qwen2_audio,qwen2_vl,qwen3_5

This comment contains models: ["models/qwen2_5_omni", "models/qwen2_audio", "models/qwen2_vl", "models/qwen3_5", "models/vipllava", "models/vit_mae", "models/vit_msn", "models/vitpose_backbone", "models/vivit", "models/vjepa2", "models/voxtral_realtime", "models/xglm", "models/xlm_roberta", "models/xlm_roberta_xl", "models/xlstm", "models/xmod", "models/yolos", "models/zamba"]

run-slow: qwen3_5_moe,qwen3_omni_moe,qwen3_vl,qwen3_vl_moe

This comment contains models: ["models/qwen3_5_moe", "models/qwen3_omni_moe", "models/qwen3_vl", "models/qwen3_vl_moe"]
[For maintainers] Suggested jobs to run (before merge) run-slow: aimv2, align, altclip, apertus, aria, audio_spectrogram_transformer, audioflamingo3, autoformer, aya_vision, bamba, bart, beit, bert, bert_generation, big_bird, bigbird_pegasus |
The test failure analysis could not be completed. Please check the workflow run for details.
Force merging: the failing test is flaky, and this PR is also important for a model addition (and several other refactors).
What does this PR do?
In model additions, we often see old standards not using `check_model_inputs` / `can_return_tuple`, and it's often a first review comment / something that can slip through. Doing a wide scan to try to remove all occurrences systematically.

Background
Every model used to manually resolve `output_attentions`, `output_hidden_states`, and `return_dict` in each `forward`, then collect intermediate outputs in a loop, then convert to tuple at the end. That's ~30 lines of boilerplate per model, reimplemented everywhere with subtle inconsistencies; a sketch of the old pattern follows.
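A minimal sketch of that old pattern (class and attribute names like `MyOldModel` / `embed_tokens` are hypothetical stand-ins, not taken from any specific model in this PR):

```python
from transformers import PreTrainedModel
from transformers.modeling_outputs import BaseModelOutput


class MyOldModel(PreTrainedModel):  # hypothetical pre-refactor base model
    def forward(self, input_ids, output_attentions=None, output_hidden_states=None, return_dict=None):
        # Manual config fallbacks, duplicated in every model.
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        all_hidden_states = () if output_hidden_states else None
        all_attentions = () if output_attentions else None

        hidden_states = self.embed_tokens(input_ids)
        for layer in self.layers:
            # Per-iteration collection, repeated in every layer loop.
            if output_hidden_states:
                all_hidden_states += (hidden_states,)
            hidden_states, attn_weights = layer(hidden_states, output_attentions=output_attentions)
            if output_attentions:
                all_attentions += (attn_weights,)
        if output_hidden_states:
            all_hidden_states += (hidden_states,)

        # Manual tuple conversion when return_dict is False.
        if not return_dict:
            return tuple(v for v in (hidden_states, all_hidden_states, all_attentions) if v is not None)
        return BaseModelOutput(
            last_hidden_state=hidden_states,
            hidden_states=all_hidden_states,
            attentions=all_attentions,
        )
```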
Two decorators now handle this:

- `@capture_outputs` goes on the base model `forward` (the one with the layer loop). It reads `output_attentions` / `output_hidden_states` from kwargs or config, installs hooks on modules listed in `_can_record_outputs`, collects intermediate outputs automatically, injects them into the `ModelOutput`, and handles `return_dict`. The model just needs to declare which module classes produce which outputs (e.g. `_can_record_outputs = {"hidden_states": DecoderLayer, "attentions": Attention}`).
- `@can_return_tuple` goes on wrapper forwards (`ForCausalLM`, `ForSequenceClassification`, VLM wrappers) that only need `return_dict` conversion. Wrapper models should not use `@capture_outputs`, to avoid nested hook chains. A sketch of both is shown after this list.
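A minimal sketch of the new pattern, under stated assumptions: the decorator import path is a guess (check the actual diff), and `MyModel`, `MyForCausalLM`, `DecoderLayer`, and `Attention` are illustrative names, not verbatim from the PR:

```python
from typing import Unpack

import torch.nn as nn
from transformers import PreTrainedModel
from transformers.modeling_outputs import BaseModelOutput, CausalLMOutput
# Import path assumed; the decorators may live elsewhere in transformers.utils.
from transformers.utils.generic import TransformersKwargs, can_return_tuple, capture_outputs


class Attention(nn.Module): ...      # stub standing in for a real attention module
class DecoderLayer(nn.Module): ...   # stub standing in for a real decoder layer


class MyModel(PreTrainedModel):
    # Map output names to the module classes that produce them; the decorator
    # installs hooks on instances of these classes and records their outputs.
    _can_record_outputs = {
        "hidden_states": DecoderLayer,
        "attentions": Attention,
    }

    @capture_outputs
    def forward(self, input_ids, **kwargs: Unpack[TransformersKwargs]) -> BaseModelOutput:
        # __init__ / submodule construction elided for brevity.
        hidden_states = self.embed_tokens(input_ids)
        for layer in self.layers:
            hidden_states = layer(hidden_states)  # no per-iteration collection
        # hidden_states/attentions tuples are injected into the output, and
        # return_dict is handled, by the decorator.
        return BaseModelOutput(last_hidden_state=hidden_states)


class MyForCausalLM(PreTrainedModel):
    @can_return_tuple  # wrapper only needs return_dict conversion, no hooks
    def forward(self, input_ids, **kwargs: Unpack[TransformersKwargs]) -> CausalLMOutput:
        outputs: BaseModelOutput = self.model(input_ids, **kwargs)
        logits = self.lm_head(outputs.last_hidden_state)
        return CausalLMOutput(logits=logits, hidden_states=outputs.hidden_states)
```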
What changes per model

- `output_attentions`, `output_hidden_states`, `return_dict` dropped from forward signatures, replaced by `**kwargs: Unpack[TransformersKwargs]`
- `all_hidden_states += (hidden_states,)` collection loops removed
- Attention modules keep returning `(attn_output, attn_weights)`; the `if not output_attentions: attn_weights = None` guard is removed, since hooks capture directly from the module output (see the sketch below)
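To make the last bullet concrete, a hedged before/after sketch of a single attention module; `_attend` is a hypothetical helper standing in for the real attention computation:

```python
from typing import Unpack

import torch.nn as nn
from transformers.utils.generic import TransformersKwargs  # import path assumed


class OldAttention(nn.Module):
    def forward(self, hidden_states, output_attentions=False):
        attn_output, attn_weights = self._attend(hidden_states)  # _attend: hypothetical helper
        if not output_attentions:
            attn_weights = None  # guard removed by this PR
        return attn_output, attn_weights


class NewAttention(nn.Module):
    # Always return (attn_output, attn_weights); the capture_outputs hook on
    # the base model records the weights only when attentions were requested.
    def forward(self, hidden_states, **kwargs: Unpack[TransformersKwargs]):
        attn_output, attn_weights = self._attend(hidden_states)  # _attend: hypothetical helper
        return attn_output, attn_weights
```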