
Migrate wav2vec2, wav2vec2_conformer, and wav2vec2_bert to standardized output collection decorators#44114

Open
23atharvaS wants to merge 12 commits into huggingface:main from 23atharvaS:wav2vec2-output-capturing-migration

Conversation


@23atharvaS 23atharvaS commented Feb 17, 2026

Summary

This PR migrates the wav2vec2 family to the standardized output-capturing interface (@capture_outputs + @can_return_tuple) and includes the follow-up compatibility fixes required to make the full CI run green.

What changed

Core migration (wav2vec2, wav2vec2_conformer, wav2vec2_bert)

  • Added _can_record_outputs on:
    • Wav2Vec2PreTrainedModel
    • Wav2Vec2ConformerPreTrainedModel
    • Wav2Vec2BertPreTrainedModel
  • Added @capture_outputs on base model forwards:
    • Wav2Vec2Model.forward
    • Wav2Vec2ConformerModel.forward (via modular -> generated)
    • Wav2Vec2BertModel.forward (via modular -> generated)
  • Added @can_return_tuple on wrapper forwards (CTC / sequence classification / audio frame classification / xvector / pretraining where applicable).
  • Removed manual hidden-state/attention collection loops and legacy output plumbing in migrated paths.
  • Updated encoder layer return flow to align with hook-based output capture.
  • In the conformer/bert modular encoders, ensured that only output_attentions is forwarded to self-attention (preventing unrelated kwargs from propagating too deeply into the attention call).
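To make the migration concrete, here is a minimal, self-contained sketch of the pattern described above: a class-level `_can_record_outputs` declaration plus a `capture_outputs`-style decorator that collects intermediate values during `forward`, so callers no longer write manual hidden-state/attention collection loops. The names `capture_outputs` and `_can_record_outputs` come from this PR; the mechanics below (a simple recorder object instead of real module hooks) are an illustrative assumption, not the actual transformers implementation.

```python
from functools import wraps


class Recorder:
    """Collects values that layers emit while a forward pass runs."""

    def __init__(self):
        self.records = {}

    def record(self, key, value):
        self.records.setdefault(key, []).append(value)


def capture_outputs(forward):
    """Attach everything the class declares in ``_can_record_outputs``
    to the forward result, without manual collection loops."""

    @wraps(forward)
    def wrapper(self, *args, **kwargs):
        recorder = Recorder()
        self._recorder = recorder  # layers record through this handle
        try:
            result = forward(self, *args, **kwargs)
        finally:
            self._recorder = None
        for key in type(self)._can_record_outputs:
            result[key] = tuple(recorder.records.get(key, ()))
        return result

    return wrapper


class TinyModel:
    # Declares which intermediate outputs the model can capture,
    # mirroring the ``_can_record_outputs`` attribute added in this PR.
    _can_record_outputs = ("hidden_states", "attentions")

    @capture_outputs
    def forward(self, x):
        for layer_idx in range(2):
            x = x + 1  # stand-in for a real encoder layer
            self._recorder.record("hidden_states", x)
            self._recorder.record("attentions", layer_idx)
        return {"last_hidden_state": x}


out = TinyModel().forward(0)
```

With this shape, wrapper heads (CTC, classification, xvector) only need `@can_return_tuple`-style handling of the result object; the capture itself lives in one place on the base model.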

Follow-up compatibility fixes (from CI)

  • Restored config-driven forwarding of output_attentions / output_hidden_states in Wav2Vec2Model.forward so config-only output requests are honored.
  • Synced the same behavior across related wav2vec2-derived models through modular/generated updates:
    • hubert
    • sew
    • unispeech
    • unispeech_sat
  • Fixed SEWEncoderLayer output contract to match encoder expectations.
  • Updated UniSpeechForPreTraining / UniSpeechSatForPreTraining to propagate config-driven output flags.
  • Updated XcodecModel to accept and thread capture-output kwargs (output_attentions, output_hidden_states) through its semantic backbone call path, fixing test_capture_outputs_decorator.
  • Kept attention return compatibility where required by downstream copied implementations.
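The config-driven forwarding that the CI fixes restore follows the long-standing transformers convention: an explicit kwarg wins, otherwise the flag stored on the model config takes effect, so config-only output requests are still honored. The sketch below is a generic illustration of that fallback (the class and field names are placeholders, not the wav2vec2 or xcodec code):

```python
from dataclasses import dataclass


@dataclass
class Config:
    output_attentions: bool = False
    output_hidden_states: bool = False


class Model:
    def __init__(self, config):
        self.config = config

    def forward(self, x, output_attentions=None, output_hidden_states=None):
        # Explicit kwargs win; otherwise fall back to the config flags.
        output_attentions = (
            output_attentions
            if output_attentions is not None
            else self.config.output_attentions
        )
        output_hidden_states = (
            output_hidden_states
            if output_hidden_states is not None
            else self.config.output_hidden_states
        )
        # A backbone call would thread these kwargs through, as the
        # XcodecModel fix does for its semantic backbone path.
        return {
            "attns_requested": output_attentions,
            "hidden_requested": output_hidden_states,
        }


model = Model(Config(output_attentions=True))
config_only = model.forward(0)               # no explicit kwargs
overridden = model.forward(0, output_attentions=False)
```

Here `config_only` reflects the config flag while `overridden` shows an explicit kwarg taking precedence, which is exactly the contract `test_capture_outputs_decorator` exercises.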

Files touched (high-level)

  • src/transformers/models/wav2vec2/modeling_wav2vec2.py
  • src/transformers/models/wav2vec2_conformer/modular_wav2vec2_conformer.py
  • src/transformers/models/wav2vec2_conformer/modeling_wav2vec2_conformer.py (regenerated)
  • src/transformers/models/wav2vec2_bert/modular_wav2vec2_bert.py
  • src/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py (regenerated)
  • src/transformers/models/hubert/modular_hubert.py
  • src/transformers/models/hubert/modeling_hubert.py (regenerated)
  • src/transformers/models/sew/modular_sew.py
  • src/transformers/models/sew/modeling_sew.py (regenerated)
  • src/transformers/models/unispeech/modular_unispeech.py
  • src/transformers/models/unispeech/modeling_unispeech.py (regenerated)
  • src/transformers/models/unispeech_sat/modular_unispeech_sat.py
  • src/transformers/models/unispeech_sat/modeling_unispeech_sat.py (regenerated)
  • src/transformers/models/xcodec/modeling_xcodec.py

Validation

Regeneration / consistency

  • python utils/check_modular_conversion.py --fix_and_overwrite
  • python utils/check_modular_conversion.py
  • python utils/check_copies.py --fix_and_overwrite (when needed by CI drift)

Code quality

  • python -m ruff check ... on touched files (clean)

Focused output-interface tests

  • python -m pytest tests/models/wav2vec2/test_modeling_wav2vec2.py tests/models/wav2vec2_conformer/test_modeling_wav2vec2_conformer.py tests/models/wav2vec2_bert/test_modeling_wav2vec2_bert.py -k "capture_outputs_decorator or attention_outputs or hidden_states_output or model_outputs_equivalence" -q
  • Result: pass

Targeted regression reruns for CI failures

  • hubert: test_attention_outputs (regular + robust) — pass
  • sew: test_attention_outputs — pass
  • unispeech: test_attention_outputs (robust) — pass
  • unispeech_sat: test_attention_outputs (regular + robust) — pass
  • xcodec: test_capture_outputs_decorator — pass
  • xcodec: test_model_forward_default_config_values — pass
  • glm_ocr: test_generate_with_and_without_position_ids — pass

CI status

  • tests_torch and all major CircleCI shards pass.
  • Remaining merge blockers are maintainer workflow approval / required review, not code failures.

Notes

  • On Windows, test_save_load may fail with a safetensors file-lock issue (os error 1224), which appears environment-specific and unrelated to this migration logic.

@23atharvaS
Author

23atharvaS commented Feb 18, 2026

This PR is still in the testing phase on CI.

@23atharvaS
Author

The PR has passed all of the CircleCI tests.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: data2vec, hubert, patchtsmixer, patchtst, sew, sew_d, unispeech, unispeech_sat, wav2vec2, wav2vec2_bert, wav2vec2_conformer, wavlm, xcodec
