Fix Seq2SeqTrainer generation path for decoder-only models #44650

Open
shaealh wants to merge 1 commit into huggingface:main from shaealh:shaealh/seq2seq-decoder-only-generate

@shaealh shaealh commented Mar 13, 2026

Closes #44593

Summary

  • use generation_input_ids/generation_attention_mask for decoder-only models when they are provided
  • otherwise infer the prompt from the leading -100 labels and build a left-padded prompt batch
  • return only the completion tokens for decoder-only generation (the prompt is stripped)
  • keep encoder-decoder behavior unchanged
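
The fallback path in the second and third bullets can be illustrated with a minimal sketch. This is not the code from the PR (the diff is not shown here); it assumes that each row of input_ids holds the prompt followed by the completion, and that labels mask the prompt positions with -100. The helper names (leading_ignored, build_left_padded_prompts, strip_prompt) are hypothetical, and plain Python lists stand in for tensors.

```python
IGNORE_INDEX = -100  # label value ignored by the loss


def leading_ignored(label_row):
    """Count the IGNORE_INDEX entries at the start of one label row."""
    count = 0
    for value in label_row:
        if value != IGNORE_INDEX:
            break
        count += 1
    return count


def build_left_padded_prompts(input_ids, labels, pad_token_id):
    """Cut each row down to its prompt and left-pad to a common width.

    Assumption: the number of leading IGNORE_INDEX labels equals the
    prompt length of that row.
    """
    prompts = [row[:leading_ignored(lab)] for row, lab in zip(input_ids, labels)]
    width = max(len(p) for p in prompts)
    batch = [[pad_token_id] * (width - len(p)) + p for p in prompts]
    mask = [[0] * (width - len(p)) + [1] * len(p) for p in prompts]
    return batch, mask


def strip_prompt(generated, prompt_width):
    """Drop the (padded) prompt so only completion tokens remain."""
    return [row[prompt_width:] for row in generated]


# Two rows: a 3-token prompt and a 2-token prompt (second row right-padded).
input_ids = [[11, 12, 13, 21, 22], [14, 15, 23, 24, 0]]
labels = [[-100, -100, -100, 21, 22], [-100, -100, 23, 24, -100]]

batch, mask = build_left_padded_prompts(input_ids, labels, pad_token_id=0)
print(batch)  # [[11, 12, 13], [0, 14, 15]]
print(mask)   # [[1, 1, 1], [0, 1, 1]]

# A decoder-only generate call returns prompt + completion per row;
# the values below are made up to show the slicing only.
fake_generated = [[11, 12, 13, 21, 22], [0, 14, 15, 23, 24]]
print(strip_prompt(fake_generated, prompt_width=3))  # [[21, 22], [23, 24]]
```

Left padding keeps every prompt flush against the position where generation starts, so a single slice at the padded prompt width separates prompt from completion for the whole batch.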

Tests

  • PYTHONPATH=src python -m pytest tests/trainer/test_trainer_seq2seq.py -k Seq2SeqTrainerPredictionStepTester -q -rs

Related #26474, #33396
Follow-up to #32346

@Rocketknight1
Member

cc @SunMarc

Author

shaealh commented Apr 2, 2026

Still waiting for review and approval.

Author

shaealh commented Apr 27, 2026

Following up here. Still waiting for approval.

Development

Successfully merging this pull request may close these issues.

Support for sequence-level custom metrics with decoder-only models