[transformers] set return dict false for transformers v5 compatibility#1325

Merged
erictang000 merged 4 commits into NovaSky-AI:main from erictang000:return_dict_false
Mar 20, 2026
Conversation

erictang000 (Collaborator) commented Mar 14, 2026

Overview

This PR makes two changes for transformers-v5 compatibility:

  • Sets return_dict=False where needed. This can be merged before explicitly upgrading to transformers v5 in pyproject.toml, since vllm still does not fully support v5 in its latest release. The change is also backwards compatible with transformers v4.*, since prior to v5 the default value for return_dict was None, which was interpreted as False.
  • Checks whether fsdp_transformer_layer_cls_to_wrap is a set for v5 compatibility, while maintaining backwards compatibility.

Breaking huggingface/transformers PR that changes the return_dict behavior for v5: huggingface/transformers#42567
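The two changes above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual SkyRL code: the helper name `normalize_layer_cls_to_wrap` is hypothetical, and the model call is shown only in a comment since it requires a loaded model.

```python
# Minimal sketch of the two compatibility patterns (hypothetical helper name,
# not the actual SkyRL code).
#
# 1. Pass return_dict=False explicitly so the model returns a plain tuple on
#    both transformers v4 and v5, e.g.:
#
#       logits = model(input_ids, attention_mask=mask, return_dict=False)[0]
#
# 2. Accept both container types for fsdp_transformer_layer_cls_to_wrap:
#    transformers v5 may hand back a set where v4 used a list.
def normalize_layer_cls_to_wrap(layer_cls):
    """Return a list of layer-class names from either a set (v5) or a list (v4)."""
    if isinstance(layer_cls, set):
        # sets are unordered, so sort for a deterministic wrapping policy
        return sorted(layer_cls)
    return list(layer_cls)
```

Both `normalize_layer_cls_to_wrap({"LlamaDecoderLayer"})` and `normalize_layer_cls_to_wrap(["LlamaDecoderLayer"])` then yield `["LlamaDecoderLayer"]`.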

Validation

Checked that all CPU tests pass with both transformers < 5.0.0 and transformers==5.3.0, and that examples/train/gsm8k/run_gsm8k runs with both the old and new transformers.


# For a simple chat template, the fixed base approach is expected to behave the same
# as `apply_chat_template`. We compare decoded strings rather than raw token IDs
expected_token_ids = tokenizer_w_dummy_template.apply_chat_template(messages)
erictang000 (Collaborator, Author):

This comparison between the full chat template and stripping the base only happens in this test, not in the SkyRLGymGenerator.

Member:

Is it broken by v5? I wonder why it didn't surface until now.

erictang000 (Collaborator, Author):

Yeah, I asked Claude and it said the relevant PR was huggingface/transformers#40936, and that in that PR:

  • LlamaTokenizer was rewritten to inherit from TokenizersBackend (Rust) instead of the old Python SentencePiece backend
  • legacy was changed from defaulting to True to defaulting to False
  • The _get_prepend_scheme helper was added to select "first" vs "always" based on the legacy flag
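A toy illustration of why the test in this thread compares decoded strings rather than raw token IDs: different tokenizer backends can split the same text into different pieces, producing different IDs that nonetheless decode to identical strings. The vocabularies below are made up and do not correspond to any real tokenizer.

```python
# Toy example (made-up vocabularies, not real tokenizers): two backends split
# the same text differently, so token IDs differ while decoded strings match.
vocab_a = {1: "▁Hello", 2: "▁world"}            # backend A: coarse pieces
vocab_b = {10: "▁He", 11: "llo", 12: "▁world"}  # backend B: finer pieces

def decode(ids, vocab):
    # SentencePiece-style decoding: join pieces, turn "▁" markers into spaces
    return "".join(vocab[i] for i in ids).replace("▁", " ").strip()

ids_a = [1, 2]
ids_b = [10, 11, 12]
assert ids_a != ids_b                                     # raw IDs differ
assert decode(ids_a, vocab_a) == decode(ids_b, vocab_b)   # both "Hello world"
```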


CharlieFRuan (Member) left a comment:

Thank you!

@erictang000 erictang000 merged commit b2242a0 into NovaSky-AI:main Mar 20, 2026
5 of 6 checks passed
@erictang000 erictang000 deleted the return_dict_false branch March 20, 2026 23:38