cp: fix Qwen3.5+Phi4MM CI after transformers v5.5 update(1906) into r0.4.0 #1908
Merged
…date (#1906)

* fix: restore Qwen3.5 + Phi-4-MM nightly CI after transformers v5.5 update

  - Port Qwen3.5 MoE `CPAwareGatedDeltaNet._forward_no_cp` to the v5.5 per-layer cache API (`has_previous_state` method, `cache.layers[idx].{conv,recurrent}_states`, `update_conv_state`/`update_recurrent_state`). This fixes `AttributeError: 'DynamicCache' object has no attribute 'conv_states'` on every forward pass.
  - Bridge the legacy `_supports_flash_attn_2` class flag to v5.5's `_supports_flash_attn` (renamed, and default-False on the base). Remote-code models pinned against <=v5.3 (e.g. microsoft/Phi-4-multimodal-instruct) only set the legacy flag, so their FA2 support becomes invisible to v5.5; FA2 dispatch then raises `ValueError` even though the model supports it. Install a property on `PreTrainedModel` that honors the legacy flag as a fallback when a subclass has not set the new one. Subclasses that set the new flag directly still shadow the property via MRO, so native models are unaffected.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* test: cover FA2 flag bridge and Qwen3.5 v5.5 cache API

  - `TestPatchLegacyFlashAttnFlag`: legacy `_supports_flash_attn_2 = True` bridges to `_supports_flash_attn`; explicit new flag (True/False) shadows via MRO; `False` legacy flag does not bridge; nearest-in-MRO wins; idempotent.
  - `TestForwardNoCpV55CacheAPI`: `_forward_no_cp` runs with a fresh `DynamicCache` (training path), runs without a cache, calls `update_conv_state` / `update_recurrent_state` with the layer's `layer_idx`, and calls `has_previous_state(layer_idx)` as a method.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* test: simplify update_conv_state arg assertion

  Addresses review nit: the production call is always positional, so the keyword-fallback branch was dead code.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
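For readers unfamiliar with the shadowing trick described in the FA2 flag bridge, here is a minimal sketch of the idea. It is not the code from this PR: `Base` is a hypothetical stand-in for `PreTrainedModel`, `patch_legacy_flag` is an illustrative name, and the sketch only covers instance-level attribute access (reading the flag on the class itself would return the property object).

```python
class Base:
    """Hypothetical stand-in for a framework base class (not transformers' PreTrainedModel)."""

    _supports_flash_attn = False  # new-style flag, off by default


def patch_legacy_flag(base_cls):
    """Install a fallback property on base_cls. Calling it again is a no-op."""
    if isinstance(base_cls.__dict__.get("_supports_flash_attn"), property):
        return  # already patched

    def _legacy_fallback(self):
        # Reached only when no subclass in the MRO defines the new flag itself,
        # because a plain class attribute on a subclass is found before Base.
        return bool(getattr(type(self), "_supports_flash_attn_2", False))

    base_cls._supports_flash_attn = property(_legacy_fallback)


class LegacyModel(Base):
    _supports_flash_attn_2 = True  # only the old flag is set


class NativeModel(Base):
    _supports_flash_attn = True  # new flag set directly, shadows the property


class OptedOutModel(Base):
    _supports_flash_attn_2 = True
    _supports_flash_attn = False  # explicit new flag wins over the legacy one


class PlainModel(Base):
    pass  # neither flag set


patch_legacy_flag(Base)
patch_legacy_flag(Base)  # second call does nothing
```

With the patch applied, `LegacyModel()._supports_flash_attn` evaluates to `True` through the fallback, while `OptedOutModel()` still reports `False` because its own class attribute is found first in the MRO.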
Contributor
Author
/ok to test 3f1a214
Title changed from "fix: restore Qwen3.5 + Phi-4-MM nightly CI after transformers v5.5 update (1906) into r0.4.0" to "restore Qwen3.5 + Phi-4-MM nightly CI after transformers v5.5 update (1906) into r0.4.0"
Title changed from "restore Qwen3.5 + Phi-4-MM nightly CI after transformers v5.5 update (1906) into r0.4.0" to "restore Qwen3.5 + Phi4MM CI after transformers v5.5 update (1906) into r0.4.0"
Title changed from "restore Qwen3.5 + Phi4MM CI after transformers v5.5 update (1906) into r0.4.0" to "fix Qwen3.5+Phi4MM CI after transformers v5.5 update(1906) into r0.4.0"
Contributor
/ok to test 3f1a214
HuiyingLi
approved these changes
Apr 19, 2026
beep boop [🤖]: Hi @HuiyingLi 👋,