cp: fix Qwen3.5+Phi4MM CI after transformers v5.5 update(1906) into r0.4.0#1908

Merged
HuiyingLi merged 1 commit into r0.4.0 from cherry-pick-1906-r0.4.0
Apr 19, 2026


@svcnvidia-nemo-ci
Contributor

beep boop [🤖]: Hi @HuiyingLi 👋,

we've cherry picked #1906 into  for you! 🚀

Please review and approve this cherry pick at your convenience!

…date (#1906)

* fix: restore Qwen3.5 + Phi-4-MM nightly CI after transformers v5.5 update

- Port Qwen3.5 MoE CPAwareGatedDeltaNet._forward_no_cp to the v5.5 per-layer
  cache API (has_previous_state method, cache.layers[idx].{conv,recurrent}_states,
  update_conv_state/update_recurrent_state) — fixes
  AttributeError: 'DynamicCache' object has no attribute 'conv_states' on every
  forward pass.
- Bridge the legacy `_supports_flash_attn_2` class flag to v5.5's
  `_supports_flash_attn` (renamed + default-False on the base). Remote-code
  models pinned against <=v5.3 (e.g. microsoft/Phi-4-multimodal-instruct) only
  set the legacy flag and their FA2 support becomes invisible to v5.5 — FA2
  dispatch then raises ValueError even though the model supports it. Install
  a property on PreTrainedModel that honors the legacy flag as a fallback
  when a subclass has not set the new one; subclasses that set the new flag
  directly still shadow the property via MRO, so native models are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
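
The flag bridge described in the second bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `BaseModel`, `LegacyRemoteModel`, `NativeModel`, and `patch_legacy_flash_attn_flag` are hypothetical stand-ins, and only the two flag names come from the commit message. It shows why a property on the base class acts as a fallback while a subclass that sets the new flag directly still wins through normal MRO lookup.

```python
class BaseModel:
    """Hypothetical stand-in for the framework base class (not the real PreTrainedModel)."""
    _supports_flash_attn = False  # new-style flag, default False on the base


class LegacyRemoteModel(BaseModel):
    """Mimics remote code written against the old flag name only."""
    _supports_flash_attn_2 = True


class NativeModel(BaseModel):
    """Sets the new flag explicitly; this class attribute shadows the base property."""
    _supports_flash_attn = False
    _supports_flash_attn_2 = True  # present, but never consulted for this class


def patch_legacy_flash_attn_flag(base_cls):
    """Install a fallback property on base_cls. Calling it twice is a no-op."""
    if getattr(base_cls, "_legacy_fa_bridge_installed", False):
        return

    def _legacy_fallback(self):
        # Reached only when no subclass in the MRO defines the new flag itself.
        return bool(getattr(type(self), "_supports_flash_attn_2", False))

    base_cls._supports_flash_attn = property(_legacy_fallback)
    base_cls._legacy_fa_bridge_installed = True


patch_legacy_flash_attn_flag(BaseModel)
patch_legacy_flash_attn_flag(BaseModel)  # idempotent: second call returns early

print(LegacyRemoteModel()._supports_flash_attn)  # True: legacy flag bridged
print(NativeModel()._supports_flash_attn)        # False: explicit new flag shadows
print(BaseModel()._supports_flash_attn)          # False: no legacy flag set
```

Note that the sketch reads the flag on instances; reading it on the class itself would return the property object rather than a boolean for classes that rely on the fallback.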

* test: cover FA2 flag bridge and Qwen3.5 v5.5 cache API

- TestPatchLegacyFlashAttnFlag: legacy `_supports_flash_attn_2 = True` bridges
  to `_supports_flash_attn`; explicit new flag (True/False) shadows via MRO;
  `False` legacy flag does not bridge; nearest-in-MRO wins; idempotent.
- TestForwardNoCpV55CacheAPI: `_forward_no_cp` runs with a fresh DynamicCache
  (training path), runs without a cache, calls `update_conv_state` /
  `update_recurrent_state` with the layer's `layer_idx`, and calls
  `has_previous_state(layer_idx)` as a method.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
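
To make the cache behavior under test more concrete, here is a toy sketch of a per-layer cache with the access pattern named in the commit messages: `has_previous_state(layer_idx)` as a method, state stored under `cache.layers[idx]`, and `update_conv_state` / `update_recurrent_state` taking the layer index. Everything else is an assumption for illustration: `PerLayerCache`, `LayerState`, and `forward_step` are hypothetical, the argument order of the update methods is a guess, and none of this is the transformers implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerState:
    """Hypothetical per-layer slot holding both kinds of state."""
    conv_states: Optional[float] = None
    recurrent_states: Optional[float] = None


class PerLayerCache:
    """Hypothetical stand-in for a per-layer cache (not the transformers class)."""

    def __init__(self, num_layers: int) -> None:
        self.layers: List[LayerState] = [LayerState() for _ in range(num_layers)]

    def has_previous_state(self, layer_idx: int) -> bool:
        # A method that takes the layer index, not a cache-wide attribute.
        return self.layers[layer_idx].recurrent_states is not None

    def update_conv_state(self, conv_state: float, layer_idx: int) -> float:
        self.layers[layer_idx].conv_states = conv_state
        return conv_state

    def update_recurrent_state(self, recurrent_state: float, layer_idx: int) -> float:
        self.layers[layer_idx].recurrent_states = recurrent_state
        return recurrent_state


def forward_step(x: float, layer_idx: int, cache: Optional[PerLayerCache] = None) -> float:
    """Toy layer step: a running sum plays the role of the recurrent state."""
    prev = 0.0
    if cache is not None and cache.has_previous_state(layer_idx):
        prev = cache.layers[layer_idx].recurrent_states
    new_state = prev + x
    if cache is not None:
        cache.update_conv_state(x, layer_idx)
        cache.update_recurrent_state(new_state, layer_idx)
    return new_state


cache = PerLayerCache(num_layers=2)
print(forward_step(1.0, 0, cache))    # 1.0: fresh cache, no previous state
print(forward_step(2.0, 0, cache))    # 3.0: layer 0 state is reused
print(forward_step(5.0, 1, cache))    # 5.0: layer 1 is tracked independently
print(forward_step(4.0, 0))           # 4.0: runs without a cache
print(hasattr(cache, "conv_states"))  # False: no cache-wide attribute exists
```

The last line mirrors the kind of failure the port addresses: code that reads a cache-wide `conv_states` attribute has nothing to find once the state lives under `layers[idx]`.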

* test: simplify update_conv_state arg assertion

Addresses review nit — the production call is always positional, so the
keyword-fallback branch was dead code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
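
A rough sketch of the kind of simplification this commit describes, using `unittest.mock`. The shape of the "before" check is a guess made for illustration, and `store_conv_state` is a hypothetical call site; only the fact that the production call is positional comes from the commit message.

```python
from unittest.mock import MagicMock


def store_conv_state(cache, conv_state, layer_idx):
    """Hypothetical production-style call site: both arguments are positional."""
    cache.update_conv_state(conv_state, layer_idx)


cache = MagicMock()
store_conv_state(cache, "state", 3)
recorded = cache.update_conv_state.call_args

# Before (assumed shape): also handle a keyword call that never occurs.
layer_idx_old = recorded.kwargs.get(
    "layer_idx", recorded.args[-1] if recorded.args else None
)

# After: the call is always positional, so read the positional args directly.
layer_idx_new = recorded.args[-1]

print(layer_idx_old, layer_idx_new)  # 3 3
cache.update_conv_state.assert_called_once_with("state", 3)
```

Both checks agree here because the keyword branch is never taken, which is why it could be dropped.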

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@svcnvidia-nemo-ci
Contributor Author

/ok to test 3f1a214

@copy-pr-bot

copy-pr-bot Bot commented Apr 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@HuiyingLi changed the title from "cp: fix: restore Qwen3.5 + Phi-4-MM nightly CI after transformers v5.5 update (1906) into r0.4.0" to "cp: restore Qwen3.5 + Phi-4-MM nightly CI after transformers v5.5 update (1906) into r0.4.0" on Apr 19, 2026
@HuiyingLi changed the title from "cp: restore Qwen3.5 + Phi-4-MM nightly CI after transformers v5.5 update (1906) into r0.4.0" to "cp: restore Qwen3.5 + Phi4MM CI after transformers v5.5 update (1906) into r0.4.0" on Apr 19, 2026
@HuiyingLi changed the title from "cp: restore Qwen3.5 + Phi4MM CI after transformers v5.5 update (1906) into r0.4.0" to "cp: fix Qwen3.5+Phi4MM CI after transformers v5.5 update(1906) into r0.4.0" on Apr 19, 2026
@HuiyingLi
Contributor

/ok to test 3f1a214

@HuiyingLi merged commit 97a500f into r0.4.0 on Apr 19, 2026
54 of 60 checks passed
@HuiyingLi deleted the cherry-pick-1906-r0.4.0 branch on April 19, 2026 17:42

Labels

cherry-pick Run CICD Trigger Testing CICD
