
feat: implement sequential prefill fallback for multimodal runner and add support for Qwen3.5 with vision encoder#1102

Draft
barhanc wants to merge 6 commits into main from @bh/add-qwen3.5-vl


@barhanc
Contributor

@barhanc barhanc commented Apr 27, 2026

Description

  • Implements a sequential prefill fallback for the multimodal runner, which is required to support Qwen3.5 with vision capabilities.
  • Adds symbols for the Qwen3.5 VL model
  • Adds a model picker component to the multimodal screen in the LLM example app, similar to the one on the LLM screen

NOTE: Support for the Qwen3.5 VL model is experimental right now. Due to architectural constraints (namely the GatedDeltaNet implementation), the model requires the sequential prefill fallback, which results in very slow prefill. Additionally, the model output is currently not very good (repetition, etc.), which can probably be fixed by applying #1099.

The best course of action for now is probably to wait until the ExecuTorch team adds better support for the Qwen3.5 architecture.
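
A minimal sketch of what a sequential prefill fallback amounts to, assuming the exported model exposes only a single-token forward step that carries recurrent state between calls. `toy_step` and `sequential_prefill` are hypothetical names used for illustration and are not the runner's actual API; the real runner also has to handle image embeddings, which this sketch leaves out.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical single-token step: (token, state) -> (logits, new_state).
StepFn = Callable[[int, float], Tuple[List[float], float]]


def toy_step(token: int, state: float) -> Tuple[List[float], float]:
    """Stand-in for a model forward pass that accepts exactly one token."""
    new_state = 0.5 * state + token      # toy recurrent state update
    logits = [new_state, -new_state]     # toy two-entry "vocabulary"
    return logits, new_state


def sequential_prefill(
    step: StepFn, prompt: Sequence[int], state: float = 0.0
) -> Tuple[List[float], float, int]:
    """Feed the prompt one token at a time, keeping only the last logits.

    Returns (last_logits, final_state, number_of_forward_calls).
    """
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    calls = 0
    logits: List[float] = []
    for token in prompt:
        # Intermediate logits are discarded; only the state must be kept.
        logits, state = step(token, state)
        calls += 1
    return logits, state, calls


last_logits, final_state, calls = sequential_prefill(toy_step, [1, 2, 3])
print(calls)  # one forward call per prompt token
```

The prompt costs one forward call per token instead of a single call for the whole prompt, which is where the slow prefill comes from.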

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  • Run example app and test new VLM models

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

The Qwen3.5 VLM model was exported using https://github.com/barhanc/executorch/tree/rne-v1.2.0-export-qwen-3.5.

@barhanc barhanc self-assigned this Apr 27, 2026
@barhanc barhanc added the model, blocked, and feature labels Apr 27, 2026
@msluszniak
Member

For now you can rebase and:

  1. Resolve conflicts
  2. Test whether the merged PR "feat(llm): min_p and repetition_penalty sampling, per-model defaults, letterbox vision" (#1099) actually fixed the repetition. This will help us determine whether the problem is inherent to the model or was caused by incorrect configuration.
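
For reference, the two sampling controls named in #1099 can be sketched as below. This follows the commonly used definitions (repetition penalty as in the Hugging Face Transformers logits processor, min_p as a cutoff relative to the most likely token) and is an assumption about the general technique, not a copy of what #1099 implements. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float], generated: Sequence[int], penalty: float
) -> List[float]:
    """Make already-generated tokens less likely (penalty > 1.0)."""
    out = list(logits)
    for tok in set(generated):
        # Positive logits shrink, negative logits are pushed further down.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs: Sequence[float], min_p: float) -> List[float]:
    """Drop tokens whose probability is below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample(probs: Sequence[float], rng: random.Random) -> int:
    return rng.choices(range(len(probs)), weights=probs)[0]


logits = [2.0, 1.0, -1.0]
penalized = apply_repetition_penalty(logits, generated=[0, 2], penalty=2.0)
probs = min_p_filter(softmax(penalized), min_p=0.1)
next_token = sample(probs, random.Random(0))
```

Comparing output with the penalty set to 1.0 (disabled) against the per-model default is one way to separate a configuration problem from a model problem.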

@barhanc barhanc force-pushed the @bh/add-qwen3.5-vl branch from b006379 to 97257ad Compare April 30, 2026 12:42
@barhanc
Contributor Author

barhanc commented Apr 30, 2026

Based on my limited testing, it looks like the proper configuration did help somewhat. However, the models themselves aren't particularly good, especially compared to something like LFM2.5. And as of right now, they aren't really usable due to the slow prefill.

The ExecuTorch team merged a PR with a custom op for GatedDeltaNet, but they reverted it later (pytorch/executorch#19178), so I guess we still have to wait.
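
For context on why this layer is naturally processed token by token, here is a minimal sketch of the gated delta rule recurrence as published for Gated DeltaNet: S_t = α_t · S_{t-1} · (I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ. It is written from the published formula and is not the ExecuTorch custom op or the exported model's code; `gated_delta_step` and `read` are hypothetical names, and real implementations work on batched tensors with learned gates.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # state S with shape (d_v, d_k)


def gated_delta_step(
    S: Matrix, k: Sequence[float], v: Sequence[float], alpha: float, beta: float
) -> Matrix:
    """One token update, rewritten as S_t = alpha*S + beta*(v - alpha*S@k) k^T."""
    d_k = len(k)
    # What the decayed state currently returns for key k.
    decayed_read = [alpha * sum(row[j] * k[j] for j in range(d_k)) for row in S]
    return [
        [alpha * S[i][j] + beta * (v[i] - decayed_read[i]) * k[j] for j in range(d_k)]
        for i in range(len(v))
    ]


def read(S: Matrix, q: Sequence[float]) -> List[float]:
    """Output for query q: o = S @ q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


S: Matrix = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
S = gated_delta_step(S, key, [3.0, 4.0], alpha=1.0, beta=1.0)  # write a value
S = gated_delta_step(S, key, [5.0, 6.0], alpha=1.0, beta=1.0)  # overwrite it
print(read(S, key))  # the value stored under `key` is now the second one
```

Each update depends on the state left by the previous token, so without a chunked or fused kernel the straightforward way to process a prompt is one token at a time.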


Labels

  • blocked: Issue blocked by some problems (but not other issue, use relationship -> blocker instead)
  • feature: PRs that implement a new feature
  • model: Issues related to exporting, improving, fixing ML models
