
Openvino llama support #8

Merged
anzr299 merged 404 commits into anzr299:an/quantizer_nncf_pt2e_support from cavusmustafa:openvino_llama_support
Oct 15, 2025

Conversation

Owner

@anzr299 anzr299 commented Oct 15, 2025


Gasoonjia and others added 30 commits September 28, 2025 15:26
Differential Revision: D83094606

Pull Request resolved: pytorch#14549
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#14556 by
@zonglinpeng
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/zonglinpeng/4/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/zonglinpeng/4/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/zonglinpeng/4/orig
@diff-train-skip-merge

Co-authored-by: Zonglin Peng <zonglinpeng@fb.com>
Follow on from pytorch#14401
Enables dumping of Vela's debug database to a specified directory. This
gives us generic information on the operators in our model, and it can be
combined with the trace output to provide more detailed profiling
analysis.

Co-authored-by: Zingo Andersen <zingo.andersen@arm.com>
…ch#14503)

This is a re-upload of pytorch#14435, with variables moved inside #ifdef
guards to avoid unused-variable warnings when building without devtools.
---------------------------------------
Add some cmake to only do this if executorch is built with bundleio.

The codegen/tools subdirectory include needs to be moved in the top-level
CMakeLists.txt to have access to the bundled_program target.

Follow-up patch to enable the fix in arm backend.

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
The passes in the Arm backend have an attribute called
`_passes_required_after` which is a set specifying which passes must run
after the pass itself. This patch sets these dependencies for all the
passes.


Signed-off-by: Martin Lindstroem <Martin.Lindstroem@arm.com>
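The dependency mechanism described above can be sketched roughly as follows. The `_passes_required_after` attribute name comes from the commit message; the pass classes here are illustrative, not the Arm backend's real passes.

```python
# Minimal sketch of the Arm backend's pass-dependency attribute, assuming
# dependencies are declared as a set on each pass class. The pass names
# are hypothetical.
class RemoveDeadCodePass:
    _passes_required_after: set = set()


class FoldConstantsPass:
    # Folding constants can leave dead nodes behind, so dead-code removal
    # must run after this pass.
    _passes_required_after = {RemoveDeadCodePass}
```

A pass scheduler can then check, after running a pass, that every pass listed in its `_passes_required_after` set is still scheduled to run.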
Add a TOSA backend dialect op for TOSA RESIZE. The dialect op replaces
upsample_nearest2d and upsample_bilinear2d in RewriteUpsamplePass. The
NodeVisitors for upsample_nearest2d and upsample_bilinear2d are also
replaced by a single NodeVisitor for the resize backend dialect op.


Signed-off-by: Oscar Andersson <oscar.andersson@arm.com>
…essary permissions (pytorch#14637)

Created a fine-grained Personal Access Token (PAT) with the following permissions:
* Organization permissions: read & write for Projects
* Repository permissions: read access to Dependabot alerts, code, commit statuses, discussions, issues, merge queues, metadata, and pull requests; read & write access to actions and workflows

Saved it as a secret named ET_EXT_CONTRIB
Refactor the backend test suites to use pytest. This includes the
following changes:
* Define pytest markers for each backend and test flow (recipe). This
allows for easy filtering, such as by running `pytest some/path/... -m
backend_xnnpack`.
* Use a parameterized pytest fixture to handle test generation /
expansion for each test flow.
* Switch to using the pytest-json-report plugin for reporting. Update
the markdown generation script to take json.
* Shim the existing unittest-based logic for op tests.
* I've updated add.py to show what they should look like long-term. I've
also just updated the model tests, since there aren't as many. I'll
update the remaining op tests later in this stack, though this is purely
to clean up the code. The shimming logic makes them work properly with
pytest in this PR.
 * Update the backend test CI to use pytest.

This also has the benefit of making the jobs much faster by leveraging
parallel execution. I've also added a repro command to the markdown
summary.
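The marker-based filtering described above can be sketched as below. The `backend_xnnpack` marker name comes from the PR text; the test body is illustrative, and the marker would normally be registered in pytest.ini/pyproject to avoid warnings.

```python
import pytest


# Tagging a test with a backend marker lets the suite be filtered with
# `pytest -m backend_xnnpack`; unmarked tests are skipped by that filter.
@pytest.mark.backend_xnnpack
def test_add_f32():
    assert 1.0 + 2.0 == 3.0
```

Combined with a parameterized fixture per test flow, this lets one test definition expand into one collected test per recipe.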
### Summary
Update tokenizers to pick up
meta-pytorch/tokenizers@b007644,
which fixes an issue when referencing tokenizers via CMake find_package
on Windows.

### Test plan
CI
Differential Revision: D83441105

Pull Request resolved: pytorch#14651
This removes a reference to the binary tree in the install tree.

Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
Updated QNN SDK version from 2.28.0 to 2.37.0 and added new library
pushes.
Differential Revision: D83318725

Pull Request resolved: pytorch#14622
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#14647 by
@SS-JIA
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/332/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/332/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/332/orig
Differential Revision:
[D83437827](https://our.internmc.facebook.com/intern/diff/D83437827/)
@diff-train-skip-merge

Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#14648 by
@SS-JIA
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/333/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/333/head
Merge bot PR base:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/332/orig
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/333/orig
Differential Revision:
[D83437826](https://our.internmc.facebook.com/intern/diff/D83437826/)
@diff-train-skip-merge

Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#14649 by
@SS-JIA
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/334/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/334/head
Merge bot PR base:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/333/orig
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/SS-JIA/334/orig
Differential Revision:
[D83437828](https://our.internmc.facebook.com/intern/diff/D83437828/)
@diff-train-skip-merge

Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
Differential Revision: D83448512

Pull Request resolved: pytorch#14663
Differential Revision: D82702247

Pull Request resolved: pytorch#14399
Differential Revision: D83517548

Pull Request resolved: pytorch#14676
Differential Revision: D82906134
Differential Revision: D83536133

Pull Request resolved: pytorch#14680
This adds a new "torchao" backend for pre-quantized checkpoints.

Pre-quantized checkpoints can be lowered to a backend (e.g., XNNPACK) by
specifying "-X" in etLLM.

With this PR, we can now lower pre-quantized checkpoints to torchao
lowbit kernels by specifying "--torchao_kernels" in the export script
instead of "-X". Note this will run both linear and tied_embedding
kernels with torchao_kernels.

If you want to run linear with XNNPACK, but only run tied embedding with
torchao, use "--torchao_kernels_tied_embedding" and "-X".

New CI tests are added for the flow.
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#14686 by
@larryliu0820
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/larryliu0820/76/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/larryliu0820/76/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/larryliu0820/76/orig

@diff-train-skip-merge

Co-authored-by: Mengwei Liu <larryliu@meta.com>
…ibutors and exclude draft PRs (pytorch#14660)

1) Added an extensive list of internal and partner contributors to exclude
2) Better handling to exclude draft PRs. Initially, 'draft: false' was passed as a parameter to 'github.rest.pulls.list', but it is not a supported parameter, so draft PRs were not excluded. The new method excludes them with a '!pr.draft' condition.
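The client-side exclusion described above can be sketched as follows. The data is illustrative; the point is that filtering happens after the list call, since the list endpoint offers no draft parameter.

```python
# The REST "list pulls" endpoint does not accept a draft filter, so drafts
# are excluded client-side, mirroring the `!pr.draft` check.
pulls = [
    {"number": 101, "draft": False},
    {"number": 102, "draft": True},
]

non_draft = [pr for pr in pulls if not pr["draft"]]
print([pr["number"] for pr in non_draft])  # → [101]
```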
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* pytorch#14690
* pytorch#14689
* pytorch#14688
* __->__ pytorch#14700
* pytorch#14686

Summary:
This is a manual cherry pick of pytorch#14687 

This introduces aoti_torch_create_tensor_from_blob_v2, a function that
creates a tensor from a data blob with custom strides and sizes.

Note that unlike aoti_torch_empty_strided, a tensor created by
aoti_torch_create_tensor_from_blob_v2 does not own the memory blob.
Therefore, when we delete the tensor, the memory is not freed.
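The ownership rule described above can be modeled with a toy Python wrapper; this is a schematic of the semantics, not the C++ API.

```python
class BlobTensor:
    """Toy model of a non-owning tensor: it borrows an externally managed
    blob and never frees it, so the caller keeps ownership."""

    def __init__(self, blob, sizes, strides):
        self.blob = blob        # borrowed reference, not owned
        self.sizes = sizes
        self.strides = strides

    def __del__(self):
        # Intentionally does NOT release self.blob; whoever allocated the
        # blob remains responsible for freeing it.
        pass


buf = bytearray(4 * 6)                        # caller-owned storage
t = BlobTensor(buf, sizes=(2, 3), strides=(3, 1))
del t                                         # buf is still valid here
```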

Reviewed By:

Differential Revision:
pytorchbot and others added 29 commits October 10, 2025 16:29
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#15004 by
@Gasoonjia
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/54/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/54/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/54/orig
Differential Revision:
[D84367515](https://our.internmc.facebook.com/intern/diff/D84367515/)
@diff-train-skip-merge

Co-authored-by: gasoonjia <gasoonjia@icloud.com>
This pull request introduces changes to the CUDA workflow, model
artifact handling, and multimodal runner logic. The main changes include
restructuring the GitHub Actions workflow to separate model export,
benchmarking, and end-to-end testing for the Voxtral CUDA pipeline,
improving artifact management and reproducibility. Additionally, the
multimodal runner now supports automatic conversion of audio tensors to
bfloat16, ensuring compatibility with expected input types. There are
also enhancements to caching and symbol registration in the CUDA
backend, and build system updates to support linking the CUDA backend.

**Workflow and Artifact Management Improvements:**

* Refactored `.github/workflows/cuda.yml` to split the Voxtral CUDA
pipeline into three jobs: `export-voxtral-cuda-artifact` (exports and
stores model artifacts), `benchmark-voxtral-cuda` (benchmarks using
exported artifacts), and `test-voxtral-cuda-e2e` (runs full end-to-end
tests with artifact download and audio input). Improved artifact
handling, reproducibility, and added explicit checks for required files.

**Multimodal Runner Logic:**

* Added automatic conversion of audio tensors to bfloat16 in
`MultimodalPrefiller::prefill` and implemented a helper function
`convert_to_bfloat16` in `util.h` to support this. This ensures that
audio inputs match the expected dtype for the encoder, improving
robustness for multimodal inference.
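The dtype conversion described above can be sketched schematically; dtypes are modeled as strings here to keep the sketch tensor-library-free, so this only illustrates the decision logic, not the real `convert_to_bfloat16` helper.

```python
# Schematic of the prefill conversion: floating-point audio inputs are
# cast to the encoder's expected bfloat16 dtype; non-float inputs pass
# through unchanged.
FLOAT_DTYPES = {"float16", "float32", "float64"}


def convert_to_bfloat16(tensor_dtype: str) -> str:
    if tensor_dtype in FLOAT_DTYPES:
        return "bfloat16"
    return tensor_dtype


print(convert_to_bfloat16("float32"))  # → bfloat16
```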

**CUDA Backend and Caching Enhancements:**

* Improved caching logic in `common_shims.cpp` for tensor strides and
sizes by validating cached values and updating them when necessary. This
prevents stale cache issues and ensures correct tensor metadata.
* Added dynamic symbol re-registration in `CudaBackend` to handle
multiple shared objects in the same process, ensuring correct execution
when switching between models.
* Removed redundant logging statements in CUDA backend for cleaner
output.
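The caching validation described above can be sketched as follows; the cache structure and function names here are illustrative, standing in for the C++ logic in `common_shims.cpp`.

```python
# Sketch of a stale-cache guard: cached sizes are revalidated against the
# tensor's current metadata and refreshed when they no longer match.
_sizes_cache: dict = {}


def cached_sizes(tensor_id, current_sizes):
    cached = _sizes_cache.get(tensor_id)
    if cached != list(current_sizes):      # missing or stale → refresh
        _sizes_cache[tensor_id] = list(current_sizes)
    return _sizes_cache[tensor_id]


cached_sizes(1, (2, 3))          # populate the cache
print(cached_sizes(1, (4, 3)))   # shape changed, cache refreshed → [4, 3]
```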

**Build System Updates:**

* Updated `CMakeLists.txt` and `executorch-config.cmake` to include and
link the CUDA backend (`aoti_cuda`) when building Voxtral and other
components, improving build flexibility and CUDA support.

**Debugging and Tuning Options:**

* Added support for enabling debug compilation in `cuda_backend.py` via
the `DEBUG` environment variable, allowing easier troubleshooting and
development.
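The environment-variable toggle described above follows a common pattern, sketched below; the compiler flag names are illustrative nvcc-style options, not necessarily the backend's actual flags.

```python
import os


def compile_flags(env=os.environ):
    # DEBUG=1 switches to an unoptimized build with device debug info;
    # otherwise use the default optimized flags.
    if env.get("DEBUG") == "1":
        return ["-O0", "-g", "-G"]
    return ["-O2"]


print(compile_flags({"DEBUG": "1"}))  # → ['-O0', '-g', '-G']
```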
…ementwiseOps to the common section.

Differential Revision: D83793229

Pull Request resolved: pytorch#14780
Differential Revision: D84357937

Pull Request resolved: pytorch#14890
Differential Revision: D84187909

Pull Request resolved: pytorch#14958
### Summary
- refactor a bit & add more test cases


### Test plan
```bash
python backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_index_put -b build-android -s $SN -m SM8750
python backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_index_put_suite -b build-android -s $SN -m SM8750
```
Summary:

Updating the TOSA, U55 & U85 tests to remove xfails. These ops are now supported, so the tests no longer expect failure.

Differential Revision: D84262200
Differential Revision: D81703253

Pull Request resolved: pytorch#15011
Differential Revision: D84279595

Pull Request resolved: pytorch#14956
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#15016 by
@Gasoonjia
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/56/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/56/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/56/orig
Differential Revision:
[D84280496](https://our.internmc.facebook.com/intern/diff/D84280496/)
@diff-train-skip-merge

Co-authored-by: gasoonjia <gasoonjia@icloud.com>
…s._clone_dim_order.default (pytorch#14535)

### Summary
- Adds support for conversion and quantization of
`dim_order_ops._clone_dim_order.default` operator and fixes problems
with some variations of `nn.Dropout`.
- Adds more robust test cases for clone operators.

### Test plan
All changes should be covered by unit tests.

cc @robert-kalmar @JakeStevens @digantdesai
Summary: As stated in the title

Reviewed By: bingcy

Differential Revision: D83859440

---------

Co-authored-by: Jacob Szwejbka <jakeszwe@meta.com>
Updated link to Core ATen operator set documentation.

Summary: Wire up the unary sine operator in xnnpack for fp32 and fp16.

Differential Revision: D83623086
Summary: Fix up flags.

Differential Revision: D84296634
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: pytorch#14666 by
@lucylq
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/lucylq/114/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/lucylq/114/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/lucylq/114/orig
Differential Revision:
[D83504588](https://our.internmc.facebook.com/intern/diff/D83504588/)
@diff-train-skip-merge

Co-authored-by: lucylq <lfq@meta.com>
Summary: .

Differential Revision: D84516559
Summary: A TensorPtr view created from another TensorPtr should keep the
source alive to match ATen behavior.

Differential Revision: D84512176
Differential Revision:
[D83777195](https://our.internmc.facebook.com/intern/diff/D83777195/)

[ghstack-poisoned]

pytorch#15066)

… Clamp/Clamp (pytorch#14415)"

This reverts commit a5d7e5c.

This broke internal builds. @SS-JIA is trying to fix this in
pytorch#15058 and will handle relanding.

@anzr299 anzr299 merged commit a63a894 into anzr299:an/quantizer_nncf_pt2e_support Oct 15, 2025
23 of 319 checks passed