Openvino llama support#8
Merged
anzr299 merged 404 commits into anzr299:an/quantizer_nncf_pt2e_support (Oct 15, 2025)
Conversation
Differential Revision: D83094606 Pull Request resolved: pytorch#14549
This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#14556 by @zonglinpeng ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/zonglinpeng/4/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/zonglinpeng/4/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/zonglinpeng/4/orig @diff-train-skip-merge Co-authored-by: Zonglin Peng <zonglinpeng@fb.com>
Follow on from pytorch#14401. Enables dumping of Vela's debug database to a specified directory. This gives us generic information on operators in our model, which can be combined with the trace output to provide more detailed profiling analysis. Co-authored-by: Zingo Andersen <zingo.andersen@arm.com>
…ch#14503) This is a re-upload of pytorch#14435, with variables moved inside #ifdef to avoid unused-variable warnings when building without devtools. --------------------------------------- Add some cmake to only do this if executorch is built with bundleio. The codegen/tools subdirectory include needs to be moved in the top-level CMakeLists.txt to have access to the bundled_program target. Follow-up patch to enable the fix in the Arm backend. Signed-off-by: Erik Lundell <erik.lundell@arm.com>
The passes in the Arm backend have an attribute called `_passes_required_after` which is a set specifying which passes must run after the pass itself. This patch sets these dependencies for all the passes. Signed-off-by: Martin Lindstroem <Martin.Lindstroem@arm.com>
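The dependency mechanism above can be sketched roughly as follows; the class names and the validation helper are illustrative, not the actual Arm backend API.

```python
# Minimal sketch of the `_passes_required_after` idea, under assumed names.

class ExportPass:
    """Stand-in for a pass base class."""
    _passes_required_after: set = set()


class FuseConstantsPass(ExportPass):
    pass


class AnnotateParamsPass(ExportPass):
    # Declares that FuseConstantsPass must run at some point after this pass.
    _passes_required_after = {FuseConstantsPass}


def validate_order(pipeline):
    """Raise if a declared 'must run after' dependency is not satisfied."""
    for i, p in enumerate(pipeline):
        remaining = pipeline[i + 1:]
        for required in type(p)._passes_required_after:
            if not any(isinstance(q, required) for q in remaining):
                raise ValueError(
                    f"{required.__name__} must run after {type(p).__name__}"
                )


validate_order([AnnotateParamsPass(), FuseConstantsPass()])  # valid order
```

A pipeline builder could call such a validator once after assembling the pass list, turning ordering mistakes into immediate errors rather than silent miscompiles.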
Add TOSA backend dialect op for TOSA RESIZE. The dialect op replaces upsample_nearest2d and upsample_bilinear_2d in RewriteUpsamplePass. Also the Nodevisitors of upsample_nearest2d and upsample_bilinear2d are replaced by one NodeVisitor for the resize backend dialect op. Signed-off-by: Oscar Andersson <oscar.andersson@arm.com>
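The rewrite described above can be pictured with a toy pass: two upsample variants collapse into one resize dialect node. The node representation and op names here are simplified assumptions, not the real TOSA dialect.

```python
# Toy illustration of RewriteUpsamplePass-style replacement (names assumed).

UPSAMPLE_OPS = {"upsample_nearest2d", "upsample_bilinear2d"}

def rewrite_upsample(graph):
    """Replace upsample nodes with a single RESIZE dialect node."""
    rewritten = []
    for node in graph:
        if node["op"] in UPSAMPLE_OPS:
            # Derive the resize mode ("nearest" / "bilinear") from the op name.
            mode = node["op"].removeprefix("upsample_").removesuffix("2d")
            rewritten.append({"op": "tosa.RESIZE", "mode": mode,
                              "args": node.get("args", [])})
        else:
            rewritten.append(node)
    return rewritten
```

Having a single dialect op means one NodeVisitor can serialize both cases, which is exactly the consolidation the patch performs.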
…essary permissions (pytorch#14637) Created a Fine-grained Personalized-Access Token (PAT), with the following permissions: Read & write for Projects under Org permissions Under repository permissions: Read access to Dependabot alerts, code, commit statuses, discussions, issues, merge queues, metadata, and pull requests Read and Write access to actions and workflows Saved it as a secret named ET_EXT_CONTRIB
Refactor the backend test suites to use pytest. This includes the following changes: * Define pytest markers for each backend and test flow (recipe). This allows for easy filter, such as by running `pytest some/path/... -m backend_xnnpack`. * Use a parameterized pytest fixture to handle test generation / expansion for each test flow. * Switch to using the pytest-json-report plugin for reporting. Update the markdown generation script to take json. * Shim the existing unittest-based logic for op tests. * I've updated add.py to show what they should look like long-term. I've also just updated the model tests, since there aren't as many. I'll update the remaining op tests later in this stack, though this is purely to clean up the code. The shimming logic makes them work properly with pytest in this PR. * Update the backend test CI to use pytest. This also has the benefit of making the jobs much faster by leveraging parallel execution. I've also added a repro command to the markdown summary.
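The marker plus parameterized-fixture pattern described above might look like this; the `backend_xnnpack` marker follows the PR's own example, while the flow names and fixture are assumptions.

```python
# Sketch of per-flow test expansion with a parameterized pytest fixture.
import pytest

FLOWS = ["xnnpack_fp32", "xnnpack_quantized"]

@pytest.fixture(params=FLOWS)
def flow(request):
    # Each test using this fixture is expanded once per test flow.
    return request.param

@pytest.mark.backend_xnnpack
def test_add(flow):
    # A real test would lower and execute the op under the given flow.
    assert flow in FLOWS
```

Filtering then works as described, e.g. `pytest some/path/... -m backend_xnnpack`, and each test id carries its flow name for easy repro.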
### Summary Update tokenizers to pick up meta-pytorch/tokenizers@b007644, which fixes an issue when referencing tokenizers via CMake find_package on Windows. ### Test plan CI
Differential Revision: D83441105 Pull Request resolved: pytorch#14651
This removes a reference to the binary tree in the install tree. Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>
Updated QNN SDK version from 2.28.0 to 2.37.0 and added new library pushes.
Differential Revision: D83318725 Pull Request resolved: pytorch#14622
This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#14647 by @SS-JIA ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/332/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/332/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/332/orig Differential Revision: [D83437827](https://our.internmc.facebook.com/intern/diff/D83437827/) @diff-train-skip-merge Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#14648 by @SS-JIA ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/333/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/333/head Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/332/orig Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/333/orig Differential Revision: [D83437826](https://our.internmc.facebook.com/intern/diff/D83437826/) @diff-train-skip-merge Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#14649 by @SS-JIA ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/334/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/334/head Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/333/orig Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/334/orig Differential Revision: [D83437828](https://our.internmc.facebook.com/intern/diff/D83437828/) @diff-train-skip-merge Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
Differential Revision: D83448512 Pull Request resolved: pytorch#14663
Differential Revision: D82702247 Pull Request resolved: pytorch#14399
Differential Revision: D83517548 Pull Request resolved: pytorch#14676
Differential Revision: D82906134
Differential Revision: D83536133 Pull Request resolved: pytorch#14680
…nager (pytorch#14685) As title. cc: @haowhsu-quic , @winskuo-quic
This adds a new "torchao" backend for pre-quantized checkpoints. Pre-quantized checkpoints can be lowered to a backend (e.g., XNNPACK) by specifying "-X" in etLLM. With this PR, we can now lower pre-quantized checkpoints to torchao lowbit kernels by specifying "--torchao_kernels" in the export script instead of "-X". Note this will run both linear and tied_embedding kernels with torchao_kernels. If you want to run linear with XNNPACK, but only run tied embedding with torchao, use "--torchao_kernels_tied_embedding" and "-X". New CI tests are added for the flow.
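The flag combinations above can be summarized in a small table of behaviors; this is a hypothetical restatement of the export script's logic, not its real argument handling.

```python
# Hypothetical mapping from etLLM export flags to the backend per op type.

def select_backends(xnnpack=False, torchao_kernels=False,
                    torchao_kernels_tied_embedding=False):
    """Return which backend handles linear and tied-embedding ops."""
    if torchao_kernels:
        # --torchao_kernels lowers both linear and tied-embedding ops.
        return {"linear": "torchao", "tied_embedding": "torchao"}
    if xnnpack and torchao_kernels_tied_embedding:
        # -X plus --torchao_kernels_tied_embedding splits the two.
        return {"linear": "xnnpack", "tied_embedding": "torchao"}
    if xnnpack:
        return {"linear": "xnnpack", "tied_embedding": "xnnpack"}
    return {"linear": "portable", "tied_embedding": "portable"}
```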
This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#14686 by @larryliu0820 ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/larryliu0820/76/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/larryliu0820/76/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/larryliu0820/76/orig @diff-train-skip-merge Co-authored-by: Mengwei Liu <larryliu@meta.com>
…ibutors and exclude draft PRs (pytorch#14660) 1) Added an extensive list of internal and partner contributors to exclude. 2) Better handling to exclude draft PRs. Initially, 'draft:false' was added as a parameter to 'github.rest.pulls.list', but that parameter is unsupported and silently ignored, so draft PRs were not excluded. The new method excludes them with a '!pr.draft' condition.
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * pytorch#14690 * pytorch#14689 * pytorch#14688 * __->__ pytorch#14700 * pytorch#14686 Summary: This is a manual cherry-pick of pytorch#14687. It introduces aoti_torch_create_tensor_from_blob_v2, a function that creates a tensor from a data blob with custom strides and sizes. Worth noting that, unlike aoti_torch_empty_strided, the tensor created by aoti_torch_create_tensor_from_blob_v2 does not take ownership of the memory blob; deleting the tensor therefore does not free the memory.
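The ownership distinction between the two creation paths can be modeled conceptually; this is not the real AOTI C API, only an illustration of owning versus non-owning tensors.

```python
# Conceptual model of owning vs. non-owning tensor memory (names assumed).

class BlobTensor:
    def __init__(self, blob, sizes, strides, owns_memory):
        self.blob = blob
        self.sizes = sizes
        self.strides = strides
        self.owns_memory = owns_memory

    def delete(self):
        # Like a tensor from aoti_torch_create_tensor_from_blob_v2, a
        # non-owning tensor leaves the caller's buffer untouched on deletion.
        if self.owns_memory:
            self.blob.clear()

buf = bytearray(16)
t = BlobTensor(buf, sizes=(4, 4), strides=(4, 1), owns_memory=False)
t.delete()
assert len(buf) == 16  # caller still owns and must free the memory
```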
This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#15004 by @Gasoonjia ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/gasoonjia/54/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/gasoonjia/54/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/gasoonjia/54/orig Differential Revision: [D84367515](https://our.internmc.facebook.com/intern/diff/D84367515/) @diff-train-skip-merge Co-authored-by: gasoonjia <gasoonjia@icloud.com>
This pull request introduces changes to the CUDA workflow, model artifact handling, and multimodal runner logic. The main changes include restructuring the GitHub Actions workflow to separate model export, benchmarking, and end-to-end testing for the Voxtral CUDA pipeline, improving artifact management and reproducibility. Additionally, the multimodal runner now supports automatic conversion of audio tensors to bfloat16, ensuring compatibility with expected input types. There are also enhancements to caching and symbol registration in the CUDA backend, and build system updates to support linking the CUDA backend. **Workflow and Artifact Management Improvements:** * Refactored `.github/workflows/cuda.yml` to split the Voxtral CUDA pipeline into three jobs: `export-voxtral-cuda-artifact` (exports and stores model artifacts), `benchmark-voxtral-cuda` (benchmarks using exported artifacts), and `test-voxtral-cuda-e2e` (runs full end-to-end tests with artifact download and audio input). Improved artifact handling, reproducibility, and added explicit checks for required files. [[1]](diffhunk://#diff-29abea04e0613c2569973e5c8e3c89e04846d408c855eeb1f3efcfae7cfa6f89L90-R91) [[2]](diffhunk://#diff-29abea04e0613c2569973e5c8e3c89e04846d408c855eeb1f3efcfae7cfa6f89R107) [[3]](diffhunk://#diff-29abea04e0613c2569973e5c8e3c89e04846d408c855eeb1f3efcfae7cfa6f89R134-R185) [[4]](diffhunk://#diff-29abea04e0613c2569973e5c8e3c89e04846d408c855eeb1f3efcfae7cfa6f89R196-R267) [[5]](diffhunk://#diff-29abea04e0613c2569973e5c8e3c89e04846d408c855eeb1f3efcfae7cfa6f89R122) **Multimodal Runner Logic:** * Added automatic conversion of audio tensors to bfloat16 in `MultimodalPrefiller::prefill` and implemented a helper function `convert_to_bfloat16` in `util.h` to support this. This ensures that audio inputs match the expected dtype for the encoder, improving robustness for multimodal inference. 
[[1]](diffhunk://#diff-ad4fcb32ffc5f1f7b4f87b5ee58927cb948a8c0976295befd10e3de445913ae4L96-R136) [[2]](diffhunk://#diff-db4801445eaa3bb4f1370fe41d3a00ae2e3ef354a23ad4d5ace141ecc3c6f413R144-R180) **CUDA Backend and Caching Enhancements:** * Improved caching logic in `common_shims.cpp` for tensor strides and sizes by validating cached values and updating them when necessary. This prevents stale cache issues and ensures correct tensor metadata. [[1]](diffhunk://#diff-1e7c9d572d434c9a85c9d466e7f406877bc974a373c370fe7ddb3fe32852c1f2R54-R81) [[2]](diffhunk://#diff-1e7c9d572d434c9a85c9d466e7f406877bc974a373c370fe7ddb3fe32852c1f2R104-R130) * Added dynamic symbol re-registration in `CudaBackend` to handle multiple shared objects in the same process, ensuring correct execution when switching between models. * Removed redundant logging statements in CUDA backend for cleaner output. [[1]](diffhunk://#diff-a4b17eccf1aa933837671c5184e02bc815d934a362344bb2b17b789cdfaa5375L226) [[2]](diffhunk://#diff-a4b17eccf1aa933837671c5184e02bc815d934a362344bb2b17b789cdfaa5375L256) **Build System Updates:** * Updated `CMakeLists.txt` and `executorch-config.cmake` to include and link the CUDA backend (`aoti_cuda`) when building Voxtral and other components, improving build flexibility and CUDA support. [[1]](diffhunk://#diff-606feb24310595f592d98d021a2c90618346977d94decb80b35b7e26ed8ccc1eR89-R95) [[2]](diffhunk://#diff-6a78a155992483ff6f35d595ff6cef63b477d1c853f6482e77acae6ef443f0e4R56) **Debugging and Tuning Options:** * Added support for enabling debug compilation in `cuda_backend.py` via the `DEBUG` environment variable, allowing easier troubleshooting and development.
…ementwiseOps to the common section. Differential Revision: D83793229 Pull Request resolved: pytorch#14780
Differential Revision: D84357937 Pull Request resolved: pytorch#14890
Differential Revision: D84187909 Pull Request resolved: pytorch#14958
…ch#14993) Signed-off-by: Ryan O'Shea <ryan.oshea3@arm.com>
### Summary - refactor a bit & add more test cases ### Test plan ```bash python backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_index_put -b build-android -s $SN -m SM8750 python backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_index_put_suite -b build-android -s $SN -m SM8750 ```
Summary: Updating the TOSA, U55 & U85 tests to remove xfails. These ops are supported now and updating tests to not expect failure. Differential Revision: D84262200
Differential Revision: D81703253 Pull Request resolved: pytorch#15011
Differential Revision: D84279595 Pull Request resolved: pytorch#14956
This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#15016 by @Gasoonjia ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/gasoonjia/56/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/gasoonjia/56/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/gasoonjia/56/orig Differential Revision: [D84280496](https://our.internmc.facebook.com/intern/diff/D84280496/) @diff-train-skip-merge Co-authored-by: gasoonjia <gasoonjia@icloud.com>
…s._clone_dim_order.default (pytorch#14535) ### Summary - Adds support for conversion and quantization of `dim_order_ops._clone_dim_order.default` operator and fixes problems with some variations of `nn.Dropout`. - Adds more robust test cases for clone operators. ### Test plan All changes should be covered by unit tests. cc @robert-kalmar @JakeStevens @digantdesai
Fix unexpanded VGF term use.
Summary: As stated in the title Reviewed By: bingcy Differential Revision: D83859440 --------- Co-authored-by: Jacob Szwejbka <jakeszwe@meta.com>
Updated link to Core ATen operator set documentation.
Summary: Wire up the unary sine operator in xnnpack for fp32 and fp16. Differential Revision: D83623086
Summary: Fix up flags. Differential Revision: D84296634
This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: pytorch#14666 by @lucylq ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/lucylq/114/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/lucylq/114/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/lucylq/114/orig Differential Revision: [D83504588](https://our.internmc.facebook.com/intern/diff/D83504588/) @diff-train-skip-merge Co-authored-by: lucylq <lfq@meta.com>
Summary: . Differential Revision: D84516559
Summary: Copied assets from https://github.com/dbort/executorch-logos/
Summary: A TensorPtr view created from another TensorPtr should keep the source alive to match ATen behavior. Differential Revision: D84512176
Differential Revision: [D83777195](https://our.internmc.facebook.com/intern/diff/D83777195/) [ghstack-poisoned]
pytorch#15066) … Clamp/Clamp (pytorch#14415)" This reverts commit a5d7e5c, which broke internal builds. @SS-JIA is trying to fix this in pytorch#15058; will leave relanding to him.
Merged commit a63a894 into anzr299:an/quantizer_nncf_pt2e_support
23 of 319 checks passed