[ET-VK] Implement linear_qcs4w by pytorchbot · Pull Request #10772 · pytorch/executorch

pytorchbot · 2025-05-08T06:36:18Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #10588 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/222/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/222/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/220/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/222/orig
@diff-train-skip-merge

Pull Request resolved: #10525

## Context

When quantizing models with the PT2E quantization flow, quantize/dequantize nodes will be inserted into the graph. However, these quantize/dequantize nodes must be fused with operators such as `aten.linear.default` to produce nodes corresponding to quantized operators (e.g. `weight_int8pack_mm`) in order for quantized operator implementations to be called at runtime.

Currently, the op fusion is done by the `fuse_dequant_linear.py` pass; however, this only handles one specific fusion pattern to generate a `weight_int8pack_mm` operator. As more quantized operators are to be supported in ET-VK via the PT2E quantization flow, a more generic fusion pass is needed that can handle a variety of fusion patterns.

## Changes

* Introduce the `FuseQuantizedOpsTransform()` pass. I elected to introduce a new pass under the `backends/vulkan/_passes` directory, as opposed to modifying the existing pass, because I anticipate the majority of the fusion patterns to be specific to ET-VK.
* Remove the existing `FuseDequantLinearPass()`.
* Switch to using the `FuseQuantizedOpsTransform` pass instead of the old `FuseDequantLinear` pass.
* Add `test_vulkan_passes` Python test to test export passes.
* Add some refactors to the `test_vulkan_delegate` Python test to improve code organization.
* Introduce the `linear_qcsnw` nomenclature:
  * q - quantized
  * c - per-channel / channelwise
  * s - symmetric
  * n - number of bits (qcs4w for 4-bit quant, qcs8w for 8-bit quant)
  * w - weight quantized
* Add a custom op for `linear_qcs4w` for 4-bit weight quantized linear, and add the ability for the quantized op fusion pass to produce this op.
* Slight renaming/refactoring of quantization config retrieval functions in the `VulkanQuantizer` to improve clarity and API flexibility.

ghstack-source-id: 282688199
@exported-using-ghexport

Differential Revision: [D73794042](https://our.internmc.facebook.com/intern/diff/D73794042/)
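
To make the `qcsnw` naming above concrete, here is a minimal pure-Python sketch of per-channel, symmetric, n-bit, weight-only quantization for a linear layer. It is an illustration only, not the ET-VK implementation: the function names `quantize_per_channel_symmetric` and `linear_weight_only_quantized` are hypothetical, and it assumes one scale per output channel (row of the weight matrix) with no zero point.

```python
from typing import List, Tuple


def quantize_per_channel_symmetric(
    weight: List[List[float]], n_bits: int
) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row (output channel) with its own scale and no zero point.

    Hypothetical helper for illustration; not part of ExecuTorch.
    """
    qmax = 2 ** (n_bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    qmin = -(2 ** (n_bits - 1))   # -8 for 4-bit, -128 for 8-bit
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Symmetric: the range is centred on zero, so only a scale is needed.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q_rows.append([max(qmin, min(qmax, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def linear_weight_only_quantized(
    x: List[float], q_weight: List[List[int]], scales: List[float]
) -> List[float]:
    """Compute y[o] = scales[o] * sum_i x[i] * q_weight[o][i].

    Activations stay in floating point; only the weight is quantized.
    """
    return [
        s * sum(xi * qi for xi, qi in zip(x, row))
        for row, s in zip(q_weight, scales)
    ]
```

With `n_bits=8` this corresponds to the "qcs8w" case and with `n_bits=4` to "qcs4w"; how the actual operators store and consume the quantized weight is not specified by this sketch.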

Pull Request resolved: #10588

## Context

Title says it all!

## Changes

Extended the implementation of `linear_qcsnw` to support packed 4-bit weight tensors.

ghstack-source-id: 282707610
@exported-using-ghexport

Differential Revision: [D73941991](https://our.internmc.facebook.com/intern/diff/D73941991/)
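
As a rough illustration of what "packed 4-bit weight tensors" means, the sketch below stores two signed 4-bit values per byte and unpacks them again. The helper names `pack_int4` and `unpack_int4` are hypothetical, and the layout (even-indexed value in the low nibble, two's-complement nibbles) is an assumption made for this example; the layout actually used by ET-VK may differ.

```python
from typing import List


def pack_int4(values: List[int]) -> bytes:
    """Pack signed 4-bit ints in [-8, 7], two per byte.

    Assumed layout (illustration only): even index -> low nibble,
    odd index -> high nibble, each stored as a two's-complement nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values to pack")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in a signed 4-bit value")
        packed.append(((hi & 0xF) << 4) | (lo & 0xF))
    return bytes(packed)


def unpack_int4(packed: bytes) -> List[int]:
    """Inverse of pack_int4: recover the signed 4-bit values."""
    out: List[int] = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 represent the negative values -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out
```

Packing halves the storage needed for the weight compared with one 8-bit value per element, at the cost of an unpack step inside the compute kernel.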

pytorch-bot · 2025-05-08T06:36:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10772

This comment was automatically generated by Dr. CI and updates every 15 minutes.

SS-JIA added 2 commits May 7, 2025 17:41

pytorchbot requested a review from SS-JIA as a code owner May 8, 2025 06:36

facebook-github-bot added the CLA Signed label May 8, 2025

Base automatically changed from gh/SS-JIA/220/orig to main May 8, 2025 06:39

SS-JIA requested review from iseeyuan, jackzhxng, kimishpatel, larryliu0820 and swolchok as code owners May 8, 2025 06:39

SS-JIA approved these changes May 8, 2025

SS-JIA merged commit 5e8295e into main May 8, 2025
80 of 81 checks passed

SS-JIA deleted the gh/SS-JIA/222/orig branch May 8, 2025 06:40
