[ET-VK] Introduce generic export pass for fusing Q/DQ nodes by SS-JIA · Pull Request #10525 · pytorch/executorch

SS-JIA · 2025-04-28T19:41:17Z

Stack from ghstack (oldest at bottom):

Context

When quantizing models with the PT2E quantization flow, quantize/dequantize nodes will be inserted into the graph. However, these quantize/dequantize nodes must be fused with operators such as aten.linear.default to produce nodes corresponding to quantized operators (e.g. weight_int8pack_mm) in order for quantized operator implementations to be called at runtime.
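
A worked identity shows why such a fusion is legal. This is an illustrative sketch that assumes symmetric per-channel weight quantization (zero points of 0). The symbols are introduced here for illustration only: activations $x \in \mathbb{R}^{M \times K}$, quantized weights $q \in \mathbb{Z}^{N \times K}$, per-output-channel scales $s \in \mathbb{R}^{N}$, and an optional bias $b \in \mathbb{R}^{N}$.

$$
y_{m,n} = \sum_{k=1}^{K} x_{m,k}\,\bigl(s_n\, q_{n,k}\bigr) + b_n = s_n \sum_{k=1}^{K} x_{m,k}\, q_{n,k} + b_n
$$

The left-hand side is aten.linear.default applied to the dequantized weight. The right-hand side never materializes the float weight, and it applies each channel's scale once after the reduction. That is the kind of computation a fused quantized kernel can perform. The exact semantics of any specific kernel, such as weight_int8pack_mm, are not spelled out here.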

Currently, the op fusion is done by the fuse_dequant_linear.py pass; however, this pass only handles one specific fusion pattern, which generates a weight_int8pack_mm operator. As support for more quantized operators is added to ET-VK via the PT2E quantization flow, a more generic fusion pass is needed that can handle a variety of fusion patterns.

Changes

Introduce the FuseQuantizedOpsTransform() pass. I elected to introduce a new pass under the backends/vulkan/_passes directory, rather than modifying the existing pass, because I anticipate that the majority of the fusion patterns will be specific to ET-VK.
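
The following is a minimal, framework-free sketch of a generic fusion pass. It tries a list of rewrite functions against each node. Each rewrite either fuses a pattern and reports the nodes it made dead, or returns None. Everything in it is hypothetical: `Node`, `connect`, `fuse_dq_linear`, `fuse_quantized_ops`, and the op strings, including the fused name `linear_qcs8w`. The actual FuseQuantizedOpsTransform operates on exported torch.fx graphs and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # identity semantics: nodes compare and hash by object identity
class Node:
    """Toy graph node (hypothetical stand-in for a torch.fx.Node)."""
    name: str
    op: str
    inputs: List["Node"] = field(default_factory=list, repr=False)
    users: List["Node"] = field(default_factory=list, repr=False)


def connect(node: Node, *inputs: Node) -> Node:
    """Set node's inputs and register node as a user of each input."""
    node.inputs = list(inputs)
    for inp in inputs:
        inp.users.append(node)
    return node


# A rewrite tries to fuse a pattern anchored at `node`. On success it mutates the
# graph and returns the nodes that became dead; otherwise it returns None.
Rewrite = Callable[[Node], Optional[List[Node]]]


def fuse_dq_linear(node: Node) -> Optional[List[Node]]:
    """Fuse linear(x, dequantize_per_channel(w_q, scales)) into one quantized op."""
    if node.op != "linear" or len(node.inputs) < 2:
        return None
    dq = node.inputs[1]
    if dq.op != "dequantize_per_channel" or len(dq.users) != 1:
        return None  # the float weight is still needed elsewhere; leave it alone
    # Rewire the quantized weight and its qparams to feed the fused node directly.
    for inp in dq.inputs:
        inp.users = [node if u is dq else u for u in inp.users]
    node.inputs = [node.inputs[0], *dq.inputs]
    node.op = "linear_qcs8w"  # hypothetical fused op name
    return [dq]


def fuse_quantized_ops(nodes: List[Node], rewrites: List[Rewrite]) -> List[Node]:
    """Apply the first matching rewrite to each node; drop nodes made dead."""
    dead = set()
    for node in nodes:
        if node in dead:
            continue
        for rewrite in rewrites:
            removed = rewrite(node)
            if removed:
                dead.update(removed)
                break
    return [n for n in nodes if n not in dead]


# Usage: linear(x, dequantize(w_q, scales)) becomes a single fused node.
x = Node("x", "placeholder")
w_q = Node("w_q", "placeholder")
scales = Node("scales", "placeholder")
dq = connect(Node("dq", "dequantize_per_channel"), w_q, scales)
lin = connect(Node("lin", "linear"), x, dq)
graph = fuse_quantized_ops([x, w_q, scales, dq, lin], [fuse_dq_linear])
print([(n.name, n.op) for n in graph])
# → [('x', 'placeholder'), ('w_q', 'placeholder'), ('scales', 'placeholder'), ('lin', 'linear_qcs8w')]
```

In this design, supporting a new fusion pattern means appending another rewrite function to the list rather than writing a new pass. That is the kind of extensibility the Context section motivates.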

Remove the existing FuseDequantLinearPass().

Switch to using the FuseQuantizedOpsTransform pass instead of the old FuseDequantLinear pass.

Add a test_vulkan_passes Python test to exercise the export passes.

Make some small refactors to the test_vulkan_delegate Python test to improve code organization.

Differential Revision: D73794042

pytorch-bot · 2025-04-28T19:41:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10525

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit d817493 with merge base 6932baf:

NEW FAILURE - The following job has failed:

Check Labels / Check labels (gh)
RuntimeError: Error checking labels: PR does not have required labels

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

.github/workflows/build-presets.yml (gh) (similar failure)
pull / test-llava-runner-linux / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-04-28T19:41:29Z

This pull request was exported from Phabricator. Differential Revision: D73794042

github-actions · 2025-04-28T19:42:08Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

kimishpatel · 2025-04-28T23:30:33Z

Why are you relying on _weight_int8pack_mm at all? That is not a public API op, as it is prefixed with _. If it is removed, your passes here will fail. What you really want is just fused pattern recognition. Can you not recognize that directly? Or do you need to serialize some "fake" op that you have a lowering for at runtime?

facebook-github-bot · 2025-04-30T18:39:52Z

This pull request was exported from Phabricator. Differential Revision: D73794042

Pull Request resolved: #10525

Context

When quantizing models with the PT2E quantization flow, quantize/dequantize nodes will be inserted into the graph. However, these quantize/dequantize nodes must be fused with operators such as `aten.linear.default` to produce nodes corresponding to quantized operators (e.g. `weight_int8pack_mm`) in order for quantized operator implementations to be called at runtime.

Currently, the op fusion is done by the `fuse_dequant_linear.py` pass; however, this pass only handles one specific fusion pattern, which generates a `weight_int8pack_mm` operator. As support for more quantized operators is added to ET-VK via the PT2E quantization flow, a more generic fusion pass is needed that can handle a variety of fusion patterns.

Changes

Introduce the `FuseQuantizedOpsTransform()` pass. I elected to introduce a new pass under the `backends/vulkan/_passes` directory, rather than modifying the existing pass, because I anticipate that the majority of the fusion patterns will be specific to ET-VK.

Remove the existing `FuseDequantLinearPass()`.

Switch to using the `FuseQuantizedOpsTransform` pass instead of the old `FuseDequantLinear` pass.

Add a `test_vulkan_passes` Python test to exercise the export passes.

Make some refactors to the `test_vulkan_delegate` Python test to improve code organization.

Introduce the `linear_qcsnw` nomenclature:

* q - quantized
* c - per-channel / channelwise
* s - symmetric
* n - number of bits (qcs4w for 4-bit quant, qcs8w for 8-bit quant)
* w - weight quantized

Add a custom op, `linear_qcs4w`, for 4-bit weight-quantized linear, and add the ability for the quantized op fusion pass to produce this op.

Slightly rename and refactor the quantization config retrieval functions in the `VulkanQuantizer` to improve clarity and API flexibility.

ghstack-source-id: 281448174
@exported-using-ghexport

Differential Revision: [D73794042](https://our.internmc.facebook.com/intern/diff/D73794042/)
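
The q/c/s/n/w naming maps directly onto a quantization recipe. The following hypothetical sketch makes that mapping concrete. `qcsnw_name` composes a name from the bit width, and `quantize_per_channel_symmetric` computes one scale per output channel with a zero point of 0. The full signed range [-2^(n-1), 2^(n-1)-1] is an assumption; some schemes restrict it to a symmetric range such as [-7, 7]. This is not the VulkanQuantizer's code.

```python
from typing import List, Tuple


def qcsnw_name(base_op: str, n_bits: int) -> str:
    """Compose an op name under the q/c/s/n/w scheme, e.g. ("linear", 4) -> "linear_qcs4w"."""
    return f"{base_op}_qcs{n_bits}w"


def quantize_per_channel_symmetric(
    weight: List[List[float]], n_bits: int
) -> Tuple[List[List[int]], List[float]]:
    """Symmetric (zero point = 0), per-output-channel quantization of an [N, K] weight."""
    qmax = 2 ** (n_bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    qmin = -qmax - 1              # -8 for 4-bit, -128 for 8-bit
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:  # one scale per output channel (row)
        max_abs = max(abs(v) for v in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # round() is round-half-to-even; the clamp keeps values in [qmin, qmax]
        q_rows.append([min(qmax, max(qmin, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


w = [[0.6, -1.0, 0.25], [2.0, 0.0, -0.5]]
q, s = quantize_per_channel_symmetric(w, n_bits=4)
print(qcsnw_name("linear", 4), q)  # linear_qcs4w [[4, -7, 2], [7, 0, -2]]
```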

facebook-github-bot · 2025-05-01T16:47:35Z

This pull request was exported from Phabricator. Differential Revision: D73794042

facebook-github-bot · 2025-05-05T15:02:44Z

This pull request was exported from Phabricator. Differential Revision: D73794042

facebook-github-bot · 2025-05-08T00:42:15Z

This pull request was exported from Phabricator. Differential Revision: D73794042

SS-JIA · 2025-05-08T06:44:30Z

@kimishpatel Apologies, I only just saw your comment. I'm using _weight_int8pack_mm purely for convenience; I didn't realize that the leading underscore means it can be removed without notice.

I suppose the proper thing to do is to register the implementation under an equivalent custom op (i.e. etvk.linear_qcs8w). I'll get around to this in a follow-up diff.

Btw, another reason _weight_int8pack_mm is used is that the ATen op serves as a reference when checking the correctness of the Vulkan implementation.
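
The reference-checking idea can be illustrated without any framework. In this hypothetical sketch, `reference_linear` plays the role of an eager reference (dequantize, then a float linear), and `fused_linear` stands in for a backend kernel that works on the integer weights. The names and tolerance defaults are assumptions, not ExecuTorch's test harness. The `allclose` check uses the same formula as `torch.allclose`: |a - b| <= atol + rtol * |b|.

```python
from typing import List

Matrix = List[List[float]]


def reference_linear(x: Matrix, w_q: List[List[int]], scales: List[float]) -> Matrix:
    """Reference path: dequantize the weight to float, then compute x @ W^T."""
    w = [[q * s for q in row] for row, s in zip(w_q, scales)]
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


def fused_linear(x: Matrix, w_q: List[List[int]], scales: List[float]) -> Matrix:
    """Stand-in for a backend kernel: dot with raw integer weights, scale once per channel."""
    return [
        [s * sum(a * q for a, q in zip(x_row, q_row)) for q_row, s in zip(w_q, scales)]
        for x_row in x
    ]


def allclose(a: Matrix, b: Matrix, rtol: float = 1e-5, atol: float = 1e-6) -> bool:
    """Elementwise |a - b| <= atol + rtol * |b|, requiring matching shapes."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return False
    return all(
        abs(u - v) <= atol + rtol * abs(v)
        for ra, rb in zip(a, b)
        for u, v in zip(ra, rb)
    )


x = [[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]]
w_q = [[4, -7, 2], [7, 0, -2]]
scales = [1 / 7, 2 / 7]
print(allclose(fused_linear(x, w_q, scales), reference_linear(x, w_q, scales)))  # True
```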

SS-JIA requested a review from kimishpatel as a code owner April 28, 2025 19:41

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 28, 2025

facebook-github-bot added the fb-exported label Apr 28, 2025

This was referenced Apr 30, 2025

[ET-VK][ez] Use standard quant naming scheme for quantized ops #10587

Merged

[ET-VK] Implement linear_qcs4w #10588

Merged

SS-JIA requested review from iseeyuan, jackzhxng, larryliu0820 and swolchok as code owners April 30, 2025 18:39

trviv approved these changes May 5, 2025

View reviewed changes

facebook-github-bot merged commit 4ecf3ad into gh/SS-JIA/220/base May 8, 2025
82 of 87 checks passed

facebook-github-bot deleted the gh/SS-JIA/220/head branch May 8, 2025 06:35

facebook-github-bot temporarily deployed to cherry-pick-bot with GitHub Actions May 8, 2025 06:35 (inactive)

pytorchbot mentioned this pull request May 8, 2025

[ET-VK] Introduce generic export pass for fusing Q/DQ nodes #10771

Merged

facebook-github-bot merged 5 commits into gh/SS-JIA/220/base from gh/SS-JIA/220/head

4 participants
