
[ET-VK] Prevent decomposition of activation ops with native shaders#17361

Open
abdelaziz-mahdy wants to merge 6 commits into pytorch:main from abdelaziz-mahdy:vulkan-preserve-activation-ops

Conversation

@abdelaziz-mahdy (Contributor) commented Feb 11, 2026

Summary

Add hardswish, hardsigmoid, and hardshrink to the Vulkan partitioner's ops_not_to_decompose list, and register hardswish and hardsigmoid in op_registry.py.

These activation ops have native GLSL shader implementations in the Vulkan backend (activations.h / UnaryOp.cpp) but were being decomposed by PyTorch's default decomposition table into primitive ops (mul/add/clamp/div with constant tensors) before the Vulkan partitioner could claim them.

On PowerVR GPUs (e.g. Pixel 10 Pro), the decomposed paths produce NaN/Inf because the constant scalar tensors (3 and 6 in hardswish(x) = x * clamp(x+3, 0, 6) / 6) are not loaded correctly through the dim_order_ops._to_dim_order_copy buffer-to-texture conversion path.
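For reference, the decomposed arithmetic can be written out in plain Python (a scalar version of the formulas above; in the actual decomposed graph these operate on tensors, with 3 and 6 materialized as constant tensors):

```python
def hardswish(x: float) -> float:
    # hardswish(x) = x * clamp(x + 3, 0, 6) / 6
    # In the decomposed graph the constants 3 and 6 become constant
    # tensors, which is where the PowerVR loading path goes wrong.
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0

def hardsigmoid(x: float) -> float:
    # hardsigmoid(x) = clamp(x + 3, 0, 6) / 6
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

assert hardswish(4.0) == 4.0     # saturated region: acts as identity
assert hardswish(-4.0) == 0.0    # fully clamped to zero
assert hardsigmoid(0.0) == 0.5   # midpoint of the linear region
```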

Root Cause

  1. aten.hardswish.default and aten.hardsigmoid.default are in PyTorch's default decomposition table
  2. vulkan_partitioner.py's ops_not_to_decompose only contained upsample_nearest2d.vec
  3. When using to_edge_transform_and_lower(), the partitioner's ops_to_not_decompose() method is called — but since these ops weren't listed, they got decomposed before the partitioner could see them
  4. The native GLSL shaders (DEFINE_ACTIVATION_FN(hardswish), VK_REGISTER_OP(aten.hardswish.default, hardswish)) were never used
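The mechanism can be illustrated with a hedged sketch (the class and constant names here are illustrative, not the real ExecuTorch code; only the ops_to_not_decompose() hook name and the upsample entry come from the PR text):

```python
# Illustrative sketch of the partitioner hook, not real ExecuTorch code.
class VulkanPartitionerSketch:
    # Ops listed here are reported to to_edge_transform_and_lower(),
    # which then skips the default decompositions for them.
    OPS_NOT_TO_DECOMPOSE = {
        "aten.upsample_nearest2d.vec",  # the only entry before this PR
    }

    def ops_to_not_decompose(self):
        return sorted(self.OPS_NOT_TO_DECOMPOSE)

# Before the fix, hardswish is absent from the list, so the default
# decomposition table rewrites it into mul/add/clamp/div before the
# partitioner ever sees it.
p = VulkanPartitionerSketch()
assert "aten.hardswish.default" not in p.ops_to_not_decompose()
```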

Changes

  • backends/vulkan/op_registry.py: Register hardsigmoid and hardswish in the unary ops list (they had C++ implementations but were missing from the Python registry)
  • backends/vulkan/partitioner/vulkan_partitioner.py: Add 3 activation ops (hardswish, hardsigmoid, hardshrink) to ops_not_to_decompose so to_edge_transform_and_lower() preserves them

Note: silu was intentionally excluded; it has no native Vulkan shader or C++ registration. Its decomposed path (sigmoid + mul) works correctly, since both component ops have native implementations.
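Put together, the intended post-fix list looks roughly like this (a sketch based on the PR description; the actual entries in vulkan_partitioner.py are torch.ops.aten.* overload objects, shown here as strings so the example is self-contained):

```python
# Sketch of the updated ops_not_to_decompose contents (strings stand in
# for the real torch.ops.aten.* overload objects).
OPS_NOT_TO_DECOMPOSE = [
    "aten.upsample_nearest2d.vec",  # pre-existing entry
    "aten.hardswish.default",       # native GLSL shader exists
    "aten.hardsigmoid.default",     # native GLSL shader exists
    "aten.hardshrink.default",      # native GLSL shader exists
    # aten.silu.default is deliberately omitted: it has no native
    # shader, and its decomposition (sigmoid + mul) lowers correctly.
]

assert "aten.silu.default" not in OPS_NOT_TO_DECOMPOSE
assert len(OPS_NOT_TO_DECOMPOSE) == 4
```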

Test Plan

Tested on Pixel 10 Pro (PowerVR D-Series DXT-48-1536 MC1, Android 16):

  • Isolated hardswish-only model: perfect match with XNNPACK reference (maxDiff=0.000000)
  • Isolated hardsigmoid model: works without NaN
  • Full MobileNet V3 Small (FP32): NaN eliminated (was 1000/1000 NaN → now 0/1000)
  • Full MobileNet V3 Small (FP16): NaN eliminated (0/1000)

Note: MobileNetV3 uses hardswish extensively in feature blocks and hardsigmoid in Squeeze-and-Excite blocks, making both critical for this model family.
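The NaN counts and maxDiff figures in the test plan boil down to checks like the following (a generic sketch in plain Python; the actual harness runs the Vulkan delegate on device and compares against an XNNPACK reference):

```python
import math

def count_non_finite(outputs):
    # Counts NaN/Inf values, as in the "1000/1000 NaN -> 0/1000" metric.
    return sum(1 for v in outputs if not math.isfinite(v))

def max_abs_diff(a, b):
    # Max elementwise deviation, as in "maxDiff=0.000000" above.
    return max(abs(x - y) for x, y in zip(a, b))

reference = [0.001 * i for i in range(1000)]  # stand-in for XNNPACK output
fixed = list(reference)                       # stand-in for Vulkan output
assert count_non_finite(fixed) == 0
assert max_abs_diff(fixed, reference) == 0.0
```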

Fixes #17299

Copilot AI review requested due to automatic review settings February 11, 2026 00:06
@pytorch-bot commented Feb 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17361

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures

As of commit 2159c3e with merge base 0d9799f:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 11, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the ExecuTorch Vulkan backend’s lowering path to preserve certain activation ops from PyTorch’s default decompositions so the Vulkan partitioner can claim them and use native unary implementations.

Changes:

  • Extend the Vulkan partitioner’s ops_not_to_decompose list to include several activation ops so they survive to_edge_transform_and_lower().
  • Register aten.hardsigmoid.default and aten.hardswish.default as supported unary ops in the Vulkan Python op registry.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
backends/vulkan/partitioner/vulkan_partitioner.py Adds activation ops to the “do not decompose” list so Vulkan can see/claim them before decomposition happens.
backends/vulkan/op_registry.py Adds hardsigmoid and hardswish to the unary-op registration list for Vulkan partitioning support.


The review comment below refers to this hunk in vulkan_partitioner.py:

    torch.ops.aten.hardsigmoid.default,
    torch.ops.aten.hardswish.default,
    torch.ops.aten.hardshrink.default,
    torch.ops.aten.silu.default,

Copilot AI Feb 11, 2026


torch.ops.aten.silu.default is being added to ops_not_to_decompose, but the Vulkan backend doesn’t appear to have a native implementation/registration for SiLU (no VK_REGISTER_OP(aten.silu.default, ...) in backends/vulkan/runtime/graph/ops/impl, no GLSL helper, and it’s not registered in backends/vulkan/op_registry.py). Preserving it from decomposition may therefore prevent the graph from lowering to Vulkan via the decomposed mul+sigmoid path and could leave an unsupported op in the edge graph.

Suggestion: either (a) add and register a Vulkan SiLU implementation end-to-end (C++ + GLSL + op_registry.py), or (b) remove SiLU from ops_not_to_decompose and keep this list limited to ops that Vulkan can actually consume natively.

Suggested change (remove this line):

    torch.ops.aten.silu.default,

Add hardswish, hardsigmoid, hardshrink, and silu to the Vulkan
partitioner's ops_not_to_decompose list, and register hardswish and
hardsigmoid in the op_registry.

These ops have native GLSL shader implementations in the Vulkan backend
but were being decomposed by PyTorch's default decomposition table into
primitive ops (mul/add/clamp/div with constant tensors) before the
partitioner could claim them. The decomposed paths produce NaN/Inf on
PowerVR GPUs due to constant tensor loading issues in the decomposed
graph.

With this fix, to_edge_transform_and_lower() automatically preserves
these ops via the partitioner's ops_to_not_decompose() method, allowing
the native Vulkan shaders to handle them directly.

Tested on Pixel 10 Pro (PowerVR D-Series DXT-48-1536):
- MobileNet V3 Small: NaN eliminated (was 1000/1000 NaN, now 0/1000)
- Isolated hardswish test: perfect match with XNNPACK reference

Fixes pytorch#17299
@abdelaziz-mahdy abdelaziz-mahdy force-pushed the vulkan-preserve-activation-ops branch from 193765a to 0b85421 Compare February 11, 2026 00:38
Copilot AI review requested due to automatic review settings February 11, 2026 21:06
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@SS-JIA (Contributor) left a comment

@abdelaziz-mahdy change LGTM, just need to fix some changes introduced by merge conflicts.

Restore register_pow_tensor_scalar which was accidentally replaced
with a duplicate register_unary_op during merge conflict resolution.
Copilot AI review requested due to automatic review settings February 17, 2026 15:32
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@nil-is-all (Contributor) commented:

Hi @abdelaziz-mahdy, could you address the merge conflicts?

@abdelaziz-mahdy (Contributor, Author) commented:

> Hi @abdelaziz-mahdy, could you address the merge conflicts?

Pulled from main.



Development

Successfully merging this pull request may close these issues.

Vulkan backend produces all-zero outputs on PowerVR GPU (Pixel 10 Pro)

4 participants