[ET-VK] Prevent decomposition of activation ops with native shaders #17361
abdelaziz-mahdy wants to merge 6 commits into pytorch:main from abdelaziz-mahdy:vulkan-preserve-activation-ops
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17361
Note: Links to docs will display an error until the docs builds have been completed. ❌ 6 New Failures as of commit 2159c3e with merge base 0d9799f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR updates the ExecuTorch Vulkan backend’s lowering path to preserve certain activation ops from PyTorch’s default decompositions so the Vulkan partitioner can claim them and use native unary implementations.
Changes:
- Extend the Vulkan partitioner's `ops_not_to_decompose` list to include several activation ops so they survive `to_edge_transform_and_lower()`.
- Register `aten.hardsigmoid.default` and `aten.hardswish.default` as supported unary ops in the Vulkan Python op registry.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `backends/vulkan/partitioner/vulkan_partitioner.py` | Adds activation ops to the "do not decompose" list so Vulkan can see and claim them before decomposition happens. |
| `backends/vulkan/op_registry.py` | Adds hardsigmoid and hardswish to the unary-op registration list for Vulkan partitioning support. |
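To illustrate the op_registry side of the change, here is a minimal decorator-free sketch of how a unary-op registry can expose supported ops to a partitioner. All names here are illustrative stand-ins, not the actual ExecuTorch `op_registry.py` API.

```python
# Illustrative stand-in for a backend op registry; names and structure
# are hypothetical, not the real ExecuTorch op_registry.py code.
SUPPORTED_UNARY_OPS = set()

def register_unary_op(*op_names):
    """Mark ops as supported unary ops so the partitioner can claim them."""
    SUPPORTED_UNARY_OPS.update(op_names)

# Ops newly registered by this PR (their C++/GLSL shaders already existed):
register_unary_op(
    "aten.hardsigmoid.default",
    "aten.hardswish.default",
)

def is_supported(op_name):
    return op_name in SUPPORTED_UNARY_OPS
```

The key point is that registration is purely a Python-side visibility change: the native shaders existed already, but the partitioner could not claim the ops without an entry here.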
    torch.ops.aten.hardsigmoid.default,
    torch.ops.aten.hardswish.default,
    torch.ops.aten.hardshrink.default,
    torch.ops.aten.silu.default,
`torch.ops.aten.silu.default` is being added to `ops_not_to_decompose`, but the Vulkan backend doesn't appear to have a native implementation/registration for SiLU (no `VK_REGISTER_OP(aten.silu.default, ...)` in `backends/vulkan/runtime/graph/ops/impl`, no GLSL helper, and it's not registered in `backends/vulkan/op_registry.py`). Preserving it from decomposition may therefore prevent the graph from lowering to Vulkan via the decomposed mul+sigmoid path and could leave an unsupported op in the edge graph.
Suggestion: either (a) add and register a Vulkan SiLU implementation end-to-end (C++ + GLSL + `op_registry.py`), or (b) remove SiLU from `ops_not_to_decompose` and keep this list limited to ops that Vulkan can actually consume natively.
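The invariant behind this review comment can be expressed as a simple set check: every op preserved from decomposition should have a native registration. The sets below are modeled with string op names; the membership of `hardshrink` in the registered set is an assumption (the PR only states that `hardsigmoid` and `hardswish` are newly registered).

```python
# Sketch of the consistency check the review comment implies.
# String op names stand in for torch.ops handles; not ExecuTorch API.
PRESERVED = {
    "aten.hardsigmoid.default",
    "aten.hardswish.default",
    "aten.hardshrink.default",
    "aten.silu.default",
}
REGISTERED = {
    "aten.hardsigmoid.default",
    "aten.hardswish.default",
    "aten.hardshrink.default",  # assumed already registered elsewhere
}

# Ops preserved from decomposition but lacking a native registration
# would be left in the edge graph unlowered:
unsupported = sorted(PRESERVED - REGISTERED)
```

Here `unsupported` contains exactly `aten.silu.default`, matching the reviewer's concern.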
Add hardswish, hardsigmoid, hardshrink, and silu to the Vulkan partitioner's `ops_not_to_decompose` list, and register hardswish and hardsigmoid in the op_registry.

These ops have native GLSL shader implementations in the Vulkan backend but were being decomposed by PyTorch's default decomposition table into primitive ops (mul/add/clamp/div with constant tensors) before the partitioner could claim them. The decomposed paths produce NaN/Inf on PowerVR GPUs due to constant tensor loading issues in the decomposed graph. With this fix, `to_edge_transform_and_lower()` automatically preserves these ops via the partitioner's `ops_to_not_decompose()` method, allowing the native Vulkan shaders to handle them directly.

Tested on Pixel 10 Pro (PowerVR D-Series DXT-48-1536):
- MobileNet V3 Small: NaN eliminated (was 1000/1000 NaN, now 0/1000)
- Isolated hardswish test: perfect match with XNNPACK reference

Fixes pytorch#17299
193765a to 0b85421
SS-JIA left a comment
@abdelaziz-mahdy change LGTM, just need to fix some changes introduced by merge conflicts.
Restore `register_pow_tensor_scalar`, which was accidentally replaced with a duplicate `register_unary_op` during merge-conflict resolution.
Hi @abdelaziz-mahdy, could you address the merge conflicts?
…kan-preserve-activation-ops
…bdelaziz-mahdy/executorch into vulkan-preserve-activation-ops
Pulled from main.
Summary
Add `hardswish`, `hardsigmoid`, and `hardshrink` to the Vulkan partitioner's `ops_not_to_decompose` list, and register `hardswish` and `hardsigmoid` in `op_registry.py`.

These activation ops have native GLSL shader implementations in the Vulkan backend (`activations.h` / `UnaryOp.cpp`) but were being decomposed by PyTorch's default decomposition table into primitive ops (mul/add/clamp/div with constant tensors) before the Vulkan partitioner could claim them.

On PowerVR GPUs (e.g. Pixel 10 Pro), the decomposed paths produce NaN/Inf because the constant scalar tensors (3 and 6 in `hardswish(x) = x * clamp(x+3, 0, 6) / 6`) are not loaded correctly through the `dim_order_ops._to_dim_order_copy` buffer-to-texture conversion path.

Root Cause
- `aten.hardswish.default` and `aten.hardsigmoid.default` are in PyTorch's default decomposition table.
- `vulkan_partitioner.py`'s `ops_not_to_decompose` only contained `upsample_nearest2d.vec`.
- During `to_edge_transform_and_lower()`, the partitioner's `ops_to_not_decompose()` method is called, but since these ops weren't listed, they got decomposed before the partitioner could see them.
- The native implementations (`DEFINE_ACTIVATION_FN(hardswish)`, `VK_REGISTER_OP(aten.hardswish.default, hardswish)`) were therefore never used.
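The fused form and the decomposed add/clamp/mul/div path compute the same values; a plain-Python sketch (hypothetical helper names, no torch dependency) makes the equivalence concrete:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def hardswish_native(x):
    # Fused form the Vulkan shader implements:
    # hardswish(x) = x * clamp(x + 3, 0, 6) / 6
    return x * clamp(x + 3.0, 0.0, 6.0) / 6.0

def hardswish_decomposed(x):
    # Decomposed path: add -> clamp -> mul -> div, where 3 and 6 arrive
    # as constant tensors. On PowerVR those constants were mis-loaded,
    # producing NaN/Inf; the arithmetic itself is equivalent whenever
    # the constants load correctly.
    t = x + 3.0
    t = clamp(t, 0.0, 6.0)
    return x * t / 6.0
```

For any input the two paths agree, e.g. hardswish(-3) = 0, hardswish(3) = 3, and inputs above 3 pass through unchanged, which is why preserving the fused op loses no accuracy while sidestepping the constant-loading bug.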
Changes
- `backends/vulkan/op_registry.py`: Register `hardsigmoid` and `hardswish` in the unary ops list (they had C++ implementations but were missing from the Python registry).
- `backends/vulkan/partitioner/vulkan_partitioner.py`: Add 3 activation ops (`hardswish`, `hardsigmoid`, `hardshrink`) to `ops_not_to_decompose` so `to_edge_transform_and_lower()` preserves them.

Test Plan
Tested on Pixel 10 Pro (PowerVR D-Series DXT-48-1536 MC1, Android 16):
Note: MobileNetV3 uses hardswish extensively in feature blocks and hardsigmoid in Squeeze-and-Excite blocks, making both critical for this model family.
Fixes #17299
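Conceptually, the preserve-list mechanism described in this summary works as follows. This is a sketch with string op names standing in for `torch.ops` handles and a plain dict standing in for PyTorch's decomposition table; the real signatures in `vulkan_partitioner.py` and `to_edge_transform_and_lower()` differ.

```python
# Sketch of the "do not decompose" mechanism; not ExecuTorch API.
OPS_NOT_TO_DECOMPOSE = [
    "aten.upsample_nearest2d.vec",  # previously the only entry
    "aten.hardswish.default",       # added by this PR
    "aten.hardsigmoid.default",     # added by this PR
    "aten.hardshrink.default",      # added by this PR
]

def filtered_decomp_table(default_table):
    """Drop decompositions for ops the backend consumes natively,
    mirroring what to_edge_transform_and_lower() does with the
    partitioner's ops_to_not_decompose() result."""
    return {op: fn for op, fn in default_table.items()
            if op not in OPS_NOT_TO_DECOMPOSE}
```

With this filter applied, a preserved op like `aten.hardswish.default` reaches the partitioner intact and can be claimed by the native Vulkan shader, while unlisted ops still decompose as before.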