Initial PowerVR GPU support for Vulkan backend #17323
abdelaziz-mahdy wants to merge 10 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17323
Note: Links to docs will display an error until the docs builds have been completed.
for (size_t i = 0; i < numel; ++i) {
  dst[i] = static_cast<DST_T>(src[i]);
}
vmaFlushAllocation(
Great catch. I noticed that there were a few other calls missing as well, so I created a dedicated PR to fix: #17341
Feel free to remove these changes from this PR; also, afaik this doesn't impact the model correctness issue that you are observing.
SS-JIA left a comment:
Overall, the changes LGTM. Feel free to take it out of draft mode.
Pull request overview
This PR adds initial PowerVR GPU support to the Vulkan backend, addressing device detection and correctness issues identified on the Pixel 10 Pro. The PR enables FP32 inference to work correctly on PowerVR hardware, though FP16 support has a known +0.5 offset issue that remains unresolved.
Changes:
- Added PowerVR device type detection and convenience methods
- Applied PowerVR-specific workgroup size optimizations for convolution operations
- Forced optimal tiling on PowerVR to avoid linear tiling correctness issues
- Enabled robustBufferAccess feature for PowerVR devices
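The tiling rule listed above can be pictured as a simple selection function. The enum and helper below are illustrative stand-ins for the actual Context.cpp logic, not code from this PR:

```cpp
// Illustrative stand-in for VkImageTiling; not the real Vulkan enum values.
enum class ImageTiling { OPTIMAL, LINEAR };

// Sketch of the tiling decision: PowerVR always gets optimal tiling because
// linear tiling can produce incorrect compute-shader results on its TBDR
// architecture; other devices keep whatever preference the caller had.
ImageTiling choose_image_tiling(bool device_is_powervr, bool prefer_linear) {
  if (device_is_powervr) {
    return ImageTiling::OPTIMAL;
  }
  return prefer_linear ? ImageTiling::LINEAR : ImageTiling::OPTIMAL;
}
```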
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| backends/vulkan/runtime/vk_api/Device.h | Added POWERVR to DeviceType enum |
| backends/vulkan/runtime/vk_api/Device.cpp | Added PowerVR string detection in PhysicalDevice constructor |
| backends/vulkan/runtime/graph/ComputeGraph.h | Added device_is_powervr() convenience method |
| backends/vulkan/runtime/graph/ops/impl/Convolution.cpp | Added PowerVR-specific workgroup sizes (32 instead of 64) for convolution dispatch |
| backends/vulkan/runtime/api/Context.cpp | Forced optimal tiling on PowerVR devices |
| backends/vulkan/runtime/vk_api/Adapter.cpp | Enabled robustBufferAccess on PowerVR for well-defined out-of-bounds behavior |
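The string-based detection listed for Device.cpp can be sketched as a substring match on the reported device name. This is a simplified illustration, not the PR's exact `PhysicalDevice` constructor code, which may also consult vendor IDs:

```cpp
#include <string>

// Simplified stand-in for the backend's DeviceType enum (see Device.h).
enum class DeviceType {
  UNKNOWN,
  MALI,
  ADRENO,
  SWIFTSHADER,
  POWERVR,
};

// Sketch of vendor detection from VkPhysicalDeviceProperties::deviceName.
DeviceType detect_device_type(const std::string& device_name) {
  if (device_name.find("Mali") != std::string::npos) {
    return DeviceType::MALI;
  }
  if (device_name.find("Adreno") != std::string::npos) {
    return DeviceType::ADRENO;
  }
  if (device_name.find("SwiftShader") != std::string::npos) {
    return DeviceType::SWIFTSHADER;
  }
  if (device_name.find("PowerVR") != std::string::npos) {
    return DeviceType::POWERVR;
  }
  return DeviceType::UNKNOWN;
}
```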
MALI,
ADRENO,
SWIFTSHADER,
POWERVR,
The PR description mentions StagingBuffer.h changes (vmaFlushAllocation additions), but these changes are not present in the PR diff. Upon inspection, these changes already exist in the codebase (lines 77, 91, 125 of StagingBuffer.h). Please update the PR description to clarify that the StagingBuffer correctness fixes are already present in the codebase and are not part of this PR's changes.
SS-JIA left a comment:
Overall LGTM, just a very small request.
Add PowerVR GPU type detection to the Vulkan backend device enumeration, PowerVR-specific workgroup size tuning for convolution operators, and correctness fixes for PowerVR's TBDR architecture.

Changes:
- Add POWERVR to DeviceType enum with string detection
- Add device_is_powervr() convenience method on ComputeGraph
- Add PowerVR-specific workgroup sizes (32 instead of 64) for convolution dispatch to match PowerVR execution unit configuration
- Force optimal tiling on PowerVR (linear tiling may produce incorrect results in compute shaders on TBDR architecture)
- Enable robustBufferAccess on PowerVR for well-defined OOB behavior

Tested on Pixel 10 Pro (PowerVR D-Series DXT-48-1536 MC1):
- FP32 convolution passes all tests
- Non-conv FP16 ops (add, multiply) pass correctly
- FP16 conv has known bias texture initialization issue (pytorch#17299)

Related: pytorch#17299
set_staging_zeros() and cast_and_copy_from() write to staging buffers without flushing, unlike copy_from() which correctly calls vmaFlushAllocation(). On GPUs where VMA staging memory is not host-coherent (e.g. PowerVR), CPU writes stay in cache and the GPU reads garbage, causing incorrect inference results. This fixes FP16 convolution producing wrong outputs on PowerVR GPUs where the implicit zero-bias texture reads uninitialized memory.
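The flush rule this commit message describes can be illustrated with a small predicate: after the CPU writes into a staging buffer, a vmaFlushAllocation-style call is required unless the backing memory is host-coherent. The constant below mirrors Vulkan's VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, but this is a hypothetical sketch, not the actual StagingBuffer implementation:

```cpp
#include <cstdint>

// Mirrors VK_MEMORY_PROPERTY_HOST_COHERENT_BIT (0x4); HOST_VISIBLE is 0x2.
constexpr uint32_t kMemoryPropertyHostVisibleBit = 0x2;
constexpr uint32_t kMemoryPropertyHostCoherentBit = 0x4;

// Returns true when a flush must follow a CPU write so the GPU observes the
// data (e.g. on PowerVR, where staging memory may be host-visible but not
// host-coherent, so CPU writes can linger in cache).
bool needs_flush_after_cpu_write(uint32_t memory_property_flags) {
  return (memory_property_flags & kMemoryPropertyHostCoherentBit) == 0;
}
```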
Remove PowerVR-specific diagnostic cerr logging and unused iostream include that were used during development.
This reverts commit 9509064.
The local_wg_size variable was computed but never used since DynamicDispatchNode uses the conv2d_local_wg_size callback which already contains the PowerVR-specific logic.
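The callback-based selection described above can be sketched as follows. `pick_conv2d_local_wg_size` and its `is_powervr` flag are stand-ins for the real `conv2d_local_wg_size` callback and `ComputeGraph::device_is_powervr()`; the actual callback also depends on the convolution method and tensor sizes:

```cpp
#include <array>
#include <cstdint>

// Sketch: the dispatch node asks a callback for the local workgroup size at
// dispatch time, and the callback applies the PowerVR-specific choice of
// 32 invocations instead of 64 to match PowerVR execution units.
std::array<uint32_t, 3> pick_conv2d_local_wg_size(bool is_powervr) {
  const uint32_t width = is_powervr ? 32u : 64u;
  return {width, 1u, 1u};
}
```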
force-pushed from 9e14319 to bd9b151
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
utils::uvec3 wg_size = create_conv2d_global_wg_size(
    graph, method, out, weight_data, stride_equals_dilation);

utils::uvec3 local_wg_size;
if (method == Conv2dMethod::Depthwise || method == Conv2dMethod::Pointwise) {
  wg_size = {wg_size[0] * wg_size[1], wg_size[2], 1};
}
wg_size is computed (and reshaped for depthwise/pointwise) but is no longer used after removing the local_wg_size block. This leaves dead code and can trigger unused-variable warnings; consider deleting the wg_size calculation entirely here since DynamicDispatchNode uses conv2d_global_wg_size/conv2d_local_wg_size callbacks to determine dispatch sizes.
// Enable robustBufferAccess on PowerVR devices to ensure well-defined
// behavior for out-of-bounds buffer/image accesses. Without this, PowerVR
// drivers may return zeros or undefined values for edge cases in compute
// shaders. This has a minor performance cost but improves correctness.
The comment mentions "out-of-bounds buffer/image accesses", but VkPhysicalDeviceFeatures::robustBufferAccess only applies to buffer descriptor accesses (it doesn't make out-of-bounds storage image operations well-defined). Please tighten the wording so it matches what the enabled feature actually guarantees, to avoid future debugging confusion.
Suggested change:

// Enable robustBufferAccess on PowerVR devices to provide more well-defined
// behavior for out-of-bounds buffer descriptor accesses. Without this,
// PowerVR drivers may return zeros or undefined values for some edge cases
// in compute shaders. This has a minor performance cost but improves correctness.
// shaders. This has a minor performance cost but improves correctness.
VkPhysicalDeviceFeatures enabled_features{};
if (physical_device.device_type == DeviceType::POWERVR) {
  enabled_features.robustBufferAccess = VK_TRUE;
robustBufferAccess is enabled for PowerVR without checking whether the physical device reports support for it. If an implementation were to report robustBufferAccess = VK_FALSE, this would cause vkCreateDevice to fail on PowerVR only. Consider querying vkGetPhysicalDeviceFeatures (or storing it in PhysicalDevice) and only requesting the feature when it is supported (or explicitly erroring with a clear message).
Suggested change:

VkPhysicalDeviceFeatures supported_features{};
vkGetPhysicalDeviceFeatures(physical_device.handle, &supported_features);
if (supported_features.robustBufferAccess == VK_TRUE) {
  enabled_features.robustBufferAccess = VK_TRUE;
}
- Remove unused wg_size variable left behind after removing the inline workgroup size calculation (DynamicDispatchNode uses callbacks)
- Fix robustBufferAccess comment to accurately describe buffer-only scope
- Query device feature support before enabling robustBufferAccess
Summary
Initial work to add PowerVR GPU support to the Vulkan backend. This PR adds device detection, workgroup tuning, and correctness fixes needed for PowerVR's TBDR architecture.
FP32 inference works correctly with these changes. FP16 has a known issue (see below).
Motivation
The Vulkan backend currently has no awareness of PowerVR GPUs. Testing on Pixel 10 Pro (PowerVR D-Series DXT-48-1536 MC1) showed that without explicit PowerVR handling, inference produces incorrect results. This PR adds the foundational support needed to run models on PowerVR hardware.
Changes
Device Detection (vk_api/Device.h, vk_api/Device.cpp)
- Added POWERVR to the DeviceType enum
- Added PowerVR string detection in the PhysicalDevice constructor

Compute Graph (graph/ComputeGraph.h)
- Added device_is_powervr() convenience method

Convolution Optimization (graph/ops/impl/Convolution.cpp)
- Added PowerVR-specific workgroup sizes via the conv2d_local_wg_size callback

Tiling Fix (api/Context.cpp)
- Forced optimal tiling on PowerVR devices

Buffer Access (vk_api/Adapter.cpp)
- Enabled robustBufferAccess on PowerVR for well-defined out-of-bounds behavior

Test Results (Pixel 10 Pro)
I tested 13 minimal single-operator models to isolate behavior:
Key observations:
MobileNet progressive slices all produce NaN, likely cascading from the +0.5 offset through batch normalization.
Known Issues
FP16 +0.5 Offset (Unresolved)
All FP16 convolution and linear operations on PowerVR show a constant +0.5 added to the output. This is deterministic and affects all conv variants. The root cause has not been identified yet.
What I've ruled out:
The issue likely lives somewhere in the FP16 prepack or texture path, but I haven't pinpointed it yet. Help from the Vulkan team would be appreciated.
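One way to characterize a deterministic offset like the +0.5 seen here is to check whether the element-wise difference between reference and device outputs is constant: a uniform difference points at an additive bug (e.g. a bias path) rather than a scaling or layout error. A small sketch, not taken from the PR:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if actual[i] - expected[i] is the same constant (within tol)
// for every element, and writes that constant to *offset. A uniform +0.5
// result here would suggest an additive bug such as a bias texture issue.
bool has_constant_offset(const std::vector<float>& expected,
                         const std::vector<float>& actual,
                         float* offset,
                         float tol = 1e-4f) {
  if (expected.size() != actual.size() || expected.empty()) {
    return false;
  }
  const float first = actual[0] - expected[0];
  for (std::size_t i = 1; i < expected.size(); ++i) {
    if (std::fabs((actual[i] - expected[i]) - first) > tol) {
      return false;
    }
  }
  *offset = first;
  return true;
}
```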
Test Plan
Related Issues
cc @SS-JIA @manuelcandales @digantdesai @cbilgin