
Initial PowerVR GPU support for Vulkan backend#17323

Open
abdelaziz-mahdy wants to merge 10 commits into pytorch:main from abdelaziz-mahdy:powervr-support

Conversation


@abdelaziz-mahdy abdelaziz-mahdy commented Feb 10, 2026

Summary

Initial work to add PowerVR GPU support to the Vulkan backend. This PR adds device detection, workgroup tuning, and correctness fixes needed for PowerVR's TBDR architecture.

FP32 inference works correctly with these changes. FP16 has a known issue (see below).

Motivation

The Vulkan backend currently has no awareness of PowerVR GPUs. Testing on Pixel 10 Pro (PowerVR D-Series DXT-48-1536 MC1) showed that without explicit PowerVR handling, inference produces incorrect results. This PR adds the foundational support needed to run models on PowerVR hardware.

Changes

Device Detection (vk_api/Device.h, vk_api/Device.cpp)

  • Added POWERVR to DeviceType enum
  • Added PowerVR string detection in PhysicalDevice constructor

Compute Graph (graph/ComputeGraph.h)

  • Added device_is_powervr() convenience method

Convolution Optimization (graph/ops/impl/Convolution.cpp)

  • Added PowerVR-specific workgroup sizes (32 invocations instead of 64) for convolution dispatch via the conv2d_local_wg_size callback
  • PowerVR GPUs have different execution unit configurations than Adreno/Mali

Tiling Fix (api/Context.cpp)

  • Force optimal tiling on PowerVR (linear tiling may produce incorrect results in compute shaders on TBDR architecture)

Buffer Access (vk_api/Adapter.cpp)

  • Enable robustBufferAccess on PowerVR for well-defined out-of-bounds behavior

Test Results (Pixel 10 Pro)

I tested 13 minimal single-operator models to isolate behavior:

| Test | Type | Expected | Actual | Status |
| --- | --- | --- | --- | --- |
| bare_add (x+1.0) | FP16 | 2.0 | 2.0 | PASS |
| bare_half (x*0.5) | FP16 | 0.5 | 0.5 | PASS |
| zero_conv (all-zero weights, no bias) | FP16 | 0.0 | 0.5 | FAIL (+0.5) |
| identity_conv (1x1, input=5.0) | FP16 | 5.0 | 5.5 | FAIL (+0.5) |
| bare_conv2d (3->4, 3x3, no bias) | FP16 | 0.8403 | 1.3400 | FAIL (+0.5) |
| bare_linear (16->4) | FP16 | 1.6 | 2.0996 | FAIL (+0.5) |
| bare_conv2d_fp32 (3->4, 3x3, no bias) | FP32 | 0.8403 | 0.8403 | PASS |
| bare_conv2d_xnnpack (CPU reference) | FP32 | 0.8403 | 0.8403 | PASS |

Key observations:

  • FP32 works correctly on PowerVR with these changes
  • FP16 conv/linear ops have a deterministic +0.5 offset (same every run, not random)
  • Non-conv FP16 ops work (bare_add, bare_half produce correct results)
  • All conv variants affected — regular, depthwise, pointwise, and linear

MobileNet progressive slices all produce NaN, likely cascading from the +0.5 offset through batch normalization.

Known Issues

FP16 +0.5 Offset (Unresolved)

All FP16 convolution and linear operations on PowerVR show a constant +0.5 added to the output. This is deterministic and affects all conv variants. The root cause has not been identified yet.

What I've ruled out:

  • Not uninitialized memory: The offset is deterministic (+0.5 every run), not random
  • Not a convolution algorithm bug: FP32 convolution produces correct results
  • Not specific to conv size/type: Affects regular, depthwise, pointwise, and linear

The issue likely lives somewhere in the FP16 prepack or texture path, but I haven't pinpointed it yet. Help from the Vulkan team would be appreciated.

Test Plan

  • Tested 13 minimal single-operator models on Pixel 10 Pro (PowerVR)
  • FP32 convolution passes all tests
  • Non-conv FP16 ops (add, multiply) pass all tests
  • FP16 conv/linear has known +0.5 offset (unresolved)
  • Run existing Vulkan test suite on Adreno/Mali to verify no regressions

Related Issues

cc @SS-JIA @manuelcandales @digantdesai @cbilgin

@pytorch-bot pytorch-bot bot added the module: vulkan Issues related to the Vulkan delegate and code under backends/vulkan/ label Feb 10, 2026

pytorch-bot bot commented Feb 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17323

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 8 Awaiting Approval

As of commit 3f26203 with merge base 429925d:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 10, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@abdelaziz-mahdy abdelaziz-mahdy changed the title Add PowerVR GPU detection and initial support to Vulkan backend Initial PowerVR GPU support for Vulkan backend Feb 10, 2026
Review comment on the staging-buffer cast path:

```cpp
for (size_t i = 0; i < numel; ++i) {
  dst[i] = static_cast<DST_T>(src[i]);
}
vmaFlushAllocation(
```
A contributor commented:

Great catch. I noticed that there were a few other calls missing as well, so I created a dedicated PR to fix: #17341

Feel free to remove these changes from this PR; also, afaik this doesn't impact the model correctness issue that you are observing.

@SS-JIA SS-JIA left a comment

Overall, the changes LGTM. Feel free to take it out of draft mode.

@abdelaziz-mahdy abdelaziz-mahdy marked this pull request as ready for review February 10, 2026 22:56
Copilot AI review requested due to automatic review settings February 10, 2026 22:56
Copilot AI left a comment

Pull request overview

This PR adds initial PowerVR GPU support to the Vulkan backend, addressing device detection and correctness issues identified on the Pixel 10 Pro. The PR enables FP32 inference to work correctly on PowerVR hardware, though FP16 support has a known +0.5 offset issue that remains unresolved.

Changes:

  • Added PowerVR device type detection and convenience methods
  • Applied PowerVR-specific workgroup size optimizations for convolution operations
  • Forced optimal tiling on PowerVR to avoid linear tiling correctness issues
  • Enabled robustBufferAccess feature for PowerVR devices

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| backends/vulkan/runtime/vk_api/Device.h | Added POWERVR to DeviceType enum |
| backends/vulkan/runtime/vk_api/Device.cpp | Added PowerVR string detection in PhysicalDevice constructor |
| backends/vulkan/runtime/graph/ComputeGraph.h | Added device_is_powervr() convenience method |
| backends/vulkan/runtime/graph/ops/impl/Convolution.cpp | Added PowerVR-specific workgroup sizes (32 instead of 64) for convolution dispatch |
| backends/vulkan/runtime/api/Context.cpp | Forced optimal tiling on PowerVR devices |
| backends/vulkan/runtime/vk_api/Adapter.cpp | Enabled robustBufferAccess on PowerVR for well-defined out-of-bounds behavior |


```cpp
  MALI,
  ADRENO,
  SWIFTSHADER,
  POWERVR,
```
Copilot AI Feb 10, 2026

The PR description mentions StagingBuffer.h changes (vmaFlushAllocation additions), but these changes are not present in the PR diff. Upon inspection, these changes already exist in the codebase (lines 77, 91, 125 of StagingBuffer.h). Please update the PR description to clarify that the StagingBuffer correctness fixes are already present in the codebase and are not part of this PR's changes.

@SS-JIA SS-JIA left a comment
Overall LGTM, just a very small request.

Add PowerVR GPU type detection to the Vulkan backend device enumeration,
PowerVR-specific workgroup size tuning for convolution operators, and
correctness fixes for PowerVR's TBDR architecture.

Changes:
- Add POWERVR to DeviceType enum with string detection
- Add device_is_powervr() convenience method on ComputeGraph
- Add PowerVR-specific workgroup sizes (32 instead of 64) for
  convolution dispatch to match PowerVR execution unit configuration
- Force optimal tiling on PowerVR (linear tiling may produce
  incorrect results in compute shaders on TBDR architecture)
- Enable robustBufferAccess on PowerVR for well-defined OOB behavior

Tested on Pixel 10 Pro (PowerVR D-Series DXT-48-1536 MC1):
- FP32 convolution passes all tests
- Non-conv FP16 ops (add, multiply) pass correctly
- FP16 conv has known bias texture initialization issue (pytorch#17299)

Related: pytorch#17299
set_staging_zeros() and cast_and_copy_from() write to staging buffers
without flushing, unlike copy_from() which correctly calls
vmaFlushAllocation(). On GPUs where VMA staging memory is not
host-coherent (e.g. PowerVR), CPU writes stay in cache and the GPU
reads garbage, causing incorrect inference results.

This fixes FP16 convolution producing wrong outputs on PowerVR GPUs
where the implicit zero-bias texture reads uninitialized memory.
Remove PowerVR-specific diagnostic cerr logging and unused iostream
include that were used during development.
The local_wg_size variable was computed but never used since
DynamicDispatchNode uses the conv2d_local_wg_size callback
which already contains the PowerVR-specific logic.
Copilot AI review requested due to automatic review settings February 12, 2026 01:44
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.



Comment on lines 537 to 542
```cpp
utils::uvec3 wg_size = create_conv2d_global_wg_size(
    graph, method, out, weight_data, stride_equals_dilation);

utils::uvec3 local_wg_size;
if (method == Conv2dMethod::Depthwise || method == Conv2dMethod::Pointwise) {
  wg_size = {wg_size[0] * wg_size[1], wg_size[2], 1};
}
```
Copilot AI Feb 12, 2026

wg_size is computed (and reshaped for depthwise/pointwise) but is no longer used after removing the local_wg_size block. This leaves dead code and can trigger unused-variable warnings; consider deleting the wg_size calculation entirely here since DynamicDispatchNode uses conv2d_global_wg_size/conv2d_local_wg_size callbacks to determine dispatch sizes.

Comment on lines +134 to +137
```cpp
// Enable robustBufferAccess on PowerVR devices to ensure well-defined
// behavior for out-of-bounds buffer/image accesses. Without this, PowerVR
// drivers may return zeros or undefined values for edge cases in compute
// shaders. This has a minor performance cost but improves correctness.
```
Copilot AI Feb 12, 2026

The comment mentions "out-of-bounds buffer/image accesses", but VkPhysicalDeviceFeatures::robustBufferAccess only applies to buffer descriptor accesses (it doesn't make out-of-bounds storage image operations well-defined). Please tighten the wording so it matches what the enabled feature actually guarantees, to avoid future debugging confusion.

Suggested change

```diff
- // Enable robustBufferAccess on PowerVR devices to ensure well-defined
- // behavior for out-of-bounds buffer/image accesses. Without this, PowerVR
- // drivers may return zeros or undefined values for edge cases in compute
- // shaders. This has a minor performance cost but improves correctness.
+ // Enable robustBufferAccess on PowerVR devices to provide more well-defined
+ // behavior for out-of-bounds buffer descriptor accesses. Without this,
+ // PowerVR drivers may return zeros or undefined values for some edge cases
+ // in compute shaders. This has a minor performance cost but improves correctness.
```

```cpp
// shaders. This has a minor performance cost but improves correctness.
VkPhysicalDeviceFeatures enabled_features{};
if (physical_device.device_type == DeviceType::POWERVR) {
  enabled_features.robustBufferAccess = VK_TRUE;
```
Copilot AI Feb 12, 2026

robustBufferAccess is enabled for PowerVR without checking whether the physical device reports support for it. If an implementation were to report robustBufferAccess = VK_FALSE, this would cause vkCreateDevice to fail on PowerVR only. Consider querying vkGetPhysicalDeviceFeatures (or storing it in PhysicalDevice) and only requesting the feature when it is supported (or explicitly erroring with a clear message).

Suggested change

```diff
- enabled_features.robustBufferAccess = VK_TRUE;
+ VkPhysicalDeviceFeatures supported_features{};
+ vkGetPhysicalDeviceFeatures(physical_device.handle, &supported_features);
+ if (supported_features.robustBufferAccess == VK_TRUE) {
+   enabled_features.robustBufferAccess = VK_TRUE;
+ }
```

- Remove unused wg_size variable left behind after removing inline
  workgroup size calculation (DynamicDispatchNode uses callbacks)
- Fix robustBufferAccess comment to accurately describe buffer-only scope
- Query device feature support before enabling robustBufferAccess
