[ET-VK] Use direct buffer-to-image copy for 1D tensor prepacking by abdelaziz-mahdy · Pull Request #17467 · pytorch/executorch

abdelaziz-mahdy · 2026-02-14T23:02:07Z

Summary

Fixes 1D tensor prepacking (e.g., conv2d bias) producing incorrect results on GPUs with strict texture bounds checking (PowerVR).

The nchw_to_image compute shader uses axis_map coordinate remapping designed for multi-dimensional tensors. For 1D tensors padded to 4D with three fake dimensions of size 1, the coordinate math produces out-of-bounds texture writes. PowerVR GPUs silently drop these writes, so bias values never reach the texture. Other GPUs (Adreno, Mali) wrap coordinates, masking the bug.

For 1D width-packed tensors where the element count is divisible by 4, the staging buffer data is already in the correct layout. This change bypasses the compute shader and uses vkCmdCopyBufferToImage to copy the data directly.

Changes:

PrepackNode::encode() — detect 1D width-packed tensors and use direct copy path
CommandBuffer::copy_buffer_to_image() — new helper for buffer-to-image DMA with post-copy barrier
Allocator::create_staging_buffer() — add TRANSFER_SRC to staging buffer usage flags (harmless for existing paths, enables direct copy)

Related: #17299

Test plan

Tested on Pixel 10 Pro (PowerVR DXT-48-1536) — conv2d bias values now reach the texture correctly
Verified no regression on other GPUs (Adreno, Mali) since the coordinate wrapping behavior is unchanged for multi-dimensional tensors

The nchw_to_image compute shader uses axis_map coordinate remapping designed for multi-dimensional tensors. For 1D tensors (e.g., conv2d bias) padded to 4D with three dimensions of size 1, the coordinate math produces out-of-bounds texture writes. Some GPU drivers (PowerVR) strictly validate texture bounds and silently drop these writes, causing bias values to never reach the texture. Other drivers wrap coordinates, masking the bug. For 1D width-packed tensors where the element count is divisible by 4, the staging buffer data is already in the correct layout for the target texture. This change bypasses the compute shader and uses vkCmdCopyBufferToImage to transfer the data directly. Also adds TRANSFER_SRC to staging buffer usage flags (harmless for existing paths) and a copy_buffer_to_image helper to CommandBuffer.

pytorch-bot · 2026-02-14T23:02:11Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17467

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 Awaiting Approval, 1 New Failure

As of commit 6974d32 with merge base 429925d ():

AWAITING APPROVAL - The following workflows need approval before CI can run:

NEW FAILURE - The following job has failed:

Copilot code review / Cleanup artifacts (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-14T23:02:51Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

This PR fixes a critical bug where 1D tensor prepacking (e.g., conv2d bias) produces incorrect results on PowerVR GPUs due to strict texture bounds checking. The nchw_to_image compute shader's coordinate remapping logic produces out-of-bounds texture writes for 1D tensors padded to 4D with fake dimensions, which PowerVR silently drops. The fix bypasses the buggy shader for 1D width-packed tensors by using a direct vkCmdCopyBufferToImage DMA operation instead.

Changes:

Adds direct buffer-to-image copy path for 1D width-packed tensors with element counts divisible by 4
Implements new copy_buffer_to_image() helper with proper layout transitions and post-copy barriers
Updates staging buffer usage flags to include TRANSFER_SRC for DMA operations

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
`backends/vulkan/runtime/vk_api/memory/Allocator.cpp`	Adds `VK_BUFFER_USAGE_TRANSFER_SRC_BIT` to staging buffer usage flags to enable direct copy operations
`backends/vulkan/runtime/vk_api/Command.h`	Declares new `copy_buffer_to_image()` method for buffer-to-image DMA with layout transitions
`backends/vulkan/runtime/vk_api/Command.cpp`	Implements `copy_buffer_to_image()` with proper synchronization via pre/post-copy barriers
`backends/vulkan/runtime/graph/ops/PrepackNode.cpp`	Detects 1D width-packed tensors and uses direct copy path instead of compute shader remapping

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

abdelaziz-mahdy · 2026-02-15T03:27:25Z

Closing - Superseded by Serialize Prepack Fix

I'm closing this PR. The 1D bias prepacking issue on PowerVR turned out to be a symptom of a broader problem, not a coordinate math issue.

What I Found

The actual root cause is that PowerVR corrupts prepacked constant data when multiple prepack compute dispatches are batched in a single Vulkan command buffer. Only the first constant in the batch is correct; subsequent constants read as zero.

For conv2d bias, this meant the bias tensor (typically the second prepacked tensor after weights) read as zero, producing the incorrect output I attributed to out-of-bounds texture writes.

The Real Fix

A change in ComputeGraph::prepack() that serializes prepack dispatches on PowerVR. This fixes bias prepacking, weight prepacking, scalar constant prepacking, and every other multi-constant model — without touching PrepackNode, CommandBuffer, or Allocator.

I tested MobileNet V3 Small on Pixel 10 Pro with only this fix (all changes from this PR reverted): FP32 and FP16 both produce correct classification results, matching XNNPACK output.

PR #17468 has the fix. Details in #17299.

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 14, 2026

abdelaziz-mahdy mentioned this pull request Feb 14, 2026

Vulkan backend produces all-zero outputs on PowerVR GPU (Pixel 10 Pro) #17299

Open

abdelaziz-mahdy marked this pull request as ready for review February 14, 2026 23:15

abdelaziz-mahdy requested a review from SS-JIA as a code owner February 14, 2026 23:15

Copilot AI review requested due to automatic review settings February 14, 2026 23:15

abdelaziz-mahdy marked this pull request as draft February 14, 2026 23:15

Copilot started reviewing on behalf of abdelaziz-mahdy February 14, 2026 23:16 View session

Copilot AI reviewed Feb 14, 2026

View reviewed changes

abdelaziz-mahdy mentioned this pull request Feb 15, 2026

[ET-VK] Serialize prepack dispatches on PowerVR GPUs #17468

Open

4 tasks

abdelaziz-mahdy closed this Feb 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ET-VK] Use direct buffer-to-image copy for 1D tensor prepacking#17467

[ET-VK] Use direct buffer-to-image copy for 1D tensor prepacking#17467
abdelaziz-mahdy wants to merge 1 commit intopytorch:mainfrom
abdelaziz-mahdy:fix/prepack-1d-direct-copy

abdelaziz-mahdy commented Feb 14, 2026

Uh oh!

pytorch-bot bot commented Feb 14, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

abdelaziz-mahdy commented Feb 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

abdelaziz-mahdy commented Feb 14, 2026

Summary

Test plan

Uh oh!

pytorch-bot bot commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17467

❌ 8 Awaiting Approval, 1 New Failure

Uh oh!

github-actions bot commented Feb 14, 2026

This PR needs a release notes: label

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

abdelaziz-mahdy commented Feb 15, 2026

Closing - Superseded by Serialize Prepack Fix

What I Found

The Real Fix

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pytorch-bot bot commented Feb 14, 2026 •

edited

Loading

This PR needs a `release notes:` label