
Vulkan: Texture tensor UBO overflow on Android #17293

@abdelaziz-mahdy

Description


Summary

Running Vulkan-delegated models on Android fails to load with:

E Vulkan uniform data allocation has exceeded tensor uniform buffer size
  (Tensor.h:579 metadata_ubo_impl)

ExecuTorchException: backend initialization failed:
  Exception raised from metadata_ubo_impl at
  .../backends/vulkan/runtime/api/containers/Tensor.h:579:
  ((uniforms_size_ + ubo_nbytes) <= max_ubo_nbytes_) is false!
  Uniform data allocation has exceeded Tensor uniform buffer size

Based on my reading of the source code, the root cause appears to be a mismatch between the UBO field budget for texture-backed tensors (2 fields: sizes + logical_limits) and what operators actually request (four or more fields: sizes, strides, dim_order, numel). I may be misunderstanding something, so corrections are welcome.

This affects all models I tested — including simple ones like MobileNet V3 Small — not just complex architectures.

Export-time workarounds (storage_type_override=VkStorageType.BUFFER, force_fp16=True) did not resolve the issue in my testing, seemingly because multiple ops hardcode texture storage in the C++ runtime.

Environment

ExecuTorch Runtime Info

Version
- ExecuTorch: 1.1.0
- FFI Library: 2.0.0
- Plugin: 1.1.0

Backends
- XNNPACK: Available
- CoreML: Not compiled
- Metal Performance Shaders: Not compiled
- Vulkan: Available
- Qualcomm QNN: Not compiled

Device
- Platform: Android
- Device: Pixel 10 Pro
- Manufacturer: Google
- Brand: google
- Hardware: blazer
- Board: blazer
- Product: blazer
- Android Version: 16
- SDK Int: 36
- Security Patch: 2026-01-05
- ABIs: arm64-v8a
- Physical RAM: 15575 MB
- Available RAM: 2268 MB
- Low RAM Device: false
- Physical Device: true

Error Log

E Vulkan uniform data allocation has exceeded tensor uniform buffer size
  (Tensor.h:579 metadata_ubo_impl)

I/flutter: [ExecuTorch DEBUG] NativeModule.loadFile() called with path:
  /data/user/0/com.zcreations.executorch_flutter_example/cache/models/1.1.0/mobilenet_v3_small_vulkan.pte
I/flutter: [ExecuTorch DEBUG] et_module_load_file returned, status code: 3
I/flutter: Failed to load model: ExecuTorchException: backend initialization failed:
  Exception raised from metadata_ubo_impl at
  .../backends/vulkan/runtime/api/containers/Tensor.h:579:
  ((uniforms_size_ + ubo_nbytes) <= max_ubo_nbytes_) is false!
  Uniform data allocation has exceeded Tensor uniform buffer size

Root Cause Analysis

1. Texture UBO budget is only 2 fields

backends/vulkan/runtime/api/containers/Tensor.cpp, calculate_max_ubo_nbytes():

if (storage_type == utils::kBuffer) {
    // sizes, strides, dim order, numel
    return 3 * ivec4_ubo_nbytes + int32_ubo_nbytes;  // 4 fields
}
// sizes, logical limits
return ivec4_ubo_nbytes + uvec3_ubo_nbytes;  // Only 2 fields!

And the equivalent in get_max_ubo_nbytes():

size_t max_metadata_field_count = 2u;  // texture
if (storage_type() == utils::kBuffer) {
    max_metadata_field_count = 4u;
}

With a typical minUniformBufferOffsetAlignment = 256:

  • Buffer: 3×256 + 256 = 1024 bytes (4 fields)
  • Texture: 256 + 256 = 512 bytes (2 fields)
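The arithmetic above can be sketched as a minimal Python model of the budget logic (the alignment value of 256 and the helper names are illustrative assumptions, not the actual C++ API):

```python
def align_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return ((n + align - 1) // align) * align

def max_ubo_nbytes(storage_is_buffer: bool, min_align: int = 256) -> int:
    # Each metadata field occupies one aligned slot; with the typical
    # minUniformBufferOffsetAlignment of 256, every field rounds up to 256 B.
    ivec4 = align_up(16, min_align)
    int32 = align_up(4, min_align)
    uvec3 = align_up(12, min_align)
    if storage_is_buffer:
        # sizes, strides, dim_order (3x ivec4) + numel (int32) = 4 fields
        return 3 * ivec4 + int32
    # sizes (ivec4) + logical_limits (uvec3) = only 2 fields
    return ivec4 + uvec3

print(max_ubo_nbytes(True), max_ubo_nbytes(False))   # 1024 512
```

On devices with a smaller alignment (some GPUs report 64 or even 16), the absolute budgets shrink, but the 2-vs-4 field asymmetry is the same.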

2. Operators unconditionally request 4+ fields on texture tensors

backends/vulkan/runtime/graph/ops/impl/Linear.cpp, add_addmm_naive_node():

{
    graph.sizes_ubo(out),
    graph.strides_ubo(out),      // Needs buffer budget
    graph.sizes_ubo(mat1),
    graph.strides_ubo(mat1),     // Needs buffer budget
    graph.sizes_ubo(mat2),
    graph.strides_ubo(mat2),     // Needs buffer budget
    graph.numel_ubo(out),        // Needs buffer budget
    graph.create_params_buffer(params),
},

When any tensor here is texture-backed, calling strides_ubo() / numel_ubo() exceeds the 2-field budget → assertion failure at metadata_ubo_impl.
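The failing check can be simulated in a few lines (a sketch with assumed field sizes, mirroring the `(uniforms_size_ + ubo_nbytes) <= max_ubo_nbytes_` assertion; names are illustrative):

```python
ALIGN = 256  # assumed minUniformBufferOffsetAlignment

def align_up(n: int) -> int:
    return ((n + ALIGN - 1) // ALIGN) * ALIGN

# Texture budget: sizes (ivec4) + logical_limits (uvec3), one aligned slot each.
max_ubo_nbytes = align_up(16) + align_up(12)          # 512 bytes

uniforms_size = 0
overflowed_on = None
for field in ["sizes", "strides", "numel"]:           # what addmm requests
    ubo_nbytes = align_up(16)                         # one 256-byte slot per field
    if uniforms_size + ubo_nbytes > max_ubo_nbytes:   # the Tensor.h:579 check
        overflowed_on = field
        break
    uniforms_size += ubo_nbytes

print(overflowed_on)   # numel
```

The first two requests fill the 512-byte texture budget exactly; the third trips the assertion, which matches seeing the failure on a model as small as MobileNet V3.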

3. Ops hardcode texture storage, ignoring export-time overrides

Even with storage_type_override=VkStorageType.BUFFER, the runtime creates texture tensors:

Convolution.cpp — prepacked weights hardcoded to kTexture2D:

ValueRef v = graph.add_tensor(
    final_sizes, graph.dtype_of(vref),
    utils::kTexture2D,  // Hardcoded, ignores storage_type_override
    utils::kChannelsPacked);

op_registry.py — GroupNorm declares texture-only storage:

def register_native_group_norm():
    return OpFeatures(
        inputs_storage=utils.CHANNELS_PACKED_TEXTURE,
        outputs_storage=[utils.CHANNELS_PACKED_TEXTURE, ...],
    )

4. force_fp16=True makes it worse

_passes/tag_memory_meta_pass.py:361-365:

if self.force_fp16:
    op_repsets.try_constrain_with_arg_repset(arg_i, utils.ANY_TEXTURE)

This pushes all tensors toward texture storage, making UBO overflow even more likely.

5. PR #11599 adds dim_order UBO without increasing texture budget

The recently merged PR #11599 adds dim_order_ubo() as a new metadata accessor, but calculate_max_ubo_nbytes was not updated for texture tensors. This creates even more potential for overflow if ops start requesting dim_order on texture tensors.

Export-time workarounds attempted (all failed)

| Approach | Result |
|---|---|
| Default export | Fails |
| force_fp16=True | Fails (worse: pushes more tensors toward texture) |
| storage_type_override=VkStorageType.BUFFER | Fails (ops hardcode texture in the C++ runtime) |

Model hashes were verified (SHA-256) after each re-export to confirm the correct model was loaded on the device.
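For reference, the verification was along these lines (a sketch; the helper name is mine):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing the digest of the freshly exported .pte against the copy pulled back from the device (e.g. via adb pull) rules out a stale-model explanation for the failures.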

Suggested Fixes

  1. Increase texture UBO budget in calculate_max_ubo_nbytes() to match buffer (4 fields) — simplest fix, small memory increase per tensor
  2. Guard UBO access by storage type — ops should only request strides/numel/dim_order on buffer tensors, and use logical_limits for texture tensors
  3. Respect storage_type_override in runtime prepacking ops (Convolution, etc.) — allow the export-time override to actually work end-to-end
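Fix 1 is essentially a one-line budget change. Modeled in Python for illustration (this is not the actual C++ patch; alignment and helper names are assumptions):

```python
ALIGN = 256  # assumed minUniformBufferOffsetAlignment

def align_up(n: int) -> int:
    return ((n + ALIGN - 1) // ALIGN) * ALIGN

IVEC4, INT32 = align_up(16), align_up(4)

def max_ubo_nbytes_proposed(storage_is_buffer: bool) -> int:
    # Proposed: give texture-backed tensors the same 4-field budget
    # (sizes, strides, dim_order, numel) as buffer-backed ones, so
    # strides_ubo()/numel_ubo() on a texture tensor no longer asserts.
    return 3 * IVEC4 + INT32

print(max_ubo_nbytes_proposed(False))   # 1024, up from 512
```

The cost is at most two extra aligned slots (512 bytes at 256-byte alignment) of uniform space per texture-backed tensor, which seems a small price for unblocking these ops.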

Reproduction

import torch
from torchvision.models import mobilenet_v3_small
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.exir import to_edge_transform_and_lower, EdgeCompileConfig

model = mobilenet_v3_small(pretrained=True).eval()
example_input = (torch.randn(1, 3, 224, 224),)

exported = torch.export.export(model, example_input)
et_program = to_edge_transform_and_lower(
    exported,
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
    partitioner=[VulkanPartitioner(
        compile_options={"texture_limits": (2048, 2048, 2048)},
    )],
).to_executorch()

with open("mobilenet_v3_small_vulkan.pte", "wb") as f:
    f.write(et_program.buffer)

Run the exported model on any Android device → fails at model load time.

cc @SS-JIA @manuelcandales @digantdesai @cbilgin

Metadata

Labels: module: vulkan (Issues related to the Vulkan delegate and code under backends/vulkan/)