Summary
Running Vulkan-delegated models on Android fails to load with:
```
E Vulkan uniform data allocation has exceeded tensor uniform buffer size
  (Tensor.h:579 metadata_ubo_impl)
ExecuTorchException: backend initialization failed:
Exception raised from metadata_ubo_impl at
.../backends/vulkan/runtime/api/containers/Tensor.h:579:
((uniforms_size_ + ubo_nbytes) <= max_ubo_nbytes_) is false!
Uniform data allocation has exceeded Tensor uniform buffer size
```
Based on my analysis of the source code, the root cause appears to be a mismatch between the UBO field budget for texture-backed tensors (2 fields: sizes + logical_limits) and what operators actually request (up to 4+ fields: sizes, strides, dim_order, numel). I may be misunderstanding something, so corrections are welcome.
This affects all models I tested — including simple ones like MobileNet V3 Small — not just complex architectures.
Export-time workarounds (storage_type_override=VkStorageType.BUFFER, force_fp16=True) did not resolve the issue in my testing, seemingly because multiple ops hardcode texture storage in the C++ runtime.
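For context, the workarounds were passed at export time along these lines. This is a sketch only: the exact `compile_options` keys accepted by `VulkanPartitioner` and the `VkStorageType` import path are assumptions that may differ between ExecuTorch versions.

```python
# Sketch of the attempted export-time workarounds (option keys assumed, not
# verified against a specific ExecuTorch release).
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.backends.vulkan.serialization.vulkan_graph_schema import VkStorageType

# Attempt 1: force buffer storage for all tensors. Still failed, apparently
# because several ops hardcode texture storage in the C++ runtime.
buffer_partitioner = VulkanPartitioner(
    compile_options={"storage_type_override": VkStorageType.BUFFER},
)

# Attempt 2: force fp16. Made things worse, since it pushes tensors toward
# texture storage (see tag_memory_meta_pass below).
fp16_partitioner = VulkanPartitioner(
    compile_options={"force_fp16": True},
)
```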
Environment
ExecuTorch Runtime Info
Version
- ExecuTorch: 1.1.0
- FFI Library: 2.0.0
- Plugin: 1.1.0
Backends
- XNNPACK: Available
- CoreML: Not compiled
- Metal Performance Shaders: Not compiled
- Vulkan: Available
- Qualcomm QNN: Not compiled
Device
- Platform: Android
- Device: Pixel 10 Pro
- Manufacturer: Google
- Brand: google
- Hardware: blazer
- Board: blazer
- Product: blazer
- Android Version: 16
- SDK Int: 36
- Security Patch: 2026-01-05
- ABIs: arm64-v8a
- Physical RAM: 15575 MB
- Available RAM: 2268 MB
- Low RAM Device: false
- Physical Device: true
Error Log
```
E Vulkan uniform data allocation has exceeded tensor uniform buffer size
  (Tensor.h:579 metadata_ubo_impl)
I/flutter: [ExecuTorch DEBUG] NativeModule.loadFile() called with path:
  /data/user/0/com.zcreations.executorch_flutter_example/cache/models/1.1.0/mobilenet_v3_small_vulkan.pte
I/flutter: [ExecuTorch DEBUG] et_module_load_file returned, status code: 3
I/flutter: Failed to load model: ExecuTorchException: backend initialization failed:
Exception raised from metadata_ubo_impl at
.../backends/vulkan/runtime/api/containers/Tensor.h:579:
((uniforms_size_ + ubo_nbytes) <= max_ubo_nbytes_) is false!
Uniform data allocation has exceeded Tensor uniform buffer size
```
Root Cause Analysis
1. Texture UBO budget is only 2 fields
backends/vulkan/runtime/api/containers/Tensor.cpp — `calculate_max_ubo_nbytes()`:

```cpp
if (storage_type == utils::kBuffer) {
  // sizes, strides, dim order, numel
  return 3 * ivec4_ubo_nbytes + int32_ubo_nbytes; // 4 fields
}
// sizes, logical limits
return ivec4_ubo_nbytes + uvec3_ubo_nbytes; // Only 2 fields!
```

And the equivalent in `get_max_ubo_nbytes()`:

```cpp
size_t max_metadata_field_count = 2u; // texture
if (storage_type() == utils::kBuffer) {
  max_metadata_field_count = 4u;
}
```

With a typical `minUniformBufferOffsetAlignment = 256`:
- Buffer: 3 × 256 + 256 = 1024 bytes (4 fields)
- Texture: 256 + 256 = 512 bytes (2 fields)
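The budget arithmetic above can be sketched in Python. This assumes a 256-byte `minUniformBufferOffsetAlignment`; the `align_up` helper is a simplification of the runtime's alignment logic, not the real API.

```python
def align_up(nbytes: int, alignment: int = 256) -> int:
    """Round nbytes up to the next multiple of the UBO offset alignment."""
    return ((nbytes + alignment - 1) // alignment) * alignment

IVEC4_NBYTES = align_up(4 * 4)  # sizes / strides / dim_order: 4 x int32 -> 256
INT32_NBYTES = align_up(4)      # numel: a single int32 -> still one 256-byte slot
UVEC3_NBYTES = align_up(3 * 4)  # logical_limits: 3 x uint32 -> 256

# Buffer-backed tensors: sizes + strides + dim_order + numel (4 fields)
buffer_budget = 3 * IVEC4_NBYTES + INT32_NBYTES

# Texture-backed tensors: sizes + logical_limits (only 2 fields)
texture_budget = IVEC4_NBYTES + UVEC3_NBYTES

print(buffer_budget, texture_budget)  # 1024 512
```

Because every field rounds up to a full alignment slot, the field *count* is all that matters: two slots for texture, four for buffer.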
2. Operators unconditionally request 4+ fields on texture tensors
backends/vulkan/runtime/graph/ops/impl/Linear.cpp — `add_addmm_naive_node()`:

```cpp
{
    graph.sizes_ubo(out),
    graph.strides_ubo(out),   // Needs buffer budget
    graph.sizes_ubo(mat1),
    graph.strides_ubo(mat1),  // Needs buffer budget
    graph.sizes_ubo(mat2),
    graph.strides_ubo(mat2),  // Needs buffer budget
    graph.numel_ubo(out),     // Needs buffer budget
    graph.create_params_buffer(params),
},
```

When any tensor here is texture-backed, calling `strides_ubo()` / `numel_ubo()` exceeds the 2-field budget → assertion failure at `metadata_ubo_impl`.
3. Ops hardcode texture storage, ignoring export-time overrides
Even with storage_type_override=VkStorageType.BUFFER, the runtime creates texture tensors:
Convolution.cpp — prepacked weights hardcoded to `kTexture2D`:

```cpp
ValueRef v = graph.add_tensor(
    final_sizes, graph.dtype_of(vref),
    utils::kTexture2D, // Hardcoded, ignores storage_type_override
    utils::kChannelsPacked);
```

op_registry.py — GroupNorm declares texture-only storage:

```python
def register_native_group_norm():
    return OpFeatures(
        inputs_storage=utils.CHANNELS_PACKED_TEXTURE,
        outputs_storage=[utils.CHANNELS_PACKED_TEXTURE, ...],
    )
```

4. force_fp16=True makes it worse
_passes/tag_memory_meta_pass.py:361-365:
```python
if self.force_fp16:
    op_repsets.try_constrain_with_arg_repset(arg_i, utils.ANY_TEXTURE)
```

This pushes ALL tensors toward texture storage, maximizing the probability of UBO overflow.
5. PR #11599 adds dim_order UBO without increasing texture budget
The recently merged PR #11599 adds `dim_order_ubo()` as a new metadata accessor, but `calculate_max_ubo_nbytes()` was not updated for texture tensors. This creates even more potential for overflow if ops start requesting `dim_order` on texture tensors.
Export-time workarounds attempted (all failed)
| Approach | Result |
|---|---|
| Default export | Fails |
| `force_fp16=True` | Fails (worse: pushes more tensors to texture) |
| `storage_type_override=VkStorageType.BUFFER` | Fails (ops hardcode texture in the C++ runtime) |
Model hashes were verified (SHA-256) after each re-export to confirm the correct model was loaded on the device.
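The hash verification was a straightforward digest comparison along these lines (the comparison shown in the comment uses illustrative file names):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage: compare the freshly exported model against the copy pulled back from
# the device, e.g.
#   assert sha256_of("mobilenet_v3_small_vulkan.pte") == sha256_of("pulled.pte")
```

This rules out the class of "stale model on device" errors before blaming the backend.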
Suggested Fixes
- Increase the texture UBO budget in `calculate_max_ubo_nbytes()` to match the buffer budget (4 fields) — simplest fix, with a small memory increase per tensor
- Guard UBO access by storage type — ops should only request strides/numel/dim_order on buffer tensors, and use logical_limits for texture tensors
- Respect `storage_type_override` in runtime prepacking ops (Convolution, etc.) — allow the export-time override to actually work end-to-end
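As a sanity check on the first suggestion, equalizing the texture budget with the buffer budget only costs two extra 256-byte slots per texture-backed tensor (a sketch of the arithmetic, assuming a 256-byte alignment, not the actual patch):

```python
ALIGNMENT = 256  # typical minUniformBufferOffsetAlignment

current_texture_budget = 2 * ALIGNMENT   # sizes + logical_limits
proposed_texture_budget = 4 * ALIGNMENT  # match the 4-field buffer budget

extra_per_tensor = proposed_texture_budget - current_texture_budget
print(extra_per_tensor)  # 512 extra bytes of reserved UBO space per texture tensor
```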
Reproduction
```python
import torch
from torchvision.models import mobilenet_v3_small
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.exir import to_edge_transform_and_lower, EdgeCompileConfig

model = mobilenet_v3_small(pretrained=True).eval()
example_input = (torch.randn(1, 3, 224, 224),)
exported = torch.export.export(model, example_input)

et_program = to_edge_transform_and_lower(
    exported,
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
    partitioner=[VulkanPartitioner(
        compile_options={"texture_limits": (2048, 2048, 2048)},
    )],
).to_executorch()

with open("mobilenet_v3_small_vulkan.pte", "wb") as f:
    f.write(et_program.buffer)
```

Run the exported model on any Android device → fails at model load time.