Summary
Running Vulkan-delegated models on Android fails to load with:
```
E Vulkan uniform data allocation has exceeded tensor uniform buffer size
  (Tensor.h:579 metadata_ubo_impl)
ExecuTorchException: backend initialization failed:
Exception raised from metadata_ubo_impl at
.../backends/vulkan/runtime/api/containers/Tensor.h:579:
((uniforms_size_ + ubo_nbytes) <= max_ubo_nbytes_) is false!
Uniform data allocation has exceeded Tensor uniform buffer size
```
Based on my analysis of the source code, the root cause appears to be a mismatch between the UBO field budget for texture-backed tensors (2 fields: sizes + logical_limits) and what operators actually request (up to 4+ fields: sizes, strides, dim_order, numel). I may be misunderstanding something, so corrections are welcome.
This affects all models I tested — including simple ones like MobileNet V3 Small — not just complex architectures.
Export-time workarounds (storage_type_override=VkStorageType.BUFFER, force_fp16=True) did not resolve the issue in my testing, seemingly because multiple ops hardcode texture storage in the C++ runtime.
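For context, the workarounds were passed at export time along these lines. This is a sketch only: the exact `compile_options` keys accepted by `VulkanPartitioner` and the `VkStorageType` import path are assumptions that may differ between ExecuTorch versions.

```python
# Sketch of the attempted export-time workarounds (option keys assumed, not
# verified against a specific ExecuTorch release).
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.backends.vulkan.serialization.vulkan_graph_schema import VkStorageType

# Attempt 1: force buffer storage for all tensors. Still failed, apparently
# because several ops hardcode texture storage in the C++ runtime.
buffer_partitioner = VulkanPartitioner(
    compile_options={"storage_type_override": VkStorageType.BUFFER},
)

# Attempt 2: force fp16. Made things worse, since it pushes tensors toward
# texture storage (see tag_memory_meta_pass below).
fp16_partitioner = VulkanPartitioner(
    compile_options={"force_fp16": True},
)
```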
Environment
ExecuTorch Runtime Info
Version
- ExecuTorch: 1.1.0
- FFI Library: 2.0.0
- Plugin: 1.1.0
Backends
- XNNPACK: Available
- CoreML: Not compiled
- Metal Performance Shaders: Not compiled
- Vulkan: Available
- Qualcomm QNN: Not compiled
Device
- Platform: Android
- Device: Pixel 10 Pro
- Manufacturer: Google
- Brand: google
- Hardware: blazer
- Board: blazer
- Product: blazer
- Android Version: 16
- SDK Int: 36
- Security Patch: 2026-01-05
- ABIs: arm64-v8a
- Physical RAM: 15575 MB
- Available RAM: 2268 MB
- Low RAM Device: false
- Physical Device: true
Error Log
```
E Vulkan uniform data allocation has exceeded tensor uniform buffer size
  (Tensor.h:579 metadata_ubo_impl)
I/flutter: [ExecuTorch DEBUG] NativeModule.loadFile() called with path:
  /data/user/0/com.zcreations.executorch_flutter_example/cache/models/1.1.0/mobilenet_v3_small_vulkan.pte
I/flutter: [ExecuTorch DEBUG] et_module_load_file returned, status code: 3
I/flutter: Failed to load model: ExecuTorchException: backend initialization failed:
Exception raised from metadata_ubo_impl at
.../backends/vulkan/runtime/api/containers/Tensor.h:579:
((uniforms_size_ + ubo_nbytes) <= max_ubo_nbytes_) is false!
Uniform data allocation has exceeded Tensor uniform buffer size
```
Root Cause Analysis
1. Texture UBO budget is only 2 fields
backends/vulkan/runtime/api/containers/Tensor.cpp — `calculate_max_ubo_nbytes()`:

```cpp
if (storage_type == utils::kBuffer) {
  // sizes, strides, dim order, numel
  return 3 * ivec4_ubo_nbytes + int32_ubo_nbytes; // 4 fields
}
// sizes, logical limits
return ivec4_ubo_nbytes + uvec3_ubo_nbytes; // Only 2 fields!
```

And the equivalent in `get_max_ubo_nbytes()`:

```cpp
size_t max_metadata_field_count = 2u; // texture
if (storage_type() == utils::kBuffer) {
  max_metadata_field_count = 4u;
}
```

With a typical `minUniformBufferOffsetAlignment = 256`:
- Buffer: 3 × 256 + 256 = 1024 bytes (4 fields)
- Texture: 256 + 256 = 512 bytes (2 fields)
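The budget arithmetic above can be sketched in Python. This assumes a 256-byte `minUniformBufferOffsetAlignment`; the `align_up` helper is a simplification of the runtime's alignment logic, not the real API.

```python
def align_up(nbytes: int, alignment: int = 256) -> int:
    """Round nbytes up to the next multiple of the UBO offset alignment."""
    return ((nbytes + alignment - 1) // alignment) * alignment

IVEC4_NBYTES = align_up(4 * 4)  # sizes / strides / dim_order: 4 x int32 -> 256
INT32_NBYTES = align_up(4)      # numel: a single int32 -> still one 256-byte slot
UVEC3_NBYTES = align_up(3 * 4)  # logical_limits: 3 x uint32 -> 256

# Buffer-backed tensors: sizes + strides + dim_order + numel (4 fields)
buffer_budget = 3 * IVEC4_NBYTES + INT32_NBYTES

# Texture-backed tensors: sizes + logical_limits (only 2 fields)
texture_budget = IVEC4_NBYTES + UVEC3_NBYTES

print(buffer_budget, texture_budget)  # 1024 512
```

Because every field rounds up to a full alignment slot, the field *count* is all that matters: two slots for texture, four for buffer.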
2. Operators unconditionally request 4+ fields on texture tensors
backends/vulkan/runtime/graph/ops/impl/Linear.cpp — `add_addmm_naive_node()`:

```cpp
{
    graph.sizes_ubo(out),
    graph.strides_ubo(out),   // Needs buffer budget
    graph.sizes_ubo(mat1),
    graph.strides_ubo(mat1),  // Needs buffer budget
    graph.sizes_ubo(mat2),
    graph.strides_ubo(mat2),  // Needs buffer budget
    graph.numel_ubo(out),     // Needs buffer budget
    graph.create_params_buffer(params),
},
```

When any tensor here is texture-backed, calling `strides_ubo()` / `numel_ubo()` exceeds the 2-field budget → assertion failure at `metadata_ubo_impl`.
3. Ops hardcode texture storage, ignoring export-time overrides
Even with storage_type_override=VkStorageType.BUFFER, the runtime creates texture tensors:
Convolution.cpp — prepacked weights hardcoded to `kTexture2D`:

```cpp
ValueRef v = graph.add_tensor(
    final_sizes, graph.dtype_of(vref),
    utils::kTexture2D, // Hardcoded, ignores storage_type_override
    utils::kChannelsPacked);
```

op_registry.py — GroupNorm declares texture-only storage:

```python
def register_native_group_norm():
    return OpFeatures(
        inputs_storage=utils.CHANNELS_PACKED_TEXTURE,
        outputs_storage=[utils.CHANNELS_PACKED_TEXTURE, ...],
    )
```

4. force_fp16=True makes it worse
_passes/tag_memory_meta_pass.py:361-365:
```python
if self.force_fp16:
    op_repsets.try_constrain_with_arg_repset(arg_i, utils.ANY_TEXTURE)
```

This pushes ALL tensors toward texture storage, maximizing the probability of UBO overflow.
5. PR #11599 adds dim_order UBO without increasing texture budget
The recently merged PR #11599 adds `dim_order_ubo()` as a new metadata accessor, but `calculate_max_ubo_nbytes()` was not updated for texture tensors. This creates even more potential for overflow if ops start requesting `dim_order` on texture tensors.
Export-time workarounds attempted (all failed)
| Approach | Result |
|---|---|
| Default export | Fails |
| `force_fp16=True` | Fails (worse: pushes more tensors to texture) |
| `storage_type_override=VkStorageType.BUFFER` | Fails (ops hardcode texture in the C++ runtime) |
Model hashes were verified (SHA-256) after each re-export to confirm the correct model was loaded on the device.
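The hash verification was a straightforward digest comparison along these lines (the comparison shown in the comment uses illustrative file names):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage: compare the freshly exported model against the copy pulled back from
# the device, e.g.
#   assert sha256_of("mobilenet_v3_small_vulkan.pte") == sha256_of("pulled.pte")
```

This rules out the class of "stale model on device" errors before blaming the backend.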
Suggested Fixes
- Increase the texture UBO budget in `calculate_max_ubo_nbytes()` to match the buffer budget (4 fields) — simplest fix, with a small memory increase per tensor
- Guard UBO access by storage type — ops should only request strides/numel/dim_order on buffer tensors, and use logical_limits for texture tensors
- Respect `storage_type_override` in runtime prepacking ops (Convolution, etc.) — allow the export-time override to actually work end-to-end
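As a sanity check on the first suggestion, equalizing the texture budget with the buffer budget only costs two extra 256-byte slots per texture-backed tensor (a sketch of the arithmetic, assuming a 256-byte alignment, not the actual patch):

```python
ALIGNMENT = 256  # typical minUniformBufferOffsetAlignment

current_texture_budget = 2 * ALIGNMENT   # sizes + logical_limits
proposed_texture_budget = 4 * ALIGNMENT  # match the 4-field buffer budget

extra_per_tensor = proposed_texture_budget - current_texture_budget
print(extra_per_tensor)  # 512 extra bytes of reserved UBO space per texture tensor
```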
Reproduction
```python
import torch
from torchvision.models import mobilenet_v3_small
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.exir import to_edge_transform_and_lower, EdgeCompileConfig

model = mobilenet_v3_small(pretrained=True).eval()
example_input = (torch.randn(1, 3, 224, 224),)
exported = torch.export.export(model, example_input)

et_program = to_edge_transform_and_lower(
    exported,
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
    partitioner=[VulkanPartitioner(
        compile_options={"texture_limits": (2048, 2048, 2048)},
    )],
).to_executorch()

with open("mobilenet_v3_small_vulkan.pte", "wb") as f:
    f.write(et_program.buffer)
```

Run the exported model on any Android device → fails at model load time.