
[Web] Avoid unnecessary data copy for pre-allocated tensors#25571

Merged
guschmue merged 3 commits into microsoft:main from Honry:fix-preallocate-issue
Sep 10, 2025

Conversation

@Honry (Contributor) commented Jul 29, 2025

Description

Ensure that pre-allocated tensors do not trigger unnecessary data copying. For example, the WebNN EP always binds its tensors to 'ml-tensor'; in such cases, the tensor ID might change after binding, but copying data for these tensors should still be avoided.

Motivation and Context

This improves efficiency and avoids redundant operations.
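The copy-avoidance decision described above can be sketched as follows. This is an illustrative TypeScript sketch, not the actual onnxruntime-web source; the names `OutputHandle` and `needsCopy` are hypothetical stand-ins for the real session-handler logic.

```typescript
// Hypothetical sketch of the copy-avoidance logic discussed in this PR.
// `OutputHandle` and `needsCopy` are illustrative names, not actual
// onnxruntime-web identifiers.

interface OutputHandle {
  id: number;          // native tensor ID; may change after binding
  data: Float32Array;  // underlying buffer, shared with the user's tensor
}

// Returns true when the output must be copied back to a JS-side buffer.
// For a pre-allocated output the data already lives in the user's buffer,
// so the copy is skipped even if the native ID changed during binding
// (as happens when the WebNN EP rebinds to 'ml-tensor').
function needsCopy(out: OutputHandle, preAllocated: OutputHandle | undefined): boolean {
  if (preAllocated === undefined) {
    return true; // no pre-allocated tensor: the result must be copied out
  }
  // The IDs may differ, but the buffers are the same, so no copy is needed.
  return out.data !== preAllocated.data;
}

const buf = new Float32Array(4);
const pre: OutputHandle = { id: 1, data: buf };
const rebound: OutputHandle = { id: 2, data: buf }; // ID changed after binding
console.log(needsCopy(rebound, pre));       // false: same buffer, skip the copy
console.log(needsCopy(rebound, undefined)); // true: nothing pre-allocated
```

The key point is that identity of the underlying buffer, not the tensor ID, decides whether a copy is required.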

@Honry (Contributor, Author) commented Jul 29, 2025

@fs-eire, PTAL, thanks!

@Honry Honry changed the title from "[Web] Avoid unnecessary data… the WebNN EP" to "[Web] Avoid unnecessary data copy for pre-allocated tensors" Jul 29, 2025
@Honry Honry force-pushed the fix-preallocate-issue branch from 8bd7e01 to 4810b63 Compare July 29, 2025 02:13
@guschmue guschmue added the ep:WebNN WebNN execution provider label Jul 29, 2025
@guschmue (Contributor) commented:

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines commented:

Azure Pipelines successfully started running 5 pipeline(s).

@fs-eire (Contributor) commented Jul 29, 2025

I may need to understand more details about why, in this case, tensor is not equal to outputTensorHandles[i] even though they are the same bound output. Is this a WebNN-only case? In my understanding, if tensor !== outputTensorHandles[i], we should release tensor; otherwise it's a memory leak.

@Honry (Contributor, Author) commented Jul 30, 2025

> I may need to understand more details about why, in this case, tensor is not equal to outputTensorHandles[i] even though they are the same bound output. Is this a WebNN-only case?

Currently this is a WebNN-only case. Usually, if we use pre-allocated output tensors, the ioBindingState will be null. After this PR, WebNN always binds its outputs to 'ml-tensor', so the ioBindingState is not null and it will always call wasm._OrtRunWithBinding() (which leads to tensor !== outputTensorHandles[i]).
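The dispatch Honry describes can be sketched as follows. This is a simplified, hedged illustration; `run`, `IoBindingState`, and the two callbacks are stand-ins for onnxruntime-web's actual wasm session handler, not its real signatures.

```typescript
// Simplified sketch of the run dispatch described above. All names here
// are illustrative stand-ins for onnxruntime-web internals.

interface IoBindingState {
  handle: number;
  outputPreferredLocations: string[];
}

function run(
  ioBindingState: IoBindingState | null,
  runWithBinding: () => number[], // stands in for wasm._OrtRunWithBinding
  runPlain: () => number[],       // stands in for wasm._OrtRun
): number[] {
  // With pre-allocated outputs and no binding, ioBindingState is null and
  // the plain run path is taken. After this PR, WebNN always binds its
  // outputs to 'ml-tensor', so ioBindingState is non-null and the binding
  // path is taken -- which is where the returned tensor handles can differ
  // from the pre-allocated ones.
  return ioBindingState !== null ? runWithBinding() : runPlain();
}

// No binding state: plain path. Binding state present: binding path.
const plain = run(null, () => [10], () => [20]);                    // [20]
const bound = run(
  { handle: 1, outputPreferredLocations: ['ml-tensor'] },
  () => [10],
  () => [20],
);                                                                  // [10]
```

This is why the copy-avoidance check has to survive the binding path, not just the plain path.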

> In my understanding, if tensor !== outputTensorHandles[i], we should release tensor; otherwise it's a memory leak.

So I need to release this tensor early?

@fs-eire (Contributor) commented Jul 30, 2025

>> I may need to understand more details about why, in this case, tensor is not equal to outputTensorHandles[i] even though they are the same bound output. Is this a WebNN-only case?
>
> Currently this is a WebNN-only case. Usually, if we use pre-allocated output tensors, the ioBindingState will be null. After this PR, WebNN always binds its outputs to 'ml-tensor', so the ioBindingState is not null and it will always call wasm._OrtRunWithBinding() (which leads to tensor !== outputTensorHandles[i]).
>
>> In my understanding, if tensor !== outputTensorHandles[i], we should release tensor; otherwise it's a memory leak.
>
> So I need to release this tensor early?

For OrtRunWithBinding, the tensor should be the same as outputTensorHandles[i]:

    for (size_t i = 0; i < output_count; i++) {
      outputs[i] = binding_outputs[i];
    }

@Honry (Contributor, Author) commented Jul 31, 2025

>>> I may need to understand more details about why, in this case, tensor is not equal to outputTensorHandles[i] even though they are the same bound output. Is this a WebNN-only case?
>
>> Currently this is a WebNN-only case. Usually, if we use pre-allocated output tensors, the ioBindingState will be null. After this PR, WebNN always binds its outputs to 'ml-tensor', so the ioBindingState is not null and it will always call wasm._OrtRunWithBinding() (which leads to tensor !== outputTensorHandles[i]).
>
>>> In my understanding, if tensor !== outputTensorHandles[i], we should release tensor; otherwise it's a memory leak.
>
>> So I need to release this tensor early?
>
> For OrtRunWithBinding, the tensor should be the same as outputTensorHandles[i]:
>
>     for (size_t i = 0; i < output_count; i++) {
>       outputs[i] = binding_outputs[i];
>     }

Oh, that's odd; when I tested, tensor !== outputTensorHandles[i] after OrtRunWithBinding. I will do more debugging.

@Honry (Contributor, Author) commented Aug 1, 2025

> Oh, that's odd; when I tested, tensor !== outputTensorHandles[i] after OrtRunWithBinding. I will do more debugging.

@fs-eire, after further debugging, I found two places that change the address of the tensor:

  1. wasm._OrtBindOutput -> BindOutputImpl:
       output_names_.push_back(name);
  2. wasm._OrtRunWithBinding -> GetBoundOutputValues:
       value_dups.push_back(std::make_unique<OrtValue>(out_value));
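The effect Honry found, a fresh wrapper around the same underlying value, can be illustrated conceptually. This is a TypeScript analogy for the C++ `std::make_unique<OrtValue>(out_value)` duplication, not the actual OrtValue code; `OrtValueLike` and `duplicate` are hypothetical names.

```typescript
// Conceptual analogy for the duplication above: GetBoundOutputValues
// creates a new OrtValue wrapper (a new address) around the same
// underlying tensor, so identity comparison of handles fails even though
// the data is identical. All names here are illustrative.

interface OrtValueLike {
  data: Float32Array;
}

function duplicate(v: OrtValueLike): OrtValueLike {
  // Like std::make_unique<OrtValue>(out_value): a fresh wrapper object,
  // but the underlying buffer is shared, not copied.
  return { data: v.data };
}

const original: OrtValueLike = { data: new Float32Array([1, 2, 3]) };
const dup = duplicate(original);

console.log(dup === original);           // false: new wrapper (new address)
console.log(dup.data === original.data); // true: same underlying data
```

This is why comparing tensor handles by identity is the wrong test for "is this the pre-allocated output", which is exactly what this PR works around.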

@Honry Honry force-pushed the fix-preallocate-issue branch from 4810b63 to a5cc31b Compare August 4, 2025 07:00
@Honry (Contributor, Author) commented Aug 11, 2025

@fs-eire, friendly ping.

@fs-eire (Contributor) commented Sep 5, 2025

> @fs-eire, friendly ping.

Please merge the latest main branch and add a comment at line 866: "// TODO: revisit this part to ensure it works for WebGPU when both pre-allocated outputs and preferred location are specified"

Ensure that pre-allocated tensors do not trigger unnecessary
data copying. For example, the WebNN EP always binds its tensors to
'ml-tensor'. In such cases, the tensor ID might change after binding,
but copying data for these tensors should still be avoided.

This improves efficiency and avoids redundant operations.
@Honry Honry force-pushed the fix-preallocate-issue branch from a5cc31b to c92a638 Compare September 8, 2025 00:54
@Honry (Contributor, Author) commented Sep 8, 2025

>> @fs-eire, friendly ping.
>
> Please merge the latest main branch and add a comment at line 866: "// TODO: revisit this part to ensure it works for WebGPU when both pre-allocated outputs and preferred location are specified"

@fs-eire, done.

@fs-eire (Contributor) commented Sep 9, 2025

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines commented:

Azure Pipelines successfully started running 5 pipeline(s).

@Honry (Contributor, Author) commented Sep 10, 2025

@fs-eire, one CI pipeline failed unexpectedly. Do I need to re-merge the latest main branch?

@guschmue guschmue merged commit 7fa13a6 into microsoft:main Sep 10, 2025
86 of 87 checks passed

Labels

ep:WebNN WebNN execution provider
