
[Web] Avoid unnecessary data copy for pre-allocated tensors#25571

Merged
guschmue merged 3 commits into microsoft:main from Honry:fix-preallocate-issue
Sep 10, 2025

Conversation

@Honry (Contributor) commented Jul 29, 2025

Description

Ensure that pre-allocated tensors do not trigger unnecessary data copying. For example, the WebNN EP always binds its tensors to 'ml-tensor'; in such cases, the tensor ID might change after binding, but copying data for these tensors should still be avoided.

Motivation and Context

This improves efficiency and avoids redundant operations.
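The copy-avoidance decision described above can be sketched as follows. This is an illustrative TypeScript sketch, not the actual onnxruntime-web source; the names `OutputHandle` and `needsCopy` are hypothetical stand-ins for the real session-handler logic.

```typescript
// Hypothetical sketch of the copy-avoidance logic discussed in this PR.
// `OutputHandle` and `needsCopy` are illustrative names, not actual
// onnxruntime-web identifiers.

interface OutputHandle {
  id: number;          // native tensor ID; may change after binding
  data: Float32Array;  // underlying buffer, shared with the user's tensor
}

// Returns true when the output must be copied back to a JS-side buffer.
// For a pre-allocated output the data already lives in the user's buffer,
// so the copy is skipped even if the native ID changed during binding
// (as happens when the WebNN EP rebinds to 'ml-tensor').
function needsCopy(out: OutputHandle, preAllocated: OutputHandle | undefined): boolean {
  if (preAllocated === undefined) {
    return true; // no pre-allocated tensor: the result must be copied out
  }
  // The IDs may differ, but the buffers are the same, so no copy is needed.
  return out.data !== preAllocated.data;
}

const buf = new Float32Array(4);
const pre: OutputHandle = { id: 1, data: buf };
const rebound: OutputHandle = { id: 2, data: buf }; // ID changed after binding
console.log(needsCopy(rebound, pre));       // false: same buffer, skip the copy
console.log(needsCopy(rebound, undefined)); // true: nothing pre-allocated
```

The key point is that identity of the underlying buffer, not the tensor ID, decides whether a copy is required.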

@Honry (Contributor, Author) commented Jul 29, 2025

@fs-eire, PTAL, thanks!

@Honry Honry changed the title from "[Web] Avoid unnecessary data… the WebNN EP" to "[Web] Avoid unnecessary data copy for pre-allocated tensors" Jul 29, 2025
@Honry Honry force-pushed the fix-preallocate-issue branch from 8bd7e01 to 4810b63 Compare July 29, 2025 02:13
@guschmue guschmue added the ep:WebNN WebNN execution provider label Jul 29, 2025
@guschmue (Contributor) commented:

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines commented:

Azure Pipelines successfully started running 5 pipeline(s).

@fs-eire (Contributor) commented Jul 29, 2025

I may need to understand more details about why, in this case, tensor is not equal to outputTensorHandles[i] even though they are the same bound output. Is this a WebNN-only case? In my understanding, if tensor !== outputTensorHandles[i], we should release tensor; otherwise it's a memory leak.

@Honry (Contributor, Author) commented Jul 30, 2025

> I may need to understand more details about why, in this case, tensor is not equal to outputTensorHandles[i] even though they are the same bound output. Is this a WebNN-only case?

Currently this is a WebNN-only case. Usually, if we use pre-allocated output tensors, the ioBindingState will be null. After this PR, WebNN always binds its outputs to 'ml-tensor', so the ioBindingState is not null and it will always call wasm._OrtRunWithBinding() (which leads to tensor !== outputTensorHandles[i]).
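The dispatch Honry describes can be sketched as follows. This is a simplified, hedged illustration; `run`, `IoBindingState`, and the two callbacks are stand-ins for onnxruntime-web's actual wasm session handler, not its real signatures.

```typescript
// Simplified sketch of the run dispatch described above. All names here
// are illustrative stand-ins for onnxruntime-web internals.

interface IoBindingState {
  handle: number;
  outputPreferredLocations: string[];
}

function run(
  ioBindingState: IoBindingState | null,
  runWithBinding: () => number[], // stands in for wasm._OrtRunWithBinding
  runPlain: () => number[],       // stands in for wasm._OrtRun
): number[] {
  // With pre-allocated outputs and no binding, ioBindingState is null and
  // the plain run path is taken. After this PR, WebNN always binds its
  // outputs to 'ml-tensor', so ioBindingState is non-null and the binding
  // path is taken -- which is where the returned tensor handles can differ
  // from the pre-allocated ones.
  return ioBindingState !== null ? runWithBinding() : runPlain();
}

// No binding state: plain path. Binding state present: binding path.
const plain = run(null, () => [10], () => [20]);                    // [20]
const bound = run(
  { handle: 1, outputPreferredLocations: ['ml-tensor'] },
  () => [10],
  () => [20],
);                                                                  // [10]
```

This is why the copy-avoidance check has to survive the binding path, not just the plain path.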

> In my understanding, if tensor !== outputTensorHandles[i], we should release tensor; otherwise it's a memory leak.

So I need to release this tensor early?

@fs-eire (Contributor) commented Jul 30, 2025

>> I may need to understand more details about why, in this case, tensor is not equal to outputTensorHandles[i] even though they are the same bound output. Is this a WebNN-only case?
>
> Currently this is a WebNN-only case. Usually, if we use pre-allocated output tensors, the ioBindingState will be null. After this PR, WebNN always binds its outputs to 'ml-tensor', so the ioBindingState is not null and it will always call wasm._OrtRunWithBinding() (which leads to tensor !== outputTensorHandles[i]).
>
>> In my understanding, if tensor !== outputTensorHandles[i], we should release tensor; otherwise it's a memory leak.
>
> So I need to release this tensor early?

For OrtRunWithBinding, the tensor should be the same as outputTensorHandles[i]:

    for (size_t i = 0; i < output_count; i++) {
      outputs[i] = binding_outputs[i];
    }

@Honry (Contributor, Author) commented Jul 31, 2025

>>> I may need to understand more details about why, in this case, tensor is not equal to outputTensorHandles[i] even though they are the same bound output. Is this a WebNN-only case?
>
>> Currently this is a WebNN-only case. Usually, if we use pre-allocated output tensors, the ioBindingState will be null. After this PR, WebNN always binds its outputs to 'ml-tensor', so the ioBindingState is not null and it will always call wasm._OrtRunWithBinding() (which leads to tensor !== outputTensorHandles[i]).
>
>>> In my understanding, if tensor !== outputTensorHandles[i], we should release tensor; otherwise it's a memory leak.
>
>> So I need to release this tensor early?
>
> For OrtRunWithBinding, the tensor should be the same as outputTensorHandles[i]:
>
>     for (size_t i = 0; i < output_count; i++) {
>       outputs[i] = binding_outputs[i];
>     }

Oh, that's odd; when I tested, tensor !== outputTensorHandles[i] after OrtRunWithBinding. I will do more debugging.

@Honry (Contributor, Author) commented Aug 1, 2025

> Oh, that's odd; when I tested, tensor !== outputTensorHandles[i] after OrtRunWithBinding. I will do more debugging.

@fs-eire, after further debugging, I found two places that change the address of the tensor:

  1. wasm._OrtBindOutput -> BindOutputImpl:
       output_names_.push_back(name);
  2. wasm._OrtRunWithBinding -> GetBoundOutputValues:
       value_dups.push_back(std::make_unique<OrtValue>(out_value));
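The effect Honry found, a fresh wrapper around the same underlying value, can be illustrated conceptually. This is a TypeScript analogy for the C++ `std::make_unique<OrtValue>(out_value)` duplication, not the actual OrtValue code; `OrtValueLike` and `duplicate` are hypothetical names.

```typescript
// Conceptual analogy for the duplication above: GetBoundOutputValues
// creates a new OrtValue wrapper (a new address) around the same
// underlying tensor, so identity comparison of handles fails even though
// the data is identical. All names here are illustrative.

interface OrtValueLike {
  data: Float32Array;
}

function duplicate(v: OrtValueLike): OrtValueLike {
  // Like std::make_unique<OrtValue>(out_value): a fresh wrapper object,
  // but the underlying buffer is shared, not copied.
  return { data: v.data };
}

const original: OrtValueLike = { data: new Float32Array([1, 2, 3]) };
const dup = duplicate(original);

console.log(dup === original);           // false: new wrapper (new address)
console.log(dup.data === original.data); // true: same underlying data
```

This is why comparing tensor handles by identity is the wrong test for "is this the pre-allocated output", which is exactly what this PR works around.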

@Honry Honry force-pushed the fix-preallocate-issue branch from 4810b63 to a5cc31b Compare August 4, 2025 07:00
@Honry (Contributor, Author) commented Aug 11, 2025

@fs-eire, friendly ping.

@fs-eire (Contributor) commented Sep 5, 2025

> @fs-eire, friendly ping.

Please merge the latest main branch and add a comment at line 866: "// TODO: revisit this part to ensure it works for WebGPU when both pre-allocated outputs and preferred location are specified"

Ensure that pre-allocated tensors do not trigger unnecessary
data copying. For example, the WebNN EP always binds its tensors to
'ml-tensor'. In such cases, the tensor ID might change after binding,
but copying data for these tensors should still be avoided.

This improves efficiency and avoids redundant operations.
@Honry Honry force-pushed the fix-preallocate-issue branch from a5cc31b to c92a638 Compare September 8, 2025 00:54
@Honry (Contributor, Author) commented Sep 8, 2025

>> @fs-eire, friendly ping.
>
> Please merge the latest main branch and add a comment at line 866: "// TODO: revisit this part to ensure it works for WebGPU when both pre-allocated outputs and preferred location are specified"

@fs-eire, done.

@fs-eire (Contributor) commented Sep 9, 2025

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines commented:

Azure Pipelines successfully started running 5 pipeline(s).

@Honry (Contributor, Author) commented Sep 10, 2025

@fs-eire, one CI pipeline failed unexpectedly. Do I need to re-merge the latest main branch?

@guschmue guschmue merged commit 7fa13a6 into microsoft:main Sep 10, 2025
86 of 87 checks passed

Labels

ep:WebNN WebNN execution provider
