CVS-175980: Disable pre-allocation of memory for dynamic models through onnxruntime_perf_test #896
Conversation
Force-pushed from b29a692 to 91c444c.
adrianlizarraga left a comment:
This looks solid to me. Thank you for fixing it.
```cpp
bool has_dynamic_output = false;

for (size_t i = 0; i < session_.GetOutputCount(); ++i) {
  auto type_info = session_.GetOutputTypeInfo(i);
```
nit: some minor indentation issues. I only point this out because it will be flagged in the main ORT repo CI.
Pull request overview
This PR fixes memory allocation issues for dynamic models when using device memory with the OpenVINO execution provider. Previously, the code pre-allocated output buffers by treating dynamic dimensions (-1) as 1, which caused crashes when users specified larger dimensions via -f or reshape_input options.
Key Changes:
- Detects dynamic dimensions in model outputs before pre-allocating device memory
- Skips pre-allocation when dynamic outputs are present, allowing runtime dimension resolution
- Preserves existing pre-allocation behavior for static models
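The dynamic-output detection described above boils down to scanning each output shape for a -1 sentinel. A minimal standalone sketch (the helper name `HasDynamicDim` is illustrative, not the actual perf_test code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// ONNX Runtime reports free/symbolic dimensions as -1 in tensor shapes.
// Returns true if any dimension of the given output shape is dynamic.
bool HasDynamicDim(const std::vector<int64_t>& shape) {
  return std::any_of(shape.begin(), shape.end(),
                     [](int64_t d) { return d == -1; });
}
```

In the PR, this check gates pre-allocation: outputs with any dynamic dimension are left as null `Ort::Value`s so their shape can be resolved at run time.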
```cpp
  return Ort::Value(nullptr);
};
} else {
new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
    Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
  return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),
```
Inconsistent indentation: the lambda body at line 1052 should be indented consistently with other lambda bodies in the same context (compare with lines 1023-1025 and 1055-1059).
Suggested change:

```cpp
    return Ort::Value(nullptr);
  };
} else {
  new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
                 Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
    return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),
```
```cpp
auto transform_fcn = std::function<int64_t(int64_t)>();
auto new_value = std::function<Ort::Value(OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&)>();
transform_fcn = [](int64_t input) { return input; };
```
The declaration of transform_fcn is just a few lines above. If the variable is unconditionally changed to an identity function I think it makes sense to instead remove the variable and skip the transform call below; it's not doing anything.
@javier-intel: transform_fcn is used at line 1067.
However, there is no conditional setting it anymore.
@adrianlizarraga, can you comment on whether the transform function can be removed altogether?
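To illustrate the reviewers' point: once transform_fcn is unconditionally the identity lambda, feeding it to `std::transform` returns the input shape unchanged, so the call is effectively dead code. A minimal sketch (not the perf_test source; the helper name is illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Applies the identity transform to a shape, mirroring the unconditional
// `transform_fcn = [](int64_t input) { return input; };` in the PR.
std::vector<int64_t> ApplyIdentityTransform(const std::vector<int64_t>& shape) {
  auto transform_fcn = [](int64_t input) { return input; };
  std::vector<int64_t> out(shape.size());
  std::transform(shape.begin(), shape.end(), out.begin(), transform_fcn);
  return out;  // identical to the input: the transform does nothing
}
```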
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
```diff
@@ -1034,11 +1034,30 @@ select from 'TF8', 'TF16', 'UINT8', 'FLOAT', 'ITENSOR'. \n)");
   // Switch to custom
   allocator_ = Ort::UnownedAllocator(custom_allocator_);

-  // free dimensions are treated as 1 if not overridden
-  transform_fcn = [](int64_t input) { return (input == -1) ? -input : input; };
-  new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape, Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
-    return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(), tensor_info.GetElementType());
-  };
+  // Do not pre-allocate if dynamic dimensions are present
+  bool has_dynamic_output = false;
+
+  for (size_t i = 0; i < session_.GetOutputCount(); ++i) {
+    auto type_info = session_.GetOutputTypeInfo(i);
+    auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
+    auto shape = tensor_info.GetShape();
+    if (std::any_of(shape.begin(), shape.end(), [](int64_t d) { return d == -1; })) {
+      has_dynamic_output = true;
+      break;
+    }
+  }
+
+  if (has_dynamic_output) {
+    new_value = [](OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&) {
+      return Ort::Value(nullptr);
+    };
+  } else {
+    new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
+                   Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
+      return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),
+                                      tensor_info.GetElementType());
+    };
+  }
```
I'd replace everything from lines 1019-1069 with the following:
```cpp
if (!device_memory_name_.empty()) {
  Ort::MemoryInfo memory_info(nullptr);  // Default initialize, will be overwritten
  if (device_memory_name_ == CUDA) {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeDefault);
  } else {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeCPUOutput);
  }
  custom_allocator_ = Ort::Allocator(session_, memory_info);
  // Switch to custom
  allocator_ = Ort::UnownedAllocator(custom_allocator_);
}
for (size_t i = 0; i < output_names_raw_ptr.size(); i++) {
  Ort::TypeInfo type_info = session_.GetOutputTypeInfo(i);
  auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
  std::vector<int64_t> output_shape = tensor_info.GetShape();
  auto is_dynamic = std::find(output_shape.begin(), output_shape.end(), -1) != output_shape.end();
  if (is_dynamic || device_memory_name_.empty()) {
    outputs_.emplace_back(Ort::Value(nullptr));
  } else {
    auto new_value = Ort::Value::CreateTensor(allocator_, output_shape.data(), output_shape.size(), tensor_info.GetElementType());
    outputs_.emplace_back(std::move(new_value));
  }
}
```
Thanks @javier-intel, I've updated the code to check each output and pre-allocate only the static ones as suggested.
Force-pushed from fc9569c to e46e233, then from cd6516f to 35e276d.
Description
This PR disables pre-allocating output buffer memory on device for dynamic models. The device_memory_name|OpenVINO_RT_NPU option can now be used together with the free dimension override (-f) or the reshape_input option, as such:
onnxruntime_perf_test -e openvino -m duration -t 30 -o 0 -C "session.disable_cpu_ep_fallback|1" -i "device_type|NPU device_memory_name|OpenVINO_RT_NPU reshape_input|input[]"
or
onnxruntime_perf_test -e openvino -m duration -t 30 -o 0 -C "session.disable_cpu_ep_fallback|1" -I -f "dynamic_dimension:100" -i "device_type|NPU device_memory_name|OpenVINO_RT_NPU"
Motivation and Context
This PR addresses the following open-source issue: microsoft#26217
When using device memory allocation (via the device_memory_name|OpenVINO_RT_NPU flag) with models containing dynamic dimensions, the current code pre-allocates output buffers by converting all -1 dimensions to 1. This prevents the free dimension override (-f) and the reshape_input option from working as required. For example, if a user specifies -f batch:5, the pre-allocated buffer (sized for batch 1) is too small for the actual inference (batch 5), leading to memory crashes.
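The size mismatch can be made concrete with illustrative numbers (a hypothetical helper, not perf_test code): for an output shape of [-1, 10], the old logic allocated for [1, 10], while -f batch:5 makes the runtime shape [5, 10], five times larger than the buffer.

```cpp
#include <cstdint>
#include <vector>

// Element count when every free dimension (-1) is clamped to 1 --
// the sizing rule the old pre-allocation logic effectively used.
int64_t ElementCountClampFreeDims(const std::vector<int64_t>& shape) {
  int64_t count = 1;
  for (int64_t d : shape) {
    count *= (d == -1) ? 1 : d;
  }
  return count;
}
```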