
Add API to compile a model#24207

Merged
skottmckay merged 33 commits into main from adrianl/model-compile-api
Apr 12, 2025
Conversation


@adrianlizarraga adrianlizarraga commented Mar 27, 2025

Description

  • Adds C/C++ API functionality to compile a model (i.e., generate a model with EPContext nodes) using explicit APIs.
  • Adds support for compiling when input or output models are in memory (not just files).
  • Allows specifying the threshold for when initializers are stored in an external file.
  • Allows file paths of arbitrary lengths (session_option key/value configs limited string length to 2048).
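The external-initializer threshold in the third bullet can be pictured as a simple size-based split. The sketch below is illustrative only: `InitializerInfo`, `PartitionedInitializers`, and `PartitionBySize` are hypothetical names, and the boundary rule (sizes at or above the threshold go external) is an assumption, so check the C API documentation for the exact behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an initializer: a name plus its raw size in bytes.
struct InitializerInfo {
  std::string name;
  std::size_t size_in_bytes;
};

struct PartitionedInitializers {
  std::vector<std::string> embedded;  // would stay inside the .onnx file
  std::vector<std::string> external;  // would go to the external initializers file
};

// Assumption: initializers whose size is >= threshold are stored externally.
PartitionedInitializers PartitionBySize(const std::vector<InitializerInfo>& initializers,
                                        std::size_t threshold) {
  PartitionedInitializers result;
  for (const auto& init : initializers) {
    if (init.size_in_bytes >= threshold) {
      result.external.push_back(init.name);
    } else {
      result.embedded.push_back(init.name);
    }
  }
  return result;
}
```

With a threshold of 1024 bytes, a 64-byte bias would stay embedded while 1024-byte and 4096-byte tensors would be written to the external file.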

List of C API functions:

ORT_API(const OrtCompileApi*, GetCompileApi);

ORT_API(void, ReleaseModelCompilationOptions, _Frees_ptr_opt_ OrtModelCompilationOptions*);
ORT_API2_STATUS(CreateModelCompilationOptionsFromSessionOptions, _In_ const OrtEnv* env,
                _In_ const OrtSessionOptions* session_options, _Outptr_ OrtModelCompilationOptions** out);
ORT_API2_STATUS(ModelCompilationOptions_SetInputModelPath, _In_ OrtModelCompilationOptions* model_compile_options,
                _In_ const ORTCHAR_T* input_model_path);
ORT_API2_STATUS(ModelCompilationOptions_SetInputModelFromBuffer, _In_ OrtModelCompilationOptions* model_compile_options,
                _In_ const void* input_model_data, size_t input_model_data_size);
ORT_API2_STATUS(ModelCompilationOptions_SetOutputModelPath, _In_ OrtModelCompilationOptions* model_compile_options,
                _In_ const ORTCHAR_T* output_model_path);
ORT_API2_STATUS(ModelCompilationOptions_SetOutputModelExternalInitializersFile,
                _In_ OrtModelCompilationOptions* model_compile_options,
                _In_ const ORTCHAR_T* external_initializers_file_path,
                size_t external_initializer_size_threshold);
ORT_API2_STATUS(ModelCompilationOptions_SetOutputModelBuffer, _In_ OrtModelCompilationOptions* model_compile_options,
                _Inout_ OrtAllocator* allocator, void** output_model_buffer_ptr, size_t* output_model_buffer_size_ptr);
ORT_API2_STATUS(ModelCompilationOptions_SetEpContextEmbedMode, _In_ OrtModelCompilationOptions* model_compile_options,
                bool embed_ep_context_in_model);
ORT_API2_STATUS(CompileModel, _In_ const OrtEnv* env, _In_ const OrtModelCompilationOptions* model_options);

Example (see unit tests for others):

#include "onnxruntime_cxx_api.h"

// Test using the CompileModel() API with settings:
//   - input model from buffer
//   - output model file
//   - EPContext nodes in output model use embedded binary blobs.
TEST_F(QnnHTPBackendTests, CompileApi_FromSessionOptions_InputModelAsBuffer_Embedded) {
  const ORTCHAR_T* output_model_file = ORT_TSTR("./qnn_context_binary_multi_partition_test.onnx");
  std::filesystem::remove(output_model_file);

  // Initialize session options with QNN EP
  Ort::SessionOptions session_options;
  ProviderOptions provider_options;
#if defined(_WIN32)
  provider_options["backend_path"] = "QnnHtp.dll";
#else
  provider_options["backend_path"] = "libQnnHtp.so";
#endif
  provider_options["offload_graph_io_quantization"] = "0";
  session_options.AppendExecutionProvider("QNN", provider_options);

  // Create model compilation options from the session options.
  Ort::ModelCompilationOptions compile_options(*ort_env, session_options);
  compile_options.SetInputModelFromBuffer(reinterpret_cast<const void*>(model_data.data()), model_data.size());
  compile_options.SetOutputModelPath(output_model_file);
  compile_options.SetEpContextEmbedMode(true);

  // Compile the model.
  Ort::Status status = Ort::CompileModel(*ort_env, compile_options);
  ASSERT_TRUE(status.IsOK());

  // Make sure the compiled model was generated and has the expected number of EPContext nodes.
  ASSERT_TRUE(std::filesystem::exists(output_model_file));
  CheckEpContextNodeCounts(output_model_file, 2, 2);
}

Motivation and Context

Improve compilation workflow and add new capabilities.


@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@adrianlizarraga adrianlizarraga marked this pull request as ready for review April 7, 2025 15:44
@adrianlizarraga adrianlizarraga changed the title [draft] Add API to compile a model Add API to compile a model Apr 7, 2025
…ssion_options object. Return error if compiled model doesn't contain EPContext nodes (compile API only)
mschofie previously approved these changes Apr 9, 2025

@adrianlizarraga adrianlizarraga dismissed stale reviews from skottmckay and vriveras via ce389f1 April 12, 2025 04:03
@skottmckay skottmckay merged commit 90c263f into main Apr 12, 2025
85 of 89 checks passed
@skottmckay skottmckay deleted the adrianl/model-compile-api branch April 12, 2025 07:08

/** \brief Sets the file path for the output ONNX model generated by CompileModel.
*
* If the output model path is not specified and the output model is not to be stored in a buffer,

path is not specified

It's not clear how to unspecify the path. Would it be nullptr?

* \param[in] model_compile_options The OrtModelCompilationOptions instance.
* \param[in] allocator The allocator used to allocate the buffer for the compiled model.
* \param[out] output_model_buffer_ptr Pointer to the buffer that stores the compiled model.
* \param[out] output_model_buffer_size_ptr Pointer set to the size of output buffer in bytes.

set to the size of output buffe

suggestion: set to the size of the output model in bytes.
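The parameter docs above describe a two-step contract: the caller registers output pointers first, and they are filled in only when compilation runs. A minimal sketch of that contract follows. All names here are hypothetical, and `std::malloc` stands in for the user-provided allocator, so this is not the ONNX Runtime implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical options object: it only remembers where the caller wants results written.
struct BufferOutputOptionsSketch {
  void** output_buffer_ptr = nullptr;
  std::size_t* output_size_ptr = nullptr;
};

// Registration step: nothing is allocated yet.
void SetOutputBuffer(BufferOutputOptionsSketch& options, void** buffer_ptr, std::size_t* size_ptr) {
  options.output_buffer_ptr = buffer_ptr;
  options.output_size_ptr = size_ptr;
}

// Compile step: allocates (std::malloc stands in for the user's allocator),
// copies the serialized model, then reports the size of the model in bytes.
bool CompileToBuffer(const BufferOutputOptionsSketch& options, const std::string& serialized_model) {
  if (options.output_buffer_ptr == nullptr || options.output_size_ptr == nullptr) return false;
  void* buffer = std::malloc(serialized_model.size());
  if (buffer == nullptr) return false;
  std::memcpy(buffer, serialized_model.data(), serialized_model.size());
  *options.output_buffer_ptr = buffer;
  *options.output_size_ptr = serialized_model.size();
  return true;
}
```

In this sketch the caller owns the returned buffer and must release it with the matching deallocation function (`std::free` here).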

ORT_DEFINE_RELEASE(Node);
ORT_DEFINE_RELEASE(Graph);
ORT_DEFINE_RELEASE(Model);
#if !defined(ORT_MINIMAL_BUILD)

@yuslepukhin yuslepukhin Apr 14, 2025


ORT_MINIMAL_BUILD

My understanding is that ORT_MINIMAL_BUILD is a build option for ORT itself, not for the customer's source code.
Their code will probably not have a notion of a minimal build, so this will always be undefined. This may cause a discrepancy between the binary and the header that the customer sees.

I suggest we do not leak our internal defines to public headers and adjust the source code assumptions accordingly.
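One way to act on this suggestion is to declare the function unconditionally in the public header and let the build option change only the behavior inside the library. The sketch below assumes a hypothetical macro `SKETCH_MINIMAL_BUILD`, a hypothetical `Status` struct, and a hypothetical `CompileModelSketch` function. It does not show how ONNX Runtime resolved the issue.

```cpp
#include <cassert>
#include <string>

// Hypothetical status type for this sketch (not the ONNX Runtime Status class).
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical build-time switch that only the library's own build system sets.
// Client code that includes the public header never needs to know about it.
#ifndef SKETCH_MINIMAL_BUILD
#define SKETCH_MINIMAL_BUILD 0
#endif

// "Public header" part: declared unconditionally, so every client sees the same
// API regardless of how the library binary was built.
Status CompileModelSketch(const std::string& input_path);

// "Library source" part: the build option changes behavior, not the declaration.
Status CompileModelSketch(const std::string& input_path) {
#if SKETCH_MINIMAL_BUILD
  (void)input_path;
  return {false, "compile API is not supported in this build"};
#else
  if (input_path.empty()) {
    return {false, "input model path is empty"};
  }
  return {true, ""};
#endif
}
```

The client then gets a runtime error from a reduced build instead of a declaration that silently differs from the shipped binary.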

using Base::Base;

explicit ModelCompilationOptions(std::nullptr_t) {} ///< Create an empty ModelCompilationOptions object, must be assigned a valid one to be used.
explicit ModelCompilationOptions(OrtModelCompilationOptions* p) ///< Takes ownership of an OrtModelCompilationOptions

ptions(OrtModelCompilationOptions* p)

Given using Base::Base, this ctor seems to be redundant.
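For context, a using-declaration that names the base class constructors makes them available in the derived class, so a forwarding constructor with the same signature adds nothing. The sketch below uses hypothetical `Handle`, `Base`, and `Derived` types rather than the real wrapper classes from the C++ API header.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opaque handle and a minimal base class, standing in for the
// wrapper types in the C++ API header.
struct Handle {
  int id;
};

class Base {
 public:
  Base() = default;
  explicit Base(std::nullptr_t) {}
  explicit Base(Handle* p) : p_(p) {}  // stores the pointer (ownership omitted in this sketch)
  Handle* get() const { return p_; }

 private:
  Handle* p_ = nullptr;
};

class Derived : public Base {
 public:
  using Base::Base;  // inherits Base(std::nullptr_t) and Base(Handle*), keeping `explicit`
  // No need to redeclare `explicit Derived(Handle* p) : Base(p) {}` here.
};
```

A derived class would still need its own constructor if it should differ from the base version, for example by adding `explicit` where the base constructor is implicit.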

"Cannot serialize ONNX ModelProto larger than 2GB");

OrtAllocator* allocator = ep_context_gen_options.output_model_buffer_allocator;
void* buffer = allocator->Alloc(allocator, buffer_size);

oid* buffer

Can Serialize() throw? If so, we leak the buffer.
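A common way to close this kind of leak is to hold the freshly allocated buffer in a smart pointer whose deleter returns it to the allocator, and to release ownership only after serialization succeeds. The sketch below uses a hypothetical `CAllocator` struct that mimics the `allocator->Alloc(allocator, size)` calling shape from the excerpt. It is not the real `OrtAllocator` definition, and `SerializeInto` is a stand-in for the actual serialization call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <stdexcept>

// Hypothetical C-style allocator with function-pointer members.
struct CAllocator {
  void* (*Alloc)(CAllocator* self, std::size_t size);
  void (*Free)(CAllocator* self, void* p);
  int live_allocations;  // bookkeeping so the sketch can observe leaks
};

void* CountingAlloc(CAllocator* self, std::size_t size) {
  ++self->live_allocations;
  return std::malloc(size);
}

void CountingFree(CAllocator* self, void* p) {
  --self->live_allocations;
  std::free(p);
}

// Deleter that returns the buffer to the allocator it came from.
struct AllocatorDeleter {
  CAllocator* allocator;
  void operator()(void* p) const { allocator->Free(allocator, p); }
};
using BufferPtr = std::unique_ptr<void, AllocatorDeleter>;

// Stand-in for a serialization step that may throw.
void SerializeInto(void* /*buffer*/, std::size_t /*size*/, bool should_throw) {
  if (should_throw) throw std::runtime_error("serialization failed");
}

// Hands the buffer to the caller only if serialization succeeded; on an
// exception the unique_ptr frees it during stack unwinding.
void* SerializeToNewBuffer(CAllocator* allocator, std::size_t size, bool should_throw) {
  BufferPtr guard(allocator->Alloc(allocator, size), AllocatorDeleter{allocator});
  SerializeInto(guard.get(), size, should_throw);
  return guard.release();  // ownership passes to the caller
}
```

If `SerializeInto` throws, `live_allocations` returns to zero; on success the caller receives the buffer and is responsible for freeing it through the same allocator.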

ORT_RETURN_IF(buffer_size > static_cast<size_t>(std::numeric_limits<int>::max()),
"Cannot serialize ONNX ModelProto larger than 2GB");

OrtAllocator* allocator = ep_context_gen_options.output_model_buffer_allocator;

OrtAllocator* allocator

Typically, we wrap incoming OrtAllocator pointers in a special wrapper class so that they are represented as AllocatorPtr. That can be used safely in other places in our code, and it can be held in a smart pointer.

Perhaps, we need to check if storing a raw ptr here is a good idea.


struct ModelCompilationOptions {
const OrtEnv* env = nullptr;
std::unique_ptr<OrtSessionOptions> session_options = nullptr;

s

Is this optional?


Status ModelCompilationOptions::ResetOutputModelSettings() {
EpContextModelGenerationOptions& ep_context_gen_options = session_options->value.ep_context_gen_options;
ep_context_gen_options.output_model_file_path = "";

@yuslepukhin yuslepukhin Apr 14, 2025


= "";

clear() is a better choice.
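Both forms leave the member empty, so the difference is mostly about intent and avoiding a round trip through a string literal. The sketch below compares them on a hypothetical `OutputSettingsSketch` struct. The member names mirror the excerpt, but their types are an assumption made for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical settings struct for this comparison.
struct OutputSettingsSketch {
  std::filesystem::path output_model_file_path;
  std::string input_model_path;

  void ResetWithAssignment() {
    output_model_file_path = "";  // builds the new value from a C string literal
    input_model_path = "";
  }

  void ResetWithClear() {
    output_model_file_path.clear();  // states the intent directly, no literal involved
    input_model_path.clear();
  }
};
```

Both `std::string` and `std::filesystem::path` provide `clear()`, so the same call works whichever type the member uses.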


namespace onnxruntime {
void ModelCompilationOptions::ResetInputModelSettings() {
input_model_path = "";

ath = "";

as below

adrianlizarraga added a commit that referenced this pull request Apr 16, 2025
### Description
Address additional review comments on
#24207:
- Remove use of `#ifdef ORT_MINIMAL_BUILD` in public C/C++ API headers
for Compile API
- Use `AllocatorPtr` internally to ensure memory is properly released if
an exception is thrown while serializing the output model to the user's
buffer.
- Improve C API function documentation.
- Clean up internal `ModelCompilationOptions` class



### Motivation and Context
Useful review comments were left on the original PR after merge. This
addresses those comments.
adrianlizarraga added a commit that referenced this pull request Apr 17, 2025
… EP (#24406)

### Description
A new overload of CreateProvider() was added to the OpenVINO EP to
handle the extraction of EP options from the session option
configurations.


### Motivation and Context
Allows use of new Compile API.
Refer to #24207
adrianlizarraga added a commit that referenced this pull request Apr 18, 2025
…ession options (#24445)

### Description
A new overload of CreateProvider() was added to handle the
extraction of EP options from the session option configurations.

### Motivation and Context
Allows use of new Compile API.
Refer to #24207
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025
… EP (#24406)

ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025
…ession options (#24445)

intbf pushed a commit to intbf/onnxruntime that referenced this pull request Apr 25, 2025
… EP (microsoft#24406)

Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
intbf pushed a commit to intbf/onnxruntime that referenced this pull request Apr 25, 2025
…ession options (microsoft#24445)

Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>