Add API to compile a model by adrianlizarraga · Pull Request #24207 · microsoft/onnxruntime

adrianlizarraga · 2025-03-27T13:01:27Z

Description

Adds C/C++ API functionality to compile a model (i.e., generate a model with EPContext nodes) using explicit APIs.
Adds support for compiling when input or output models are in memory (not just files).
Allows specifying the threshold for when initializers are stored in an external file.
Allows file paths of arbitrary lengths (session_option key/value configs limited string length to 2048).
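
To make the external-initializer threshold concrete, here is a minimal sketch that does not depend on ORT. The `Initializer` struct, the `PartitionInitializers` function, and the strictly-greater-than comparison are assumptions made for illustration; the description above does not say how a size exactly equal to the threshold is treated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a model initializer (name plus raw byte size).
struct Initializer {
  std::string name;
  std::size_t size_in_bytes;
};

// Result of splitting initializers by a size threshold.
struct PartitionResult {
  std::vector<std::string> embedded;  // kept inside the model file
  std::vector<std::string> external;  // written to the external initializers file
};

// Assumption: initializers strictly larger than the threshold go to the external file.
PartitionResult PartitionInitializers(const std::vector<Initializer>& inits,
                                      std::size_t threshold) {
  PartitionResult result;
  for (const auto& init : inits) {
    if (init.size_in_bytes > threshold) {
      result.external.push_back(init.name);
    } else {
      result.embedded.push_back(init.name);
    }
  }
  return result;
}
```

With a threshold of 1024 bytes, a 4096-byte weight would land in the external file while a 16-byte bias would stay in the model.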

List of C API functions:

ORT_API(const OrtCompileApi*, GetCompileApi);

ORT_API(void, ReleaseModelCompilationOptions, _Frees_ptr_opt_ OrtModelCompilationOptions*);
ORT_API2_STATUS(CreateModelCompilationOptionsFromSessionOptions, _In_ const OrtEnv* env,
                _In_ const OrtSessionOptions* session_options, _Outptr_ OrtModelCompilationOptions** out);
ORT_API2_STATUS(ModelCompilationOptions_SetInputModelPath, _In_ OrtModelCompilationOptions* model_compile_options,
                _In_ const ORTCHAR_T* input_model_path);
ORT_API2_STATUS(ModelCompilationOptions_SetInputModelFromBuffer, _In_ OrtModelCompilationOptions* model_compile_options,
                _In_ const void* input_model_data, size_t input_model_data_size);
ORT_API2_STATUS(ModelCompilationOptions_SetOutputModelPath, _In_ OrtModelCompilationOptions* model_compile_options,
                _In_ const ORTCHAR_T* output_model_path);
ORT_API2_STATUS(ModelCompilationOptions_SetOutputModelExternalInitializersFile,
                _In_ OrtModelCompilationOptions* model_compile_options,
                _In_ const ORTCHAR_T* external_initializers_file_path,
                size_t external_initializer_size_threshold);
ORT_API2_STATUS(ModelCompilationOptions_SetOutputModelBuffer, _In_ OrtModelCompilationOptions* model_compile_options,
                _Inout_ OrtAllocator* allocator, void** output_model_buffer_ptr, size_t* output_model_buffer_size_ptr);
ORT_API2_STATUS(ModelCompilationOptions_SetEpContextEmbedMode, _In_ OrtModelCompilationOptions* model_compile_options,
                bool embed_ep_context_in_model);
ORT_API2_STATUS(CompileModel, _In_ const OrtEnv* env, _In_ const OrtModelCompilationOptions* model_options);

Example (see unit tests for others):

#include <filesystem>

#include "onnxruntime_cxx_api.h"

// Test using the CompileModel() API with settings:
//   - input model from buffer
//   - output model file
//   - EPContext nodes in output model use embedded binary blobs.
TEST_F(QnnHTPBackendTests, CompileApi_FromSessionOptions_InputModelAsBuffer_Embedded) {
  const ORTCHAR_T* output_model_file = ORT_TSTR("./qnn_context_binary_multi_partition_test.onnx");
  std::filesystem::remove(output_model_file);

  // Initialize session options with QNN EP
  Ort::SessionOptions session_options;
  ProviderOptions provider_options;
#if defined(_WIN32)
  provider_options["backend_path"] = "QnnHtp.dll";
#else
  provider_options["backend_path"] = "libQnnHtp.so";
#endif
  provider_options["offload_graph_io_quantization"] = "0";
  session_options.AppendExecutionProvider("QNN", provider_options);

  // Create model compilation options from the session options.
  // Note: model_data is not defined in this excerpt; it is assumed to hold the serialized input model bytes.
  Ort::ModelCompilationOptions compile_options(*ort_env, session_options);
  compile_options.SetInputModelFromBuffer(reinterpret_cast<const void*>(model_data.data()), model_data.size());
  compile_options.SetOutputModelPath(output_model_file);
  compile_options.SetEpContextEmbedMode(true);

  // Compile the model.
  Ort::Status status = Ort::CompileModel(*ort_env, compile_options);
  ASSERT_TRUE(status.IsOK());

  // Make sure the compiled model was generated and has the expected number of EPContext nodes.
  ASSERT_TRUE(std::filesystem::exists(output_model_file));
  CheckEpContextNodeCounts(output_model_file, 2, 2);
}

Motivation and Context

Improve compilation workflow and add new capabilities.

yuslepukhin · 2025-04-14T17:51:36Z

include/onnxruntime/core/session/onnxruntime_c_api.h

+
+  /** \brief Sets the file path for the output ONNX model generated by CompileModel.
+   *
+   * If the output model path is not specified and the output model is not to be stored in a buffer,


path is not specified

It is not clear how to unspecify the path. Would it be nullptr?

yuslepukhin · 2025-04-14T17:54:51Z

include/onnxruntime/core/session/onnxruntime_c_api.h

+   * \param[in] model_compile_options The OrtModelCompilationOptions instance.
+   * \param[in] allocator The allocator used to allocate the buffer for the compiled model.
+   * \param[out] output_model_buffer_ptr Pointer to the buffer that stores the compiled model.
+   * \param[out] output_model_buffer_size_ptr Pointer set to the size of output buffer in bytes.


set to the size of output buffer

suggestion: set to the size of the output model in bytes.

yuslepukhin · 2025-04-14T17:58:35Z

include/onnxruntime/core/session/onnxruntime_cxx_api.h

 ORT_DEFINE_RELEASE(Node);
 ORT_DEFINE_RELEASE(Graph);
 ORT_DEFINE_RELEASE(Model);
+#if !defined(ORT_MINIMAL_BUILD)


ORT_MINIMAL_BUILD

My understanding is that ORT_MINIMAL_BUILD is a build option for ORT itself, not for customer source code.
Customer code will probably have no notion of a minimal build, so the macro will always be undefined there. This may cause a discrepancy between the binary and the header that the customer sees.

I suggest we do not leak our internal defines to public headers and adjust the source code assumptions accordingly.
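
One way to keep build-only macros out of a public header is to always declare the function and let a restricted build return a "not implemented" status at run time. The sketch below illustrates that idea with hypothetical names (`MYLIB_MINIMAL_BUILD`, `StatusCode`, `CompileModelStub`); it is not ORT's actual code.

```cpp
#include <string>

// --- Public header portion: no build-configuration macros appear here. ---
enum class StatusCode { kOk, kNotImplemented };

struct Status {
  StatusCode code;
  std::string message;
};

// Always declared, whatever options the library itself was built with.
Status CompileModelStub(const std::string& input_path);

// --- Library source portion: the internal macro is confined to the .cc file. ---
Status CompileModelStub(const std::string& input_path) {
#if defined(MYLIB_MINIMAL_BUILD)
  (void)input_path;  // unused in the restricted build
  return {StatusCode::kNotImplemented, "Compile API is not supported in this build"};
#else
  return {StatusCode::kOk, "compiled " + input_path};
#endif
}
```

Because the declaration never changes, customer code compiles against the same header in both configurations and learns about the missing feature from the returned status instead of from a link error.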

yuslepukhin · 2025-04-14T18:18:24Z

include/onnxruntime/core/session/onnxruntime_cxx_api.h

+  using Base::Base;
+
+  explicit ModelCompilationOptions(std::nullptr_t) {}              ///< Create an empty ModelCompilationOptions object, must be assigned a valid one to be used.
+  explicit ModelCompilationOptions(OrtModelCompilationOptions* p)  ///< Takes ownership of an OrtModelCompilationOptions


ModelCompilationOptions(OrtModelCompilationOptions* p)

Given using Base::Base, this ctor seems to be redundant.
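
The point about inherited constructors can be checked in isolation. This is a minimal sketch with hypothetical types (`Handle`, `Base<T>`, `Wrapper`) that only mirror the shape of the code under review; the real `Base` class in the C++ API may differ.

```cpp
#include <type_traits>

struct Handle {};  // hypothetical opaque C handle type

// Hypothetical base wrapper that holds a raw handle pointer.
template <typename T>
struct Base {
  Base() = default;
  explicit Base(T* p) : p_(p) {}
  T* get() const { return p_; }

 private:
  T* p_ = nullptr;
};

// No pointer constructor is written here; it is inherited from Base<Handle>.
struct Wrapper : Base<Handle> {
  using Base<Handle>::Base;
};

// The inherited constructor is usable and keeps its `explicit` qualifier.
static_assert(std::is_constructible<Wrapper, Handle*>::value,
              "Wrapper(Handle*) is available via the inherited constructor");
static_assert(!std::is_convertible<Handle*, Wrapper>::value,
              "implicit conversion stays disabled because the base constructor is explicit");
```

A hand-written `Wrapper(Handle* p)` would therefore add nothing unless it needs to do extra work beyond forwarding to the base.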

yuslepukhin · 2025-04-14T18:26:11Z

onnxruntime/core/framework/graph_partitioner.cc

+                  "Cannot serialize ONNX ModelProto larger than 2GB");
+
+    OrtAllocator* allocator = ep_context_gen_options.output_model_buffer_allocator;
+    void* buffer = allocator->Alloc(allocator, buffer_size);


void* buffer

Can Serialize() throw, causing us to leak the buffer?
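
A scoped guard is one common way to rule out this kind of leak. The sketch below is an assumption-laden illustration, not the ORT implementation: `DemoAllocator` is a hypothetical C-style allocator loosely modeled on the Alloc call in the diff, and `SerializeToBuffer` simulates a serialization step that may throw.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <stdexcept>

// Hypothetical C-style allocator with function pointers.
// The live_allocations counter exists only for this demo.
struct DemoAllocator {
  void* (*Alloc)(DemoAllocator* self, std::size_t size);
  void (*Free)(DemoAllocator* self, void* p);
  int live_allocations;
};

void* DemoAlloc(DemoAllocator* self, std::size_t size) {
  ++self->live_allocations;
  return std::malloc(size);
}

void DemoFree(DemoAllocator* self, void* p) {
  --self->live_allocations;
  std::free(p);
}

// Deleter that returns the buffer to the allocator it came from.
struct AllocatorDeleter {
  DemoAllocator* allocator;
  void operator()(void* p) const { allocator->Free(allocator, p); }
};

using BufferPtr = std::unique_ptr<void, AllocatorDeleter>;

// Allocates a buffer, runs a step that may throw, and hands the buffer to the
// caller only on success. If the step throws, the guard frees the buffer.
void* SerializeToBuffer(DemoAllocator& allocator, std::size_t size, bool fail) {
  BufferPtr guard(allocator.Alloc(&allocator, size), AllocatorDeleter{&allocator});
  if (fail) {
    throw std::runtime_error("serialization failed");
  }
  return guard.release();  // ownership passes to the caller
}
```

The key design choice is that `release()` is the last statement: every earlier exit path, including exceptions, runs the deleter.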

yuslepukhin · 2025-04-14T18:29:11Z

onnxruntime/core/framework/graph_partitioner.cc

+    ORT_RETURN_IF(buffer_size > static_cast<size_t>(std::numeric_limits<int>::max()),
+                  "Cannot serialize ONNX ModelProto larger than 2GB");
+
+    OrtAllocator* allocator = ep_context_gen_options.output_model_buffer_allocator;


OrtAllocator* allocator

Typically, we wrap incoming OrtAllocator pointers in a special wrapper class, so that they are represented as AllocatorPtr. That form can be used safely elsewhere in our code and can be held in a smart pointer.

Perhaps we need to check whether storing a raw pointer here is a good idea.
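
The wrapping idea can be sketched as an adapter from a C-style allocator to an internal interface held by `std::shared_ptr`. All names here (`RawAllocator`, `AllocatorInterface`, `RawAllocatorWrapper`, `SharedAllocator`, `MakeBuffer`) are hypothetical; the actual ORT wrapper class is not shown in this PR excerpt.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical C-style allocator supplied by the user.
// The free_calls counter exists only for this demo.
struct RawAllocator {
  void* (*Alloc)(RawAllocator* self, std::size_t size);
  void (*Free)(RawAllocator* self, void* p);
  int free_calls;
};

// Demo backing functions built on malloc/free.
void* MallocAlloc(RawAllocator*, std::size_t size) { return std::malloc(size); }
void MallocFree(RawAllocator* self, void* p) {
  ++self->free_calls;
  std::free(p);
}

// Hypothetical internal allocator interface.
class AllocatorInterface {
 public:
  virtual ~AllocatorInterface() = default;
  virtual void* Alloc(std::size_t size) = 0;
  virtual void Free(void* p) = 0;
};

// Adapter that forwards to the raw allocator without owning it.
class RawAllocatorWrapper final : public AllocatorInterface {
 public:
  explicit RawAllocatorWrapper(RawAllocator* raw) : raw_(raw) {}
  void* Alloc(std::size_t size) override { return raw_->Alloc(raw_, size); }
  void Free(void* p) override { raw_->Free(raw_, p); }

 private:
  RawAllocator* raw_;  // lifetime is managed by the caller
};

using SharedAllocator = std::shared_ptr<AllocatorInterface>;

// Returns a buffer that frees itself through the allocator that created it.
std::shared_ptr<void> MakeBuffer(SharedAllocator alloc, std::size_t size) {
  void* p = alloc->Alloc(size);
  return std::shared_ptr<void>(p, [alloc](void* q) { alloc->Free(q); });
}
```

Capturing the shared allocator in the deleter keeps the wrapper alive for as long as any buffer it produced; the raw allocator itself must still outlive both.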

yuslepukhin · 2025-04-14T18:37:26Z

onnxruntime/core/session/model_compilation_options.h

+
+struct ModelCompilationOptions {
+  const OrtEnv* env = nullptr;
+  std::unique_ptr<OrtSessionOptions> session_options = nullptr;


s

Is this optional?

yuslepukhin · 2025-04-14T18:37:42Z

onnxruntime/core/session/model_compilation_options.cc

+
+Status ModelCompilationOptions::ResetOutputModelSettings() {
+  EpContextModelGenerationOptions& ep_context_gen_options = session_options->value.ep_context_gen_options;
+  ep_context_gen_options.output_model_file_path = "";


= "";

clear() is a better choice.
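
For reference, a minimal sketch of the suggested reset style. `OutputSettings` is a hypothetical struct, not the class from the PR; both forms leave the string empty, and `clear()` simply states the intent without going through the `const char*` assignment overload.

```cpp
#include <string>

// Hypothetical options struct with a single path setting.
struct OutputSettings {
  std::string output_model_file_path;

  // Resets the path to the "not set" state.
  void Reset() { output_model_file_path.clear(); }
};
```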

yuslepukhin · 2025-04-14T18:38:37Z

onnxruntime/core/session/model_compilation_options.cc

+
+namespace onnxruntime {
+void ModelCompilationOptions::ResetInputModelSettings() {
+  input_model_path = "";


input_model_path = "";

As in ResetOutputModelSettings(): clear() is a better choice.

### Description

Address additional review comments on #24207:

- Remove use of `#ifdef ORT_MINIMAL_BUILD` in public C/C++ API headers for Compile API
- Use `AllocatorPtr` internally to ensure memory is properly released if an exception is thrown while serializing the output model to the user's buffer.
- Improve C API function documentation.
- Clean up internal `ModelCompilationOptions` class

### Motivation and Context

Useful review comments were left on the original PR after merge. This addresses those comments.

… EP (#24406)

### Description

A new overload of CreateProvider() was added to the OpenVINO EP to handle the extraction of EP options from the session option configurations.

### Motivation and Context

Allows use of new Compile API. Refer to #24207

…ession options (#24445)

### Description

A new overload of CreateProvider() was added to the VitisAI EP to handle the extraction of EP options from the session option configurations.

### Motivation and Context

Allows use of new Compile API. Refer to #24207

Co-authored-by: Scott McKay <skottmckay@gmail.com>

adrianlizarraga added 2 commits March 27, 2025 05:47

Add CompileModel() API

3f0dd46

Merge branch 'main' into adrianl/model-compile-api

4a48ac2

github-actions bot reviewed Mar 27, 2025

include/onnxruntime/core/session/onnxruntime_cxx_api.h (outdated)

include/onnxruntime/core/session/onnxruntime_cxx_api.h (outdated)

github-advanced-security bot found potential problems Mar 27, 2025

onnxruntime/core/framework/graph_partitioner.cc (fixed)

adrianlizarraga added 3 commits March 27, 2025 09:50

Fix call to Partition() in test for CUDA EP

bd55f7b

Fix linter issues

46f6ee1

more linter fixes

eca5951

adrianlizarraga requested a review from HectorSVC March 27, 2025 17:09

adrianlizarraga added 3 commits March 27, 2025 11:38

Fix compiler warning regarding unused variable in minimal build

1267897

Optional GitHub suggestions

f9aa437

Return a 'ORT_NOT_IMPLEMENTED' error if CompileModel() is called in a minimal ORT build.

19cf95a

skottmckay reviewed Apr 3, 2025

include/onnxruntime/core/session/onnxruntime_c_api.h (outdated)

adrianlizarraga marked this pull request as ready for review April 7, 2025 15:44

adrianlizarraga changed the title from "[draft] Add API to compile a model" to "Add API to compile a model" Apr 7, 2025

adrianlizarraga added 6 commits April 7, 2025 15:40

Merge main and fix conflicts

0cd141c

Move APIs to a separate C API struct

0bdf00f

Store path as utf8 strings

3f83222

Always generate output compiled model (even if has no EpContext nodes)

c8d5925

Rename variables

4cfeffe

Exclude C++ api constructs for min build

a5937d2

adrianlizarraga requested review from adrastogi, mschofie and shschaefer April 8, 2025 20:32

adrianlizarraga added 2 commits April 8, 2025 23:08

Remove C API to create ModelCompilationOptions without an existing session_options object. Return error if compiled model doesn't contain EPContext nodes (compile API only)

2cd3885

Rename variable

a008ff9

mschofie previously approved these changes Apr 9, 2025

onnxruntime/core/framework/graph_partitioner.h (outdated)

Allow copying a const session option when creating compile options

e0bded6

adrianlizarraga dismissed mschofie’s stale review via e0bded6 April 9, 2025 19:28

github-actions bot reviewed Apr 9, 2025

include/onnxruntime/core/providers/providers.h (outdated)

adrastogi reviewed Apr 9, 2025

include/onnxruntime/core/session/onnxruntime_c_api.h (resolved)

adrianlizarraga added 2 commits April 11, 2025 18:02

Merge branch 'main' into adrianl/model-compile-api

82fd210

Use 'ep.WebGPU.' prefix for provider options added to session configs

ce389f1

adrianlizarraga dismissed stale reviews from skottmckay and vriveras via ce389f1 April 12, 2025 04:03

skottmckay approved these changes Apr 12, 2025

skottmckay merged commit 90c263f into main Apr 12, 2025
85 of 89 checks passed

skottmckay deleted the adrianl/model-compile-api branch April 12, 2025 07:08

adrianlizarraga mentioned this pull request Apr 14, 2025

[OpenVINO EP] Implement new overload of CreateProvider() for OpenVINO EP #24406

Merged

yuslepukhin reviewed Apr 14, 2025

adrianlizarraga mentioned this pull request Apr 15, 2025

Clean up Compile API #24436

Merged

adrianlizarraga mentioned this pull request Apr 16, 2025

[VitisAI EP] Implement new overload of CreateProvider() called with session options #24445

Merged

adrianlizarraga mentioned this pull request Apr 17, 2025

NV TensorRT RTX EP - initial commit #24456

Merged

adrastogi mentioned this pull request Apr 27, 2025

[Feature Request] Support for progress and cancellation in compilation APIs #24573

Open

7 participants
