onnx_test_runner support for plugin EP #25754

Open
chilo-ms wants to merge 24 commits into main from chi/onnx_test_runner_for_plugin_ep

Contributor

@chilo-ms chilo-ms commented Aug 14, 2025

Description

Allow onnx_test_runner to link against onnxruntime.dll.
Add support for onnx_test_runner to register a plugin EP DLL and run the plugin EP.

When using onnx_test_runner to test a plugin EP, we strongly recommend building onnx_test_runner so that it links against onnxruntime.dll. That way both onnx_test_runner and the plugin EP interact with the same onnxruntime.dll.

[Note]: On Windows, an incompatible onnxruntime.dll in a system folder may be picked up instead. Make sure the correct onnxruntime.dll is in either onnx_test_runner's folder or the plugin EP DLL's folder. ORT loads the plugin EP DLL by calling LoadLibraryExW with the LOAD_WITH_ALTERED_SEARCH_PATH flag, so Windows uses the alternate search order, in which the directory containing the loaded DLL is searched for that DLL's dependencies.

New options:

  • --plugin_ep_libs [registration names and libraries] Specifies a list of plugin execution provider (EP) registration names and their corresponding shared libraries to register.
    [Usage]: --plugin_ep_libs "plugin_ep_name_1|plugin_ep_1.dll plugin_ep_name_2|plugin_ep_2.dll ... "

  • --plugin_eps [Plugin EPs] Specifies a semicolon-separated list of plugin execution providers (EPs) to use.
    [Usage]: --plugin_eps "plugin_ep_1;plugin_ep_2;... "

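  • --plugin_ep_options [EP options] Specifies provider options for each EP listed in --plugin_eps. Each EP's options are key|value pairs separated by spaces, and the option groups of different EPs are separated by semicolons. As the usage examples show, a group can be left empty for an EP that takes no options.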
    [Usage]:
    --plugin_ep_options "ep_1_option_1_key|ep_1_option_1_value ...;ep_2_option_1_key|ep_2_option_1_value ...;..." or
    --plugin_ep_options ";ep_2_option_1_key|ep_2_option_1_value ...;..." or
    --plugin_ep_options "ep_1_option_1_key|ep_1_option_1_value ...;;ep_3_option_1_key|ep_3_option_1_value ...;..."

  • --list_ep_devices Prints all available device indices and their properties (including metadata). This option makes the program exit early without performing inference.

  • --select_ep_devices [list of device indices] A semicolon-separated list of device indices to add to the session and run with.
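
As a rough illustration of the option formats above, the following C++ sketch splits a --plugin_ep_libs value into (registration name, library path) pairs and a --plugin_ep_options value into one key/value map per EP. The helper names (Split, SplitPair, ParsePluginEpLibs, ParsePluginEpOptions) are hypothetical and are not taken from this PR. The actual parser in onnx_test_runner may differ, for example in how it treats malformed input or library paths that contain spaces.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split `s` on `delim`, keeping empty fields.
// Empty fields matter for --plugin_ep_options, e.g. ";k|v".
std::vector<std::string> Split(const std::string& s, char delim) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true) {
    const auto pos = s.find(delim, start);
    if (pos == std::string::npos) {
      parts.push_back(s.substr(start));
      break;
    }
    parts.push_back(s.substr(start, pos - start));
    start = pos + 1;
  }
  return parts;
}

// Hypothetical helper: split one "key|value" token at the first '|'.
std::pair<std::string, std::string> SplitPair(const std::string& token) {
  const auto bar = token.find('|');
  if (bar == std::string::npos || bar == 0 || bar + 1 == token.size()) {
    throw std::invalid_argument("expected 'key|value', got: " + token);
  }
  return {token.substr(0, bar), token.substr(bar + 1)};
}

// Parses "name_1|lib_1.dll name_2|lib_2.dll" into (name, library) pairs.
std::vector<std::pair<std::string, std::string>> ParsePluginEpLibs(const std::string& arg) {
  std::vector<std::pair<std::string, std::string>> result;
  for (const auto& token : Split(arg, ' ')) {
    if (token.empty()) continue;  // tolerate repeated or trailing spaces
    result.push_back(SplitPair(token));
  }
  return result;
}

// Parses "k1|v1 k2|v2;;k3|v3" into one option map per semicolon-separated group.
std::vector<std::map<std::string, std::string>> ParsePluginEpOptions(const std::string& arg) {
  std::vector<std::map<std::string, std::string>> result;
  for (const auto& group : Split(arg, ';')) {
    std::map<std::string, std::string> options;
    for (const auto& token : Split(group, ' ')) {
      if (token.empty()) continue;
      options.insert(SplitPair(token));
    }
    result.push_back(std::move(options));  // an empty group yields an empty map
  }
  return result;
}
```

With this sketch, ParsePluginEpOptions(";k|v") yields two maps, the first of them empty, which mirrors the second usage form of --plugin_ep_options.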

Motivation and Context

Contributor

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@chilo-ms chilo-ms marked this pull request as ready for review August 16, 2025 06:47

oss << "Running tests in parallel: at most "
    << static_cast<unsigned>(parallel_models)
    << " models at any time";
TEST_LOG_VERBOSE(oss.str());

Member

I think this duplicates code and copies data, since the macro itself already uses a std::ostringstream. Would it be possible to stream to the macro directly?

Contributor Author

Good catch. Now using the macro directly.
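
The reviewer's point can be illustrated with a minimal sketch of a stream-style logging macro. SKETCH_LOG_VERBOSE and SketchLogSink are hypothetical stand-ins, not the actual TEST_LOG_VERBOSE definition from onnx_test_runner. The only assumption carried over from the discussion is that the macro builds its message with its own std::ostringstream, so a caller can pass a `<<` chain directly instead of filling a second stream and copying its string.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sink that records messages so the sketch is self-contained.
inline std::vector<std::string>& SketchLogSink() {
  static std::vector<std::string> messages;
  return messages;
}

// Hypothetical stand-in for a stream-style logging macro: the argument is
// spliced into a `<<` chain on the macro's own std::ostringstream.
#define SKETCH_LOG_VERBOSE(expr)                 \
  do {                                           \
    std::ostringstream sketch_oss;               \
    sketch_oss << expr;                          \
    SketchLogSink().push_back(sketch_oss.str()); \
  } while (0)

// Before: the caller fills its own stream, and the macro then copies the
// resulting string into a second stream.
inline void LogParallelismWithExtraStream(int parallel_models) {
  std::ostringstream oss;
  oss << "Running tests in parallel: at most "
      << static_cast<unsigned>(parallel_models)
      << " models at any time";
  SKETCH_LOG_VERBOSE(oss.str());
}

// After: the caller streams straight into the macro.
inline void LogParallelismDirect(int parallel_models) {
  SKETCH_LOG_VERBOSE("Running tests in parallel: at most "
                     << static_cast<unsigned>(parallel_models)
                     << " models at any time");
}
```

Both functions record the same message; the second simply avoids the intermediate stream and the extra string copy.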

for (auto& pair : ep_names_to_libs) {
  const std::filesystem::path library_path = pair.second;
  const std::string registration_name = pair.first;
  Ort::Status status(Ort::GetApi().RegisterExecutionProviderLibrary(env, registration_name.c_str(), ToPathString(library_path.string()).c_str()));

Member

We should use C++ interfaces.

Contributor Author

Changed to use the C++ API.

for (auto& registration_name : test_config.registered_plugin_eps) {
void UnregisterExecutionProviderLibrary(Ort::Env& env, std::vector<std::string>& registered_plugin_eps) {
  for (auto& registration_name : registered_plugin_eps) {
    Ort::Status status(Ort::GetApi().UnregisterExecutionProviderLibrary(env, registration_name.c_str()));

Member

Ditto: use the C++ API.

Contributor Author

Changed to use the C++ API.

@chilo-ms chilo-ms requested a review from yuslepukhin August 20, 2025 22:07

target_link_libraries(onnx_test_runner PRIVATE onnx_test_runner_common ${GETOPT_LIB_WIDE} ${onnx_test_libs} nlohmann_json::nlohmann_json)
if (onnxruntime_BUILD_SHARED_LIB)
  # onnx_test_runner calls functions from onnxruntime_test_utils, which depend on ORT internal C++ APIs not exposed through the public C API.

Member

@yuslepukhin yuslepukhin Aug 21, 2025

We need to clarify the purpose and the scope of the PR.
This links the test against many internal libraries. I do not think this is needed or intended, as it was not present before.
Also, I thought the purpose of the PR was to link against the DLL, meaning we talk to ORT only via public interfaces.
In essence, the utility only needs to create sessions and run them,
very much like onnxruntime_shared_lib_test, which is specifically written to test and cover the public API.

Contributor Author

Unlike onnxruntime_shared_lib_test, onnx_test_runner needs to compare the outputs, and the functions that do so eventually call into ORT internal libraries that are not exposed through the C API.

As the comment below describes, it calls CompareOrtValue(), VerifyValueInfo(), ..., which rely on onnxruntime::DataTypeImpl::TypeFromProto(), onnxruntime::DataTypeImpl::ToString(), ... in an internal library, i.e. onnxruntime_framework, to do the output comparison.

To work with a plugin EP, onnx_test_runner needs to be linked against onnxruntime.dll.
For onnx_test_runner to work with onnxruntime.dll, we need to link the ORT internal libraries.

(Additional internal dependencies such as onnxruntime_graph and onnxruntime_mlas must also be linked,
as they are transitively required by symbols used in onnxruntime_framework.)

Contributor Author

Even though we make onnx_test_runner link against many internal libraries, all it really needs from them is a few functions that compare OrtValues, all of which are stateless.

Member

I do not think the comparison logic warrants linking the whole world.
It can be rewritten using the public interface. It is not that much code that needs to change.

Contributor Author

Okay, I thought about making the parts needed by the comparison logic public. I will take a look to see how much change it needs.

Contributor Author

@chilo-ms chilo-ms Aug 28, 2025

There are more changes than I expected.

  • Use Ort::Value instead of OrtValue.Type(), OrtValue.Get(), ...
  • ONNXType: use Ort::GetApi().GetValueType() to get the type.
  • The ONNXType values that are supported:
    • Tensor
    • Sparse Tensor, including the COO and CSR(C) formats.
    • Sequence of Tensors
    • Sequence of Maps (where the map is either 'int64 to float' or 'string to float')
    • (The functions that compare two OrtValues need to be rewritten for all of the above types.)
  • ORT doesn't have a public API to deal with the Int4x2 and UInt4x2 tensor types.
  • MLFloat16: it's only used in training code, so maybe we can remove it?
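
To give a sense of what comparison through the public interface could look like for the plain tensor case, here is a minimal sketch. It assumes the shape and data of each output have already been read out through the public C++ API (for example via GetTensorTypeAndShapeInfo() and GetTensorData<float>() on Ort::Value) and copied into the hypothetical FloatTensorView struct below. The function name and the tolerance defaults are illustrative assumptions, not the ones used by onnx_test_runner, and sparse tensors, sequences, and maps are not covered.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical plain-data view of a float tensor, filled from public-API calls.
struct FloatTensorView {
  std::vector<int64_t> shape;
  std::vector<float> data;
};

// Returns an empty string when the tensors match within tolerance,
// otherwise a short description of the first mismatch.
// The default tolerances are assumptions chosen for illustration.
std::string CompareFloatTensors(const FloatTensorView& expected,
                                const FloatTensorView& actual,
                                double abs_tol = 1e-5,
                                double rel_tol = 1e-4) {
  if (expected.shape != actual.shape) return "shape mismatch";
  if (expected.data.size() != actual.data.size()) return "element count mismatch";
  for (std::size_t i = 0; i < expected.data.size(); ++i) {
    const double e = expected.data[i];
    const double a = actual.data[i];
    if (e == a) continue;                          // exact match, including equal infinities
    if (std::isnan(e) && std::isnan(a)) continue;  // treat NaN as equal to NaN
    const std::string where = "value mismatch at index " + std::to_string(i);
    if (std::isinf(e) || std::isinf(a)) return where;  // infinity never matches a different value
    const double tol = abs_tol + rel_tol * std::fabs(e);
    if (!(std::fabs(e - a) <= tol)) return where;  // also catches one-sided NaN
  }
  return "";
}
```

A stateless helper of this shape depends only on the standard library, which is the property the discussion above is after: the comparison logic would no longer pull in onnxruntime_framework and its transitive dependencies.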

@jywu-msft
Member

There are build errors and a merge conflict:
onnxruntime_logging_apis_test.vcxproj -> D:\a_work_temp\build\RelWithDebInfo\RelWithDebInfo\onnxruntime_logging_apis_test.exe
Error: D:\a_work\onnxruntime\onnxruntime\onnxruntime\test\onnx\main.cc(482,7): error C2065: 'p_models': undeclared identifier [D:\a_work_temp\build\RelWithDebInfo\onnx_test_runner.vcxproj]
Error: D:\a_work\onnxruntime\onnxruntime\onnxruntime\test\onnx\main.cc(483,7): error C2065: 'concurrent_session_runs': undeclared identifier [D:\a_work_temp\build\RelWithDebInfo\onnx_test_runner.vcxproj]
Error: D:\a_work\onnxruntime\onnxruntime\onnxruntime\test\onnx\main.cc(484,74): error C2065: 'device_id': undeclared identifier [D:\a_work_temp\build\RelWithDebInfo\onnx_test_runner.vcxproj]

Contributor Author

chilo-ms commented Sep 9, 2025

I have a separate PR that adds a new onnxruntime_plugin_ep_onnx_test, which lets a plugin EP run the ONNX tests and uses the public API to compare output values:
#25942
