Skip to content

Support overriding most ORT session options via environment variables#16260

Closed
ivberg wants to merge 3 commits intomicrosoft:mainfrom
ivberg:user/ivberg/Override_ORT_Session_Options
Closed

Support overriding most ORT session options via environment variables#16260
ivberg wants to merge 3 commits intomicrosoft:mainfrom
ivberg:user/ivberg/Override_ORT_Session_Options

Conversation

@ivberg
Copy link
Contributor

@ivberg ivberg commented Jun 6, 2023

Description

Support overriding most ORT session options via environment variables for use in testing, debugging, perf evaluations

Motivation and Context

ORT Session options are key to controlling ORT behavior. This allows these session options to be overridden at session init time via debug environment variables. This is essential when debugging or investigating performance where a recompile would be burdensome or access to code calling ORT is difficult. This for example allows getting CPU profiling report, testing optimization level, # of threads, logging levels and other options on the fly.

@snnn snnn added the core runtime issues related to core runtime label Jun 7, 2023

// Necessary otherwise INFO logs below won't actually be logged
auto defaultLoggerOrigSeverity = default_logger.GetSeverity();
logging::LoggingManager::SetDefaultLoggerSeverity(logging::Severity::kINFO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function call thread-safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe not thread-safe in the sense that multiple threads could theoretically be setting severity and this would miss those changes. I don't know practically how much of an issue this is because at this point during session creation I am assuming there is only 1 thread involved. Note that the existing code has this issue where the INFO event is not actually being logged.
Overall not seeing the argument why this needs to be thread-safe, so if you can provide that reasoning that would be gr8 @snnn!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since multiple sessions could be created in the same process, a session's constructor should avoid changing the process's global settings.

@snnn
Copy link
Contributor

snnn commented Jun 7, 2023

The change will make onnx runtime hard to debug. When we see a test failure from a CI build pipeline, we often need to reproduce it locally. Then we need to know how to reproduce it. Usages of environment variables make it harder, since:

  1. They are harder to replicate. Each process has 100+ environment variables. You won't want to copy them all since only some of them will impact onnxruntime's behavior. However, it's hard to select the variables out.
  2. In our CI build we cannot print all the environment variables out, because some of them contains sensitive data, like auth tokens.
  3. Azure DevOps pipelines work in a tricky way that pipeline variables are also environment variables with different names. For example, if you have a pipeline variable named "OnnxRuntimeThreadCount", it will be mapped to an environment variable as "ONNXRUNTIMETHREADCOUNT". This is not commonly known and could make debugging harder.

@ivberg
Copy link
Contributor Author

ivberg commented Jun 7, 2023

The change will make onnx runtime hard to debug.

#16259 PR should solve this because it logs out all the session variables that are controlled by these. Also every override is logged. Therefore you would not need to dump our the process env variables. I would not expect that these debug variables are used in a CI pipeline. Using env variable for debugging is used elsewhere in the codebase - see ORT_DEBUG_NODE_IO*

Copy link
Contributor

@pranavsharma pranavsharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the direction we want to take. This is a maintenance nightmare for us as we'll need to keep introducing environment variables each time a new session/run option is introduced; it requires writing the code for override, writing tests to ensure the override works and documenting them. This kind of thing fits very well in the application code that calls ORT.

@ivberg
Copy link
Contributor Author

ivberg commented Jun 7, 2023

This is not the direction we want to take.

As mentioned in the use-case, doing this in the calling API can sometimes be a huge time sync recompiling (or impossible) if want to test out the effects of various params and get diagnostics or see affect on perf. Case in point WinML. How can you get a CPU profile trace with WinML. You can't today because WinML doesn't expose this session option and this change would allow a dev to do so on a production WinML / Onnx combo.

The other debug options are not documented and did not have to pass this level of scrutiny but I am happy to document them.

@pranavsharma
Copy link
Contributor

This is not the direction we want to take.

As mentioned in the use-case, doing this in the calling API can sometimes be a huge time sync recompiling (or impossible) if want to test out the effects of various params and get diagnostics or see affect on perf. Case in point WinML. How can you get a CPU profile trace with WinML. You can't today because WinML doesn't expose this session option and this change would allow a dev to do so on a production WinML / Onnx combo.

The other debug options are not documented and did not have to pass this level of scrutiny but I am happy to document them.

The env variables can reside in your code and you can control ORT behavior with them without recompiling any code by implementing the same override logic in the application code. I see no reason for ORT to take this burden on.

@ivberg
Copy link
Contributor Author

ivberg commented Jun 7, 2023

This is not the direction we want to take.

As mentioned in the use-case, doing this in the calling API can sometimes be a huge time sync recompiling (or impossible) if want to test out the effects of various params and get diagnostics or see affect on perf. Case in point WinML. How can you get a CPU profile trace with WinML. You can't today because WinML doesn't expose this session option and this change would allow a dev to do so on a production WinML / Onnx combo.
The other debug options are not documented and did not have to pass this level of scrutiny but I am happy to document them.

The env variables can reside in your code and you can control ORT behavior with them without recompiling any code by implementing the same override logic in the application code. I see no reason for ORT to take this burden on.

You may not control the code even it's within the same company (Microsoft). Case in point WinML devs CANT do this in code calling WinML because it's not exposed. I feel this is quite a useful feature and I am not sure there is an adequate attempt to even understand the use-case

@pranavsharma
Copy link
Contributor

This is not the direction we want to take.

As mentioned in the use-case, doing this in the calling API can sometimes be a huge time sync recompiling (or impossible) if want to test out the effects of various params and get diagnostics or see affect on perf. Case in point WinML. How can you get a CPU profile trace with WinML. You can't today because WinML doesn't expose this session option and this change would allow a dev to do so on a production WinML / Onnx combo.
The other debug options are not documented and did not have to pass this level of scrutiny but I am happy to document them.

The env variables can reside in your code and you can control ORT behavior with them without recompiling any code by implementing the same override logic in the application code. I see no reason for ORT to take this burden on.

You may not control the code even it's within the same company (Microsoft). Case in point WinML devs CANT do this in code calling WinML because it's not exposed. I feel this is quite a useful feature and I am not sure there is an adequate attempt to even understand the use-case

I see this as a limitation of WinML, not ORT. ORT exposes a bunch of session options which can be controlled in ways that best suits the application. Happy to get on a call with the WinML team to help resolve this.

@ivberg
Copy link
Contributor Author

ivberg commented Dec 5, 2023

Abandoning this PR. These overrides for testing will just live in my fork

@ivberg ivberg closed this Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core runtime issues related to core runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants