Sync with Microsoft ONNX Runtime - 09/01/2026 #897

Merged
ankitm3k merged 21 commits into ovep-develop from sync_msft_09012026
Jan 9, 2026

@Jaswanth51

Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.

edgchen1 and others added 21 commits January 5, 2026 11:19
### Description
<!-- Describe your changes. -->

Add Plugin EP API `OrtEp::IsConcurrentRunSupported()` and connect it to
the internal `PluginExecutionProvider::ConcurrentRunSupported()` member
function.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Additional plugin EP support.
Update GQA benchmark to support bfloat16 and default to testing the
first configuration (fast mode).

Note that test_sparse_attention.py was removed in
microsoft#23547. It is still referenced by the benchmark script, so this
change adds it back and disables the test in pipeline mode.

Example output from H200 GPU:
```
prompt-sm90-Llama3-8B-b1-h32_8x128-float16:
   sequence_length  ORT-GQA-Dense  ORT-GQA-Dense-PackedQKV
0             16.0       0.781751                 0.571226
1             32.0       0.893813                 0.684198
2             64.0       1.434056                 1.589263
3            128.0       1.142192                 1.681969
4            256.0       1.503483                 2.225498
5            512.0       1.045732                 1.878660
6           1024.0       2.334924                 0.916745
7           2048.0       2.229924                 3.001290
8           4096.0       4.309678                 3.198855
9           8192.0       7.932211                 7.910411

token-sm90-Llama3-8B-b1-h32_8_d128-float16:
   past_sequence_length  ORT-GQA-Dense  ORT-GQA-Dense-PackedQKV
0                  16.0       1.751966                 0.780081
1                  32.0       1.302806                 0.043939
2                  64.0       2.301024                 2.207282
3                 128.0       2.294556                 3.010107
4                 256.0       2.931330                 1.781768
5                 512.0       1.210220                 2.799579
6                1024.0       2.767142                 2.660434
7                2048.0       1.420229                 0.091433
8                4096.0       0.860655                 0.801022
9                8191.0       0.749525                 0.820858

prompt-sm90-Llama3-8B-b1-h32_8x128-bfloat16:
   sequence_length  ORT-GQA-Dense  ORT-GQA-Dense-PackedQKV
0             16.0       1.085427                 0.666664
1             32.0       1.714795                 0.931262
2             64.0       1.729093                 1.438733
3            128.0       1.071263                 2.486135
4            256.0       1.957349                 1.342417
5            512.0       1.159680                 1.591321
6           1024.0       0.743702                 2.035150
7           2048.0       1.452736                 1.788801
8           4096.0       4.029917                 4.041565
9           8192.0       7.934485                 7.931600

token-sm90-Llama3-8B-b1-h32_8_d128-bfloat16:
   past_sequence_length  ORT-GQA-Dense  ORT-GQA-Dense-PackedQKV
0                  16.0       0.044354                 0.043983
1                  32.0       0.040715                 0.044061
2                  64.0       0.045586                 0.044071
3                 128.0       0.062204                 0.061418
4                 256.0       0.074764                 4.874854
5                 512.0       2.472094                 2.102259
6                1024.0       4.911269                 1.396149
7                2048.0       4.898032                 1.684034
8                4096.0       2.523432                 2.192279
9                8191.0       1.651366                 3.427370
```
### Description




### Motivation and Context
For the 1.24 release.
### Description

Executive Summary

The PR successfully updates the VCPKG baseline and Abseil dependency to version 20250814.0. A significant improvement is the introduction of a dynamic dependency resolution script for Abseil. There are also substantial refactoring changes in the WebGPU provider build logic which appear to be bundled with this update.

Key Changes Verified

1. VCPKG & Abseil Update

VCPKG Baseline: Updated to 120deac3062162151622ca4860575a33844ba10b in cmake/vcpkg-configuration.json.
Abseil Port: cmake/vcpkg-ports/abseil/vcpkg.json and portfile.cmake correctly updated to version 20250814.0.
Dependencies: cmake/deps.txt reflects the correct URL and hash for the new Abseil version.

2. Dependency Management

New Tool: tools/python/resolve_absl_deps_dynamic.py has been added.

Review: This is a robust script that automatically resolves and topologically sorts Abseil dependencies by parsing CMake lists. This is a great addition for maintainability, replacing the manual list management.

CMake Integration: cmake/external/abseil-cpp.cmake has been updated to use the new list of components, consistent with the script's output.
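
The dependency resolution described above can be illustrated with a minimal sketch. It is not the code of `resolve_absl_deps_dynamic.py`; the CMake fragment, the regular expressions, and the function names `parse_deps` and `resolve_order` are assumptions made for illustration, and the ordering relies on the standard-library `graphlib` module.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical CMake fragment. Real Abseil CMakeLists.txt files are larger,
# and the actual script's parsing rules are not shown in this PR.
CMAKE_TEXT = """
absl_cc_library(
  NAME base
  DEPS
    absl::config
    absl::core_headers
)
absl_cc_library(
  NAME strings
  DEPS
    absl::base
    absl::config
)
absl_cc_library(
  NAME config
)
absl_cc_library(
  NAME core_headers
  DEPS
    absl::config
)
"""


def parse_deps(cmake_text):
    """Map each library NAME to the set of absl:: targets named in its block."""
    graph = {}
    for block in re.findall(r"absl_cc_library\((.*?)\n\)", cmake_text, flags=re.S):
        name = re.search(r"NAME\s+(\S+)", block).group(1)
        graph[name] = set(re.findall(r"absl::(\w+)", block))
    return graph


def resolve_order(graph):
    """Return library names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = resolve_order(parse_deps(CMAKE_TEXT))
print(order)
```

A topological order is what a linker-style component list needs: each library appears only after everything it depends on, and a dependency cycle is reported as an error by `TopologicalSorter` rather than silently producing a wrong list.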


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Gemini <gemini@google.com>
…t#26728)

## Summary
- Directly load QnnHtp.dll and query platform info using GetPlatformInfo
API in the GTest SetUp function.
- Add QNN include files in unittest CMake for QNN enums related to
platform.
- Skip all FP16 tests via macro for HTP architecture v68 on Windows.
- Skip Auto EP tests via macro for HTP architecture v68 on Windows.

## Description
This patch introduces a mechanism in QNN-specific unit tests to detect
HTP architecture version and related platform details. Based on this
information, tests that require unsupported features on certain hardware
models can be dynamically disabled. This ensures better compatibility
and stability across different SoC configurations.

## Motivation and Context
Due to variations in device configuration and feature support across SoC
models, some unit tests should only run on specific devices. This patch
addresses that by querying platform information from the existing SDK
library on Windows, enabling conditional test execution based on
hardware capabilities.
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This pull request refactors and streamlines the computation of Q, K, V
tensors in the WebGPU BERT Attention operator. The main changes include
removing a custom QKV preparation kernel in favor of a more modular
approach using a MatMul operation followed by a dedicated split kernel,
and generalizing the QKV splitting logic for broader reuse. This
improves maintainability, code reuse, and performance, because the
MatMul op has already received many optimizations.

With this change, PrepareQKV time drops from 751.67 ms to 128.88 ms in
the phi4-vision model.

Before
Kernel | Time (ms) | Percentage (%)
-- | -- | --
Attention\|AttentionPrepare | 751.67 | 49.91

After
Kernel | Time (ms) | Percentage (%)
-- | -- | --
Attention\|MatMul | 120.87 | 19.77
Attention\|SplitPackedQKV | 1.94 | 0.32
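
The MatMul-then-split approach can be sketched in plain Python. This is only an illustration of the data flow, not the WebGPU kernels: the helper names `matmul` and `split_packed_qkv`, the toy sizes, and the omission of the bias term are all assumptions.

```python
def matmul(a, b):
    """Plain matrix multiply: a is [m][k], b is [k][n]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_packed_qkv(packed, q_size, k_size, v_size):
    """Split each row of a packed [rows][q + k + v] matrix into Q, K and V."""
    q = [row[:q_size] for row in packed]
    k = [row[q_size:q_size + k_size] for row in packed]
    v = [row[q_size + k_size:q_size + k_size + v_size] for row in packed]
    return q, k, v


# Toy sizes (assumed): 2 tokens, hidden size 2, Q/K/V widths of 2/1/1.
x = [[1, 2], [3, 4]]
w = [[1, 0, 1, 2], [0, 1, 1, 0]]  # packed weight, columns laid out as Q | K | V

# Step 1: one general MatMul produces the packed projection.
# Step 2: a cheap split pass separates it into the three tensors.
q, k, v = split_packed_qkv(matmul(x, w), 2, 1, 1)
print(q, k, v)
```

The design point is that the expensive part runs through the shared, already tuned MatMul path, while the split step only copies data.
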
This pull request adds support for batched matrix multiplication in the
DP4A quantized matmul WebGPU kernels and their associated C++ code and
tests. The changes update the kernel code, tensor shapes, dispatch
logic, and test infrastructure to properly handle a `batch_count`
greater than 1, enabling efficient batched execution.
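
As a rough illustration of what a `batch_count` greater than 1 means for a DP4A-style kernel, the sketch below loops over batch entries and accumulates dot products four int8 values at a time. The function names, the assumption that the weight matrix is shared across batches, and the transposed weight layout are all hypothetical and not taken from the WebGPU shaders.

```python
def dp4a(a4, b4, acc):
    """Dot product of two groups of four int8 values, added to an accumulator."""
    return acc + sum(x * y for x, y in zip(a4, b4))


def batched_matmul_dp4a(a, b_t):
    """a: [batch][m][k] activations, b_t: [n][k] weights; k is a multiple of 4.

    The weights are assumed to be shared by every batch entry.
    """
    out = []
    for a_mat in a:  # one pass per batch entry
        rows = []
        for a_row in a_mat:
            row = []
            for b_col in b_t:
                acc = 0
                for i in range(0, len(a_row), 4):
                    acc = dp4a(a_row[i:i + 4], b_col[i:i + 4], acc)
                row.append(acc)
            rows.append(row)
        out.append(rows)
    return out


# batch_count = 2, M = 1, K = 8, N = 2
a = [[[1, 2, 3, 4, 5, 6, 7, 8]], [[1, 1, 1, 1, 0, 0, 0, 0]]]
b_t = [[1] * 8, [1, -1, 1, -1, 1, -1, 1, -1]]
result = batched_matmul_dp4a(a, b_t)
print(result)
```
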
…rosoft#26915)

Aligns onnxruntime-foundry `caller-framework` build flag with the
windows-ml build flag so telemetry can work through the same
infrastructure.
Bumps [qs](https://github.com/ljharb/qs) and
[body-parser](https://github.com/expressjs/body-parser). These
dependencies needed to be updated together.
Updates `qs` from 6.13.0 to 6.14.1
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/ljharb/qs/blob/main/CHANGELOG.md">qs's
changelog</a>.</em></p>
<blockquote>
<h2><strong>6.14.1</strong></h2>
<ul>
<li>[Fix] ensure arrayLength applies to <code>[]</code> notation as
well</li>
<li>[Fix] <code>parse</code>: when a custom decoder returns
<code>null</code> for a key, ignore that key</li>
<li>[Refactor] <code>parse</code>: extract key segment splitting
helper</li>
<li>[meta] add threat model</li>
<li>[actions] add workflow permissions</li>
<li>[Tests] <code>stringify</code>: increase coverage</li>
<li>[Dev Deps] update <code>eslint</code>,
<code>@ljharb/eslint-config</code>, <code>npmignore</code>,
<code>es-value-fixtures</code>, <code>for-each</code>,
<code>object-inspect</code></li>
</ul>
<h2><strong>6.14.0</strong></h2>
<ul>
<li>[New] <code>parse</code>: add
<code>throwOnParameterLimitExceeded</code> option (<a
href="https://redirect.github.com/ljharb/qs/issues/517">#517</a>)</li>
<li>[Refactor] <code>parse</code>: use <code>utils.combine</code>
more</li>
<li>[patch] <code>parse</code>: add explicit
<code>throwOnLimitExceeded</code> default</li>
<li>[actions] use shared action; re-add finishers</li>
<li>[meta] Fix changelog formatting bug</li>
<li>[Deps] update <code>side-channel</code></li>
<li>[Dev Deps] update <code>es-value-fixtures</code>,
<code>has-bigints</code>, <code>has-proto</code>,
<code>has-symbols</code></li>
<li>[Tests] increase coverage</li>
</ul>
<h2><strong>6.13.1</strong></h2>
<ul>
<li>[Fix] <code>stringify</code>: avoid a crash when a
<code>filter</code> key is <code>null</code></li>
<li>[Fix] <code>utils.merge</code>: functions should not be stringified
into keys</li>
<li>[Fix] <code>parse</code>: avoid a crash with
interpretNumericEntities: true, comma: true, and iso charset</li>
<li>[Fix] <code>stringify</code>: ensure a non-string
<code>filter</code> does not crash</li>
<li>[Refactor] use <code>__proto__</code> syntax instead of
<code>Object.create</code> for null objects</li>
<li>[Refactor] misc cleanup</li>
<li>[Tests] <code>utils.merge</code>: add some coverage</li>
<li>[Tests] fix a test case</li>
<li>[actions] split out node 10-20, and 20+</li>
<li>[Dev Deps] update <code>es-value-fixtures</code>,
<code>mock-property</code>, <code>object-inspect</code>,
<code>tape</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/ljharb/qs/commit/3fa11a5f643c76896387bd2d86904a2d0141fdf7"><code>3fa11a5</code></a>
v6.14.1</li>
<li><a
href="https://github.com/ljharb/qs/commit/a62670423c1ccab0dd83c621bfb98c7c024e314d"><code>a626704</code></a>
[Dev Deps] update <code>npmignore</code></li>
<li><a
href="https://github.com/ljharb/qs/commit/3086902ecf7f088d0d1803887643ac6c03d415b9"><code>3086902</code></a>
[Fix] ensure arrayLength applies to <code>[]</code> notation as
well</li>
<li><a
href="https://github.com/ljharb/qs/commit/fc7930e86c2264c1568c9f5606830e19b0bc2af2"><code>fc7930e</code></a>
[Dev Deps] update <code>eslint</code>,
<code>@ljharb/eslint-config</code></li>
<li><a
href="https://github.com/ljharb/qs/commit/0b06aac566abee45ef0327667a7cc89e7aed8b58"><code>0b06aac</code></a>
[Dev Deps] update <code>@ljharb/eslint-config</code></li>
<li><a
href="https://github.com/ljharb/qs/commit/64951f6200a1fb72cc003c6e8226dde3d2ef591f"><code>64951f6</code></a>
[Refactor] <code>parse</code>: extract key segment splitting helper</li>
<li><a
href="https://github.com/ljharb/qs/commit/e1bd2599cdff4c936ea52fb1f16f921cbe7aa88c"><code>e1bd259</code></a>
[Dev Deps] update <code>@ljharb/eslint-config</code></li>
<li><a
href="https://github.com/ljharb/qs/commit/f4b3d39709fef6ddbd85128d1ba4c6b566c4902e"><code>f4b3d39</code></a>
[eslint] add eslint 9 optional peer dep</li>
<li><a
href="https://github.com/ljharb/qs/commit/6e94d9596ca50dffafcef40a5f64eca89962cf34"><code>6e94d95</code></a>
[Dev Deps] update <code>eslint</code>,
<code>@ljharb/eslint-config</code>, <code>npmignore</code></li>
<li><a
href="https://github.com/ljharb/qs/commit/973dc3c51c86da9f4e30edeb4b1725158d439102"><code>973dc3c</code></a>
[actions] add workflow permissions</li>
<li>Additional commits viewable in <a
href="https://github.com/ljharb/qs/compare/v6.13.0...v6.14.1">compare
view</a></li>
</ul>
</details>
<br />

Updates `body-parser` from 1.20.3 to 1.20.4
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/expressjs/body-parser/releases">body-parser's
releases</a>.</em></p>
<blockquote>
<h2>1.20.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Remove redundant depth check by <a
href="https://github.com/blakeembrey"><code>@​blakeembrey</code></a> in
<a
href="https://redirect.github.com/expressjs/body-parser/pull/538">expressjs/body-parser#538</a></li>
<li>ci: add support for Node.js v23 by <a
href="https://github.com/Phillip9587"><code>@​Phillip9587</code></a> in
<a
href="https://redirect.github.com/expressjs/body-parser/pull/553">expressjs/body-parser#553</a></li>
<li>ci: restore CI for 1.x branch by <a
href="https://github.com/bjohansebas"><code>@​bjohansebas</code></a> in
<a
href="https://redirect.github.com/expressjs/body-parser/pull/665">expressjs/body-parser#665</a></li>
<li>deps: qs@^6.14.0 by <a
href="https://github.com/bjohansebas"><code>@​bjohansebas</code></a> in
<a
href="https://redirect.github.com/expressjs/body-parser/pull/664">expressjs/body-parser#664</a></li>
<li>deps: use tilde notation and update certain dependencies by <a
href="https://github.com/Phillip9587"><code>@​Phillip9587</code></a> in
<a
href="https://redirect.github.com/expressjs/body-parser/pull/668">expressjs/body-parser#668</a></li>
<li>chore: remove SECURITY.md by <a
href="https://github.com/Phillip9587"><code>@​Phillip9587</code></a> in
<a
href="https://redirect.github.com/expressjs/body-parser/pull/669">expressjs/body-parser#669</a></li>
<li>ci: add CodeQL (SAST) by <a
href="https://github.com/Phillip9587"><code>@​Phillip9587</code></a> in
<a
href="https://redirect.github.com/expressjs/body-parser/pull/670">expressjs/body-parser#670</a></li>
<li>Release: 1.20.4 by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://redirect.github.com/expressjs/body-parser/pull/672">expressjs/body-parser#672</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/expressjs/body-parser/compare/1.20.3...1.20.4">https://github.com/expressjs/body-parser/compare/1.20.3...1.20.4</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/expressjs/body-parser/blob/master/HISTORY.md">body-parser's
changelog</a>.</em></p>
<blockquote>
<h1>1.20.4 / 2025-12-01</h1>
<ul>
<li>deps: qs@~6.14.0</li>
<li>deps: use tilde notation for dependencies</li>
<li>deps: http-errors@~2.0.1</li>
<li>deps: raw-body@~2.5.3</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/expressjs/body-parser/commit/7db202cac84a001e6566c2dc6516b44db98beff3"><code>7db202c</code></a>
1.20.4 (<a
href="https://redirect.github.com/expressjs/body-parser/issues/672">#672</a>)</li>
<li><a
href="https://github.com/expressjs/body-parser/commit/d8f8adb898676dfdf997b4455e5f9b689b53e989"><code>d8f8adb</code></a>
ci: add CodeQL (SAST) (<a
href="https://redirect.github.com/expressjs/body-parser/issues/670">#670</a>)</li>
<li><a
href="https://github.com/expressjs/body-parser/commit/6d133c19b3e7c0bb8301959ca1dba283d23d23c3"><code>6d133c1</code></a>
chore: remove SECURITY.md (<a
href="https://redirect.github.com/expressjs/body-parser/issues/669">#669</a>)</li>
<li><a
href="https://github.com/expressjs/body-parser/commit/fcd15355041ada6f37288dd13858d50429016b66"><code>fcd1535</code></a>
deps: use tilde notation and update certain dependencies (<a
href="https://redirect.github.com/expressjs/body-parser/issues/668">#668</a>)</li>
<li><a
href="https://github.com/expressjs/body-parser/commit/ec5fa290d25d85e0049757e240249072331eaee6"><code>ec5fa29</code></a>
deps: qs@~6.14.0 (<a
href="https://redirect.github.com/expressjs/body-parser/issues/664">#664</a>)</li>
<li><a
href="https://github.com/expressjs/body-parser/commit/ffb95c12c7785ec6d3852ce46b8711ac74009252"><code>ffb95c1</code></a>
ci: restore CI for 1.x branch (<a
href="https://redirect.github.com/expressjs/body-parser/issues/665">#665</a>)</li>
<li><a
href="https://github.com/expressjs/body-parser/commit/48a5f074a4db07066087ed8b6ff641825c9c03cf"><code>48a5f07</code></a>
ci: add support for Node.js v23 (<a
href="https://redirect.github.com/expressjs/body-parser/issues/553">#553</a>)</li>
<li><a
href="https://github.com/expressjs/body-parser/commit/f20f6adc7118cbf973e927d34bc0bbf2ff177459"><code>f20f6ad</code></a>
Remove redundant depth check (<a
href="https://redirect.github.com/expressjs/body-parser/issues/538">#538</a>)</li>
<li>See full diff in <a
href="https://github.com/expressjs/body-parser/compare/1.20.3...1.20.4">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
- Fix QNN-EP to support static bias tensors with mismatched quantization
encodings
- Add requantization logic to ensure bias_scale = weights_scale ×
activation_scale

### Description
The QNN-EP currently lacks support for static bias tensors with
quantization encodings that don't match the expected mathematical
relationship bias_scale = weights_scale × activation_scale.



### Motivation and Context
This causes the HTP backend to disregard the bias tensor's encodings and
recompute its own, which leads to accuracy drops.
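
The requantization idea can be sketched as follows. This is a minimal illustration, not the QNN-EP implementation: the function name `requantize_bias`, the ONNX-style convention `real = (q - zero_point) * scale`, the zero offset of the new encoding, and the int32 clamp are assumptions.

```python
def requantize_bias(bias_q, bias_scale, weights_scale, activation_scale, zero_point=0):
    """Re-express quantized bias values with scale = weights_scale * activation_scale."""
    target_scale = weights_scale * activation_scale
    int32_min, int32_max = -(2 ** 31), 2 ** 31 - 1
    requantized = []
    for q in bias_q:
        real = (q - zero_point) * bias_scale   # dequantize with the original encoding
        new_q = round(real / target_scale)     # quantize with the expected scale, offset 0
        requantized.append(max(int32_min, min(int32_max, new_q)))
    return requantized, target_scale


# Scales chosen as powers of two so the arithmetic is exact.
new_bias, new_scale = requantize_bias(
    bias_q=[8, -4], bias_scale=0.25, weights_scale=0.5, activation_scale=0.125
)
print(new_bias, new_scale)
```

After this step the stored bias encoding already satisfies the expected relationship, so a backend has no reason to discard it and derive its own.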

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: qti-ashwshan <ashwshan@qti.qualcomm.com>
…t#26803)

### Description
Adds new `OrtKernelInfo` APIs to enable plugin EP kernel implementations
to retrieve operator metadata and access the parent EP instance.
- Added four new KernelInfo APIs: `KernelInfo_GetOperatorDomain()`,
`KernelInfo_GetOperatorType()`, `KernelInfo_GetOperatorSinceVersion()`,
and `KernelInfo_GetEp()`
- Extended example plugin EP to demonstrate accessing EP configuration
from kernel implementations.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
microsoft#26872)

### Description
Use `file(GLOB ...)` to allow for the specified file to not exist.



### Motivation and Context
If the QNN library is missing, using `list(APPEND ...)` will cause
errors during the build process.
…soft#26917)

### Description
Update the search for GPG to check both Program Files and Program Files
(x86).
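
A two-location lookup of this kind might look like the sketch below. The real change lives in the repository's build or pipeline scripts, which are not shown here; the function name `find_gpg`, the `GnuPG/bin/gpg.exe` relative path, and the injectable `exists` check are assumptions for illustration.

```python
import os
from pathlib import Path

# Assumed install location relative to a Program Files root.
GPG_RELATIVE_PATH = Path("GnuPG") / "bin" / "gpg.exe"


def find_gpg(env=None, exists=Path.is_file):
    """Return the first gpg.exe found under Program Files or Program Files (x86)."""
    env = os.environ if env is None else env
    for var in ("ProgramFiles", "ProgramFiles(x86)"):
        root = env.get(var)
        if not root:
            continue  # variable not set, for example on a non-Windows host
        candidate = Path(root) / GPG_RELATIVE_PATH
        if exists(candidate):
            return candidate
    return None


print(find_gpg())
```

Passing `env` and `exists` explicitly keeps the lookup testable on machines that do not have GPG installed.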


### Motivation and Context

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
## Summary
This pull request aims to significantly reduce the build time for Flash
Attention by removing support for less common head dimensions (160 and
224).

It also adds a quick-build option, `--cmake_extra_defines
onnxruntime_QUICK_BUILD=ON`, which builds the Flash Attention kernel
only for float16 and head dimension 128. This can speed up development
builds.

## Key Changes

### 1. Flash Attention Build Optimization
- **Removed Head Dimensions:** Deleted source files and kernel
instantiations for head dimensions **160** and **224** (both FP16 and
BF16). These dimensions are less frequently used, and removing them
reduces the number of kernels to be compiled, thereby speeding up the
build process.
- **Updated Dispatch Logic:** Modified `static_switch.h` and
`flash_api.h` to remove the dispatch cases for `kHeadDim = 160` and
`kHeadDim = 224`.

### 2. Test Enhancements
- **GQA Tests:** Updated
`onnxruntime/test/python/transformers/test_gqa.py` to detect whether the
package is a quick build. If it is, only the supported data type
(float16) and head dimension (128) are tested for Flash Attention, and
`has_flash_attention(bf16=True)` is used when checking for Flash
Attention availability in BF16 tests. This ensures that tests are
skipped appropriately if BF16 kernels are not compiled or available.

## Impact
- **Build Time:** Faster compilation of the CUDA provider due to fewer
Flash Attention kernels.
- **Functionality:** Head dimensions 160 and 224 are no longer supported
for Flash Attention. Models using these specific head dimensions will
fall back to the next supported head dimension, such as 192 or 256.
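
The fallback to the next supported head dimension can be sketched as a round-up lookup. The list below is an assumption: the PR only states that 160 and 224 were removed and that 192 and 256 remain, and `dispatch_head_dim` is a hypothetical name rather than the logic in `static_switch.h`.

```python
import bisect

# Assumed set of head dimensions that still have compiled kernels.
SUPPORTED_HEAD_DIMS = [32, 64, 96, 128, 192, 256]


def dispatch_head_dim(head_dim):
    """Return the smallest supported head dimension that is >= head_dim."""
    index = bisect.bisect_left(SUPPORTED_HEAD_DIMS, head_dim)
    if head_dim <= 0 or index == len(SUPPORTED_HEAD_DIMS):
        raise ValueError(f"unsupported head dimension: {head_dim}")
    return SUPPORTED_HEAD_DIMS[index]


for dim in (128, 160, 224):
    print(dim, "->", dispatch_head_dim(dim))
```

Rounding up trades some wasted work in the larger kernel for a shorter build, which is the trade-off this change accepts for the two removed sizes.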

## Verification
- Validated that the build completes successfully with the reduced
kernel set.
- `test_gqa.py` should pass or skip correctly based on hardware support.
- Build the onnxruntime-gpu package with the `--cmake_extra_defines
onnxruntime_QUICK_BUILD=ON` option and check that the build info
contains "quick-build=1", for example with the following Python script:

```python
import onnxruntime
print(onnxruntime.get_build_info())
```

The output looks like this:
```
ORT Build Info: git-branch=main, git-commit-id=ecf164a945, quick-build=1, build type=Release
```
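
A test could detect a quick build by parsing that string. The sketch below is one possible way to do it and is not the code in `test_gqa.py`; the helper names `parse_build_info` and `is_quick_build` are hypothetical, and the parser assumes the comma-separated `key=value` layout shown in the sample output above.

```python
def parse_build_info(build_info):
    """Parse 'ORT Build Info: key=value, key=value' into a dict of strings."""
    _, _, body = build_info.partition(":")
    fields = {}
    for item in body.split(","):
        key, sep, value = item.partition("=")
        if sep:  # ignore fragments that are not key=value pairs
            fields[key.strip()] = value.strip()
    return fields


def is_quick_build(build_info):
    """True when the build info reports quick-build=1."""
    return parse_build_info(build_info).get("quick-build") == "1"


sample = (
    "ORT Build Info: git-branch=main, git-commit-id=ecf164a945, "
    "quick-build=1, build type=Release"
)
print(is_quick_build(sample))
```

In a real test the string would come from `onnxruntime.get_build_info()`; it is passed in here so the sketch runs without the package installed.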
### Description
This patch extends `Split-K` to larger `dim_inner` values (up to 4096)
so that more models can benefit from it.
This patch also enables `Split-K` on `gen-12lp`.


### Motivation and Context
With this PR we can achieve about 30% improvement on
`jina-clip-v1-text-fp16` and 20% improvement on
`jina-embeddings-v2-base-code-fp16` on Lunar Lake iGPUs.
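
For readers unfamiliar with the technique, Split-K divides the inner (reduction) dimension of a matrix multiply into chunks whose partial products are summed into the output. The sketch below shows only that arithmetic in plain Python; the function name `matmul_split_k` and the chunking policy are assumptions and do not reflect the WebGPU shader.

```python
def matmul_split_k(a, b, num_splits):
    """Compute a @ b by splitting the inner dimension into num_splits chunks.

    a is [m][k] and b is [k][n]. On a GPU each chunk would typically be handled
    by a separate workgroup, with the partial sums accumulated atomically.
    """
    m, k, n = len(a), len(b), len(b[0])
    chunk = -(-k // num_splits)  # ceiling division
    out = [[0] * n for _ in range(m)]
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        for i in range(m):
            for j in range(n):
                out[i][j] += sum(a[i][p] * b[p][j] for p in range(start, stop))
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(matmul_split_k(a, b, num_splits=2))
```

The benefit appears when `dim_inner` is large relative to the output size: splitting the reduction exposes more parallel work than tiling the output alone.
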
…26926)

This pull request simplifies the logic for handling present key/value
tensors in the WebGPU Flash Attention implementation. The main change is
that the responsibility for creating internal present key/value tensors
is moved from the caller to the `ApplyFlashAttention` function itself.
This reduces code duplication and makes the API easier to use.
Additionally, the `CanApplyFlashAttention` function is simplified to
remove unnecessary checks for present key/value tensors.
### Description
Address out-of-bounds (OOB) read and write issues.

### Motivation and Context
microsoft#11828
microsoft#13332
See also this: onnx/onnx#4294
…oft#26660)

### Description
Introduced a new environment variable
ORT_DEBUG_NODE_IO_PREPEND_EP_TO_FILE_NAME to control whether the
Execution Provider (EP) type is prepended to tensor file names.
Updated MakeTensorFileName() and related logic to include the EP type in
the format: `<EPType>_<TensorName>` when the variable is set to a
non-zero value. This change helps avoid file name conflicts and improves
clarity when analyzing outputs from multi-EP runs.

### Motivation and Context
Accuracy unit tests may run the same model multiple times using
different EPs, such as ORT CPU and QNN HTP. Previously, dumped tensor
files used identical names when tensor names matched across sessions,
causing file overwrites.

### Example
When ORT_DEBUG_NODE_IO_PREPEND_EP_TO_FILE_NAME=1, dumped tensors include
EP-specific prefixes:
```
CPU_input.tensorproto
CPU_input_token_0.tensorproto
CPU_output.tensorproto
QNN_constant.tensorproto
QNN_input.tensorproto
QNN_input_token_0.tensorproto
QNN_output.tensorproto
```
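
The naming rule can be sketched as below. This is an illustration rather than the C++ `MakeTensorFileName()`: the Python function name, the `.tensorproto` suffix handling, and the way the short EP label (such as `CPU` or `QNN`) is supplied by the caller are assumptions, and any file-name sanitizing done by the real code is not modeled.

```python
import os

ENV_VAR = "ORT_DEBUG_NODE_IO_PREPEND_EP_TO_FILE_NAME"


def make_tensor_file_name(tensor_name, ep_type, env=None):
    """Build a dump file name, prefixing the EP type when the variable is non-zero."""
    env = os.environ if env is None else env
    try:
        prepend = int(env.get(ENV_VAR, "0")) != 0
    except ValueError:
        prepend = False  # treat unparsable values as "off" (assumed behavior)
    stem = f"{ep_type}_{tensor_name}" if prepend else tensor_name
    return stem + ".tensorproto"


print(make_tensor_file_name("input", "QNN", {ENV_VAR: "1"}))
print(make_tensor_file_name("input", "QNN", {}))
```

With the prefix enabled, two sessions that share tensor names but run on different EPs write to different files instead of overwriting each other.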
### Description
Update the OV version to the latest release in OVEP.

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
@Jaswanth51 Jaswanth51 requested a review from ankitm3k January 9, 2026 03:59
@ankitm3k ankitm3k merged commit 2d61435 into ovep-develop Jan 9, 2026
6 of 8 checks passed
@ankitm3k ankitm3k deleted the sync_msft_09012026 branch January 9, 2026 04:37