
Make WebGPU EP compatible with EP API#26907

Open
fs-eire wants to merge 15 commits into microsoft:main from fs-eire:ep-api

Conversation

@fs-eire
Contributor

@fs-eire fs-eire commented Jan 5, 2026

Description

This PR makes it possible to build WebGPU EP as an EP API based plugin EP.

Requirements

The goal of this PR is to support building WebGPU EP both as a bundled EP and as an EP API based plugin EP. This approach allows:

  • enabling WebGPU EP as a standalone plugin EP package for WCR usage
  • a graceful transition of WebGPU EP as a native EP for language bindings, from the bundled EP to an EP API based plugin EP
  • keeping the existing usage (static library) working (mainly for the web)

Design & Implementation

Instead of changing WebGPU EP from a bundled EP to an EP API based plugin EP in one shot, this PR extends WebGPU EP to support building as a plugin EP.

  • add a new folder include/onnxruntime/ep with a set of header files. These files are not WebGPU specific. They are used to:

    • provide common defines/functions/macros for plugin EPs to use
    • provide a few "adapter" classes that take C-API objects to simulate the behavior of ORT internal classes
    • provide a few "override" classes that simulate ORT internal classes, but with implementations that depend only on the C API
    • provide a special base class onnxruntime::ep::Ep to inherit from

    These header files allow a compile-time "switch" between the two sets of types to minimize changes to existing code. Specifically, pch.h must be included as a PCH to make sure the "override" takes place correctly.

  • add a new folder onnxruntime/core/providers/webgpu/ep for the EP API implementation, specifically:

    • api.cc: implements CreateEpFactories and ReleaseEpFactory
    • ep.cc / ep.h: implement class onnxruntime::webgpu::ep::Ep
    • factory.cc / factory.h: implement class onnxruntime::webgpu::ep::Factory
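To make the compile-time "switch" concrete, here is a minimal, self-contained sketch of the mechanism (all names are illustrative, not the real onnxruntime headers): a forced-include header decides which implementation a type alias resolves to, so the kernel code itself compiles unchanged in both build modes.

```cpp
// Sketch of the pch.h "override" pattern. In a plugin build, BUILD_AS_PLUGIN_EP
// would be defined by the build system and the alias resolves to a C-API-backed
// adapter; otherwise it resolves to the ORT-internal class. Names are hypothetical.
#include <string>

#ifdef BUILD_AS_PLUGIN_EP
namespace ep_adapter {
// "Adapter" class: in the real code this would wrap a C-API object (OrtKernelInfo*).
struct OpKernelInfo {
  std::string Source() const { return "plugin (C-API adapter)"; }
};
}  // namespace ep_adapter
using OpKernelInfo = ep_adapter::OpKernelInfo;
#else
namespace ort_internal {
// ORT-internal class, available only in a bundled (static-library) build.
struct OpKernelInfo {
  std::string Source() const { return "bundled (ORT internal)"; }
};
}  // namespace ort_internal
using OpKernelInfo = ort_internal::OpKernelInfo;
#endif

// Kernel code is written once against the alias and works in both modes.
std::string DescribeBuildMode() {
  OpKernelInfo info;
  return info.Source();
}
```

In the real PR the redirected type set is much larger and pch.h performs the redirection, but the mechanism is the same: the kernel source names one type, and the build configuration decides which implementation backs it.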

Dependencies and Prerequisites

(unmerged changes are included as part of the current PR)

Missing Parts

  • Allow setting Global/Default EP options

Contributor

Copilot AI left a comment

Pull request overview

This PR extends WebGPU EP to support building as both a bundled EP (static library) and an EP API-based plugin EP (shared library). The changes introduce a new adapter infrastructure in include/onnxruntime/ep/ that bridges C-API objects to simulate ORT internal class behaviors, enabling compile-time switching between build modes with minimal code changes.

Key changes:

  • New EP adapter infrastructure with header-only wrapper classes for kernel info, context, registry, and EP interface
  • WebGPU EP API implementation (api.cc, factory.cc/h, ep.cc/h) in onnxruntime/core/providers/webgpu/ep/
  • Templated CPU tensor operator base classes to work with both native and adapter OpKernelInfo types
  • Enhanced C API with KernelInfo_GetOperatorType, KernelInfo_GetSinceVersion, and KernelInfo_GetEp
  • Test infrastructure updates including example kernel registry changes (Mul kernel renamed to BinaryOp supporting both Add and Mul)

Reviewed changes

Copilot reviewed 75 out of 75 changed files in this pull request and generated 5 comments.

Summary per file:

  • include/onnxruntime/ep/*.h: New EP adapter infrastructure headers providing C-API wrappers for kernel registration and execution
  • onnxruntime/core/providers/webgpu/ep/*: WebGPU EP API implementation for factory, EP instance, and plugin entry points
  • onnxruntime/core/providers/cpu/tensor/*.h: Templated base classes for tensor ops to support both native and adapter kernel info types
  • onnxruntime/test/autoep/*: Test updates including BinaryOp generalization and additional test coverage
  • onnxruntime/core/session/*.cc: Core API additions for kernel info operator type, version, and EP retrieval


@fs-eire fs-eire force-pushed the ep-api branch 6 times, most recently from 4d451dc to e52d706 Compare January 22, 2026 01:27
@fs-eire fs-eire force-pushed the ep-api branch 6 times, most recently from 9e3419b to 0e58206 Compare February 2, 2026 22:33
@fs-eire fs-eire force-pushed the ep-api branch 5 times, most recently from 4a5e066 to 6333b00 Compare February 7, 2026 05:50
fs-eire added a commit that referenced this pull request Feb 28, 2026
### Description

This PR adds a few headers for supporting building WebGPU EP and CUDA EP
as plugin EPs.

See summary of #26907
@fs-eire fs-eire force-pushed the ep-api branch 4 times, most recently from e5caa04 to a12a6a9 Compare March 6, 2026 01:56
@fs-eire fs-eire changed the title [WIP] Make WebGPU EP compatible with EP API Make WebGPU EP compatible with EP API Mar 12, 2026
@fs-eire fs-eire marked this pull request as ready for review March 12, 2026 18:40
@fs-eire fs-eire requested a review from Copilot March 12, 2026 18:40
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 5 comments.



fs-eire and others added 4 commits March 12, 2026 11:54
@tianleiwu
Copy link
Contributor

tianleiwu commented Mar 12, 2026

AI Summary

The overall direction is good: the PR adds a real EP API surface for WebGPU instead of relying on the legacy internal factory path, and most of the capability logic is intentionally mirrored from the native WebGPU EP. The blocking issue is that the new plugin path is still hard-wired to the default WebGPU context/device, so non-default device selection and the shared environment plumbing do not actually work once you leave the device-0 happy path.


Review

1. Device Routing And Allocator Metadata (onnxruntime/core/providers/webgpu/ep/factory.cc)

Positive:

  • The factory cleanly exposes the expected EP API entry points and reuses the existing WebGpuProviderFactoryCreator instead of duplicating provider construction logic.

Concern:

2. Shared Data Transfer (onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc)

Positive:

  • Reusing DataTransfer::CopyTensorImpl avoids duplicating the actual copy logic, and the vendor-id filter is a good guard against accidentally claiming unrelated GPU tensors.

Concern:

3. Capability Selection Parity (onnxruntime/core/providers/webgpu/ep/ep.cc)

Positive:

  • The EP API capability implementation correctly carries over the WebGPU-specific unsupported Attention cases, which keeps most of the selection logic aligned with the native provider.

Concern:


Summary of Concerns

  1. High, EP factory: Device selection is ignored and all allocators/memory infos are exported as device 0.
  2. High, Data transfer: Shared WebGPU data transfer only works for context/device 0.
  3. High, Capability selection: Preassigned WebGPU nodes are incorrectly included in CPU-preference fallback.

Verdict

REQUEST CHANGES — the EP API path is still functionally tied to the default WebGPU device, and it also regresses capability selection behavior versus the native WebGPU EP.

@fs-eire
Copy link
Contributor Author

fs-eire commented Mar 12, 2026

Explanation of (1) and (2) from @tianleiwu's review:

ORT device discovery never worked with WebGPU, mainly for two reasons:

  • WebGPU EP already uses its own semantics for "Device ID" (or "Context ID", which is the same concept in the context of WebGPU). Specifically, "Device 0" represents the default device, and any number other than 0 represents a custom device, which is expected to be created and passed in by the user. This predates the device discovery feature introduced in ORT.
  • WebGPU has its own implementation for device discovery and selection, and it cannot work together with ORT's device discovery feature. The OrtDevice info is not helpful for creating a WebGPU device.

Based on this, we create at most one WebGPU device inside WebGPU EP.

The real "missing" features are the WebGPU options. They are planned to be supported in future changes.
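As a sketch of the device-ID semantics described above (all names are hypothetical; the real EP wires this through its own context/device management): ID 0 is the one default device created lazily by the EP, and any non-zero ID must refer to a device the user registered beforehand.

```cpp
// Hypothetical illustration of WebGPU EP's "Device ID" / "Context ID" semantics.
// Not the real onnxruntime API.
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

struct WebGpuDeviceStub {  // stand-in for an actual wgpu::Device
  std::string label;
};

class DeviceRegistry {
 public:
  // Non-zero IDs are supplied by the user together with their own device.
  void RegisterUserDevice(uint32_t id, WebGpuDeviceStub device) {
    if (id == 0) throw std::invalid_argument("ID 0 is reserved for the default device");
    devices_[id] = std::move(device);
  }

  // ID 0 lazily creates the single default device; other IDs must already exist.
  const WebGpuDeviceStub& Get(uint32_t id) {
    if (id == 0) {
      if (devices_.find(0) == devices_.end()) devices_[0] = WebGpuDeviceStub{"default"};
      return devices_[0];
    }
    auto it = devices_.find(id);
    if (it == devices_.end()) throw std::out_of_range("unknown custom device ID");
    return it->second;
  }

 private:
  std::map<uint32_t, WebGpuDeviceStub> devices_;
};
```

This is why an OrtHardwareDevice carries no information the EP can act on: the default device is created internally, and custom devices arrive fully formed from the user.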

uint32_t ORT_API_CALL Factory::GetVendorIdImpl(const OrtEpFactory* /*this_ptr*/) noexcept {
  return 0;
}
Contributor

0 is NONE.
Should this return the actual vendor ID here?

Contributor Author

For WebGPU, Vendor ID is explicitly defined as 0. See

enum VendorIds : VendorId {
  // No vendor ID. Valid for DeviceType::CPU + MemType::DEFAULT or for generic allocators like WebGPU.
  NONE = 0x0000,

tianleiwu
tianleiwu previously approved these changes Mar 13, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.



Comment on lines +213 to +214
// currently only option "gpu_graph_id" is used
auto graph_annotation_str = Api().ort.GetRunConfigEntry(run_options, kOrtRunOptionsConfigCudaGraphAnnotation);
Contributor

Is this only handling this specific run option because we're missing a public C API to get all config entries from an OrtRunOptions? If so, I suppose that would be a good thing to add in a separate PR. It would allow this function to pass along all configs.

Contributor Author

I think it is OK for WebGPU. The current design seems to require a deep copy to convert OrtRunOptions into an onnxruntime::RunOptions, so even if we could get all the entries, we would still want to look only at the ones we are interested in.
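A small sketch of that "query only the keys we care about" pattern (the stand-in getter mirrors the per-key lookup style of the C API; the names and types here are illustrative):

```cpp
// Hypothetical illustration: instead of enumerating every run-option entry,
// the EP queries only the specific keys it knows how to act on.
#include <map>
#include <string>

using RunOptionsStub = std::map<std::string, std::string>;  // stand-in for OrtRunOptions

// Stand-in for a per-key getter such as GetRunConfigEntry: returns the value
// if present, or an empty string if the key is not set.
std::string GetRunConfigEntry(const RunOptionsStub& opts, const std::string& key) {
  auto it = opts.find(key);
  return it == opts.end() ? std::string{} : it->second;
}

// The EP inspects only the keys it understands; anything else is ignored.
std::string ReadGraphAnnotation(const RunOptionsStub& opts) {
  // Key name mirrors the "gpu_graph_id" option referenced in the diff above.
  return GetRunConfigEntry(opts, "gpu_graph_id");
}
```

Any other entries in the run options are simply never queried, which avoids the deep-copy conversion entirely.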


OrtStatus* ORT_API_CALL Factory::CreateEpImpl(
    OrtEpFactory* this_ptr,
    const OrtHardwareDevice* const* /*devices*/,
Contributor

Should this function return an error status if more than one hardware device is specified here? For example, an application may try to create a session with two OrtEpDevice instances: a discrete GPU and an integrated GPU.

I'm assuming this is not supported by the WebGPU EP, so perhaps this should return an error to the user.

Contributor Author

This is a valid usage scenario; however, WebGPU does not work this way.

Please check my comment reply at #26907 (comment). In short, ORT device discovery does not work with WebGPU. Given a specific OrtHardwareDevice, I cannot pass any info from the OrtHardwareDevice to WebGPU device creation, and I cannot tell whether the WebGPU backend will use it or not.

This is why I just ignore the parameter.
