
Make WebGPU EP compatible with EP API#26907

Open
fs-eire wants to merge 15 commits into microsoft:main from fs-eire:ep-api

Conversation

@fs-eire
Contributor

@fs-eire fs-eire commented Jan 5, 2026

Description

This PR makes it possible to build WebGPU EP as an EP API based plugin EP.

Requirements

The goal of this PR is to support building WebGPU EP both as a bundled EP and as an EP API based plugin EP. This approach allows:

  • enabling WebGPU EP as a standalone plugin EP package for WCR usage
  • a graceful transition of WebGPU EP as a native EP for language bindings, from the bundled EP to an EP API based plugin EP
  • keeping the existing usage (static library) working (mainly for the web)

Design & Implementation

Instead of changing WebGPU EP from a bundled EP to an EP API based plugin EP in one shot, this PR extends WebGPU EP to support building as a plugin EP.

  • add a new folder include/onnxruntime/ep with a set of header files. These files are not WebGPU specific. They are used to:

    • provide common defines/functions/macros for plugin EPs to use
    • provide a few "adapter" classes that take C-API objects to simulate the behavior of ORT internal classes
    • provide a few "override" classes that simulate ORT internal classes, but with implementations that depend only on the C API
    • provide a special base class onnxruntime::ep::Ep to inherit from

    These header files allow a compile-time "switch" between the two sets of types to minimize changes to existing code. Specifically, pch.h must be included as a PCH to make sure the "override" takes place correctly.

  • add a new folder onnxruntime/core/providers/webgpu/ep for the EP API implementation, specifically:

    • api.cc: implements CreateEpFactories and ReleaseEpFactory
    • ep.cc / ep.h: implement class onnxruntime::webgpu::ep::Ep
    • factory.cc / factory.h: implement class onnxruntime::webgpu::ep::Factory
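To make the compile-time "switch" concrete, here is a minimal, self-contained sketch of the mechanism (all names are illustrative, not the real onnxruntime headers): a forced-include header decides which implementation a type alias resolves to, so the kernel code itself compiles unchanged in both build modes.

```cpp
// Sketch of the pch.h "override" pattern. In a plugin build, BUILD_AS_PLUGIN_EP
// would be defined by the build system and the alias resolves to a C-API-backed
// adapter; otherwise it resolves to the ORT-internal class. Names are hypothetical.
#include <string>

#ifdef BUILD_AS_PLUGIN_EP
namespace ep_adapter {
// "Adapter" class: in the real code this would wrap a C-API object (OrtKernelInfo*).
struct OpKernelInfo {
  std::string Source() const { return "plugin (C-API adapter)"; }
};
}  // namespace ep_adapter
using OpKernelInfo = ep_adapter::OpKernelInfo;
#else
namespace ort_internal {
// ORT-internal class, available only in a bundled (static-library) build.
struct OpKernelInfo {
  std::string Source() const { return "bundled (ORT internal)"; }
};
}  // namespace ort_internal
using OpKernelInfo = ort_internal::OpKernelInfo;
#endif

// Kernel code is written once against the alias and works in both modes.
std::string DescribeBuildMode() {
  OpKernelInfo info;
  return info.Source();
}
```

In the real PR the redirected type set is much larger and pch.h performs the redirection, but the mechanism is the same: the kernel source names one type, and the build configuration decides which implementation backs it.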

Dependencies and Prerequisites

(unmerged changes are included as part of the current PR)

Missing Parts

  • Allow setting Global/Default EP options

Contributor

Copilot AI left a comment

Pull request overview

This PR extends WebGPU EP to support building as both a bundled EP (static library) and an EP API-based plugin EP (shared library). The changes introduce a new adapter infrastructure in include/onnxruntime/ep/ that bridges C-API objects to simulate ORT internal class behaviors, enabling compile-time switching between build modes with minimal code changes.

Key changes:

  • New EP adapter infrastructure with header-only wrapper classes for kernel info, context, registry, and EP interface
  • WebGPU EP API implementation (api.cc, factory.cc/h, ep.cc/h) in onnxruntime/core/providers/webgpu/ep/
  • Templated CPU tensor operator base classes to work with both native and adapter OpKernelInfo types
  • Enhanced C API with KernelInfo_GetOperatorType, KernelInfo_GetSinceVersion, and KernelInfo_GetEp
  • Test infrastructure updates including example kernel registry changes (Mul kernel renamed to BinaryOp supporting both Add and Mul)

Reviewed changes

Copilot reviewed 75 out of 75 changed files in this pull request and generated 5 comments.

Summary per file:

  • include/onnxruntime/ep/*.h: New EP adapter infrastructure headers providing C-API wrappers for kernel registration and execution
  • onnxruntime/core/providers/webgpu/ep/*: WebGPU EP API implementation for factory, EP instance, and plugin entry points
  • onnxruntime/core/providers/cpu/tensor/*.h: Templated base classes for tensor ops to support both native and adapter kernel info types
  • onnxruntime/test/autoep/*: Test updates including BinaryOp generalization and additional test coverage
  • onnxruntime/core/session/*.cc: Core API additions for kernel info operator type, version, and EP retrieval


@fs-eire fs-eire force-pushed the ep-api branch 6 times, most recently from 4d451dc to e52d706 Compare January 22, 2026 01:27
@fs-eire fs-eire force-pushed the ep-api branch 6 times, most recently from 9e3419b to 0e58206 Compare February 2, 2026 22:33
@fs-eire fs-eire force-pushed the ep-api branch 5 times, most recently from 4a5e066 to 6333b00 Compare February 7, 2026 05:50
fs-eire added a commit that referenced this pull request Feb 28, 2026
### Description

This PR adds a few headers for supporting building WebGPU EP and CUDA EP
as plugin EPs.

See summary of #26907
@fs-eire fs-eire force-pushed the ep-api branch 4 times, most recently from e5caa04 to a12a6a9 Compare March 6, 2026 01:56
@fs-eire fs-eire changed the title [WIP] Make WebGPU EP compatible with EP API Make WebGPU EP compatible with EP API Mar 12, 2026
@fs-eire fs-eire marked this pull request as ready for review March 12, 2026 18:40
@fs-eire fs-eire requested a review from Copilot March 12, 2026 18:40
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 5 comments.



fs-eire and others added 4 commits March 12, 2026 11:54
@tianleiwu
Copy link
Contributor

tianleiwu commented Mar 12, 2026

AI Summary

The overall direction is good: the PR adds a real EP API surface for WebGPU instead of relying on the legacy internal factory path, and most of the capability logic is intentionally mirrored from the native WebGPU EP. The blocking issue is that the new plugin path is still hard-wired to the default WebGPU context/device, so non-default device selection and the shared environment plumbing do not actually work once you leave the device-0 happy path.


Review

1. Device Routing And Allocator Metadata (onnxruntime/core/providers/webgpu/ep/factory.cc)

Positive:

  • The factory cleanly exposes the expected EP API entry points and reuses the existing WebGpuProviderFactoryCreator instead of duplicating provider construction logic.

Concern:

2. Shared Data Transfer (onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc)

Positive:

  • Reusing DataTransfer::CopyTensorImpl avoids duplicating the actual copy logic, and the vendor-id filter is a good guard against accidentally claiming unrelated GPU tensors.

Concern:

3. Capability Selection Parity (onnxruntime/core/providers/webgpu/ep/ep.cc)

Positive:

  • The EP API capability implementation correctly carries over the WebGPU-specific unsupported Attention cases, which keeps most of the selection logic aligned with the native provider.

Concern:


Summary of Concerns

  1. High, EP factory: Device selection is ignored and all allocators/memory infos are exported as device 0.
  2. High, Data transfer: Shared WebGPU data transfer only works for context/device 0.
  3. High, Capability selection: Preassigned WebGPU nodes are incorrectly included in CPU-preference fallback.

Verdict

REQUEST CHANGES — the EP API path is still functionally tied to the default WebGPU device, and it also regresses capability selection behavior versus the native WebGPU EP.

@fs-eire
Copy link
Contributor Author

fs-eire commented Mar 12, 2026

Explanation of (1) and (2) from @tianleiwu's review:

ORT device discovery never worked with WebGPU, mainly for two reasons:

  • WebGPU EP already uses its own semantics for "Device ID" (or "Context ID", which is the same concept in the context of WebGPU). Specifically, "Device 0" represents the default device, and any number other than 0 represents a custom device, which is expected to be created and passed in by the user. This predates the device discovery feature introduced in ORT.
  • WebGPU has its own implementation for device discovery and selection, and it cannot work together with ORT's device discovery feature. The OrtDevice info is not helpful for creating a WebGPU device.

Based on this, we create at most one WebGPU device inside WebGPU EP.

The real "missing" features are the WebGPU options. They are planned to be supported in future changes.
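As a sketch of the device-ID semantics described above (all names are hypothetical; the real EP wires this through its own context/device management): ID 0 is the one default device created lazily by the EP, and any non-zero ID must refer to a device the user registered beforehand.

```cpp
// Hypothetical illustration of WebGPU EP's "Device ID" / "Context ID" semantics.
// Not the real onnxruntime API.
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

struct WebGpuDeviceStub {  // stand-in for an actual wgpu::Device
  std::string label;
};

class DeviceRegistry {
 public:
  // Non-zero IDs are supplied by the user together with their own device.
  void RegisterUserDevice(uint32_t id, WebGpuDeviceStub device) {
    if (id == 0) throw std::invalid_argument("ID 0 is reserved for the default device");
    devices_[id] = std::move(device);
  }

  // ID 0 lazily creates the single default device; other IDs must already exist.
  const WebGpuDeviceStub& Get(uint32_t id) {
    if (id == 0) {
      if (devices_.find(0) == devices_.end()) devices_[0] = WebGpuDeviceStub{"default"};
      return devices_[0];
    }
    auto it = devices_.find(id);
    if (it == devices_.end()) throw std::out_of_range("unknown custom device ID");
    return it->second;
  }

 private:
  std::map<uint32_t, WebGpuDeviceStub> devices_;
};
```

This is why an OrtHardwareDevice carries no information the EP can act on: the default device is created internally, and custom devices arrive fully formed from the user.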

uint32_t ORT_API_CALL Factory::GetVendorIdImpl(const OrtEpFactory* /*this_ptr*/) noexcept {
  return 0;
}
Contributor

0 is NONE.
Should this return the actual vendor ID here?

Contributor Author

For WebGPU, Vendor ID is explicitly defined as 0. See

enum VendorIds : VendorId {
  // No vendor ID. Valid for DeviceType::CPU + MemType::DEFAULT or for generic allocators like WebGPU.
  NONE = 0x0000,

tianleiwu
tianleiwu previously approved these changes Mar 13, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.



Comment on lines +213 to +214
// currently only option "gpu_graph_id" is used
auto graph_annotation_str = Api().ort.GetRunConfigEntry(run_options, kOrtRunOptionsConfigCudaGraphAnnotation);
Contributor

Is this only handling this specific run option because we're missing a public C API to get all config entries from an OrtRunOptions? If so, I suppose that would be a good thing to add in a separate PR. It would allow this function to pass along all configs.

Contributor Author

I think it is OK for WebGPU. The current design seems to require a deep copy to convert OrtRunOptions into an onnxruntime::RunOptions, so even if we could get all the entries, we would still want to look only at the ones we are interested in.
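A small sketch of that "query only the keys we care about" pattern (the stand-in getter mirrors the per-key lookup style of the C API; the names and types here are illustrative):

```cpp
// Hypothetical illustration: instead of enumerating every run-option entry,
// the EP queries only the specific keys it knows how to act on.
#include <map>
#include <string>

using RunOptionsStub = std::map<std::string, std::string>;  // stand-in for OrtRunOptions

// Stand-in for a per-key getter such as GetRunConfigEntry: returns the value
// if present, or an empty string if the key is not set.
std::string GetRunConfigEntry(const RunOptionsStub& opts, const std::string& key) {
  auto it = opts.find(key);
  return it == opts.end() ? std::string{} : it->second;
}

// The EP inspects only the keys it understands; anything else is ignored.
std::string ReadGraphAnnotation(const RunOptionsStub& opts) {
  // Key name mirrors the "gpu_graph_id" option referenced in the diff above.
  return GetRunConfigEntry(opts, "gpu_graph_id");
}
```

Any other entries in the run options are simply never queried, which avoids the deep-copy conversion entirely.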


OrtStatus* ORT_API_CALL Factory::CreateEpImpl(
    OrtEpFactory* this_ptr,
    const OrtHardwareDevice* const* /*devices*/,
Contributor

Should this function return an error status if more than one hardware device is specified here? For example, an application may try to create a session with two OrtEpDevice instances: a discrete GPU and an integrated GPU.

I'm assuming this is not supported by the WebGPU EP, so perhaps this should return an error to the user.

Contributor Author

This is a valid usage scenario; however, WebGPU does not work this way.

Please check my comment reply at #26907 (comment). In short, ORT device discovery does not work with WebGPU. Given a specific OrtHardwareDevice, I cannot pass any info from the OrtHardwareDevice to WebGPU device creation, and I cannot tell whether the WebGPU backend will use it or not.

This is why I just ignore the parameter.
