diff --git a/index.bs b/index.bs index 6016c0ab..959b4d4b 100644 --- a/index.bs +++ b/index.bs @@ -318,19 +318,19 @@ video summarization such as [[Video-Summarization-with-LSTM]]. ### Noise Suppression ### {#usecase-noise-suppression} -A web-based video conferencing application records received audio streams, but -usually the background noise is everywhere. The application leverages real-time -noise suppression using Recurrent Neural Network such as [[RNNoise]] for -suppressing background dynamic noise like baby cry or dog barking to improve +A web-based video conferencing application records received audio streams, but +background noise often degrades them. The application leverages real-time +noise suppression using a recurrent neural network such as [[RNNoise]] to +suppress dynamic background noise, such as a crying baby or a barking dog, and improve audio experiences in video conferences. ### Detecting fake video ### {#usecase-detecting-fake-video} -A user is exposed to realistic fake videos generated by ‘deepfake’ on the web. -The fake video can swap the speaker’s face into the president’s face to incite -a user politically or to manipulate user’s opinion. The deepfake detection -applications such as [[FaceForensics++]] analyze the videos and protect a user against -the fake videos or images. When she watches a fake video on the web, the +A user is exposed to realistic fake videos generated by ‘deepfake’ techniques on the web. +A fake video can swap a speaker’s face with the president’s face to incite +a user politically or to manipulate the user’s opinion. Deepfake detection +applications such as [[FaceForensics++]] analyze videos and protect the user against +fake videos or images. When she watches a fake video on the web, the detection application alerts her of the fraud video in real-time. ## Framework Use Cases ## {#usecases-framework} @@ -472,7 +472,7 @@ during inference, as well as the output values of inference. ￼
At inference time, every {{MLOperand}} will be bound to a tensor (the actual data). The {{MLGraphBuilder}} interface enables the creation of {{MLOperand}}s. -A key part of the {{MLGraphBuilder}} interface are the operations (such as +A key part of the {{MLGraphBuilder}} interface is its operations (such as {{MLGraphBuilder}}.{{MLGraphBuilder/gemm()}} and {{MLGraphBuilder}}.{{MLGraphBuilder/softmax()}}). The operations have a functional semantics, with no side effects. Each operation invocation conceptually returns a distinct new value, without @@ -481,7 +481,7 @@ changing the value of any other {{MLOperand}}. The runtime values (of {{MLOperand}}s) are tensors, which are essentially multidimensional arrays. The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the -array data (such as its shape). +array data (such as its shape). As mentioned above, the operations have a functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation @@ -495,27 +495,27 @@ Before the execution, the computation graph that is used to compute one or more There are multiple ways by which the graph may be compiled. The {{MLGraphBuilder}}.{{MLGraphBuilder/build()}} method compiles the graph in the background without blocking the calling thread, and returns a {{Promise}} that resolves to an {{MLGraph}}. The {{MLGraphBuilder}}.{{MLGraphBuilder/buildSync()}} method compiles the graph immediately on the calling thread, which must be a worker thread running on CPU or GPU device, and returns an {{MLGraph}}. Both compilation methods produce an {{MLGraph}} that represents a compiled graph for optimal execution. Once the {{MLGraph}} is constructed, there are multiple ways by which the graph may be executed. ￼
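The functional, side-effect-free semantics described above can be illustrated with a small plain-JavaScript toy. This is purely illustrative: `makeOperand`, `addScalar`, and `reshapeView` are hypothetical names, not WebNN API, but they show how an operation can return a distinct new value while an implementation shares array data between tensors.

```javascript
// Toy "operand": an immutable value wrapping immutable array data.
function makeOperand(data) {
  return Object.freeze({ data: Object.freeze([...data]) });
}

// An op conceptually returns a distinct new value, never mutating inputs.
function addScalar(operand, s) {
  return makeOperand(operand.data.map((v) => v + s));
}

// An op that does not change element values (e.g. a reshape) can share
// the input's underlying storage instead of copying it.
function reshapeView(operand) {
  return Object.freeze({ data: operand.data });
}

const a = makeOperand([1, 2, 3]);
const b = addScalar(a, 1);  // a is left untouched
const c = reshapeView(a);   // c shares a's storage
```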
The -{{MLContext}}.{{MLContext/computeSync()}} method represents a way the execution of the graph is carried out immediately -on the calling thread, which must also be a worker thread, either on a CPU or GPU device. The execution +{{MLContext}}.{{MLContext/computeSync()}} method carries out the execution of the graph immediately +on the calling thread, which must also be a worker thread, on either a CPU or GPU device. The execution produces the results of the computation from all the inputs bound to the graph. The {{MLContext}}.{{MLContext/compute()}} method represents a way the execution of the graph is performed asynchronously -either on a parallel timeline in a separate worker thread for the CPU execution or on a GPU timeline in a GPU -command queue. This method returns immediately without blocking the calling thread while the actual execution is -offloaded to a different timeline. This type of execution is appropriate when the responsiveness of the calling -thread is critical to good user experience. The computation results will be placed at the bound outputs at the -time the operation is successfully completed on the offloaded timeline at which time the calling thread is +either on a parallel timeline in a separate worker thread for CPU execution, or on a GPU timeline in a GPU +command queue. This method returns immediately without blocking the calling thread while the actual execution is +offloaded to a different timeline. This type of execution is appropriate when the responsiveness of the calling +thread is critical to a good user experience. The computation results will be placed in the bound outputs once +the operation successfully completes on the offloaded timeline, at which time the calling thread is signaled. This type of execution supports both the CPU and GPU device. ￼
-In both the {{MLContext}}.{{MLContext/compute()}} and {{MLContext}}.{{MLContext/computeSync()}} execution methods, the caller supplies +In both the {{MLContext}}.{{MLContext/compute()}} and {{MLContext}}.{{MLContext/computeSync()}} execution methods, the caller supplies the input values using {{MLNamedArrayBufferViews}}, binding the input {{MLOperand}}s to their values. The caller then supplies pre-allocated buffers for output {{MLOperand}}s using {{MLNamedArrayBufferViews}}. -The {{MLCommandEncoder}} interface created by the {{MLContext}}.{{MLContext/createCommandEncoder()}} method supports -a graph execution method that provides the maximum flexibility to callers that also utilize WebGPU in their -application. It does this by placing the workload required to initialize and compute the results of the -operations in the graph onto a {{GPUCommandBuffer}}. The callers are responsible for the eventual submission -of this workload on the {{GPUQueue}} through the WebGPU queue submission mechanism. Once the submitted workload +The {{MLCommandEncoder}} interface created by the {{MLContext}}.{{MLContext/createCommandEncoder()}} method supports +a graph execution method that provides the maximum flexibility to callers that also utilize WebGPU in their +application. It does this by placing the workload required to initialize and compute the results of the +operations in the graph onto a {{GPUCommandBuffer}}. Callers are responsible for the eventual submission +of this workload on the {{GPUQueue}} through the WebGPU queue submission mechanism. Once the submitted workload is completely executed, the result is available in the bound output buffers. ￼
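The named input/output binding contract described above can be sketched in plain JavaScript. This is a hypothetical helper, not the actual `compute()` implementation: `computeWithBoundViews` and the stand-in graph `double` are illustrative names, but the shape of the contract (named pre-allocated typed-array views, results written into the caller's output buffers) mirrors the prose.

```javascript
// Hypothetical sketch of the compute() binding contract: inputs and outputs
// are maps of names to pre-allocated ArrayBuffer views, and results are
// written into the caller-supplied output buffers.
function computeWithBoundViews(graphFn, inputs, outputs) {
  // MLNamedArrayBufferViews requires each bound value to be a buffer view.
  for (const views of [inputs, outputs]) {
    for (const [name, view] of Object.entries(views)) {
      if (!ArrayBuffer.isView(view)) {
        throw new TypeError(`"${name}" is not an ArrayBuffer view`);
      }
    }
  }
  // "Execute" the graph, then copy results into the pre-allocated outputs.
  const results = graphFn(inputs);
  for (const [name, view] of Object.entries(outputs)) {
    view.set(results[name]);
  }
}

// A stand-in "graph" that doubles its single input.
const double = (inputs) => ({ y: inputs.x.map((v) => v * 2) });

const x = new Float32Array([1, 2, 3]);
const y = new Float32Array(3); // pre-allocated by the caller
computeWithBoundViews(double, { x }, { y }); // y now holds the results
```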
## Device Selection ## {#programming-model-device-selection} @@ -539,10 +539,10 @@ The following table summarizes the types of resource supported by the context cr API {#api} ===================== -## navigator.ml ## {#api-navigator-ml} +## The navigator.ml interface ## {#api-navigator-ml} A {{ML}} object is available in the {{Window}} and {{DedicatedWorkerGlobalScope}} contexts through the {{Navigator}} -and {{WorkerNavigator}} interfaces respectively and is exposed via `navigator.ml`: +and {{WorkerNavigator}} interfaces respectively and is exposed via `navigator.ml`. -## ML ## {#api-ml} +## The ML interface ## {#api-ml} +### Permissions Policy Integration ### {#permissions-policy-integration} + +This specification defines a policy-controlled feature identified by the +string "webnn". +Its default allowlist is 'self'. + +### The {{ML/createContext()}} method ### {#api-ml-createcontext} The {{ML/createContext()}} method steps are: -1. If [=this=]'s [=relevant global object=]'s [=associated Document=] is not [=allowed to use=] the [=webnn-feature|webnn=] feature, then throw a "{{SecurityError!!exception}}" {{DOMException}} and abort these steps. -1. Let |promise| be [=a new promise=]. +1. If [=this=]'s [=relevant global object=]'s [=associated Document=] is not [=allowed to use=] the [=webnn-feature|webnn=] feature, return [=a new promise=] [=rejected=] with a "{{SecurityError!!exception}}" {{DOMException}} and abort these steps. +1. Let |promise| be [=a new promise=], return |promise|, and run the following steps [=in parallel=]. 1. Let |context| be a new {{MLContext}} object. -1. Switch on the method's first argument: +1. Let |options| be the first argument. +1. Switch on |options|:
-
{{MLContextOptions}} +
{{MLContextOptions}} or {{undefined}}
Set |context|.{{[[contextType]]}} to [=default-context|default=]. -
Set |context|.{{[[deviceType]]}} to the value of {{MLContextOptions}}'s {{deviceType}}. -
Set |context|.{{[[powerPreference]]}} to the value of {{MLContextOptions}}'s {{powerPreference}}. - +
Set |context|.{{[[deviceType]]}} to the value of |options|'s {{deviceType}} or "[=device-type-cpu|cpu=]". +
Set |context|.{{[[powerPreference]]}} to the value of |options|'s {{powerPreference}} or "[=power-preference-default|default=]".
{{GPUDevice}}
Set |context|.{{[[contextType]]}} to [=webgpu-context|webgpu=].
Set |context|.{{[[deviceType]]}} to "[=device-type-gpu|gpu=]".
Set |context|.{{[[powerPreference]]}} to "[=power-preference-default|default=]".
-1. Issue the following steps to a separate timeline: - 1. If the User Agent can support the |context|.{{[[contextType]]}}, |context|.{{[[deviceType]]}} and |context|.{{[[powerPreference]]}}, then: - 1. Set |context|.{{MLContext/[[implementation]]}} to an implementation supporting |context|.{{[[contextType]]}}, |context|.{{[[deviceType]]}} and |context|.{{[[powerPreference]]}}. - 1. [=Resolve=] |promise| with |context|. - 1. Else: - 1. [=Resolve=] |promise| with a new {{NotSupportedError}}. -1. Return |promise|. - -### Permissions Policy Integration ### {#permissions-policy-integration} +1. If the user agent cannot support |context|.{{[[contextType]]}}, |context|.{{[[deviceType]]}} and |context|.{{[[powerPreference]]}}, [=reject=] |promise| with a "{{NotSupportedError}}" {{DOMException}} and abort these steps. +1. [=Resolve=] |promise| with |context|. -This specification defines a policy-controlled feature identified by the -string "webnn". -Its default allowlist is 'self'. - -## MLContext ## {#api-mlcontext} +## The MLContext interface ## {#api-mlcontext} The {{MLContext}} interface represents a global state of neural network compute workload and execution processes. Each {{MLContext}} object has associated [=context type=], [=device type=] and [=power preference=]. The context type is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph: @@ -660,9 +656,6 @@ interface MLContext {}; : \[[powerPreference]] of type [=power preference=] :: The {{MLContext}}'s [=power preference=]. - : \[[implementation]] :: The underlying implementation provided by the User Agent.
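The option-defaulting behavior in the {{ML/createContext()}} steps above can be sketched as a small plain-JavaScript helper. This is hypothetical (`normalizeContextOptions` is not part of the API): given {{MLContextOptions}} or `undefined`, the context type is "default" with `deviceType` falling back to "cpu" and `powerPreference` to "default"; given a {{GPUDevice}}, the context is "webgpu" on a "gpu" device.

```javascript
// Hypothetical sketch of createContext() option defaulting; the boolean
// second argument stands in for the "first argument is a GPUDevice" branch.
function normalizeContextOptions(options, isGpuDevice = false) {
  if (isGpuDevice) {
    return { contextType: "webgpu", deviceType: "gpu", powerPreference: "default" };
  }
  // MLContextOptions or undefined: apply the spec's fallback values.
  const { deviceType = "cpu", powerPreference = "default" } = options ?? {};
  return { contextType: "default", deviceType, powerPreference };
}
```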
@@ -778,7 +771,7 @@ partial interface MLContext { - *outputs*: an {{MLNamedArrayBufferViews}}. The pre-allocated resources of required outputs. **Returns:** Promise<{{undefined}}>. - + 1. If any of the following requirements are unmet, then throw a {{DataError}} {{DOMException}} and stop.
1. For each |key| -> |value| of |inputs|: @@ -830,7 +823,7 @@ partial interface MLContext { **Returns:** {{MLCommandEncoder}}. The command encoder used to record ML workload on the GPU.
-## MLOperandDescriptor ## {#api-mloperanddescriptor} +## The MLOperandDescriptor dictionary ## {#api-mloperanddescriptor}
**Arguments:** - - *input*: an {{MLOperand}}. The input 3-D tensor of shape [steps, batch_size, input_size]. + - *input*: an {{MLOperand}}. The input 3-D tensor of shape [steps, batch_size, input_size]. - *weight*: an {{MLOperand}}. The 3-D input weight tensor of shape [num_directions, 3 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the *layout* argument. - *recurrentWeight*: an {{MLOperand}}. The 3-D recurrent weight tensor of shape [num_directions, 3 * hidden_size, hidden_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the *layout* argument. - *steps*: a {{long}} scalar. The number of time steps in the recurrent network. The value must be greater than 0. @@ -1436,7 +1447,7 @@ partial interface MLGraphBuilder { cellWeight.push(builder.squeeze(builder.slice(weight, [slot, 0, 0], [1, -1, -1]), { axes: [0] })); cellRecurrentWeight.push(builder.squeeze(builder.slice(recurrentWeight, [slot, 0, 0], [1, -1, -1]), { axes: [0] })); cellBias.push(options.bias ? (builder.squeeze(builder.slice(options.bias, [slot, 0], [1, -1]), { axes: [0] })) : null); - cellRecurrentBias.push(options.recurrentBias ? + cellRecurrentBias.push(options.recurrentBias ? (builder.squeeze(builder.slice(options.recurrentBias, [slot, 0], [1, -1]), { axes: [0] })) : null); } @@ -1476,7 +1487,7 @@ partial interface MLGraphBuilder {
-### gruCell ### {#api-mlgraphbuilder-grucell} +### The gruCell() method ### {#api-mlgraphbuilder-grucell} A single time step of the Gated Recurrent Unit [[GRU]] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
**Arguments:** - - *input*: an {{MLOperand}}. The input 2-D tensor of shape [batch_size, input_size]. + - *input*: an {{MLOperand}}. The input 2-D tensor of shape [batch_size, input_size]. - *weight*: an {{MLOperand}}. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *layout* argument. - *recurrentWeight*: an {{MLOperand}}. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *layout* argument. - *hiddenState*: an {{MLOperand}}. The 2-D input hidden state tensor of shape [batch_size, hidden_size]. @@ -1518,12 +1529,12 @@ partial interface MLGraphBuilder { let z = builder.sigmoid( builder.add( builder.add( - (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero), + (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero), (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero) ), builder.add( builder.matmul( - input, + input, builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1])) ), builder.matmul( @@ -1543,11 +1554,11 @@ partial interface MLGraphBuilder { ), builder.add( builder.matmul( - input, + input, builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1])) ), builder.matmul( - hiddenState, + hiddenState, builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1])) ) ) @@ -1562,7 +1573,7 @@ partial interface MLGraphBuilder { (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero), builder.add( builder.matmul( - input, + input, builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1])) ), builder.mul( @@ -1570,7 +1581,7 @@ partial interface MLGraphBuilder { builder.add( (options.recurrentBias ? 
builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero), builder.matmul( - hiddenState, + hiddenState, builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1])) ) ) @@ -1588,7 +1599,7 @@ partial interface MLGraphBuilder { ), builder.add( builder.matmul( - input, + input, builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1])) ), builder.matmul( @@ -1606,7 +1617,7 @@ partial interface MLGraphBuilder {
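The arithmetic of a single GRU cell step, which the builder-based example above assembles out of `slice`, `matmul`, `sigmoid`, and `tanh` operations, can be condensed into a scalar (hiddenSize = 1, inputSize = 1) sketch. This assumes the common update-gate / reset-gate formulation with the reset gate applied to the recurrent contribution of the new gate; the real gruCell() operates on tensors and has layout and activation options, so this is only an illustration of the math.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Scalar GRU cell step. w.z/w.r/w.n: input weights; r.z/r.r/r.n: recurrent
// weights; b.z/b.r/b.n and b.rz/b.rr/b.rn: input and recurrent biases.
function gruCellScalar(x, h, w, r, b) {
  const z = sigmoid(w.z * x + r.z * h + b.z + b.rz);        // update gate
  const g = sigmoid(w.r * x + r.r * h + b.r + b.rr);        // reset gate
  const n = Math.tanh(w.n * x + b.n + g * (r.n * h + b.rn)); // new gate
  return (1 - z) * n + z * h;                                // next hidden state
}
```

With all weights and biases zero, both gates sit at sigmoid(0) = 0.5 and the new gate at tanh(0) = 0, so the next hidden state is simply half the previous one.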
-### hardSigmoid ### {#api-mlgraphbuilder-hard-sigmoid} +### The hardSigmoid() method ### {#api-mlgraphbuilder-hard-sigmoid} Calculate the non-smooth function used in place of a sigmoid function on the input tensor. @@ -1705,12 +1716,12 @@ partial interface MLGraphBuilder { - *bias*: an {{MLOperand}}. The 1-D tensor of the bias values whose length is equal to the size of the feature dimension of the input e.g. for the input tensor with *nchw* layout, the feature dimension is 1. - *epsilon*: a {{float}} scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified. - *layout*: an {{MLInputOperandLayout}}. This option specifies the layout format of the input. The default value is *"nchw"*. - + **Returns:** an {{MLOperand}}. The instance-normalized 4-D tensor of the same shape as the input tensor.
- The behavior of this operation when the input tensor is 4-D of the *"nchw"* layout can be generically emulated from - the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, + The behavior of this operation when the input tensor is 4-D of the *"nchw"* layout can be generically emulated + using other operations as follows. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint. ￼
     // The mean reductions happen over the spatial dimensions of the input
@@ -1719,7 +1730,7 @@ partial interface MLGraphBuilder {
     const mean = builder.reduceMean(input, reduceOptions);
     const variance = builder.reduceMean(
       builder.pow(
-        builder.sub(input, mean), 
+        builder.sub(input, mean),
-        buider.constant(2)),
+        builder.constant(2)),
       reduceOptions
       );
@@ -1733,7 +1744,7 @@ partial interface MLGraphBuilder {
         builder.div(
           builder.sub(input, mean),
-          buidler.pow(
+          builder.pow(
-            builder.add(variance, options.epsilon), 
+            builder.add(variance, options.epsilon),
             builder.constant(0.5))
           )
         ),
@@ -1743,7 +1754,7 @@ partial interface MLGraphBuilder {
     
-### leakyRelu ### {#api-mlgraphbuilder-leakyrelu} +### The leakyRelu() method ### {#api-mlgraphbuilder-leakyrelu} Calculate the leaky version of the rectified linear function on the input tensor element-wise. The calculation follows the expression `max(0, x) + alpha ∗ min(0, x)`.
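The leaky ReLU expression `max(0, x) + alpha ∗ min(0, x)` is straightforward to apply element-wise in plain JavaScript. The helper below is illustrative (not the builder API), assuming the conventional default *alpha* of 0.01.

```javascript
// Element-wise leaky ReLU: max(0, x) + alpha * min(0, x).
// alpha defaults to 0.01 (an assumed, conventional default).
function leakyRelu(values, alpha = 0.01) {
  return values.map((x) => Math.max(0, x) + alpha * Math.min(0, x));
}
```

Positive inputs pass through unchanged, while negative inputs are scaled by *alpha* instead of being clamped to zero.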