diff --git a/index.bs b/index.bs index 6016c0ab..4cf59897 100644 --- a/index.bs +++ b/index.bs @@ -318,19 +318,19 @@ video summarization such as [[Video-Summarization-with-LSTM]]. ### Noise Suppression ### {#usecase-noise-suppression} -A web-based video conferencing application records received audio streams, but -usually the background noise is everywhere. The application leverages real-time -noise suppression using Recurrent Neural Network such as [[RNNoise]] for -suppressing background dynamic noise like baby cry or dog barking to improve +A web-based video conferencing application records received audio streams, which +often contain pervasive background noise. The application leverages real-time +noise suppression using a Recurrent Neural Network such as [[RNNoise]] for +suppressing dynamic background noise, such as a baby crying or a dog barking, to improve the audio experience in video conferences. ### Detecting fake video ### {#usecase-detecting-fake-video} -A user is exposed to realistic fake videos generated by ‘deepfake’ on the web. -The fake video can swap the speaker’s face into the president’s face to incite -a user politically or to manipulate user’s opinion. The deepfake detection -applications such as [[FaceForensics++]] analyze the videos and protect a user against -the fake videos or images. When she watches a fake video on the web, the +A user is exposed to realistic fake videos generated with ‘deepfake’ techniques on the web. +A fake video can swap the speaker’s face with the president’s face to incite +the user politically or to manipulate the user’s opinion. Deepfake detection +applications such as [[FaceForensics++]] analyze the videos and protect the user against +fake videos or images. When she watches a fake video on the web, the detection application alerts her to the fraudulent video in real time. ## Framework Use Cases ## {#usecases-framework} @@ -472,7 +472,7 @@ during inference, as well as the output values of inference. 
At inference time, every {{MLOperand}} will be bound to a tensor (the actual data). The {{MLGraphBuilder}} interface enables the creation of {{MLOperand}}s. -A key part of the {{MLGraphBuilder}} interface are the operations (such as +A key part of the {{MLGraphBuilder}} interface is the set of operations (such as {{MLGraphBuilder}}.{{MLGraphBuilder/gemm()}} and {{MLGraphBuilder}}.{{MLGraphBuilder/softmax()}}). The operations have a functional semantics, with no side effects. Each operation invocation conceptually returns a distinct new value, without @@ -481,7 +481,7 @@ changing the value of any other {{MLOperand}}. The runtime values (of {{MLOperand}}s) are tensors, which are essentially multidimensional arrays. The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the -array data (such as its shape). +array data (such as its shape). As mentioned above, the operations have a functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation @@ -495,27 +495,27 @@ Before the execution, the computation graph that is used to compute one or more There are multiple ways by which the graph may be compiled. The {{MLGraphBuilder}}.{{MLGraphBuilder/build()}} method compiles the graph in the background without blocking the calling thread, and returns a {{Promise}} that resolves to an {{MLGraph}}. The {{MLGraphBuilder}}.{{MLGraphBuilder/buildSync()}} method compiles the graph immediately on the calling thread, which must be a worker thread running on CPU or GPU device, and returns an {{MLGraph}}. Both compilation methods produce an {{MLGraph}} that represents a compiled graph for optimal execution. Once the {{MLGraph}} is constructed, there are multiple ways by which the graph may be executed. 
The -{{MLContext}}.{{MLContext/computeSync()}} method represents a way the execution of the graph is carried out immediately -on the calling thread, which must also be a worker thread, either on a CPU or GPU device. The execution +{{MLContext}}.{{MLContext/computeSync()}} method carries out the execution of the graph immediately +on the calling thread, which must also be a worker thread, either on a CPU or GPU device. The execution produces the results of the computation from all the inputs bound to the graph. The {{MLContext}}.{{MLContext/compute()}} method performs the execution of the graph asynchronously -either on a parallel timeline in a separate worker thread for the CPU execution or on a GPU timeline in a GPU -command queue. This method returns immediately without blocking the calling thread while the actual execution is -offloaded to a different timeline. This type of execution is appropriate when the responsiveness of the calling -thread is critical to good user experience. The computation results will be placed at the bound outputs at the -time the operation is successfully completed on the offloaded timeline at which time the calling thread is +either on a parallel timeline in a separate worker thread for CPU execution or on a GPU timeline in a GPU +command queue. This method returns immediately without blocking the calling thread while the actual execution is +offloaded to a different timeline. This type of execution is appropriate when the responsiveness of the calling +thread is critical to a good user experience. The computation results will be placed in the bound outputs once +the operation completes successfully on the offloaded timeline, at which time the calling thread is signaled. This type of execution supports both CPU and GPU devices. 
-In both the {{MLContext}}.{{MLContext/compute()}} and {{MLContext}}.{{MLContext/computeSync()}} execution methods, the caller supplies +In both the {{MLContext}}.{{MLContext/compute()}} and {{MLContext}}.{{MLContext/computeSync()}} execution methods, the caller supplies the input values using {{MLNamedArrayBufferViews}}, binding the input {{MLOperand}}s to their values. The caller then supplies pre-allocated buffers for output {{MLOperand}}s using {{MLNamedArrayBufferViews}}. -The {{MLCommandEncoder}} interface created by the {{MLContext}}.{{MLContext/createCommandEncoder()}} method supports -a graph execution method that provides the maximum flexibility to callers that also utilize WebGPU in their -application. It does this by placing the workload required to initialize and compute the results of the -operations in the graph onto a {{GPUCommandBuffer}}. The callers are responsible for the eventual submission -of this workload on the {{GPUQueue}} through the WebGPU queue submission mechanism. Once the submitted workload +The {{MLCommandEncoder}} interface created by the {{MLContext}}.{{MLContext/createCommandEncoder()}} method supports +a graph execution method that provides maximum flexibility to callers that also use WebGPU in their +application. It does this by placing the workload required to initialize and compute the results of the +operations in the graph onto a {{GPUCommandBuffer}}. Callers are responsible for the eventual submission +of this workload on the {{GPUQueue}} through the WebGPU queue submission mechanism. Once the submitted workload is completely executed, the result is available in the bound output buffers. 
## Device Selection ## {#programming-model-device-selection} @@ -542,7 +542,7 @@ API {#api} ## navigator.ml ## {#api-navigator-ml} A {{ML}} object is available in the {{Window}} and {{DedicatedWorkerGlobalScope}} contexts through the {{Navigator}} -and {{WorkerNavigator}} interfaces respectively and is exposed via `navigator.ml`: +and {{WorkerNavigator}} interfaces, respectively, and is exposed via `navigator.ml`.
**Arguments:** - - *input*: an {{MLOperand}}. The input 3-D tensor of shape [steps, batch_size, input_size]. + - *input*: an {{MLOperand}}. The input 3-D tensor of shape [steps, batch_size, input_size]. - *weight*: an {{MLOperand}}. The 3-D input weight tensor of shape [num_directions, 3 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the *layout* argument. - *recurrentWeight*: an {{MLOperand}}. The 3-D recurrent weight tensor of shape [num_directions, 3 * hidden_size, hidden_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the *layout* argument. - *steps*: a {{long}} scalar. The number of time steps in the recurrent network. The value must be greater than 0. @@ -1436,7 +1436,7 @@ partial interface MLGraphBuilder { cellWeight.push(builder.squeeze(builder.slice(weight, [slot, 0, 0], [1, -1, -1]), { axes: [0] })); cellRecurrentWeight.push(builder.squeeze(builder.slice(recurrentWeight, [slot, 0, 0], [1, -1, -1]), { axes: [0] })); cellBias.push(options.bias ? (builder.squeeze(builder.slice(options.bias, [slot, 0], [1, -1]), { axes: [0] })) : null); - cellRecurrentBias.push(options.recurrentBias ? + cellRecurrentBias.push(options.recurrentBias ? (builder.squeeze(builder.slice(options.recurrentBias, [slot, 0], [1, -1]), { axes: [0] })) : null); } @@ -1488,13 +1488,13 @@ dictionary MLGruCellOptions { }; partial interface MLGraphBuilder { - MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight, + MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight, MLOperand hiddenState, long hiddenSize, optional MLGruCellOptions options = {}); };
**Arguments:** - - *input*: an {{MLOperand}}. The input 2-D tensor of shape [batch_size, input_size]. + - *input*: an {{MLOperand}}. The input 2-D tensor of shape [batch_size, input_size]. - *weight*: an {{MLOperand}}. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *layout* argument. - *recurrentWeight*: an {{MLOperand}}. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *layout* argument. - *hiddenState*: an {{MLOperand}}. The 2-D input hidden state tensor of shape [batch_size, hidden_size]. @@ -1518,12 +1518,12 @@ partial interface MLGraphBuilder { let z = builder.sigmoid( builder.add( builder.add( - (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero), + (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero), (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero) ), builder.add( builder.matmul( - input, + input, builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1])) ), builder.matmul( @@ -1543,11 +1543,11 @@ partial interface MLGraphBuilder { ), builder.add( builder.matmul( - input, + input, builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1])) ), builder.matmul( - hiddenState, + hiddenState, builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1])) ) ) @@ -1562,7 +1562,7 @@ partial interface MLGraphBuilder { (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero), builder.add( builder.matmul( - input, + input, builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1])) ), builder.mul( @@ -1570,7 +1570,7 @@ partial interface MLGraphBuilder { builder.add( (options.recurrentBias ? 
builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero), builder.matmul( - hiddenState, + hiddenState, builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1])) ) ) @@ -1588,7 +1588,7 @@ partial interface MLGraphBuilder { ), builder.add( builder.matmul( - input, + input, builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1])) ), builder.matmul( @@ -1626,7 +1626,7 @@ partial interface MLGraphBuilder { - *alpha*: a {{float}} scalar multiplier, default to 0.2. - *beta*: a {{float}} scalar addition, default to 0.5. - **Returns:** + **Returns:** - an {{MLOperand}}. The output tensor of the same shape as *x*. - an {{MLOperator}}. The operator representing the hard sigmoid operation. @@ -1640,7 +1640,7 @@ partial interface MLGraphBuilder { builder.min( builder.add( builder.mul(builder.constant(options.alpha), x), - builder.constant(options.beta)), + builder.constant(options.beta)), builder.constant(1)), builder.constant(0)); @@ -1693,7 +1693,7 @@ dictionary MLInstanceNormalizationOptions { }; partial interface MLGraphBuilder { - MLOperand instanceNormalization(MLOperand input, + MLOperand instanceNormalization(MLOperand input, optional MLInstanceNormalizationOptions options = {}); }; @@ -1705,12 +1705,12 @@ partial interface MLGraphBuilder { - *bias*: an {{MLOperand}}. The 1-D tensor of the bias values whose length is equal to the size of the feature dimension of the input e.g. for the input tensor with *nchw* layout, the feature dimension is 1. - *epsilon*: a {{float}} scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified. - *layout*: an {{MLInputOperandLayout}}. This option specifies the layout format of the input. The default value is *"nchw"*. - + **Returns:** an {{MLOperand}}. The instance-normalized 4-D tensor of the same shape as the input tensor.
- The behavior of this operation when the input tensor is 4-D of the *"nchw"* layout can be generically emulated from - the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, + The behavior of this operation when the input tensor is 4-D of the *"nchw"* layout can be generically emulated from + the usage of other operations as follows. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
     // The mean reductions happen over the spatial dimensions of the input
@@ -1719,7 +1719,7 @@ partial interface MLGraphBuilder {
     const mean = builder.reduceMean(input, reduceOptions);
     const variance = builder.reduceMean(
       builder.pow(
-        builder.sub(input, mean), 
+        builder.sub(input, mean),
         builder.constant(2)),
       reduceOptions
       );
@@ -1733,7 +1733,7 @@ partial interface MLGraphBuilder {
         builder.div(
           builder.sub(input, mean),
           builder.pow(
-            builder.add(variance, options.epsilon), 
+            builder.add(variance, options.epsilon),
             builder.constant(0.5))
           )
         ),
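The emulation above can be sanity-checked with ordinary JavaScript numbers. The following is an illustrative plain-JavaScript sketch of the per-channel math (not the WebNN API; `instanceNormalizeChannel` is a hypothetical helper): subtract the spatial mean, divide by the square root of the variance plus epsilon, then apply the scale and bias.

```javascript
// Per-channel instance normalization for one 2-D spatial slice.
// Mirrors the emulation above: (x - mean) / sqrt(variance + epsilon),
// then scaled and shifted. Purely illustrative; not the WebNN API.
function instanceNormalizeChannel(channel, scale = 1, bias = 0, epsilon = 1e-5) {
  const values = channel.flat();
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const inverseStd = 1 / Math.sqrt(variance + epsilon);
  return channel.map(row =>
    row.map(v => scale * (v - mean) * inverseStd + bias));
}
```

After normalization each channel has near-zero mean and near-unit variance, which is what the reduceMean/pow/div sequence above computes with builder operations.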
@@ -1841,7 +1841,7 @@ partial interface MLGraphBuilder {
     performance standpoint.
     
     return builder.add(
-              builder.mul(x, builder.constant(options.alpha)), 
+              builder.mul(x, builder.constant(options.alpha)),
               builder.constant(options.beta));
     
@@ -1947,8 +1947,8 @@ partial interface MLGraphBuilder { is interpreted according to the value of *options.layout*. - *options*: an optional {{MLPool2dOptions}}. The optional parameters of the operation. - *windowDimensions*: a sequence of {{long}} of length 2. The dimensions of the sliding window, - [window_height, window_width]. If not present, the window dimensions are assumed to be the height - and width dimensions of the input shape. + [window_height, window_width]. If not present, the window dimensions are assumed to be the height + and width dimensions of the input shape. - *padding*: a sequence of {{long}} of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of *input*, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0]. - *strides*: a sequence of {{long}} of length 2. The stride of the sliding window for each spatial dimension of *input*, @@ -2140,7 +2140,7 @@ partial interface MLGraphBuilder { return builder.div( builder.constant(1), builder.add( - builder.exp(builder.neg(x)), + builder.exp(builder.neg(x)), builder.constant(1)));
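The sigmoid emulation above computes the logistic function 1 / (1 + exp(-x)) element-wise. A plain-JavaScript scalar sketch (illustrative; not the WebNN API):

```javascript
// Logistic sigmoid for a single element, matching
// div(1, add(exp(neg(x)), 1)) above. Purely illustrative.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}
```

sigmoid(0) is exactly 0.5, and the output approaches 0 and 1 for large negative and positive inputs, respectively.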
@@ -2161,11 +2161,11 @@ partial interface MLGraphBuilder {
**Arguments:** - *input*: an {{MLOperand}}. The input tensor. - - *starts*: a sequence of {{long}}. The starting indices to slice of the corresponding axes of the input shape. A negative index value is interpreted as counting back from the end. For example, the value -1 + - *starts*: a sequence of {{long}}. The starting indices to slice of the corresponding axes of the input shape. A negative index value is interpreted as counting back from the end. For example, the value -1 refers to the last element of the given axis. - *sizes*: a sequence of {{long}}. The lengths to slice of the corresponding axes of the input shape. The length value of -1 selects all the remaining elements from the starting index of the given axis. - *options*: an optional {{MLSliceOptions}}. The optional parameters of the operation. - - *axes*: a sequence of {{long}}. The dimensions of the input shape to which *starts* and *sizes* apply. The values in the sequence are either within the [0, *r*-1] range where *r* is the input tensor rank, or the [*-r*, -1] range where negative values mean counting back from the end of the input shape. When not specified, the sequence is assumed to be [0,1,..*r-1*]. + - *axes*: a sequence of {{long}}. The dimensions of the input shape to which *starts* and *sizes* apply. The values in the sequence are either within the [0, *r*-1] range where *r* is the input tensor rank, or the [*-r*, -1] range where negative values mean counting back from the end of the input shape. When not specified, the sequence is assumed to be [0, 1, ... *r*-1]. **Returns:** an {{MLOperand}}. The output tensor of the same rank as the input tensor with tensor values stripped to the specified starting and ending indices in each dimension. 
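The index semantics above (a negative start counts back from the end; a size of -1 selects all remaining elements) can be sketched in plain JavaScript for a 1-D input. This is illustrative only; `slice1d` is a hypothetical helper, not the WebNN API:

```javascript
// 1-D slice with start/size semantics as described above.
// Purely illustrative; not the WebNN API.
function slice1d(input, start, size) {
  const begin = start < 0 ? input.length + start : start;
  const count = size === -1 ? input.length - begin : size;
  return input.slice(begin, begin + count);
}
```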
@@ -2372,7 +2372,7 @@ partial interface MLGraphBuilder { - *options*: an optional {{MLTransposeOptions}}. The optional parameters of the operation. - *permutation*: a sequence of {{long}} values. The values used to permute the output shape. When it's not specified, it's set to `[N-1...0]`, where `N` is the rank of the input tensor. These default values cause the output to become a transposed tensor of the input. When specified, the number of values in the sequence must be the same as the rank of the input tensor, and the values in the sequence must be within the range from 0 to N-1 with no two or more same values found in the sequence. - **Returns:** an {{MLOperand}}. The permuted or transposed N-D tensor. + **Returns:** an {{MLOperand}}. The permuted or transposed N-D tensor.
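The default permutation described above reverses the dimension order, which for a 2-D tensor is the ordinary matrix transpose. A plain-JavaScript sketch of the 2-D case (illustrative; `transpose2d` is a hypothetical helper, not the WebNN API):

```javascript
// 2-D transpose: the default permutation [1, 0] for a rank-2 tensor.
// Purely illustrative; not the WebNN API.
function transpose2d(matrix) {
  return matrix[0].map((_, col) => matrix.map(row => row[col]));
}
```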
## MLGraph ## {#api-mlgraph} @@ -2463,7 +2463,7 @@ partial interface MLCommandEncoder { - *outputs*: an {{MLNamedGPUResources}}. The pre-allocated resources of required outputs. **Returns:** {{undefined}}. - + 1. If any of the following requirements are unmet, then throw a {{DataError}} {{DOMException}} and stop.
1. For each |key| -> |value| of |inputs|: @@ -2921,6 +2921,6 @@ Thanks to Kaustubha Govind and Chrome privacy reviewers for feedback and privacy "authors": [ "Mike West" ] - } + } }