From b2d414125fababb3bdf3c3c5151f0364d06ecc8d Mon Sep 17 00:00:00 2001
From: Sungpil Shin <57691044+spshin3@users.noreply.github.com>
Date: Mon, 30 Aug 2021 14:48:11 +0900
Subject: [PATCH 1/5] Adding a new use case for 'Framework Use Cases'

- Adding a new use case of neural network deployment for 'Framework Use Cases'
---
 index.bs | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/index.bs b/index.bs
index 0f9aeaec..622f77c7 100644
--- a/index.bs
+++ b/index.bs
@@ -369,6 +369,19 @@ so that the application loads the tiny model in the case of CPU-only devices.
 A JavaScript ML framework is responsible for loading, interpreting and executing a ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on the hardware device, like CPU, GPU or ML accelerator. To avoid the unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute intensive operation, such as convolution 2D or matrix multiplication, the framework uses WebNN API to execute it with the ML-specific acceleration available on that selected device.
 
+### Neural Network Deployment ### {#usecase-neural-net-deploy}
+An application developer wants to develop a DNN model which outperforms in
+Natural Language Processing such as [[GPT-3]] on the web. The model has pre-trained
+with an enormous and diverse data, and she wants to deploy this DNN model in
+various devices environments with efficiency.
To address this issue, she develops
+a framework using WebNN API for interoperability, so that the DNN model can
+select the best options among CPU, GPU, or ML accelerators by checking
+automatically the hardware capabilities of devices
+
+After she releases the DNN model, her DNN model are applied to various devices
+from server-level devices and mobile devices and be used in AI applications
+such as language translation, chatbots, random writing, and others.
+
 Security Considerations {#security}
 ===================================
@@ -2586,6 +2599,44 @@ Benjamin Poulain for their contributions to the API specification.
       "Hartwig Adam"
     ],
     "date": "November 2019"
+  },
+  "GPT-3": {
+    "href": "https://arxiv.org/abs/2005.14165",
+    "title": "Language Models are Few-Shot Learners",
+    "authors": [
+      "Tom B. Brown",
+      "Benjamin Mann",
+      "Nick Ryder",
+      "Melanie Subbiah",
+      "Jared Kaplan",
+      "Prafulla Dhariwal",
+      "Arvind Neelakantan",
+      "Pranav Shyam",
+      "Girish Sastry",
+      "Amanda Askell",
+      "Sandhini Agarwal",
+      "Ariel Herbert-Voss",
+      "Gretchen Krueger",
+      "Tom Henighan",
+      "Rewon Child",
+      "Aditya Ramesh",
+      "Daniel M. Ziegler",
+      "Jeffrey Wu",
+      "Clemens Winter",
+      "Christopher Hesse",
+      "Mark Chen",
+      "Eric Sigler",
+      "Mateusz Litwin",
+      "Scott Gray",
+      "Benjamin Chess",
+      "Jack Clark",
+      "Christopher Berner",
+      "Sam McCandlish",
+      "Alec Radford",
+      "Ilya Sutskever",
+      "Dario Amodei"
+    ],
+    "date": "July 2020"
   }
 }

From 756e85ba26dd03fdee81ca4e786adf9095f43a4b Mon Sep 17 00:00:00 2001
From: Sungpil Shin <57691044+spshin3@users.noreply.github.com>
Date: Tue, 19 Oct 2021 11:21:58 +0900
Subject: [PATCH 2/5] Revised use case

This change reflects the previous comments from @wchao1115 and @anssiko.
---
 index.bs | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/index.bs b/index.bs
index 622f77c7..cc37e44a 100644
--- a/index.bs
+++ b/index.bs
@@ -372,15 +372,23 @@ A JavaScript ML framework is responsible for loading, interpreting and executing
 ### Neural Network Deployment ### {#usecase-neural-net-deploy}
 An application developer wants to develop a DNN model which outperforms in
 Natural Language Processing such as [[GPT-3]] on the web. The model has pre-trained
-with an enormous and diverse data, and she wants to deploy this DNN model in
-various devices environments with efficiency. To address this issue, she develops
-a framework using WebNN API for interoperability, so that the DNN model can
-select the best options among CPU, GPU, or ML accelerators by checking
-automatically the hardware capabilities of devices
-
-After she releases the DNN model, her DNN model are applied to various devices
-from server-level devices and mobile devices and be used in AI applications
-such as language translation, chatbots, random writing, and others.
+with an enormous and diverse data, and finally requires transfer learning with user data for
+fine-tuning of the model.
+
+She wants to deploy this DNN model in various devices environments including mobile devices.
+She found that there is a high user demand to train the DNN model without interfering user’s
+mobile device usage during the day.
+
+To address this issue, she develops a framework using WebNN API for better user-experience, so that
+the framework can select the best options among CPU, GPU, or ML accelerators based on user’s
+preference for training.
+
+The framework she designed can provide low work priority mode and charging-only mode during
+the process of learning. When user selects the low work priority mode, the framework
+automatically switches the hardware for training without interfering user’s device use by
+monitoring resource usage.
Also, when user selects the charging-only mode, the framework
+limits the learning to only when the battery of user device is charging.
+
 Security Considerations {#security}
 ===================================

From e5b6e9ce06cc58b483142ba789fe96ba2aab5907 Mon Sep 17 00:00:00 2001
From: Sungpil Shin <57691044+spshin3@users.noreply.github.com>
Date: Wed, 15 Dec 2021 14:46:57 +0900
Subject: [PATCH 3/5] Updated framework use case

Updated a use case with considerations of hardware selection
---
 index.bs | 69 ++++++++++----------------------------------------------
 1 file changed, 12 insertions(+), 57 deletions(-)

diff --git a/index.bs b/index.bs
index cc37e44a..23b5a4a7 100644
--- a/index.bs
+++ b/index.bs
@@ -313,7 +313,7 @@ noise suppression using Recurrent Neural Network such as [[RNNoise]] for
 suppressing background dynamic noise like baby cry or dog barking to improve
 audio experiences in video conferences.
 
-### Detecting fake video ### {#usecase-detecting-fake-video}
+### Detecting Fake Video ### {#usecase-detecting-fake-video}
 
 A user is exposed to realistic fake videos generated by ‘deepfake’ on the
 web. The fake video can swap the speaker’s face into the president’s face to incite
@@ -369,25 +369,18 @@ so that the application loads the tiny model in the case of CPU-only devices.
 A JavaScript ML framework is responsible for loading, interpreting and executing a ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on the hardware device, like CPU, GPU or ML accelerator. To avoid the unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute intensive operation, such as convolution 2D or matrix multiplication, the framework uses WebNN API to execute it with the ML-specific acceleration available on that selected device.
-### Neural Network Deployment ### {#usecase-neural-net-deploy}
-An application developer wants to develop a DNN model which outperforms in
-Natural Language Processing such as [[GPT-3]] on the web. The model has pre-trained
-with an enormous and diverse data, and finally requires transfer learning with user data for
-fine-tuning of the model.
+### Performance Benchmark ### {#usecase-perf-bench}
+Unlike what the web application developer has expected, she finds that the inference time of
+DNN model through the ML-specific acceleration is executed slower than the CPU or GPU in a
+resource-limited environment such as mobile. Before deploying the DNN model, she wants to
+provide the optimal inference time according to the user's hardware environment. To address
+this issue, she activates WebNN API to benchmark and profile the DNN inference time performance
+of each type of CPU, GPU, and MP-specific acceleration in different devices.
 
-She wants to deploy this DNN model in various devices environments including mobile devices.
-She found that there is a high user demand to train the DNN model without interfering user’s
-mobile device usage during the day.
-
-To address this issue, she develops a framework using WebNN API for better user-experience, so that
-the framework can select the best options among CPU, GPU, or ML accelerators based on user’s
-preference for training.
-
-The framework she designed can provide low work priority mode and charging-only mode during
-the process of learning. When user selects the low work priority mode, the framework
-automatically switches the hardware for training without interfering user’s device use by
-monitoring resource usage. Also, when user selects the charging-only mode, the framework
-limits the learning to only when the battery of user device is charging.
+She confirms that her DNN model performs better on CPU and GPU than ML-specific acceleration
+in mobile environment, and there is an acceptable performance gap of a few tens of milliseconds
+between the CPU and GPU. She finally releases the cost-effective web application using CPU and
+the performance-effective web application using GPU based on the benchmark results.
 
 Security Considerations {#security}
 ===================================
@@ -2607,44 +2600,6 @@ Benjamin Poulain for their contributions to the API specification.
       "Hartwig Adam"
     ],
     "date": "November 2019"
-  },
-  "GPT-3": {
-    "href": "https://arxiv.org/abs/2005.14165",
-    "title": "Language Models are Few-Shot Learners",
-    "authors": [
-      "Tom B. Brown",
-      "Benjamin Mann",
-      "Nick Ryder",
-      "Melanie Subbiah",
-      "Jared Kaplan",
-      "Prafulla Dhariwal",
-      "Arvind Neelakantan",
-      "Pranav Shyam",
-      "Girish Sastry",
-      "Amanda Askell",
-      "Sandhini Agarwal",
-      "Ariel Herbert-Voss",
-      "Gretchen Krueger",
-      "Tom Henighan",
-      "Rewon Child",
-      "Aditya Ramesh",
-      "Daniel M. Ziegler",
-      "Jeffrey Wu",
-      "Clemens Winter",
-      "Christopher Hesse",
-      "Mark Chen",
-      "Eric Sigler",
-      "Mateusz Litwin",
-      "Scott Gray",
-      "Benjamin Chess",
-      "Jack Clark",
-      "Christopher Berner",
-      "Sam McCandlish",
-      "Alec Radford",
-      "Ilya Sutskever",
-      "Dario Amodei"
-    ],
-    "date": "July 2020"
   }
 }

From 437de95333fe8a82df3409d758bd12c5f999569b Mon Sep 17 00:00:00 2001
From: Sungpil Shin <57691044+spshin3@users.noreply.github.com>
Date: Wed, 15 Dec 2021 14:54:13 +0900
Subject: [PATCH 4/5] Update index.bs

---
 index.bs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/index.bs b/index.bs
index 23b5a4a7..8f03549e 100644
--- a/index.bs
+++ b/index.bs
@@ -370,9 +370,9 @@ so that the application loads the tiny model in the case of CPU-only devices.
 A JavaScript ML framework is responsible for loading, interpreting and executing a ML model.
During the model execution phase, the framework iterates through the operations of the model and executes each operation on the hardware device, like CPU, GPU or ML accelerator. To avoid the unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute intensive operation, such as convolution 2D or matrix multiplication, the framework uses WebNN API to execute it with the ML-specific acceleration available on that selected device.
 
 ### Performance Benchmark ### {#usecase-perf-bench}
-Unlike what the web application developer has expected, she finds that the inference time of
+Unlike what the web application developer has expected, she finds out that the inference time of
 DNN model through the ML-specific acceleration is executed slower than the CPU or GPU in a
-resource-limited environment such as mobile. Before deploying the DNN model, she wants to
+resource-limited environment such as mobile device. Before deploying the DNN model, she wants to
 provide the optimal inference time according to the user's hardware environment. To address
 this issue, she activates WebNN API to benchmark and profile the DNN inference time performance
 of each type of CPU, GPU, and MP-specific acceleration in different devices.

From 7b49df4b4dadbbd463e498c4e713dce917b6b84a Mon Sep 17 00:00:00 2001
From: Sungpil Shin <57691044+spshin3@users.noreply.github.com>
Date: Thu, 17 Feb 2022 14:43:25 +0900
Subject: [PATCH 5/5] Revised 'Performance Adaptation' use case

This is the updated version of the use case that we discussed.
---
 index.bs | 41 ++++++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/index.bs b/index.bs
index 8f03549e..ddb5c667 100644
--- a/index.bs
+++ b/index.bs
@@ -355,34 +355,29 @@ fully-connected layers with it.
 
 ### Performance Adaptation ### {#usecase-perf-adapt}
 
-A web application developer has a concern about performance of her DNN model on
-mobile devices.
She has confirmed that it may run too slow on mobile devices
-which do not have GPU acceleration. To address this issue, her web application
-refers to the WebNN API to confirm whether acceleration is available or not, so
-that the application can display the warning for devices without acceleration.
-
-After several weeks, she has developed a tiny DNN model that can even run on
-CPU. In order to accommodate CPU execution, she modifies the application
-so that the application loads the tiny model in the case of CPU-only devices.
+A web application developer has a concern about the performance of her DNN model.
+The model needs to run both on a mobile device with a low-power CPU and
+on a laptop with a powerful CPU, GPU and a dedicated AI accelerator.
+
+She has confirmed that the model may run too slowly on the mobile device, which does
+not have GPU acceleration. To address this issue, her web application refers to
+the WebNN API to confirm whether acceleration is available or not, so that the
+application can display a warning for devices without acceleration. After several
+weeks, she has developed a tiny DNN model that can even run on a CPU. In order to
+accommodate CPU execution, she modifies the application so that it
+loads the tiny model in the case of CPU-only devices.
+
+When executing the DNN model on a laptop with a more powerful CPU, GPU and a
+dedicated AI accelerator, she wants to use the execution device that minimizes
+the inference time. To address this issue, she runs the model on each execution
+device and measures the inference time for each test run. This information helps
+her release a web application that provides the best possible user experience
+given available hardware.
 
 ### Operation Level Execution ### {#usecase-op-level-exec}
 
 A JavaScript ML framework is responsible for loading, interpreting and executing a ML model.
During the model execution phase, the framework iterates through the operations of the model and executes each operation on the hardware device, like CPU, GPU or ML accelerator. To avoid the unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute intensive operation, such as convolution 2D or matrix multiplication, the framework uses WebNN API to execute it with the ML-specific acceleration available on that selected device.
 
-### Performance Benchmark ### {#usecase-perf-bench}
-Unlike what the web application developer has expected, she finds out that the inference time of
-DNN model through the ML-specific acceleration is executed slower than the CPU or GPU in a
-resource-limited environment such as mobile device. Before deploying the DNN model, she wants to
-provide the optimal inference time according to the user's hardware environment. To address
-this issue, she activates WebNN API to benchmark and profile the DNN inference time performance
-of each type of CPU, GPU, and MP-specific acceleration in different devices.
-
-She confirms that her DNN model performs better on CPU and GPU than ML-specific acceleration
-in mobile environment, and there is an acceptable performance gap of a few tens of milliseconds
-between the CPU and GPU. She finally releases the cost-effective web application using CPU and
-the performance-effective web application using GPU based on the benchmark results.
-
-
 Security Considerations {#security}
 ===================================
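The "Performance Adaptation" use case in the final patch (run the model on each execution device, measure inference time, pick the best) can be sketched as below. This is an editorial illustration, not part of the patch series: `navigator.ml.createContext({ deviceType })` follows the API shape of the WebNN spec these patches edit, while `runModel`, `benchmarkDevices`, and `pickFastestDevice` are hypothetical names; `runModel` stands in for building and computing the developer's own graph.

```javascript
// Pure helper: given { device: milliseconds } timings, return the device
// with the smallest measured inference time (null if nothing was measured).
function pickFastestDevice(timings) {
  let best = null;
  for (const [device, ms] of Object.entries(timings)) {
    if (best === null || ms < best.ms) best = { device, ms };
  }
  return best && best.device;
}

// Time one model run per candidate device type; device types that cannot be
// instantiated on this hardware are simply skipped. The deviceType values
// assume the hint names discussed for WebNN contexts.
async function benchmarkDevices(runModel, deviceTypes = ['cpu', 'gpu', 'npu']) {
  const timings = {};
  for (const deviceType of deviceTypes) {
    try {
      const context = await navigator.ml.createContext({ deviceType });
      const start = performance.now();
      await runModel(context);
      timings[deviceType] = performance.now() - start;
    } catch (e) {
      // WebNN unavailable or deviceType unsupported: leave it out.
    }
  }
  return timings;
}
```

For example, `pickFastestDevice({ cpu: 42.0, gpu: 13.5 })` returns `'gpu'`, so the developer would ship the GPU path on that hardware.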