Add 16A8W quantization configuration utility for ARM backend #13898
Ninja91 wants to merge 4 commits into gh/Ninja91/17/base from
Conversation
This diff implements a 16A8W (16-bit activations, 8-bit weights) quantization configuration utility for the ExecuTorch ARM backend, following the feedback from D79746479.

## Key Changes

**1. New Quantization Configuration Function**
- Add `get_16a8w_quantization_config()` in `fbcode/executorch/backends/arm/quantizer/arm_quantizer.py`
- Provides 16-bit activations with HistogramObserver (better precision than 8A8W)
- Maintains 8-bit weights with MinMaxObserver/PerChannelMinMaxObserver (memory efficient)
- **Technically supported by TOSA through the [EXT-INT16 extension/profile](https://www.mlplatform.org/tosa/tosa_spec.html#_conv2d)**

## Benefits

- **Better Precision**: 16-bit activations provide higher precision than 8-bit activations, which helps preserve precision in recurrent neural networks.

ghstack-source-id: 305991462
@exported-using-ghexport
@bypass-github-export-checks
@bypass-github-pytorch-ci-checks
@bypass-github-executorch-ci-checks

Differential Revision: [D81550512](https://our.internmc.facebook.com/intern/diff/D81550512/)

[ghstack-poisoned]
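To make the intent concrete, below is a minimal, hypothetical sketch of what a 16A8W config helper along these lines could look like, built on the stock PyTorch PT2E observer and `QuantizationSpec` APIs. The function name `sketch_16a8w_quantization_config`, the quantization ranges, and the plain-dict return value are illustrative assumptions, not the actual `get_16a8w_quantization_config()` added in `arm_quantizer.py`.

```python
# Hypothetical sketch of a 16A8W quantization config; names, ranges, and the
# returned container are illustrative assumptions, not the PR's implementation.
import torch
from torch.ao.quantization.observer import (
    HistogramObserver,
    MinMaxObserver,
    PerChannelMinMaxObserver,
)
from torch.ao.quantization.quantizer import QuantizationSpec


def sketch_16a8w_quantization_config(is_per_channel: bool = True):
    # 16-bit activations observed with HistogramObserver, which gives a
    # better range estimate than the plain min/max used in the 8A8W default.
    act_spec = QuantizationSpec(
        dtype=torch.int16,
        quant_min=-32768,  # illustrative full int16 range
        quant_max=32767,
        qscheme=torch.per_tensor_affine,
        is_dynamic=False,
        observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
    )

    # 8-bit weights, per-channel where supported, to keep the memory cost low.
    weight_observer = PerChannelMinMaxObserver if is_per_channel else MinMaxObserver
    weight_qscheme = (
        torch.per_channel_symmetric if is_per_channel else torch.per_tensor_symmetric
    )
    weight_spec = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-127,
        quant_max=127,
        qscheme=weight_qscheme,
        ch_axis=0,
        is_dynamic=False,
        observer_or_fake_quant_ctr=weight_observer.with_args(eps=2**-12),
    )

    # Returned as a plain dict here for illustration; the ARM quantizer
    # presumably wraps these specs in its own QuantizationConfig dataclass.
    return {
        "input_activation": act_spec,
        "output_activation": act_spec,
        "weight": weight_spec,
        "bias": None,
    }
```

In practice a config like this would be handed to the ARM/TOSA quantizer through its global or per-module config setters and then run through the usual `prepare_pt2e` / calibrate / `convert_pt2e` flow; the exact wiring is backend-specific and not shown here.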
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13898
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit 878f63f with merge base ae07cb6. NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D81550512
Hi @Ninja91, please add
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as stale.