Add 16A8W quantization configuration utility for ARM backend by Ninja91 · Pull Request #13898 · pytorch/executorch

Ninja91 · 2025-09-03T05:25:09Z

Stack from ghstack (oldest at bottom):

This diff implements a 16A8W (16-bit activations, 8-bit weights) quantization configuration utility for the ExecuTorch ARM backend, following the feedback from D79746479.

Key Changes

1. New Quantization Configuration Function

- Add `get_16a8w_quantization_config()` in `fbcode/executorch/backends/arm/quantizer/arm_quantizer.py`
- Provides 16-bit activations with HistogramObserver (better precision than 8A8W)
- Maintains 8-bit weights with MinMaxObserver/PerChannelMinMaxObserver (memory efficient)
- Technically supported by TOSA through the [EXT-INT16 extension/profile](https://www.mlplatform.org/tosa/tosa_spec.html#_conv2d)
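
As a rough illustration of the layout described above, the following sketch models a 16A8W configuration with plain Python dataclasses. It is not the ExecuTorch or torch.ao API: `QuantSpecSketch` and `sketch_16a8w_config` are hypothetical names, the observers are represented only by their names as strings, and the full signed integer range is an assumption (the real configuration may narrow it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantSpecSketch:
    """Hypothetical stand-in for a quantization spec; not the ExecuTorch API."""

    bits: int
    observer: str  # observer class name only, for illustration
    per_channel: bool = False

    @property
    def quant_min(self) -> int:
        # Assumption: full signed range for the given bit width.
        return -(2 ** (self.bits - 1))

    @property
    def quant_max(self) -> int:
        return 2 ** (self.bits - 1) - 1


def sketch_16a8w_config(per_channel_weights: bool = True) -> dict:
    """Illustrative 16A8W layout: 16-bit activations, 8-bit weights."""
    weight_observer = (
        "PerChannelMinMaxObserver" if per_channel_weights else "MinMaxObserver"
    )
    return {
        "activation": QuantSpecSketch(bits=16, observer="HistogramObserver"),
        "weight": QuantSpecSketch(
            bits=8, observer=weight_observer, per_channel=per_channel_weights
        ),
    }


config = sketch_16a8w_config()
print(config["activation"].quant_min, config["activation"].quant_max)
print(config["weight"].observer)
```

The point of the sketch is only the split: activations get the wider 16-bit range, while weights stay at 8 bits so the stored model size matches an 8A8W model.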

Benefits

Better Precision: 16-bit activations provide higher precision than 8-bit, which helps preserve precision in recurrent neural networks.
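
To make the precision claim concrete, here is a minimal sketch that quantizes and dequantizes a few sample values with a symmetric per-tensor scheme at 8 and 16 bits and compares the worst-case round-trip error. The helper `quantize_dequantize`, the sample values, and the assumed activation range of [-1, 1] are illustrative choices, not taken from the PR.

```python
def quantize_dequantize(x: float, max_abs: float, bits: int) -> float:
    """Symmetric per-tensor fake quantization of a single value."""
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax            # step size between representable values
    q = round(x / scale)              # nearest integer level
    q = max(-qmax, min(qmax, q))      # clamp to the symmetric range
    return q * scale


# Hypothetical activation samples, assumed to lie in [-1, 1].
values = [0.001, 0.0123, 0.5, -0.731, 0.999]
max_abs = 1.0


def worst_error(bits: int) -> float:
    """Largest absolute round-trip error over the sample values."""
    return max(abs(v - quantize_dequantize(v, max_abs, bits)) for v in values)


err8 = worst_error(8)
err16 = worst_error(16)
print(f"8-bit worst error:  {err8:.6f}")
print(f"16-bit worst error: {err16:.6f}")
```

The rounding error is bounded by half a quantization step, so moving from 8 to 16 bits shrinks the bound from roughly 1/254 to roughly 1/65534 of the range. In a recurrent model the same activation path is applied repeatedly, which is one reason a smaller per-step error can matter.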
@exported-using-ghexport

@bypass-github-export-checks
@bypass-github-pytorch-ci-checks
@bypass-github-executorch-ci-checks

Differential Revision: D81550512

ghstack-source-id: 305991462
Differential Revision: [D81550512](https://our.internmc.facebook.com/intern/diff/D81550512/)
[ghstack-poisoned]

pytorch-bot · 2025-09-03T05:25:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13898

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 878f63f with merge base ae07cb6:

NEW FAILURES - The following jobs have failed:

- Build documentation / build (buck2) / Build doc (gh)
  At least one of the pre-conditions you specified did not hold
- Lint / lintrunner / linux-job (gh)
  >>> Lint for backends/arm/quantizer/arm_quantizer.py:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-09-03T05:25:24Z

This pull request was exported from Phabricator. Differential Revision: D81550512


Pull Request resolved: #13898
ghstack-source-id: 307143911
@exported-using-ghexport
Differential Revision: [D81550512](https://our.internmc.facebook.com/intern/diff/D81550512/)

facebook-github-bot · 2025-09-03T05:58:07Z

This pull request was exported from Phabricator. Differential Revision: D81550512


facebook-github-bot · 2025-09-03T06:41:38Z

This pull request was exported from Phabricator. Differential Revision: D81550512


facebook-github-bot · 2025-09-04T14:48:49Z

This pull request was exported from Phabricator. Differential Revision: D81550512

jackzhxng · 2025-09-04T18:59:31Z

Hi @Ninja91, please add the `release notes: arm` label to these PRs so we can call out your work in our next release notes!

github-actions · 2025-11-05T00:51:14Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

Ninja91 requested a review from digantdesai as a code owner September 3, 2025 05:25

meta-cla bot added the CLA Signed label Sep 3, 2025 (label description: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

Ninja91 mentioned this pull request Sep 3, 2025

Add 16A8W linear ops support and test #13899

Merged

facebook-github-bot added the fb-exported label Sep 3, 2025

shoumikhin approved these changes Sep 3, 2025


Ninja91 added the release notes: arm label Sep 5, 2025 (label description: Changes to the ARM backend delegate)

github-actions bot added the stale label Nov 5, 2025 (label description: PRs inactive for over 60 days)

Ninja91 closed this Nov 24, 2025

Ninja91 had a problem deploying to cherry-pick-bot November 24, 2025 16:10 — with GitHub Actions Failure

Ninja91 wants to merge 4 commits into gh/Ninja91/17/base from gh/Ninja91/17/head


4 participants
