Clean up linear_int8_dynamic_activation_intx_weight_subclass #1553
facebook-github-bot merged 1 commit into pytorch:main
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1553
✅ No failures as of commit f04f3b2 with merge base de5c6e1 (Dr. CI)
This pull request was exported from Phabricator. Differential Revision: D67821939
quantize_(
    my_model,
    int8_dynamic_activation_intx_weight(
        bit_width=4,
This could be updated to use a real dtype now if you can use torch nightly; both torch.uintx and torch.intx are available in nightly (2.6 and later, I think).
Changed bit_width to weight_dtype
Please update the README as well.
Summary:
Pull Request resolved: pytorch#1553

Cleans up layout and quantization API:

```
int8_dynamic_activation_intx_weight(
    group_size: int = 128,
    bit_width: int = 4,
    has_weight_zeros: bool = False,
    weight_mapping_type=MappingType.ASYMMETRIC,
    act_mapping_type=MappingType.ASYMMETRIC,
    layout=PackedLinearInt8DynamicActivationIntxWeightLayout(),
)
```

int8_dynamic_activation_intx_weight is now very similar to int8_dynamic_activation_int4_weight. By passing bit_width=4, has_weight_zeros=False, and layout=PlainLayout(), it should be numerically identical (but slower). The fallback option is removed; it is replaced by using PlainLayout().

Reviewed By: jerryzh168
Differential Revision: D67821939
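The parameters above (bit_width, has_weight_zeros, MappingType.ASYMMETRIC) correspond to standard affine quantization of each weight group. A pure-Python sketch of what quantizing one group looks like under these settings; this is illustrative only, and torchao's actual quant primitives operate on tensors and handle more edge cases:

```python
def quantize_group(values, bit_width, has_weight_zeros):
    """Quantize one group of float weights to signed bit_width-bit integers.

    Returns (quantized values, scale, zero_point).
    """
    qmin = -(1 << (bit_width - 1))
    qmax = (1 << (bit_width - 1)) - 1
    vmin, vmax = min(values), max(values)
    if has_weight_zeros:
        # Asymmetric: map [vmin, vmax] onto [qmin, qmax] with a zero point.
        scale = max((vmax - vmin) / (qmax - qmin), 1e-9)
        zero_point = round(qmin - vmin / scale)
    else:
        # Symmetric: zero point fixed at 0, scale from the largest magnitude.
        scale = max(max(abs(vmin), abs(vmax)) / qmax, 1e-9)
        zero_point = 0
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

# e.g. quantize_group([0.0, 1.0], 4, True)[0] -> [-8, 7]
```

With bit_width=4 and has_weight_zeros=False this is the symmetric int4 scheme that the summary says should match int8_dynamic_activation_int4_weight numerically.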
Force pushes: 8752c4c → 23688a9 → 8d47b97 → 0552dcf → dc32105
+ " Alternatively, use layout=PlainLayout() with int8_dynamic_activation_intx_weight, but note that doing so will result in much slower performance."
)

dtype_to_bit_width = {
Why not pass this to the layout as well?
I think for the layout, bit_width is more convenient to use because I can then do something like this to call the kernel:
getattr(torch.ops.torchao, f"_pack_8bit_act_{layout.bit_width}bit{wzp_suffix}_weight")(*args)
I can change, though, if you think it's better.
I see. It depends on whether there could be uintx cases as well; if not, then it's fine, since it's not user facing.
Let's leave it as bit_width then. The quantizer is specifically for intx, not uintx, and the layout is not user facing.
Sounds good. Also, we have a util here: ao/torchao/quantization/quant_primitives.py (line 179 in b3deb16).
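The thread above references a dtype_to_bit_width mapping and a name-based kernel lookup via getattr. A minimal sketch of both, keyed by dtype name so it runs without torch (the real mapping uses torch dtypes such as torch.int4, and the "0zp" suffix here is a hypothetical stand-in for the actual wzp_suffix value):

```python
# Map signed sub-byte dtype names to bit widths. Per the discussion above,
# torch.int1 .. torch.int8 are available in torch nightly (2.6 and later).
dtype_to_bit_width = {f"int{b}": b for b in range(1, 9)}

def pack_op_name(bit_width: int, has_weight_zeros: bool) -> str:
    """Build the packing-kernel name used with getattr(torch.ops.torchao, ...).

    The "0zp" suffix is an assumption for illustration, not the verified
    torchao naming convention.
    """
    wzp_suffix = "" if has_weight_zeros else "0zp"
    return f"_pack_8bit_act_{bit_width}bit{wzp_suffix}_weight"
```

Keeping bit_width on the layout is what makes this string-based dispatch a one-liner, which is the convenience argued for in the thread.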
Force push: dc32105 → f04f3b2
quant_min = -(1 << (bit_width - 1))
quant_max = (1 << (bit_width - 1)) - 1
We also have utils here: ao/torchao/quantization/quant_primitives.py (line 168 in b3deb16).
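The two-line bounds computation above can be factored into a small helper; a minimal sketch of the idea (the torchao util at the linked line may have a different name and signature):

```python
def intx_qvalue_bounds(bit_width: int) -> tuple[int, int]:
    """Return (quant_min, quant_max) for a signed bit_width-bit integer.

    Mirrors the inline computation in the diff: a signed n-bit two's-complement
    range is [-2^(n-1), 2^(n-1) - 1].
    """
    quant_min = -(1 << (bit_width - 1))
    quant_max = (1 << (bit_width - 1)) - 1
    return quant_min, quant_max

# e.g. intx_qvalue_bounds(4) -> (-8, 7); intx_qvalue_bounds(8) -> (-128, 127)
```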