Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14545
Note: Links to docs will display an error until the docs builds have completed.

❌ 3 New Failures, 2 Unrelated Failures
As of commit 96dc88e with merge base 9283b4e:

NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
convert_linear: bool = False
convert_tied_embedding: bool = False

nit: these feel like convert functions; maybe just use use_torchao_kernels_linear and use_torchao_kernels_tied_embedding?
else:
    # Otherwise, only enable the conversions that are specified
    llm_config.backend.torchao.convert_linear = getattr(
        args, "torchao_kernels_linear", False
    )
    llm_config.backend.torchao.convert_tied_embedding = getattr(
        args, "torchao_kernels_tied_embedding", False
    )

nit: can match the name here as well: use_torchao_kernels_linear
parser.add_argument(
    "--use-torchao-kernels",
    action="store_true",
    help="Delegate tied-embedding and quantized linear ops to torchao kernels",
)

why do we need this when it's combining the below two args?
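For reference, the flag resolution being discussed can be sketched as follows. The helper name is hypothetical; the attribute names come from the diff above, and absent attributes default to False:

```python
import argparse


def resolve_torchao_conversions(args: argparse.Namespace) -> tuple[bool, bool]:
    """Hypothetical helper: returns (convert_linear, convert_tied_embedding).

    --use-torchao-kernels acts as shorthand that enables both conversions;
    otherwise only the individually requested conversions are enabled.
    """
    if getattr(args, "use_torchao_kernels", False):
        return True, True
    return (
        getattr(args, "torchao_kernels_linear", False),
        getattr(args, "torchao_kernels_tied_embedding", False),
    )
```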
"""
Configures the torchao-kernels backend.
"""

Can we follow the other backend config examples and use enabled?
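A minimal sketch of what the reviewer's suggestion might look like, mirroring backend configs that expose an `enabled` field. The class name and the `enabled` field are assumptions; the two convert_* fields are taken from the diff under review:

```python
from dataclasses import dataclass


@dataclass
class TorchAoConfig:
    """Sketch only: configures the torchao-kernels backend.

    `enabled` follows the convention the reviewer points to in the
    other backend configs; convert_* fields match the diff above.
    """

    enabled: bool = False
    convert_linear: bool = False
    convert_tied_embedding: bool = False
```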
This adds a new "torchao" backend for pre-quantized checkpoints.
Pre-quantized checkpoints can already be lowered to a backend (e.g., XNNPACK) by passing "-X" in etLLM.
With this PR, pre-quantized checkpoints can instead be lowered to torchao lowbit kernels by passing "--use-torchao-kernels" to the export script in place of "-X". Note that this runs both the linear and tied-embedding kernels with torchao.
To run linear with XNNPACK but tied embedding with torchao, use "--torchao_kernels_tied_embedding" together with "-X".
New CI tests are added for this flow.