[CB] Better parametrization for compile #44578
Merged
ArthurZucker
approved these changes
Mar 19, 2026
Summary
This PR adds three attributes to the compile config, giving finer control over how varlen batches (mixed prefill and decode) and decode batches (decode only) are compiled. This granularity matters because varlen has two dynamic axes (number of query tokens and number of KV tokens), so the shape of varlen batches varies a lot and it benefits greatly from `dynamic=True` for compile. Decode batches, by contrast, have only one dynamic axis (number of query tokens), so `dynamic=False` is tolerable and brings some speedup (roughly 5 to 10%).
This is a lot to handle for a non-power user, so all of this is handled automatically if `use_default_compile_configs=True` is set in the config.
Performance
The script that was run is the continuous batching example, with the arguments listed above plus `--use-async --block-table 64 --log-level WARNING --seed 0 --force-max-length`.
Tests
Added a test for the feature, which passes. All tests pass.
Generations look good.
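For readers who want the varlen/decode split from the Summary in code form, here is a minimal sketch of how the two paths could resolve the `dynamic` argument passed to `torch.compile`. Only `use_default_compile_configs` and `dynamic` come from this description. The class `CompileSketchConfig`, the fields `varlen_dynamic` and `decode_dynamic`, and the helper `resolve_dynamic` are hypothetical names for illustration and are not the attributes this PR adds. The sketch also assumes the defaults take precedence when the flag is set, which the description does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompileSketchConfig:
    # Flag named in the PR description.
    use_default_compile_configs: bool = False
    # Hypothetical per-path settings; the PR's real attribute names may differ.
    varlen_dynamic: Optional[bool] = None
    decode_dynamic: Optional[bool] = None


def resolve_dynamic(config: CompileSketchConfig, batch_kind: str) -> Optional[bool]:
    """Return the value to pass as torch.compile(..., dynamic=...) for a batch kind."""
    if batch_kind not in ("varlen", "decode"):
        raise ValueError(f"unknown batch kind: {batch_kind!r}")
    if config.use_default_compile_configs:
        # Defaults from the description: dynamic shapes for varlen, static for decode.
        return batch_kind == "varlen"
    # Otherwise use the explicit setting; None leaves the choice to torch.compile.
    if batch_kind == "varlen":
        return config.varlen_dynamic
    return config.decode_dynamic


defaults = CompileSketchConfig(use_default_compile_configs=True)
print(resolve_dynamic(defaults, "varlen"))  # True
print(resolve_dynamic(defaults, "decode"))  # False
```

With the flag off and no explicit values, the helper returns `None` for both paths, which matches the default of `torch.compile`'s `dynamic` parameter.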