[CB] Better parametrization for compile #44578

Merged
remi-or merged 27 commits into main from cb-better-compile on Mar 19, 2026

@remi-or remi-or commented Mar 10, 2026

Summary

This PR adds three attributes to the compile config to give finer control over how the varlen path (which handles mixed prefill and decode batches) and the decode path (decode-only batches) are compiled. This granularity matters because varlen has two dynamic axes (number of query tokens and number of KV tokens), so the shape of varlen batches varies a lot and it benefits greatly from dynamic=True for compile. By contrast, decode batches have only one dynamic axis (number of query tokens), so dynamic=False is tolerable and brings some speedup (roughly 5 to 10%).
This is a lot to handle for a non-power user, so all of this is handled automatically if use_default_compile_configs=True is set in the config.
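
To make the split concrete, here is a minimal sketch of how per-path compile settings could be resolved. It is not the code from this PR: the class `SketchCompileConfig` and the fields `varlen_dynamic` and `decode_dynamic` are hypothetical names chosen for illustration. Only `use_default_compile_configs` and the `dynamic` keyword of `torch.compile` come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchCompileConfig:
    """Hypothetical stand-in for the compile config (not the transformers class)."""

    use_default_compile_configs: bool = True
    # Hypothetical per-path overrides, only read when defaults are disabled.
    varlen_dynamic: Optional[bool] = None
    decode_dynamic: Optional[bool] = None

    def resolve(self, path: str) -> Dict[str, Any]:
        """Return keyword arguments that could be forwarded to torch.compile."""
        if path not in ("varlen", "decode"):
            raise ValueError(f"unknown path: {path!r}")
        if self.use_default_compile_configs:
            # varlen has two dynamic axes, so it gets dynamic=True;
            # decode has one, so dynamic=False is acceptable and a bit faster.
            return {"dynamic": path == "varlen"}
        override = self.varlen_dynamic if path == "varlen" else self.decode_dynamic
        # With no override, pass nothing and let the compiler use its own default.
        return {} if override is None else {"dynamic": override}


default_cfg = SketchCompileConfig()
varlen_kwargs = default_cfg.resolve("varlen")
decode_kwargs = default_cfg.resolve("decode")
print(varlen_kwargs, decode_kwargs)
```

In a real setup the resulting dictionaries would be passed along as `torch.compile(fn, **kwargs)` for the corresponding forward path; the sketch leaves that step out so it runs without PyTorch installed.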

Performance

| Arguments | Duration (s) | Generated tokens | Throughput (tok/s) |
| --- | --- | --- | --- |
| --samples 32 --max-new-tokens 2048 | 24.65 | 65536 | 2658.25 |
| --samples 32 --max-new-tokens 2048 --compile | 19.45 | 65536 | 3368.95 |
| --samples 32 --max-new-tokens 4096 | 55.19 | 131072 | 2374.86 |
| --samples 32 --max-new-tokens 4096 --compile | 43.66 | 131072 | 3002.35 |
| --samples 32 --max-new-tokens 8192 | 131.01 | 262144 | 2000.97 |
| --samples 32 --max-new-tokens 8192 --compile | 109.9 | 262144 | 2385.21 |
| --samples 100 --max-new-tokens 4096 | 90.02 | 409600 | 4549.98 |
| --samples 100 --max-new-tokens 4096 --compile | 77.44 | 409600 | 5289.48 |

The script run is the continuous batching example, with the arguments listed above plus --use-async --block-table 64 --log-level WARNING --seed 0 --force-max-length.
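
For readers who want the relative gain from `--compile` rather than raw throughput, the short sketch below computes it from the reported tok/s figures. The numbers are copied from the table above; the helper name `relative_speedup` is only for illustration. On these figures the gain works out to roughly 16% to 27%, depending on the configuration.

```python
# (arguments, baseline tok/s, compiled tok/s), copied from the performance table
RUNS = [
    ("--samples 32 --max-new-tokens 2048", 2658.25, 3368.95),
    ("--samples 32 --max-new-tokens 4096", 2374.86, 3002.35),
    ("--samples 32 --max-new-tokens 8192", 2000.97, 2385.21),
    ("--samples 100 --max-new-tokens 4096", 4549.98, 5289.48),
]


def relative_speedup(baseline: float, compiled: float) -> float:
    """Fractional throughput gain of the compiled run over the baseline."""
    return compiled / baseline - 1.0


speedups = {args: relative_speedup(base, comp) for args, base, comp in RUNS}

for args, gain in speedups.items():
    print(f"{args}: {gain:+.1%}")
```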

Tests

Added a test for the feature, which passes. All tests pass.
Generations look good.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@remi-or remi-or force-pushed the cb-better-compile branch 2 times, most recently from cd937dd to 0f7685e Compare March 18, 2026 10:46
@remi-or remi-or force-pushed the cb-better-compile branch from 0f7685e to af14d04 Compare March 18, 2026 10:51
@remi-or remi-or marked this pull request as ready for review March 18, 2026 14:19

@ArthurZucker ArthurZucker left a comment

Nice! 🤗

Comment thread src/transformers/generation/continuous_batching/continuous_api.py Outdated
Comment thread src/transformers/integrations/flash_paged.py Outdated
@remi-or remi-or added this pull request to the merge queue Mar 19, 2026
Merged via the queue into main with commit e0f69d3 Mar 19, 2026
29 checks passed
@remi-or remi-or deleted the cb-better-compile branch March 19, 2026 11:50

3 participants