[CB] Tweaks to update and minor fixes#45179

Merged
remi-or merged 13 commits into main from cb-minor-fixes
Apr 3, 2026

Conversation

Collaborator

@remi-or remi-or commented Apr 2, 2026

Summary

This PR adds minor changes to cache.update, updates the memory handler with all new features, and refactors a few parts of the code to make it more readable.

Cache indexing:

  • Replace fancy indexing (cache[idx, :, :]) with explicit torch.index_select / index_copy_, which have cleaner behavior under torch.compile and require non-negative indices.
  • Switch index storage tensors from int32 to int64 to match index_select/index_copy_ requirements, removing hidden .long() casts in the hot path.
  • Introduce sentinel_index and trash_index: dedicated positions in the cache padding zone that were already used implicitly but now have names. This also avoids passing negative values (like -1) to indexing functions.
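The indexing pattern described in these bullets can be sketched as follows; the tensor names (`cache`, `write_indices`) and toy shapes are illustrative, not the PR's actual identifiers:

```python
import torch

# Toy KV-cache slab: 8 slots, hidden size 4. The last slot doubles as a
# padding zone; giving it a name avoids passing -1 to indexing ops
# (a hypothetical stand-in for the PR's sentinel_index / trash_index).
cache = torch.zeros(8, 4)
trash_index = cache.shape[0] - 1

# Indices are stored as int64 (torch.long) up front: index_select /
# index_copy_ require long indices, so this removes hidden .long() casts
# from the hot path.
write_indices = torch.tensor([0, 2, trash_index], dtype=torch.int64)
new_values = torch.ones(3, 4)

# Instead of fancy indexing (cache[write_indices] = new_values), use
# index_copy_, which behaves more cleanly under torch.compile and
# requires non-negative indices.
cache.index_copy_(0, write_indices, new_values)

# Reads use index_select instead of cache[read_indices].
read_indices = torch.tensor([0, 2], dtype=torch.int64)
out = torch.index_select(cache, 0, read_indices)
```

Writes aimed at `trash_index` land in the padding zone and never touch live rows, which is what lets the sentinel replace special-cased negative indices.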

Memory handler (cache.py)

  • Collapse the three separate solving methods (compute_num_blocks_and_max_batch_tokens, compute_max_batch_tokens, compute_num_blocks) and the verbose compute_memory_footprint into a single polynomial coefficient model. Each term maps to a tensor in _setup_static_tensors, making the memory model auditable and preventing drift between solvers.
  • Account for previously unmodeled tensors: block_table, logprobs output rows, and async double-buffering (when use_async_batching is on).
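A minimal sketch of the single-model idea, under assumed names: `memory_footprint`, the coefficient tuple, and `solve_max_batch_tokens` are illustrative stand-ins, with memory modeled as c_n·n + c_m·m + c_nm·n·m + c_mm·m² for num_blocks n and max_batch_tokens m (mirroring the (coeff_n, coeff_m, coeff_nm, coeff_mm) tuple in the diff below):

```python
def memory_footprint(n, m, coeffs):
    """Memory (bytes) as a polynomial in num_blocks n and max_batch_tokens m.
    Each coefficient corresponds to one family of static tensors."""
    c_n, c_m, c_nm, c_mm = coeffs
    return c_n * n + c_m * m + c_nm * n * m + c_mm * m * m

def largest_positive_root(a, b, c):
    """Largest positive root of a*x^2 + b*x + c = 0; linear fallback when a == 0."""
    if a == 0:
        return -c / b
    discriminant = b ** 2 - 4 * a * c
    return (-b + discriminant ** 0.5) / (2 * a)

def solve_max_batch_tokens(n, budget, coeffs):
    """Solve memory_footprint(n, m, coeffs) == budget for m.
    Rearranged: c_mm*m^2 + (c_m + c_nm*n)*m + (c_n*n - budget) = 0."""
    c_n, c_m, c_nm, c_mm = coeffs
    return int(largest_positive_root(c_mm, c_m + c_nm * n, c_n * n - budget))
```

Because every solver inverts the same polynomial, a change to one coefficient propagates everywhere at once, which is what makes the model auditable.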

Benchmark (continuous_batching_overall.py)

  • Store results in a timestamped directory instead of a single file, enabling comparison against any previous baseline.
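The timestamped-directory idea can be sketched like this (directory layout, file names, and metric keys are hypothetical, not the benchmark script's actual code):

```python
import datetime
import json
import pathlib
import tempfile

def make_results_dir(root):
    """Create a timestamped directory (e.g. <root>/2026-04-02_11-14-05) so each
    benchmark run gets its own folder and can be diffed against any baseline."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    path = pathlib.Path(root) / stamp
    path.mkdir(parents=True, exist_ok=True)
    return path

# Each run writes its metrics into its own directory instead of
# overwriting a single shared results file.
run_dir = make_results_dir(tempfile.mkdtemp())
(run_dir / "metrics.json").write_text(json.dumps({"tokens_per_s": 0.0}))
```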

Tests

  • Add TestMemoryHandlerPrediction: allocates tensors matching the handler's polynomial model and validates predicted vs actual GPU memory across 5 configurations.
  • Fix test_paged_attention: move cache params to ContinuousBatchingConfig, handle list-type eos_token_id, accept attention-impl-dependent output variants.

Performance

These changes improve performance by 1-3% (when not using the block table), depending on the workload. No regressions.

Testing

All tests pass, except tests/generation/test_continuous_batching.py::ContinuousBatchingWithAcceleratorTest::test_prefix_sharing, which is fixed in #45026.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Good job! Nice unbloating!

Comment on lines +548 to +550
def _equation_coefficients(self, cache_dtype: torch.dtype) -> tuple[int, int, int, int]:
"""Returns (coeff_n, coeff_m, coeff_nm, coeff_mm) for the memory polynomial. Each addend is annotated with
the tensor it corresponds to in `ContinuousBatchingIOs._setup_static_tensors`.
Collaborator


very nice!

Collaborator Author


thx!

"""Largest positive root of a·x² + b·x + c = 0. Falls back to linear when a == 0."""
if a == 0:
return -c / b
discriminant = b**2 - 4 * a * c
Collaborator


high school memories

@remi-or remi-or added this pull request to the merge queue Apr 2, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 2, 2026
@remi-or remi-or enabled auto-merge April 2, 2026 11:14
@remi-or remi-or added this pull request to the merge queue Apr 3, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 3, 2026
@remi-or remi-or added this pull request to the merge queue Apr 3, 2026
Merged via the queue into main with commit 138f757 Apr 3, 2026
30 checks passed
@remi-or remi-or deleted the cb-minor-fixes branch April 3, 2026 09:11
marvinzh pushed a commit to marvinzh/transformers that referenced this pull request Apr 3, 2026
* Bette cache update

* alternative cache uodate

* Fix paged tests

* Update cache computation

* Add test

* Memory for CB overall

* int64 for tensors

* Review compliance

* Review compliance 2/2

* Style

* Fix test
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Apr 4, 2026
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026