The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
c8539d8 to fe01b1c
ArthurZucker left a comment
Nice! Missing examples in the docs? But happy to have this. Otherwise, let's reduce bloat in ContinuousBatchingLogitsProcessorList.
# Abstract base class for all continuous batching logits processors
class ContinuousBatchingLogitsProcessor(ABC):
    supported_kwargs: tuple[str, ...]  # Kwargs that this processor actively uses
Should be a TypedDict.
Why? Do you want to add type checking when passing args?
What I mean is that you can list which kwargs are supported per processor.
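To make the TypedDict suggestion concrete, here is a minimal sketch of how each processor could declare the kwargs it supports. The class names `TopPKwargs` and `TemperatureKwargs` and both helper functions are hypothetical and do not come from the PR; only standard `typing` features are used.

```python
from typing import TypedDict, get_type_hints


class TopPKwargs(TypedDict, total=False):
    """Hypothetical: per-request kwargs a top-p processor would read."""

    top_p: float


class TemperatureKwargs(TypedDict, total=False):
    """Hypothetical: per-request kwargs a temperature processor would read."""

    temperature: float


def supported_kwargs(kwargs_type: type) -> tuple[str, ...]:
    # The declared keys of a TypedDict are available through its type hints
    return tuple(get_type_hints(kwargs_type))


def select_kwargs(kwargs_type: type, request_kwargs: dict) -> dict:
    # Keep only the keys this processor declares and ignore everything else
    names = supported_kwargs(kwargs_type)
    return {k: v for k, v in request_kwargs.items() if k in names}
```

With this shape, the list of supported names is derived from the type instead of being maintained as a separate tuple, and a static type checker can flag misspelled keys.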
def __init__(self, top_p: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
    top_p = float(top_p)
    if top_p < 0 or top_p > 1.0:
        raise ValueError(f"`top_p` has to be a float > 0 and < 1, but is {top_p}")
    if not isinstance(min_tokens_to_keep, int) or (min_tokens_to_keep < 1):
        raise ValueError(f"`min_tokens_to_keep` has to be a positive integer, but is {min_tokens_to_keep}")
@strict decorator validation for this
@strict is for configs, I think? I don't see any logits processor using strict.
It's for dataclasses, not necessarily configs, no?
But logits processors are not dataclasses? Anyway, we now initialize it from the classic logits processor, so a parameter error will be caught by the classic logits processor before the CB one is even initialized.
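For reference, the dataclass-style validation being discussed could look roughly like the sketch below. It uses only the standard library `dataclasses` module with `__post_init__` as a stand-in; it does not use the @strict decorator, and `TopPParams` is a hypothetical name that does not exist in the PR. The error message wording here is an assumption chosen to match the bounds the check actually enforces.

```python
from dataclasses import dataclass


@dataclass
class TopPParams:
    """Hypothetical parameter holder for a top-p processor."""

    top_p: float
    filter_value: float = -float("inf")
    min_tokens_to_keep: int = 1

    def __post_init__(self):
        # Same checks as the __init__ in the diff, run right after field assignment
        self.top_p = float(self.top_p)
        if self.top_p < 0 or self.top_p > 1.0:
            raise ValueError(f"`top_p` has to be a float >= 0 and <= 1, but is {self.top_p}")
        if not isinstance(self.min_tokens_to_keep, int) or self.min_tokens_to_keep < 1:
            raise ValueError(
                f"`min_tokens_to_keep` has to be a positive integer, but is {self.min_tokens_to_keep}"
            )
```

As noted in the thread, this only helps if the processor parameters live in a dataclass; when the CB processor is built from an already validated classic processor, the check would be redundant.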
| ``` | ||
| """ | ||
    supports_continuous_batching: bool = False
Should not be needed if we have a mapping / list (that way we don't "pollute" unrelated classes).
But this way, any time a new logits processor is added, its support is set to None by default. A mapping would have to be kept up to date manually, which is less redundant. wdyt?
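The two options in this thread can be compared side by side with a small sketch. All class names below are hypothetical stand-ins, not the actual classes from the PR, and `CB_SUPPORTED` is an assumed name for the central mapping / list.

```python
class LogitsProcessor:
    # Option 1: a flag on the base class; every new processor is unsupported by default
    supports_continuous_batching: bool = False


class TemperatureWarper(LogitsProcessor):
    # Supported processors opt in explicitly
    supports_continuous_batching = True


class BrandNewProcessor(LogitsProcessor):
    # Nothing to declare: inherits the "unsupported" default
    pass


# Option 2: a central registry; unrelated classes carry no extra attribute,
# but the set has to be updated by hand whenever support is added
CB_SUPPORTED = {TemperatureWarper}


def supported_by_flag(processor: LogitsProcessor) -> bool:
    return type(processor).supports_continuous_batching


def supported_by_mapping(processor: LogitsProcessor) -> bool:
    return type(processor) in CB_SUPPORTED
```

Both lookups give the same answer here; the trade-off is where the information lives, on each class or in one place next to the continuous batching code.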
db1c672 to aace16a
323e792 to 3a8bba1
* Stacked commit before rebase * nit
Summary
This PR adds per-request logits processors and overhauls the way CB handles logits processors.
It introduces batched logits processing with per-request parameters for continuous batching, enabling each request in a batch to use different sampling parameters (temperature, top_k, top_p). This is essential for serving scenarios where different users may request different generation configurations within the same batch.
The main changes are:
* ContinuousBatchingLogitsProcessorList to manage logits processors, and three per-request processor implementations (Temperature, TopK, TopP) that operate on the batched tensor format using vectorized operations.
* The processor list validates compatibility at init time, warns about unsupported processors, and efficiently prepares tensor arguments by storing them as views into the bulk input tensor to minimize memory transfers.
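The idea of per-request sampling parameters can be illustrated with a minimal sketch, where each row of the batch is processed with its own temperature and top_k. This is a plain-Python stand-in using nested lists, not the vectorized tensor implementation from the PR, and the function names `apply_temperature` and `apply_top_k` are hypothetical.

```python
import math


def apply_temperature(logits, temperatures):
    """Divide each row of logits by that row's own temperature."""
    return [[x / t for x in row] for row, t in zip(logits, temperatures)]


def apply_top_k(logits, top_ks, filter_value=-math.inf):
    """Keep the top_k largest logits of each row, using a different k per row."""
    out = []
    for row, k in zip(logits, top_ks):
        k = max(1, min(k, len(row)))  # clamp k to a valid range for this row
        threshold = sorted(row, reverse=True)[k - 1]  # k-th largest value
        out.append([x if x >= threshold else filter_value for x in row])
    return out


# Two requests in the same batch with different sampling parameters
logits = [[2.0, 1.0, 0.0], [2.0, 1.0, 0.0]]
scaled = apply_temperature(logits, temperatures=[1.0, 2.0])
filtered = apply_top_k(scaled, top_ks=[1, 2])
```

In a tensor-based version, the per-request values would be stored as one tensor with an entry per row so that the same operations apply to the whole batch at once.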