
fix: remove stale num_return_sequences warning in paged generate #45565

Closed
armorbreak001 wants to merge 1 commit into huggingface:main from armorbreak001:fix/paged-generate-num-return-sequences-warning

@armorbreak001

Summary

The generate(..., cache_implementation="paged") path emits a misleading warning that num_return_sequences is not supported for continuous batching. The warning is stale: generate_batch() already handles num_return_sequences by expanding the number of requests internally.
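The idea of supporting num_return_sequences by expanding requests can be illustrated with a minimal sketch. This is not the transformers implementation of generate_batch(). `Request`, `expand_requests`, and the request-ID format are hypothetical names chosen for illustration. The sketch only shows why the scheduler needs no special support: each prompt becomes num_return_sequences independent requests.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record: one entry per sequence to generate.
    request_id: str
    prompt_ids: list


def expand_requests(prompts, num_return_sequences):
    """Hypothetical sketch: turn each prompt into num_return_sequences requests."""
    if num_return_sequences < 1:
        raise ValueError("num_return_sequences must be >= 1")
    requests = []
    for prompt_index, prompt_ids in enumerate(prompts):
        for seq_index in range(num_return_sequences):
            # Each copy is scheduled as its own request with its own prompt list,
            # so the batching scheduler only ever sees ordinary single requests.
            requests.append(Request(f"req_{prompt_index}_{seq_index}", list(prompt_ids)))
    return requests


reqs = expand_requests([[1, 2, 3], [4, 5]], num_return_sequences=2)
print([r.request_id for r in reqs])
# ['req_0_0', 'req_0_1', 'req_1_0', 'req_1_1']
```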

Changes

  • Remove the num_return_sequences > 1 condition from the warning check
  • Keep the warning for num_beams > 1 (still not supported)
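  • Read num_beams from the prepared generation config instead of raw kwargs (more accurate, since the prepared config also reflects values that were not passed as kwargs, such as the model's default generation config)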
  • Cache the prepared generation config to avoid calling _prepare_generation_config() twice
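The revised check can be sketched as follows, assuming a simplified stand-in for the generation config. `GenConfig`, `prepare_config`, `continuous_batching_warning`, and `paged_generate` are hypothetical names, not transformers APIs. The sketch shows three points from the list above: only num_beams > 1 triggers the warning, the value is read from the prepared config (so a default set outside kwargs is still caught), and the config is prepared once and reused.

```python
import warnings
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GenConfig:
    # Hypothetical stand-in for a generation config; only the two fields
    # relevant to this warning are modeled.
    num_beams: int = 1
    num_return_sequences: int = 1


def prepare_config(defaults: GenConfig, **kwargs) -> GenConfig:
    # Hypothetical: overlay user kwargs on the model defaults, so the result
    # also carries values that were never passed as kwargs.
    return replace(defaults, **kwargs)


def continuous_batching_warning(config: GenConfig) -> Optional[str]:
    # num_return_sequences is intentionally not checked: it is handled by
    # expanding requests. Only beam search is still unsupported.
    if config.num_beams > 1:
        return (
            "num_beams is not supported for continuous batching yet. "
            f"Got num_beams={config.num_beams}."
        )
    return None


def paged_generate(defaults: GenConfig, **kwargs) -> GenConfig:
    config = prepare_config(defaults, **kwargs)  # prepared once, then reused
    message = continuous_batching_warning(config)
    if message is not None:
        warnings.warn(message)
    return config


print(continuous_batching_warning(prepare_config(GenConfig(), num_return_sequences=2)))
# None
print(continuous_batching_warning(prepare_config(GenConfig(num_beams=3))))
# num_beams is not supported for continuous batching yet. Got num_beams=3.
```

In the second call num_beams=3 comes from the defaults rather than from kwargs, which is the case a check on raw kwargs alone would miss.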

Before

num_return_sequences and num_beams are not supported for continuous batching yet. Got num_return_sequences=2 and num_beams=1.

After (when num_beams=1)

No warning emitted.

After (when num_beams > 1)

num_beams is not supported for continuous batching yet. Got num_beams=3.

Fixes #45563

The continuous batching path already handles num_return_sequences by
expanding requests in generate_batch(). The warning about it being
unsupported is stale and misleading. Keep the warning for num_beams > 1,
which is still not supported.

Also use the prepared generation config to read num_beams instead of
raw kwargs, and cache it to avoid calling _prepare_generation_config
twice.
@github-actions
Contributor

This PR was flagged by our automated quality checks. If you're a genuine
contributor, please reply here and a maintainer will review your PR.

Common reasons for flagging:

  • New GitHub account
  • Unusually high number of repository forks in a 24-hour window

We appreciate your contribution and apologize if this is a false positive!

Development

Successfully merging this pull request may close these issues.

Paged generate() emits a stale warning for num_return_sequences
