fix num_assistant_tokens with heuristic schedule#28759
fix num_assistant_tokens with heuristic schedule#28759amyeroberts merged 17 commits intohuggingface:mainfrom
Conversation
gante
left a comment
There was a problem hiding this comment.
Thank you for the fix and the tests 💛
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
check that candidate_generator.assistant_model exists since some some speculations (like ngram and PLD) don't have assistant_model attribute
ArthurZucker
left a comment
There was a problem hiding this comment.
Thanks for the PR!
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
amyeroberts
left a comment
There was a problem hiding this comment.
Thanks for adding this!
For the code quality checks, you'll need to run make fixup and push the changes.
From the docstring, it mentions behaviour about `"heuristic_transient" being reset, but I don't see logic relating to it in this diff. Does this already happen?
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
…transformers into fix_num_assistant_tokens
I just fixed docstring |
|
@jmamou Let's try and make the CI green :) You'll need to resolve the quality checks by running For the other failing tests, you'll need to try rebasing on main. |
|
@amyeroberts |
|
@jmamou Tbh, I'm not sure what the cause of these failures are. I would first suggest rebasing on main to make sure you have all of the most recent commits. This will trigger a re-run of the CI too. |
|
@amyeroberts |
|
@amyeroberts both CI failures seem unrelated to this PR, including |
|
@amyeroberts |
|
@jmamou Thanks for this contribution and your patience with our misbehaving CI! |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
What does this PR do?
We have defined 2 different num_assistant_tokens_schedule values:
heuristic: When all speculative tokens are correct, increasenum_assistant_tokensby 2 else reduce by 1.num_assistant_tokensvalue is persistent over multiple generation calls with the same assistant model.heuristic_transient: Same as"heuristicbutnum_assistant_tokensis reset to its initial value after each generation call.Fixes # (issue)
#27979 (comment)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@gante @amyeroberts