[chat template] return assistant mask in processors #38545
zucchini-nlp merged 9 commits into huggingface:main
Ready for review!
is_tokenizers_fast = hasattr(self, "tokenizer") and self.tokenizer.__class__.__name__.endswith("Fast")
I see that the tokenizers never check this, probably because all new LLMs support fast tokenizers. However, users can force `use_fast=False` for whatever reason, and the error message in that case is not informative.
Should I add the check to the tokenizer's `apply_chat_template` as well, WDYT?
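To make the question concrete, here is a minimal sketch of the name-based check raising an informative error. The classes `DummyTokenizerFast`, `DummyTokenizer`, and `DummyProcessor`, the method name, and the error wording are all hypothetical stand-ins, not the library's actual code; only the `is_tokenizers_fast` expression is taken from the line under review.

```python
class DummyTokenizerFast:
    """Hypothetical stand-in for a fast tokenizer class."""


class DummyTokenizer:
    """Hypothetical stand-in for a slow tokenizer class."""


class DummyProcessor:
    """Hypothetical processor that wraps a tokenizer."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def check_can_return_assistant_mask(self):
        # Same name-based check as the reviewed line: the class name must end in "Fast".
        is_tokenizers_fast = hasattr(self, "tokenizer") and self.tokenizer.__class__.__name__.endswith("Fast")
        if not is_tokenizers_fast:
            # Assumed wording; the point is to name the cause and the fix.
            raise ValueError(
                "Returning the assistant mask requires a fast tokenizer, but got "
                f"{self.tokenizer.__class__.__name__}. Load the tokenizer with use_fast=True."
            )
        return True


fast_ok = DummyProcessor(DummyTokenizerFast()).check_can_return_assistant_mask()

try:
    DummyProcessor(DummyTokenizer()).check_can_return_assistant_mask()
    slow_error = None
except ValueError as err:
    slow_error = str(err)
```

With a slow tokenizer the user sees which class was loaded and what to change, instead of a failure deeper in the offset lookup.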
Rocketknight1 left a comment
Sorry for taking so long to get to this! The logic makes sense, but the use of bisect_left confused me for a bit. After staring at it for a while, though, I think it's valid.
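For readers puzzled by the same thing, here is a rough sketch of how `bisect_left` can map character spans to token indices. It is not the PR's code: the function name `assistant_token_mask`, the offset layout, and the example values are assumptions, and it presumes token start offsets are sorted in ascending order.

```python
from bisect import bisect_left


def assistant_token_mask(offsets, assistant_spans):
    """Build a 0/1 mask over tokens from character spans.

    offsets: (char_start, char_end) per token, sorted by char_start (assumed).
    assistant_spans: (char_start, char_end) per assistant turn, end exclusive.
    """
    token_starts = [start for start, _ in offsets]
    mask = [0] * len(offsets)
    for span_start, span_end in assistant_spans:
        # First token whose start is at or after the span start.
        first = bisect_left(token_starts, span_start)
        # First token whose start is at or after the span end (exclusive bound).
        last = bisect_left(token_starts, span_end)
        for i in range(first, last):
            mask[i] = 1
    return mask


# Made-up offsets for a 7-token rendering; the assistant text covers chars 20..31.
offsets = [(0, 4), (4, 5), (6, 8), (9, 18), (18, 19), (20, 25), (26, 31)]
mask = assistant_token_mask(offsets, [(20, 31)])
```

Because `bisect_left` returns the leftmost insertion point, a token that starts exactly at the span start is included, while a token that starts before the span and straddles its boundary is not. An off-by-one at either bound changes which boundary tokens end up in the mask.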
[For maintainers] Suggested jobs to run (before merge): run-slow: csm, shieldgemma2
* messed up the git history, squash commits
* raise error if slow and refine tests
* index was off by one
* fix the test
What does this PR do?
Fixes #38521. I checked against the fast tokenizers' implementation of `word_to_char` and saw no difference in the time taken, so I think this can be the permanent solution.
Otherwise we can add `BatchFeature` support for `EncodingFast` features, though I don't think anyone needs them and I have never seen a user request it.