simple-chat : only add bos on first prompt (#10129)
It seems that the …
However, maybe it's better to refactor llama-vocab. UPD: the issue with EOS happens in …
Should check if …
Is there any case where …
I was about to say that …
I think it kind of makes sense for embedding models.
No, if the model needs EOS in responses, it will generate it. So this option is not useful for chat use cases.
Was there at any point (back in llama2 days or even earlier) a problem with models not generating EOS tokens? There's a guard in main for antiprompts generated by the model.
If this only makes sense for embedding models, wouldn't that make the way most examples use it wrong? I took a quick look, and almost every case still treats this as an …
I wouldn't be surprised if there are incorrect usages of the …
So with this logic in mind, the current implementation in this branch seems OK. Is it failing with a certain model that you tried? If yes, what are the two KV values set to?
No, the responses generated look fine. I missed that it is only added if …
Fix bos being added on all calls to llama_tokenize.