Add BPE pre-tokenization for Command-R. #7033
dranger003 wants to merge 3 commits into ggml-org:master
Conversation
I tried to run the script and had to apply this patch to get the tokenizer files downloaded correctly:

```diff
diff --git a/convert-hf-to-gguf-update.py b/convert-hf-to-gguf-update.py
index f4774003..4ad4d867 100644
--- a/convert-hf-to-gguf-update.py
+++ b/convert-hf-to-gguf-update.py
@@ -95,6 +95,14 @@ for model in models:
     save_path = f"models/tokenizers/{name}/tokenizer.json"
     download_file_with_auth(url, token, save_path)
 
+    # if the downloaded file is less than 1KB, it is likely a Git LFS pointer,
+    # so remove it and fetch the actual file from the LFS resolve URL instead
+    if os.path.getsize(save_path) < 1024:
+        os.remove(save_path)
+        url = f"{repo}/resolve/main/tokenizer.json"
+        save_path = f"models/tokenizers/{name}/tokenizer.json"
+        download_file_with_auth(url, token, save_path)
+
     if tokt == TOKENIZER_TYPE.SPM:
         url = f"{repo}/resolve/main/tokenizer.model"
         save_path = f"models/tokenizers/{name}/tokenizer.model"
```
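Incidentally, a sturdier check than file size would be to look at the file contents: Git LFS pointer files always begin with a fixed spec line. A minimal sketch (the helper name is hypothetical, not part of the script):

```python
def is_lfs_pointer(path: str) -> bool:
    # Git LFS pointer files are tiny text files whose first line identifies
    # the LFS spec, e.g. "version https://git-lfs.github.com/spec/v1"
    with open(path, "rb") as f:
        return f.read(48).startswith(b"version https://git-lfs.github.com/spec")
```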
Let's goooooO!
sealad886 left a comment:
I've had an a-ha moment, and these look good. I've run it and got the same results.
Optional: add command-r-plus as well, with a nearly identical footprint, since the pre-tokenizers are the same (there are a couple of ways to make that happen).
One note: the tokenizer config specifies that the Digits pre-tokenizer is used with individual_digits=True, but there is no regex pattern in llama.cpp that explicitly matches individual digits. Even though the tests pass, would it be possible to add that for completeness' sake? (See the sketch below for what that split looks like.)
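For illustration, a minimal sketch of what individual_digits=True means for the split, using the third-party regex module for its Unicode classes (not llama.cpp's actual code):

```python
import regex  # pip install regex; supports \p{...} Unicode classes

text = "price: 12345"

# individual_digits=True: every digit becomes its own piece
print(regex.findall(r"\p{N}|\P{N}+", text))
# ['price: ', '1', '2', '3', '4', '5']

# a \p{N}+ style pattern keeps the run of digits together instead
print(regex.findall(r"\p{N}+|\P{N}+", text))
# ['price: ', '12345']
```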
I get why it's changed, but does it mean the old quants are broken even when using older versions of llama.cpp? I'm not sure I can face downloading and quantizing it myself.
My understanding of the overall issue (which admittedly is not as nuanced as others') would suggest that, for the most part, yes: you should re-download, re-convert, and re-quantize your models. The longer version is that newer models are more likely to use newer pre-tokenizer splits. Luckily, command-r and command-r-plus appear to be nearly identical to the default pre-tokenizer's splits, so for these models it's probably okay?
I just remembered that the imatrix quants will definitely be incorrect for the new version of llama.cpp. I have a non-imatrix Q6_K that I was hoping to use until some kind person makes a new Q8 for Hugging Face ;)
Yeah, exactly... lots of compounding issues. I've ended up moving some of these over to Ollama as well, so now I have to untangle what came from me and what I pulled from them. But I wouldn't be surprised if the majority of Ollama models have this same issue and will need to be rebuilt.
sealad886 left a comment:
Actually, yeah. Let's just approve this to get it in there, and I'll add in command-r-plus separately.
This PR works for both Command-R versions, since they have the same pre-tokenizer hash.
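For reference, a rough sketch of how convert-hf-to-gguf-update.py derives that hash: it tokenizes a fixed test string and hashes the resulting token ids, so two models whose pre-tokenizers split identically get the same checksum (the test string below is a placeholder; the real script uses a much longer one):

```python
import hashlib
from transformers import AutoTokenizer

chktxt = "Hello World!\n 3.14 café 12345"  # placeholder test string
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
chktok = tokenizer.encode(chktxt)
chkhsh = hashlib.sha256(str(chktok).encode()).hexdigest()
print(chkhsh)  # identical pre-tokenizer splits -> identical hash
```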
Thanks, I added this update to the PR.
I had to run this several times (it seems to create the folder and then error out). I think command-r was already done at this point, but I just wanted to point this out.
Blocked, but command-r is done anyway. I got to convert Command R 35B into an f16 GGUF. I'm currently quantizing it and I'll test out Q5_K_M soon.
@drummerv I am not getting any errors running convert-hf-to-gguf-update.py.
Superseded by PR #7063. |




I read #6920 and 120cf37 and am attempting to add Command-R support.
Closes #7030 and #7040.
EDIT: I also tested Command-R+ successfully using this PR.
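As a side note on what the conversion actually records: under the scheme from #6920, the pre-tokenizer type is written into the GGUF metadata. A minimal sketch of inspecting it with the gguf Python package (the filename is hypothetical; the expected value is an assumption based on this PR):

```python
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("command-r-35b-f16.gguf")  # hypothetical filename
field = reader.get_field("tokenizer.ggml.pre")
# for a string field, the last part holds the raw bytes of the value
print(bytes(field.parts[-1]).decode("utf-8"))  # expected: "command-r"
```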