convert : refactor rope_freqs generation #9396
Conversation
This should also fix vocab-only conversion for Phi-3.
ngxson left a comment:
LGTM. Thanks for the implementation!
Btw, you can use …
Related to issue #6849 (Support for Phi-3 models).

Should we merge this, or wait for the rest of the tests in the OP to be confirmed?

I ran the test locally and can confirm that it passes. Let's wait for final confirmation from @compilade to merge this.
Since #9322 was merged, MiniCPM3's conversion also has to be updated before merging this. I'll update it today. |
MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid having to run its custom Python code, which mixes tokenization in the same file as tool calls.

gguf-py : add long and short RoPE factors to tensor mappings. Empty, but the key names are used to populate the mappings.
@compilade Hey bro, when I try to convert MiniCPM3 to GGUF … Will it be implemented by this PR? Thanks!
Yes, actually, in e83d270 I've changed how MiniCPM3's tokenizer is loaded, exactly to avoid that custom code. It uses SentencePiece directly instead. I think it results in the same model files, but I didn't test that yet, because I can't really run the custom tokenization code since it depends on … That was a single-line change in …
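For context, loading the vocab through SentencePiece directly might look roughly like the minimal sketch below. It assumes the usual `sentencepiece` API and a `tokenizer.model` file; the `load_vocab` helper is illustrative, not the actual change from e83d270.

```python
from sentencepiece import SentencePieceProcessor

def load_vocab(model_path: str) -> list[tuple[bytes, float]]:
    # Read tokens and scores straight from tokenizer.model; no need to
    # import the upstream repo's custom tokenization code at all.
    sp = SentencePieceProcessor()
    sp.LoadFromFile(model_path)
    return [
        (sp.IdToPiece(i).encode("utf-8"), sp.GetScore(i))
        for i in range(sp.vocab_size())
    ]
```

In `convert_hf_to_gguf.py` terms, the single-line change presumably just points the model's `set_vocab()` at the shared SentencePiece helper instead of the model's custom loader.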
Got that, as we have been turning to NixOS (especially for production use) these days. Hope everything goes well! ❤️
* convert : refactor rope_freqs generation. This should also fix vocab-only conversion for Phi-3.
* convert : adapt MiniCPM3 to separate rope_freqs insertion. MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid having to run its custom Python code, which mixes tokenization in the same file as tool calls.
* gguf-py : add long and short RoPE factors to tensor mappings. Empty, but the key names are used to populate the mappings.
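For readers unfamiliar with gguf-py's tensor mappings, "empty" entries could look roughly like this sketch. The enum members and GGUF-side names follow gguf-py naming conventions, but they are assumptions here, not quotes from the diff.

```python
from enum import Enum, auto

class MODEL_TENSOR(Enum):  # stand-in for gguf.constants.MODEL_TENSOR
    ROPE_FACTORS_LONG = auto()
    ROPE_FACTORS_SHORT = auto()

# GGUF-side names for the generated rope factor tensors.
TENSOR_NAMES: dict[MODEL_TENSOR, str] = {
    MODEL_TENSOR.ROPE_FACTORS_LONG: "rope_factors_long",
    MODEL_TENSOR.ROPE_FACTORS_SHORT: "rope_factors_short",
}

# Empty tuples: no Hugging Face tensor names map to these (they are
# generated at conversion time), but the keys being present lets the
# name-mapping machinery populate the GGUF-side names.
mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
    MODEL_TENSOR.ROPE_FACTORS_LONG: (),
    MODEL_TENSOR.ROPE_FACTORS_SHORT: (),
}
```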
Follow-up from #9117 (comment).
This isolates handling of generated tensors like `rope_freqs` for Llama3, Phi-3 and Exaone. This should also fix `--vocab-only` conversion for Phi-3-128k and Phi-3.5 (which previously generated invalid GGUF files, because they included a non-zero tensor count while not including any tensor data). Note that this will also be relevant for MiniCPM3 (#9322), which re-uses the misbehaving Phi-3 rope tensors insertion.
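To make the refactor concrete, here is a simplified sketch of the pattern: a `generate_extra_tensors()` hook that is only invoked on the tensor-writing path, so a `--vocab-only` run never produces the generated tensors. The hook name matches the spirit of `convert_hf_to_gguf.py`, but the class layout and the `rope_scaling` handling below are illustrative assumptions, not the PR's exact code.

```python
from typing import Iterable
import torch

class ModelBase:
    def __init__(self, hparams: dict):
        self.hparams = hparams
        self.tensors: list[tuple[str, torch.Tensor]] = []

    def generate_extra_tensors(self) -> Iterable[tuple[str, torch.Tensor]]:
        # Most models generate no tensors at conversion time.
        return ()

    def prepare_tensors(self):
        # Generated tensors are produced only here; a --vocab-only run
        # never calls this, so the written tensor count stays at zero
        # and the GGUF file remains valid.
        for name, tensor in self.generate_extra_tensors():
            self.tensors.append((name, tensor))

class Phi3LikeModel(ModelBase):
    def generate_extra_tensors(self) -> Iterable[tuple[str, torch.Tensor]]:
        # Phi-3-style configs carry long/short rope factors in hparams.
        rope_scaling = self.hparams.get("rope_scaling") or {}
        long_f = rope_scaling.get("long_factor")
        short_f = rope_scaling.get("short_factor")
        if long_f and short_f:
            yield ("rope_factors_long.weight", torch.tensor(long_f, dtype=torch.float32))
            yield ("rope_factors_short.weight", torch.tensor(short_f, dtype=torch.float32))

m = Phi3LikeModel({"rope_scaling": {"long_factor": [1.0, 2.0], "short_factor": [1.0, 1.0]}})
m.prepare_tensors()
print([name for name, _ in m.tensors])  # populated only on the full tensor path
```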
TODO
- [ ] Check that Phi-3 conversion still works (test tokenization with `llama-tokenize` too)
- [ ] Check that the converted model still includes `rope_freqs.weight`, and has the same checksum as a previous conversion I did a while ago with the same checkout of the upstream model
- [ ] Check that MiniCPM3 conversion works (test tokenization with `llama-tokenize`)
- [ ] Run `tests/test-lora-conversion-inference.sh`; none of the models tested use `rope_freqs`, though.
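Relatedly, a quick way to confirm the `--vocab-only` fix on a converted file is to check that it declares zero tensors. `GGUFReader` is gguf-py's reader API; the filename below is a placeholder, not a file from this PR.

```python
from gguf import GGUFReader

# A valid vocab-only GGUF should contain no tensors at all; the broken
# files described above declared tensors without writing their data.
reader = GGUFReader("Phi-3.5-mini-instruct-vocab.gguf")  # hypothetical file
assert len(reader.tensors) == 0, (
    f"vocab-only file unexpectedly declares {len(reader.tensors)} tensors"
)
```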