
add phi3 support #6852

Merged
ggerganov merged 9 commits into ggml-org:master from liuwei-git:master
Apr 24, 2024

Conversation

@liuwei-git (Contributor)

Make phi3 an explicitly supported model in llama.cpp.

@ggerganov (Member)

Might have to add <|end|> as an EOT token:

diff --git a/llama.cpp b/llama.cpp
index 63483b9a..698ad236 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -4381,6 +4381,7 @@ static void llm_load_vocab(
                         //vocab.id_to_token[t.second].type == LLAMA_TOKEN_TYPE_CONTROL &&
                         (t.first == "<|eot_id|>" ||
                          t.first == "<|im_end|>" ||
+                         t.first == "<|end|>" ||
                          t.first == "<end_of_turn>"
                         )
                    ) {

This seems to produce better results than #6851.
For example, I don't see the <|calc|> and <|/data|> tokens being generated. I wonder where the difference comes from?
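The diff above extends llama.cpp's list of token strings that get flagged as end-of-turn. A minimal Python sketch of that check (the function name and set are illustrative, not the real API — the actual logic lives in C++ in llm_load_vocab()):

```python
# Token texts treated as end-of-turn markers, mirroring the C++ diff above;
# "<|end|>" is the new entry added for Phi-3.
EOT_TOKEN_TEXTS = {"<|eot_id|>", "<|im_end|>", "<|end|>", "<end_of_turn>"}

def is_eot_token(token_text: str) -> bool:
    """Return True if the token text is a known end-of-turn marker."""
    return token_text in EOT_TOKEN_TEXTS

print(is_eot_token("<|end|>"))        # Phi-3's end-of-turn token -> True
print(is_eot_token("<|assistant|>"))  # a role header, not an EOT -> False
```

Matching by token text rather than token type works around models whose GGUF metadata does not mark these tokens as control tokens.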

ggerganov added a commit that referenced this pull request Apr 23, 2024
@ggerganov (Member)

So the difference was in the tokenization - in the other PR the <|system|> token was being incorrectly mapped to 32034, while it should have been 32006. I applied the changes from this PR into _set_vocab_sentencepiece() since this implementation seems to be correct: 5dcccb3

I wonder if it affects the conversion of some other models too?
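The fix boils down to placing each added token at the ID assigned in the HF tokenizer config, instead of appending added tokens after the base SentencePiece vocabulary. A hedged sketch of that idea (merge_vocab and the sample IDs are illustrative, not the real _set_vocab_sentencepiece() code):

```python
# Sketch: added tokens from added_tokens.json must land at their configured
# IDs; naively appending them after the base vocab is what mapped
# <|system|> to 32034 instead of 32006 in the other PR.
def merge_vocab(base_tokens: list, added: dict) -> list:
    """base_tokens: pieces from the SentencePiece model, indexed by ID.
    added: mapping of token text -> ID from added_tokens.json."""
    size = max(len(base_tokens), max(added.values()) + 1)
    tokens = list(base_tokens) + ["<unused>"] * (size - len(base_tokens))
    for text, idx in added.items():
        tokens[idx] = text  # place the added token at its configured ID
    return tokens

base = [f"piece{i}" for i in range(32000)]  # stand-in for the SP model
vocab = merge_vocab(base, {"<|system|>": 32006, "<|end|>": 32007})
print(vocab[32006])  # <|system|>
```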

Anyway, now the results match except for the other PR having a BOS token added at the start, while this PR does not:

https://github.com/ggerganov/llama.cpp/pull/6852/files#diff-ecca4c14f9a354b5557247cafd79409b332bfa1e9c12594f83282af1fde4743eR2064

Just double-checking: is this the intent?

There is also a minor issue because of this - the tokenizer.ggml.add_bos_token KV is written twice in the header: once as true and once as false:

python3 gguf-py/scripts/gguf-dump.py models/phi-3-4k-instruct/ggml-model-f16.gguf
* Loading: models/phi-3-4k-instruct/ggml-model-f16-new.gguf
Traceback (most recent call last):
KeyError: 'Duplicate tokenizer.ggml.add_bos_token already in list at offset 725511'

This is because we already write this field automatically here:

https://github.com/ggerganov/llama.cpp/blob/171a73890ec0948293c675a8ab1779e01aac906f/gguf-py/gguf/vocab.py#L61-L71
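The KeyError from gguf-dump.py comes from treating a second write of the same KV key as corruption. A minimal sketch of that guard (KVWriter is a hypothetical stand-in, not the gguf-py API):

```python
# Sketch: a KV writer that rejects duplicate keys, which is effectively
# what gguf-dump.py does when it sees tokenizer.ggml.add_bos_token twice.
class KVWriter:
    def __init__(self):
        self.kv = {}

    def add_bool(self, key: str, value: bool) -> None:
        if key in self.kv:
            raise KeyError(f"Duplicate {key} already in list")
        self.kv[key] = value

w = KVWriter()
w.add_bool("tokenizer.ggml.add_bos_token", True)
try:
    w.add_bool("tokenizer.ggml.add_bos_token", False)  # second write fails
except KeyError as e:
    print(e)
```

The fix in the conversion script is simply not to write the field a second time, since gguf-py's vocab handling (linked above) already emits it.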

@bartowski1182 (Contributor)

So is the implementation in #6851 preferred or are both needed for official support?

@ggerganov (Member) left a review comment

Thank you for the nice implementation.

I decided to set the "add BOS" KV to True based on this configuration:

https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/3a811845d89f3c1b3f41b341d0f9f05104769f35/tokenizer_config.json#L2
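For reference, the decision can be read straight out of the HF config. A small illustrative sketch (the JSON string here is a stand-in for the linked tokenizer_config.json, not its full contents):

```python
import json

# Stand-in for the relevant field of Phi-3's tokenizer_config.json.
config_text = '{"add_bos_token": true, "add_eos_token": false}'
config = json.loads(config_text)

# Default to False if the field is absent, as older configs may omit it.
add_bos = config.get("add_bos_token", False)
print(add_bos)  # True
```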

@x4080

x4080 commented Apr 25, 2024

Hi, when using the server, "<|eot_id|>" is still printed at the end of the conversation, and I can no longer find the stop token in /examples/server/utils.hpp. How can I avoid "<|eot_id|>" in the server?

Thanks

@ggerganov (Member)

Most likely you are using the base model instead of the instruct model. See #6916 for a clear explanation and a way to add stop tokens from the client side.
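Client-side, this means putting the end-of-turn strings in the "stop" array of the request to the server's /completion endpoint. A hedged sketch of such a payload (the prompt text is illustrative; field names follow the server examples, so double-check against the server README for your build):

```python
import json

# Request body for llama.cpp's /completion endpoint: the "stop" array
# tells the server to cut generation before emitting these strings.
payload = {
    "prompt": "<|user|>\nHello<|end|>\n<|assistant|>\n",
    "n_predict": 128,
    "stop": ["<|end|>", "<|eot_id|>"],
}
body = json.dumps(payload)
print(body)
```

With the stop strings supplied by the client, the marker never reaches the printed output even if the model's GGUF metadata does not flag it as EOT.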

@x4080

x4080 commented Apr 26, 2024

@ggerganov Hi, no, I was using Phi-3-mini-128k-instruct.Q4_K_M.gguf. Forget it, I think this was only an issue for the server; for non-server use it already works fine.

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* add explicit phi3 support

* add explicit phi3 support

* remove unused code

* convert : add BOS token

* llama : match EOT token <|end|>

* llama : minor / style

* llama : tabs -> spaces

* convert : fix lint checks

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
(same commit message as above)

4 participants