Fix func call tokens for internlm2 #8506
Conversation
```python
if foken_data.get("special") and not foken_data["content"] in func_call_tokens:
    toktypes[token_id] = SentencePieceTokenTypes.CONTROL
```
I'm not sure these tokens should be marked as USER_DEFINED; this would mean they would always be specially pre-tokenized even if parse_special is false, making it impossible to avoid injections from text containing these tokens when not desired. This is related to #8228.
If this is simply a display issue, then it might be more appropriate to revisit whether to detokenize the control tokens output by the model.
These may be relevant:
@compilade hi, these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if `parse_special` is false. The following is an example. Now `llama-cli` works with `--special`, but `llama-server` does not work with `--special`. Ideally, only these function-call related special tokens would be shown.
```python
tool_calls = None
if request.tool_choice != 'none' and '<|plugin|>' in text:
    if final_res.finish_reason == 'stop':
        final_res.finish_reason = 'tool_calls'
    # TODO may move to generate function
    text, action = text.split('<|action_start|><|plugin|>')
    action = action.split('<|action_end|>'.strip())[0]
    action = action[action.find('{'):]
```
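For completeness, here is a minimal sketch of how the extracted `action` string might then be turned into a structured tool call. This is an assumed continuation for illustration, not part of the quoted code; the sample payload reuses the `get_current_weather` tool from later in this thread, and the output field names are made up:

```python
import json

# Hypothetical payload after the '<|action_start|><|plugin|>' split above;
# the exact JSON shape is an assumption for illustration.
action = '{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}'

parsed = json.loads(action)
tool_call = {
    "name": parsed["name"],
    "arguments": parsed.get("parameters", {}),
}
print(tool_call)  # {'name': 'get_current_weather', 'arguments': {'location': 'Shanghai'}}
```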
@RunningLeon Thanks for giving an example.
> these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if `parse_special` is false.
If I understand correctly, what you want is to get the function call tokens to render in the output, right? Pre-tokenization is about the input. If these tokens were pre-tokenized even when parse_special is false, this means it would be impossible to include <|plugin|> in some non-special text without the model seeing it as the <|plugin|> token.
For example:
```console
$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?"
     1 -> '<s>'
  3993 -> 'What'
   505 -> ' is'
   395 -> ' a'
   262 -> ' '
 92538 -> '<|plugin|>'
   345 -> '?'
$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?" --no-parse-special
     1 -> '<s>'
  3993 -> 'What'
   505 -> ' is'
   395 -> ' a'
   497 -> ' <'
   352 -> '|'
  9267 -> 'plugin'
   352 -> '|'
 46973 -> '>?'
```

If the problem is about the output of `llama-server`, this should be fixable by changing how it calls the `llama_token_to_piece` function.
> Ideally, only these function-call related special tokens would be shown.
If you want to hide control tokens while still showing these ones, then... hmm. This seems complicated to do with the current token attributes (`USER_DEFINED` and `CONTROL`): `USER_DEFINED` is intended for always-pre-tokenized tokens like the multi-space tokens in GPT-NeoX, while `CONTROL` is intended for tokens with special meaning, like `<|im_start|>`, and in my opinion the function call tokens fit the intention for `CONTROL` tokens.
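To make the distinction concrete, here is a rough, self-contained sketch of the two attributes as described above; the enum, its values, and the helper function are illustrative stand-ins, not the actual convert script, the proposed patch, or llama.cpp internals:

```python
from enum import IntEnum

# Illustrative stand-in for the token-type enum used by the convert script;
# the numeric values are placeholders, not the real gguf constants.
class SentencePieceTokenTypes(IntEnum):
    NORMAL = 1
    CONTROL = 2
    USER_DEFINED = 3

# Function-call related special tokens mentioned in this thread.
FUNC_CALL_TOKENS = {"<|plugin|>", "<|action_start|>", "<|action_end|>"}

def classify(token_text: str, is_special: bool) -> SentencePieceTokenTypes:
    """Hypothetical classification rule, for illustration only.

    CONTROL tokens carry special meaning (e.g. <|im_start|>) and are only
    matched in the input when parse_special is true; USER_DEFINED tokens are
    always pre-tokenized, which is why marking the function-call tokens this
    way would make injections from plain text containing them unavoidable.
    """
    if not is_special:
        return SentencePieceTokenTypes.NORMAL
    if token_text in FUNC_CALL_TOKENS:
        # This is the choice argued against above; CONTROL arguably fits better.
        return SentencePieceTokenTypes.USER_DEFINED
    return SentencePieceTokenTypes.CONTROL
```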
Why not show all special tokens, like llama-cli --special does?
Would this work?
```diff
diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index badeb912..7813a295 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1182,7 +1182,7 @@ struct server_context {

     bool process_token(completion_token_output & result, server_slot & slot) {
         // remember which tokens were sampled - used for repetition penalties during sampling
-        const std::string token_str = llama_token_to_piece(ctx, result.tok, false);
+        const std::string token_str = llama_token_to_piece(ctx, result.tok, params.special);
         slot.sampled = result.tok;

         // search stop word and delete it
```

> Now `llama-cli` works with `--special`, but `llama-server` does not work with `--special`.
This is because llama-cli handles it here:
@compilade thanks for your quick response. Yes, ideally we want to hide control tokens and show the function-call related tokens. But since `<|plugin|>` can appear in the input, there might be a problem. So `llama-server` with `--special` works for me. @apresence hi, what do you think, as the actual user?
Yes, I believe that works! I'm happy to test it once a fix is available.
@compilade @ggerganov hi, guys. What's the right way to include special tokens in the input when using `llama-cli` and `llama-server`? I found something interesting.
The sys prompt, which contains special tokens, is:

```
<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]<|im_end|>\n
```
It does not work when passing the sys prompt with `--prompt` while starting the server with `llama-server`, but it does work with `--system-prompt-file` when the prompt is put in a local file. Besides, it also does not work when you put special tokens in the messages, like this:
```python
from openai import OpenAI

client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url='http://localhost:8080/v1'
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": 'You are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]'},
        {"role": "user", "content": "I want to know today's weather in Shanghai"},
    ],
    temperature=0.8,
    top_p=0.8
)
print(response)
```
Fixed in #8553
Fix function call tokens not being shown when calling `llama-server` for internlm2.

Related issue #8405