Fix func call tokens for internlm2 #8506
Conversation
```python
if foken_data.get("special") and not foken_data["content"] in func_call_tokens:
    toktypes[token_id] = SentencePieceTokenTypes.CONTROL
```
I'm not sure these tokens should be marked as USER_DEFINED; this would mean they would always be specially pre-tokenized even if parse_special is false, making it impossible to avoid injections from text containing these tokens when not desired. This is related to #8228.
If this is simply a display issue, then it might be more appropriate to revisit whether to detokenize the control tokens output by the model.
These may be relevant:
@compilade hi, these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if `parse_special` is false. The following is an example. Now `llama-cli` works with `--special`, but `llama-server` does not work with `--special`. Ideally, only these function-call related special tokens would be shown.
```python
tool_calls = None
if request.tool_choice != 'none' and '<|plugin|>' in text:
    if final_res.finish_reason == 'stop':
        final_res.finish_reason = 'tool_calls'
    # TODO may move to generate function
    text, action = text.split('<|action_start|><|plugin|>')
    action = action.split('<|action_end|>'.strip())[0]
    action = action[action.find('{'):]
```
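For completeness, here is a minimal sketch of how the extracted `action` string might then be turned into a structured tool call. This is an assumed continuation for illustration, not part of the quoted code; the sample payload reuses the `get_current_weather` tool from later in this thread, and the output field names are made up:

```python
import json

# Hypothetical payload after the '<|action_start|><|plugin|>' split above;
# the exact JSON shape is an assumption for illustration.
action = '{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}'

parsed = json.loads(action)
tool_call = {
    "name": parsed["name"],
    "arguments": parsed.get("parameters", {}),
}
print(tool_call)  # {'name': 'get_current_weather', 'arguments': {'location': 'Shanghai'}}
```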
@RunningLeon Thanks for giving an example.
> these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if `parse_special` is false.
If I understand correctly, what you want is to get the function call tokens to render in the output, right? Pre-tokenization is about the input. If these tokens were pre-tokenized even when parse_special is false, this means it would be impossible to include <|plugin|> in some non-special text without the model seeing it as the <|plugin|> token.
For example:
```console
$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?"
     1 -> '<s>'
  3993 -> 'What'
   505 -> ' is'
   395 -> ' a'
   262 -> ' '
 92538 -> '<|plugin|>'
   345 -> '?'
$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?" --no-parse-special
     1 -> '<s>'
  3993 -> 'What'
   505 -> ' is'
   395 -> ' a'
   497 -> ' <'
   352 -> '|'
  9267 -> 'plugin'
   352 -> '|'
 46973 -> '>?'
```

If the problem is about the output of `llama-server`, this should be fixable by changing how it calls the `llama_token_to_piece` function.
> Ideally, only these function-call related special tokens would be shown.
If you want to hide control tokens while still showing these ones, then... hmm. This seems complicated to do with the current token attributes (`USER_DEFINED` and `CONTROL`): `USER_DEFINED` is intended for always-pre-tokenized tokens like the multi-space tokens in GPT-NeoX, while `CONTROL` is intended for tokens with special meaning, like `<|im_start|>`, and in my opinion the function call tokens fit the intention for `CONTROL` tokens.
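To make the distinction concrete, here is a rough, self-contained sketch of the two attributes as described above; the enum, its values, and the helper function are illustrative stand-ins, not the actual convert script, the proposed patch, or llama.cpp internals:

```python
from enum import IntEnum

# Illustrative stand-in for the token-type enum used by the convert script;
# the numeric values are placeholders, not the real gguf constants.
class SentencePieceTokenTypes(IntEnum):
    NORMAL = 1
    CONTROL = 2
    USER_DEFINED = 3

# Function-call related special tokens mentioned in this thread.
FUNC_CALL_TOKENS = {"<|plugin|>", "<|action_start|>", "<|action_end|>"}

def classify(token_text: str, is_special: bool) -> SentencePieceTokenTypes:
    """Hypothetical classification rule, for illustration only.

    CONTROL tokens carry special meaning (e.g. <|im_start|>) and are only
    matched in the input when parse_special is true; USER_DEFINED tokens are
    always pre-tokenized, which is why marking the function-call tokens this
    way would make injections from plain text containing them unavoidable.
    """
    if not is_special:
        return SentencePieceTokenTypes.NORMAL
    if token_text in FUNC_CALL_TOKENS:
        # This is the choice argued against above; CONTROL arguably fits better.
        return SentencePieceTokenTypes.USER_DEFINED
    return SentencePieceTokenTypes.CONTROL
```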
Why not show all special tokens, like llama-cli --special does?
Would this work?
```diff
diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index badeb912..7813a295 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1182,7 +1182,7 @@ struct server_context {

     bool process_token(completion_token_output & result, server_slot & slot) {
         // remember which tokens were sampled - used for repetition penalties during sampling
-        const std::string token_str = llama_token_to_piece(ctx, result.tok, false);
+        const std::string token_str = llama_token_to_piece(ctx, result.tok, params.special);
         slot.sampled = result.tok;

         // search stop word and delete it
```

> Now `llama-cli` works with `--special`, but `llama-server` does not work with `--special`.
This is because llama-cli handles it here:
@compilade thanks for your quick response. Yes, ideally we want to hide control tokens and show the function-call related tokens. But since `<|plugin|>` can appear in the input, there might be a problem. So `llama-server` with `--special` works for me. @apresence hi, what do you think, as the actual user?
Yes, I believe that works! I'm happy to test it once a fix is available.
@compilade @ggerganov hi, guys. What's the right way to include special tokens in the input when using `llama-cli` and `llama-server`? I found something interesting.
The sys prompt, which contains special tokens, is:

```
<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]<|im_end|>\n
```
It does not work when passing the sys prompt with `--prompt` while starting the server with `llama-server`, but it does work with `--system-prompt-file` when the prompt is put in a local file. Besides, it also does not work when you put special tokens in the messages, like this:
```python
from openai import OpenAI

client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url='http://localhost:8080/v1'
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": 'You are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]'},
        {"role": "user", "content": "I want to know today's weather in Shanghai"},
    ],
    temperature=0.8,
    top_p=0.8
)
print(response)
```
Fixed in #8553
Fix function call tokens not being shown when calling `llama-server` for internlm2.

Related issue #8405