Fix: Revert showing control tokens by default for server OpenAI Chat completions #6860
I provided an alternative solution that reverts the change to the default behavior of `llama_token_to_piece`, and added an overridden declaration of `llama_token_to_piece` in `common/common.cpp` that takes a `bool special` param to toggle showing control tokens.
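For reference, a minimal sketch of what such an overridden declaration could look like, reconstructed from this PR's commit messages; the exact signature and body in the merged code may differ, and the core C API `llama_token_to_piece(model, token, buf, length, special)` is assumed to be the form introduced in #6807:

```cpp
// common.h - overridden declaration that takes a "bool special" param to
// toggle rendering of control tokens (sketch; may differ from the merged code)
std::string llama_token_to_piece(const struct llama_context * ctx, llama_token token, bool special);

// common.cpp - forwards to the core C API, passing the flag through
std::string llama_token_to_piece(const struct llama_context * ctx, llama_token token, bool special) {
    std::vector<char> result(8, 0);
    const int n_tokens = llama_token_to_piece(llama_get_model(ctx), token, result.data(), result.size(), special);
    if (n_tokens < 0) {
        // a negative return value is the required buffer size: resize and retry
        result.resize(-n_tokens);
        const int check = llama_token_to_piece(llama_get_model(ctx), token, result.data(), result.size(), special);
        GGML_ASSERT(check == -n_tokens);
    } else {
        result.resize(n_tokens);
    }
    return std::string(result.data(), result.size());
}
```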
I am open to comments, concerns, and/or complaints about whether this is the correct way to fix this problem.
Hmm, I am not sure what happened, but on the most up-to-date llama.cpp …
…g#6860)

* fix: revert showing control tokens by default
* feat: revert changes to default behavior of llama_token_to_piece; provide overridden declaration to receive "bool special" param to toggle showing control tokens
* feat: use the overridden declaration of llama_token_to_piece from common/common.cpp to specify "false" so that control tokens are not shown in chat completion responses
* common : simplify

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

In #6807 @ggerganov added the ability to toggle showing control tokens (e.g. EOS tokens). In `common.cpp` this was set to `true` by default in two places, which broke the `/v1/chat/completions` endpoint as described in #6859 - in short, the OpenAI chat completions endpoint response now includes the EOS / stop token, which is different from past / expected behavior. I have confirmed that reverting the booleans to `false` in the two places in `common.cpp` fixes this behavior.

While this PR fixes the breaking change, it may affect behavior that depends on #6807's new default of `true` in other places. This may need to be investigated further, but I propose reverting the change for now to fix the broken `/v1/chat/completions` behavior.

s/o @QueryType for opening #6847 as well, which was caused by the same underlying issue.
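To make the effect of those two `false` booleans concrete, here is a minimal sketch of a server-side detokenization loop using the overridden wrapper; the helper name `detokenize_for_chat` is hypothetical and not taken from the PR, only the `llama_token_to_piece(ctx, token, special)` wrapper signature is:

```cpp
#include <string>
#include <vector>

#include "common.h"

// Hypothetical helper (not the PR's actual code): when building the text for a
// /v1/chat/completions response, pass special=false so that control tokens
// such as the EOS / stop token are not rendered into the output.
static std::string detokenize_for_chat(llama_context * ctx, const std::vector<llama_token> & tokens) {
    std::string out;
    for (const llama_token tok : tokens) {
        out += llama_token_to_piece(ctx, tok, /*special=*/false);
    }
    return out;
}
```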
API Response before the change (ChatML model):
API Response before the change (Mistral model / llama2 template):
Correct API response after this change:
(note the absence of control tokens)
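The actual response bodies are not reproduced here, so the following is an illustrative reconstruction rather than the PR's real output; the message text is invented, and only the leaked control tokens (`<|im_end|>` for ChatML, `</s>` for Mistral / llama2 templates) reflect the reported issue. Before the fix, the EOS token appeared in the `content` field:

```json
{"choices": [{"message": {"role": "assistant", "content": "Hello! How can I help you today?<|im_end|>"}}]}
```

After the fix, control tokens are no longer rendered:

```json
{"choices": [{"message": {"role": "assistant", "content": "Hello! How can I help you today?"}}]}
```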