examples: fix utf8 decoding error by zhangfuwen · Pull Request #5935 · ggml-org/llama.cpp

zhangfuwen · 2024-03-08T09:54:04Z

Some models have a tokenizer that decodes a single token id into an incomplete UTF-8 sequence, so the decoded bytes need to be validated and held back until the next token completes the character.
One example is https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q4_0.gguf, where token 18137 decodes to such a partial sequence.
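
The validate-and-wait idea described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `is_complete_utf8` and `Utf8Accumulator` are hypothetical names, and the check only looks at lead/continuation byte layout (it does not reject overlong encodings or surrogates).

```cpp
#include <cstddef>
#include <string>

// Returns true when `s` is a complete, structurally well-formed UTF-8 byte string.
// Simplified check (assumption): byte layout only, no overlong/surrogate rejection.
static bool is_complete_utf8(const std::string & s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if      (c < 0x80)           len = 1; // ASCII
        else if ((c & 0xE0) == 0xC0) len = 2; // 110xxxxx
        else if ((c & 0xF0) == 0xE0) len = 3; // 1110xxxx
        else if ((c & 0xF8) == 0xF0) len = 4; // 11110xxx
        else return false;                    // stray continuation or invalid lead byte

        if (i + len > s.size()) return false; // sequence is cut off at the end

        for (std::size_t k = 1; k < len; ++k) {
            // every following byte must look like 10xxxxxx
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += len;
    }
    return true;
}

// Hypothetical helper: buffers decoded token pieces and releases text
// only once the buffered bytes form complete UTF-8.
struct Utf8Accumulator {
    std::string pending;

    // Appends one decoded piece. Returns the buffered text when it is complete,
    // or an empty string when more bytes are still needed.
    std::string push(const std::string & piece) {
        pending += piece;
        if (!is_complete_utf8(pending)) return "";
        std::string out;
        out.swap(pending); // hand out the text and clear the buffer
        return out;
    }
};
```

A caller would feed each decoded token piece to `push` and only forward non-empty results to the UI. Note that this sketch keeps buffering indefinitely if the bytes never become valid, so a real integration would likely want a flush or size limit.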

* examples: fix utf8 decoding error

  some models have a tokenizer that decodes an id into an incomplete utf8 sequence, need to validate and wait for next token. one example would be: https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q4_0.gguf and an example of the token is 18137

* android : minor

---------

Co-authored-by: zhangfuwen <zhangfuwen@foxmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

zhangfuwen marked this pull request as ready for review March 8, 2024 09:55

android : minor

13d21fa

ggerganov approved these changes Mar 10, 2024

ggerganov merged commit 7ab7b73 into ggml-org:master Mar 10, 2024

ggerganov merged 2 commits into ggml-org:master from zhangfuwen:bugfix
