Summary: The --no-cnv and --simple-io flags are ignored by the binary, forcing the model into conversation mode and causing garbled output. The only workaround is to manually format prompts in ChatML.
Steps to reproduce:
1. Build the current `main` branch (commit 96ac5a2) with ROCm/HIP support: `cmake .. -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release`
2. Run a model that ships a chat template (e.g. Dolphin) with the flag meant to disable conversation mode: `./bin/llama-cli -m dolphin-2.9.4-llama3.1-8b-Q4_K_M.gguf -p "Hello" -n 50 --ctx-size 4096 -ngl 25 --temp 0.7 --no-cnv`
3. Observe the logs: `main: chat template is available, enabling conversation mode (disable it with -no-cnv)` followed by `main: interactive mode on.`, then garbled output (e.g. `B!5F%+@F.C)...`).
The --simple-io flag exhibits the same behavior.
Workaround: Manually formatting the prompt in ChatML (`<|im_start|>system...`) produces correct text, proving the model and GPU work fine.
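The workaround can be sketched as a shell snippet. The system-prompt text below is illustrative; only the ChatML markers (`<|im_start|>`, `<|im_end|>`) matter, and the flags are the ones from the failing command above:

```shell
# Build a ChatML-formatted prompt by hand (system prompt text is illustrative).
PROMPT='<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
'
printf '%s' "$PROMPT"

# Pass it directly instead of relying on --no-cnv (model path as in the report):
#   ./bin/llama-cli -m dolphin-2.9.4-llama3.1-8b-Q4_K_M.gguf \
#       -p "$PROMPT" -n 50 --ctx-size 4096 -ngl 25 --temp 0.7
```

With the template applied by hand, the model completes the turn normally, which is what points the blame at the flag handling rather than the HIP backend.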
Expected behavior: The --no-cnv flag should disable conversation mode and allow simple prompt completion without requiring manual ChatML formatting.
First Bad Commit
No response
Relevant log output
Command: ./bin/llama-cli -m /home/twm/ai_models/dolphin-2.9.4-llama3.1-8b-Q4_K_M.gguf -p "Hello" -n 50 --ctx-size 4096 -ngl 25 --temp 0.7 --no-cnv
Key log lines:
...
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
main: interactive mode on.
...
user
Hello
assistant
B!5F%+@F.C)#8-")420D'0<<,C8<8<'GB(HB'&098&!,G+7#)2
Name and Version
Build: 7134 (Commit: 96ac5a2)
Built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
HIP
Hardware
AMD Radeon PRO W6800 (gfx1030) + AMD Radeon Graphics (gfx1036)
Models
1. deepseek-coder-33b-base.Q4_K_M.gguf
2. deepseek-llm-7b-chat.Q5_K_M.gguf
3. dolphin-2.9.4-llama3.1-8b-Q4_K_M.gguf
4. MS3.2-PaintedFantasy-24B.Q4_K_M.gguf
5. Qwen2.5-VL-32B-instruct-Q4_K_M.gguf
6. WizardLM-13B-Uncensored-Q4_K_M.gguf