Name and Version
I have an x86 Mac with a Radeon RX 6900 XT (RDNA2) GPU running self-built llama.cpp with Vulkan support. Not super fast, but usable since at least b6431. Things broke recently at b8143: b8142 still works, and mistral-11b-omnimix-bf16.Q8_0.gguf produces sensible output there.
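For reference, the exact build can be confirmed with the --version flag (the expected output here is illustrative, matching the build string in the log below):

% ./build/bin/llama-cli --version
# expected: version: 8143 (aa6f918c1)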
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server, llama-cli
Command line
Both llama-cli and llama-server produce garbage:
% ./build/bin/llama-cli --ctx-size 8192 -m ~/shared/models/mistral-11b-omnimix-bf16.Q8_0.gguf -dev Vulkan0 -p "what is population of Russian capital"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.028 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = false
ggml_metal_device_init: has unified memory = false
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = false
ggml_metal_device_init: recommendedMaxWorkingSetSize = 17163.09 MB
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b8143-aa6f918c1
model : mistral-11b-omnimix-bf16.Q8_0.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file
> what is population of Russian capital
the capital of dévelopment ofץ
[ Prompt: 15.3 t/s | Generation: 41.0 t/s ]
Problem description & steps to reproduce
Running any prompt more complex than a simple "Hi" produces garbage:
./build/bin/llama-cli --ctx-size 8192 -m ~/shared/models/mistral-11b-omnimix-bf16.Q8_0.gguf -dev Vulkan0 -p "what is population of Russian capital"
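A cross-check that may help narrow this down: run the same prompt CPU-only. If the CPU output is sensible while the Vulkan0 output is garbage, the regression is in the Vulkan backend rather than the model or tokenizer (assuming -dev none disables all GPU offload, as documented; -ngl 0 should behave equivalently):

% ./build/bin/llama-cli --ctx-size 8192 -m ~/shared/models/mistral-11b-omnimix-bf16.Q8_0.gguf \
    -dev none -p "what is population of Russian capital"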
First Bad Commit
b8143 is broken; b8142 still works fine.
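Since llama.cpp tags each release build as bNNNN, the offending change can be pinpointed from the two build numbers (a sketch; if more than one commit landed between the tags, bisect with a rebuild and retest at each step):

% git log --oneline b8142..b8143
# if several commits fall in that range:
% git bisect start b8143 b8142
% cmake -B build -DGGML_VULKAN=ON && cmake --build build -j
# retest the prompt, then mark: git bisect good / git bisect bad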
Relevant log output
Logs
> what is population of Russian capital
the capital of dévelopment ofץ