
Misc. bug: b8143 produces garbage on Mac x86 Vulkan with AMD GPU #20029

@deniskokarev

Description


Name and Version

I have a Mac x86 with a Radeon 6900 XT (RDNA2) GPU and self-built llama.cpp with Vulkan support. Not super fast, but usable since at least b6431, until things broke recently in b8143.

b8142 works, and mistral-11b-omnimix-bf16.Q8_0.gguf produces sensible output.

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server, llama-cli

Command line

Both llama-cli and llama-server produce garbage:

% ./build/bin/llama-cli --ctx-size 8192 -m ~/shared/models/mistral-11b-omnimix-bf16.Q8_0.gguf -dev Vulkan0 -p "what is population of Russian capital"
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.028 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = false
ggml_metal_device_init: has unified memory    = false
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = false
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 17163.09 MB
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8143-aa6f918c1
model      : mistral-11b-omnimix-bf16.Q8_0.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> what is population of Russian capital

the capital of dévelopment ofץ

[ Prompt: 15.3 t/s | Generation: 41.0 t/s ]

Problem description & steps to reproduce

Running any question more complex than just "Hi" produces garbage:

./build/bin/llama-cli --ctx-size 8192 -m ~/shared/models/mistral-11b-omnimix-bf16.Q8_0.gguf -dev Vulkan0 -p "what is population of Russian capital"

First Bad Commit

b8143 is broken; b8142 still works fine.
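Since the regression window is a single build, the offending change can be narrowed down with the release tags. A minimal sketch, assuming a llama.cpp clone where the release tags b8142 (good) and b8143 (bad) are present:

```shell
cd llama.cpp

# List the commit(s) that landed between the two consecutive builds;
# for adjacent build tags this may already name a single suspect.
git log --oneline b8142..b8143

# For a wider range, a standard bisect: mark bad first, then good.
git bisect start b8143 b8142

# At each bisect step: rebuild with Vulkan and re-run the failing prompt.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
./build/bin/llama-cli --ctx-size 8192 \
  -m ~/shared/models/mistral-11b-omnimix-bf16.Q8_0.gguf \
  -dev Vulkan0 -p "what is population of Russian capital"

# Mark the result and repeat until git names the first bad commit.
git bisect good   # or: git bisect bad
git bisect reset  # when finished
```

The build string `b8143-aa6f918c1` in the log above suggests the bad build corresponds to commit aa6f918c1, so the `git log` step may make a full bisect unnecessary.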

Relevant log output

Logs
> what is population of Russian capital

the capital of dévelopment ofץ


    Labels

    AMD GPU (Issues specific to AMD GPUs)
    Vulkan (Issues specific to the Vulkan backend)
    bug-unconfirmed
    macos (Issues specific to macOS)
