
Misc. bug: GLM-4.7-Flash (Vulkan on 7900 XTX) inference faster @d8192 than @d4096 or @d0 #19255

@Nindaleth

Description

@Nindaleth

Name and Version

version: 7901 (8a98ba4)
built with GNU 15.2.1 for Linux x86_64

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-bench

Command line

./build/bin/llama-bench --model ./models/GLM-4.7-Flash-IQ4_XS.gguf -ngl 99 -fa 0,1 -d 0,4096,8192

Problem description & steps to reproduce

With FA enabled, I generally get worse prompt-processing (PP) performance with Vulkan than with ROCm, but PP is not what this report is about. What is interesting is token generation (TG): a smaller pre-filled context results in worse TG throughput than the largest one, with d4096 being the slowest of the three depths (see the logs below for the full table, including fa=0 and ROCm).

This is on Fedora 43, 7900 XTX, ROCm 6.4.2, Vulkan 1.4.342, Mesa RADV 26.1.0-0.5.gita8fac76:

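| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |        102.97 ± 0.08 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |         41.81 ± 0.01 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d8192 |        117.59 ± 0.07 |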
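Every generated token has to attend over all tokens already in the KV cache, so one would expect tg128 throughput to stay flat or drop as the depth grows, never rise. A minimal Python sketch of that sanity check, using the Vulkan fa=1 numbers above; `find_rises` is a hypothetical helper written for this illustration, not part of llama.cpp:

```python
from itertools import combinations

# Vulkan, fa=1, tg128 throughput in tokens/s, copied from the table above.
tg_by_depth = {0: 102.97, 4096: 41.81, 8192: 117.59}


def find_rises(by_depth):
    """Return every (shallower, deeper) depth pair where t/s goes UP with depth.

    Assumes per-token cost only grows with context depth, so a backend
    behaving as expected should produce an empty list.
    """
    depths = sorted(by_depth)
    return [(a, b) for a, b in combinations(depths, 2) if by_depth[b] > by_depth[a]]


for shallow, deep in find_rises(tg_by_depth):
    ratio = tg_by_depth[deep] / tg_by_depth[shallow]
    print(f"tg128 @ d{deep} is {ratio:.2f}x tg128 @ d{shallow}")
```

Applied to the ROCm tg128 rows from the logs below (either fa setting), the same check returns no pairs.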

First Bad Commit

No response

Relevant log output

Logs
# device info on ROCm
  Device 0: Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
# device info on Vulkan
ggml_vulkan: 0 = Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
# reproducible results
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |           pp512 |      2590.64 ± 22.66 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |           pp512 |        916.86 ± 2.69 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |           tg128 |         83.17 ± 0.03 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |           tg128 |         74.51 ± 0.06 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |   pp512 @ d4096 |      1146.33 ± 20.77 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |   pp512 @ d4096 |        272.53 ± 2.52 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |   tg128 @ d4096 |         44.94 ± 0.02 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |   tg128 @ d4096 |         23.02 ± 0.12 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |   pp512 @ d8192 |        734.88 ± 5.89 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |   pp512 @ d8192 |        341.09 ± 1.12 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |   tg128 @ d8192 |         27.81 ± 0.02 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |   tg128 @ d8192 |         73.74 ± 0.04 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |           pp512 |      2684.54 ± 19.55 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |        786.92 ± 1.40 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |           tg128 |         89.74 ± 0.08 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |        102.97 ± 0.08 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |   pp512 @ d4096 |       1202.67 ± 3.16 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |   pp512 @ d4096 |        748.20 ± 2.73 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |   tg128 @ d4096 |         82.65 ± 0.07 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |         41.81 ± 0.01 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |   pp512 @ d8192 |        775.44 ± 2.65 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |   pp512 @ d8192 |        637.03 ± 3.20 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |   tg128 @ d8192 |         76.71 ± 0.03 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d8192 |        117.59 ± 0.07 |
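To see whether the anomaly is specific to Vulkan, and whether it depends on FA, the tg128 rows above can be grouped by backend and fa setting. A minimal Python sketch, assuming the llama-bench markdown column order shown in this table (model, size, params, backend, ngl, fa, test, t/s); `tg_series` and `is_non_increasing` are hypothetical helpers, not llama.cpp code:

```python
import re
from collections import defaultdict

# The tg128 rows from the llama-bench table above (pp512 rows omitted).
TABLE = """\
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |           tg128 |         83.17 ± 0.03 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |           tg128 |         74.51 ± 0.06 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |   tg128 @ d4096 |         44.94 ± 0.02 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |   tg128 @ d4096 |         23.02 ± 0.12 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  0 |   tg128 @ d8192 |         27.81 ± 0.02 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  0 |   tg128 @ d8192 |         73.74 ± 0.04 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |           tg128 |         89.74 ± 0.08 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |        102.97 ± 0.08 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |   tg128 @ d4096 |         82.65 ± 0.07 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |         41.81 ± 0.01 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | ROCm       |  99 |  1 |   tg128 @ d8192 |         76.71 ± 0.03 |
| deepseek2 30B.A3B IQ4_XS - 4.25 bpw |  15.15 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d8192 |        117.59 ± 0.07 |
"""


def tg_series(table):
    """Group tg results as {(backend, fa): {depth: mean t/s}}."""
    series = defaultdict(dict)
    for line in table.strip().splitlines():
        # Drop the outer pipes, then split into the 8 columns.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        backend, fa, test, ts = cells[3], int(cells[5]), cells[6], cells[7]
        if not test.startswith("tg"):
            continue
        m = re.search(r"@ d(\d+)", test)
        depth = int(m.group(1)) if m else 0  # no "@ dN" suffix means depth 0
        series[(backend, fa)][depth] = float(ts.split("±")[0])  # keep the mean only
    return series


def is_non_increasing(by_depth):
    """True if t/s never goes up as depth grows."""
    values = [by_depth[d] for d in sorted(by_depth)]
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


for (backend, fa), by_depth in sorted(tg_series(TABLE).items()):
    verdict = "ok" if is_non_increasing(by_depth) else "t/s rises with depth"
    rel = ", ".join(f"d{d} {by_depth[d] / by_depth[0]:.2f}x" for d in sorted(by_depth))
    print(f"{backend:<6} fa={fa}: {rel} -> {verdict}")
```

On these rows only the two Vulkan series fail the check; both ROCm series decrease monotonically with depth.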
