
Eval bug: gemma-4-26b-a4b-it-turbo using tbq4_0 is insane (and gemma3 fails to load correctly) #17

@zekrom-vale

Description


Name and Version

zekromllm@zekromllm:/HomeLab/GPU/TurboQuantAmesianX$ docker compose run --rm --remove-orphans --entrypoint llama-cli turboa-llama-cpp   --model /models/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf   --mmproj /models/mmproj-google_gemma-4-26B-A4B-it-f16.gguf   --flash-attn on   --mlock   --no-warmup   --parallel 4   --n-gpu-layers -1   --n-cpu-moe 18   --cache-type-k tbq4_0   --cache-type-v tbq4_0   --threads 12   --ctx-size 128000   --batch-size 4096   --ubatch-size 1024   --n-predict 4096   -p "Why is the sky blue?" --version
Container turboquantamesianx-turboa-llama-cpp-run-7a90e7ab92d8 Creating 
Container turboquantamesianx-turboa-llama-cpp-run-7a90e7ab92d8 Created 
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15842 MiB):
  Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes, VRAM: 15842 MiB
version: 8702 (cf2170e5f)
built with GNU 13.3.0 for Linux x86_64

Version 1.5.2

Operating systems

Docker Compose

Builder: nvidia/cuda:13.1.1-devel-ubuntu24.04
Runtime: nvidia/cuda:13.1.1-runtime-ubuntu24.04

docker-compose.yml

Dockerfile.txt

GGML backends

CUDA

Hardware

Ryzen 9900x + 5070Ti @PCIe5 + 64GB DDR5

Models

gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
gemma-3-27b-it-IQ4_XS.gguf https://huggingface.co/unsloth/gemma-3-27b-it-GGUF

Problem description & steps to reproduce

When I set --cache-type-k tbq4_0 --cache-type-v tbq4_0 for Gemma4 26B A4B Q6_K_XL, it produces gibberish, just as it did with TheTom's version using turbo4. This happens even without the mmproj.

docker compose run --rm --entrypoint llama-cli turboa-llama-cpp \
  --model /models/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --mmproj /models/mmproj-google_gemma-4-26B-A4B-it-f16.gguf \
  --flash-attn on \
  --mlock \
  --no-warmup \
  --parallel 4 \
  --n-gpu-layers -1 \
  --n-cpu-moe 18 \
  --cache-type-k tbq4_0 \
  --cache-type-v tbq4_0 \
  --threads 12 \
  --ctx-size 128000 \
  --batch-size 4096 \
  --ubatch-size 1024 \
  --n-predict 4096 \
  -p "Why is the sky blue?"
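As an extra control run (not something I originally captured), the same invocation with the cache-type flags dropped, so the KV cache falls back to the default f16, should isolate the tbq4_0 quantization as the cause; everything else is kept identical:

```shell
# Hypothetical control run: identical to the failing command above, but with
# --cache-type-k / --cache-type-v removed so the KV cache uses the default f16.
# If this produces coherent output, the gibberish is isolated to tbq4_0.
docker compose run --rm --entrypoint llama-cli turboa-llama-cpp \
  --model /models/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --flash-attn on \
  --mlock \
  --no-warmup \
  --parallel 4 \
  --n-gpu-layers -1 \
  --n-cpu-moe 18 \
  --threads 12 \
  --ctx-size 128000 \
  --batch-size 4096 \
  --ubatch-size 1024 \
  --n-predict 4096 \
  -p "Why is the sky blue?"
```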

Note: I also tried to run Gemma3 and it failed to init at all; no errors, it just stopped. It works fine with q4_0:

zekromllm@zekromllm:/HomeLab/GPU/TurboQuantAmesianX$ docker compose run --rm --remove-orphans --entrypoint llama-cli turboa-llama-cpp \
  --model /models/gemma-3-27b-it-IQ4_XS.gguf \
  --fit off \
  --flash-attn on \
  --mlock \
  --no-warmup \
  --parallel 1 \
  --n-gpu-layers 50 \
  --cache-type-k tbq3_0 \
  --cache-type-v tbq3_0 \
  --threads 12 \
  --ctx-size 16000 \
  --batch-size 2048 \
  --ubatch-size 512 \
  -p "Why is the sky blue?"
Container turboquantamesianx-turboa-llama-cpp-run-f4ce0c6cce1b Creating 
Container turboquantamesianx-turboa-llama-cpp-run-f4ce0c6cce1b Created 
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15842 MiB):
  Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes, VRAM: 15842 MiB

Loading model... /

First Bad Commit

I am not sure, though it already appeared to be unstable as of version 1.5.0.

Relevant log output

Logs
zekromllm@zekromllm:/HomeLab/GPU/TurboQuantAmesianX$ docker compose run --rm --remove-orphans --entrypoint llama-cli turboa-llama-cpp \
  --model /models/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --mmproj /models/mmproj-google_gemma-4-26B-A4B-it-f16.gguf \
  --flash-attn on \
  --mlock \
  --no-warmup \
  --parallel 4 \
  --n-gpu-layers -1 \
  --n-cpu-moe 18 \
  --cache-type-k tbq4_0 \
  --cache-type-v tbq4_0 \
  --threads 12 \
  --ctx-size 128000 \
  --batch-size 4096 \
  --ubatch-size 1024 \
  --n-predict 4096 \
  -p "Why is the sky blue?"
Container turboquantamesianx-turboa-llama-cpp-run-5f31db41abcb Creating 
Container turboquantamesianx-turboa-llama-cpp-run-5f31db41abcb Created 
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15842 MiB):
  Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes, VRAM: 15842 MiB

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8702-cf2170e5f
model      : gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf
modalities : text, vision

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern
  /image <file>       add an image file


> Why is the sky blue?

瑞 de- actually-----—--—–--------—-
10.-10-1-1-1--1-1--1-0-0--1-1--1--1--1-1--1--1--1--1--1--1--1-0-1--1--1-0--1-1--1--1--1--1--1-1--1-1--1--1--1--1--1-1--1--1--1-1--1--1-1--1--1--1--—-— [1-10-10-10--10-10-0-1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1-1--1--1-1-1--1--1--1--1--1---1--1--1--1--1-

[ Prompt: 105.0 t/s | Generation: 46.7 t/s ]


zekromllm@zekromllm:/HomeLab/GPU/TurboQuantAmesianX$ docker compose run --rm --remove-orphans --entrypoint llama-cli turboa-llama-cpp \
  --model /models/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --fit off \
  --flash-attn on \
  --mlock \
  --no-warmup \
  --parallel 4 \
  --n-gpu-layers -1 \
  --n-cpu-moe 18 \
  --cache-type-k tbq4_0 \
  --cache-type-v tbq4_0 \
  --threads 12 \
  --ctx-size 128000 \
  --batch-size 4096 \
  --ubatch-size 1024 \
  --n-predict 4096 \
  -p "Why is the sky blue?"
Container turboquantamesianx-turboa-llama-cpp-run-b5c9ba13613d Creating 
Container turboquantamesianx-turboa-llama-cpp-run-b5c9ba13613d Created 
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15842 MiB):
  Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes, VRAM: 15842 MiB

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8702-cf2170e5f
model      : gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> Why is the sky blue?

horizon-100-v-1.0-11.-10-10-0-0-0-0-0-0-0-0-0-0-0-0-0--0-0-0-0-0-0-0-0-0-0-0--10-10-0-0-0-0-0-0-0--0-0-0-0-0-0-0--

Expected result

zekromllm@zekromllm:/HomeLab/GPU/TurboQuantAmesianX$ docker compose run --rm --remove-orphans --entrypoint llama-cli turboa-llama-cpp \
  --model /models/gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf \
  --fit off \
  --flash-attn on \
  --mlock \
  --no-warmup \
  --parallel 4 \
  --n-gpu-layers -1 \
  --n-cpu-moe 18 \
  --cache-type-k tbq3_0 \
  --cache-type-v tbq3_0 \
  --threads 12 \
  --ctx-size 128000 \
  --batch-size 4096 \
  --ubatch-size 1024 \
  --n-predict 4096 \
  -p "Why is the sky blue?"
Container turboquantamesianx-turboa-llama-cpp-run-45926ad6657a Creating 
Container turboquantamesianx-turboa-llama-cpp-run-45926ad6657a Created 
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15842 MiB):
  Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes, VRAM: 15842 MiB

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8702-cf2170e5f
model      : gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> Why is the sky blue?

[Start thinking]
"Why is the sky blue?"
Science/Physics/Atmospheric science.
General audience (needs to be clear, accurate, but not overly dense with jargon unless explained).

    *   Sunlight (white light) is made of all colors of the rainbow (ROYGBIV).
    *   Light travels as waves.
    *   Different colors have different wavelengths (Red = long, Blue/Violet = short).
    *   Earth's atmosphere is filled with gases (Nitrogen, Oxygen).
    *   *Rayleigh Scattering:* When light hits gas molecules, shorter wavelengths (blue) scatter more easily in all directions than longer wavelengths (red).
    *   The human eye is more sensitive to blue than violet

Gemma4tbq4.txt

gemma3load.txt
