1.99.1 Cannot Detect VRAM Properly with AMD Integrated GPU on Vulkan #1748

@lovenemesis

Description

@lovenemesis

Describe the Issue
After upgrading to the 1.99.1 release, koboldcpp cannot reliably detect the shared VRAM exposed via UMA with the Vulkan backend when launched with the `--gpulayers -1` argument.

It worked correctly on the 1.98.1 release.

Additional Information:

With 1.99.1 release:

Welcome to KoboldCpp - Version 1.99.1
Loading Chat Completions Adapter: /tmp/_MEIam6bjU/kcpp_adapters/AutoGuess.json
Chat Completions Adapter Loaded
Detected AMD GPU VRAM from rocminfo: [('AMD Radeon 780M Graphics', '23983')] MB
Unable to detect VRAM, please set layers manually.
System: Linux #1 SMP PREEMPT_DYNAMIC Thu Sep 11 17:46:54 UTC 2025 x86_64 
Detected Available GPU Memory: 0 MB
Detected Available RAM: 43351 MB
Initializing dynamic library: koboldcpp_vulkan.so
...
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: relocated tensors: 1 of 399
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/37 layers to GPU
load_tensors:  Vulkan_Host model buffer size =  2750.40 MiB
load_tensors:          CPU model buffer size =   304.28 MiB

With 1.98.1 release:

Welcome to KoboldCpp - Version 1.98.1
Loading Chat Completions Adapter: /tmp/_MEId9PzAz/kcpp_adapters/AutoGuess.json
Chat Completions Adapter Loaded
Auto Recommended GPU Layers: 39
System: Linux #1 SMP PREEMPT_DYNAMIC Thu Sep 11 17:46:54 UTC 2025 x86_64 
Detected Available GPU Memory: 16384 MB
Unable to determine available RAM
Initializing dynamic library: koboldcpp_vulkan.so
...
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: relocated tensors: 1 of 399
load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors:      Vulkan0 model buffer size =  2749.97 MiB
load_tensors:          CPU model buffer size =   304.28 MiB

Note the difference in the detected GPU memory at launch and in the number of layers finally offloaded to the GPU.

Yes, I can manually specify the number of layers to offload (in this case, `--gpulayers 37`) with 1.99.1, and it works just as before.
But that defeats the purpose of autodetection.
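For context, the 1.99.1 log shows the rocminfo probe itself succeeding (23983 MB reported for the Radeon 780M) before the "Unable to detect VRAM" message, so the regression appears to be in how that value is consumed rather than in querying the GPU. A minimal sketch of extracting a UMA pool size from rocminfo-style text, purely illustrative and not koboldcpp's actual code (the sample output and regex are assumptions based on typical rocminfo formatting):

```python
import re

def parse_rocminfo_vram_mb(rocminfo_output: str) -> int:
    """Extract the first memory pool 'Size: ... KB' entry from rocminfo-style
    text and convert it from KB to MB. Returns 0 when nothing matches,
    mirroring the 'Detected Available GPU Memory: 0 MB' failure mode.
    The expected line format is an assumption, not a guaranteed rocminfo API."""
    match = re.search(r"Size:\s*(\d+)\s*\(0x[0-9a-fA-F]+\)\s*KB", rocminfo_output)
    if not match:
        return 0
    return int(match.group(1)) // 1024  # KB -> MB, truncating

# Hypothetical sample resembling a rocminfo GLOBAL pool entry on a UMA iGPU
sample = """
  Pool 1
    Segment:                 GLOBAL; FLAGS: COARSE GRAINED
    Size:                    24559152(0x176be30) KB
"""
print(parse_rocminfo_vram_mb(sample))  # 23983
```

If 1.99.1 parses the rocminfo value correctly but then discards it for the Vulkan backend (e.g. because a Vulkan-specific probe returned 0 on the UMA device), the fallback order between the two probes would be the place to look.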
