Describe the Issue
After upgrading to the 1.99.1 release, koboldcpp can no longer reliably detect the shared VRAM available via UMA with the Vulkan backend when launched with the `--gpulayers -1` argument.
It worked fine on the 1.98.1 release.
Additional Information:
With the 1.99.1 release:
```
Welcome to KoboldCpp - Version 1.99.1
Loading Chat Completions Adapter: /tmp/_MEIam6bjU/kcpp_adapters/AutoGuess.json
Chat Completions Adapter Loaded
Detected AMD GPU VRAM from rocminfo: [('AMD Radeon 780M Graphics', '23983')] MB
Unable to detect VRAM, please set layers manually.
System: Linux #1 SMP PREEMPT_DYNAMIC Thu Sep 11 17:46:54 UTC 2025 x86_64
Detected Available GPU Memory: 0 MB
Detected Available RAM: 43351 MB
Initializing dynamic library: koboldcpp_vulkan.so
...
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: relocated tensors: 1 of 399
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/37 layers to GPU
load_tensors: Vulkan_Host model buffer size = 2750.40 MiB
load_tensors: CPU model buffer size = 304.28 MiB
```
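Note that rocminfo itself still reports the full UMA pool here, so the information is available even though 1.99.1 then falls back to "Detected Available GPU Memory: 0 MB". A minimal Python sketch of that kind of rocminfo query, to show the data is there (this is my own approximation for illustration, not koboldcpp's actual detection code):

```python
import re
import subprocess

# For each GPU agent reported by rocminfo, grab the marketing name and
# the size of its first memory pool (rocminfo prints pool sizes in KB).
# This only approximates whatever produces the "Detected AMD GPU VRAM
# from rocminfo" log line above.
out = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout

gpus = []
name, is_gpu = None, False
for line in out.splitlines():
    if m := re.match(r"\s*Marketing Name:\s*(.+)", line):
        name = m.group(1).strip()
    elif m := re.match(r"\s*Device Type:\s*(\w+)", line):
        is_gpu = m.group(1) == "GPU"
    elif (m := re.match(r"\s*Size:\s*(\d+)\(", line)) and is_gpu and name:
        gpus.append((name, int(m.group(1)) // 1024))  # KB -> MB
        name = None  # count only the first pool per agent

print(gpus)  # on this machine: [('AMD Radeon 780M Graphics', 23983)]
```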
With the 1.98.1 release:
```
Welcome to KoboldCpp - Version 1.98.1
Loading Chat Completions Adapter: /tmp/_MEId9PzAz/kcpp_adapters/AutoGuess.json
Chat Completions Adapter Loaded
Auto Recommended GPU Layers: 39
System: Linux #1 SMP PREEMPT_DYNAMIC Thu Sep 11 17:46:54 UTC 2025 x86_64
Detected Available GPU Memory: 16384 MB
Unable to determine available RAM
Initializing dynamic library: koboldcpp_vulkan.so
...
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: relocated tensors: 1 of 399
load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors: Vulkan0 model buffer size = 2749.97 MiB
load_tensors: CPU model buffer size = 304.28 MiB
```
Note the differences at launch and in the number of GPU layers that finally get offloaded.
Yes, I could manually specify the number of layers to offload (in this case, `--gpulayers 37`) with 1.99.1, and it works just as before.
But that defeats the purpose of autodetection.
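For comparison, on Linux the amdgpu driver also exposes the UMA pool sizes through sysfs, so the shared memory is visible to userspace even without rocminfo. A small sketch (the `card0` path is an assumption for this system; adjust as needed):

```python
from pathlib import Path

# On UMA APUs the amdgpu driver reports the carve-out VRAM and the
# shared ("GTT") pool separately via sysfs; values are in bytes.
# Assumes the iGPU is card0.
dev = Path("/sys/class/drm/card0/device")
for name in ("mem_info_vram_total", "mem_info_gtt_total"):
    f = dev / name
    if f.exists():
        print(name, int(f.read_text()) // (1024 * 1024), "MB")
```

On UMA APUs the GTT total is typically the large shared pool that `--gpulayers -1` should be sizing against.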