1.99.1 Cannot Detect VRAM Properly with AMD Integrated GPU on Vulkan #1748

@lovenemesis

Description

@lovenemesis

Describe the Issue
After upgrading to the 1.99.1 release, koboldcpp cannot reliably detect the shared VRAM exposed via UMA with the Vulkan backend when launched with the `--gpulayers -1` argument.

It worked correctly on the 1.98.1 release.

Additional Information:

With 1.99.1 release:

Welcome to KoboldCpp - Version 1.99.1
Loading Chat Completions Adapter: /tmp/_MEIam6bjU/kcpp_adapters/AutoGuess.json
Chat Completions Adapter Loaded
Detected AMD GPU VRAM from rocminfo: [('AMD Radeon 780M Graphics', '23983')] MB
Unable to detect VRAM, please set layers manually.
System: Linux #1 SMP PREEMPT_DYNAMIC Thu Sep 11 17:46:54 UTC 2025 x86_64 
Detected Available GPU Memory: 0 MB
Detected Available RAM: 43351 MB
Initializing dynamic library: koboldcpp_vulkan.so
...
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: relocated tensors: 1 of 399
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/37 layers to GPU
load_tensors:  Vulkan_Host model buffer size =  2750.40 MiB
load_tensors:          CPU model buffer size =   304.28 MiB

With 1.98.1 release:

Welcome to KoboldCpp - Version 1.98.1
Loading Chat Completions Adapter: /tmp/_MEId9PzAz/kcpp_adapters/AutoGuess.json
Chat Completions Adapter Loaded
Auto Recommended GPU Layers: 39
System: Linux #1 SMP PREEMPT_DYNAMIC Thu Sep 11 17:46:54 UTC 2025 x86_64 
Detected Available GPU Memory: 16384 MB
Unable to determine available RAM
Initializing dynamic library: koboldcpp_vulkan.so
...
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: relocated tensors: 1 of 399
load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors:      Vulkan0 model buffer size =  2749.97 MiB
load_tensors:          CPU model buffer size =   304.28 MiB

Note the difference in the detected GPU memory at launch and in the number of layers finally offloaded to the GPU.

Yes, I can manually specify the number of layers to offload (in this case, `--gpulayers 37`) with 1.99.1, and it works just as before.
But that defeats the purpose of autodetection.
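For context, the 1.99.1 log shows the rocminfo probe itself succeeding (23983 MB reported for the Radeon 780M) before the "Unable to detect VRAM" message, so the regression appears to be in how that value is consumed rather than in querying the GPU. A minimal sketch of extracting a UMA pool size from rocminfo-style text, purely illustrative and not koboldcpp's actual code (the sample output and regex are assumptions based on typical rocminfo formatting):

```python
import re

def parse_rocminfo_vram_mb(rocminfo_output: str) -> int:
    """Extract the first memory pool 'Size: ... KB' entry from rocminfo-style
    text and convert it from KB to MB. Returns 0 when nothing matches,
    mirroring the 'Detected Available GPU Memory: 0 MB' failure mode.
    The expected line format is an assumption, not a guaranteed rocminfo API."""
    match = re.search(r"Size:\s*(\d+)\s*\(0x[0-9a-fA-F]+\)\s*KB", rocminfo_output)
    if not match:
        return 0
    return int(match.group(1)) // 1024  # KB -> MB, truncating

# Hypothetical sample resembling a rocminfo GLOBAL pool entry on a UMA iGPU
sample = """
  Pool 1
    Segment:                 GLOBAL; FLAGS: COARSE GRAINED
    Size:                    24559152(0x176be30) KB
"""
print(parse_rocminfo_vram_mb(sample))  # 23983
```

If 1.99.1 parses the rocminfo value correctly but then discards it for the Vulkan backend (e.g. because a Vulkan-specific probe returned 0 on the UMA device), the fallback order between the two probes would be the place to look.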
