Conversation

@blob42 blob42 commented Oct 22, 2025

The llama model loading function expects KV overrides to be terminated with an empty key (key[0] == 0). Previously, the kv_overrides vector was not being properly terminated, causing an assertion failure.
140: GGML_ASSERT(params.kv_overrides.back().key[0] == 0 && "KV overrides not terminated with empty key") failed

This commit ensures that after parsing all KV override strings, we add a final terminating entry with an empty key to satisfy the C-style array termination requirement. This fixes the assertion error and allows the model to load correctly with custom KV overrides.

  • Also included a reference to the usage of the overrides option in the advanced-usage section.
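
For illustration, a minimal sketch of the termination pattern described above, assuming the llama_model_kv_override struct from llama.h (this paraphrases the idea, not the exact diff):

    #include <vector>
    #include "llama.h"

    // After all override strings are parsed, append a zero-initialized
    // sentinel entry whose key begins with '\0'; the model loader treats
    // that entry as the end of the C-style array.
    static void terminate_kv_overrides(std::vector<llama_model_kv_override> & kv_overrides) {
        if (!kv_overrides.empty()) {
            llama_model_kv_override terminator{};
            terminator.key[0] = '\0'; // empty key satisfies the loader's GGML_ASSERT
            kv_overrides.push_back(terminator);
        }
    }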

Description

This PR fixes #6643 and relates to #5745

Notes for Reviewers

@mudler I tested these changes with qwen3moe and could change the number of experts. This also means an API option to set the number of experts on MoE models is now possible with the llama.cpp backend.
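
For reference, such an override uses llama.cpp's KEY=TYPE:VALUE string syntax; the GGUF key name below is an assumption for qwen3moe and should be checked against the model's actual metadata:

    // Hypothetical override string for the number of active experts on a
    // MoE model (key name assumed, not taken from this PR):
    const char * override_str = "qwen3moe.expert_used_count=int:8";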

Also, compiling the backend takes ~40 min on my rig. Is there an easy way to quickly recompile the gRPC server without rebuilding the whole llama backend?

Signed commits

  • [x] Yes, I signed my commits.

netlify bot commented Oct 22, 2025

Deploy Preview for localai ready!

Name Link
🔨 Latest commit 86e14f6
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/68f9140c55aece0008c378e8
😎 Deploy Preview https://deploy-preview-6672--localai.netlify.app

@blob42 blob42 changed the title from "fix: properly terminate (llama.cpp) kv_overrides array with empty key + doc" to "fix: properly terminate llama.cpp kv_overrides array with empty key + updated doc" Oct 22, 2025
@blob42 blob42 marked this pull request as draft October 22, 2025 15:33
@blob42 blob42 marked this pull request as ready for review October 22, 2025 15:39
mudler commented Oct 22, 2025

@blob42 thank you for opening the PR and looking at this!

Looking at the llama.cpp code, I'm not sure how this is handled upstream. As far as I can see, they use the same approach to populating the kv_overrides as ours, but don't terminate with 0 explicitly:

https://github.com/ggml-org/llama.cpp/blob/9b9201f65a22c02cee8e300f58f480a588591227/common/arg.cpp#L2976

However, I hadn't tested it yet, so my implementation was a bit naive here and I eventually forgot to test this out (sorry!). I'll give your PR a try soon.

For testing, usually I go with:

❯ make backends/llama-cpp build && ./local-ai run --debug --address "0.0.0.0:8080"

This builds only the llama-cpp backend and the gRPC cache (once); subsequent builds will rebuild only the llama-cpp backend.

mudler commented Oct 22, 2025

Looking at the llama.cpp code, I'm not sure how this is handled upstream. As far as I can see, they use the same approach to populating the kv_overrides as ours, but don't terminate with 0 explicitly:

Ah, just for the record, found it here:

https://github.com/ggml-org/llama.cpp/blob/9b9201f65a22c02cee8e300f58f480a588591227/common/arg.cpp#L1432
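
For reference, the upstream termination at that line looks roughly like this (paraphrased, not a verbatim copy of arg.cpp):

    // After argument parsing, upstream appends one zero-initialized entry so
    // that kv_overrides.back().key[0] == 0 and the loader's assert passes.
    if (!params.kv_overrides.empty()) {
        params.kv_overrides.emplace_back();
        params.kv_overrides.back().key[0] = 0;
    }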

Update:

@blob42 would you mind changing the approach of this PR to be closer to upstream? The rationale is to avoid diverging too much from the original implementation, which helps with maintenance long-term.

blob42 commented Oct 22, 2025

would you mind changing the approach of this PR to be closer to upstream? The rationale is to avoid diverging too much from the original implementation, which helps with maintenance long-term.

Sure, I will update the PR.

The llama model loading function expects KV overrides to be terminated
with an empty key (key[0] == 0). Previously, the kv_overrides vector was
not being properly terminated, causing an assertion failure.

This commit ensures that after parsing all KV override strings, we add a
final terminating entry with an empty key to satisfy the C-style array
termination requirement. This fixes the assertion error and allows the
model to load correctly with custom KV overrides.

Fixes mudler#6643

- Also included a reference to the usage of the `overrides` option in
  the advanced-usage section.

Signed-off-by: blob42 <contact@blob42.xyz>

blob42 commented Oct 22, 2025

@mudler should be good to go now. I copied the upstream approach verbatim and it works for me.

mudler commented Oct 23, 2025

@mudler should be good to go now. I copied the upstream approach verbatim and it works for me.

great, thanks!

@mudler mudler merged commit 32c0ab3 into mudler:master Oct 23, 2025
34 of 35 checks passed
@mudler mudler added the "bug" label Oct 23, 2025