fix: properly terminate llama.cpp kv_overrides array with empty key + updated doc #6672
Conversation
@blob42 thank you for opening the PR and looking at this! Looking at the llama.cpp code, I'm not sure how this is handled upstream; as far as I can see, they use the same approach to populating the kv_overrides as ours, but don't terminate with an empty key. However, I haven't tested it yet, so my implementation was a bit naive here and I eventually forgot to test this out (sorry!). I'll give your PR a try soon. For testing, I usually go with a command that builds only the llama-cpp backend and the grpc cache (once); subsequent builds then rebuild only the llama-cpp backend.
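The exact command isn't shown above; purely as an illustration of that kind of workflow, it might look something like the following (the Makefile target names here are assumptions, not verified against the repository):

```sh
# One-time build of the gRPC dependency cache (target name assumed).
make grpcs

# Subsequent runs rebuild only the llama.cpp backend (target name assumed).
make backend-assets/grpc/llama-cpp
```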
Ah, just for the record, found it here: Update: @blob42 would you mind changing the approach of this PR to be closer to upstream? The rationale is to avoid diverging too much from the original implementation, which helps maintenance long-term.
Sure, I will update the PR.
The llama model loading function expects KV overrides to be terminated with an empty key (key[0] == 0). Previously, the kv_overrides vector was not being properly terminated, causing an assertion failure.

This commit ensures that after parsing all KV override strings, we add a final terminating entry with an empty key to satisfy the C-style array termination requirement. This fixes the assertion error and allows the model to load correctly with custom KV overrides.

Fixes mudler#6643

- Also included a reference to the usage of the `overrides` option in the advanced-usage section.

Signed-off-by: blob42 <contact@blob42.xyz>
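For context, a minimal sketch of the terminating-entry pattern the commit describes, modeled on upstream llama.cpp's override parsing (struct and field names are from llama.h; the wrapper function itself is illustrative, not the actual backend code):

```cpp
#include <vector>
#include "llama.h"

// After all user-supplied "key=type:value" overrides have been parsed into
// kv_overrides, append one extra entry whose key is the empty string.
// llama_model_load asserts that back().key[0] == 0 before reading the array.
static void terminate_kv_overrides(std::vector<llama_model_kv_override> & kv_overrides,
                                   llama_model_params & model_params) {
    if (!kv_overrides.empty()) {
        kv_overrides.emplace_back();
        kv_overrides.back().key[0] = 0; // empty key marks the end of the C-style array
        model_params.kv_overrides = kv_overrides.data();
    }
}
```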
@mudler should be good to go now. I copied the upstream approach verbatim and it works for me.
great, thanks! |
The llama model loading function expects KV overrides to be terminated with an empty key (key[0] == 0). Previously, the kv_overrides vector was not being properly terminated, causing an assertion failure:

140: GGML_ASSERT(params.kv_overrides.back().key[0] == 0 && "KV overrides not terminated with empty key") failed

This commit ensures that after parsing all KV override strings, we add a final terminating entry with an empty key to satisfy the C-style array termination requirement. This fixes the assertion error and allows the model to load correctly with custom KV overrides.

- Also included a reference to the usage of the `overrides` option in the advanced-usage section.

Description
This PR fixes #6643 and relates to #5745
Notes for Reviewers
@mudler I tested these changes with qwen3moe and could change the number of experts. This also means an API option to set the number of experts on MoE models is now possible with the llama.cpp backend. A hypothetical config illustrating this follows below.
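As a hypothetical sketch only (the exact YAML layout and GGUF key name are assumptions, based on the `overrides` option referenced in this PR and llama.cpp's key=type:value override syntax), a model config might look like:

```yaml
# Hypothetical LocalAI model definition using the overrides option.
name: qwen3-moe
parameters:
  model: qwen3-moe.gguf
overrides:
  # llama.cpp-style KV override: force the number of active experts.
  - "qwen3moe.expert_used_count=int:8"
```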
Also, compiling the backend takes ~40 min on my rig. Is there an easy way to quickly recompile the grpc server without rebuilding the whole llama backend?
Signed commits