context: ignore zero scale LoRAs when checking sameness#20166

Merged
ggerganov merged 1 commit into ggml-org:master from TimNN:fix-lora
Mar 6, 2026

Conversation

@TimNN
Contributor

@TimNN TimNN commented Mar 6, 2026

When passed a set of input LoRAs, set_adapters_lora never stores zero-scale LoRAs in the loras field, but adapters_lora_are_same expects the number of inputs to equal the number of stored LoRAs. As a result, the check always reports a mismatch when zero-scale adapters are passed repeatedly.

This causes a server started with --lora-init-without-apply to re-reserve the graph for every token when handling requests with "lora": [{"id": 0, "scale": 0.0}].

Tested:

  • Started a server with llama-server --model tmp/ggml-org_stories15M_MOE_stories15M_MOE-F16.gguf --lora tmp/moe_shakespeare15M.gguf --lora-init-without-apply
  • Sent a chat request with "lora": [{"id": 0, "scale": 0.0}] via the Web UI
  • Results without this patch: The server logs print sched_reserve: reserving ... for every token
  • Results with this patch: The server prints sched_reserve: reserving ... just once (or not at all)

The server unit tests still pass, but I don't know how to add a test specifically for this bug (unless I want to rely on inference performance, which seems suboptimal).

Alternative fixes considered:

  • Filter the list of adapters in set_adapters_lora before calling adapters_lora_are_same. However, that would require an allocation even in the unmodified case, which seemed better to avoid.
  • Filter the list of adapters in the server code. However, this functionality is exposed via the public llama_set_adapters_lora API, so it seemed better to fix this for everyone.

AI Disclaimer: Claude Code was used to identify the problem after describing the symptoms. The patch was authored by a human (me).

@TimNN TimNN requested a review from ggerganov as a code owner March 6, 2026 12:19
@ggerganov ggerganov merged commit 388baab into ggml-org:master Mar 6, 2026
75 checks passed
@TimNN TimNN deleted the fix-lora branch March 6, 2026 13:33
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 10, 2026
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026