context: ignore zero scale LoRAs when checking sameness #20166
Merged
ggerganov merged 1 commit into ggml-org:master on Mar 6, 2026
Conversation
ggerganov approved these changes on Mar 6, 2026
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request on Mar 10, 2026
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request on Mar 20, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request on Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request on May 1, 2026
When passed a set of input LoRAs, `set_adapters_lora` never stores zero scale LoRAs in the `loras` field, but `adapters_lora_are_same` expects the number of inputs to be equal to the number of stored LoRAs. So there will always be a mismatch if zero scale adapters are passed repeatedly. This causes a server started with `--lora-init-without-apply` to re-reserve the graph for every token when handling requests with `"lora": [{"id": 0, "scale": 0.0}]`.
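For illustration, a minimal sketch of the mismatch. The function names follow the description above, but the types, signatures, and the `std::map` storage are simplified stand-ins, not the actual llama.cpp implementation:

```cpp
// Minimal sketch, not the actual llama.cpp code; types and storage are
// simplified stand-ins for the real llama_context state.
#include <cstddef>
#include <map>
#include <vector>

struct llama_adapter_lora;  // opaque adapter handle (stand-in)

struct lora_input {
    llama_adapter_lora * adapter;
    float                scale;
};

// Stored adapters; set_adapters_lora never stores zero scale entries.
static std::map<llama_adapter_lora *, float> loras;

static void set_adapters_lora(const std::vector<lora_input> & inputs) {
    loras.clear();
    for (const auto & in : inputs) {
        if (in.scale != 0.0f) {
            loras[in.adapter] = in.scale;  // zero scale inputs are skipped
        }
    }
}

// Pre-fix sameness check: compares the raw input count to the stored count,
// so a repeated request containing a zero scale adapter never compares equal.
static bool adapters_lora_are_same(const std::vector<lora_input> & inputs) {
    if (inputs.size() != loras.size()) {
        return false;  // always taken when any input has scale == 0.0
    }
    for (const auto & in : inputs) {
        auto it = loras.find(in.adapter);
        if (it == loras.end() || it->second != in.scale) {
            return false;
        }
    }
    return true;
}
```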
Tested:

- Started `llama-server --model tmp/ggml-org_stories15M_MOE_stories15M_MOE-F16.gguf --lora tmp/moe_shakespeare15M.gguf --lora-init-without-apply`
- Sent requests with `"lora": [{"id": 0, "scale": 0.0}]` via the Web UI
- Before the fix: `sched_reserve: reserving ...` is logged for every token
- After the fix: `sched_reserve: reserving ...` is logged just once (or not at all)

The server unit tests still pass, but I don't know how to add a test specifically for this bug (unless I want to rely on inference performance, which seems suboptimal).
Alternative fixes considered:

- Calling `set_adapters_lora` before calling `adapters_lora_are_same`. However, that would require allocating even for the not-modified case, which it seemed better to avoid.
- Filtering zero scale adapters out on the server side. However, any caller can hit the same problem through the `llama_set_adapters_lora` API, so it seemed better to fix this for everyone.

AI Disclaimer: Claude Code was used to identify the problem after describing the symptoms. The patch was authored by a human (me).
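For illustration, a minimal sketch of what "ignore zero scale LoRAs when checking sameness" means, continuing the simplified stand-ins from the sketch above (again, not the actual patch):

```cpp
// Sketch of the fix under the same stand-in types as above: zero scale
// inputs are ignored when counting and comparing, mirroring what
// set_adapters_lora actually stores. No allocation on the unchanged path.
static bool adapters_lora_are_same_fixed(const std::vector<lora_input> & inputs) {
    size_t n_used = 0;
    for (const auto & in : inputs) {
        if (in.scale == 0.0f) {
            continue;  // ignore zero scale LoRAs when checking sameness
        }
        n_used++;
        auto it = loras.find(in.adapter);
        if (it == loras.end() || it->second != in.scale) {
            return false;
        }
    }
    return n_used == loras.size();  // repeated zero scale requests now match
}
```

With this shape of check, a request that repeats `"lora": [{"id": 0, "scale": 0.0}]` compares equal to the stored (empty) state, so the graph is not re-reserved on every token.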