server : add default-model preset and fallback logic by mikhail-shevtsov-wiregate · Pull Request #19855 · ggml-org/llama.cpp

mikhail-shevtsov-wiregate · 2026-02-24T16:30:31Z

Summary

Introduced a new preset option, default-model = true, that can be used in server router mode to mark a preset as the default model.
When multiple presets set default-model = true, only the first one is used.
Updated common/arg.cpp and common/arg.h to expose the option.
Enhanced server_models to detect the default model during model loading and store it in default_model_name.
Added server_models::resolve_model_name() to return the requested model if it exists, otherwise fall back to the default model.
Updated the routing logic (server-models.cpp) to call resolve_model_name() for both GET and POST requests, so that requests without a model parameter or with an unknown model use the default.
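The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the model_registry struct and its available member are hypothetical stand-ins for server_models, and only the names resolve_model_name and default_model_name are taken from the summary.

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for the router's model registry; this is NOT the
// actual server_models class from tools/server/server-models.cpp.
struct model_registry {
    std::set<std::string> available;   // names of known models and presets
    std::string default_model_name;    // empty when no preset sets default-model = true

    // Return the requested model if it is known, otherwise the default model.
    // Returns std::nullopt when neither applies, so the caller can still
    // report an error the way it did before the option existed.
    std::optional<std::string> resolve_model_name(const std::string & requested) const {
        if (!requested.empty() && available.count(requested) > 0) {
            return requested;
        }
        if (!default_model_name.empty()) {
            return default_model_name;
        }
        return std::nullopt;
    }
};
```

With a registry holding preset1 and preset2 and default_model_name set to preset1, a request for preset2 would resolve to preset2, while an empty or unknown name would resolve to preset1. With no default configured, an unknown name yields std::nullopt, which corresponds to the backwards-compatible error path.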

Motivation

When the server runs in router mode, clients may omit the model field or request a model that isn’t loaded.
Previously this caused an error; the new preset allows the server to automatically use a pre‑selected default model, improving robustness and usability.

How to test:

Start the server in router mode with default-model = true set in the desired preset of the presets file.
Send a request to /v1/chat/completions without a model query or JSON field – the server should use <your-default-model>.
Send a request specifying a non‑existent model – the server should fall back to <your-default-model> instead of returning an error.
Verify that requests that do specify a valid model still use the requested model.

Example:

$ cat models.ini 
[preset1]
hf = ggml-org/Qwen3-0.6B-GGUF
default-model = true

[preset2]
hf = ggml-org/Qwen3-0.6B-GGUF
default-model = true

[preset3]
hf = ggml-org/Qwen3-0.6B-GGUF
default-model = true

$ ./build/bin/llama-server --models-preset models.ini --host 0.0.0.0 
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build: 8209 (d3635a5fc) with GNU 13.3.0 for Linux x86_64
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16

system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

Running without SSL
init: using 15 threads for HTTP server
srv   load_models: Loaded 1 cached model presets
srv   load_models: Loaded 3 custom model presets from models.ini
srv   load_models: Default preset model: preset1
srv   load_models: Multiple default models detected: 'preset2' and 'preset1'; using 'preset1' as default
srv   load_models: Multiple default models detected: 'preset3' and 'preset1'; using 'preset1' as default
srv   load_models: Available models (4) (*: custom preset)
srv   load_models:     ggml-org/Qwen3-0.6B-GGUF
srv   load_models:   * preset1
srv   load_models:   * preset2
srv   load_models:   * preset3
main: starting router server, no model will be loaded in this process
start: binding port with default address family
main: router server is listening on http://0.0.0.0:8080
main: NOTE: router mode is experimental
main:       it is not recommended to use this mode in untrusted environments

Impact

Existing behavior for clients that supply a model name remains unchanged.
Clients that omit the model will now automatically use the configured default model (if set).
Backwards compatibility is preserved when the preset is not set; the server behaves as before.

mikhail-shevtsov-wiregate · 2026-02-27T15:44:55Z

Hi @ggerganov @ngxson

Is there anything I can do to merge this "Quality of Life" feature improvement?

Thanks,
Mikhail

ngxson

This option is unclear. How does the config below behave?

default-model = model_a

[model_b]
default-model = model_c

ngxson · 2026-02-27T22:28:20Z

Introduced a new preset option --default-model

Is it --default-model or default-model ?

ngxson · 2026-02-27T22:29:06Z

    ).set_env(COMMON_ARG_PRESET_LOAD_ON_STARTUP).set_preset_only());

+    args.push_back(common_arg(
+        {"default-model"}, "NAME",


You mentioned in the description:

Start the server in router mode with default-model = true in required preset.

It's confusing: is it default-model = true or default-model = NAME?

mikhail-shevtsov-wiregate · 2026-03-04

@ngxson Good point on that one! I wasn't sure how to make it better, so I followed your PR #18206 for the load-on-startup argument.
Do you have any suggestions on how to make it cleaner?

ngxson · 2026-03-04

Multiple models can be load-on-startup, but only a single one can be default-model.

Unless this is made clear, I don't think this proposal can be accepted as-is.

mikhail-shevtsov-wiregate · 2026-03-04

Ah...! I see. Thanks for the clarification. That makes a lot of sense. I will rewrite this one.

mikhail-shevtsov-wiregate · 2026-03-05

@ngxson I've added a simple condition that shows a warning when multiple default-model = true options are detected. I've tried to keep the logic as simple as possible to avoid any confusion. I've updated the PR description to include an example. I've also updated README.md. I've re-tested this change 3 times and rebased the commit onto the latest upstream. I've force-pushed this commit. Please take a look and let me know if you would like me to make additional changes.

mikhail-shevtsov-wiregate · 2026-03-05

I've updated only the tools/server/server-models.cpp and tools/server/README.md files. Everything else stayed the same.

mikhail-shevtsov-wiregate · 2026-03-04T17:03:46Z

This option is unclear. How does the config below behave?
default-model = model_a

[model_b]
default-model = model_c

@ngxson I've updated the autogenerated PR description to avoid any confusion.

mikhail-shevtsov-wiregate · 2026-03-06T17:24:28Z

@ngxson Is there anything else you would like to see in this feature for it to get merged?

mikhail-shevtsov-wiregate · 2026-03-11T16:01:53Z

@ngxson @ggerganov I've rebased this tiny feature onto the latest commit from the master branch.
I see interest from at least 3 people in this PR. Please tell me how I can improve it to get it merged.

marina9568 · 2026-03-11T16:38:14Z

this would be super useful, hope it gets merged soon

mikhail-shevtsov-wiregate · 2026-03-24T13:04:52Z

@ngxson @ggerganov I've rebased the code onto the latest commit. I would really appreciate it if you could let me know whether this PR has a future, or whether this feature is not wanted. If it has a future but is missing something, such as unit tests, please tell me. I don't want to waste the maintainers' time.

fiesh · 2026-04-10T06:17:16Z


 We also offer additional options that are exclusive to presets (these aren't treated as command-line arguments):
 - `load-on-startup` (boolean): Controls whether the model loads automatically when the server starts
+- `default-model` (boolean): The model to use when no model is specified in a request or the model is not found.


The description is honestly confusing and wrong. "The model to use" implies that you state a name as a string. I think "Use this model when..." would be better. Also, of course, "When multiple default-model options are found, only the first one will be used" is incorrect. The first true will probably be the one selected?

mikhail-shevtsov-wiregate requested review from ggerganov and ngxson as code owners February 24, 2026 16:30

mikhail-shevtsov-wiregate force-pushed the master branch from 329368a to 23783ea Compare February 24, 2026 16:32

github-actions Bot added examples server labels Feb 24, 2026

ngxson reviewed Feb 27, 2026

mikhail-shevtsov-wiregate force-pushed the master branch 2 times, most recently from d3635a5 to 0196e22 Compare March 5, 2026 12:53

mikhail-shevtsov-wiregate force-pushed the master branch from 0196e22 to be9837b Compare March 11, 2026 15:59

server : add default-model preset and fallback logic

e2df42d

mikhail-shevtsov-wiregate force-pushed the master branch from be9837b to e2df42d Compare March 24, 2026 12:56

mikhail-shevtsov-wiregate requested review from a team as code owners March 24, 2026 12:56

fiesh reviewed Apr 10, 2026
