
server : add default-model preset and fallback logic #19855

Open
mikhail-shevtsov-wiregate wants to merge 1 commit into ggml-org:master from mikhail-shevtsov-wiregate:master


mikhail-shevtsov-wiregate commented Feb 24, 2026

Summary

  • Introduced a new preset option default-model = true that can be used in server router mode to specify a default model.
  • When more than one preset sets default-model = true, only the first such preset is used and a warning is logged for each of the others.
  • Updated common/arg.cpp and common/arg.h to expose the option.
  • Enhanced server_models to detect the default model during model loading and store it in default_model_name.
  • Added server_models::resolve_model_name() to return the requested model if it exists, otherwise fall back to the default model.
  • Updated routing logic (server-models.cpp) to call resolve_model_name() for both GET and POST requests, ensuring requests without a model parameter or with an unknown model use the default.
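The fallback described in the summary can be sketched as follows. This is a minimal standalone sketch, not the PR's code: the internals of server_models are not shown in this thread, so the hypothetical model_registry struct, its known_models set, the std::optional return type, and treating an empty string as "model omitted" are all assumptions. Only the names resolve_model_name and default_model_name come from the description.

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for the router's model registry (assumption);
// the real server_models class is structured differently.
struct model_registry {
    std::set<std::string> known_models;  // names of available models/presets
    std::string default_model_name;      // empty when no preset sets default-model = true

    // Return the requested model if it is known; otherwise fall back to the
    // default model. std::nullopt means "no usable model", so the caller can
    // keep the previous behavior and return an error.
    std::optional<std::string> resolve_model_name(const std::string & requested) const {
        if (!requested.empty() && known_models.count(requested) > 0) {
            return requested;            // valid model: behavior unchanged
        }
        if (!default_model_name.empty()) {
            return default_model_name;   // omitted or unknown model: use the default
        }
        return std::nullopt;             // no default configured: error path as before
    }
};
```

With known_models = {"preset1", "preset2"} and default_model_name = "preset1", a request for preset2 resolves to preset2, while an empty or unknown name resolves to preset1. Returning std::nullopt when no default is configured keeps the old error behavior, which is what the backwards-compatibility point under Impact relies on.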

Motivation

When the server runs in router mode, clients may omit the model field or request a model that isn’t loaded.
Previously this caused an error; the new preset allows the server to automatically use a pre‑selected default model, improving robustness and usability.

How to test:

  1. Start the server in router mode with default-model = true set on the desired preset in the presets file.
  2. Send a request to /v1/chat/completions without a model query or JSON field – the server should use <your-default-model>.
  3. Send a request specifying a non‑existent model – the server should fall back to <your-default-model> instead of returning an error.
  4. Verify that requests that do specify a valid model still use the requested model.

Example:

$ cat models.ini 
[preset1]
hf = ggml-org/Qwen3-0.6B-GGUF
default-model = true

[preset2]
hf = ggml-org/Qwen3-0.6B-GGUF
default-model = true

[preset3]
hf = ggml-org/Qwen3-0.6B-GGUF
default-model = true

$ ./build/bin/llama-server --models-preset models.ini --host 0.0.0.0 
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build: 8209 (d3635a5fc) with GNU 13.3.0 for Linux x86_64
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16

system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

Running without SSL
init: using 15 threads for HTTP server
srv   load_models: Loaded 1 cached model presets
srv   load_models: Loaded 3 custom model presets from models.ini
srv   load_models: Default preset model: preset1
srv   load_models: Multiple default models detected: 'preset2' and 'preset1'; using 'preset1' as default
srv   load_models: Multiple default models detected: 'preset3' and 'preset1'; using 'preset1' as default
srv   load_models: Available models (4) (*: custom preset)
srv   load_models:     ggml-org/Qwen3-0.6B-GGUF
srv   load_models:   * preset1
srv   load_models:   * preset2
srv   load_models:   * preset3
main: starting router server, no model will be loaded in this process
start: binding port with default address family
main: router server is listening on http://0.0.0.0:8080
main: NOTE: router mode is experimental
main:       it is not recommended to use this mode in untrusted environments
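The first-wins rule visible in the log above can be sketched as one pass over the presets in file order. Assumptions, not taken from the PR: presets are modeled as a hypothetical preset struct with an already-parsed default_model flag, "first" means first in file order, and warnings go to stderr with wording that only imitates the log lines shown; this is not the PR's implementation.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical, simplified view of one [section] of the presets file.
struct preset {
    std::string name;
    bool default_model = false;  // true when the section sets default-model = true
};

// Pick the first preset (in file order) flagged as default-model.
// Each later flagged preset is ignored with a warning, mirroring the
// "Multiple default models detected" lines in the example log.
// Returns an empty string when no preset is flagged.
std::string pick_default_model(const std::vector<preset> & presets) {
    std::string chosen;
    for (const auto & p : presets) {
        if (!p.default_model) {
            continue;
        }
        if (chosen.empty()) {
            chosen = p.name;
            std::cerr << "Default preset model: " << chosen << "\n";
        } else {
            std::cerr << "Multiple default models detected: '" << p.name
                      << "' and '" << chosen << "'; using '" << chosen
                      << "' as default\n";
        }
    }
    return chosen;
}
```

Fed the three sections of models.ini above, this returns preset1 and prints one warning each for preset2 and preset3. An alternative design would refuse to start when more than one preset is flagged, which makes the single-default rule explicit at the cost of a hard failure.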

Impact

  • Existing behavior for clients that supply a valid model name remains unchanged.
  • Clients that omit the model, or request one that is not found, will now automatically use the configured default model (if set).
  • Backwards compatibility is preserved when the preset is not set; the server behaves as before.

mikhail-shevtsov-wiregate (Author) commented Feb 27, 2026

Hi @ggerganov @ngxson

Is there anything I can do to get this "Quality of Life" improvement merged?

Thanks,
Mikhail

ngxson (Contributor) left a comment

This option is unclear. How does the config below behave?

default-model = model_a

[model_b]
default-model = model_c

ngxson (Contributor) commented Feb 27, 2026

  • Introduced a new preset option --default-model

Is it --default-model or default-model?

Comment thread common/arg.cpp
).set_env(COMMON_ARG_PRESET_LOAD_ON_STARTUP).set_preset_only());

args.push_back(common_arg(
{"default-model"}, "NAME",
Contributor

You mentioned in the description:

Start the server in router mode with default-model = true in required preset.

It's confusing: default-model = true or default-model = NAME?

Author

@ngxson Good point on that one! I wasn't sure how to make it better, so I followed your PR #18206 for the load-on-startup argument.
Do you have any suggestions how to make it cleaner?

ngxson (Contributor), Mar 4, 2026

Multiple models can be load-on-startup, but only a single one can be default-model.

Unless this is made clear, I don't think this proposal can be accepted as-is.

Author

Ah, I see. Thanks for the clarification. That makes a lot of sense. I will rewrite this one.

Author

@ngxson I've added a simple condition that shows a warning when multiple default-model = true options are detected. I've tried to keep the logic as simple as possible to avoid any confusion. I've updated the PR description to include an example, and I've also updated README.md. I've re-tested this change 3 times, rebased the commit onto the latest upstream, and force-pushed it. Please take a look and let me know if you would like any additional changes.

Author

I've updated only the tools/server/server-models.cpp and tools/server/README.md files. Everything else stayed the same.

mikhail-shevtsov-wiregate (Author)

This option is unclear. How does the config below behave?

default-model = model_a

[model_b]
default-model = model_c

@ngxson I've updated the autogenerated PR description to avoid any confusion.

mikhail-shevtsov-wiregate force-pushed the master branch 2 times, most recently from d3635a5 to 0196e22 on March 5, 2026 12:53
mikhail-shevtsov-wiregate (Author)

@ngxson is there anything else you would like to see in this feature before it can be merged?

mikhail-shevtsov-wiregate (Author)

@ngxson @ggerganov I've rebased this tiny feature onto the latest commit from the master branch.
I see interest from at least 3 people in this PR. Please tell me how I can improve it to get it merged.

marina9568
This would be super useful; hope it gets merged soon.

mikhail-shevtsov-wiregate (Author)

@ngxson @ggerganov I've rebased the code onto the latest commit. I would really appreciate it if you could let me know whether this PR has a future, or whether this feature is not needed. If it has a future but is missing something like unit tests, please tell me; I don't want to waste the maintainers' time.

Comment thread tools/server/README.md

We also offer additional options that are exclusive to presets (these aren't treated as command-line arguments):
- `load-on-startup` (boolean): Controls whether the model loads automatically when the server starts
- `default-model` (boolean): The model to use when no model is specified in a request or the model is not found.

The description is honestly confusing and wrong. "The model to use" implies you state a name as a string; I think "Use this model when..." would be better. Also, of course, "When multiple default-model options are found, only the first one will be used" is incorrect. The first true will probably be the one selected?


4 participants