server: support load model on startup, support preset-only options #18206
ServeurpersoCom merged 5 commits into ggml-org:master from
Basic use-case test OK.
Faster than me! I was almost ready with a PR for this feature. Anyway, the code proposed here is better than mine. I recommend changing the name because it can confuse people: loading a model is different from starting it. I got confused by that too. I suggest renaming this .ini property to "autostart = true". Thank you so much, everybody.
"Load" model weights into memory vs. "start" inference on a model: using "load" sounds good!
Are we talking about starting or loading a model? Thanks.
In the llama.cpp context, we "load" models (load/unload endpoints); we don't "start" them. The model gets loaded into memory and becomes available, and "autoload" describes this action perfectly. What happens internally (spawning instances) is an implementation detail, but from the user's perspective you configure which models to auto-load at startup. I think "autoload" is the right term here.
hmm yeah I think a more specific term the problem with
Co-authored-by: Pascal <admin@serveurperso.com>
It works for me.

version = 1
[*]
[unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF]

/opt/llama.cpp/bin/llama-server --host 0.0.0.0 --port 8088 --models-preset /opt/llama.cpp/etc/models.ini

system_info: n_threads = 12 (n_threads_batch = 12) / 12 | CUDA : ARCHS = 870 | FORCE_CUBLAS = 1 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | FA_ALL_QUANTS = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
init: using 11 threads for HTTP server
ServeurpersoCom left a comment:
Tested successfully; waiting for RISC-V CI to pass.
I'm not sure this works as intended. "load-on-startup = true" has the same behaviour as "load-on-startup = false". In fact, the only way to make a model not load at startup is to omit "load-on-startup" from its preset entirely.
Yes, that will need to be fixed. In the meantime, you can also disable the key with an INI comment: ; load-on-startup = true
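The report above is consistent with the key being treated as a presence flag, so any value, including "false", enables autoload. Below is a minimal Python sketch of value-based parsing, for illustration only: llama.cpp's real preset handling is C++, and the helper names `parse_bool` and `should_load_on_startup` as well as the accepted spellings are assumptions.

```python
# Hypothetical helpers; the accepted spellings below are an assumption,
# not llama.cpp's documented behaviour.
TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off"}


def parse_bool(value: str) -> bool:
    """Interpret an INI value as a boolean, rejecting unknown spellings."""
    v = value.strip().lower()
    if v in TRUTHY:
        return True
    if v in FALSY:
        return False
    raise ValueError(f"invalid boolean value: {value!r}")


def should_load_on_startup(section: dict[str, str]) -> bool:
    """Decide autoload from the key's value, not merely its presence.

    The behaviour reported above corresponds to the presence check
    `"load-on-startup" in section`, which is True for "false" as well.
    """
    raw = section.get("load-on-startup")
    if raw is None:  # key absent, e.g. commented out with ';' in the INI file
        return False
    return parse_bool(raw)


print(should_load_on_startup({"load-on-startup": "true"}))   # True
print(should_load_on_startup({"load-on-startup": "false"}))  # False
print(should_load_on_startup({}))                            # False
```

Raising on unrecognised values, rather than silently defaulting, makes a typo such as "ture" visible instead of quietly disabling autoload.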
When the parameter
For a test I used a file
When I execute the command
I get the error message
To differentiate the meaning of the parameter
…gml-org#18206)

* server: support autoload model, support preset-only options
* add docs
* load-on-startup
* fix
* Update common/arg.cpp

Co-authored-by: Pascal <admin@serveurperso.com>

---------

Co-authored-by: Pascal <admin@serveurperso.com>
Fix #18163
Fix #18035
Example config:
Note: it will throw an error if the models-max limit is less than the number of models that require autoload.
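The autoload selection and the models-max check described above can be sketched in Python with the standard `configparser` module. This is a hypothetical illustration, not llama.cpp's implementation: the second model name and the values are made up, `[*]` is assumed to hold shared defaults rather than a model, and the function names and error text are invented.

```python
import configparser

# Illustrative preset. The keys "version" and "load-on-startup" and the
# [*] / [model] section layout come from this thread; the second model
# name and the values are hypothetical.
PRESET = """\
version = 1

[*]

[unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF]
load-on-startup = true

[example/other-model-GGUF]
load-on-startup = false
"""


def autoload_models(preset_text: str) -> list[str]:
    """Return the names of sections whose load-on-startup value is true."""
    parser = configparser.ConfigParser()
    # configparser rejects keys that appear before the first section header,
    # so top-level keys such as "version" are wrapped in a synthetic section.
    parser.read_string("[__top__]\n" + preset_text)
    models = []
    for name in parser.sections():
        # Assumption: [*] holds shared defaults and is not itself a model.
        if name in ("__top__", "*"):
            continue
        # getboolean() checks the value (true/false, yes/no, on/off, 1/0),
        # so "load-on-startup = false" does NOT trigger autoload.
        if parser.getboolean(name, "load-on-startup", fallback=False):
            models.append(name)
    return models


def check_models_max(models: list[str], models_max: int) -> None:
    """Fail if fewer instances are allowed than models that must autoload."""
    if models_max < len(models):
        raise ValueError(
            f"models-max ({models_max}) is less than the number of "
            f"autoload models ({len(models)})"
        )


if __name__ == "__main__":
    to_load = autoload_models(PRESET)
    check_models_max(to_load, models_max=4)
    print(to_load)  # ['unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF']
```

Failing at startup, before any instance is spawned, matches the note above: a configuration that can never satisfy its own autoload list is rejected early rather than evicting models later.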