presets: refactor, allow cascade presets from different sources, add global section#18169
Conversation
|
Looks good! I'll deploy this on my test server tonight and report back with results! |
|
First basic test OK: In my case, as a user, this allows to have a complete configuration file; it is no longer necessary to modify the command line (systemd) each time I want to change a global ./llama-server --port 8082 --models-max 1 --models-preset backend.ini --webui-config-file frontend.json Presets (backend.ini)Log :-> Minor nitpick (not in this PR): if we want --kv-unified in logs instead of -kvu, we could swap the order in arg.cpp: {"-kvu", "--kv-unified"} since to_args() uses .back() -> EDIT : #18196 (for all args + doc) CLI with -c 1024 overwrite .ini per-model config -> wanted -> OKSingle model mode testing (with some args) OK!Major functionality validated, we can merge it! |
Thanks for testing. Yes, feel free to create a new PR for fixing this. Out convention is to have short form first, then followed by long form |
ggerganov
left a comment
There was a problem hiding this comment.
I'm traveling for a few days and won't be able to do very detailed testing/review. Approving to not block the work on this and added @ServeurpersoCom to write access group for additional approvals if needed.
| } | ||
| // 2. local models specificed via --models-dir | ||
| common_presets cached_models = ctx_preset.load_from_cache(); | ||
| SRV_INF("Loaded %zu cached model presets\n", cached_models.size()); |
There was a problem hiding this comment.
nit: for most logs we prefix with the function name:
| SRV_INF("Loaded %zu cached model presets\n", cached_models.size()); | |
| SRV_INF("%s: Loaded %zu cached model presets\n", __func__, cached_models.size()); |
There was a problem hiding this comment.
We can just test Windows (build OK on my side, need some basic test), merge, and I complete my separate PR "special nits" #18196 (that way we don't have any conflicts) :)
There was a problem hiding this comment.
the SRV_INF macro should already be prefixed with the function name, so I think this is not necessary:
#define SRV_INF(fmt, ...) LOG_INF("srv %12.*s: " fmt, 12, __func__, __VA_ARGS__)…global section (ggml-org#18169) * presets: refactor, allow cascade presets from different sources * update docs * fix neg arg handling * fix empty mmproj * also filter out server-controlled args before to_ini() * skip loading custom_models if not specified * fix unset_reserved_args * fix crash on windows
…global section (#18169) * presets: refactor, allow cascade presets from different sources * update docs * fix neg arg handling * fix empty mmproj * also filter out server-controlled args before to_ini() * skip loading custom_models if not specified * fix unset_reserved_args * fix crash on windows
Alternative to #17959
Fix #17948
Before this PR, the logic for loading models from different sources (cache / local / custom ini) was quite messy and doesn't allow ini preset to take precedence over other sources.
With this PR, we unify the method for loading server models and presets:
preset.cppis responsible for collecting all model sources (cache / local) and generate a base preset for each of the known GGUFpreset.cppthen load INI and parse the global section ([*])server-models.cpp) to decide how to cascade these presetsThe current cascading rule can be found in server's docs:
llama-server(highest priority)[ggml-org/MY-MODEL...])[*])