Only show -ngl option when relevant + other doc/arg handling updates #1625
In terms of usability I think any binary should give you an error message telling you that you need compile option XY to use some feature. An error message that just tells you that some argument doesn't exist is, I think, much less helpful for users who are unaware that the compilation option even exists.
You're just talking about displaying a better error if the user specifies an option that wasn't compiled in, not saying that a bunch of detailed information about how to enable it should be added to the `--help` text. Correct?

edit: How about this? Last commit changes the behavior when using
Refer to CLBlast and cuBLAS correctly in doc changes.
> You're just talking about displaying a better error if the user specifies an option that wasn't compiled in, not saying that a bunch of detailed information about how to enable it should be added to the `--help` text. Correct?

Yes. It's enough to e.g. tell the user that the feature doesn't work because llama.cpp wasn't compiled with cuBLAS/CLBlast. The user just needs to be told that something is wrong so they'll know to look it up.
I would prefer an error over a warning since people have a tendency not to read warnings, but a warning would be fine too. The program just shouldn't silently ignore the CLI argument.
I can change it if you feel strongly. Otherwise, are you okay with the current behavior? The warning is two lines and the result of ignoring it is also pretty benign: possibly lower performance.

I don't feel strongly about it. As I said: a warning would be fine too.
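The behavior agreed on above (warn in two lines, then ignore the option) can be sketched roughly as follows. This is a minimal illustration, not the code from the pull: `gpt_params_sketch` and `handle_ngl` are hypothetical names, the warning wording is illustrative, and the `gpu_offload_compiled` parameter stands in for a compile-time `#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD` check so that both paths can be exercised in one build.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the parameter struct used by the examples.
struct gpt_params_sketch {
    int n_gpu_layers = 0;
};

// Applies the -ngl / --n-gpu-layers value when GPU offload is available.
// Otherwise prints a two-line warning to stderr and leaves params untouched.
// Returns true if the value was applied, false if it was ignored.
// Assumption: `gpu_offload_compiled` replaces what would normally be an
// `#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD` block.
bool handle_ngl(gpt_params_sketch & params, const std::string & value, bool gpu_offload_compiled) {
    if (gpu_offload_compiled) {
        params.n_gpu_layers = std::stoi(value);
        return true;
    }
    // Illustrative wording only; the point is that the argument is not
    // silently dropped.
    std::fprintf(stderr, "warning: not compiled with GPU offload support, --n-gpu-layers option will be ignored\n");
    std::fprintf(stderr, "warning: see the main README for information on enabling GPU BLAS support\n");
    return false;
}
```

Turning the warning into a hard error would only require exiting with a non-zero status in the second branch, which is the alternative discussed in the comments above.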
…gml-org#1625)

1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: ggml-org#1593 (reply in thread)
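Items 1 and 2 of the commit message can be sketched as below. This is a rough illustration under stated assumptions rather than the actual diff: the backend macros `GGML_USE_CUBLAS` and `GGML_USE_CLBLAST` are assumed to be what the build defines for cuBLAS and CLBlast, `build_usage` is a hypothetical helper, and the help strings are illustrative.

```cpp
#include <string>

// Assumption: the define is derived from the backend macros set by the
// build system when compiling with cuBLAS or CLBlast.
#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
#define LLAMA_SUPPORTS_GPU_OFFLOAD
#endif

// Hypothetical helper that assembles the option list shown by --help.
// The -ngl entry is compiled in only when GPU offload is possible.
std::string build_usage() {
    std::string out;
    out += "  -c N, --ctx-size N    size of the prompt context\n";
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    out += "  -ngl N, --n-gpu-layers N\n";
    out += "                        number of layers to store in VRAM\n";
#endif
    out += "  --memory-f32          use f32 instead of f16 for memory key+value (not recommended)\n";
    return out;
}
```

Keeping the check behind a single define means the example programs do not need to know which specific backend was enabled.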
This pull:

1. Adds a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Updates the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Adds an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Updates `main` and `server` examples documentation to use the new style dash separator argument format
5. Updates the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Adds a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: Is there any perplexity data for using 16bit vs 32bit memory? #1593 (reply in thread)

@JohannesGaessler Hopefully this isn't stepping on your toes; I took a different approach to dealing with the GPU offload support issue.
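One possible way to keep the legacy underscore spellings from item 5 working is to normalize option names before matching them. The sketch below is an assumption about how this could be done, not a description of the pull itself, which may simply compare against both spellings; `normalize_option` is a hypothetical helper.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: maps legacy underscore spellings such as --ctx_size
// onto the dash-separated form (--ctx-size). Only tokens that start with
// "--" are rewritten, so short options and plain values such as file names
// are returned unchanged.
std::string normalize_option(std::string arg) {
    if (arg.compare(0, 2, "--") == 0) {
        std::replace(arg.begin(), arg.end(), '_', '-');
    }
    return arg;
}
```

A caller would apply this only to option names and never to the values that follow them, since a value that happens to begin with `--` would otherwise be altered.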
Closes #1555