Only show -ngl option when relevant + other doc/arg handling updates#1625

Merged
KerfuffleV2 merged 4 commits intoggml-org:masterfrom
KerfuffleV2:feat-ngl_arg_only_when_supported
May 28, 2023
Conversation

@KerfuffleV2
Contributor

This pull:

  1. Adds a LLAMA_SUPPORTS_GPU_OFFLOAD define to llama.h (defined when compiled with CLBlast or cuBLAS)
  2. Updates the argument handling in the common example code to only show the -ngl, --n-gpu-layers option when GPU offload is possible.
  3. Adds an entry for the -ngl, --n-gpu-layers option to the main and server examples documentation
  4. Updates the main and server examples documentation to use the new dash-separator argument format
  5. Updates the server example to use dash separators for its arguments and adds -ngl to --help (only shown when compiled with appropriate support). It still supports --memory_f32 and --ctx_size for compatibility.
  6. Adds a warning discouraging use of --memory-f32 to the main and server examples' --help text as well as the documentation. Rationale: Is there any perplexity data for using 16bit vs 32bit memory? #1593 (reply in thread)
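Items 1 and 2 can be sketched roughly as follows. This is an illustrative simplification, not the verbatim patch: the GGML_USE_CUBLAS / GGML_USE_CLBLAST macro names and the helper functions are assumptions made for the example.

```cpp
#include <cstdio>

// In llama.h: the define exists only when a GPU backend was compiled in.
#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
#define LLAMA_SUPPORTS_GPU_OFFLOAD
#endif

// Small helper so non-preprocessor code can branch on the capability.
constexpr bool gpu_offload_supported() {
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    return true;
#else
    return false;
#endif
}

// In the common example code: only print the option when it can actually work.
void print_gpu_help() {
    if (gpu_offload_supported()) {
        std::printf("  -ngl N, --n-gpu-layers N\n"
                    "        number of layers to store in VRAM\n");
    }
}
```

Without a GPU backend enabled at build time, print_gpu_help() emits nothing, so a CPU-only binary's --help never advertises an option it cannot honor.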

@JohannesGaessler Hopefully this isn't stepping on your toes, I took a different approach to dealing with the GPU offload support issue.

Closes #1555

@JohannesGaessler
Contributor

In terms of usability, I think the binaries should give you an error message telling you that you need compile option XY to use some feature. An error message that just tells you that some argument doesn't exist is, I think, much less helpful for users who are unaware that the compilation option even exists.

@KerfuffleV2
Contributor Author

KerfuffleV2 commented May 28, 2023

An error message that just tells you that some argument doesn't exist is, I think, much less helpful for users who are unaware that the compilation option even exists.

You're just talking about displaying a better error if the user specifies an option that wasn't compiled in, not saying that a bunch of detailed information about how to enable it should be added to the --help text. Correct?


edit: How about this? The last commit changes the behavior when using -ngl or --n-gpu-layers without support compiled in: it now just shows a warning rather than erroring out, and advises the user to check the README for instructions on how to compile with GPU BLAS support.
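The non-fatal fallback described above might look roughly like this; the gpt_params struct field and the exact warning wording are assumptions for illustration, not the literal commit.

```cpp
#include <cstdio>
#include <cstdlib>

struct gpt_params {
    int n_gpu_layers = 0;  // layers to offload to the GPU
};

// Parse -ngl / --n-gpu-layers: honor it when offload support was compiled
// in, otherwise warn on stderr and leave the parameter untouched.
void set_n_gpu_layers(gpt_params & params, const char * arg) {
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    params.n_gpu_layers = std::atoi(arg);
#else
    (void) arg;  // unused in CPU-only builds
    std::fprintf(stderr,
        "warning: not compiled with GPU offload support,"
        " --n-gpu-layers option will be ignored\n"
        "warning: see the README for information on enabling GPU BLAS support\n");
#endif
}
```

In a CPU-only build the argument is accepted but has no effect beyond the two-line warning, so scripts that pass -ngl keep working unchanged.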

Refer to CLBlast and cuBLAS correctly in doc changes.
@JohannesGaessler
Contributor

You're just talking about displaying a better error if the user specifies an option that wasn't compiled in, not saying that a bunch of detailed information about how to enable it should be added to the --help text. Correct?

Yes. It's enough to e.g. tell the user that the feature doesn't work because llama.cpp wasn't compiled with cuBLAS/CLBlast. The user just needs to be told that something is wrong so they know to look it up.

How about this? The last commit changes the behavior when using -ngl or --n-gpu-layers without support compiled in: it now just shows a warning rather than erroring out, and advises the user to check the README for instructions on how to compile with GPU BLAS support.

I would prefer an error over a warning, since people have a tendency not to read warnings, but a warning would be fine too; the program just shouldn't silently ignore the CLI argument.

@KerfuffleV2
Contributor Author

I would prefer an error over a warning, since people have a tendency not to read warnings, but a warning would be fine too

I can change it if you feel strongly. Otherwise, are you okay with the current behavior? The warning is two lines and the result of ignoring it is also pretty benign: possibly lower performance.

@JohannesGaessler
Contributor

I don't feel strongly about it. As I said: a warning would be fine too.

@KerfuffleV2 KerfuffleV2 merged commit 1b78ed2 into ggml-org:master May 28, 2023
@KerfuffleV2 KerfuffleV2 deleted the feat-ngl_arg_only_when_supported branch September 6, 2023 08:50

Development

Successfully merging this pull request may close these issues.

[ISSUE] --gpu-layers in useless.
