
Update README.md #1946

Closed

howard0su wants to merge 1 commit into ggml-org:master from howard0su:patch-1

Conversation

@howard0su
Contributor

Note on cmake version when building with CUDA.
@JohannesGaessler
Contributor

If it works that's great; I think the decision on which cmake versions are officially supported is up to @ggerganov, though. There is also the question of whether CUDA < 11.6 is officially supported.

@howard0su
Contributor Author

howard0su commented Jun 20, 2023

Also, I tested with native and OFF on Windows and I cannot tell the difference; the variance is very minor.

If you have more info regarding perf on Windows, I can help debug. My main dev box is Windows with a P100 card.

@ggerganov
Member

ggerganov left a comment

In general, we want to keep the CMake version as low as possible.

It is OK to do things like this for example:

https://github.com/ggerganov/llama.cpp/blob/2322ec223a21625dfe9bd73ee677444a98a24ac9/CMakeLists.txt#L160-L162

https://github.com/ggerganov/llama.cpp/blob/2322ec223a21625dfe9bd73ee677444a98a24ac9/CMakeLists.txt#L228-L230

But we should not bump the required CMake version for standard CPU-only builds for no good reason.

What are the pros and cons of not supporting CUDA <11.6?

@howard0su
Contributor Author

For cmake, I would vote to go back to OFF instead of native; I don't see a perf gain on Windows. @JohannesGaessler, what was your intention for the change?

@JohannesGaessler
Contributor

> What are the pros and cons of not supporting CUDA <11.6?

CUDA 11.6 seems to be when the -arch=native compilation option was added. So e.g. the Makefile, which already used that option, won't work out-of-the-box with lower CUDA versions.

> For cmake, I would vote to go back to OFF instead of native; I don't see a perf gain on Windows. @JohannesGaessler, what was your intention for the change?

I implemented an option that enables the use of CUDA f16 intrinsics. With make you can build it without issue, but with cmake the compilation failed; setting CUDA_ARCHITECTURES to native fixed this. Maybe we should make the minimum required cmake version and arch conditional on that compilation option? In any case, I think we should try to be consistent in regards to arch for make and cmake.
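Making the requirement conditional could look roughly like this in CMakeLists.txt (a sketch under assumptions: the LLAMA_CUDA_DMMV_F16 option name comes from the discussion, everything else is illustrative, and "native" for CMAKE_CUDA_ARCHITECTURES requires CMake >= 3.24):

```cmake
# Sketch: only demand the newer toolchain when the f16 option is enabled.
if (LLAMA_CUDA_DMMV_F16)
    if (CMAKE_VERSION VERSION_LESS 3.24)
        message(FATAL_ERROR "LLAMA_CUDA_DMMV_F16 needs CMake >= 3.24 for CUDA arch 'native'")
    endif()
    set(CMAKE_CUDA_ARCHITECTURES "native")
else()
    # Keep the lower floor for standard builds.
    set(CMAKE_CUDA_ARCHITECTURES "52")
endif()
```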

@slaren
Member

slaren commented Jun 20, 2023

You could set CUDA_ARCHITECTURES to the minimum arch required to use these intrinsics and remove CUDA_SELECT_NVCC_ARCH_FLAGS.
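In CMake terms, this suggestion might amount to something like the following fragment (a sketch; whether 61 is the right floor depends on the intrinsics actually used):

```cmake
# Sketch: instead of CUDA_SELECT_NVCC_ARCH_FLAGS, pin the lowest arch
# that supports the f16 intrinsics (Pascal, compute_61). CMake then
# passes the matching -gencode flags to nvcc itself.
set(CMAKE_CUDA_ARCHITECTURES 61)
```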

@slaren
Member

slaren commented Jun 20, 2023

> What are the pros and cons of not supporting CUDA <11.6?

I am not sure that there are many pros. At some point, this was required to support some devices like the NVIDIA Jetson, but that no longer seems to be the case. I guess it may save people who already have an older version installed the trouble of upgrading.

@jllllll

jllllll commented Jun 20, 2023

> What are the pros and cons of not supporting CUDA <11.6?

> CUDA 11.6 seems to be when the -arch=native compilation option was added. So e.g. the Makefile, which already used that option, won't work out-of-the-box with lower CUDA versions.

> For cmake, I would vote to go back to OFF instead of native; I don't see a perf gain on Windows. @JohannesGaessler, what was your intention for the change?

> I implemented an option that enables the use of CUDA f16 intrinsics. With make you can build it without issue, but with cmake the compilation failed; setting CUDA_ARCHITECTURES to native fixed this. Maybe we should make the minimum required cmake version and arch conditional on that compilation option? In any case, I think we should try to be consistent in regards to arch for make and cmake.

The CUDA 11.6 Release Notes mention this:

> NVCC introduced -arch=native in CUDA 11.5 update 1

Maybe I am misunderstanding the NVCC docs, but I was under the impression that the default behavior of NVCC was to compile for the currently installed GPU architectures when no architecture is specified. This is what happens on Windows when CUDA_ARCHITECTURES is set to OFF and CUDA_SELECT_NVCC_ARCH_FLAGS is removed. Is this not the case for other systems?

@JohannesGaessler
Contributor

JohannesGaessler commented Jun 21, 2023

I did some quick testing on my RTX 3090:

| arch | t/s 33b q4_0 | t/s 33b q4_0 LLAMA_CUDA_DMMV_F16=1 |
| --- | --- | --- |
| native == compute_86 | 18.78 | 20.73 |
| compute_75 | 18.56 | 20.65 |
| compute_61 | 18.55 | 20.71 |
| default == compute_52 | 18.48 | Does not compile |

The minimum arch that can run all code is compute_61 (Pascal). If there is a performance difference, it is very small. Looking at the nvcc documentation, however, it seems to me that arch should not affect performance. If I understand it correctly, the compilation is done in two stages: first, intermediate PTX code is generated during conventional compilation; then, at runtime, the actual binary code for the GPU is generated if it does not already exist. With native or the --gpu-code option the binary code is generated ahead of time. However, for llama.cpp this does not seem to make a measurable difference for the runtime:

| arch | runtime 7b q4_0 1 token [s] |
| --- | --- |
| native | 2.432 |
| compute_86 | 2.433 |
| compute_61 | 2.429 |

So my impression is that setting arch to compute_61 should work as long as we don't want to support Maxwell.

@slaren
Member

slaren commented Jun 21, 2023

My understanding is that using a lower architecture may prevent the compiler from using the newer features or instructions in the intermediate code, which in turn may cause the JIT to produce worse code for the newer architectures. But if the tests don't show any meaningful difference, it may be the best solution.

@slaren
Member

slaren commented Jun 21, 2023

If I understand correctly, the use of the Pascal intrinsics can already be disabled by undefining GGML_CUDA_DMMV_F16. So you could just check whether we are building for compute_61+ using the __CUDA_ARCH__ macro, and disable GGML_CUDA_DMMV_F16 accordingly. Then, we add an option to the cmake to allow the user to select the architecture, and set it to 61 by default.

The possible values of __CUDA_ARCH__ are described here, but essentially:

> The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.

@JohannesGaessler
Contributor

By default the f16 intrinsics are disabled; they are only used if the user compiles with LLAMA_CUDA_DMMV_F16. So if we want to support Maxwell, I think we should instead keep the default arch, which is compute_52 for CUDA 12, and switch to a higher arch only if necessary.

@howard0su
Contributor Author

Thank you everyone for the input and the good discussion. We don't need this PR anymore; it is replaced by #1959.

@howard0su howard0su closed this Jun 21, 2023
@howard0su howard0su deleted the patch-1 branch June 28, 2023 08:43