Update README.md #1946
Note on cmake version when building with CUDA.
|
If it works, that's great; I think the decision on which CMake versions are officially supported is up to @ggerganov, though. There is also the question of whether CUDA < 11.6 is officially supported. |
|
Also, I tested with both native and OFF on Windows and cannot tell the difference; the variance is very minor. If you have more info regarding perf on Windows, I can help debug. My main dev box runs Windows with a P100 card. |
In general, we want to keep the CMake version as low as possible.
It is OK to do things like this for example:
But we should not bump the required CMake version for standard CPU-only builds for no good reason.
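As a rough sketch of that idea (not the snippet from the comment itself), a project could keep a low baseline and raise the requirement only inside the CUDA branch. The option name `EXAMPLE_USE_CUDA`, the project name, and both version numbers are placeholders chosen for illustration:

```cmake
# Baseline requirement for plain CPU-only builds (placeholder version).
cmake_minimum_required(VERSION 3.12)
project(example_project LANGUAGES C CXX)

# Hypothetical option; not the project's real option name.
option(EXAMPLE_USE_CUDA "Build the CUDA backend" OFF)

if (EXAMPLE_USE_CUDA)
    # Raise the requirement only when CUDA is requested (placeholder version).
    # An older CMake stops here with an error; CPU-only builds never reach this line.
    cmake_minimum_required(VERSION 3.17)

    find_package(CUDAToolkit REQUIRED)  # this find module exists since CMake 3.17
    enable_language(CUDA)
endif()
```

Note that a second `cmake_minimum_required` call also resets the policy settings to the newer version, which may or may not be wanted in a real project.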
What are the pros and cons of not supporting CUDA <11.6?
|
For CMake, I would vote to go back to OFF instead of native. I don't see a perf gain on Windows. @JohannesGaessler what was your intention with the change? |
CUDA 11.6 seems to be when the
I implemented an option that enables the use of CUDA f16 intrinsics. With make you can build it without issue but with cmake the compilation failed; setting |
|
You could set |
I am not sure that there are many pros. At some point, this was required to support some devices like the NVIDIA Jetson, but that no longer seems to be the case. I guess it may save people who already have an older version installed the trouble of upgrading. |
The CUDA 11.6 Release Notes mention this: Maybe I am misunderstanding the NVCC docs, but I was under the impression that the default behavior of NVCC was to compile for the currently installed GPU architectures when no architecture is specified. This is what happens on Windows when |
|
I did some quick testing on my RTX 3090:
The minimum arch that can run all code is
So my impression is that setting |
|
My understanding is that using a lower architecture may prevent the compiler from using the newer features or instructions in the intermediate code, which in turn may cause the JIT to produce worse code for the newer architectures. But if the tests don't show any meaningful difference, it may be the best solution. |
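For reference, a minimal sketch of the choices being compared, assuming the setting under discussion is CMake's `CMAKE_CUDA_ARCHITECTURES` variable. The fallback value `52` and the project name are placeholders, not the minimum arch from the tests above:

```cmake
cmake_minimum_required(VERSION 3.18)  # CMAKE_CUDA_ARCHITECTURES was added in 3.18

# Set the default before CUDA is enabled so that a user-supplied
# -DCMAKE_CUDA_ARCHITECTURES=... still takes precedence.
if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    # Alternatives to a fixed value:
    #   native : build for the GPUs present on the build machine (needs CMake 3.24+)
    #   OFF    : CMake passes no architecture flags, so NVCC's own default applies
    # A plain number (no -real/-virtual suffix) makes CMake generate both
    # machine code and PTX, so newer GPUs can still run the build via JIT.
    set(CMAKE_CUDA_ARCHITECTURES 52)  # placeholder value
endif()

project(arch_example LANGUAGES CXX CUDA)
```

Configuring with `-DCMAKE_CUDA_ARCHITECTURES=native` would then override the fallback on machines where the installed CMake supports it.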
|
If I understand correctly, the use of the Pascal intrinsics can already be disabled by undefining
|
|
By default the f16 intrinsics are disabled. They are only used if the user compiles with |
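An opt-in feature of this kind is usually wired up along the following lines. This is only an illustration: `EXAMPLE_CUDA_F16` is a hypothetical name, not the actual option or macro used in the project.

```cmake
cmake_minimum_required(VERSION 3.12)  # add_compile_definitions needs 3.12+
project(f16_option_example LANGUAGES CXX)

# Hypothetical opt-in switch, disabled unless the user asks for it.
option(EXAMPLE_CUDA_F16 "Use CUDA f16 intrinsics (hypothetical option)" OFF)

if (EXAMPLE_CUDA_F16)
    # Sources can guard the f16 code path with: #ifdef EXAMPLE_CUDA_F16
    add_compile_definitions(EXAMPLE_CUDA_F16)
endif()
```

With this layout a default configure leaves the macro undefined, and `-DEXAMPLE_CUDA_F16=ON` enables the guarded code path.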
|
Thank you, everyone, for the input and the good discussion. This PR is no longer needed; it has been replaced by #1959. |