fabs -> fabsl #1709
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
|
Looks like there are similar usages in |
|
I was surprised to see that "long double" variables are used anywhere in OpenBLAS. Although gcc on x86_64 uses 80-bit Intel extended precision for these, that path is slow compared with AVX2 or AVX512 SIMD. Other x86_64 compilers might just use IEEE double precision for long double. On many other architectures there is no hardware support for an extended-precision long double (and the compiler often maps long double to double precision). It is also possible that some compiler implements a quadruple-precision long double in software, but that would be far slower than the x86_64 80-bit extended precision. So there is no reliable way to count on this helping with precision, and where long double does provide extended precision it may perform poorly. A final concern is that users of BLAS/LAPACK routines don't expect extended precision and might be surprised by results that are too accurate. Which of the BLAS/LAPACK functions in OpenBLAS make use of long double? |
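For reference, what `long double` actually provides on a given compiler and target can be checked from `<float.h>`. The following is only a minimal sketch; the helper names `long_double_mantissa_bits` and `long_double_is_extended` are hypothetical and are not part of OpenBLAS.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper (not in OpenBLAS): number of significand bits
 * that long double carries on this compiler/target. Typical values are
 * 64 for x87 80-bit extended precision, 113 for IEEE quadruple
 * precision, and 53 when long double is just an alias for double. */
static int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Hypothetical helper: true only when long double has more significand
 * bits than double, i.e. when using it can add any precision at all. */
static bool long_double_is_extended(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```

When `long_double_is_extended()` returns false, the choice between `fabs` and `fabsl` makes no numerical difference on that platform.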
|
At least some of these - most likely all - are unchanged from GotoBLAS2 (which predates AVX2). |
|
Also, if you are concerned about excess precision on x86, you will probably want to set the FPU control word so that internal rounding uses double precision (0x27 I think it was) if your operating system does not already default to that mode (I believe *BSD does). I have not tried what effect enforcing fpmath=sse would have; possibly this would already be sufficient to demote the "long double" to "double". |
|
Perhaps the discussion could be moved to an issue ticket, if nobody actually questions the validity of the PR as such? (Unfortunately I am not aware of a GitHub function that would do this while keeping individual messages and attributions intact.) |
|
I opened a separate issue for this with a reference to the pull request. |
|
If people agree that this PR is correct, it should probably be expanded to include the other calls to |
|
The only instances of "long double" that I can find appear to be in interface/rotg.c and interface/zrotg.c. Not sure why you think symv.c and others are affected? |
|
As I understand it, the previous code took long double values, rounded them to double, took the absolute value and then converted back to long double at the end, losing any extended precision. The new version should actually keep the extended precision. |
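The round trip described above can be illustrated with a small sketch. The function names `abs_via_fabs` and `abs_via_fabsl` are hypothetical and only mirror the two call patterns; this is not the code from interface/rotg.c. The explicit cast spells out the conversion that the compiler otherwise performs implicitly when a `long double` is passed to `fabs`.

```c
#include <float.h>
#include <math.h>

/* Hypothetical stand-in for the old pattern: fabs() takes a double, so
 * the long double argument is first rounded to double. The cast below
 * makes that otherwise implicit conversion visible. The double result
 * is then widened back to long double on return. */
static long double abs_via_fabs(long double x)
{
    return fabs((double)x);
}

/* Hypothetical stand-in for the fixed pattern: fabsl() takes and
 * returns long double, so no rounding to double happens. */
static long double abs_via_fabsl(long double x)
{
    return fabsl(x);
}

/* Returns 1 if the fabs() route changes the magnitude of x, which can
 * only happen when long double is wider than double on this target. */
static int fabs_loses_precision(long double x)
{
    return abs_via_fabs(x) != abs_via_fabsl(x);
}
```

Calling `fabs_loses_precision(-(1.0L + LDBL_EPSILON))` is one way to probe a platform: where `long double` is wider than `double`, the low-order bit is expected to be rounded away on the `fabs` route, while on platforms where the two types coincide both routes agree.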
|
@martin-frbg, I haven't looked at the other files in detail — I just listed the files for which I saw a similar compiler warning about |
|
I did not see any other warnings with clang, nor any other uses of "long double". |
|
@martin-frbg, if all the code using |