Skip to content

Conversation

@martin-frbg
Copy link
Collaborator

to work around recent fpu exception bug in Win10 since build 19041

tentative fix for #2709

@mattip
Copy link
Contributor

mattip commented Oct 7, 2020

Is there a reason to prefer the *.S version over the *.c version on non-windows? I am a bit worried that separating the code paths by OS might cause confusion in the future if changes are made and they get out of sync.

@martin-frbg
Copy link
Collaborator Author

Performance ? This is nothing but a workaround for the convenience of users of a particularly buggy patchlevel of Windows, I do not see why it should be forced on other uses cases that do not need it. There is no reason to expect the plain (and simple) "generic" C version to suddenly outperform the .S across several generations of hardware and compilers, and I lack the time to benchmark it now. If anything, a "modern" replacement would probably need to be coded with SSE or AVX (the present nrm2_sse.S does not handle double precision) but I am not an assembler wizard who could whip this up overnight.
Having the conditional code in the KERNEL file (i.e. basically at Makefile level) should make it easier to keep track of
than having it somehow buried in the .S file itself

@mattip
Copy link
Contributor

mattip commented Oct 7, 2020

Thanks for the explanation. I agree that this whole issue has been quite a time sink, and appreciate your work in trying to fix what should be fixed elsewhere.

@mattip
Copy link
Contributor

mattip commented Oct 12, 2020

I am still getting failures with this and NumPy. I guess we still haven't worked out what is causing these:

 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DGEBAL parameter number  3 had an illegal value
 ** On entry to DGEHRD parameter number  2 had an illegal value
 ** On entry to DORGHR DORGQR parameter number  2 had an illegal value
 ** On entry to DHSEQR parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value

@martin-frbg
Copy link
Collaborator Author

My assumption was that numpy would be unlikely to use dsum/zsum as these are BLAS extensions

@mattip
Copy link
Contributor

mattip commented Oct 12, 2020

Not sure I follow. In KERNEL.generic DSUM, ZSUM are already the c-variants. Is there some way I can test the hypothesis that these kernels are involved?

@martin-frbg
Copy link
Collaborator Author

KERNEL.generic will only be used when building with TARGET=GENERIC, or when using MSVC. For the mingw builds, KERNEL (plus the respective KERNEL.target) are relevant.

@martin-frbg
Copy link
Collaborator Author

#2887 adds these to kernel/x86_64/KERNEL (the initial defaults to override are given in kernel/Makefile.L1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants