device-libs: Use f32 denormal check in rtn f64->f32 conversions #876
base: amd-staging
I think PSDB is still not running any tests with -z, but the relevant conversion tests pass for me.
amd/device-libs/ocml/src/convert.cl
if (DAZ_OPT()) {
    float z = BUILTIN_COPYSIGN_F32(0.0f, r);
    r = a >= -0x1.fffffcp-127 && a < 0x1.0p-126 ? z : r;
    r = a_f == 0.0f ? a_f : r;
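For reference, the old condition selects exactly the doubles whose round-toward-negative f32 result is ±0 or a denormal. The lower bound, -0x1.fffffcp-127, is the negative of the largest f32 denormal and is included. The upper bound, 0x1.0p-126 (FLT_MIN), is excluded. The host-side C sketch below checks this on a few inputs. It is a model, not the ocml code: `cvt_rtn_f32` and `step_down_f32` are hypothetical helpers. They emulate round-toward-negative by converting with the default round-to-nearest-even mode, then stepping one ulp toward -infinity when that conversion rounded up.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a "pred" step: move a float one ulp toward
   -infinity by adjusting its encoding. */
static float step_down_f32(float x) {
    assert(x >= -FLT_MAX);   /* not NaN and not -infinity */
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7fffffffu) == 0)
        bits = 0x80000001u;  /* +0 or -0 -> smallest negative denormal */
    else if (bits & 0x80000000u)
        bits += 1;           /* negative: magnitude grows */
    else
        bits -= 1;           /* positive: magnitude shrinks */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical model of a round-toward-negative f64 -> f32 conversion:
   convert with the default mode (round-to-nearest-even), then step down
   once if that conversion rounded up. */
static float cvt_rtn_f32(double a) {
    float r = (float)a;
    if ((double)r > a)
        r = step_down_f32(r);
    return r;
}

/* True for +/-0 and f32 denormals, i.e. |x| < FLT_MIN. */
static int is_zero_or_denormal_f32(float x) {
    return x > -FLT_MIN && x < FLT_MIN;
}

/* The old condition from the review snippet. */
static int old_range_check(double a) {
    return a >= -0x1.fffffcp-127 && a < 0x1.0p-126;
}

/* 1 if the range check agrees with "the rtn result is zero or a denormal"
   on every sample, 0 otherwise. */
static int range_check_matches_rtn(const double *samples, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int predicted = old_range_check(samples[i]);
        int actual = is_zero_or_denormal_f32(cvt_rtn_f32(samples[i]));
        if (predicted != actual)
            return 0;
    }
    return 1;
}
```

The asymmetry comes from the rounding direction. A negative input just below -0x1.fffffcp-127 rounds down to -FLT_MIN, which is normal. Every positive input below FLT_MIN rounds down to a denormal or zero.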
I don't think this will always return the same results as the original, but I think it's OK. Would you please replace "a_f" with "fa"?
https://alive2.llvm.org/ce/z/BYCRrh (it needs a local build to avoid timing out). This changes the behavior for 0x1.fffffep-127 from 0 to 0x1.fffffcp-127, but the flush isn't mandatory. I'm not sure whether alive2 has a way to force it to produce additional counterexamples.
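The counterexample input can be checked by hand. 0x1.fffffep-127 lies exactly halfway between the largest f32 denormal, 0x1.fffffcp-127 (encoding 0x007fffff), and FLT_MIN, 0x1.0p-126 (encoding 0x00800000). Round-to-nearest-even breaks the tie toward the even encoding, which is FLT_MIN. Round-toward-negative must return the lower neighbor, 0x1.fffffcp-127, the new value reported above. The old range check still covers this input because it is below 0x1.0p-126, so the old code flushed it to 0. A minimal C sketch of that arithmetic follows; the helper and constant names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Raw encoding of a float. Its low bit is the low bit of the significand,
   which is what ties-to-even looks at. */
static uint32_t f32_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The two f32 neighbors of the counterexample input 0x1.fffffep-127. */
static const float kBelow = FLT_MIN - FLT_TRUE_MIN; /* 0x1.fffffcp-127 */
static const float kAbove = FLT_MIN;                /* 0x1.0p-126 */

/* Exact midpoint of the neighbors, computed in double (both neighbors and
   their mean are exactly representable as doubles). */
static double neighbor_midpoint(void) {
    return ((double)kBelow + (double)kAbove) / 2.0;
}
```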
Unsurprisingly, alive2 is happy under the assumption that the value isn't in the float denormal range. I'm getting the impression it's not treating the flushed output of the fptrunc as a valid result under the denormal-fp-math-f32 attribute.
Manually forcing canonicalization of the fptrunc result: https://alive2.llvm.org/ce/z/8a9oVY. So the old code didn't flush for all values either, which makes sense, as there's no canonicalizing operation on the value coming out of pred.
Forcing canonicalize on the pred path makes it pass: https://alive2.llvm.org/ce/z/_VNHTn, so the change is just which denormal results are flushed or not. I still need the canonicalize on the fptrunc output, so I think that's an alive2 bug.
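The canonicalization point can be illustrated with a simplified model of llvm.canonicalize on f32 when denormals are flushed with the sign preserved. That model is an assumption here: denormals become a zero of the same sign, everything else passes through, and NaN quieting is ignored. Under it, the two results quoted in this thread for 0x1.fffffep-127 become the same encoding once canonicalized: 0 from the old code and 0x1.fffffcp-127 from the new one. This is the sense in which the change only affects which denormal results get flushed. `canonicalize_ftz_f32` and the other names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

static uint32_t f32_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Simplified, hypothetical model of canonicalize with sign-preserving
   denormal flushing. NaN quieting is not modeled. */
static float canonicalize_ftz_f32(float x) {
    if (x == 0.0f)
        return x;                        /* zeros are already canonical */
    if (x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;  /* denormal -> zero of the same sign */
    return x;
}

/* 1 if a and b have identical encodings after canonicalization. */
static int same_after_canonicalize(float a, float b) {
    return f32_bits(canonicalize_ftz_f32(a)) == f32_bits(canonicalize_ftz_f32(b));
}

/* Results quoted in the thread for the input 0x1.fffffep-127. */
static const float kOldResult = 0.0f;
static const float kNewResult = 0x1.fffffcp-127f;
```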
Branch updated from 86b77cb to 05dc045.
Instead of doing a range check for whether the converted value will be in the denormal range, with its off-by-one bound for the rounding direction, just check whether the converted value is equal to 0.
This does have a small behavior change. Neither version canonicalizes the results on the pred path, so this changes which denormal results are flushed to 0 or left uncanonicalized.
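To see where the two branch conditions differ, the sketch below models both on the host. `old_check` is the range check from the review snippet. `new_check` is a hypothetical model of the new condition. It assumes the "converted value" is the round-to-nearest-even fptrunc of `a`, and it emulates a denormals-are-zero compare as |fa| < FLT_MIN. It models only which inputs take the flush branch, not the full conversion or the uncanonicalized results.

```c
#include <assert.h>
#include <float.h>

/* Old flush condition: a range check on the f64 input against two
   f64 constants. */
static int old_check(double a) {
    return a >= -0x1.fffffcp-127 && a < 0x1.0p-126;
}

/* Hypothetical model of the new condition: round-to-nearest-even
   conversion (C's default, standing in for fptrunc), then a compare with
   0.0f that treats f32 denormals as zero. */
static int new_check(double a) {
    float fa = (float)a;
    return fa > -FLT_MIN && fa < FLT_MIN;
}

/* Count the sample inputs on which the two conditions disagree. */
static int count_disagreements(const double *samples, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        count += old_check(samples[i]) != new_check(samples[i]);
    return count;
}
```

In this model, positive inputs from 0x1.fffffep-127 up to (but not including) 0x1.0p-126 round up to FLT_MIN and no longer take the flush branch. This matches the 0x1.fffffep-127 case discussed above.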
This saves 5 instructions, mostly from materializing the 64-bit constants.
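The old check compares the f64 input against two 64-bit constants. Their raw IEEE-754 encodings are 0xB80FFFFFC0000000 for -0x1.fffffcp-127 and 0x3810000000000000 for 0x1.0p-126. The new check only compares an f32 value with 0.0f, whose encoding is 0. The sketch below just computes those encodings and their 32-bit halves. The 5-instruction figure is the author's measurement, and how the AMDGPU backend actually materializes each constant is not modeled here. `f64_bits`, `hi32` and `lo32` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 encodings. */
static uint64_t f64_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static uint32_t f32_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* High and low 32-bit halves of a 64-bit pattern. */
static uint32_t hi32(uint64_t u) { return (uint32_t)(u >> 32); }
static uint32_t lo32(uint64_t u) { return (uint32_t)u; }
```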