[POWER] Vectorize SGEMV transpose reduce stage #4880
|
Unfortunately I am (belatedly) seeing test failures for SGEMV and SSYMV on POWER8 under CentOS Linux, ppc64le (host cfarm112 in the Gcc Compile Farm, which is said to be POWER8 8247-22L provided by OSUOSL) that can be fixed by reverting the changes to sgemv_t.c |
|
I'm not 100% sure how to reproduce this failure. |
|
It was a surprise find from just running |
|
Yes, I would recommend gcc 11 or later. |
|
I could update this PR (or open a new one) to vectorize this data only for P9. Mainly it would be a matter of adding an #ifndef _ARCH_PWR9 guard. |
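To illustrate the kind of compile-time guard being suggested, here is a minimal sketch. `_ARCH_PWR9` is the macro GCC predefines when targeting POWER9; the helper name `sgemv_t_reduce_path` is hypothetical and exists only for this example. The thread does not say which code path the proposed `#ifndef` would wrap, so the branch layout below is an assumption, not the actual patch.

```c
#include <string.h>

/* Hypothetical helper: reports which reduce path this file was compiled with.
 * Assumption: the vectorized reduce is enabled only when the compiler targets
 * POWER9 (e.g. -mcpu=power9), and every other target keeps the scalar code. */
static const char *sgemv_t_reduce_path(void)
{
#if defined(_ARCH_PWR9)
    return "vector";   /* vectorized reduce stage */
#else
    return "scalar";   /* original scalar reduce stage */
#endif
}
```

Because the choice is made by the preprocessor, a build with TARGET=POWER8 would never contain the vector path, whatever machine it later runs on.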
|
I'm willing to leave it as it is now, if the error only appears with the ancient compiler. But the only other (gccfarm) little-endian power8 I have access to appears to be in Moscow of all places and its connection is flaky. So I'll have to see if building gcc14 on the old CentOS is faster than trying to git pull OpenBLAS a la russe |
|
Sorry, dropped the ball on this until being reminded by #5122 today - the failures on POWER8 turn out to be reproducible with gcc14 (also when building for TARGET=POWER8 on POWER10 |
Vectorize the reduce stage of SGEMV transpose (vector x matrix).
Decreases the number of instructions in the reduce stage by 5-6x.
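
For readers unfamiliar with the reduce stage, the following is a portable C sketch of the general idea, not the code in sgemv_t.c. It assumes four accumulators, one per output column, each holding four partial sums (modelled here as plain arrays standing in for 128-bit vector registers). All names (`LANES`, `reduce_scalar`, `reduce_transposed`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* floats per 128-bit vector register (assumption of this sketch) */

/* Scalar-style reduce: collapse each accumulator lane by lane, then update
 * one element of y at a time. */
static void reduce_scalar(float acc[LANES][LANES], float alpha, float *y)
{
    assert(y != NULL);
    for (size_t j = 0; j < LANES; j++) {
        float s = acc[j][0] + acc[j][1] + acc[j][2] + acc[j][3];
        y[j] += alpha * s;
    }
}

/* Vector-style reduce: transpose the 4x4 block of partial sums so that lane j
 * of every row belongs to output column j, then add the rows lane-wise.
 * With real vector registers each inner loop over j would be a single vector
 * instruction, and the transpose a few merge/permute instructions. */
static void reduce_transposed(float acc[LANES][LANES], float alpha, float *y)
{
    float t[LANES][LANES];
    float sum[LANES];

    assert(y != NULL);
    for (size_t i = 0; i < LANES; i++)
        for (size_t j = 0; j < LANES; j++)
            t[i][j] = acc[j][i];                              /* 4x4 transpose */

    for (size_t j = 0; j < LANES; j++)
        sum[j] = (t[0][j] + t[1][j]) + (t[2][j] + t[3][j]);   /* vector adds */

    for (size_t j = 0; j < LANES; j++)
        y[j] += alpha * sum[j];                               /* one vector multiply-add */
}
```

The two functions add the partial sums in a different order, so their results can differ in the last bits for general floating-point input.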