

@ChipKerchner
Contributor

Vectorize the reduce stage of SGEMV transpose (vector x matrix).

Decreases the number of instructions in the reduce stage by 5-6x.
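For context, SGEMV with a transposed matrix computes y = alpha * A^T * x + y, so each element of y is the dot product of one column of A with x. A kernel of this kind can keep a vector of partial sums per column while walking down the rows, and must then collapse ("reduce") each vector to a single float. The C sketch below models that structure with plain arrays standing in for 4-float vector registers. The function name `sgemv_t_sketch`, the 4-column blocking, and the size restrictions are assumptions for illustration only; this is not the code in sgemv_t.c.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* models one 128-bit vector register holding 4 floats */

/* Hypothetical sketch, not the OpenBLAS POWER kernel:
 * y[j] += alpha * dot(column j of A, x) for a column-major m x n matrix A
 * with leading dimension lda, i.e. y = alpha * A^T * x + y. */
void sgemv_t_sketch(size_t m, size_t n, float alpha, const float *a,
                    size_t lda, const float *x, float *y)
{
    assert(m % LANES == 0 && "sketch assumes m is a multiple of LANES");
    assert(n % 4 == 0 && "sketch handles 4 columns per block");

    for (size_t j = 0; j < n; j += 4) {
        /* Accumulate stage: each of the 4 columns keeps a LANES-wide
         * vector of partial sums while walking down the rows. */
        float acc[4][LANES] = {{0.0f}};
        for (size_t i = 0; i < m; i += LANES)
            for (size_t c = 0; c < 4; c++)
                for (size_t k = 0; k < LANES; k++)
                    acc[c][k] += a[(j + c) * lda + i + k] * x[i + k];

        /* Reduce stage (scalar form): collapse each column's partial-sum
         * vector with its own horizontal add, then update y one element
         * at a time. This is the stage the PR targets. */
        for (size_t c = 0; c < 4; c++) {
            float s = 0.0f;
            for (size_t k = 0; k < LANES; k++)
                s += acc[c][k];
            y[j + c] += alpha * s;
        }
    }
}
```

In the scalar form, every column needs its own horizontal sum before its result can be written, so in a sketch like this the reduce cost grows with one full horizontal reduction per column.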

martin-frbg added this to the 0.3.29 milestone Aug 15, 2024
martin-frbg merged commit dd71df8 into OpenMathLib:develop Aug 15, 2024
@martin-frbg
Collaborator

Unfortunately, I am (belatedly) seeing test failures for SGEMV and SSYMV on POWER8 under CentOS Linux on ppc64le (host cfarm112 in the GCC Compile Farm, said to be a POWER8 8247-22L provided by OSUOSL). Reverting the changes to sgemv_t.c fixes them.

@ChipKerchner
Contributor Author

I'm not 100% sure how to reproduce this failure.

@martin-frbg
Collaborator

It was a surprise find from simply running make on the aforementioned machine; however, I now notice that it only has gcc 8 installed. I'll try to reproduce this with a newer compiler if possible.

@ChipKerchner
Contributor Author

Yes, I would recommend gcc 11 or later.

@ChipKerchner
Contributor Author

ChipKerchner commented Nov 18, 2024

I could update this PR (or open a new one) to vectorize the reduce stage only for POWER9 (P9). Essentially, it would be:

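#ifndef _ARCH_PWR9
/* old way: existing reduce, kept for pre-POWER9 targets such as POWER8 */
#else
/* new way: vectorized reduce from this PR */
#endif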
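A sketch of what such a guard might look like, assuming a hypothetical helper `sgemv_t_reduce4` that finishes four columns at once from a flattened accumulator array (column c in acc[4*c] through acc[4*c+3]). GCC defines `_ARCH_PWR9` when compiling for POWER9 or newer (for example with -mcpu=power9), so a POWER8 build keeps the old path. Both paths are plain C here: the "new way" only mimics the order of operations of a transpose-and-add reduction, which a real kernel would express with vector permutes and vector adds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: acc holds four 4-lane partial-sum vectors,
 * column c in acc[4*c + 0 .. 4*c + 3]; adds alpha * (column sum) to y[c]. */
void sgemv_t_reduce4(const float *acc, float alpha, float *y)
{
    assert(acc != NULL && y != NULL);
#ifndef _ARCH_PWR9
    /* old way: a separate horizontal sum for each column */
    for (size_t c = 0; c < 4; c++) {
        float s = 0.0f;
        for (size_t k = 0; k < 4; k++)
            s += acc[4 * c + k];
        y[c] += alpha * s;
    }
#else
    /* new way: transpose so that row k holds lane k of every column ... */
    float t[4][4];
    for (size_t k = 0; k < 4; k++)
        for (size_t c = 0; c < 4; c++)
            t[k][c] = acc[4 * c + k];
    /* ... then three element-wise adds yield all four column sums at once */
    for (size_t c = 0; c < 4; c++)
        y[c] += alpha * (t[0][c] + t[1][c] + t[2][c] + t[3][c]);
#endif
}
```

On any build where `_ARCH_PWR9` is undefined (POWER8, but also x86) the old path is compiled. The approach relies on both paths computing the same four sums, so a POWER8 build would only give up the reduce-stage speedup.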

@martin-frbg
Collaborator

I'm willing to leave it as it is if the error only appears with the ancient compiler. However, the only other little-endian POWER8 machine in the GCC Compile Farm that I have access to appears to be in Moscow, of all places, and its connection is flaky. So I'll have to see whether building gcc 14 on the old CentOS machine is faster than trying to git pull OpenBLAS à la russe.

@martin-frbg
Collaborator

Sorry, I dropped the ball on this until #5122 reminded me today. The failures on POWER8 turn out to be reproducible with gcc 14 as well (also when building for TARGET=POWER8 on POWER10).

