-
Notifications
You must be signed in to change notification settings - Fork 0
ARMv8 support #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARMv8 support #32
Conversation
2751cd6 to
8adaf32
Compare
…amax(). Make potrf() compile on fma3<avx2> Smarter way of calculating register matrix number of columns in potrf() Make the code compile for ARM again Fixed calculation of RegisterMatrix columns number in potrf()
roversch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Please address here or in a follow-up MR
| #if XSIMD_WITH_NEON64 | ||
| # include <blast/math/algorithm/arch/neon64/Tile.hpp> | ||
| #endif | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use #if, #elif, #else and #error for more robust error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But on the other hand, this #if-#elif thingy is already in xsimd. And if it happens that different architecture constants are defined at the same time, then you just have different specializations of tile() declared. The architecture is a parameter of tile(), so they will not conflict. And if you require to instantiate blast::tile() with Arch template parameter for which detail::tile<...>() does not exist, the compiler should give a pretty clear message IMO.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok 👍
| { | ||
| size_t constexpr SS = SimdSize_v<ET>; | ||
| size_t constexpr TILE_STEP = 4; // TODO: this is almost arbitrary and needs to be ppoperly determined | ||
| size_t constexpr TILE_STEP = 4; // TODO: this is almost arbitrary and needs to be properly determined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the step in 'columns' right? It depends on the architecture as well. E.g. the optimal tiles are different for AVX and AVX2, AVX512
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, true. Also depends on data type (float or double, and potentially std::complex<float> and std::complex<double>). This is a reminder to do it in a more clever way. It would be also good to avoid duplicating the tile() function code between architectures.
| /** | ||
| * @brief Default size of a panel (in a panel matrix) for a given architecture and data type | ||
| * | ||
| * TODO: Is it always equal to SIMD size? Deprecate? | ||
| * | ||
| * @tparam Arch architecture | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Please deprecate
| { | ||
| std::size_t constexpr registerCapacity(xsimd::avx2) | ||
| { | ||
| return 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In bytes? Couldn't find the documentation on this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no, it's num_of_vector_registers. Better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more precise, yes.
Uh oh!
There was an error while loading. Please reload this page.