Conversation
ZTE(power << 1);
for (;;) {
    result = result ^ result.shiftIntraLaneLeft(power, shiftMask);
    if (!--log2Count) { break; }
@thecppzoo note one condition in the inner loop now
This condition is broken: it does not work when the bit count is not a power of 2, such as 7.
inc/zoo/swar/associative_iteration.h
Outdated
template<typename S>
constexpr auto parallel_suffix(S input) {
    constexpr auto log2Count = S::Lanes;
This can't be right: the parallel suffix depends not on the number of lanes but on the number of bits in each lane.
looks like you might be reviewing an outdated version ?
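To make the lanes-versus-bits point concrete, here is a minimal sketch on a plain `std::uint64_t`, not the zoo SWAR types. The names `broadcast` and `laneXorScan` are hypothetical, and the intra-lane masking is an assumption about what `shiftIntraLaneLeft` does. The number of doubling steps is driven only by `LaneBits`: 8-bit lanes take three steps (shifts of 1, 2, 4) however many lanes fit in the word.

```cpp
#include <cstdint>

// Hypothetical helper (not zoo's API): copies `pattern` into every lane of
// LaneBits bits that fits in a 64-bit word.
template<int LaneBits>
constexpr std::uint64_t broadcast(std::uint64_t pattern) {
    static_assert(0 < LaneBits && LaneBits < 64, "lane width must be 1..63");
    std::uint64_t result = 0;
    for (int shift = 0; shift + LaneBits <= 64; shift += LaneBits) {
        result |= pattern << shift;
    }
    return result;
}

// XOR scan inside each lane: bit i of a lane becomes the XOR of bits 0..i
// of that same lane. The loop bound is the lane width, never the lane count.
template<int LaneBits>
constexpr std::uint64_t laneXorScan(std::uint64_t input) {
    constexpr std::uint64_t laneMask = (std::uint64_t{1} << LaneBits) - 1;
    std::uint64_t result = input;
    for (int power = 1; power < LaneBits; power <<= 1) {
        // Keep only the bit positions that receive data from their own lane
        // after a left shift by `power`; this blocks leaks between lanes.
        const std::uint64_t lowBits = (std::uint64_t{1} << power) - 1;
        const std::uint64_t keep = broadcast<LaneBits>(laneMask & ~lowBits);
        result ^= (result << power) & keep;
    }
    return result;
}

// Two 8-bit lanes holding 0x01 each: both fill up to 0xFF.
static_assert(laneXorScan<8>(0x0101u) == 0xFFFFu, "");
// The top bit of lane 0 must not spill into lane 1.
static_assert(laneXorScan<8>(0x80u) == 0x80u, "");
// A 7-bit lane also needs shifts of 1, 2 and 4.
static_assert(laneXorScan<7>(0x01u) == 0x7Fu, "");
```

With `power < LaneBits` as the loop condition, a 7-bit lane gets the same three steps as an 8-bit lane, which is the case raised earlier in the thread.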
This implementation might be simple enough, sure, but it can only accept lane sizes that have a power of two number of bits.
hmmm... yeah i see what you mean...
…On Thu, 12 Sept 2024 at 19:18, thecppzoo ***@***.***> wrote:
This implementation might be simple enough, sure, but it can only accept
lane sizes that have a power of two number of bits.
Let's review if the implementation I made is less efficient than yours.
Otherwise, the much harder challenge of supporting any arbitrary bitcount
will have to decompose the number of bits into its binary representation to
make the groups, and then AI would come to bear more clearly.
In simpler terms, this implementation is like multiplication when the
factor is a power of two, much easier.
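One way to read "decompose the number of bits into its binary representation" is the familiar double-and-add pattern, sketched below. The names `associativeIterate`, `Plus` and `Times` are hypothetical and this is not the zoo library's signature; it only illustrates applying an associative operation `count` times by walking the bits of `count`.

```cpp
// Hypothetical sketch: combines `base` with itself `count` times under an
// associative operation `op`, starting from `neutral`.
template<typename T, typename Op>
constexpr T associativeIterate(T base, T neutral, unsigned count, Op op) {
    T result = neutral;
    while (count) {
        // A set bit in `count` means the current power-of-two group
        // contributes to the result.
        if (count & 1u) { result = op(result, base); }
        // Doubling step: `base` now stands for twice as many applications.
        base = op(base, base);
        count >>= 1;
    }
    return result;
}

struct Plus {
    constexpr unsigned operator()(unsigned a, unsigned b) const { return a + b; }
};
struct Times {
    constexpr unsigned operator()(unsigned a, unsigned b) const { return a * b; }
};

// With addition this is multiplication: 5 added 7 times.
static_assert(associativeIterate(5u, 0u, 7u, Plus{}) == 35u, "");
// With multiplication this is exponentiation: 3 to the 5th power.
static_assert(associativeIterate(3u, 1u, 5u, Times{}) == 243u, "");
```

When `count` is a power of two, the `if` fires only once, at the last iteration, which matches the remark that the power-of-two case is the much easier one.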
OK, now working with non-power-of-two numbers of bits; I just needed to sanity-check
myself about the log2 implementation for non-powers of two:
https://godbolt.org/z/aPa6q8r8c
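The contents of the linked snippet are not reproduced here. As an assumption about the kind of check involved, the sketch below contrasts a rounded-down and a rounded-up base-2 logarithm; `floorLog2` and `ceilLog2` are hypothetical names. A 7-bit lane needs shifts of 1, 2 and 4, so the step count has to round up.

```cpp
// Hypothetical helpers, not taken from the pull request.

// Position of the highest set bit; floorLog2(0) is treated as 0 here.
constexpr int floorLog2(unsigned n) {
    int result = 0;
    while (n >>= 1) { ++result; }
    return result;
}

// Smallest k such that (1 << k) >= n, for n >= 1.
constexpr int ceilLog2(unsigned n) {
    return n <= 1 ? 0 : floorLog2(n - 1) + 1;
}

// For exact powers of two both roundings agree...
static_assert(floorLog2(8) == 3 && ceilLog2(8) == 3, "");
// ...but for 7 bits rounding down yields only 2 doubling steps (shifts 1, 2),
// which covers 4 bits; rounding up yields the 3 steps that are needed.
static_assert(floorLog2(7) == 2, "");
static_assert(ceilLog2(7) == 3, "");
```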
…On Thu, 12 Sept 2024 at 20:23, Jamie Pond ***@***.***> wrote:
hmmm... yeah i see what you mean...
I just did this. I am very surprised and disappointed that the generated code for powers of two is basically identical. We now have a good example of code that the optimizer does not "understand", or perhaps we have to look deeper into whether this implementation is inherently inefficient. Another lesson: always, always, always work on the straightforward solution to the straightforward need first, so there is something to compare against the sophisticated solutions to abstract and general needs.
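As an illustration of that lesson, a bit-by-bit reference can serve as the baseline that a cleverer version is checked against. The sketch below works on a single lane held in a plain integer; `naiveXorScan`, `doublingXorScan` and `agreeOnAllInputs` are hypothetical names, not code from this pull request.

```cpp
#include <cstdint>

// Straightforward reference: bit i of the result is the XOR of input bits
// 0..i. Assumes 0 < bits < 64.
constexpr std::uint64_t naiveXorScan(std::uint64_t input, int bits) {
    std::uint64_t result = 0, running = 0;
    for (int i = 0; i < bits; ++i) {
        running ^= (input >> i) & 1u;
        result |= running << i;
    }
    return result;
}

// Doubling version: shifts of 1, 2, 4, ... until the lane width is covered.
constexpr std::uint64_t doublingXorScan(std::uint64_t input, int bits) {
    const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
    std::uint64_t result = input & mask;
    for (int power = 1; power < bits; power <<= 1) {
        result ^= (result << power) & mask;
    }
    return result;
}

// Exhaustive comparison over every value of a lane of `bits` bits.
constexpr bool agreeOnAllInputs(int bits) {
    for (std::uint64_t v = 0; v < (std::uint64_t{1} << bits); ++v) {
        if (naiveXorScan(v, bits) != doublingXorScan(v, bits)) { return false; }
    }
    return true;
}

// Covers both a non-power-of-two width and a power-of-two width.
static_assert(agreeOnAllInputs(7), "");
static_assert(agreeOnAllInputs(8), "");
```

The same two functions can also be compiled separately to compare the generated code for the simple and the general formulation.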
Not totally sure how this can be turned into AI (associative iteration) right now... I think this function might be too simple for associative iteration?