jamierpond commented on Jan 13, 2025
- add main logic
- add even lane mask
- even/odd lane mask
- clean up a little
inc/zoo/swar/associative_iteration.h
Outdated
using S = SWAR<4, u32>;
static_assert(S::oddLaneMask().value() == 0xF0F0'F0F0);
I'm aware these tests are not formatted nicely; I'm just making a draft for visibility.
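For readers following the thread, here is a minimal standalone sketch of what the even/odd lane masks in the assertion above amount to. The helper `alternatingLaneMask` is hypothetical and written only for illustration; it is not the library's `oddLaneMask`, and it assumes a 32-bit word, lane sizes between 1 and 16 bits, and that lane 0 (the least significant) counts as even.

```cpp
#include <cstdint>

// Hypothetical helper, not the library's implementation: builds a mask that
// selects every other lane of `laneBits` bits in a 32-bit word.
// Assumes 1 <= laneBits <= 16. Lane 0 is the least significant lane (even).
constexpr std::uint32_t alternatingLaneMask(int laneBits, bool odd) {
    std::uint32_t
        laneOnes = (std::uint32_t{1} << laneBits) - 1,  // all bits of one lane
        mask = 0;
    // Start at lane 1 for the odd mask, lane 0 for the even mask,
    // then skip two lanes at a time.
    for (int position = odd ? laneBits : 0; position < 32; position += 2 * laneBits) {
        mask |= laneOnes << position;
    }
    return mask;
}

static_assert(alternatingLaneMask(4, true) == 0xF0F0'F0F0);   // odd 4-bit lanes
static_assert(alternatingLaneMask(4, false) == 0x0F0F'0F0F);  // even 4-bit lanes
static_assert(alternatingLaneMask(8, true) == 0xFF00'FF00);   // odd 8-bit lanes
```

The first assertion mirrors the draft test for `SWAR<4, u32>`; the even mask is its complement.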
thecppzoo left a comment
Thanks for this draft.
This work surfaces several important questions:
- Support for non-power-of-two lane sizes
- What are we going to do with two's complement signs? Perhaps this is not fullMultiplication but safeMultiplication
- We have to implement "negation" (two's complement flipping the sign)
That being said, we are in an excellent position to also implement division as multiplication by the reciprocal, which would be useful at least for compile-time divisors.
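As a rough illustration of division as multiplication by the reciprocal for a compile-time divisor, here is a scalar sketch. The name `divideByConstant` is hypothetical, and the code works on a single 16-bit value rather than on SWAR lanes; it only shows the arithmetic idea under those assumptions. With a 32-bit fixed-point reciprocal rounded up, the result is exact for every 16-bit dividend, because the rounding error times the dividend stays below 2^32.

```cpp
#include <cstdint>

// Hypothetical sketch: divides a 16-bit value by the compile-time constant
// Divisor using one multiplication and one shift instead of a division.
template <std::uint16_t Divisor>
constexpr std::uint16_t divideByConstant(std::uint16_t dividend) {
    static_assert(Divisor != 0, "division by zero");
    // Fixed-point reciprocal with 32 fractional bits, rounded up:
    // floor(2^32 / Divisor) + 1. It is evaluated at compile time.
    constexpr std::uint64_t reciprocal = (std::uint64_t{1} << 32) / Divisor + 1;
    // Multiply, then drop the 32 fractional bits.
    return static_cast<std::uint16_t>((dividend * reciprocal) >> 32);
}

static_assert(divideByConstant<3>(10) == 3);
static_assert(divideByConstant<10>(12345) == 1234);
static_assert(divideByConstant<7>(65535) == 9362);
static_assert(divideByConstant<1>(65535) == 65535);
```

A lane-wise version would need the product to be twice as wide as the lane, which is where a double-precision multiplication comes in.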
Please merge the auto declarations: this does not coerce the type, but verifies that all the declarands have the same type:

    auto a = initialize_a(inputsForA);
    auto b = initialize_b(inputsForB);

In that code, the types are not coerced (very good!) but a and b may be of different types.

    auto
        a = initA(iA),
        b = initB(iB);

We still don't coerce the types, but if a and b have different types, it is a compilation error.
This is especially useful here in the SWAR library
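A compilable version of the merged declaration, as a sketch: `initA`, `initB` and `sumOfBoth` are placeholder functions invented for the example. Because both initializers return the same type, the single `auto` deduces consistently; if `initB` returned, say, `std::uint64_t`, the declaration would be ill-formed and the compiler would reject it.

```cpp
#include <cstdint>
#include <type_traits>

// Placeholder initializers, assumed for illustration only.
constexpr std::uint32_t initA(std::uint32_t x) { return x + 1; }
constexpr std::uint32_t initB(std::uint32_t x) { return x * 2; }

constexpr std::uint32_t sumOfBoth(std::uint32_t iA, std::uint32_t iB) {
    // One `auto` for both declarators: nothing is coerced, but the deduced
    // types must agree, otherwise this line does not compile.
    auto
        a = initA(iA),
        b = initB(iB);
    static_assert(std::is_same_v<decltype(a), decltype(b)>);
    return a + b;
}

static_assert(sumOfBoth(1, 2) == 6);  // (1 + 1) + (2 * 2)
```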
inc/zoo/swar/associative_iteration.h
Outdated
auto [l_even, l_odd] = doublePrecision(multiplicand);
auto [r_even, r_odd] = doublePrecision(multiplier);
auto res_even = multiplication_OverflowUnsafe(l_even, r_even);
auto res_odd = multiplication_OverflowUnsafe(l_odd, r_odd);
Merge these declarations into a single auto; that way you verify that they are all of the same type.
Also, TODO: signed multiplication.
Perhaps we can also make a "widening multiplication" that doubles the lane size. For example, x86-64 has instructions that multiply two register-size values and produce a result with double the number of bits, using the "DX:AX" pair for the result: for 64 bits, RDX holds the upper 64 bits and RAX the lower. In this way, the multiplication also widens. Ask Claude what the name of this is.
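To make the idea concrete, here is a small sketch of a widening multiplication, first on a single 32-bit value and then on 8-bit lanes packed in a 32-bit word. All names (`WideProduct32`, `wideningMultiply`, `WidenedLanes`, `multiplyLanes16`, `wideningMultiplyLanes8`) are hypothetical and do not come from the library; the lane version assumes four 8-bit lanes and returns the even and odd products as two words of 16-bit lanes.

```cpp
#include <cstdint>

// Scalar case: the product of two 32-bit values split into two halves,
// analogous to the upper/lower register pair described above.
struct WideProduct32 { std::uint32_t lower, upper; };

constexpr WideProduct32 wideningMultiply(std::uint32_t a, std::uint32_t b) {
    std::uint64_t full = std::uint64_t{a} * b;
    return {static_cast<std::uint32_t>(full), static_cast<std::uint32_t>(full >> 32)};
}

// Lane case: each result word holds two 16-bit lanes.
struct WidenedLanes { std::uint32_t even, odd; };

// Multiplies the two 16-bit lanes of a and b independently.
constexpr std::uint32_t multiplyLanes16(std::uint32_t a, std::uint32_t b) {
    std::uint32_t
        low = (a & 0xFFFF) * (b & 0xFFFF),
        high = (a >> 16) * (b >> 16);
    return (high << 16) | (low & 0xFFFF);
}

// Spreads the 8-bit lanes into 16-bit lanes, so no product can overflow.
constexpr WidenedLanes wideningMultiplyLanes8(std::uint32_t a, std::uint32_t b) {
    constexpr std::uint32_t EvenLanes = 0x00FF'00FF;
    return {
        multiplyLanes16(a & EvenLanes, b & EvenLanes),
        multiplyLanes16((a >> 8) & EvenLanes, (b >> 8) & EvenLanes)
    };
}

static_assert(wideningMultiply(0xFFFF'FFFF, 0xFFFF'FFFF).upper == 0xFFFF'FFFE);
static_assert(wideningMultiply(0xFFFF'FFFF, 0xFFFF'FFFF).lower == 1);
// Lanes of a: FF 02 03 10, lanes of b: FF 03 04 10 (most significant first).
static_assert(wideningMultiplyLanes8(0xFF02'0310, 0xFF03'0410).even == 0x0006'0100);
static_assert(wideningMultiplyLanes8(0xFF02'0310, 0xFF03'0410).odd == 0xFE01'000C);
```

The lane version follows the same even/odd split as the draft's doublePrecision, only without the SWAR types.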
inc/zoo/swar/associative_iteration.h
Outdated
SWAR<NB, T> result;
SWAR<NB, T> overflow;
inc/zoo/swar/associative_iteration.h
Outdated
template <int NB, typename T>
constexpr auto
doublingMultiplication(SWAR<NB, T> multiplicand, SWAR<NB, T> multiplier) {
"doubling" is confusing here. doublePrecisionMultiplication is fine, as is multiplicationByDoublingPrecision, ...
inc/zoo/swar/associative_iteration.h
Outdated
}

template <int NB, typename T>
constexpr MultiplicationResult<NB, T>
inc/zoo/swar/associative_iteration.h
Outdated
template<int NB, typename T>
constexpr auto saturatingExponentiation(
    SWAR<NB, T> x,
Absolutely not.
We're not removing the non-saturating exponentiation to provide only the saturating one. Don't do that.
The general operation is always a prerequisite for the more specific one.
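One way to read this layering, sketched on a single 8-bit value rather than on SWAR lanes: the general exponentiation stays, reports whether it overflowed, and the saturating variant is a thin wrapper over it. `PowerResult`, `exponentiation` and `scalarSaturatingExponentiation` are hypothetical names for this example and not the library's functions.

```cpp
#include <cstdint>

struct PowerResult {
    std::uint8_t value;  // result modulo 2^8
    bool overflowed;     // true if the exact result does not fit in 8 bits
};

// General operation: square-and-multiply, wrapping modulo 2^8.
constexpr PowerResult exponentiation(std::uint8_t base, unsigned exponent) {
    unsigned result = 1, square = base;
    bool overflowed = false, squareOverflowed = false;
    while (exponent != 0) {
        if (exponent & 1u) {
            result *= square;
            // Using an already overflowed square also overflows the result.
            overflowed = overflowed || squareOverflowed || result > 0xFF;
            result &= 0xFF;
        }
        exponent >>= 1;
        if (exponent != 0) {  // only square if the square will be needed
            square *= square;
            squareOverflowed = squareOverflowed || square > 0xFF;
            square &= 0xFF;
        }
    }
    return {static_cast<std::uint8_t>(result), overflowed};
}

// Specific operation, built on the general one.
constexpr std::uint8_t scalarSaturatingExponentiation(std::uint8_t base, unsigned exponent) {
    auto general = exponentiation(base, exponent);
    return general.overflowed ? std::uint8_t{0xFF} : general.value;
}

static_assert(exponentiation(2, 7).value == 128 && !exponentiation(2, 7).overflowed);
static_assert(exponentiation(3, 6).value == 217 && exponentiation(3, 6).overflowed);
static_assert(scalarSaturatingExponentiation(2, 8) == 0xFF);
static_assert(scalarSaturatingExponentiation(3, 5) == 243);
```

With this shape, callers that want wrap-around keep the general function, and saturation costs one extra selection.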
SWAR<NB, T> lower;
SWAR<NB, T> upper;
inc/zoo/swar/associative_iteration.h
Outdated
    over_even = D{(lower.value() & UpperHalfOfLanes) >> HalfLane},
    over_odd = D{(upper.value() & UpperHalfOfLanes) >> HalfLane};
The intra-lane shift allows you to provide the mask.
Please use those primitives instead of deploying the pickaxe.
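To illustrate the point without assuming the library's actual signature, here is a standalone sketch of an intra-lane right shift that takes the mask as a parameter. `shiftLanesRight` and `upperHalvesOfLanes8` are hypothetical names, and the example assumes 8-bit lanes in a 32-bit word.

```cpp
#include <cstdint>

// Hypothetical primitive: shifts every lane right by `count` bits. The mask
// names the bits that survive, so nothing leaks into the neighboring lane.
constexpr std::uint32_t shiftLanesRight(std::uint32_t value, int count, std::uint32_t mask) {
    return (value & mask) >> count;
}

// Moves the upper half of every 8-bit lane into its lower half,
// expressed through the primitive instead of a raw mask-and-shift.
constexpr std::uint32_t upperHalvesOfLanes8(std::uint32_t value) {
    constexpr std::uint32_t UpperHalfOfLanes = 0xF0F0'F0F0;
    constexpr int HalfLane = 4;
    return shiftLanesRight(value, HalfLane, UpperHalfOfLanes);
}

static_assert(upperHalvesOfLanes8(0xAB12'CD34) == 0x0A01'0C03);
```

The call site then states the intent (shift within lanes, keep these bits) rather than repeating the bit manipulation.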
inc/zoo/swar/SWAR.h
Outdated
template <int NBits, typename T>
constexpr static auto consumeMSB(SWAR<NBits, T> s) noexcept {
    using S = SWAR<NBits, T>;
    auto msbCleared = s & ~S{S::MostSignificantBit};
    return S{static_cast<T>(msbCleared.value() << 1)};
}
I am not sold on promoting this to the main header of SWAR.
This really seems to be an artifact of the "regressive" direction of "associative iteration"; it does not cohere enough with the SWAR library itself.
auto
doublePrecisionMultiplication(SWAR<NB, T> multiplicand, SWAR<NB, T> multiplier) {
    auto
        icand = doublePrecision(multiplicand),
Nice! I never thought about omitting the prefix.
I cannot resist commenting on how elegant this is all looking.
9fb833d to dea354a