Conversation

@MaxGraey
Contributor

No description provided.


template<> int PopCount<uint64_t>(uint64_t v) {
return PopCount((uint32_t)v) + PopCount((uint32_t)(v >> 32));
return PopCount((uint32_t)v) + (v >> 32 ? PopCount((uint32_t)(v >> 32)) : 0);
Member

What is the motivation for this change? This seems slightly harder to read than before.

Contributor Author

The main motivation is consistency with the 64-bit CountLeadingZeros, plus a potential speedup when the high 32 bits of the 64-bit value are zero.

Member

I think the consistency with CountLeadingZeroes is a non-goal here because they are fundamentally different optimizations. If we could rewrite CountLeadingZeroes to match PopCount, I think we would want to do that for improved readability.

If we're going to sacrifice some readability for performance, it would be good to see that the performance difference is measurable rather than hypothetical.

This PR LGTM other than this point, so it might be nice to split this out and land the rest.

Contributor Author

Alright. I switched back to the previous implementation of PopCount<uint64_t>.
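
For reference, the two variants discussed in this thread can be compared side by side. This is only a sketch: `PopCount32`, `PopCount64Plain`, `PopCount64Branchy`, and `CheckPopCountVariantsAgree` are hypothetical names, and the portable bit-twiddling `PopCount32` stands in for the real `PopCount<uint32_t>`, which may use a compiler builtin instead.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit population count (SWAR). Assumption: a stand-in for
// the project's PopCount<uint32_t>, not its actual implementation.
inline int PopCount32(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;
  return int((v * 0x01010101u) >> 24);
}

// Variant that was kept: always count both 32-bit halves.
inline int PopCount64Plain(uint64_t v) {
  return PopCount32(uint32_t(v)) + PopCount32(uint32_t(v >> 32));
}

// Variant that was proposed and then reverted: skip the high half if zero.
inline int PopCount64Branchy(uint64_t v) {
  return PopCount32(uint32_t(v)) +
         (v >> 32 ? PopCount32(uint32_t(v >> 32)) : 0);
}

// Both variants must return the same count for every input.
inline void CheckPopCountVariantsAgree() {
  const uint64_t samples[] = {0, 1, 0xFFFFFFFFull, 0x100000000ull,
                              0x8000000000000001ull, ~0ull};
  for (uint64_t s : samples) {
    assert(PopCount64Plain(s) == PopCount64Branchy(s));
  }
}
```

The two functions differ only in the extra branch, so any speed difference would have to be shown by a benchmark, as requested above.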

Comment on lines +68 to +70
template<typename T> bool IsPowerOf2(T v) {
return v != 0 && (v & (v - 1)) == 0;
}
Member

Nice 👍 I like that this lets us delete code.

Contributor Author

Usually popcnt(x) == 1 (which, by the way, also covers the x != 0 case) is faster, but only if we have native popcnt support. Also, LLVM and GCC are smart enough to replace the pattern above with popcnt(x) == 1 when possible.
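
A small sketch of the equivalence mentioned here, for illustration only. `IsPowerOf2ViaPopCount` and `CheckPowerOf2Agree` are hypothetical names, and `std::bitset::count` is used as a portable stand-in for a native popcnt; whether it lowers to a single instruction depends on the compiler and target.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// The pattern from the PR: a power of two has exactly one bit set,
// so clearing the lowest set bit must leave zero.
template<typename T> bool IsPowerOf2(T v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical alternative: exactly one bit set. The v != 0 check is
// implied, because zero has a population count of 0, not 1.
inline bool IsPowerOf2ViaPopCount(uint32_t v) {
  return std::bitset<32>(v).count() == 1;
}

// Both formulations must agree on every input.
inline void CheckPowerOf2Agree() {
  for (uint32_t x = 0; x < 4096; ++x) {
    assert(IsPowerOf2(x) == IsPowerOf2ViaPopCount(x));
  }
}
```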

template<> int PopCount<uint64_t>(uint64_t v) {
#if __has_builtin(__builtin_popcountll) || defined(__GNUC__)
return __builtin_popcountll(v);
#if __has_builtin(__builtin_popcount) || defined(__GNUC__) || defined(_MSC_VER)
Contributor Author

I used __has_builtin(__builtin_popcount) instead of __has_builtin(__builtin_popcountll) because clang-format would otherwise force a line break and carry defined(_MSC_VER) onto a new line, which looks weird.
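
As a rough illustration of how such a guard is typically structured, here is a minimal sketch. It assumes a shim for compilers that lack `__has_builtin`, and it sends every non-GCC, non-Clang compiler to a portable fallback; the names `PopCount32Sketch` and `CheckPopCount32Sketch` are hypothetical, and the MSVC branch from the PR is intentionally not reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: some compilers do not define __has_builtin, so a shim
// that evaluates to 0 keeps the #if below well-formed.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

inline int PopCount32Sketch(uint32_t v) {
#if __has_builtin(__builtin_popcount) || defined(__GNUC__)
  // GCC and Clang builtin; may compile to a single popcnt instruction.
  return __builtin_popcount(v);
#else
  // Portable fallback: clear the lowest set bit until none remain.
  int bits = 0;
  for (; v != 0; v &= v - 1) {
    ++bits;
  }
  return bits;
#endif
}

// Spot checks that hold for either branch of the guard.
inline void CheckPopCount32Sketch() {
  assert(PopCount32Sketch(0u) == 0);
  assert(PopCount32Sketch(0xFFFFFFFFu) == 32);
}
```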


template<typename T, typename U> inline static T RotateLeft(T val, U count) {
T mask = sizeof(T) * CHAR_BIT - 1;
auto value = typename std::make_unsigned<T>::type(val);
Contributor Author

@MaxGraey MaxGraey Jun 17, 2020

The cast to unsigned is pretty important. Otherwise neither LLVM nor GCC can fold this into a single rotate instruction (rol / ror).
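
To make the role of the cast concrete, here is a self-contained sketch of a rotate written in the same spirit. It is not the code merged in this PR: `RotateLeftSketch` and `CheckRotateLeftSketch` are hypothetical names, and the return expression is an assumption based on the usual masked-shift idiom.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <type_traits>

template<typename T, typename U> inline T RotateLeftSketch(T val, U count) {
  using UT = typename std::make_unsigned<T>::type;
  // Bit width minus one, e.g. 31 for 32-bit types.
  const unsigned mask = sizeof(T) * CHAR_BIT - 1;
  // Reduce the count modulo the bit width.
  const unsigned n = unsigned(count) & mask;
  // Shift an unsigned copy so the right shift is logical and does not
  // smear the sign bit of a negative value.
  const UT value = UT(val);
  // (0u - n) & mask is the complementary shift and is 0 when n is 0,
  // which avoids an undefined shift by the full bit width.
  return T(UT(value << n) | UT(value >> ((0u - n) & mask)));
}

// Spot checks, including a signed input with the top bit set.
inline void CheckRotateLeftSketch() {
  assert(RotateLeftSketch(uint32_t(0x80000001u), 1) == 3u);
  assert(RotateLeftSketch(int32_t(INT32_MIN), 1) == 1);
}
```

With a signed `value`, the right shift would be arithmetic for negative inputs, so the expression would no longer be a rotate at all.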

@MaxGraey MaxGraey requested a review from tlively June 17, 2020 17:58
Member

@tlively tlively left a comment

Looks good, thanks!

@tlively tlively merged commit f6eb790 into WebAssembly:master Jun 18, 2020
@MaxGraey MaxGraey deleted the optimize-bit-helpers branch June 18, 2020 05:59