Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion include/fast_float/float_common.h
Original file line number Diff line number Diff line change
Expand Up @@ -1132,7 +1132,13 @@ template <typename T> constexpr uint64_t int_luts<T>::min_safe_u64[];

template <typename UC>
fastfloat_really_inline constexpr uint8_t ch_to_digit(UC c) {
return int_luts<>::chdigit[static_cast<unsigned char>(c)];
using UnsignedUC = typename std::make_unsigned<UC>::type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this so complicated? Won't the following do?

  return (c <= 255) ? static_cast<unsigned char>(c) : 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is UC unsigned, or guaranteed to be a non-negative value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is, then the compiler should be able to generate optimal code for the c <= 255 test, without the manual masking etc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

char8_t, char16_t and char32_t are non-negative by the standard. Only char can be negative or not.

Unfortunately, we also support wchar_t which can be (in practice) a signed integer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, wchar_t was indeed the annoying one in here. I wanted to avoid introducing extra branches and wasn't sure whether compilers could reliably optimize the conditional out. So I went with the handwritten masking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do:

if constexpr (is_signed<UC>::value) {
  // ...
}
else
  return (c <= 255) ? static_cast<unsigned char>(c) : 0;

Or just:

return (make_unsigned_t<UC>(c) <= 255) ? static_cast<unsigned char>(c) : 0;

If it's a negative wchar_t it will become a value much greater than 255.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwakely We try to support legacy compilers.... so we avoid if constexpr but you are right, of course.

The current code is a bit nasty, but at least it is correct.

auto uc = static_cast<UnsignedUC>(c);
// For types larger than one byte, we need to force an index with sentinel
// value (using index zero because that is easiest) if any byte other than
// the low byte is non-zero.
auto mask = static_cast<UnsignedUC>(-((uc & ~0xFFull) == 0));
return int_luts<>::chdigit[static_cast<unsigned char>(uc & mask)];
}

fastfloat_really_inline constexpr size_t max_digits_u64(int base) {
Expand Down
271 changes: 270 additions & 1 deletion tests/fast_int.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -831,6 +831,275 @@ int main() {
return EXIT_FAILURE;
}
}
// dont parse UTF-16 code units of emojis as int if low byte is ascii digit
{
const std::u16string emojis[] = {
u"ℹ", u"ℹ️", u"☸", u"☸️", u"☹", u"☹️", u"✳", u"✳️",
u"✴", u"✴️", u"⤴", u"⤴️", u"⤵", u"⤵️", u"〰", u"〰️",
};
bool failed = false;
auto array_size = sizeof(emojis) / sizeof(emojis[0]);
for (size_t i = 0; i < array_size; i++) {
auto e = emojis[i];
int foo;
auto answer = fast_float::from_chars(e.data(), e.data() + e.size(), foo);
if (answer.ec == std::errc()) {
failed = true;
std::cerr << "Incorrectly parsed emoji #" << i << " as integer " << foo
<< "." << std::endl;
}
}

if (failed) {
return EXIT_FAILURE;
}
}
// dont parse UTF-32 code points of emojis as int if low byte is ascii digit
{
const std::u32string emojis[] = {
U"ℹ",
U"ℹ️",
U"☸",
U"☸️",
U"☹",
U"☹️",
U"✳",
U"✳️",
U"✴",
U"✴️",
U"⤴",
U"⤴️",
U"⤵",
U"⤵️",
U"〰",
U"〰️",
U"🈲",
U"🈳",
U"🈴",
U"🈵",
U"🈶",
U"🈷",
U"🈷️",
U"🈸",
U"🈹",
U"🌰",
U"🌱",
U"🌲",
U"🌳",
U"🌴",
U"🌵",
U"🌶",
U"🌶️",
U"🌷",
U"🌸",
U"🌹",
U"🐰",
U"🐱",
U"🐲",
U"🐳",
U"🐴",
U"🐵",
U"🐶",
U"🐷",
U"🐸",
U"🐹",
U"🔰",
U"🔱",
U"🔲",
U"🔳",
U"🔴",
U"🔵",
U"🔶",
U"🔷",
U"🔸",
U"🔹",
U"😰",
U"😱",
U"😲",
U"😳",
U"😴",
U"😵",
U"😵‍💫",
U"😶",
U"😶‍🌫",
U"😶‍🌫️",
U"😷",
U"😸",
U"😹",
U"🤰",
U"🤰🏻",
U"🤰🏼",
U"🤰🏽",
U"🤰🏾",
U"🤰🏿",
U"🤱",
U"🤱🏻",
U"🤱🏼",
U"🤱🏽",
U"🤱🏾",
U"🤱🏿",
U"🤲",
U"🤲🏻",
U"🤲🏼",
U"🤲🏽",
U"🤲🏾",
U"🤲🏿",
U"🤳",
U"🤳🏻",
U"🤳🏼",
U"🤳🏽",
U"🤳🏾",
U"🤳🏿",
U"🤴",
U"🤴🏻",
U"🤴🏼",
U"🤴🏽",
U"🤴🏾",
U"🤴🏿",
U"🤵",
U"🤵‍♀",
U"🤵‍♀️",
U"🤵‍♂",
U"🤵‍♂️",
U"🤵🏻",
U"🤵🏻‍♀",
U"🤵🏻‍♀️",
U"🤵🏻‍♂",
U"🤵🏻‍♂️",
U"🤵🏼",
U"🤵🏼‍♀",
U"🤵🏼‍♀️",
U"🤵🏼‍♂",
U"🤵🏼‍♂️",
U"🤵🏽",
U"🤵🏽‍♀",
U"🤵🏽‍♀️",
U"🤵🏽‍♂",
U"🤵🏽‍♂️",
U"🤵🏾",
U"🤵🏾‍♀",
U"🤵🏾‍♀️",
U"🤵🏾‍♂",
U"🤵🏾‍♂️",
U"🤵🏿",
U"🤵🏿‍♀",
U"🤵🏿‍♀️",
U"🤵🏿‍♂",
U"🤵🏿‍♂️",
U"🤶",
U"🤶🏻",
U"🤶🏼",
U"🤶🏽",
U"🤶🏾",
U"🤶🏿",
U"🤷",
U"🤷‍♀",
U"🤷‍♀️",
U"🤷‍♂",
U"🤷‍♂️",
U"🤷🏻",
U"🤷🏻‍♀",
U"🤷🏻‍♀️",
U"🤷🏻‍♂",
U"🤷🏻‍♂️",
U"🤷🏼",
U"🤷🏼‍♀",
U"🤷🏼‍♀️",
U"🤷🏼‍♂",
U"🤷🏼‍♂️",
U"🤷🏽",
U"🤷🏽‍♀",
U"🤷🏽‍♀️",
U"🤷🏽‍♂",
U"🤷🏽‍♂️",
U"🤷🏾",
U"🤷🏾‍♀",
U"🤷🏾‍♀️",
U"🤷🏾‍♂",
U"🤷🏾‍♂️",
U"🤷🏿",
U"🤷🏿‍♀",
U"🤷🏿‍♀️",
U"🤷🏿‍♂",
U"🤷🏿‍♂️",
U"🤸",
U"🤸‍♀",
U"🤸‍♀️",
U"🤸‍♂",
U"🤸‍♂️",
U"🤸🏻",
U"🤸🏻‍♀",
U"🤸🏻‍♀️",
U"🤸🏻‍♂",
U"🤸🏻‍♂️",
U"🤸🏼",
U"🤸🏼‍♀",
U"🤸🏼‍♀️",
U"🤸🏼‍♂",
U"🤸🏼‍♂️",
U"🤸🏽",
U"🤸🏽‍♀",
U"🤸🏽‍♀️",
U"🤸🏽‍♂",
U"🤸🏽‍♂️",
U"🤸🏾",
U"🤸🏾‍♀",
U"🤸🏾‍♀️",
U"🤸🏾‍♂",
U"🤸🏾‍♂️",
U"🤸🏿",
U"🤸🏿‍♀",
U"🤸🏿‍♀️",
U"🤸🏿‍♂",
U"🤸🏿‍♂️",
U"🤹",
U"🤹‍♀",
U"🤹‍♀️",
U"🤹‍♂",
U"🤹‍♂️",
U"🤹🏻",
U"🤹🏻‍♀",
U"🤹🏻‍♀️",
U"🤹🏻‍♂",
U"🤹🏻‍♂️",
U"🤹🏼",
U"🤹🏼‍♀",
U"🤹🏼‍♀️",
U"🤹🏼‍♂",
U"🤹🏼‍♂️",
U"🤹🏽",
U"🤹🏽‍♀",
U"🤹🏽‍♀️",
U"🤹🏽‍♂",
U"🤹🏽‍♂️",
U"🤹🏾",
U"🤹🏾‍♀",
U"🤹🏾‍♀️",
U"🤹🏾‍♂",
U"🤹🏾‍♂️",
U"🤹🏿",
U"🤹🏿‍♀",
U"🤹🏿‍♀️",
U"🤹🏿‍♂",
U"🤹🏿‍♂️",
};
bool failed = false;
auto array_size = sizeof(emojis) / sizeof(emojis[0]);
for (size_t i = 0; i < array_size; i++) {
auto e = emojis[i];
int foo;
auto answer = fast_float::from_chars(e.data(), e.data() + e.size(), foo);
if (answer.ec == std::errc()) {
failed = true;
std::cerr << "Incorrectly parsed emoji #" << i << " as integer " << foo
<< "." << std::endl;
}
}

if (failed) {
return EXIT_FAILURE;
}
}

return EXIT_SUCCESS;
}
Expand All @@ -842,4 +1111,4 @@ int main() {
std::cerr << "The test requires C++17." << std::endl;
return EXIT_SUCCESS;
}
#endif
#endif
Loading