implement ryu 64-bit backend #19484
Merged
andrewrk merged 1 commit into ziglang:master on Mar 30, 2024
The 64-bit backend supports printing all floats up to 64 bits. The 128-bit backend continues to be used for larger values. This backend is approximately 3x faster. Code size is a little smaller in the full-table case and much smaller when using the small tables. The implementation uses the same code paths, parameterized by a set of tables and their pow5 implementations. We continue to use the same rounding/formatting mechanisms. Initially I explored a separate implementation, as upstream does this and has specific optimizations for these paths, but for simplicity we don't. The performance loss is small enough at this point, and keeping them combined keeps them in sync. Closes ziglang#19264.
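As a rough illustration of "the same code paths, parameterized by a set of tables", the sketch below passes a comptime table type to a single generic function. Every name here (`SmallTables`, `WideTables`, `scaleByPow5`, and the `T`/`pow5` members) is hypothetical and chosen for illustration only. The actual tables and `binaryToDecimal` signature in this PR are different and considerably more involved.

```zig
const std = @import("std");

// Hypothetical table set for a narrow backend: a small lookup table.
const SmallTables = struct {
    pub const T = u64;
    const entries = [_]u64{ 1, 5, 25, 125 };

    pub fn pow5(i: usize) u64 {
        return entries[i];
    }
};

// Hypothetical table set for a wide backend: computes the power directly.
const WideTables = struct {
    pub const T = u128;

    pub fn pow5(i: usize) u128 {
        var r: u128 = 1;
        for (0..i) |_| r *= 5;
        return r;
    }
};

// One shared code path. The integer width and the pow5 implementation
// both come from the comptime `Tables` parameter.
fn scaleByPow5(comptime Tables: type, x: Tables.T, i: usize) Tables.T {
    return x * Tables.pow5(i);
}

test "same code path, different tables" {
    try std.testing.expectEqual(@as(u64, 250), scaleByPow5(SmallTables, 2, 3));
    try std.testing.expectEqual(@as(u128, 250), scaleByPow5(WideTables, 2, 3));
}
```

Because the table type is resolved at compile time, each instantiation is specialized separately, so a program that only uses one table set should only pay for that one in the binary.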
tiehuis
commented
Mar 29, 2024
  const has_explicit_leading_bit = std.math.floatMantissaBits(T) - std.math.floatFractionalBits(T) != 0;
- const d = binaryToDecimal(@as(I, @bitCast(v)), std.math.floatMantissaBits(T), std.math.floatExponentBits(T), has_explicit_leading_bit);
+ const d = binaryToDecimal(DT, @as(I, @bitCast(v)), std.math.floatMantissaBits(T), std.math.floatExponentBits(T), has_explicit_leading_bit, tables);
Member
Author
Should probably place the comptime tables as the second argument here instead of last.
Member
Thanks for the follow up! Great work.
The 64-bit backend supports printing all floats up to 64-bits. The 128-bit continues to be used for larger values.
This implementation uses the same code-paths as the 128-bit, parameterised by a new table structure.
I have fuzzed the 128-bit backend against the 64-bit backend and found no differences in output (shortest mode) for all f16, f32, and ~1 trillion f64. Behaviour is expected to be identical.
Performance
~3x faster than the 128-bit backend. ReleaseSmall notably is ~7x faster.
Master
This PR
Size
ReleaseSmall: 13.7Ki -> 4.58Ki
ReleaseFast: 22.6Ki -> 19.2Ki
Using the following sample program and https://github.com/google/bloaty.
Master
ReleaseSmall (13.7Ki)
ReleaseFast (22.6Ki)
This PR
ReleaseSmall (4.58Ki)
ReleaseFast (19.2Ki)
Notes
When f64 and f128 are both used in the same program, the two backends will both be in the output binary. I consider this a non-concern.
A dedicated f32 backend could be easily added, but until someone requests it (likely an embedded user) I will omit the tables and path in formatFloat.
Closes #19264.