Use NaN-boxing on value::InnerValue #4091
|
All tests in |
|
Still failing some tests:
Broken tests (10):
But the bigger issue that I see is performance. I ran the benchmarks locally and the performance was worse in all benchmarks, in some by almost 40%. I thought a performance tradeoff vs lower memory usage seemed reasonable and expected for this change, but this seems way too extreme.
Benchmarks before:
RESULT Richards 78.5
RESULT DeltaBlue 71.8
RESULT Crypto 80.3
RESULT RayTrace 182
RESULT EarleyBoyer 227
RESULT RegExp 43.7
RESULT Splay 326
RESULT NavierStokes 187
SCORE 122
After:
RESULT Richards 49.9
RESULT DeltaBlue 43.7
RESULT Crypto 77.7
RESULT RayTrace 138
RESULT EarleyBoyer 161
RESULT RegExp 38.9
RESULT Splay 214
RESULT NavierStokes 163
SCORE 92.0 |
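To put the numbers above in perspective, here is a minimal Rust sketch that computes the per-benchmark slowdown and an aggregate score. It assumes the SCORE line is the geometric mean of the individual results; that is an assumption that happens to match the reported 122 and 92.0 closely, not something stated in this thread. The function names are made up for illustration.

```rust
/// (name, score before, score after), copied from the RESULT lines above.
const RESULTS: [(&str, f64, f64); 8] = [
    ("Richards", 78.5, 49.9),
    ("DeltaBlue", 71.8, 43.7),
    ("Crypto", 80.3, 77.7),
    ("RayTrace", 182.0, 138.0),
    ("EarleyBoyer", 227.0, 161.0),
    ("RegExp", 43.7, 38.9),
    ("Splay", 326.0, 214.0),
    ("NavierStokes", 187.0, 163.0),
];

/// Relative drop in score, in percent (higher scores are better).
fn regression_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Geometric mean of a slice of positive scores.
/// ASSUMPTION: this is how the SCORE line is aggregated.
fn geometric_mean(scores: &[f64]) -> f64 {
    let log_sum: f64 = scores.iter().map(|s| s.ln()).sum();
    (log_sum / scores.len() as f64).exp()
}

/// Benchmark with the largest relative drop, and that drop in percent.
fn worst_regression() -> (&'static str, f64) {
    RESULTS
        .iter()
        .map(|&(name, before, after)| (name, regression_percent(before, after)))
        .fold(("", f64::MIN), |worst, cur| if cur.1 > worst.1 { cur } else { worst })
}
```

With these inputs the largest drop is DeltaBlue at roughly 39%, which matches the "almost 40%" figure quoted above.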
|
@raskad @hansl As you probably already know, not all memory optimizations will yield (or even preserve) performance gains. It may be that way sometimes, but there's a balance between memory and speed that big memory optimizations (such as this one) will have to deal with. The first step is to make every test work, then we'd have to optimize repeatedly to recover all the lost performance, and hopefully we'll have an engine that's as fast as before but using a fraction of the memory. Having said that, we should see what the overall gains in memory are, because this should at least save a considerable amount of memory to be worth putting more work into. |
|
There's no rush for this, since the actual breaking change already happened :) I'll run benchmarks on my own, and I'll try to check a profiler for memory, in the next 2 weeks or so. It's also possible that I'm missing a few easy inlines, and adding some bit magic might make things faster down the line. First things first, I'll make sure test262 has no regressions. |
|
@raskad BTW this should cover both; it should take less memory, AND be more performant. But I guess we pass JsValue more by reference than by value. Now that we can move JsValue for free, maybe we should reconsider some other parts of the code as well. Later work. This article shows a 28% increase in performance for a tagged union of pointers, in JavaScriptCore (WebKit's engine). Though TBF we weren't using pointers. |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4091 +/- ##
==========================================
+ Coverage 47.24% 52.97% +5.72%
==========================================
Files 476 488 +12
Lines 46892 52465 +5573
==========================================
+ Hits 22154 27793 +5639
+ Misses 24738 24672 -66 |
|
@hansl I did a quick perf run on the |
That makes a lot of sense actually, because clones for The "proper" solution would be to make |
|
@raskad If we nan boxed Let me make sure we don't regress on test262 then I'll look into the performance. |
|
Okay this one is fascinating... If I only create 1 value in an expression, e.g. If I create multiple negative zeroes in an expression, only the first one keeps the parity. Cloning the negative zero keeps the parity, copying is not supported, ... It seems the bug is not in my code, but that's the only thing I changed... Any idea? |
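For context on why negative zero is easy to lose, here is a small Rust sketch. It is not a diagnosis of the bug described above, only an illustration under the assumption that a NaN-boxed float is stored as its raw IEEE 754 bits; the helper names are hypothetical. Because `-0.0 == 0.0` compares equal, a sign check has to look at the sign bit rather than use equality.

```rust
/// Store a float as its raw 64-bit pattern (hypothetical helper).
fn box_float(value: f64) -> u64 {
    value.to_bits()
}

/// Recover the float from its raw 64-bit pattern (hypothetical helper).
fn unbox_float(bits: u64) -> f64 {
    f64::from_bits(bits)
}

/// True only for negative zero: equal to zero AND sign bit set.
fn is_negative_zero(value: f64) -> bool {
    value == 0.0 && value.is_sign_negative()
}
```

A round trip through the raw bits keeps the sign, so if the sign disappears the loss has to happen somewhere that treats the value numerically (for example a comparison or a conversion through an integer) rather than in the bit storage itself.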
|
There's something funny with the parser and/or optimizer, but in any case I'm still planning on running the benchmark before the end of the year. |
|
Benchmarks (using Enum-basedNaN-Boxed |
|
On the latest commit, I changed the pointer tagging to use the top bits of the pointer. This means that we lose bits on the pointer value itself (after doing some research, 48 bits should be enough), but we gain on performance by having non-overlapping ranges for pointers. Seems like it's speeding it up a bit overall: the final score is getting real close to before this PR on my machine. |
//! | `BigInt` | `7FF8:PPPP:PPPP:PPPP` | 49-bits pointer. Assumes non-null pointer. |
//! | `Object` | `7FFA:PPPP:PPPP:PPPP` | 49-bits pointer. |
//! | `Symbol` | `7FFC:PPPP:PPPP:PPPP` | 49-bits pointer. |
//! | `String` | `7FFE:PPPP:PPPP:PPPP` | 49-bits pointer. |
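The tag rows above can be sketched as a tiny encoder and decoder. This is an illustrative Rust sketch, not the code in this PR: it uses plain `u64` addresses instead of real pointers, assumes a 48-bit payload, rejects null for every pointer kind for simplicity, and treats every other bit pattern as a float, whereas the real module also encodes kinds that are not shown in this excerpt. All names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Float,
    BigInt,
    Object,
    Symbol,
    String,
}

// Top 16 bits select the kind, low 48 bits carry the address (ASSUMPTION).
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

// Tag values taken from the table above.
const TAG_BIGINT: u64 = 0x7FF8_0000_0000_0000;
const TAG_OBJECT: u64 = 0x7FFA_0000_0000_0000;
const TAG_SYMBOL: u64 = 0x7FFC_0000_0000_0000;
const TAG_STRING: u64 = 0x7FFE_0000_0000_0000;

/// Combine a tag with an address; None if the address is null
/// or does not fit in the 48-bit payload.
fn box_pointer(tag: u64, address: u64) -> Option<u64> {
    if address == 0 || address & !PAYLOAD_MASK != 0 {
        return None;
    }
    Some(tag | address)
}

/// Split a boxed value into its kind and payload.
/// Anything that is not a known pointer tag is reported as a float.
fn classify(bits: u64) -> (Kind, u64) {
    let payload = bits & PAYLOAD_MASK;
    match bits & TAG_MASK {
        // 7FF8 with a zero payload is the ordinary quiet NaN, so a
        // BigInt is only recognized when the pointer is non-null.
        TAG_BIGINT if payload != 0 => (Kind::BigInt, payload),
        TAG_OBJECT => (Kind::Object, payload),
        TAG_SYMBOL => (Kind::Symbol, payload),
        TAG_STRING => (Kind::String, payload),
        _ => (Kind::Float, bits),
    }
}
```

Because each pointer kind owns its own range of the top bits, decoding is a single mask and compare, which is the non-overlapping-ranges property mentioned in the comment above.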
So in the pointer space we only have 1 unused slot left, right? Which is 7FF8
7FF8 (or 0b0111_1111_1111_1000) is used by bigint if the pointer is non-null. The 16th MSB isn't used (yet?), so pointers are actually 48 bits (I'll correct the comment).
Let me know if the commit helps clarify this.
Yes, sorry, I meant to ask if 7FFF is unused and reserved.
It is unused and reserved for now.
raskad
left a comment
Great work. Also very nice docs in nan_boxed!
Ran the benchmarks locally, rebased on the current register-enabled main branch, and the difference looked ok to me.
|
@jedel1043 PTAL. |
jedel1043
left a comment
Really nice implementation! Looks good
This adds a new InnerValue that uses NaN-boxing to allow JsValue to ultimately only take 64 bits (8 bytes). This allows JsValue to fit into one x86_64 register, which should greatly improve performance.
For more details on NaN-boxing (and its alternative, pointer tagging), see https://piotrduperas.com/posts/nan-boxing which is a great tutorial (in C) on how to get there. A proper explanation of each tag/range of values is described in the header of the inner.rs module.
Fixes #1373.
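The size claim can be illustrated with a short Rust sketch. The two types below are hypothetical stand-ins, not the actual definitions in this PR: one is an ordinary enum that can hold a full `f64` alongside other variants, the other is a single 64-bit word as a NaN-boxed value would be.

```rust
use std::mem::size_of;

/// Hypothetical enum-based value: the f64 variant uses every bit pattern,
/// so the discriminant needs extra space beyond the 8-byte payload.
#[allow(dead_code)]
enum EnumValue {
    Undefined,
    Null,
    Boolean(bool),
    Integer(i32),
    Float(f64),
    Object(Box<u8>),
}

/// Hypothetical NaN-boxed value: everything lives in one 64-bit word.
#[allow(dead_code)]
struct NanBoxedValue(u64);

/// Sizes in bytes of the two representations, enum first.
fn sizes() -> (usize, usize) {
    (size_of::<EnumValue>(), size_of::<NanBoxedValue>())
}
```

On a typical 64-bit target the enum comes out at 16 bytes while the boxed word is 8, though the exact enum layout is chosen by the compiler and not guaranteed.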