[emval] Move reference counting to C++#20447
[emval] Move reference counting to C++#20447RReverser wants to merge 13 commits intoemscripten-core:mainfrom
Conversation
This allows to completely optimise away the extra allocation / wrapper when values are local, and generally avoids crossing the JS<>Wasm boundary as often.
This reduces size of class val to a single pointer, and still gets optimised out as an allocation.
This allows to completely eliminate statements like `val v;` as compiler can statically prove it doesn't need _emval_free.
C++ doesn't allow elimination of new/delete pairs, but in C malloc/free pairs can be and are eliminated by LLVM.
In theory Clang's __builtin_operator_{new,delete} could also help here, but in my experiments it wasn't optimised out while malloc/free still were.
I don't think we care about calling into user's overrides in this class too much, and the optimisation is valuable enough, so just using standard C funcs.
This reverts commit e15a9f7.
014081a to
333caed
Compare
|
Marking as draft for now as there's one unresolved issue. |
|
Hm the remaining issues might be easier to solve after #20383 and another planned subsequent PR. I'll leave this as a draft for now. |
|
Have you considered using the same array approach on the C++ side for refcount as the JS-side? That is, having a single vector<val_metadata> with refcounts indexed by handle, such that most new handles won't require an allocation? Ah, because in multi-threaded mode you could have handle collisions between threads? Could have an array per thread. |
Yes, I played with it, but that prevents compiler optimisations that allow to completely eliminate allocation/free pairs, which are extremely valuable as most of Embind values are temporary. Compilers are happy to optimise malloc/free pairs because those are simple well-known APIs but they can't see through all the mechanics of something as complex as a |
| __emval_decref(handle); | ||
| return rv; | ||
| }, | ||
| 'fromWireType': (handle) => Emval.toValue(handle), |
There was a problem hiding this comment.
Sorry if I'm missing something obvious, but how do you get away with removing the incref/decref from the C++ toWireType and JS fromWireType? Won't the C++ destructor call free before the JS fromWireType reads Emval.toValue?
C++ is super eager with copying values by default
and asking questions laterinstead of moving, and each of those copies had to cross JS boundary on creation/deletion just to increase/decrease reference count correspondingly.Moving reference count from JS to C++ allows to make all this copying substantially cheaper as it becomes a single integer increment/decrement on the Wasm side which, in most cases, can even be completely eliminated by LLVM.
This does mean that we need to heap-allocate a small object with value's metadata on the C++ side, and I know it can be a controversial change, but:
Note that reference counting and allocation optimisation for local values only applies to speed modes (
-O1and higher) and not to-Oswhere LLVM avoids inlining functions, but microbenchmarks show that even in situations where copies can't be eliminated, this approach is still faster:benchmarks.cpp
Before (timings and JS+Wasm byte sizes for different modes):
After:
The only small slowdown is for initial allocation & freeing in
-Osmode, but it's typically less common than copying, and it's in exchange for a small code size win, which is, well, point of that mode anyway.