New implementation of -profile=gc #1806
mihails-strasuns-sociomantic wants to merge 10 commits into dlang:master
f254c1f to fb6bfd6
nemanja-boric-sociomantic
left a comment
some 50k ft comments
src/rt/tracegc.d
Outdated
| extern (C) void* _d_newitemiT(in TypeInfo _ti); | ||
|
|
||
| extern (C) Object _d_newclassTrace(string file, int line, string funcname, const ClassInfo ci) | ||
| private string generatePrintf ( ) |
nitpick: suggest generatePrintfTrace
Btw, it seems that this change is completely removed in one of the next commits (it's using the old style of tracing printf).
| else | ||
| enum bool hasElaborateCopyConstructor = false; | ||
| } | ||
|
|
extra newline
| import core.memory : GC; | ||
|
|
||
| static if (is(typeof(ci))) | ||
| string name = ci.name; |
suggest: allocation_type or similar
| accumulate(file, line, funcname, ti.toString(), ti.next.tsize * n); | ||
| return _d_arrayappendcTX(ti, px, n); | ||
| } | ||
| static size_t findParamIndex(string s) |
It would be very nice if these helper methods had ddoc comments, or at least generateWrapper.
Note: the failures are most likely related to the mentioned issue with the lack of a single lock in the GC implementation - there seems to be a data race in the multi-threaded allocation test case.
Comes at cost of losing some custom function arguments but is much more maintainable and readable.
Replaced the approximation based on function arguments with a call to GC stats to always get the exact amount of bytes allocated. As the new implementation is generic and suitable for any wrapped function, manual wrappers have been replaced with a string mixin approach.
Ordering by name as a last resort means the order of lines in the generated trace will be reproducible, allowing for easy test cases.
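The ordering described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Row` and `ordered` are hypothetical names, the size-then-count precedence ahead of the name tie-breaker is an assumption, and the sketch relies on Phobos' `std.algorithm.sorting.sort`, which druntime itself cannot depend on.

```d
import std.algorithm.sorting : sort;

// Hypothetical row of the profile report; not the PR's actual type.
struct Row
{
    string name;   // call site / type description
    ulong count;   // number of allocations
    ulong size;    // total bytes allocated
}

// Sort by size (descending), then count (descending), then name
// (ascending), so that ties always come out in the same order.
Row[] ordered(Row[] rows)
{
    rows.sort!((a, b) {
        if (a.size != b.size)
            return a.size > b.size;
        if (a.count != b.count)
            return a.count > b.count;
        return a.name < b.name;
    });
    return rows;
}
```

With the name as the final key, two rows that share size and count no longer depend on hash or insertion order, which is what makes a fixed expected log usable in a test.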
Moves code snippet which refers to various language features that allocate to actual test to ensure better functionality coverage.
After switching to the `GC.stats` implementation of calculating the allocated amount, it has become necessary to ensure that no other thread affects those stats in between the two calculation points.
Fixes issue https://issues.dlang.org/show_bug.cgi?id=17294
Fixes issue https://issues.dlang.org/show_bug.cgi?id=16280
Fixes issue https://issues.dlang.org/show_bug.cgi?id=15481
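The idea of measuring an allocation as the difference between two `GC.stats` readings, and why it needs protection from other threads, might look roughly like this. It is only a sketch under assumptions: `measureAllocation` is a hypothetical helper, `usedSize` is used as the compared field (the PR may combine other counters), and the `synchronized` block stands in for whatever locking the PR actually uses.

```d
import core.memory : GC;

// Hypothetical helper: runs `allocate` and reports how much the GC's
// used size grew while it ran. The `synchronized` block only serializes
// callers of this helper; it does not stop other threads from
// allocating through the GC directly.
size_t measureAllocation(scope void* delegate() allocate)
{
    synchronized
    {
        const before = GC.stats().usedSize;
        allocate();
        const after = GC.stats().usedSize;
        // A collection in between could shrink usedSize; clamp to zero.
        return after > before ? after - before : 0;
    }
}
```

Any allocation made by another thread between the two readings would be attributed to the measured call, which is the kind of interference the commit message describes.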
It is more useful than plain size or plain count
852da73 to 8d16e58
leandro-lucarella-sociomantic
left a comment
On a more general topic, I don't understand why all the calculation is done in the runtime functions in the first place instead of directly in the GC, where it is much easier to know when stuff is really being allocated and you don't have to care about thread synchronization, since you are already inside a global lock, or about using the stats function. Doing this will also guarantee that enabling memory profiling won't affect ordering, and it will have less overhead since it avoids taking a global lock multiple times, including times when it wasn't even necessary.
| size_t freeSize; | ||
| ulong freeSize; | ||
| /// number of bytes freed during collections through program lifetime so | ||
| /// far (will count same memory multiple times if re-used) |
What's the point of keeping track of this? It should be mentioned at least in the commit message, but probably also in the changelog and the code docs.
src/rt/tracegc.d
Outdated
| line, | ||
| funcname.length, funcname.ptr | ||
| ); | ||
| }; |
Why a mixin and not just a function with a version statement inside? The optimizer should remove the empty function call I guess...
__FUNCTION__
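For illustration, the alternative raised in the two comments above (a plain function with a `version` block inside, rather than a string mixin) could look something like this. It is a sketch only: `traceCall` and the `TraceGCSketch` version identifier are hypothetical names, and using `__FUNCTION__` as a default argument is one possible reading of the short comment above, not something the thread spells out.

```d
// Plain function with the conditional inside; when the (hypothetical)
// TraceGCSketch version is not set, the body is empty. Default
// arguments such as __FUNCTION__ are evaluated at the call site, so
// callers need not pass their own location.
void traceCall(string file = __FILE__, int line = __LINE__,
               string funcname = __FUNCTION__)
{
    version (TraceGCSketch)
    {
        import core.stdc.stdio : printf;

        // D strings are not NUL-terminated, hence the "%.*s" format.
        printf("%.*s:%d %.*s\n",
               cast(int) file.length, file.ptr,
               line,
               cast(int) funcname.length, funcname.ptr);
    }
}
```

Building with `-version=TraceGCSketch` would enable the output; without it the call compiles to an empty function that an optimizer can drop.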
src/rt/tracegc.d
Outdated
| } | ||
| else | ||
| return ""; | ||
| } |
I agree with @nemanja-boric-sociomantic, it is weird that you remove almost everything you did in the previous commit here, but for me it is also very distracting that you are reworking all the tracing again, mixed with what the commit is supposed to do: "Rework how -profile=gc output is calculated". It is hard to review those changes like this.
Funnily, I did exactly that to simplify reviewing, minimizing the amount of lines removed in the commit that does the interesting stuff (as the tracing copy-paste bits have been removed in the previous commit) :)
| 640 10 int[] profilegc.main.bar src/profilegc.d:73 | ||
| 288 1 immutable(char)[][int] D main src/profilegc.d:34 | ||
| 240 4 core.thread.Thread[] D main src/profilegc.d:77 | ||
| 288 1 immutable(char)[][int] D main src/profilegc.d:34 |
I don't understand. So bytes allocated is how much is supposed to be allocated each time? How is something like `auto x = new int[n];` handled, since the bytes allocated on each call will vary depending on `n`?
No, bytes allocated should be the total amount through the life of the program.
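A rough sketch of what "total amount through the life of the program" means per call site, assuming a simplified two-argument `accumulate` (the PR's version also takes file, line, function name and type name) and a hypothetical `totals` table:

```d
// Hypothetical stand-ins for the profiler's bookkeeping.
struct Entry { size_t count, size; }

Entry[string] totals;  // keyed by call site; thread-local in this sketch

// Add one allocation of `bytes` bytes to the running totals of `site`.
void accumulate(string site, size_t bytes)
{
    if (auto e = site in totals)
    {
        e.count += 1;
        e.size += bytes;
    }
    else
        totals[site] = Entry(1, bytes);
}
```

Under this scheme a line such as `new int[n]` executed with varying `n` ends up as a single row whose size is the sum over all calls and whose count is the number of calls.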
I don't understand, the commit says sort by total size first, but this is putting first 240 and 280 afterwards...
| import core.exception : onOutOfMemoryError; | ||
|
|
||
| struct Entry { size_t count, size; } | ||
| struct Entry { ulong count, size; } |
Why is changing size_t to ulong necessary or better? Needs clarification in the commit message at least, but maybe even here too, so nobody is tempted to change it back.
Was trying to fix differences in stats between 32-bit and 64-bit builds, stupid idea that will need to be reverted (I have later switched to using two different log files to compare against in test runner).
| class ConservativeGC : GC | ||
| { | ||
| bool profiling_enabled; | ||
| shared Mutex profiler_lock; |
Why create this new mutex instead of using the pre-existing GC global lock for everything?
Currently the main GC implementation uses a non-reentrant spinlock. Anyway, all the lock/sync hacking is more of a proof of concept to see if I can make it work that way (have failed so far).
It is simply impossible with current Main problem with the existing implementation is that I couldn't manage to get deterministic stats as soon as threads are involved, even with a dumb global lock for every operation (it seems like amount of memory allocated by
Back to my question about why not do all this tracking inside the GC instead of in the allocating functions in the runtime... Is this because it would be a complete rewrite of this feature, or is there any other fundamental problem with that approach that I'm not seeing?
Only the former - I actually do agree that it would be a fundamentally more correct approach. But it is many more changes, including plenty of breaking changes, thus I was reluctant to pursue that path :(
Closing until we come up with a solution that can be used upstream
Requires merge of `stable` into `master` to remove a975720 commit.
This is an attempt to fix https://issues.dlang.org/show_bug.cgi?id=17294 - it is mostly finished but there are a few improvements to make which I am not 100% sure about:
- `GC.malloc` can still interfere with stats
- `malloc` calls in log too, but that requires some enhancements on compiler side too, to let runtime know that `-profile=gc` is enabled. I will probably delay that to a separate PR.