Rework of -profile=gc calculations by mihails-strasuns-sociomantic · Pull Request #1846 · dlang/druntime

mihails-strasuns-sociomantic · 2017-06-19T10:11:15Z

Another attempt at #1806, now using a thread-local counter inside the GC to circumvent synchronization issues.
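A minimal sketch of why a thread-local counter sidesteps the synchronization problem. It assumes a hypothetical allocation hook `trackedAlloc` and a hypothetical helper `measure`; neither is the PR's actual code. In D, module-level variables are thread-local unless marked `shared` or `__gshared`, so each thread bumps its own tally without locks, and a caller measures what it allocated by diffing the counter before and after.

```d
import core.thread : Thread;

// Module-level variables are thread-local by default in D: each thread gets
// its own copy, so incrementing this needs no lock or atomic operation.
ulong bytesAllocated;

// Hypothetical allocation hook: a GC would bump the counter here.
ubyte[] trackedAlloc(size_t size)
{
    bytesAllocated += size;
    return new ubyte[](size);
}

// Hypothetical helper: bytes allocated by `fn` on the calling thread only.
ulong measure(scope void delegate() fn)
{
    immutable before = bytesAllocated;
    fn();
    return bytesAllocated - before;
}

unittest
{
    assert(measure({ trackedAlloc(24); trackedAlloc(8); }) == 32);

    // Allocations made by another thread do not touch this thread's counter.
    immutable mine = bytesAllocated;
    auto t = new Thread({ trackedAlloc(100); });
    t.start();
    t.join();
    assert(bytesAllocated == mine);
}
```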

mihails-strasuns-sociomantic · 2017-06-19T10:12:30Z

(going to abuse auto-tester a bit)

leandro-lucarella-sociomantic

Apart from the small comments, and with the caveat that I did not review the code-generating bits in depth, LGTM. It is much more sensible than the previous implementation.

leandro-lucarella-sociomantic · 2017-06-20T11:36:57Z

src/gc/impl/conservative/gc.d

@@ -529,7 +529,7 @@ class ConservativeGC : GC
        }

        gcx.log_malloc(p, size);


In some distant future it might be good to pass alloc_size to gcx.log_malloc too...

leandro-lucarella-sociomantic · 2017-06-20T11:39:10Z

src/rt/tracegc.d

-extern (C) void* _d_arrayliteralTX(const TypeInfo ti, size_t length);
-extern (C) void* _d_assocarrayliteralTX(const TypeInfo_AssociativeArray ti, void[] keys, void[] vals);
+mixin(generateTraceWrappers());
+//pragma(msg, generateTraceWrappers());


I guess the commented-out pragma should be removed.
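For readers unfamiliar with the pattern: a CTFE function returns D source as a string, `mixin` compiles it, and a `pragma(msg, ...)` on the same string prints the generated source at compile time, which is handy while debugging but noise once it works. A minimal sketch; `generateTraceWrapper`, `trace_add`, `tracedCalls` and `lastFile` are hypothetical stand-ins, not the PR's actual generator:

```d
// Hypothetical tally standing in for profilegc's accumulate().
size_t tracedCalls;
string lastFile;

// The function being wrapped.
int add(int a, int b) { return a + b; }

// Evaluated at compile time (CTFE) when used inside mixin(): returns the
// source of a wrapper that records the call site, then forwards the call.
string generateTraceWrapper(string name, string params, string args)
{
    return "auto trace_" ~ name ~ "(string file, int line, " ~ params ~ ")\n"
         ~ "{\n"
         ~ "    ++tracedCalls;\n"
         ~ "    lastFile = file;\n"
         ~ "    return " ~ name ~ "(" ~ args ~ ");\n"
         ~ "}\n";
}

mixin(generateTraceWrapper("add", "int a, int b", "a, b"));
// Debug aid: uncomment to print the generated wrapper while compiling.
// pragma(msg, generateTraceWrapper("add", "int a, int b", "a, b"));
```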

leandro-lucarella-sociomantic · 2017-06-20T11:40:44Z

src/rt/tracegc.d

-    accumulate(file, line, funcname, "closure", sz);
-    return _d_allocmemory(sz);
+    void foo(int x, double y) { }
+    static assert (Arguments!foo == "x, y, ");


OK, I got completely lost in all the code generation. I will let someone more used to black magic review it.
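The `static assert` in that hunk pins down what an `Arguments` helper produces: a function's parameter names joined as `"x, y, "`. A possible reconstruction using Phobos' `std.traits.ParameterIdentifierTuple`; this is an assumption for illustration only, since druntime cannot import Phobos and the PR's version is necessarily implemented differently:

```d
import std.traits : ParameterIdentifierTuple;

// Hypothetical reconstruction: concatenates fn's parameter names at compile
// time, each followed by ", " (the trailing separator matches the assert).
string argumentList(alias fn)()
{
    string result;
    foreach (name; ParameterIdentifierTuple!fn)
        result ~= name ~ ", ";
    return result;
}

enum Arguments(alias fn) = argumentList!fn();

void foo(int x, double y) { }
static assert(Arguments!foo == "x, y, ");
```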

leandro-lucarella-sociomantic · 2017-09-13T15:47:38Z

Anyone interested in having a look at this? Maybe @MartinNowak ?

mihails-strasuns-sociomantic · 2017-09-13T20:53:46Z

FYI: the current state is "works like a charm for me, but I haven't found time to rework how the test suite runs so that it can pass on all platforms with the more extensive tests added".

leandro-lucarella-sociomantic · 2017-09-14T17:42:23Z

I think this is quite important: the current profiling information given by the GC is pretty much garbage, and giving wrong information could be more dangerous than giving no information at all. So any help here is very much appreciated.

Also, setting aside the currently less-than-ideal testing situation and the failures on some platforms, it would be nice to know whether the general approach looks good to other druntime maintainers.

ping @dlang/team-druntime

ibuclaw · 2017-09-15T07:07:38Z

The high-level design seems reasonable to me.

leandro-lucarella-sociomantic · 2017-09-15T16:12:16Z

I just had a look at the results for FreeBSD 32 and it's just an ordering issue. I guess something is wrong (or incomplete) with b64ac1f.

According to the auto-tester, the BSD 32 results are:

bytes allocated, allocations, type, function, file:line
            176	              1	immutable(char)[][int] D main src/profilegc.d:34
            128	              1	float[][] D main src/profilegc.d:18
            128	              1	int[][] D main src/profilegc.d:15
             64	              1	float[] D main src/profilegc.d:53
             64	              1	int[] D main src/profilegc.d:52
             16	              1	char[] D main src/profilegc.d:47
             16	              1	int[] D main src/profilegc.d:14
             16	              1	int[] D main src/profilegc.d:48
             16	              1	char[] D main src/profilegc.d:45
             16	              1	float D main src/profilegc.d:16
             16	              1	int D main src/profilegc.d:13
             16	              1	float[] D main src/profilegc.d:17
             16	              1	closure profilegc.main.foo src/profilegc.d:56
             16	              1	int[] D main src/profilegc.d:33
             16	              1	profilegc.main.C D main src/profilegc.d:12
             16	              1	wchar[] D main src/profilegc.d:46

And the one in the test case:

bytes allocated, allocations, type, function, file:line
            176	              1	immutable(char)[][int] D main src/profilegc.d:34
            128	              1	float[][] D main src/profilegc.d:18
            128	              1	int[][] D main src/profilegc.d:15
             64	              1	float[] D main src/profilegc.d:53
             64	              1	int[] D main src/profilegc.d:52
             16	              1	profilegc.main.C D main src/profilegc.d:12
             16	              1	char[] D main src/profilegc.d:45
             16	              1	char[] D main src/profilegc.d:47
             16	              1	closure profilegc.main.foo src/profilegc.d:56
             16	              1	float D main src/profilegc.d:16
             16	              1	float[] D main src/profilegc.d:17
             16	              1	int D main src/profilegc.d:13
             16	              1	int[] D main src/profilegc.d:14
             16	              1	int[] D main src/profilegc.d:33
             16	              1	int[] D main src/profilegc.d:48
             16	              1	wchar[] D main src/profilegc.d:46

Neither seems to be sorted by name among the entries with 16 bytes allocated.

leandro-lucarella-sociomantic · 2017-09-15T16:24:29Z

Just for the record, the correct order should be:

bytes allocated, allocations, type, function, file:line
            176	              1	immutable(char)[][int] D main src/profilegc.d:34
            128	              1	float[][] D main src/profilegc.d:18
            128	              1	int[][] D main src/profilegc.d:15
             64	              1	float[] D main src/profilegc.d:53
             64	              1	int[] D main src/profilegc.d:52
             16	              1	char[] D main src/profilegc.d:45
             16	              1	char[] D main src/profilegc.d:47
             16	              1	closure profilegc.main.foo src/profilegc.d:56
             16	              1	float D main src/profilegc.d:16
             16	              1	float[] D main src/profilegc.d:17
             16	              1	int D main src/profilegc.d:13
             16	              1	int[] D main src/profilegc.d:14
             16	              1	int[] D main src/profilegc.d:33
             16	              1	int[] D main src/profilegc.d:48
             16	              1	profilegc.main.C D main src/profilegc.d:12
             16	              1	wchar[] D main src/profilegc.d:46
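The order above is what you get by sorting on bytes allocated (descending) and breaking ties by comparing the whole "type function file:line" string lexicographically: that is why `:45` precedes `:47`, and `float D` precedes `float[]` (a space sorts before `[`). A minimal sketch of such a `qsort` comparator; the `Result` layout is an assumption modelled on the `qsort(counts.ptr, counts.length, Result.sizeof, &Result.qsort_cmp)` call in `src/rt/profilegc.d`:

```d
import core.stdc.stdlib : qsort;

struct Result
{
    string name;  // "type function file:line", e.g. "int[] D main src/profilegc.d:14"
    ulong bytes;  // total bytes allocated at that site

    // Larger totals first; equal totals fall back to comparing names so the
    // output order is reproducible across runs and platforms.
    extern (C) static int qsort_cmp(scope const void* p1, scope const void* p2) nothrow @nogc
    {
        auto r1 = cast(const(Result)*) p1;
        auto r2 = cast(const(Result)*) p2;
        if (r1.bytes != r2.bytes)
            return r1.bytes > r2.bytes ? -1 : 1;
        if (r1.name < r2.name)
            return -1;
        return r1.name > r2.name ? 1 : 0;
    }
}
```

Note that line numbers are compared as text, not numerically, so a hypothetical `:100` would sort before `:48`; that is fine for reproducibility, which is all the test needs.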

leandro-lucarella-sociomantic · 2017-09-15T16:33:24Z

Taking a quick look, the expected results in test/profile/myprofilegc.log.linux.32.exp and test/profile/myprofilegc.log.osx.32.exp are also broken (bad ordering), so it seems that all the 32-bit results are broken.

But then I saw a pattern, so I think we have 2 problems here:

1. All 32-bit expected results have the "16 1 profilegc.main.C D main src/profilegc.d:12" entry misplaced. I guess you produced these expected results by changing the byte counts in the 64-bit results without re-sorting: on 64-bit that entry is 32 bytes, so it sorts above the 16-byte records.
2. The actual FreeBSD results are wrongly sorted too, so fixing the expected results alone won't solve the problem. I don't know why the sorting is not working on FreeBSD.

leandro-lucarella-sociomantic · 2017-09-15T16:58:04Z

OK, checking:

Linux_64_32 fails because the linux.32.exp file is wrong (as I described before, probably because of a copy&paste error).
Linux_32 fails because the linux.32.exp file is wrong (as I described before, probably because of a copy&paste error).
FreeBSD_32 fails because the freebsd.32.exp file is wrong AND because the actual results produced by the test run are also wrong (bad ordering).
FreeBSD_64_64 fails because the actual results produced by the test run are wrong (bad ordering): the lines "16 1 closure profilegc.main.foo src/profilegc.d:56" and "16 1 float D main src/profilegc.d:16" are misplaced.
Darwin_64_32 I don't know because I can't see the logs.
Darwin_64_64 I don't know because I can't see the logs.

rainers · 2017-10-31T16:09:18Z

Instead of relying on the rather inaccurate callbacks inserted by the compiler (e.g. anything allocated in the precompiled runtime will not be detected), I recently used a proxy GC that just forwards the interface calls to the actual GC. It records allocation information in malloc/calloc/qalloc calls (including the call stack, which can then be used to determine the call location). This also avoids modifying the GC interface and implementation.
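To illustrate the proxy idea: a wrapper implements the same interface as the real allocator, forwards every call and records what passes through, so neither the GC nor its interface has to change. A minimal sketch; the two-method `Allocator` interface and the class names are hypothetical simplifications, not druntime's actual `GC` interface (which has many more methods, and whose proxy would also capture the call stack):

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical, heavily reduced allocator interface.
interface Allocator
{
    void* allocate(size_t size);
    void release(void* p);
}

// Backing allocator standing in for "the actual GC".
final class CAllocator : Allocator
{
    void* allocate(size_t size) { return malloc(size); }
    void release(void* p) { free(p); }
}

// Proxy: forwards every call unchanged and records allocation statistics.
final class RecordingProxy : Allocator
{
    private Allocator inner;
    ulong allocations;
    ulong bytesRequested;

    this(Allocator inner) { this.inner = inner; }

    void* allocate(size_t size)
    {
        ++allocations;
        bytesRequested += size;
        return inner.allocate(size);
    }

    void release(void* p) { inner.release(p); }
}
```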

leandro-lucarella-sociomantic · 2017-11-02T18:54:12Z

> Instead of relying on the rather inaccurate callbacks inserted by the compiler (e.g. anything allocated in the precompiled runtime will not be detected), I recently used a proxy GC that just forwards the interface calls to the actual GC. It records allocation information in malloc/calloc/qalloc calls (including the call stack, which can then be used to determine the call location). This also avoids modifying the GC interface and implementation.

But can you really know exactly how much you allocated at that level of abstraction? How would you do that, just querying the GC to know what's the real capacity of the reserved block?

rainers · 2017-11-02T22:45:23Z

> But can you really know exactly how much you allocated at that level of abstraction? How would you do that, just querying the GC to know what's the real capacity of the reserved block?

I guess you are referring to the difference between the requested allocation size, the returned size and the actually used size, e.g. in the array code in rt.lifetime. By intercepting the GC interface, you know the requested and allocated memory precisely (using qalloc), but can only estimate how much of it is actually in use.

The array interface functions yield some other information, but without knowing the exact implementation in rt.lifetime, calculating memory usage involves quite a bit of guesswork. For example, how do you interpret setlength? Will it allocate new memory, extend existing memory without allocating, or do nothing? How much memory will a new allocation actually use?
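For reference, `core.memory.GC.qalloc` is the call that exposes the requested/allocated distinction: it returns a `BlkInfo` whose `size` field is the capacity of the block actually reserved, which the GC may round up from the requested size. A small sketch; `reservedFor` is a hypothetical helper, and since the exact rounding depends on the GC's size classes, only the lower bound is checked:

```d
import core.memory : GC;

// Hypothetical helper: asks the GC for `requested` bytes and returns the
// size of the block it actually reserved (info.size >= requested).
size_t reservedFor(size_t requested)
{
    auto info = GC.qalloc(requested);
    // info.base points at the new block; info.size is its real capacity.
    return info.size;
}
```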

mihails-strasuns-sociomantic · 2017-11-03T09:41:23Z

For our purposes it does not matter exactly how much is used, only the impact it has on the GC's total memory usage. With the current implementation, a setlength that reallocates is reported as a full-size allocation, while one that extends in place is reported only for the extra size.

Not coincidentally, the setlength case was the primary reason we had to reimplement this feature: we reset the lengths of arrays to 0 all the time to reuse memory.
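The reuse pattern in question looks roughly like this, sketched with the standard `reserve` and `assumeSafeAppend` array functions from `object` (`refill` is a hypothetical name). After the length is reset to 0, appends that fit in the existing capacity land in the same block, so they should not show up as fresh allocations:

```d
// Hypothetical helper: refills buf with 0 .. n, reusing its GC block when
// the existing capacity is large enough.
int[] refill(int[] buf, int n)
{
    buf.length = 0;
    // Tell the runtime nothing else references the old elements, so the
    // appends below may overwrite them in place instead of reallocating.
    buf.assumeSafeAppend();
    foreach (i; 0 .. n)
        buf ~= i;
    return buf;
}
```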

mihails-strasuns-sociomantic · 2017-11-03T09:43:27Z

I kind of like the idea of using a proxy GC instead of compiler support, but I don't want to lose the type information the current approach provides. Having to investigate a backtrace for each logged allocation is a lot of work.

rainers · 2017-11-04T08:30:55Z

> I kind of like the idea of using a proxy GC instead of compiler support, but I don't want to lose the type information the current approach provides.

The TypeInfo parameter is supposed to give you that information, though it might not always be passed correctly now, e.g. when extending a block. The precise GC fixes that.

mihails-strasuns-sociomantic · 2018-03-21T18:05:01Z

Huh, green

mihails-strasuns-sociomantic · 2018-04-03T13:17:20Z

Ping. Any review/requests for this one? It is good to go from my side in its basic form.


Replaces the imprecise/wrong calculation based exclusively on checking TypeInfo: now the data is retrieved directly from the GC and represents the real number of bytes allocated. Fixes issues https://issues.dlang.org/show_bug.cgi?id=17294, https://issues.dlang.org/show_bug.cgi?id=16280 and https://issues.dlang.org/show_bug.cgi?id=15481.


mihails-strasuns-sociomantic · 2018-04-10T11:33:52Z

Right now the biggest performance problem comes from how crazy expensive calling GC.stats is (because it calculates free/used data on the fly). Looking into whether it can be changed to precalculate stuff.

leandro-lucarella-sociomantic

LGTM despite the few comments.

leandro-lucarella-sociomantic · 2018-04-12T15:36:53Z

src/rt/profilegc.d

 extern (C) void profilegc_setlogfilename(string name)
 {
-    logfilename = name;
+    logfilename = name ~ "\0";


This looks unrelated to this commit.

leandro-lucarella-sociomantic · 2018-04-12T15:39:19Z

src/rt/profilegc.d

    {
        qsort(counts.ptr, counts.length, Result.sizeof, &Result.qsort_cmp);
-
-        FILE* fp = logfilename.length == 0 ? stdout : fopen((logfilename ~ '\0').ptr, "w");


Oh, right, it's because of this. I would make the commit message a bit more explicit that this commit is about avoiding language features that indirectly use the GC, not about removing some explicit recursive usage.
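The reasoning behind that hunk, as a minimal sketch: `(logfilename ~ '\0').ptr` performs a GC allocation at the moment the profile is written, which is exactly when profilegc should avoid re-entering the GC; appending the terminator once in the setter moves that allocation to configuration time. `setLogFilename` and `openLog` are hypothetical names, and treating a name consisting only of `"\0"` as "use stdout" is an assumption of this sketch:

```d
import core.stdc.stdio : FILE, fopen, stdout;

// Stored with a trailing '\0' so that .ptr is a valid C string.
string logfilename;

void setLogFilename(string name)
{
    logfilename = name ~ "\0"; // the only GC allocation, done up front
}

// @nogc makes the compiler reject any hidden GC allocation in here.
FILE* openLog() nothrow @nogc
{
    // An empty name leaves only the terminator behind.
    return logfilename.length <= 1 ? stdout : fopen(logfilename.ptr, "w");
}
```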

leandro-lucarella-sociomantic · 2018-04-12T15:40:02Z

src/rt/profilegc.d

 {
+    if (sz == 0)
+        return;
+


This is just an optimization, right? Also, a more verbose commit message could help.
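As a sketch of what the early return buys: an accumulate-style function keyed by call site would otherwise create report rows for operations that allocated nothing (e.g. a setlength that only shrinks an array), cluttering the output with 0-byte entries. The `Entry` struct, the key format and the associative array are assumptions for illustration, not profilegc's real data structures:

```d
import std.format : format;

// Hypothetical per-call-site record.
struct Entry
{
    ulong count; // number of allocations
    ulong bytes; // total bytes allocated
}

Entry[string] entries;

void accumulate(string file, uint line, string funcname, string type, ulong sz)
{
    // Nothing was allocated: don't create (or bump) an entry for it.
    if (sz == 0)
        return;

    immutable key = format("%s %s %s:%s", type, funcname, file, line);
    if (auto e = key in entries)
    {
        ++e.count;
        e.bytes += sz;
    }
    else
        entries[key] = Entry(1, sz);
}
```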

leandro-lucarella-sociomantic · 2018-04-12T15:42:20Z

> Right now the biggest performance problem comes from how crazy expensive calling GC.stats is (because it calculates free/used data on the fly). Looking into whether it can be changed to precalculate stuff.

Maybe do that in a separate PR to keep this one from stalling? Especially because that change has to be done very carefully: one mistake might produce wrong results that are very hard to track down (speaking from experience).

mihails-strasuns-sociomantic · 2018-04-16T17:49:30Z

#2164

leandro-lucarella-sociomantic · 2018-04-18T16:46:41Z

ping @DmitryOlshansky @MartinNowak @andralex @rainers @wilzbach

Can you merge or give us some feedback if anything needs to be changed?

leandro-lucarella-sociomantic · 2018-05-04T10:37:08Z

I would like to expand this PR much further to get a lot of other stats that are already kept by the GC (at least when profiling is enabled).

Since keeping some stats might be a bit expensive, in particular the timing stats, which could take up to 5 syscalls per collection (we need to work around not having GC.monitor()), I would also like to add a generic way to select which stats should be collected.

Conceptually, I think Stats/stats() should be renamed to Status/status(), as what you get from them are not really stats, but the current state of the GC (free/used memory). I know it is probably not practical to make this name change, but I wanted to mention it anyway :)

The new Stats/stats() should expose real stats, things that can be accumulated (and reset), so it should have an API with at least 4 basic operations: get(), reset(), enable() and disable(). The GC already keeps track of these:

__gshared Duration prepTime;
__gshared Duration markTime;
__gshared Duration sweepTime;
__gshared Duration recoverTime;
__gshared Duration maxPauseTime;
__gshared size_t numCollections;
__gshared size_t maxPoolMemory;

__gshared long numMallocs;
__gshared long numFrees;
__gshared long numReallocs;
__gshared long numExtends;
__gshared long numOthers;
__gshared long mallocTime;
__gshared long freeTime;
__gshared long reallocTime;
__gshared long extendTime;
__gshared long otherTime;
__gshared long lockTime;

(and this PR introduces size_t bytesAllocated)

Ideally we should be able to enable, disable and reset individual stats (and resetting to a specific value rather than zero might be useful in some cases), so this can be done in an "only pay for what you need" style. But if that's too complicated, I guess we could also have a couple of categories (like "allocation", "collection", "counters" or "timing") or levels of detail (like "shallow", to get only the very basic stuff such as total collection time, and "deep", to get more fine-grained stats like markTime and other intermediate timings).
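One way the "categories" variant could look, as a sketch: each hook checks a bit mask before touching its counter, so a disabled category costs one branch, and `reset` takes an optional starting value. The `StatCategory` and `GCStats` names, the two categories and the `ulong` counters are illustrative assumptions, not an existing druntime API; per-stat control would simply use one bit per stat:

```d
// Hypothetical categories, one bit each.
enum StatCategory : uint
{
    allocation = 1 << 0,
    collection = 1 << 1,
}

// Hypothetical stats holder exposing get/reset/enable/disable.
struct GCStats
{
    uint enabledMask;     // bit set => category is collected
    ulong numMallocs;     // "allocation" category
    ulong numCollections; // "collection" category

    void enable(StatCategory c) { enabledMask |= c; }
    void disable(StatCategory c) { enabledMask &= ~cast(uint) c; }
    bool isEnabled(StatCategory c) const { return (enabledMask & c) != 0; }

    // Reset the counters of a category, optionally to a non-zero value.
    void reset(StatCategory c, ulong value = 0)
    {
        if (c & StatCategory.allocation) numMallocs = value;
        if (c & StatCategory.collection) numCollections = value;
    }

    // Hooks the GC would call; a disabled category costs a single branch.
    void onMalloc() { if (isEnabled(StatCategory.allocation)) ++numMallocs; }
    void onCollection() { if (isEnabled(StatCategory.collection)) ++numCollections; }

    // Snapshot of the current values.
    GCStats get() const { return this; }
}
```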

It might be useful to discuss this briefly at DConf to reduce the back and forth.

@MartinNowak @andralex @DmitryOlshansky @rainers

andralex

@leandro-lucarella-sociomantic LMK when this is ready - we can pull it now and expand it later.

A simple way to enable/disable collection of specific stats is to write -1 or some other invalid value to the ones that are not of interest, e.g. if numReallocs is set to -1, reallocations won't be counted. This is of course of interest mostly for the more costly stats.
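A minimal sketch of that sentinel approach, using the `long` type the existing `numReallocs` counter already has; `countRealloc` and the `disabled` constant are hypothetical names:

```d
// -1 is never a valid count, so it can double as "not collected".
enum long disabled = -1;

long numReallocs = disabled;

// Hook the GC would call on every reallocation: a single comparison when
// the stat is switched off.
void countRealloc()
{
    if (numReallocs != disabled)
        ++numReallocs;
}
```

Enabling a stat is then just writing a valid starting value (usually 0) to it. For an unsigned counter such as `ulong`, the equivalent sentinel would be `ulong.max`.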

andralex · 2018-05-05T15:50:22Z

src/gc/impl/conservative/gc.d

 __gshared long lockTime;

+// thread-local counter
+size_t bytesAllocated;


I think ulong is better here; size_t is for sizes of objects in memory, whereas bytesAllocated is a tally. It could go over 4 GB on a 32-bit system.
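The 4 GB concern is easy to see with 32-bit arithmetic. A minimal sketch, using `uint` to model `size_t` on a 32-bit target (`tally32`/`tally64` are hypothetical names): a running tally that passes `uint.max` wraps around silently, whereas a `ulong` keeps counting.

```d
// uint stands in for size_t on a 32-bit target: the sum wraps modulo 2^32.
uint tally32(uint total, uint bytes) { return total + bytes; }

// ulong keeps counting past 4 GB on every platform.
ulong tally64(ulong total, ulong bytes) { return total + bytes; }
```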

I just copy&pasted what the GC already gathers (when some profiling options are enabled). But yeah, we could review how these are stored too (there is also the inconsistency of some timings being long and some Duration).

RazvanN7 · 2021-11-02T14:47:58Z

This has been superseded by #2607

mihails-strasuns-sociomantic force-pushed the new-gc-profile branch from e429767 to b1b02d8 June 19, 2017 10:12

mihails-strasuns-sociomantic force-pushed the new-gc-profile branch 8 times, most recently from 68324f6 to 91b7fb9 June 19, 2017 17:04

leandro-lucarella-sociomantic approved these changes Jun 20, 2017


dlang-bot added the Needs Rebase and Needs Work labels and removed the Needs Rebase label Jan 1, 2018

mihails-strasuns-sociomantic force-pushed the new-gc-profile branch from 7047357 to 9ec72d9 March 21, 2018 16:29

mihails-strasuns-sociomantic force-pushed the new-gc-profile branch from 9ec72d9 to 46aedf2 March 21, 2018 18:14

rainers mentioned this pull request Apr 2, 2018

Add access to GC runtime profile stats #2155

Closed

mihails-strasuns-sociomantic added 8 commits April 10, 2018 10:09

Extend GC stats with resettable thread-local counter

4a842c1

Replace manual trace printfs with a mixin

59ec078

Intermediate step towards generating tracegc handlers automatically, intended to simplify review by reducing the size of each individual diff.

Ensure consistent trace entry order

7d2b947

Ordering by name as a last resort makes the order of lines in the generated trace reproducible, allowing for easy test cases.

Enhance -profile=gc test case

99a37a6

Moves the code snippet that exercises various allocating language features into the actual test, for better functionality coverage.

Use platform-specific expected logs for profile=gc tests

4f338c3

Remove possibly recursive usage of GC in profilegc

337cdc8

Do not accumulate empty entries

e1a6984

mihails-strasuns-sociomantic force-pushed the new-gc-profile branch from f439ea0 to e1a6984 April 10, 2018 09:09

Do not use delegate in generated trace mixin

4ecbaab

The delegate causes a closure allocation despite being marked as scope.

leandro-lucarella-sociomantic approved these changes Apr 12, 2018


leandro-lucarella-sociomantic mentioned this pull request May 4, 2018

Implement a monitor system for the GC #2052

Closed

andralex reviewed May 5, 2018


mihails-strasuns-sociomantic mentioned this pull request May 25, 2018

Modify GC.stats to allow requesting specific stats #2193

Open

daniel-zullo mentioned this pull request Aug 22, 2018

Patch druntime to rework -profile=gc calculations sociomantic-tsunami/dmd-transitional#34

Merged

RazvanN7 mentioned this pull request May 15, 2019

Rework of -profile=gc calculations #2607

Merged

dlang-bot added the Needs Rebase and stalled labels May 27, 2021

RazvanN7 closed this Nov 2, 2021
