New implementation of -profile=gc by mihails-strasuns-sociomantic · Pull Request #1806 · dlang/druntime

mihails-strasuns-sociomantic · 2017-04-10T11:20:06Z

Requires merge of stable into master to remove a975720 commit.

This is an attempt to fix https://issues.dlang.org/show_bug.cgi?id=17294 - it is mostly finished but there are few improvements to make which I am not 100% sure about:

to be 100% reliable, trace module needs to reuse existing GC lock instead of introducing own one, because direct calls to GC.malloc can still interfere with stats
it would be really good to trace those malloc calls in log too, but that requires some enhancements on compiler side too, to let runtime know that -profile=gc is enabled. I will probably delay that to separate PR.

mihails-strasuns-sociomantic · 2017-04-10T11:20:31Z

FYI @leandro-lucarella-sociomantic @mathias-lang-sociomantic

nemanja-boric-sociomantic

some 50k ft comments

nemanja-boric-sociomantic · 2017-04-10T11:38:03Z

src/rt/tracegc.d

 extern (C) void* _d_newitemiT(in TypeInfo _ti);

-extern (C) Object _d_newclassTrace(string file, int line, string funcname, const ClassInfo ci)
+private string generatePrintf ( )


nitpick: suggest generatePrintfTrace

Btw, it seems that this change is completely removed in one of the next commits (it's using old style of tracing printf).

nemanja-boric-sociomantic · 2017-04-10T11:39:34Z

src/core/internal/traits.d

    else
        enum bool hasElaborateCopyConstructor = false;
 }
+


extra newline

nemanja-boric-sociomantic · 2017-04-10T11:50:09Z

src/rt/tracegc.d

+    import core.memory : GC;
+
+    static if (is(typeof(ci)))
+        string name = ci.name;


suggest: allocation_type or similar

nemanja-boric-sociomantic · 2017-04-10T11:54:27Z

src/rt/tracegc.d

-    accumulate(file, line, funcname, ti.toString(), ti.next.tsize * n);
-    return _d_arrayappendcTX(ti, px, n);
-}
+    static size_t findParamIndex(string s)


Would be very nice if these helper methods would have ddoc, or at least generateWrapper.

mihails-strasuns-sociomantic · 2017-04-10T12:10:37Z

Note: failures are most likely related to the mentioned issue with lack of single lock with GC implementation - there seems to be a data race on multi-threaded allocation test case.

Comes at cost of losing some custom function arguments but is much more maintainable and readable.

Replaced approximation based of function arguments with a call to GC stats to always get exact amount of bytes allocated. As new implementation is generic and suitable for any wrapped function, manual wrappers have been replaced with string mixin approach.

Ordering by name as last measure means order of lines in the generate trace will be reproducible, allowing for easy test cases.

Moves code snippet which refers to various language features that allocate to actual test to ensure better functionality coverage.

After switching to `GC.stats` implementation of calculating allocated amount, it has become necessary to ensure that no other thread affects those stats in between two calculations points. Fixes issue https://issues.dlang.org/show_bug.cgi?id=17294 Fixes issue https://issues.dlang.org/show_bug.cgi?id=16280 Fixes issue https://issues.dlang.org/show_bug.cgi?id=15481

It is more useful than plain size or plain count

leandro-lucarella-sociomantic

On a more general topic, I don't understand why is all the calculation done in the runtime functions in the first place instead of directly in the GC, where it is much easier to know when stuff is really being allocated and you don't have to care about thread synchronization since you are already inside a global lock, or using the stats function. Doing this will also guarantee that enabling the memory profiling won't affect ordering and have less overhead since it avoid taking a global lock multiple times, and even times when it wasn't even necessary.

leandro-lucarella-sociomantic · 2017-04-20T10:16:15Z

src/core/memory.d

-        size_t freeSize;
+        ulong freeSize;
+        /// number of bytes freed during collections through program lifetime so
+        /// far (will count same memory multiple times if re-used)


What's the point of keeping track of this? It should be mentioned at least in the commit message, but probably also in the changelog and the code docs.

leandro-lucarella-sociomantic · 2017-04-20T10:19:34Z

src/rt/tracegc.d

+                line,
+                funcname.length, funcname.ptr
+                );
+        };


Why a mixin and not just a function with a version statement inside? The optimizer should remove the empty function call I guess...

__FUNCTION__

leandro-lucarella-sociomantic · 2017-04-20T10:24:34Z

src/rt/tracegc.d

-    }
-    else
-        return "";
-}


I agree with @nemanja-boric-sociomantic, is weird that you almost remove everything you did in the previous commit here, but also for me is very distracting that you are reworking all the tracing again mixed with what the commit is supposed to do "Rework how -profile=gc output is calculated". Is hard to review those changes like this.

Funnily, I did exactly to simplify reviewing, minimizing amount of lines removed in the commit that does interesting stuff (as tracing copy-paste bits have been removed in previous commit) :)

leandro-lucarella-sociomantic · 2017-04-20T10:32:37Z

test/profile/myprofilegc.log.exp

            640	             10	int[] profilegc.main.bar src/profilegc.d:73
-            288	              1	immutable(char)[][int] D main src/profilegc.d:34
            240	              4	core.thread.Thread[] D main src/profilegc.d:77
+            288	              1	immutable(char)[][int] D main src/profilegc.d:34


I don't understand, so, bytes allocated is how much is supposed to be allocated each time? How is something like auto x = new int[n]; handled, since the bytes allocated on each call will vary depending on n?

No, bytes allocated should be total amount through the life of the program.

I don't understand, the commit says sort by total size first, but this is putting first 240 and 280 afterwards...

leandro-lucarella-sociomantic · 2017-04-20T10:33:39Z

src/rt/profilegc.d

 import core.exception : onOutOfMemoryError;

-struct Entry { size_t count, size; }
+struct Entry { ulong count, size; }


Why is changing size_t to ulong is necessary or better? Needs clarification in the commit message at least, but maybe even here too, so nobody is tempted to change it back.

Was trying to fix differences in stats between 32-bit and 64-bit builds, stupid idea that will need to be reverted (I have later switched to using two different log files to compare against in test runner).

leandro-lucarella-sociomantic · 2017-04-20T10:38:01Z

src/gc/impl/conservative/gc.d

 class ConservativeGC : GC
 {
+    bool profiling_enabled;
+    shared Mutex profiler_lock;


Why creating this new mutex instead of using the pre-existing GC global lock for everything?

Currenty main GC implementation uses non re-entrant spinlock. Anyway, all lock/sync hacking is more of proof of concept to see if I can make it work that way (have failed so far).

mihails-strasuns-sociomantic · 2017-04-22T12:27:34Z

On a more general topic, I don't understand why is all the calculation done in the runtime functions in the first place instead of directly in the GC

It is simply impossible with current -profile=gc design, as it requires either changing signatures of involved extern(C) functions to also return gc stats diff or somehow passing file/line/function info to those GC functions (== also changing signatures). Both are nasty breaking changes while approach in this PR is a simple enhancement.

Main problem with existing implementation is that I couldn't manage to get determenistic stats as soon as threads are involved, even with dump global lock for every operation (it seems like amount of memory allocated by new Thread itself may vary). It is not important for our Sociomantic use case so I haven't invested too much into trying to nail the issue but will need to be addressed if this approach to be considered for upstream.

leandro-lucarella-sociomantic · 2017-04-26T13:36:33Z

Back to my question about why not doing all this tracking inside the GC instead of the allocating functions in the runtime... Is this because it will be a complete rewrite of this feature, or is there any other fundamental problem with that approach that I'm not seeing?

mihails-strasuns-sociomantic · 2017-04-26T13:37:51Z

Is this because it will be a complete rewrite of this feature, or is there any other fundamental problem with that approach that I'm not seeing?

Only former - I actually do agree that would be fundamentally more correct approach. But is much more changes, including plenty of breaking changes, thus I was reluctant to pursue that path :(

mihails-strasuns-sociomantic · 2017-05-14T10:02:39Z

Closing until we come up with solution that can be used upstream

mihails-strasuns-sociomantic added 2 commits April 10, 2017 13:30

TMP

bdb4939

Introduce GC.Stats.totalCollected

9b41d5b

mihails-strasuns-sociomantic force-pushed the gc-profile-fix branch 5 times, most recently from f254c1f to fb6bfd6 Compare April 10, 2017 11:51

nemanja-boric-sociomantic reviewed Apr 10, 2017

View reviewed changes

mihails-strasuns-sociomantic added 8 commits April 10, 2017 17:52

Use ulong instead of size_t by -profile=gc

6102484

Replace manual trace printfs with a mixin

9ffc947

Comes at cost of losing some custom function arguments but is much more maintainable and readable.

Ensure consistent trace entry order

bde0792

Ordering by name as last measure means order of lines in the generate trace will be reproducible, allowing for easy test cases.

Enhance -profile=gc test case

59a53c4

Moves code snippet which refers to various language features that allocate to actual test to ensure better functionality coverage.

Order -profile=gc entries by total size first

dd28709

It is more useful than plain size or plain count

PoC: try using single mutex in both profiler and gc

8d16e58

mihails-strasuns-sociomantic force-pushed the gc-profile-fix branch from 852da73 to 8d16e58 Compare April 10, 2017 15:52

leandro-lucarella-sociomantic reviewed Apr 20, 2017

View reviewed changes

mihails-strasuns-sociomantic closed this May 14, 2017

mihails-strasuns-sociomantic mentioned this pull request Jun 19, 2017

Rework of -profile=gc calculations #1846

Closed

Uh oh!

Comments

Conversation

mihails-strasuns-sociomantic commented Apr 10, 2017

Uh oh!

mihails-strasuns-sociomantic commented Apr 10, 2017

Uh oh!

nemanja-boric-sociomantic left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mihails-strasuns-sociomantic commented Apr 10, 2017

Uh oh!

leandro-lucarella-sociomantic left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mihails-strasuns-sociomantic Apr 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mihails-strasuns-sociomantic commented Apr 22, 2017

Uh oh!

leandro-lucarella-sociomantic commented Apr 26, 2017

Uh oh!

mihails-strasuns-sociomantic commented Apr 26, 2017

Uh oh!

mihails-strasuns-sociomantic commented May 14, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mihails-strasuns-sociomantic Apr 22, 2017 •

edited

Loading