Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.
/ druntime Public archive

Comments

New implementation of -profile=gc#1806

Closed
mihails-strasuns-sociomantic wants to merge 10 commits intodlang:masterfrom
mihails-strasuns-sociomantic:gc-profile-fix
Closed

New implementation of -profile=gc#1806
mihails-strasuns-sociomantic wants to merge 10 commits intodlang:masterfrom
mihails-strasuns-sociomantic:gc-profile-fix

Conversation

@mihails-strasuns-sociomantic
Copy link
Contributor

Requires merge of stable into master to remove a975720 commit.

This is an attempt to fix https://issues.dlang.org/show_bug.cgi?id=17294 - it is mostly finished but there are few improvements to make which I am not 100% sure about:

  • to be 100% reliable, trace module needs to reuse existing GC lock instead of introducing own one, because direct calls to GC.malloc can still interfere with stats
  • it would be really good to trace those malloc calls in log too, but that requires some enhancements on compiler side too, to let runtime know that -profile=gc is enabled. I will probably delay that to separate PR.

@mihails-strasuns-sociomantic
Copy link
Contributor Author

@mihails-strasuns-sociomantic mihails-strasuns-sociomantic force-pushed the gc-profile-fix branch 5 times, most recently from f254c1f to fb6bfd6 Compare April 10, 2017 11:51
Copy link
Contributor

@nemanja-boric-sociomantic nemanja-boric-sociomantic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some 50k ft comments

src/rt/tracegc.d Outdated
extern (C) void* _d_newitemiT(in TypeInfo _ti);

extern (C) Object _d_newclassTrace(string file, int line, string funcname, const ClassInfo ci)
private string generatePrintf ( )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: suggest generatePrintfTrace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, it seems that this change is completely removed in one of the next commits (it's using old style of tracing printf).

else
enum bool hasElaborateCopyConstructor = false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra newline

import core.memory : GC;

static if (is(typeof(ci)))
string name = ci.name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: allocation_type or similar

accumulate(file, line, funcname, ti.toString(), ti.next.tsize * n);
return _d_arrayappendcTX(ti, px, n);
}
static size_t findParamIndex(string s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be very nice if these helper methods would have ddoc, or at least generateWrapper.

@mihails-strasuns-sociomantic
Copy link
Contributor Author

Note: failures are most likely related to the mentioned issue with lack of single lock with GC implementation - there seems to be a data race on multi-threaded allocation test case.

Comes at cost of losing some custom function arguments but is much more
maintainable and readable.
Replaced approximation based of function arguments with a call to GC stats to
always get exact amount of bytes allocated. As new implementation is generic
and suitable for any wrapped function, manual wrappers have been replaced with
string mixin approach.
Ordering by name as last measure means order of lines in the generate trace
will be reproducible, allowing for easy test cases.
Moves code snippet which refers to various language features that allocate to
actual test to ensure better functionality coverage.
After switching to `GC.stats` implementation of calculating allocated amount, it
has become necessary to ensure that no other thread affects those stats in
between two calculations points.

Fixes issue https://issues.dlang.org/show_bug.cgi?id=17294
Fixes issue https://issues.dlang.org/show_bug.cgi?id=16280
Fixes issue https://issues.dlang.org/show_bug.cgi?id=15481
It is more useful than plain size or plain count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a more general topic, I don't understand why is all the calculation done in the runtime functions in the first place instead of directly in the GC, where it is much easier to know when stuff is really being allocated and you don't have to care about thread synchronization since you are already inside a global lock, or using the stats function. Doing this will also guarantee that enabling the memory profiling won't affect ordering and have less overhead since it avoid taking a global lock multiple times, and even times when it wasn't even necessary.

size_t freeSize;
ulong freeSize;
/// number of bytes freed during collections through program lifetime so
/// far (will count same memory multiple times if re-used)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of keeping track of this? It should be mentioned at least in the commit message, but probably also in the changelog and the code docs.

src/rt/tracegc.d Outdated
line,
funcname.length, funcname.ptr
);
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a mixin and not just a function with a version statement inside? The optimizer should remove the empty function call I guess...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__FUNCTION__

src/rt/tracegc.d Outdated
}
else
return "";
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @nemanja-boric-sociomantic, is weird that you almost remove everything you did in the previous commit here, but also for me is very distracting that you are reworking all the tracing again mixed with what the commit is supposed to do "Rework how -profile=gc output is calculated". Is hard to review those changes like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funnily, I did exactly to simplify reviewing, minimizing amount of lines removed in the commit that does interesting stuff (as tracing copy-paste bits have been removed in previous commit) :)

640 10 int[] profilegc.main.bar src/profilegc.d:73
288 1 immutable(char)[][int] D main src/profilegc.d:34
240 4 core.thread.Thread[] D main src/profilegc.d:77
288 1 immutable(char)[][int] D main src/profilegc.d:34

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, so, bytes allocated is how much is supposed to be allocated each time? How is something like auto x = new int[n]; handled, since the bytes allocated on each call will vary depending on n?

Copy link
Contributor Author

@mihails-strasuns-sociomantic mihails-strasuns-sociomantic Apr 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, bytes allocated should be total amount through the life of the program.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, the commit says sort by total size first, but this is putting first 240 and 280 afterwards...

import core.exception : onOutOfMemoryError;

struct Entry { size_t count, size; }
struct Entry { ulong count, size; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is changing size_t to ulong is necessary or better? Needs clarification in the commit message at least, but maybe even here too, so nobody is tempted to change it back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was trying to fix differences in stats between 32-bit and 64-bit builds, stupid idea that will need to be reverted (I have later switched to using two different log files to compare against in test runner).

class ConservativeGC : GC
{
bool profiling_enabled;
shared Mutex profiler_lock;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why creating this new mutex instead of using the pre-existing GC global lock for everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currenty main GC implementation uses non re-entrant spinlock. Anyway, all lock/sync hacking is more of proof of concept to see if I can make it work that way (have failed so far).

@mihails-strasuns-sociomantic
Copy link
Contributor Author

On a more general topic, I don't understand why is all the calculation done in the runtime functions in the first place instead of directly in the GC

It is simply impossible with current -profile=gc design, as it requires either changing signatures of involved extern(C) functions to also return gc stats diff or somehow passing file/line/function info to those GC functions (== also changing signatures). Both are nasty breaking changes while approach in this PR is a simple enhancement.

Main problem with existing implementation is that I couldn't manage to get determenistic stats as soon as threads are involved, even with dump global lock for every operation (it seems like amount of memory allocated by new Thread itself may vary). It is not important for our Sociomantic use case so I haven't invested too much into trying to nail the issue but will need to be addressed if this approach to be considered for upstream.

@leandro-lucarella-sociomantic
Copy link
Contributor

Back to my question about why not doing all this tracking inside the GC instead of the allocating functions in the runtime... Is this because it will be a complete rewrite of this feature, or is there any other fundamental problem with that approach that I'm not seeing?

@mihails-strasuns-sociomantic
Copy link
Contributor Author

Is this because it will be a complete rewrite of this feature, or is there any other fundamental problem with that approach that I'm not seeing?

Only former - I actually do agree that would be fundamentally more correct approach. But is much more changes, including plenty of breaking changes, thus I was reluctant to pursue that path :(

@mihails-strasuns-sociomantic
Copy link
Contributor Author

Closing until we come up with solution that can be used upstream

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants