Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.
/ druntime Public archive

Comments

allow GC API profiling with --DRT-gcopt=profile:2#1147

Merged
MartinNowak merged 4 commits intodlang:masterfrom
rainers:gc_benchmalloc
Apr 5, 2015
Merged

allow GC API profiling with --DRT-gcopt=profile:2#1147
MartinNowak merged 4 commits intodlang:masterfrom
rainers:gc_benchmalloc

Conversation

@rainers
Copy link
Member

@rainers rainers commented Feb 1, 2015

Here's a benchmark result with API time:

R bulk             1.075 s,    38 MB,   16 GC  173 ms, Pauses   79 ms <   15 ms API  616 ms
R resize           0.872 s,    12 MB,   11 GC   24 ms, Pauses    2 ms <    0 ms API  133 ms
R string           0.027 s,     5 MB,    2 GC    0 ms, Pauses    0 ms <    0 ms API    1 ms
R conalloc         0.069 s,     5 MB,   83 GC   10 ms, Pauses   10 ms <    0 ms API   12 ms
R conappend        0.014 s,     5 MB,    6 GC    0 ms, Pauses    0 ms <    0 ms API    4 ms
R concpu           0.092 s,     5 MB,   79 GC   34 ms, Pauses   33 ms <    0 ms API   34 ms
R conmsg           0.805 s,     5 MB,  130 GC   44 ms, Pauses   15 ms <    0 ms API  552 ms
R dlist            1.834 s,    12 MB,   52 GC  331 ms, Pauses  227 ms <    4 ms API  692 ms
R huge_single      0.011 s,  1501 MB,    3 GC    0 ms, Pauses    0 ms <    0 ms API    0 ms
R rand_large       3.164 s,    92 MB, 3152 GC  304 ms, Pauses  194 ms <    0 ms API 3118 ms
R rand_small       0.768 s,    12 MB, 2032 GC  368 ms, Pauses  209 ms <    0 ms API  594 ms
R slist            1.749 s,    12 MB,   52 GC  285 ms, Pauses  180 ms <    3 ms API  648 ms
R testgc3          1.312 s,    70 MB,   11 GC  357 ms, Pauses  247 ms <   57 ms API  878 ms
R tree1            0.930 s,    12 MB,   98 GC  442 ms, Pauses  333 ms <    6 ms API  692 ms
R tree2            1.078 s,     1 MB,  216 GC   73 ms, Pauses    9 ms <    0 ms API  333 ms
R words            0.913 s,   292 MB,    9 GC  145 ms, Pauses  144 ms <   77 ms API   68 ms

Example output details for dlist:

    malloc:  9768628 calls, 531 ms
    realloc: 0 calls, 0 ms
    free:    0 calls, 0 ms
    extend:  1100 calls, 0 ms
    other:   826 calls, 0 ms
    lock time: 160 ms
    GC API: 692 ms
GC summary:   12 MB,   52 GC  331 ms, Pauses  227 ms <    4 ms API  692 ms

malloc includes collections, GC.lock time not included in single function time.

Running it disabled has a tiny impact on performance, so I've made it optional with -debug=PROFILE_API.

This also corrects a copy and paste error in runbench which fails to extract the GC summary from gcx.log.

@rainers rainers force-pushed the gc_benchmalloc branch 2 times, most recently from 8d8d494 to b7d237c Compare February 1, 2015 20:08
@rainers rainers mentioned this pull request Feb 3, 2015
@MartinNowak
Copy link
Member

Just what I need right now ;). Can you please rebase it?

@MartinNowak
Copy link
Member

Also wanted to add something like runLocked already.

src/gc/gc.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the only conversion you're saving by all this MonoTimeNative. Not worth the trouble IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The larger performance hit was subtracting two MonoTimes to get a Duration. This caused a noticable slowdown.
The actual problem I was trying to solve was that the time functions do not work without MonoTime.ticksPerSecond being initialized, but the GC is running before that. I did not want to add an initialization check everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that I could get similar performance when using MonoTime.currTime.ticks. Unfortunately this still needs MonoTime._ticksPerSecond to be initialized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced the code duplication with a runtime check for MonoTime._ticksPerSecond.

@rainers
Copy link
Member Author

rainers commented Feb 15, 2015

Can you please rebase it?

Done. I wonder how the option parser could pass the review ;-)

@rainers
Copy link
Member Author

rainers commented Feb 15, 2015

Please note that "profile" is no longer a boolean option, so y/n don't work anymore. This is rather inconsistent if debug=PROFILE_API isn't set, because profile:2 has no meaning without it.

I don't see a noticable performance hit with PROFILE_API enabled, but profile<2 on win32 on my laptop, so maybe we could always build it unconditionally. (I can't test it for Win64, still have to figure out how resonably to build after the -conf= change ;-( )

@MartinNowak
Copy link
Member

(I can't test it for Win64, still have to figure out how resonably to build after the -conf= change ;-( )

What happened there?

@MartinNowak
Copy link
Member

I don't see a noticable performance hit with PROFILE_API enabled, but profile<2 on win32 on my laptop, so maybe we could always build it unconditionally.

I wouldn't be surprised, because the branch is easily predicted.

src/gc/gc.d Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get up to a few percent slowdown (worst in tree1), apparently because the delegate call isn't inlined.

@MartinNowak
Copy link
Member

Can you please split-off the gc.config fixes so that they can go into 2.067?

@MartinNowak MartinNowak added this to the 2.067 milestone Feb 16, 2015
@rainers
Copy link
Member Author

rainers commented Feb 16, 2015

What happened there?

The problem is linking the unittests:

  • for Win32, the main issue was that I called dmd with forward slashes that were passed on to optlink. The path was then interpreted as options. BTW: optlink still uses sc.ini, otherwise it cannot find snn.lib.
  • for Win64 and Win32mscoff, you now have to set the environment variables LINKCMD and LIB ruining your VS setup. In addition to CC and AR I had to override VCDIR and SDKDIR on the command line because they have dumb defaults in the makefile (see Simplify Visual C configuration for -m64/-m32mscoff builds #960) (I guess this was a side effect of my tries to get things back to work, but don't seem to be caused by the -conf= change). It now boils down to this:
set vs=vs12
set dm_make=c:\l\dmc\bin\make

set vcdir=c:\l\%vs%\vc
set cl64=%vcdir%/bin/amd64/cl.exe
set ar64=%vcdir%/bin/amd64/lib.exe
set sdkdir=c:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A
set LIB=m:\s\d\rainers\lib64;%vcdir%\lib\amd64;%sdkdir%\lib\x64
set LINKCMD=%vcdir%\bin\link.exe
set ARGS=DMD=%DMD% MAKE=%dm_make% "CC=\"%cl64%\"" "AR=\"%ar64%\"" VCDIR=%vcdir% "SDKDIR=%sdkdir%"

Maybe it's just my build script, but I expect a lot of people will have trouble, especially if you just want to rebuild druntime/phobos within a distributed archive. Also note that the source folders in the distribution have a different layout than what you usually fetch from github.

@rainers
Copy link
Member Author

rainers commented Feb 16, 2015

Can you please split-off the gc.config fixes so that they can go into 2.067?

Done: #1173

@MartinNowak
Copy link
Member

Also note that the source folders in the distribution have a different layout than what you usually fetch from github.

I know, I know, it's a mess.

@MartinNowak
Copy link
Member

I wonder how the option parser could pass the review ;-)

Don't sweat the small stuff.

@MartinNowak
Copy link
Member

So what do we do about the callback cost?
For me this is actually a blocker, maybe try passing the delegate via alias or if necessary use a mixin?

@rainers
Copy link
Member Author

rainers commented Feb 16, 2015

Anything but using a string mixin fails, but I'd like to avoid that. I'm trying to fix the inliner...

@MartinNowak
Copy link
Member

I'm trying to fix the inliner...

Well, good luck...

I kind of gave up trying to optimize for dmd. But maybe improving the frontend inliner also helps LDC/GDC?

@MartinNowak MartinNowak removed this from the 2.067 milestone Feb 17, 2015
@rainers
Copy link
Member Author

rainers commented Feb 17, 2015

Well, good luck...

I got runLocked inlined, but it's probably a little unsafe. It's not so easy to determine whether a passed delegate might get modified during function execution or not.

I kind of gave up trying to optimize for dmd. But maybe improving the frontend inliner also helps LDC/GDC?

I remember the GDC/LDC developers saying that they avoid the inliner.

@rainers
Copy link
Member Author

rainers commented Feb 17, 2015

I've changed runLocked to a pair of mixin templates doLock/doUnlock that inline perfectly. I don't see a difference to master when compiling without -debug=PROFILE_API.

@MartinNowak MartinNowak added the GC garbage collector label Mar 7, 2015
@MartinNowak
Copy link
Member

Auto-merge toggled on

MartinNowak added a commit that referenced this pull request Apr 5, 2015
allow GC API profiling with --DRT-gcopt=profile:2
@MartinNowak MartinNowak merged commit 0010412 into dlang:master Apr 5, 2015
@MartinNowak
Copy link
Member

thx

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

GC garbage collector

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants