Optimize maxAlignment compile time by TurkeyMan · Pull Request #6799 · dlang/phobos

TurkeyMan · 2018-12-10T07:21:05Z

Because max is gigantic!

dlang-bot · 2018-12-10T07:21:06Z

Thanks for your pull request and interest in making D better, @TurkeyMan! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment set up, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#6799"

thewilsonator · 2018-12-10T07:24:25Z

std/traits.d

-        enum maxAlignment = max(staticMap!(.maxAlignment, U));
+        enum a = maxAlignment!(U[0 .. ($+1)/2]);
+        enum b = maxAlignment!(U[($+1)/2 .. $]);
+        enum maxAlignment = a > b ? a : b;


It may well be worth unrolling the template recursion a bit too to limit the number of template instantiations.

See how I did a logN expansion rather than a linear recursive expansion? I think that probably does the job, no?

Also, compared to instantiating max, this is practically nothing.

...but I was also pondering whether I should use CTFE rather than a template recursion.
I don't know enough about the cost/benefit tradeoff on that front. At what point does CTFE become profitable?

Probably always, @UplinkCoder ?

Also, truth is that U.length almost always == 2

In which case just do a static if (U.length == 2) branch ;)

CTFE is better in almost all cases, except maybe for N ∈ {0 ... 5} :)
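
For reference, the binary subdivision discussed in this thread can be sketched as a standalone template. This is only an illustration: the name maxAlignmentSketch is hypothetical, and the one- and two-element branches are assumed from the U.length == 2 suggestion above rather than copied from the merged code.

```d
// Hypothetical standalone sketch; the real template lives in std/traits.d.
template maxAlignmentSketch(U...)
{
    static if (U.length == 0)
        static assert(0, "maxAlignmentSketch needs at least one type");
    else static if (U.length == 1)
        enum maxAlignmentSketch = U[0].alignof;
    else static if (U.length == 2)
        // Common case per the thread: no further instantiations needed.
        enum maxAlignmentSketch =
            U[0].alignof > U[1].alignof ? U[0].alignof : U[1].alignof;
    else
    {
        // Split the list in half; recursion depth is logarithmic in U.length.
        enum a = maxAlignmentSketch!(U[0 .. ($ + 1) / 2]);
        enum b = maxAlignmentSketch!(U[($ + 1) / 2 .. $]);
        enum maxAlignmentSketch = a > b ? a : b;
    }
}

// Compile-time checks, written against .alignof so they hold on any target.
static assert(maxAlignmentSketch!(byte) == byte.alignof);
static assert(maxAlignmentSketch!(byte, int) == int.alignof);
static assert(maxAlignmentSketch!(byte, short, int, long, char) == long.alignof);

void main() {}
```

Each half is itself a template instantiation, so identical sub-lists can be reused by the compiler instead of being evaluated again.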

edi33416

Nice job!
I think rewriting this using static foreach would be even better

TurkeyMan · 2018-12-11T05:10:48Z

I don't know the relative costs... using runtime foreach structures would generate a considerably larger AST, but there's a high cost to template instantiations that I don't know how to trade against AST volume and CTFE overhead.

wilzbach · 2018-12-11T06:24:24Z

> I don't know the relative costs... using runtime foreach structures would generate a considerably larger AST, but there's a high cost to template instantiations that I don't know how to trade against AST volume and CTFE overhead.

static foreach is faster. Someone made tests when a few traits were rewritten to use static foreach. I can't recall the exact numbers though :/

TurkeyMan · 2018-12-11T06:30:44Z

What about static foreach compared to CTFE foreach?

wilzbach · 2018-12-11T06:39:10Z

> What about static foreach compared to CTFE foreach?

No difference at all. That's why we converted a lot of CTFE foreach to static foreach (clearer intent).

TurkeyMan · 2018-12-11T07:40:45Z

I just did a whole heap of tests. CTFE foreach substantially beat static foreach every time.
Can you produce evidence that they cost the same? It would seem that converting to static foreach might not be a good idea...

wilzbach · 2018-12-11T07:49:14Z

Sure, see e.g. #5729

Maybe "not measurable in a build of Phobos" would have been the better wording.

> It would seem that converting to static foreach might not be a good idea...

The real perf difference is negligible and static clearly shows intent. Also if you use static foreach you don't need to create the extra function. Did your tests consider this?

wilzbach · 2018-12-11T07:50:49Z

Sorry I wanted to link to #5989 (things are a bit hard to do with your phone only)

TurkeyMan · 2018-12-12T08:23:05Z

> Also if you use static foreach you don't need to create the extra function. Did your tests consider this?

Yes, I compared the sensible implementation of each style. Just that static foreach was substantially slower in all my test cases.
I'm not sure how the function is a major problem?

To measure timing differences I needed long loops, but I don't know how to measure overhead for short loops... we need a best-practice doc.
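
The tests mentioned here were not posted in the thread, so the following is not a reconstruction of them. It is a minimal sketch of one way such a long-loop comparison could be set up, assuming a hypothetical file bench.d, a hypothetical version identifier UseStatic, and an arbitrary loop length N. Timing `dmd -o- bench.d` against `dmd -o- -version=UseStatic bench.d` would then compare the two loop styles on the same body.

```d
// bench.d (hypothetical file name). Compile-only comparison:
//   dmd -o- bench.d                     -> CTFE foreach
//   dmd -o- -version=UseStatic bench.d  -> static foreach
enum N = 2_000; // assumption: raise until the difference becomes measurable

size_t sumTo()
{
    size_t s = 0;
    version (UseStatic)
    {
        // Expanded by the compiler into N separate statements.
        static foreach (i; 0 .. N)
        {
            s += i;
        }
    }
    else
    {
        // Ordinary loop, interpreted by CTFE when sumTo() initialises an enum.
        foreach (i; 0 .. N)
        {
            s += i;
        }
    }
    return s;
}

// Forces compile-time evaluation in both configurations.
enum total = sumTo();
static assert(total == N * (N - 1) / 2);

void main() {}
```

A sketch like this only measures long loops; it does not answer the open question about overhead for short ones.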

edi33416 · 2018-12-12T08:36:42Z

>> Also if you use static foreach you don't need to create the extra function. Did your tests consider this?
>
> Yes, I compared the sensible implementation of each style. Just that static foreach was substantially slower in all my test cases.
> I'm not sure how the function is a major problem?

Can you drop a gist of the tests you wrote? This seems to be very interesting behaviour, as I would have expected static foreach and CTFE foreach to have approximately the same performance, with the observation that static foreach states the intent more clearly.

Since this seems to not be the case, I'd say it's an issue (bug?).

> To measure timing differences I needed long loops, but I don't know how to measure overhead for short loops... we need a best-practice doc.

A best practice doc would be a good addition. There is also this idioms page

wilzbach · 2018-12-12T10:41:42Z

> To measure timing differences I needed long loops, but I don't know how to measure overhead for short loops... we need a best-practice doc.

Best practice for Phobos is to use static foreach, as there's no noticeable difference in the real world and it reads better.

TurkeyMan · 2018-12-13T07:37:57Z

> there's no noticeable difference in the real world

Erm... phobos can be extremely slow if you're not careful. And it doesn't help that everything in phobos seems to pull in everything else in phobos for good measure.

But anyway, is this fine? I don't think a superior implementation is possible without ... expression.

TurkeyMan · 2018-12-18T04:51:24Z

So, is this bad?

edi33416 · 2018-12-18T12:10:56Z

I'm still in favour of the static foreach instead of the template recursion, as foreach will be faster

TurkeyMan · 2018-12-19T06:18:55Z

How would you write this with static foreach? You can't accumulate values from one loop iteration into the next...

UplinkCoder · 2018-12-19T15:27:11Z

> I'm still in favour of the static foreach instead of the template recursion, as foreach will be faster

It will not. Template recursion makes use of caching.

It should also be noted that static foreach does the equivalent of template recursion, just without the caching support.
If you can do it, CTFE foreach is the more scalable solution!

edi33416 · 2018-12-19T15:29:04Z

> How would you write this with static foreach? You can't accumulate values from one loop iteration into the next...

I couldn't figure out how the new suggestion feature works, so I'll just write it as a comment

I'd replace the else static if (U.length == 2) { ... } else { ... } with

else static if (U.length == 1)
    enum maxAlignment = U[0].alignof;
else
{
    static size_t maxAlignmentHelper(U...)(size_t m)
    {
        static foreach(T; U[0 .. $])
        {
            m = m > T.alignof ? m : T.alignof;
        }
        return m;
    }
    enum maxAlignment = maxAlignmentHelper!(U[1 .. $])(U[0].alignof);
}

edi33416 · 2018-12-19T15:30:03Z

I had to make the helper so I could use m as an "accumulator"

edi33416 · 2018-12-19T15:33:00Z

> I had to make the helper so I could use m as an "accumulator"

I just saw @UplinkCoder 's comment, as it was posted while I was typing the code suggestion.
To make it CTFE foreach, just remove the static from the static foreach inside the helper.
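
A self-contained sketch of the two helper variants described above, assuming the hypothetical names maxAlignStatic and maxAlignTuple. The second simply drops static, leaving a plain foreach over the type sequence inside a function that is then evaluated at compile time. The sketch shows only that both forms produce the same value; it says nothing about which compiles faster.

```d
// Variant 1: static foreach inside a helper, with m as the accumulator.
size_t maxAlignStatic(U...)()
{
    size_t m = 0; // every .alignof is at least 1, so 0 is a safe start
    static foreach (T; U)
    {
        m = m > T.alignof ? m : T.alignof;
    }
    return m;
}

// Variant 2: the same helper with `static` removed.
size_t maxAlignTuple(U...)()
{
    size_t m = 0;
    foreach (T; U)
    {
        m = m > T.alignof ? m : T.alignof;
    }
    return m;
}

// Assigning to an enum forces compile-time evaluation of each helper.
enum viaStatic = maxAlignStatic!(byte, int, long)();
enum viaTuple = maxAlignTuple!(byte, int, long)();

static assert(viaStatic == long.alignof);
static assert(viaTuple == long.alignof);

void main() {}
```

Either helper is instantiated once per distinct type list, so neither shares work between overlapping lists the way the recursive template can.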

TurkeyMan · 2018-12-19T18:30:34Z

See, that's the only way I can think to write it, and I'm pretty sure it's worse than my PR.
My PR will gain from caching of leaf and near-leaf instantiations. The binary-subdivision means on long lists, the leaf instantiations are likely to be common, so you are only likely to pay a new-instantiation cost for the nodes near the root of the expansion tree, which is a logN cost. static foreach is strictly O(N) on every instantiation.

TurkeyMan · 2018-12-19T18:31:35Z

Incidentally, operator ... from C++ would make D compile a lot faster, because it would replace staticMap, and functions like this could be fully CTFE resolved.

TurkeyMan · 2018-12-19T18:57:39Z

> just remove the static from the static foreach inside the helper

I thought non-static foreach on type tuples was deprecated and replaced by static foreach exclusively now?
I've received errors insisting that I use static foreach in old code... but I don't remember the exact context.

TurkeyMan requested review from PetarKirov and dnadlinger as code owners December 10, 2018 07:21

thewilsonator reviewed Dec 10, 2018

simplify maxAlignment (because max is gigantic!)

655a818

TurkeyMan force-pushed the max_alignment branch from 2a94e34 to 655a818 Compare December 10, 2018 07:44

thewilsonator approved these changes Dec 10, 2018

edi33416 reviewed Dec 11, 2018

n8sh added the "Merge:72h no objection -> merge" label (The PR will be merged if there are no objections raised.) Dec 23, 2018

n8sh added the "Merge:auto-merge" label and removed the "Merge:72h no objection -> merge" label Dec 29, 2018

dlang-bot merged commit 93f1b31 into dlang:master Dec 29, 2018

PetarKirov changed the title from "simplify maxAlignment" to "Optimize maxAlignment compile time" Dec 30, 2018

TurkeyMan deleted the max_alignment branch January 1, 2019 22:04
