Skip to content

Optimize maxAlignment compile time#6799

Merged
dlang-bot merged 1 commit intodlang:masterfrom
TurkeyMan:max_alignment
Dec 29, 2018
Merged

Optimize maxAlignment compile time#6799
dlang-bot merged 1 commit intodlang:masterfrom
TurkeyMan:max_alignment

Conversation

@TurkeyMan
Copy link
Contributor

Because max is gigantic!

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @TurkeyMan! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#6799"

enum maxAlignment = max(staticMap!(.maxAlignment, U));
enum a = maxAlignment!(U[0 .. ($+1)/2]);
enum b = maxAlignment!(U[($+1)/2 .. $]);
enum maxAlignment = a > b ? a : b;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may well be worth unrolling the template recursion a bit too to limit the number of template instantiations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See how I did a logN expansion rather than a linear recursive expansion? I think that probably does the job, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, compared to instantiating max, this is practically nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...but I was also pondering whether I should use CTFE rather than a template recursion.
I don't know enough about the cost/benefit tradeoff on that front. At what point does CTFE become profitable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably always, @UplinkCoder ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, truth is that U.length almost always == 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case just do a static if (U.length == 2) branch ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CTFE is better in almost all cases, except maybe for a N ∈ {0 ... 5} :)

Copy link
Contributor

@edi33416 edi33416 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!
I think rewriting this using static foreach would be even better

@TurkeyMan
Copy link
Contributor Author

I don't know the relative costs... using runtime foreach structures would generate a considerably larger AST, but there's a high cost to template instantiations that I don't know how to trade against AST volume and CTFE overhead.

@wilzbach
Copy link
Contributor

I don't know the relative costs... using runtime foreach structures would generate a considerably larger AST, but there's a high cost to template instantiations that I don't know how to trade against AST volume and CTFE overhead.

static foreach is faster. Someone made tests when a few traits were rewritten to use static foreach. I can't recall the exact numbers though :/

@TurkeyMan
Copy link
Contributor Author

What about static foreach compared to CTFE foreach?

@wilzbach
Copy link
Contributor

What about static foreach compared to CTFE foreach?

No difference at all. That's why we converted a lot of CTFE foreach to static foreach (clearer intent).

@TurkeyMan
Copy link
Contributor Author

I just did a whole heap of tests. CTFE foreach substantially beat static foreach every time.
Can you produce evidence that they cost the same? It would seem that converting to static foreach might not be a good idea...

@wilzbach
Copy link
Contributor

Sure see e.g. #5729

Maybe "not measurable in a build of Phobos" would have been the better wording.

It would seem that converting to static foreach might not be a good idea...

The real perf difference is negligible and static clearly shows intent. Also if you use static foreach you don't need to create the extra function. Did your tests consider this?

@wilzbach
Copy link
Contributor

Sorry I wanted to link to #5989 (things are a bit hard to do with your phone only)

@TurkeyMan
Copy link
Contributor Author

Also if you use static foreach you don't need to create the extra function. Did your tests consider this?

Yes, I compared the sensible implementation of each style. Just that static foreach was substantially slower in all my test cases.
I'm not sure how the function is a major problem?

To measure timing differences I needed long loops, but I don't know how to measure overhead for short loops... we need a best-practice doc.

@edi33416
Copy link
Contributor

Also if you use static foreach you don't need to create the extra function. Did your tests consider this?

Yes, I compared the sensible implementation of each style. Just that static foreach was substantially slower in all my test cases.
I'm not sure how the function is a major problem?

Can you drop a gist of the tests you wrote? This seems to be a very interesting behaviour, as I would have expected that both static foreach and ctfe foreach would have (aprox) the same performance, with the observation that static foreach states the intent more clearly.

Since this seems to not be the case, I'd say it's an issue (bug?).

To measure timing differences I needed long loops, but I don't know how to measure overhead for short loops... we need a best-practice doc.

A best practice doc would be a good addition. There is also this idioms page

@wilzbach
Copy link
Contributor

To measure timing differences I needed long loops, but I don't know how to measure overhead for short loops... we need a best-practice doc.

Best-practice for Phobos is to use static foreach as there's no noticable difference in the real world and it reads better.

@TurkeyMan
Copy link
Contributor Author

TurkeyMan commented Dec 13, 2018

there's no noticable difference in the real world

Erm... phobos can be extremely slow if you're not careful. And it doesn't help that everything in phobos seems to pull in everything else in phobos for good measure.

But anyway, is this fine? I don't think a superior implementation is possible without ... expression.

@TurkeyMan
Copy link
Contributor Author

So, is this bad?

@edi33416
Copy link
Contributor

I'm still in favour of the static foreach instead of the template recursion, as foreach will be faster

@TurkeyMan
Copy link
Contributor Author

How would you write this with static foreach? You can't accumulate values from one loop iteration into the next...

@UplinkCoder
Copy link
Member

I'm still in favour of the static foreach instead of the template recursion, as foreach will be faster

It will not. template-recursion make use of caching.

It should also be noted that static foreach does the equivalent of template-recursion just without the caching support.
If you can do it CTFE foreach is the the more scalable solution!

@edi33416
Copy link
Contributor

How would you write this with static foreach? You can't accumulate values from one loop iteration into the next...

I couldn't figure out how the new suggestion feature works, so I'll just write it as a comment

I'd replace the else static if (U.length == 2) { ... } else { ... } with

else static if (U.length == 1)
    enum maxAlignment = U[0].alignof;
else
{
    static size_t maxAlignmentHelper(U...)(size_t m)
    {
        static foreach(T; U[0 .. $])
        {
            m = m > T.alignof ? m : T.alignof;
        }
        return m;
    }
    enum maxAlignment = maxAlignmentHelper!(U[1 .. $])(U[0].alignof);
}

@edi33416
Copy link
Contributor

I had to make the helper so I could use m as an "accumulator"

@edi33416
Copy link
Contributor

I had to make the helper so I could use m as an "accumulator"

I just saw @UplinkCoder 's comment, as it was posted while I was typing the code suggestion.
To make it CTFE foreach, just remove the static from the static foreach inside the helper.

@TurkeyMan
Copy link
Contributor Author

See, that's the only way I can think to write it, and I'm pretty sure it's worse than my PR.
My PR will gain from caching of leaf and near-leaf instantiations. The binary-subdivision means on long lists, the leaf instantiations are likely to be common, so you are only likely to pay a new-instantiation cost for the nodes near the root of the expansion tree, which is a logN cost. static foreach is strictly O(N) on every instantiation.

@TurkeyMan
Copy link
Contributor Author

Incidentally, operator ... from C++ would make D compile a lot faster, because it would replace staticMap, and functions like this could be fully CTFE resolved.

@TurkeyMan
Copy link
Contributor Author

just remove the static from the static foreach inside the helper

I thought non-static foreach on type tuples was deprecated and replaced by static foreach exclusively now?
I've received errors insisting that I use static foreach in old code... but I don't remember the exact context.

@n8sh n8sh added the Merge:72h no objection -> merge The PR will be merged if there are no objections raised. label Dec 23, 2018
@n8sh n8sh added Merge:auto-merge and removed Merge:72h no objection -> merge The PR will be merged if there are no objections raised. labels Dec 29, 2018
@dlang-bot dlang-bot merged commit 93f1b31 into dlang:master Dec 29, 2018
@PetarKirov PetarKirov changed the title simplify maxAlignment Optimize maxAlignment compile time Dec 30, 2018
@TurkeyMan TurkeyMan deleted the max_alignment branch January 1, 2019 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants