Optimize maxAlignment compile time #6799
Thanks for your pull request and interest in making D better, @TurkeyMan! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub fetch digger
dub run digger -- build "master + phobos#6799"
-    enum maxAlignment = max(staticMap!(.maxAlignment, U));
+    enum a = maxAlignment!(U[0 .. ($+1)/2]);
+    enum b = maxAlignment!(U[($+1)/2 .. $]);
+    enum maxAlignment = a > b ? a : b;
It may well be worth unrolling the template recursion a bit too to limit the number of template instantiations.
See how I did a logN expansion rather than a linear recursive expansion? I think that probably does the job, no?
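For readers following along, here is a minimal sketch of what the divide-and-conquer expansion might look like as a complete template. The name `maxAlignOf` and the base cases are assumptions made for illustration; only the three `enum` lines in the final branch come from the diff above. Each level halves the type list, so the recursion depth grows with log2 of `U.length` rather than linearly.

```d
// Hypothetical stand-alone version; `maxAlignOf` is an illustrative name,
// not the identifier used in the PR.
template maxAlignOf(U...)
{
    static if (U.length == 0)
        static assert(0, "maxAlignOf needs at least one type");
    else static if (U.length == 1)
        enum maxAlignOf = U[0].alignof;   // base case: a single type
    else
    {
        // Split the list in half and take the larger of the two results.
        enum a = maxAlignOf!(U[0 .. ($ + 1) / 2]);
        enum b = maxAlignOf!(U[($ + 1) / 2 .. $]);
        enum maxAlignOf = a > b ? a : b;
    }
}

// Compile-time checks against the built-in .alignof property.
static assert(maxAlignOf!(ubyte) == ubyte.alignof);
static assert(maxAlignOf!(byte, int, short) == int.alignof);
```

The comparison is written with `?:` rather than a library call, so the sketch needs no imports.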
Also, compared to instantiating max, this is practically nothing.
...but I was also pondering whether I should use CTFE rather than a template recursion.
I don't know enough about the cost/benefit tradeoff on that front. At what point does CTFE become profitable?
Also, truth is that U.length almost always == 2
In which case just do a static if (U.length == 2) branch ;)
CTFE is better in almost all cases, except maybe for N ∈ {0 ... 5} :)
2a94e34 to 655a818
edi33416 left a comment
Nice job!
I think rewriting this using static foreach would be even better
I don't know the relative costs... using runtime foreach structures would generate a considerably larger AST, but there's a high cost to template instantiations that I don't know how to trade against AST volume and CTFE overhead.
What about |
No difference at all. That's why we converted a lot of CTFE foreach to |
I just did a whole heap of tests. CTFE |
Sure, see e.g. #5729. Maybe "not measurable in a build of Phobos" would have been the better wording.
The real perf difference is negligible and
Sorry, I wanted to link to #5989 (things are a bit hard to do with only your phone).
Yes, I compared the sensible implementation of each style. Just that To measure timing differences I needed long loops, but I don't know how to measure overhead for short loops... we need a best-practice doc. |
Can you drop a gist of the tests you wrote? This seems to be a very interesting behaviour, as I would have expected that both Since this seems to not be the case, I'd say it's an issue (bug?).
A best-practice doc would be a good addition. There is also this idioms page
Best-practice for Phobos is to use |
Erm... phobos can be extremely slow if you're not careful. And it doesn't help that everything in phobos seems to pull in everything else in phobos for good measure. But anyway, is this fine? I don't think a superior implementation is possible without |
So, is this bad?
I'm still in favour of the |
How would you write this with static foreach? You can't accumulate values from one loop iteration into the next... |
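One possible shape, offered only as a sketch: `static foreach` can appear inside a function body, where each unrolled iteration updates an ordinary local variable, and the function is then evaluated through CTFE. The helper name `maxAlignOfLoop` is hypothetical and is not the code posted in this thread; whether this compiles faster than the template recursion is exactly the open question being discussed.

```d
// Hypothetical helper: accumulates into a local inside a CTFE-evaluated function.
size_t maxAlignOfLoop(U...)()
{
    size_t m = 0;
    static foreach (T; U)
    {
        // Each type in U produces one copy of this statement.
        if (T.alignof > m)
            m = T.alignof;
    }
    return m;
}

// Assigning to an enum forces compile-time evaluation.
enum widest = maxAlignOfLoop!(byte, long, short);
static assert(widest == long.alignof);
```

The accumulation lives in the function's local state, not in the loop itself, which is why a helper function is needed at all.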
It will not. Template recursion makes use of caching. It should also be noted that
I couldn't figure out how the new I'd replace the |
I had to make the helper so I could use |
I just saw @UplinkCoder's comment, as it was posted while I was typing the code suggestion.
See, that's the only way I can think to write it, and I'm pretty sure it's worse than my PR.
Incidentally, operator |
I thought non- |
Because max is gigantic!