Thanks for your pull request, @MartinNowak! Bugzilla references
Jeez, got broken by dlang/dmd#7019. |
src/core/internal/arrayop.d
Outdated
| else version (X86_64) | ||
| version = X86_OR_X64; | ||
| else | ||
| static assert(0, "unimplemented"); |
src/core/internal/arrayop.d
Outdated
| { | ||
| storeUnaligned!vec(val, p); | ||
| } | ||
| else version (GNU) |
There was a problem hiding this comment.
You have version (GNU) inside DigitalMars
src/core/internal/arrayop.d
Outdated
| else static if (is(T == double)) | ||
| return __builtin_ia32_loadupd(p); | ||
| else | ||
| return __builtin_ia32_loaddqu(cast(const char*) p); |
There was a problem hiding this comment.
Oh, just a leftover. I removed the GNU/LDC stuff later on; when I first tested this a few months ago, auto-vectorization didn't work too well.
There was a problem hiding this comment.
Yeah, there's a very, very limited set of conditions that allow it to generate small and optimal code. In the best case for parameters, you might at least have a branch generated that is run if all conditions are met.
Otherwise I guess we could generate a generic simd load using:
auto v1 = *cast(float4*) a1.ptr;
Loop as necessary, then do single element operations after that.
There was a problem hiding this comment.
*cast(float4*) a1.ptr
That would assume alignment to float4.alignof.
There was a problem hiding this comment.
That would assume alignment to float4.alignof.
Right. I should have remembered the segfault in dmd I recently encountered. Because that is precisely what would happen. ;-)
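To make the alignment concern concrete, here is a minimal sketch, not code from this PR. A `float[]` slice is only guaranteed `float.alignof`, so `a1.ptr` may sit 4, 8, or 12 bytes past a 16-byte boundary. The helpers `isAligned16` and `scalarPrefix` are hypothetical names, and the 16-byte figure assumes an SSE-style `float4`. The diff itself appears to take the other route and use unaligned loads/stores (`storeUnaligned`).

```d
/// Hypothetical helper: true if `p` could be dereferenced as an aligned
/// 16-byte vector (assumes float4.alignof == 16).
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p & 15) == 0;
}

/// Hypothetical helper: number of leading floats that would have to be
/// handled by scalar code before `p` reaches a 16-byte boundary.
size_t scalarPrefix(const(float)* p)
{
    immutable misalign = cast(size_t) p & 15;
    return ((16 - misalign) & 15) / float.sizeof;
}

void main()
{
    float[8] buf = 0;
    int aligned = 0;
    foreach (i; 0 .. 4)
    {
        // Floats are 4-byte aligned, so exactly one of four consecutive
        // element addresses falls on a 16-byte boundary.
        if (isAligned16(buf.ptr + i))
            ++aligned;
        // Skipping the scalar prefix always lands on an aligned address.
        assert(isAligned16(buf.ptr + i + scalarPrefix(buf.ptr + i)));
    }
    assert(aligned == 1);
}
```

Casting `buf[1 .. $].ptr` to `float4*` would therefore be misaligned on at least three of every four starting offsets.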
| } | ||
| } | ||
| for (; pos < res.length; ++pos) | ||
| mixin(scalarExp!Args ~ ";"); |
There was a problem hiding this comment.
Does this generate something comparable to what the compiler currently generates for fd->isArrayOp functions?
There was a problem hiding this comment.
I checked, and the loop is identical. https://explore.dgnu.org/g/sNV7Bo 👍
(On an unrelated note, I should make gdc smarter with its template emission strategy)
There was a problem hiding this comment.
Yes, we should figure out which template instances are needed at runtime. This affects all compilers and could likely be done in the frontend.
There was a problem hiding this comment.
Two really important pieces of information that could be most beneficial are:
- Was this instantiated only by CTFE? We never need these.
- Was this instantiated inside a function or module/class scope? With instantiations from a function it should be fine to discard inlined and unreferenced functions in this current compilation.
|
A bit frustrating how much effort was only spent because of dmd's backend. |
| enum bool hasElaborateCopyConstructor = false; | ||
| } | ||
|
|
||
| template Filter(alias pred, TList...) |
There was a problem hiding this comment.
Please add comment saying what Filter does.
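For readers wondering the same thing, here is a rough sketch of the behaviour, assuming the druntime `Filter` mirrors Phobos' `std.meta.Filter`: it keeps only those elements of a compile-time sequence for which the predicate is true. The `isType` predicate and the argument list below are illustrative stand-ins, not copied from the PR.

```d
import std.meta : AliasSeq, Filter, templateNot;

// Illustrative predicate: true when the single argument is a type.
enum isType(T...) = T.length == 1 && is(T[0]);

// Illustrative argument list mixing operand types and operator strings.
alias Args = AliasSeq!(float[], float[], "+", "=");

// Keep only the types ...
alias Types = Filter!(isType, Args);
static assert(is(Types == AliasSeq!(float[], float[])));

// ... or only the non-type (string) arguments.
enum ops = [Filter!(templateNot!isType, Args)];
static assert(ops == ["+", "="]);

void main()
{
}
```

This matches how the diff uses it elsewhere: `Filter!(isType, Args)` for operand types and `[Filter!(not!isType, Args)]` for the operator strings.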
| assert(__cmp([c2, c2], [c1, c1]) > 0); | ||
| } | ||
|
|
||
| template _arrayOp(Args...) |
There was a problem hiding this comment.
The purpose of this is mysterious. Is it just to forward to core.internal.arrayop.arrayOp? If so, why not just use an alias? Please add a documentation comment.
There was a problem hiding this comment.
It's a template so we don't unnecessarily import internal modules.
|
@ibuclaw is your review satisfied? |
Yeah, but your improvements made it worth the effort! |
Yes, I tested a copy of and it works on all gdc targets. Just the comment on the extra template bloat of entirely unreferenced functions. But that is a compiler concern and not a blocker for this. |
Because @ibuclaw said his review was satisfied.
|
Done @WalterBright
Hardly, neither GDC nor LDC needs any of this. The vector code in this PR (which required the various dmd backend fixes for SIMD) basically just does the same as auto-vectorization in GDC/LDC. At least we can stop maintaining a huge amount of hand-written assembly code.
c729481 to 9d04170
|
Anything left @WalterBright, @ibuclaw? |
|
I have no problems with this. |
|
Do we have an open issue about template instantiation in general? |
It's not even clear whether it's an actual problem (and how big it is). |
|
|
Indeed. But I think we (possibly meaning @klickverbot and I, not dmd) could probably get away with just adding one new field that represents the least restrictive scope that a template was instantiated in. If an instantiation is only ever used inside a function, then I could allow my backend to discard inlined and/or unreferenced templates. However, if it were instantiated inside a top-level type or module scope, then everything would need to be emitted. Maybe I'm only thinking of contrived / simple examples though.
|
@WalterBright - please review. :-) |
nemanja-boric-sociomantic
left a comment
There was a problem hiding this comment.
small comment
src/core/internal/arrayop.d
Outdated
| enum vectorizeable = vectorizeableOps!E([Filter!(not!isType, Args)]) | ||
| && compatibleVecTypes!(E, Filter!(isType, Args)); | ||
| else | ||
| enum vectorizeable = false; |
There was a problem hiding this comment.
indentation
There was a problem hiding this comment.
It's a bug in dlang-community/dfmt#286, fixed manually for now.
- use RPN to encode operand precedence - fixes Issue 15619, and 16680
- properly sort/order values on abscissa
- support for target-specific vector ops (e.g. AVX vs. SSE2)
- dmd got broadcast init with #6248
- seems to have made quite some improvements while that module was written - generated code for scalar loops and for vector loops ends up being almost identical, so it seems more reasonable to leave decisions completely to the auto-vectorizers.
- e.g. replacement of ary[] / scalar with weaker ary[] >> 1
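As a rough illustration of the first bullet, the sketch below shows why RPN can encode precedence without parentheses: operands are pushed onto a stack and each operator consumes the top two entries. The function `rpnToInfix` and its token format are hypothetical. The PR's actual encoding and its `scalarExp` helper are not shown in this conversation and may differ.

```d
/// Hypothetical illustration: rebuild a parenthesized infix string from RPN
/// tokens. Any token that is not one of the listed operators is an operand.
string rpnToInfix(const string[] tokens)
{
    string[] stack;
    foreach (tok; tokens)
    {
        switch (tok)
        {
        case "+", "-", "*", "/":
            assert(stack.length >= 2, "binary operator needs two operands");
            immutable rhs = stack[$ - 1];
            immutable lhs = stack[$ - 2];
            // Replace the two operands with the combined sub-expression.
            stack = stack[0 .. $ - 2] ~ ("(" ~ lhs ~ " " ~ tok ~ " " ~ rhs ~ ")");
            break;
        default:
            stack ~= tok;
            break;
        }
    }
    assert(stack.length == 1, "malformed RPN");
    return stack[0];
}

// Evaluated at compile time, the way a mixin-string generator would be.
static assert(rpnToInfix(["a", "b", "c", "*", "+"]) == "(a + (b * c))");
static assert(rpnToInfix(["a", "b", "+", "c", "*"]) == "((a + b) * c)");

void main()
{
}
```

The two token orders yield different groupings, so the order alone carries the precedence information.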
| return op.length == 2 && op[1] == '=' && isBinaryOp(op[0 .. 1]); | ||
| } | ||
|
|
||
| string scalarExp(Args...)() |
There was a problem hiding this comment.
Desperately needs documentation - for example, what is the format of the RPN string? What are the Args? Where does the RPN string come from?
|
|
||
| // Generate mixin expression to perform scalar arrayOp loop expression, assumes | ||
| // `pos` to be the current slice index, `args` to contain operand values, and | ||
| // `res` the target slice. |
There was a problem hiding this comment.
I don't see pos, args, or res in the parameter list or even in the function body. Also, when documenting parameters, please use Ddoc conventions, i.e. a Params: block.
There was a problem hiding this comment.
It's just a small helper function that generates a mixin string for the only public method in this module _arrayOps.
and the floatArray[] / scalar change mentioned in the changelog
[Plots: latency, throughput]