IdModel: Step 5 of the loop promotion analysis #2220
   bool build_graphs,
-  bool allow_self_mapping) {
+  bool allow_self_mapping)
+  : allow_self_mapping_(allow_self_mapping) {
I think the initialization of `allow_self_mapping_` was just accidentally forgotten.
s5_loop_graph = idGraph(IdMappingMode::LOOP);
s5_loop_promotion_map =
    updateValGroupIdMap(s5_loop_promotion_map, s5_loop_graph);
This whole function is mostly just a copy of IdModel::buildLoopPromotionMap, modified to save intermediate results for validation. I'll clean up this part of the code after this PR.
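The `updateValGroupIdMap` call in the snippet above presumably re-keys the promotion map after the LOOP graph is rebuilt. As a rough illustration only, here is a self-contained C++ sketch of that idea, assuming (this is not confirmed by the PR) that the new graph only merges groups of the old graph, so every old group lands in exactly one new group. `Group`, `Graph`, and `rekeyGroupMap` are hypothetical names, not nvFuser APIs.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical model: a group is a set of iter domain ids, and a graph is a
// list of disjoint groups.
using Group = std::set<int>;
using Graph = std::vector<Group>;

// Re-keys a map from groups of an old graph to groups of a new graph.
// Assumption: each old group is contained in exactly one new group.
// If two old groups merge into the same new group, the entry encountered
// first (in std::map order) is kept; this sketch does not resolve conflicts.
std::map<Group, int> rekeyGroupMap(const std::map<Group, int>& old_map,
                                   const Graph& new_graph) {
  std::map<Group, int> new_map;
  for (const auto& [old_group, promoted_id] : old_map) {
    if (old_group.empty()) {
      continue;  // Nothing to look up for an empty group.
    }
    // Any member works as a representative, since groups are disjoint.
    const int representative = *old_group.begin();
    for (const Group& new_group : new_graph) {
      if (new_group.count(representative) != 0) {
        new_map.emplace(new_group, promoted_id);
        break;
      }
    }
  }
  return new_map;
}
```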
Should this be added back?
Thanks. Added back.
// LOOP mode is important to resolve inlined broadcasts. Consider something
// like consumer[i0o, threadIdx.x{i0i}] = producer[i0o,
// threadIdx.y{i0i}](computeAt = 1), which can easily happen when using shared
// memory. Loop mapping is actually defined for all iteration domains, and
// represents groups of iter domains that are effectively inlined with each
// other. Therefore, iter domains that are a common dependency of inlined leaf
// domains may be loop mapped together.
I would be very interested in seeing when the loop promotion we have is strictly required, and when it is just one way to do things. For example, consider the following fusion:
T0[1, 4]
T1[3, 4] = T0[1, 4]
T1->reorder({{0, 1}});
T1->merge(0);
T1->split(0, 2, /*inner_split=*/false);
propagate;  // propagate T1's transformations to T0
T0->inlineAt(1);
Then, for this specific case, loop promotion is not necessary. The leaf domain of T1 is [2, 6], and leaf index (i, j) of T1 corresponds to ((i*6+j)%3, (i*6+j)/3) in T1's root domain. For T0, whose leaf domain is [2, 2], leaf index (i, k) corresponds, without loop promotion, to (0, i*2+k) in T0's root domain. According to Theorem 2.15.1 in https://github.com/NVIDIA/Fuser/blob/main/doc/math/integer-division.md, (i*6+j)/3 = i*2+j/3, which has a very similar mathematical form to i*2+k. Since j ranges over [0, 6), j/3 ranges over [0, 2), which is exactly the range of k. This similarity tells us that each i accesses the same slice of T0 and T1; therefore, the program is valid even without loop promotion.
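The index arithmetic above can be checked by brute force. This C++ sketch hard-codes the leaf extents implied by the example (T1 leaf [2, 6], T0 leaf [2, 2]); the helper names are hypothetical and nothing here calls nvFuser. It verifies the division identity, that T1's leaf domain covers each root element exactly once, and that for each value of the inlined axis i, T0 and T1 touch the same columns of the size-4 root axis.

```cpp
#include <set>
#include <utility>

// Leaf extents implied by the example (assumed from the transformations):
// T1: [3, 4] -> reorder -> [4, 3] -> merge -> [12] -> outer split by 2 -> [2, 6]
// T0: [1, 4] -> reorder -> [4, 1] -> merge -> [4]  -> outer split by 2 -> [2, 2]
constexpr int kOuter = 2;    // extent of the inlined leaf axis i
constexpr int kT1Inner = 6;  // extent of j in T1's leaf domain
constexpr int kT0Inner = 2;  // extent of k in T0's leaf domain

// Root index of T1 for leaf index (i, j): ((i*6+j)%3, (i*6+j)/3).
int t1RootRow(int i, int j) { return (i * kT1Inner + j) % 3; }
int t1RootCol(int i, int j) { return (i * kT1Inner + j) / 3; }

// Root column of T0 for leaf index (i, k), without loop promotion.
int t0RootCol(int i, int k) { return i * kT0Inner + k; }

// Checks (i*6+j)/3 == i*2 + j/3 for every leaf index of T1.
bool divisionIdentityHolds() {
  for (int i = 0; i < kOuter; ++i)
    for (int j = 0; j < kT1Inner; ++j)
      if (t1RootCol(i, j) != i * 2 + j / 3) return false;
  return true;
}

// Checks that T1's leaf domain visits all 3 x 4 root elements exactly once.
bool t1CoversRootOnce() {
  std::set<std::pair<int, int>> visited;
  for (int i = 0; i < kOuter; ++i)
    for (int j = 0; j < kT1Inner; ++j)
      if (!visited.insert({t1RootRow(i, j), t1RootCol(i, j)}).second)
        return false;
  return visited.size() == 12;
}

// Checks that, for each i, T0 and T1 access the same set of root columns.
bool sameSlicePerOuterIndex() {
  for (int i = 0; i < kOuter; ++i) {
    std::set<int> t1_cols;
    std::set<int> t0_cols;
    for (int j = 0; j < kT1Inner; ++j) t1_cols.insert(t1RootCol(i, j));
    for (int k = 0; k < kT0Inner; ++k) t0_cols.insert(t0RootCol(i, k));
    if (t1_cols != t0_cols) return false;
  }
  return true;
}
```

For i = 0 both tensors touch columns {0, 1}, and for i = 1 both touch {2, 3}, which is the "same slice" property the argument relies on.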
That's very interesting. 🤯
No logic change. Mostly mechanical cleanup. Replaced the test-specific IdModel subclass with a callback interface. The callback interface allows saving all the temporary results needed for validation, so there is no more duplication of `buildLoopPromotionMap`. (Related comment: #2220 (comment)) To introduce the callback interface, moved the loop promotion part out of `IdModel` into its own builder class.
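For readers unfamiliar with the pattern, here is a minimal sketch of a builder with an optional callback interface. All names (`PromotionMap`, `LoopPromotionCallback`, `LoopPromotionBuilder`, `RecordingCallback`) and the placeholder step logic are hypothetical, not the classes added in this PR; the point is only that a test can observe intermediate results through hooks instead of subclassing and duplicating the build function.

```cpp
#include <map>

// Hypothetical stand-in for a promotion map: loop group id -> promoted id.
using PromotionMap = std::map<int, int>;

// Hypothetical callback interface. The default hook does nothing, so the
// production path is unaffected when no callback is given.
class LoopPromotionCallback {
 public:
  virtual ~LoopPromotionCallback() = default;
  virtual void postStep(int /*step*/, const PromotionMap& /*map*/) {}
};

// Hypothetical builder that runs the analysis steps in order and notifies
// the callback (if any) after each step.
class LoopPromotionBuilder {
 public:
  explicit LoopPromotionBuilder(LoopPromotionCallback* callback = nullptr)
      : callback_(callback) {}

  PromotionMap build() {
    PromotionMap map;
    for (int step = 1; step <= 5; ++step) {
      runStep(step, map);
      if (callback_ != nullptr) {
        callback_->postStep(step, map);
      }
    }
    return map;
  }

 private:
  // Placeholder for the real analysis: each step just adds one entry.
  void runStep(int step, PromotionMap& map) { map[step] = step * 10; }

  LoopPromotionCallback* callback_;
};

// Test-only callback that records a snapshot of the map after every step.
class RecordingCallback : public LoopPromotionCallback {
 public:
  void postStep(int step, const PromotionMap& map) override {
    snapshots[step] = map;
  }
  std::map<int, PromotionMap> snapshots;
};
```

Passing no callback gives the production path; a test passes something like `RecordingCallback` and inspects `snapshots` after `build()`.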
This is the final step of the loop promotion analysis. The promotion map is almost complete after Step 3, but some partially inlined domains need one more propagation, which is done by Steps 4 and 5. Step 5 is mostly just a repeat of Step 3.
This basically concludes the loop promotion analysis, although a couple of issues were found while working on indexing (#2218). Those issues will be addressed in follow-up PRs.