!build
zasdfgbnm
left a comment
Finished reading the major part of this PR. I have not read any tests yet. I don't see any blockers so far. I remember you mentioned that after this step, the promotion of uninlined IDs is still not correct. Could you remind me why this is the case? For example, if my fusion is
      b                   I1
     / \       -->       /  \
  128   1/128         128    I1/128
Then after step 1, I will have b->I1, but won't step 2 just give me 128->128 and 1/128 -> I1/128, which is already correct?
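To make the question concrete, here is a minimal toy sketch of the two steps as described above. It is not nvFuser code: `Split`, `propagate`, and the string naming of domains are all hypothetical, and the real analysis works on ID graphs rather than on names. Step 1 is modeled as a given map `b -> I1`, and step 2 as replaying the split on the promoted input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a split expression on a single domain."""
    inp: str
    factor: int

    def outputs(self, inp_name: str) -> tuple:
        # Name the outer and inner outputs after the input they come from.
        return (f"outer({inp_name},{self.factor})",
                f"inner({inp_name},{self.factor})")


def propagate(split: Split, promotion: dict) -> dict:
    """Toy step 2: replay the split on the promoted input, if there is one,
    and map each original output to the corresponding replayed output."""
    result = dict(promotion)
    if split.inp not in promotion:
        return result  # input is not promoted, nothing to propagate
    originals = split.outputs(split.inp)
    replayed = split.outputs(promotion[split.inp])
    for original, promoted in zip(originals, replayed):
        result[original] = promoted
    return result


# Step 1 (assumed given): the broadcast domain b is promoted to I1.
step1 = {"b": "I1"}
# Step 2: the outputs of split(b, 128) follow the promotion of b.
step2 = propagate(Split("b", 128), step1)
print(step2)
```

In this toy model, `inner(b,128)` plays the role of `1/128` in the diagram and `inner(I1,128)` the role of `I1/128`.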
zasdfgbnm
left a comment
Finished reading tests. Looks good to me. No blocking issues in this PR. But I have another question:
Consider that I have another fusion:
I0   b         b   I1             I0   I2        I3   I1
|     \       /    |              |     \       /     |
|         b        |              |       I2*I3       |
|        / \       |     ---->    |        / \        |
|  b{128}   b{1/128} |            |   128   I2*I3/128 |
|   /          \   |              |   /          \    |
I0*128    (1/128) * I1            I0*128    (I2*I3/128) * I1
IIUC, we will promote (1/128) * I1 -> (I2*I3/128) * I1. But for this case, does it make sense to just not promote (1/128) * I1?
Does this answer your question?
This is a very good point. I assume the CA position is 1. You're right that the inner domain of the broadcast tensor would then be promoted, but that shouldn't be necessary. I think promotion is necessary only if a broadcast domain is merged with a non-broadcast domain. If it's not merged, or is merged only with another broadcast domain, it remains a broadcast domain, so there is nothing to index. Even when it's merged with a non-broadcast domain, if the merge is outside of the loop-mapped domains, as in the case above, promotion doesn't seem to be required. For example: And assume they are inlined at position 1.
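Separately from the example referred to above, the rule of thumb in this comment can be written down as a small predicate. This is only a sketch of the condition as stated here, not how nvFuser decides it: `Dom`, `merge`, and `merge_needs_promotion` are hypothetical names, and `loop_mapped` is an assumed flag for whether the merge lies within the loop-mapped (inlined) domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dom:
    """Hypothetical iteration domain: a name plus a broadcast flag."""
    name: str
    is_broadcast: bool


def merge(a: Dom, b: Dom) -> Dom:
    # A merged domain stays broadcast only if both inputs are broadcast.
    return Dom(f"{a.name}*{b.name}", a.is_broadcast and b.is_broadcast)


def merge_needs_promotion(a: Dom, b: Dom, loop_mapped: bool) -> bool:
    """Rule of thumb from the comment: promotion matters only when a
    broadcast domain is merged with a non-broadcast domain, and only if
    that merge lies within the loop-mapped domains."""
    mixes_broadcast = a.is_broadcast != b.is_broadcast
    return loop_mapped and mixes_broadcast


b0 = Dom("b0", True)
b1 = Dom("b1", True)
i0 = Dom("I0", False)

print(merge(b0, b1).is_broadcast)            # broadcast merged with broadcast
print(merge_needs_promotion(b0, b1, True))   # nothing to index
print(merge_needs_promotion(i0, b0, True))   # mixed merge, loop mapped
print(merge_needs_promotion(i0, b0, False))  # mixed merge, outside loop-mapped domains
```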
I need to read the step 3 PR to fully understand this piece of doc, but yes, I think this answers exactly what I asked. Another question: does this only happen when there are new broadcasts in the fusion? If all the broadcasts are already in the fusion inputs, will the result of step 2 already be correct? Suppose that one day we decide to fix #1759, and that we decide to kill forwarding by inserting dummy broadcast IDs all the way to the inputs. As a side effect, could we then just remove steps 3-5?
This I'm not sure about yet.
Maybe? Honestly I'm not sure. To be more precise, step 3 will still be necessary, as it just projects back to loop groups. Steps 4 and 5 repeat steps 2 and 3, and they may not be necessary.
Step 3 of the loop promotion analysis. Stacked on top of #1777. After this, there'll be PRs for steps 4 and 5. They would be relatively smaller PRs as they are mostly just repeating steps 2 and 3.
This is the final step of the loop promotion analysis. The promotion map is almost completed at Step 3, but some partially inlined domains need one more propagation, which is done by Step 4 and Step 5. Step 5 is mostly just a repeat of Step 3. This basically concludes the loop promotion analysis, although there are a couple of issues that were found while working on indexing (#2218). Those issues will be addressed as further follow-up PRs.
- Step 1: #1650
- Step 2: #1777
- Step 3: #1830
- Step 4: #2003
This is Step 2 of the loop promotion analysis. The main routines are:
- propagatePromotionsInIELGraph
- addReplayAs