Skip to content

IdModel: Loop promotion analysis step 2#1777

Merged
naoyam merged 15 commits intomainfrom
idmodel_step2
Mar 8, 2024
Merged

IdModel: Loop promotion analysis step 2#1777
naoyam merged 15 commits intomainfrom
idmodel_step2

Conversation

@naoyam
Copy link
Collaborator

@naoyam naoyam commented Feb 16, 2024

This is Step 2 of the loop promotion analysis. The main routines are:

@naoyam
Copy link
Collaborator Author

naoyam commented Feb 16, 2024

!build

@naoyam naoyam added the idmodel label Feb 16, 2024
@naoyam
Copy link
Collaborator Author

naoyam commented Feb 23, 2024

!build

@naoyam naoyam marked this pull request as ready for review February 23, 2024 06:17
@naoyam naoyam changed the title WIP: Loop promotion analysis step 2 IdModel: Loop promotion analysis step 2 Feb 23, 2024
Copy link
Collaborator

@zasdfgbnm zasdfgbnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished reading the major part of this PR. Have not read any test yet. Don't see any blocker so far. I remembered you mentioned that after this step, the uninlined ID's promotion is still not correct. Could you remind me why this is the case? For example, if my fusion is

    b                I1
  /   \      -->    /  \
128 | 1/128       128 I1/128

Then after step 1, I will have b->I1, but won't step 2 just give me 128->128 and 1/128 -> I1/128, which is already correct?

Copy link
Collaborator

@zasdfgbnm zasdfgbnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished reading tests. Looks good to me. No blocking issues in this PR. But I have another question:

Consider that I have another fusion:

I0 b       b      I1             I0 I2     I3      I1
|   \     /       |              |   \     /       |
|      b          |              |    I2*I3        |
|   /     \       |     ---->    |   /     \       |
| b{128} b{1/128} |              | 128  I2*I3/128  |
| /         \     |              | /         \     |
I0*128   (1/128) * I1            I0*128 (I2*I3/128) * I1

IIUC, we will promote (1/128) * I1 -> (I2*I3/128) * I1. But for this case, does it make sense to just not promote (1/128) * I1?

@naoyam
Copy link
Collaborator Author

naoyam commented Mar 7, 2024

Finished reading the major part of this PR. Have not read any test yet. Don't see any blocker so far. I remembered you mentioned that after this step, the uninlined ID's promotion is still not correct. Could you remind me why this is the case? For example, if my fusion is

    b                I1
  /   \      -->    /  \
128 | 1/128       128 I1/128

Then after step 1, I will have b->I1, but won't step 2 just give me 128->128 and 1/128 -> I1/128, which is already correct?

Does this answer your question?

https://docs.google.com/document/d/1y1ihF9R2kLfoMTJrqVvPVCsJCeMCNchz8WcSrqHAR_Q/edit#bookmark=id.jd2i8budl1do

@naoyam
Copy link
Collaborator Author

naoyam commented Mar 7, 2024

Finished reading tests. Looks good to me. No blocking issues in this PR. But I have another question:

Consider that I have another fusion:

I0 b       b      I1             I0 I2     I3      I1
|   \     /       |              |   \     /       |
|      b          |              |    I2*I3        |
|   /     \       |     ---->    |   /     \       |
| b{128} b{1/128} |              | 128  I2*I3/128  |
| /         \     |              | /         \     |
I0*128   (1/128) * I1            I0*128 (I2*I3/128) * I1

IIUC, we will promote (1/128) * I1 -> (I2*I3/128) * I1. But for this case, does it make sense to just not promote (1/128) * I1?

This is a very good point. I assume the CA position is 1. You're right that then the inner domain of the broadcast tensor would be promoted, but that shouldn't be necessary.

I think that the promotion is necessary only if a broadcast domain is merged with a non-broadcast domain. If it's not merged or just merged with another broadcast domain, it would remain to be a broadcast domain, so nothing to index. Even when it's merged with a non-broadcast domain but if the merge is outside of loop mapped domains, like the case above, that doesn't seem to require promotion.

For example:

   b0
 /    \
b1   b2

   i3
 /    \
i4   i5

And assume they are inlined at position 1. b2 will be promoted to i5, but that isn't necessary. In this case since b2 is still a broadcast domain, it'd be easy to ignore the promotion. We could just ignore promotion of broadcast domains. That's not the case with the example above as the promoted domain is no longer a broadcast domain, so we should probably need to do something more. I think we can avoid the promotion of such domains. I'll create an issue.

@naoyam
Copy link
Collaborator Author

naoyam commented Mar 7, 2024

!build

1 similar comment
@naoyam
Copy link
Collaborator Author

naoyam commented Mar 7, 2024

!build

@zasdfgbnm
Copy link
Collaborator

Does this answer your question?

https://docs.google.com/document/d/1y1ihF9R2kLfoMTJrqVvPVCsJCeMCNchz8WcSrqHAR_Q/edit#bookmark=id.jd2i8budl1do

I need to read the step 3 PR to fully understand this piece of doc, but yes, I think this answers exactly what I asked.

Another question: does this only happen when there are new broadcasts in the fusion? If all the broadcasts are already in fusion input, will the result of step 2 be just correct?

I am wondering, assume that one day we decide to fix #1759, and assume that we decide to kill forwarding by inserting dummy broadcasting IDs all the way to inputs. Then as a side thing, will we just remove step 3-5?

@naoyam
Copy link
Collaborator Author

naoyam commented Mar 7, 2024

Does this answer your question?
https://docs.google.com/document/d/1y1ihF9R2kLfoMTJrqVvPVCsJCeMCNchz8WcSrqHAR_Q/edit#bookmark=id.jd2i8budl1do

I need to read the step 3 PR to fully understand this piece of doc, but yes, I think this answers exactly what I asked.

Another question: does this only happen when there are new broadcasts in the fusion? If all the broadcasts are already in fusion input, will the result of step 2 be just correct?

This is I'm not sure yet.

I am wondering, assume that one day we decide to fix #1759, and assume that we decide to kill forwarding by inserting dummy broadcasting IDs all the way to inputs. Then as a side thing, will we just remove step 3-5?

Maybe? Honestly I'm not sure. To be more precise, step 3 will still be necessary, which just projects back to loop groups. Steps 4 and 5 repeat steps 2 and 3, and they may not be necessary.

@naoyam
Copy link
Collaborator Author

naoyam commented Mar 7, 2024

!build

@naoyam naoyam merged commit 3516c7a into main Mar 8, 2024
@naoyam naoyam deleted the idmodel_step2 branch March 8, 2024 00:14
naoyam added a commit that referenced this pull request Mar 13, 2024
Step 3 of the loop promotion analysis. 

Stacked on top of #1777.

After this, there'll be PRs for steps 4 and 5. They would be relatively
smaller PRs as they are mostly just repeating steps 2 and 3.
naoyam added a commit that referenced this pull request May 10, 2024
This is the final step of the loop promotion analysis. The promotion map
is almost completed at Step 3, but some partially inlined domains need
one more propagation, which is done by Step 4 and Step 5. Step 5 is
mostly just a repeat of Step 3.

This basically concludes the loop promotion analysis, although there are
a couple of issues that were found while working on indexing (#2218).
Those issues will be addressed as further follow-up PRs.

- Step 1: #1650 
- Step 2: #1777 
- Step 3: #1830 
- Step 4: #2003
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants