Conversation
Local test failures are encountered here: I suspect most to be test-related failures; certain tests are likely expecting the output allocation to be laid out in a certain way that's now violated by this change. I'll try to clean them up a bit.
EXPECT_THAT(getAllocationDomainPermutation(tv3), ElementsAre(3, 1, 0, 2));
}
// TODO: open an issue. seems to hit an assert in IdModel(&fusion)
// {
Note to self: verify this with ToT main. I'm guessing it's just some IdModel config that I wasn't using properly.
SGTM
Doesn't transpose mean we cannot have a total order on
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
std::vector<IterDomain*> mapped_id_vec;
std::unordered_set<IterDomain*> mapped_id_set;

// logic to preserve reduction iter domain in target to WAR issue #2202
Is this going to be removed once #2202 is addressed?
Yes, that's the plan.
!build --diff

!build --diff

!build --diff
jit_thunder_tests on A100 seems to be reporting a kernel with the wrong arch. I don't think that one is coming from this PR, and I don't see a CI nightly with a Thunder failure. cc'ing @xwang233
Somehow that test job got a T4 GPU instead of an A100. Need to investigate that. Please restart the CI jobs if needed.
!build
I addressed all the issues in the comments. I'll merge the PR after CI clears again, since I already got a stamp earlier. Please do block the merge if you have further concerns. Regarding the comments on the propagation rule for reshape, let's move the discussion to #2235.
Thunder failure isn't related. I'm merging this.
Fixes a subtle [bug](https://gitlab-master.nvidia.com/dl/pytorch/fuser-gh-mirror/-/jobs/92948751), exposed by #2168
This reverts commit 8c18701.
Refactored the allocation order inference pass: it tries to keep the allocation order of `dsts` closer to the `srcs`, to simplify scheduling as well as facilitate vectorization. It works roughly as:
- for each `dst`, among all its producers in `srcs`, we find the one with the most loop iter domains in its allocation domain as the reference `ref`;
- we map `dst`'s rfactor domain to `ref`'s allocation domain and push the mapped iter domains as the inner dimensions in `dst`'s new allocation domain, while pushing unmapped iter domains as outer dimensions.
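The mapping step described above can be sketched on plain integer ids. This is a hypothetical simplification: the real pass works on `IterDomain*` via IdModel mapping, and `inferAllocationOrder` here is an illustrative name, not the pass's actual API.

```cpp
#include <unordered_set>
#include <vector>

// Illustrative sketch (not the real nvFuser implementation):
// `dst_domain` holds dst's rfactor iter-domain ids in logical order;
// `ref_alloc` holds ref's allocation-ordered ids (inner-most last).
// Ids present in both are "mapped". Mapped ids are pushed to the inner
// (right-most) positions of dst's new allocation order, preserving ref's
// relative order; unmapped ids stay outer, preserving dst's own order.
std::vector<int> inferAllocationOrder(
    const std::vector<int>& dst_domain,
    const std::vector<int>& ref_alloc) {
  std::unordered_set<int> dst_ids(dst_domain.begin(), dst_domain.end());

  // Collect mapped ids in ref's allocation order; these become the inner dims.
  std::vector<int> inner;
  for (int id : ref_alloc) {
    if (dst_ids.count(id)) {
      inner.push_back(id);
    }
  }
  std::unordered_set<int> mapped(inner.begin(), inner.end());

  // Unmapped ids go first (outer), in dst's original order.
  std::vector<int> result;
  for (int id : dst_domain) {
    if (!mapped.count(id)) {
      result.push_back(id);
    }
  }
  // Then the mapped ids (inner).
  result.insert(result.end(), inner.begin(), inner.end());
  return result;
}
```

For example, with `dst_domain = {0, 1, 2, 3}` and a reference whose allocation order is `{2, 0}`, the inferred order is `{1, 3, 2, 0}`: dims 1 and 3 (unmapped) stay outer, while 2 and 0 are pushed inner in the reference's order.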