Allocate and Deallocate HostIr insertion in FusionKernelRuntime #4329
Merged
nsarka merged 19 commits into NVIDIA:main on May 5, 2025
wujingyue reviewed twice on Apr 28, 2025
Branch updated from e6d02e1 to 3fce074.
nsarka commented on Apr 30, 2025

    output_tensor, info.tv, expr_eval);
    info.shape_info.allocation_sizes = alloc_sizes;
    info.shape_info.allocation_strides = alloc_strides;
    }

Author: I am not 100% sure whether this has any unintended side effects. I will look into it.
Collaborator: cc @jjsjann123, do you recall why it was problematic for pre-allocated outputs to have allocation domains?
Branch updated from 848f41f to d0d169e.
Author: !test (posted twice)
wujingyue reviewed on Apr 30, 2025:
Thanks! Looks good overall. I'm also unsure about the change to executor.cpp, so I cc'ed Jie.
Branch updated from 507b9b1 to a103e0b.
wujingyue approved these changes on May 1, 2025:
LGTM with comments! The PR description seems outdated; please fix that as well.
wujingyue reviewed three times on May 1, 2025
Author: !test (posted twice)
wujingyue reviewed twice on May 1, 2025
Author: !test
Branch updated from d02cdb5 to c0848ed.
Author: !test
Branch updated from 5d8db2d to 96427a0.
Author: !test
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
This reverts commit 96427a0.
Branch updated from de54612 to 989177b.
Author: !test
Follow-up PR to #4286. This PR inserts an Allocate op for every LaunchKernel output, and a Deallocate op right after the last use of every expr input in the HostIr container. It also adds a test checking that the number of Deallocate ops and the maximum memory usage are correct for an example fusion.
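A minimal sketch of the insertion idea described above, under stated assumptions: it uses a simplified, hypothetical IR (`Op`, `OpKind`, `insertAllocAndDealloc`, and `peakMemory` are illustrative names, not nvFuser's HostIr API), and it assumes fusion inputs and outputs are owned by the caller and therefore never deallocated. It allocates each kernel output just before its launch, frees each tensor right after the op that reads it last, and then simulates the op list to measure peak live memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified host IR for illustration only; this is not
// nvFuser's HostIrContainer API.
enum class OpKind { LaunchKernel, Allocate, Deallocate };

struct Op {
  OpKind kind;
  std::vector<std::string> inputs;   // tensors read by the op
  std::vector<std::string> outputs;  // tensors written by the op
};

// Returns a new op list where each LaunchKernel output gets an Allocate
// immediately before the launch, and each tensor gets a Deallocate right
// after the op that reads it last. Tensors in `keep_alive` (assumed here to
// be the fusion inputs and outputs, owned by the caller) are never freed.
std::vector<Op> insertAllocAndDealloc(const std::vector<Op>& ops,
                                      const std::set<std::string>& keep_alive) {
  // Index of the last op that reads each tensor.
  std::map<std::string, std::size_t> last_use;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    for (const auto& in : ops[i].inputs) {
      last_use[in] = i;
    }
  }

  std::vector<Op> result;
  std::set<std::string> freed;  // avoids a double free if an op reads a tensor twice
  for (std::size_t i = 0; i < ops.size(); ++i) {
    const Op& op = ops[i];
    if (op.kind == OpKind::LaunchKernel) {
      for (const auto& out : op.outputs) {
        result.push_back({OpKind::Allocate, {}, {out}});
      }
    }
    result.push_back(op);
    for (const auto& in : op.inputs) {
      if (last_use.at(in) == i && keep_alive.count(in) == 0 &&
          freed.insert(in).second) {
        result.push_back({OpKind::Deallocate, {in}, {}});
      }
    }
  }
  return result;
}

// Simulates the op list and returns the peak number of live bytes among
// tensors created by Allocate. Assumes every deallocated tensor was
// previously allocated by an Allocate op in the same list.
std::size_t peakMemory(const std::vector<Op>& ops,
                       const std::map<std::string, std::size_t>& bytes) {
  std::size_t live = 0;
  std::size_t peak = 0;
  for (const auto& op : ops) {
    if (op.kind == OpKind::Allocate) {
      live += bytes.at(op.outputs.front());
      peak = std::max(peak, live);
    } else if (op.kind == OpKind::Deallocate) {
      live -= bytes.at(op.inputs.front());
    }
  }
  return peak;
}
```

For example, with three kernels chained as a to b, b to c, and c to d, where a is the fusion input, d is the fusion output, and each of b, c, d is 100 bytes, the pass emits two Deallocate ops (for b and c), and the simulated peak is 200 bytes rather than the 300 bytes reached if nothing were freed.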