Merged
**zasdfgbnm** (Collaborator, Author) commented:

!build
**naoyam** (Collaborator) approved these changes on Jun 2, 2023 and left a comment:

Agree this should be relatively safe. I still think we should consider a different `IterType` for size-zero domains.
**zasdfgbnm** (Collaborator, Author):

@naoyam Clang-tidy is broken due to an unrelated issue: Fuser/.github/workflows/lint.yml Lines 60 to 63 in 1ca8581

Edit: the clang-tidy error is easy to fix. Will fix it in this PR.
**zasdfgbnm** (Collaborator, Author):

And I agree with a separate `IterType` for zero extent.
**zasdfgbnm** (Collaborator, Author):

Maybe not, I think the clang-tidy error is easy to fix. Let me try it.
**zasdfgbnm** (Collaborator, Author) commented on Jun 2, 2023:

!build
**jacobhinkle** (Collaborator) added a commit that referenced this pull request on Jul 14, 2023:
A number of issues have come up when trying to process empty tensors (i.e. ones with at least one non-reduction axis with extent zero) during scheduling and lowering. See #264, #369, and #269. Additionally, we now assume extents are positive (#440). Along with #543, this PR makes that assumption a reality by removing all intermediate empty tensors.

This PR:
- Marks a `Fusion` as dynamic if dynamic reshapes/resizes exist or if _any_ alive `TensorView` has a static size-zero extent or a dynamic extent, since it might be empty. **This means only Fusions with nothing but concrete non-zero sizes are static now.** That is, even if all static shapes are provided, the Fusion will be marked as dynamic and those `TensorView`s will be modified during concretization.
- Adds a pass, run during `getConcretizationInfo()`, that collects a vector of empty tensors which are not fusion inputs. It does not traverse their definitions, since there is nothing to compute for an empty tensor.
- During concretization, sets the size-0 extents of identified empty tensors to constant 0.

When encountering a new set of input sizes/scalars, we evaluate a minimal set of `Val`s (those that appear in dynamic extents), and only proceed with removing branches if any of these are zero. So there is a rather quick path to re-using concretizations in the common case where none of the extents are zero.

Even after #543, this PR does not guarantee that all tensors present in the Fusion during scheduling have non-zero extent. It does guarantee that any remaining empty tensors are either fusion inputs or outputs, and that empty tensors will have constant 0 extents in any empty dimensions. Stripping empty inputs and outputs from the Fusion could potentially be done at segmentation, but should only be done if it does not result in additional kernels being launched; that is left for another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.
--------- Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
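For intuition, the fast-path check described in the commit message can be sketched in plain Python. This is a hypothetical illustration, not nvFuser's actual API: given concrete bindings for symbolic extents, we evaluate only the extents that appear in dynamic dimensions and record which tensors turn out empty, so branch removal is skipped entirely in the common case.

```python
# Hypothetical sketch of the concretization check (names are
# illustrative, not nvFuser's actual data structures).

def find_empty_tensors(tensors, bindings):
    """tensors: {name: [extent, ...]} where an extent is either a
    concrete int or a symbolic name looked up in bindings.
    Returns the names of tensors with a zero-extent dimension."""
    empty = []
    for name, extents in tensors.items():
        concrete = [e if isinstance(e, int) else bindings[e] for e in extents]
        if any(c == 0 for c in concrete):
            empty.append(name)
    return empty

tensors = {"tv0": ["i0", 4], "tv1": [3, "i1"]}
# Fast path: no extent is zero, so a cached concretization can be reused.
assert find_empty_tensors(tensors, {"i0": 2, "i1": 5}) == []
# tv1 has a zero extent, so branches producing it would be removed.
assert find_empty_tensors(tensors, {"i0": 2, "i1": 0}) == ["tv1"]
```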
Fixes #320. Add a new optimization pass that appends `extent > 0` as axioms of the fusion.

Currently, we are not doing concretization of extent zero, so this assumption is actually wrong. But fortunately, it is not too wrong, given the fact that the only difference between `extent > 0` and `extent >= 0` arises when `extent` is a denominator. Considering that we already have `preserve_error = false` by default, changing from `extent >= 0` to `extent > 0` will hopefully be a no-op for lowering.

But this does help vectorization analysis. Currently, I am seeing a lot of expressions like `where(abs(i1) == i1, i2, i3)`, and the added axioms should help eliminate this trivial computation.

Also note that I removed the explicit assumption on the parallel dimension map, which means that, if we are using the fusion executor manually, we must run the pre-segmentation pass before scheduling in order to get the same behavior as using the executor cache. In the future, I think this is the direction to go as we are adding more and more pre-segmentation passes. In particular, I am thinking about moving `replaceSymbolicSizes` from a lowering pass into a pre-segmentation or maybe pre-concretization pass. This would reduce the number of `Val*`s and generally speed up all expr simplification and `sameAs` checks.

I am OK with holding this PR until zero extent concretization is merged. But I don't think this is necessary. (Just be a bad guy and break things 😈)
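To illustrate why the `extent > 0` axiom helps, here is a toy simplifier sketch in Python (hypothetical; nvFuser's real simplifier operates on `Val*` expressions in C++). Expressions are nested tuples, and a set of symbols known to be positive plays the role of the axiom set: when `x > 0` is known, the predicate `abs(x) == x` is statically true, so the `where` collapses to its true branch.

```python
# Toy expression simplifier sketch (hypothetical representation).
# An expression is a nested tuple; positive_syms models the set of
# symbols covered by an `extent > 0` axiom.

def simplify(expr, positive_syms):
    """Fold where(abs(x) == x, a, b) to a when x is known positive."""
    if isinstance(expr, tuple) and expr[0] == "where":
        _, pred, true_val, false_val = expr
        # The predicate abs(x) == x is trivially true when x > 0.
        if (isinstance(pred, tuple) and pred[0] == "eq"
                and pred[1] == ("abs", pred[2])
                and pred[2] in positive_syms):
            return true_val
    return expr

expr = ("where", ("eq", ("abs", "i1"), "i1"), "i2", "i3")
# With the `extent > 0` axiom on i1, the select collapses:
assert simplify(expr, {"i1"}) == "i2"
# Without the axiom, the expression is left untouched:
assert simplify(expr, set()) == expr
```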