
Fusion level axioms #440

Merged
zasdfgbnm merged 9 commits into main from axioms-fusion on Jun 2, 2023

Conversation

@zasdfgbnm (Collaborator) commented Jun 1, 2023

Fixes #320. Add a new optimization pass that appends extent > 0 as axioms of the fusion.

Currently, we are not doing concretization of extent zero, so this assumption is actually wrong. But fortunately, it is not too wrong, given that:

  • no tests / benchmarks fail
  • AFAICT, except in the parallel dimension map (where we already assume extent > 0), the only place where extent > 0 vs. extent >= 0 makes a difference is when the extent appears as a denominator. Considering that we already have preserve_error = false by default, changing from extent >= 0 to extent > 0 will hopefully be a no-op for lowering.
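The denominator point can be illustrated with a toy rewrite rule (a sketch only, not nvFuser's expr_simplifier; the function and the tuple-based IR here are made up): under extent >= 0, a simplifier cannot soundly cancel (x * e) / e because e could be zero, but extent > 0 makes the cancellation valid.

```python
def cancel_mul_div(numerator_factor, divisor, axioms):
    """Rewrite (x * e) / e -> x, but only when e is provably nonzero."""
    if (divisor, ">", 0) in axioms:
        # e > 0 implies e != 0, so the cancellation is sound
        return numerator_factor
    # Without the axiom, e might be zero; keep the expression as-is
    return ("div", ("mul", numerator_factor, divisor), divisor)

axioms = {("e", ">", 0)}  # the kind of axiom this PR appends per extent
assert cancel_mul_div("x", "e", axioms) == "x"
assert cancel_mul_div("x", "e", set()) == ("div", ("mul", "x", "e"), "e")
```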

But this does help vectorization analysis. Currently, I am seeing a lot of expressions like where(abs(i1) == i1, i2, i3), and the added axioms should help eliminate this trivial computation.
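To sketch how the axiom enables that elimination (a toy model, not nvFuser code; the predicate encoding and function name are invented for illustration): once extent > 0 is known for i1, the predicate abs(i1) == i1 is provably true, so the where() collapses to its true branch.

```python
def simplify_where(cond, true_val, false_val, axioms):
    """Simplify where(cond, a, b) under a set of known axioms."""
    op, var = cond  # e.g. ("abs_eq_self", "i1") encodes abs(i1) == i1
    # abs(x) == x holds whenever x >= 0, which x > 0 implies
    if op == "abs_eq_self" and (var, ">", 0) in axioms:
        return true_val  # predicate is always true: drop the select entirely
    return ("where", cond, true_val, false_val)

axioms = {("i1", ">", 0)}  # extent axiom appended by the new pass
assert simplify_where(("abs_eq_self", "i1"), "i2", "i3", axioms) == "i2"
```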

Also note that I removed the explicit assumption in the parallel dimension map, which means that, if we are using the fusion executor manually, we must run the pre-segmentation passes before scheduling in order to get the same behavior as using the executor cache. I think this is the direction to go in the future, as we are adding more and more pre-segmentation passes. In particular, I am thinking about moving replaceSymbolicSizes from a lowering pass into a pre-segmentation or maybe pre-concretization pass. This would reduce the number of Val* and generally speed up all expr simplification and sameAs checks.

I am OK with holding this PR until zero extent concretization is merged, but I don't think that is necessary. (Just be a bad guy and break things 😈)

@zasdfgbnm (Collaborator Author)

!build

@naoyam (Collaborator) left a comment

Agree this should be relatively safe. I still think we should consider a different IterType for size-zero domains.

@zasdfgbnm (Collaborator Author) commented Jun 2, 2023

@naoyam Clang-tidy is broken due to an unrelated issue:

git fetch origin main
git checkout origin/main
head_commit=$(git rev-parse HEAD)
git checkout $this_commit

I think I need admin power to force merge.
Edit: the clang-tidy error is easy to fix. Will fix it in this PR.

@zasdfgbnm (Collaborator Author)

And I agree with a separate IterType for zero extent.



@zasdfgbnm (Collaborator Author)

!build

@jacobhinkle (Collaborator) commented Jun 5, 2023

I think we can guarantee positivity somewhat soon, and kill a few birds with one stone. #449 is a start, and #448 is the followup (haven't started that just yet but hopefully will soon).

jacobhinkle added a commit that referenced this pull request Jul 14, 2023
A number of issues have come up when trying to process empty tensors
(i.e. ones with at least one non-reduction axis with extent of zero)
during scheduling and lowering. See: #264 #369 #269. Additionally, we
now assume extents are positive (#440). Along with #543, this PR makes
that a reality by removing all intermediate empty tensors.

This PR:
- Marks a `Fusion` as dynamic if dynamic reshapes/resizes exist or if
_any_ alive `TensorView` has a static size-zero extent or a dynamic
extent, since it might be empty. **This means only Fusions with
nothing but concrete non-zero sizes are static now.** That is, even if
all static shapes are provided, the Fusion will be marked as dynamic
and those `TensorView`s will be modified during concretization.
- Adds a pass, run during `getConcretizationInfo()`, that collects a
vector of empty tensors which are not fusion inputs. It does not
traverse their definitions, since there is nothing to compute for an
empty tensor.
- During concretization, sets the size-0 extents of identified empty
tensors to constant 0.

When encountering a new set of input sizes/scalars, we evaluate a
minimal set of `Val`s (those that appear in dynamic extents), and only
proceed with removing branches if any of these are zero. So there is a
rather quick path to re-using concretizations in the common case where
none of the extents are zero.
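That fast path can be sketched as follows (a toy model with hypothetical names, not the actual nvFuser API): only the `Val`s appearing in dynamic extents are evaluated, and the expensive branch-removal concretization runs only if one of them is zero.

```python
def needs_empty_tensor_branch_removal(dynamic_extents, input_sizes):
    """Return True iff any dynamic extent evaluates to zero for these inputs.

    dynamic_extents: names of the minimal set of Vals appearing in dynamic
    extents; input_sizes: the concrete sizes for a new set of inputs.
    """
    return any(input_sizes[name] == 0 for name in dynamic_extents)

# Common case: no zero extents, so a cached concretization can be reused
# without traversing the whole fusion.
assert needs_empty_tensor_branch_removal(["i0", "i2"], {"i0": 128, "i2": 4}) is False
assert needs_empty_tensor_branch_removal(["i0", "i2"], {"i0": 128, "i2": 0}) is True
```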

Even after #543, this PR does not guarantee that all tensors present in
the Fusion during scheduling have non-zero extent. It does guarantee
that any remaining empty tensors are either fusion inputs or outputs,
and that empty tensors will have constant 0 extents in any empty
dimensions. Stripping empty inputs and outputs from the Fusion could
potentially be done at segmentation, but should only be done if it does
not result in additional kernels being launched; that is left for
another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.

---------

Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Definition-time simplifyExpr with concretization axioms

3 participants