
RemoveEmptyPass optimization pass to remove empty tensors#543

Merged
jacobhinkle merged 43 commits into main from empty_branch_opt_pass on Jul 7, 2023

Conversation


@jacobhinkle jacobhinkle commented Jun 28, 2023

This adds an optimization pass that detects TensorViews with constant zeros in some extents. These empty tensors do not require any computation, so their definitions can be DCE'd. This lets us, for example, replace a sum(tv, {0}) with full(shape, fusion->zeroVal()) where shape is the appropriate output shape. Currently we check Fusion outputs, as well as inputs to the following ops, which are able to take empty inputs without producing empty outputs:

  • ReductionOp: replace with full(shape, op->init()).
  • WelfordOp: similar to ReductionOp, but we replace avg and var with nan, and N with 0.
  • CatOp: replace with cat of all non-empty inputs.
  • PadOp: replace with full.
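
The ReductionOp case rests on a simple identity: reducing over an axis of extent zero yields the reduction's init value, so the whole op can be replaced with a tensor filled with that value. A minimal standalone sketch (plain C++, not nvFuser code) of that identity:

```cpp
#include <numeric>
#include <vector>

// Sketch: a sum over an empty extent produces only the init value, which is
// why an empty-input ReductionOp can be replaced by full(shape, op->init()).
int sumWithInit(const std::vector<int>& axis, int init) {
  return std::accumulate(axis.begin(), axis.end(), init);
}
```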

Until #449 or similar is merged, this PR does not guarantee that intermediate tensors will not be empty; this PR only guarantees that if intermediate tensors exist, their extents are symbolic during segmentation. We examine all intermediate TensorViews and warn once if we encounter any unhandled empty intermediate tensors.

This pass is implemented using a new reusable traversal class called DeadCodeRemover which tracks live and dead Statements and provides a replaceTV() method for easily redefining non-input TensorViews. Note that this PR also introduces a new debug dump option to show the Fusion math after the optimization passes: e.g. NVFUSER_DUMP=fusion_ir_preseg.

Comment on lines +675 to +679
if (isDebugDumpEnabled(DebugDumpOption::FusionIrPreseg)) {
std::cout << "Fusion IR after pre-segmenter optimization passes:"
<< std::endl;
fusion->printMath();
}
jacobhinkle (Collaborator, Author) commented:

New dump option fusion_ir_preseg to more easily monitor what the optimization passes are doing.

We remove from the front and push to the back to process Statements in
FIFO order. This ensures we have traversed in a reverse topological
order, so that we can safely remove any TensorViews downstream of the
Statement we're looking at, as they have already been processed and
should not appear later in the stack (though we should still check, because
they might be an output).
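
The traversal order described in that commit message can be sketched as follows (hypothetical names, not nvFuser's DeadCodeRemover): starting from an output, pop from the front and push producers to the back. For a simple producer chain, FIFO processing visits every statement before any of its transitive producers, i.e. in reverse topological order.

```cpp
#include <deque>
#include <map>
#include <string>
#include <vector>

// Sketch: FIFO traversal from an output toward inputs. Each statement is
// processed before the producers it pushes, so downstream statements are
// always handled before their definitions.
std::vector<std::string> reverseTopoFromOutput(
    const std::map<std::string, std::vector<std::string>>& producers,
    const std::string& output) {
  std::vector<std::string> order;
  std::deque<std::string> to_visit{output};
  while (!to_visit.empty()) {
    std::string stmt = to_visit.front();  // remove from the front
    to_visit.pop_front();
    order.push_back(stmt);
    auto it = producers.find(stmt);
    if (it != producers.end()) {
      for (const auto& p : it->second) {
        to_visit.push_back(p);  // push to the back -> FIFO order
      }
    }
  }
  return order;
}
```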
@jacobhinkle changed the title from "Remove empty branches as optimization pass" to "Remove empty branches in RemoveEmptyPass optimization pass" Jun 28, 2023
@jacobhinkle changed the title from "Remove empty branches in RemoveEmptyPass optimization pass" to "RemoveEmptyPass optimization pass to remove empty tensors" Jun 28, 2023
@jacobhinkle changed the title from "RemoveEmptyPass optimization pass to remove empty tensors" to "RemoveEmptyPass optimization pass to remove unneeded empty tensors" Jun 28, 2023

void PreSegmenter::runPass(Fusion* fusion) {
// Replace TensorViews with zero extent. Outputs and inputs may still be empty
OptimizationPass<RemoveEmptyPass>::runPass(fusion);
jacobhinkle (Collaborator, Author) commented:
I placed this pass first since I assumed we may want to do any DCE passes before other patterns are matched.

@jacobhinkle jacobhinkle marked this pull request as ready for review June 29, 2023 11:01
@jacobhinkle jacobhinkle requested review from jjsjann123 and naoyam June 29, 2023 11:01
@jacobhinkle jacobhinkle marked this pull request as draft June 29, 2023 11:33
I am going to abstract this out into a DeadCodeEliminator since this
pattern will also be used for concretizing slice, and potentially we may
want to combine multiple passes to share the traversal machinery.
@jacobhinkle jacobhinkle marked this pull request as ready for review June 29, 2023 13:28
//! Because we traverse backwards and handle all active Expr outputs, it is
//! safe to remove an Expr: doing so will not erase the definitions of any
//! active Expr outputs.
class DeadCodeRemover : BackwardVisitor {
jacobhinkle (Collaborator, Author) commented:
Might belong in iter_visitor.cpp.

jacobhinkle (Collaborator, Author) commented:

> Sorry for being late here. Still looking through it.
>
> Please ignore if already discussed, but did you consider multi-output exprs, where you can't just remove one of the outputs? Not sure if that could happen, but maybe at least it should be asserted?

Good question. That issue actually came up for the Welford test so it is handled now.

naoyam (Collaborator) commented Jun 30, 2023:

> Sorry for being late here. Still looking through it.
> Please ignore if already discussed, but did you consider multi-output exprs, where you can't just remove one of the outputs? Not sure if that could happen, but maybe at least it should be asserted?
>
> Good question. That issue actually came up for the Welford test so it is handled now.

How is it handled?

jacobhinkle (Collaborator, Author) commented Jul 3, 2023:

> How is it handled?

In maybeRemoveExpr, we check whether all outputs are dead, and we register all of them for removal if so. This is the path taken by removeVal if its argument has a definition; the val is only registered directly for removal without calling maybeRemoveExpr if there is no definition.
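
The multi-output check described above can be sketched like this (hypothetical types and names, not the actual DeadCodeRemover API): an expression may be removed only once every one of its outputs is dead, so removal can never erase the definition of a still-live output.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Sketch: an Expr is removable only if all of its outputs are dead. For a
// multi-output op like Welford, one live output keeps the whole Expr alive.
bool allOutputsDead(
    const std::vector<std::string>& expr_outputs,
    const std::set<std::string>& dead_vals) {
  return std::all_of(
      expr_outputs.begin(), expr_outputs.end(), [&](const std::string& out) {
        return dead_vals.count(out) > 0;
      });
}
```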

@jacobhinkle jacobhinkle closed this Jul 3, 2023
@jacobhinkle jacobhinkle reopened this Jul 3, 2023
This renames doRemoval() to modifyFusion() and places
ir_utils::replaceValInExpr() there. It performs these replacements in
the same order as it receives them. Note that some other cleanup was
also done in this commit.
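
The register-then-apply pattern in that commit can be sketched as follows (hypothetical class, not nvFuser's API): replacements are recorded during traversal and performed later, in the order received, so the IR is never mutated while it is being visited.

```cpp
#include <string>
#include <utility>
#include <vector>

// Sketch: queue replacements during analysis, then apply them all at once,
// in the same order they were registered (analogous to modifyFusion()).
class ReplacementQueue {
 public:
  void registerReplacement(std::string old_val, std::string new_val) {
    pending_.emplace_back(std::move(old_val), std::move(new_val));
  }

  // Perform all recorded replacements, in registration order.
  void apply(std::vector<std::string>& vals) {
    for (const auto& [old_val, new_val] : pending_) {
      for (auto& v : vals) {
        if (v == old_val) {
          v = new_val;
        }
      }
    }
    pending_.clear();
  }

 private:
  std::vector<std::pair<std::string, std::string>> pending_;
};
```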
jacobhinkle (Collaborator, Author) commented:

!build

naoyam (Collaborator) left a comment:

LGTM. Added a few suggestions for cleanup.

Comment on lines +415 to +422
//! Replaces a Val in outputs, and in all uses.
//!
//! The argument old_val is always marked Dead by this method. If old_val is a
//! Fusion input, we do not replace it. If old_val's definition is non-null
//! and has other outputs which are not dead, we do not remove old_val.
//!
//! Returns whether old_val was registered for removal from the Fusion.
bool replaceVal(Val* old_val, Val* new_val);
naoyam (Collaborator) commented:
Please mention this does not do the replacement right away but registers the replacement.

void registerRemoval(Val* val);

//! Register a Val for later replacement
inline void registerReplacement(Val* old_val, Val* new_val) {
naoyam (Collaborator) commented:
Looks like this is just used from replaceVal. Since it's just one line, why not just combine them and name it as registerReplacement?

@jacobhinkle changed the title from "RemoveEmptyPass optimization pass to remove unneeded empty tensors" to "RemoveEmptyPass optimization pass to remove empty tensors" Jul 7, 2023
@jacobhinkle jacobhinkle merged commit 74f66a1 into main Jul 7, 2023
@jacobhinkle jacobhinkle deleted the empty_branch_opt_pass branch July 7, 2023 13:26
jacobhinkle added a commit that referenced this pull request Jul 14, 2023
A number of issues have come up when trying to process empty tensors
(i.e. ones with at least one non-reduction axis with extent of zero)
during scheduling and lowering. See: #264 #369 #269. Additionally, we
now assume extents are positive (#440). Along with #543, this PR makes
that a reality by removing all intermediate empty tensors.

This PR:
- Marks a `Fusion` as dynamic if dynamic reshapes/resizes exist or if
_any_ alive `TensorView` has a static size-zero extent or a dynamic
extent, since it might be empty. **This means that only Fusions containing
nothing but concrete non-zero sizes are static now.** That is, even if
all static shapes are provided, it will be marked as a dynamic Fusion
and those `TensorView`s will be modified during concretization.
- Adds a pass done during `getConcretizationInfo()` that collects a
vector of empty tensors which are not fusion inputs. It does not
traverse their definitions, since there is nothing to compute for an
empty tensor.
- During concretization, sets the size-0 extents of identified empty
tensors to constant 0.

When encountering a new set of input sizes/scalars, we evaluate a
minimal set of `Val`s (those that appear in dynamic extents), and only
proceed with removing branches if any of these are zero. So there is a
rather quick path to re-using concretizations in the common case where
none of the extents are zero.
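
The fast path described above can be sketched like this (hypothetical names, not the concretization API): evaluate only the extents that appear in dynamic shapes and check them for zeros; branch removal is needed only when at least one evaluated extent is zero.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch: check only the minimal set of extents appearing in dynamic
// shapes. If none evaluates to zero, the cached concretization is reusable
// without any branch removal.
bool anyExtentZero(
    const std::vector<std::string>& dynamic_extents,
    const std::map<std::string, long>& input_sizes) {
  for (const auto& name : dynamic_extents) {
    auto it = input_sizes.find(name);
    if (it != input_sizes.end() && it->second == 0) {
      return true;
    }
  }
  return false;
}
```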

Even after #543, this PR does not guarantee that all tensors present in
the Fusion during scheduling have non-zero extent. It does guarantee
that any remaining empty tensors are either fusion inputs or outputs,
and that empty tensors will have constant 0 extents in any empty
dimensions. Stripping empty inputs and outputs from the Fusion could
potentially be done at segmentation, but should only be done if it does
not result in additional kernels being launched; that is left for
another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.

---------

Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
jacobhinkle added a commit that referenced this pull request Jul 19, 2023
When replacing empty tensors with `full` in `RemoveEmptyPass` (#543), we
use the `extent` to determine the output shape. This is a problem if
that tensor was the output of an `expand` call, because then even a
tensor that is not empty might appear to be so if an expanded extent is
zero. This change addresses this issue by using `getMaybeExpandedExtent`
in `RemoveEmptyPass` and during concretization for finding empty
tensors.

Fixes #603
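
The fix in that follow-up can be sketched like this (hypothetical struct, not nvFuser's IterDomain API): when a dimension came from an `expand`, the expanded extent, not the pre-expansion extent of 1, decides whether the tensor is empty.

```cpp
#include <optional>

// Sketch: a broadcast dim has extent 1; after expand() its logical size is
// the expanded extent. Emptiness must be judged on the expanded extent,
// mirroring the switch to getMaybeExpandedExtent.
struct Dim {
  long extent;
  std::optional<long> expanded_extent;  // set if the dim came from expand()
};

long maybeExpandedExtent(const Dim& d) {
  return d.expanded_extent.has_value() ? *d.expanded_extent : d.extent;
}

bool isEmptyDim(const Dim& d) {
  return maybeExpandedExtent(d) == 0;
}
```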