RemoveEmptyPass optimization pass to remove empty tensors #543
jacobhinkle merged 43 commits into main
Conversation
if (isDebugDumpEnabled(DebugDumpOption::FusionIrPreseg)) {
  std::cout << "Fusion IR after pre-segmenter optimization passes:"
            << std::endl;
  fusion->printMath();
}
New dump option fusion_ir_preseg to more easily monitor what the optimization passes are doing.
We remove from the front and push to the back to process Statements in FIFO order. This ensures we traverse in reverse topological order, so that we can safely remove any TensorViews downstream of the Statement we're looking at, as they have already been processed and should not appear later in the stack (though we should still check, because they might be an output).
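As a toy illustration (invented names, not the nvFuser traversal machinery), the pop-front/push-back discipline described above can be sketched with a `std::deque`:

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy sketch (not nvFuser code): visit nodes in FIFO order by popping from
// the front of a deque and pushing newly discovered successors to the back.
// Nodes are plain ints; next[n] lists the successors of node n.
std::vector<int> fifoVisit(int start, const std::vector<std::vector<int>>& next) {
  std::deque<int> to_visit{start};
  std::vector<bool> seen(next.size(), false);
  seen[start] = true;
  std::vector<int> order;
  while (!to_visit.empty()) {
    int n = to_visit.front();
    to_visit.pop_front(); // remove from the front...
    order.push_back(n);
    for (int m : next[n]) {
      if (!seen[m]) {
        seen[m] = true;
        to_visit.push_back(m); // ...and push to the back (FIFO order)
      }
    }
  }
  return order;
}
```

With this discipline a node is only emitted after everything queued before it, which is what makes it safe to act on already-visited downstream nodes.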
void PreSegmenter::runPass(Fusion* fusion) {
  // Replace TensorViews with zero extent. Outputs and inputs may still be empty
  OptimizationPass<RemoveEmptyPass>::runPass(fusion);
I placed this pass first since I assumed we may want to do any DCE passes before other patterns are matched.
I am going to abstract this out into a DeadCodeEliminator since this pattern will also be used for concretizing slice, and potentially we may want to combine multiple passes to share the traversal machinery.
csrc/optimization/remove_empty.cpp
Outdated
//! we traverse backwards, and we handle all active Expr outputs, this ensures
//! that removing an Expr will not result in erasing definitions of active
//! Expr outputs.
class DeadCodeRemover : BackwardVisitor {
Might belong in iter_visitor.cpp.
Good question. That issue actually came up for the Welford test so it is handled now.
How is it handled?
This renames doRemoval() to modifyFusion() and places ir_utils::replaceValInExpr() there. It performs these replacements in the same order as it receives them. Note that some other cleanup was also done in this commit.
!build
naoyam
left a comment
LGTM. Added a few suggestions for cleanup.
csrc/iter_visitor.h
Outdated
//! Replaces a Val in outputs, and in all uses.
//!
//! The argument old_val is always marked Dead by this method. If old_val is a
//! Fusion input, we do not replace it. If old_val's definition is non-null
//! and has other outputs which are not dead, we do not remove old_val.
//!
//! Returns whether old_val was registered for removal from the Fusion.
bool replaceVal(Val* old_val, Val* new_val);
Please mention this does not do the replacement right away but registers the replacement.
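The register-then-apply distinction the reviewer is asking to document can be sketched generically. This is a hypothetical illustration (the `ReplacementLog` type and integer ids are invented, not nvFuser's API): replacements are recorded during traversal and applied afterwards, so the IR is not mutated while it is being visited.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch of deferred mutation: registerReplacement() only
// records the intent; apply() later performs all replacements in the same
// order they were registered.
struct ReplacementLog {
  std::vector<std::pair<int, int>> pending; // (old_id, new_id) pairs

  void registerReplacement(int old_id, int new_id) {
    pending.emplace_back(old_id, new_id);
  }

  // Apply all recorded replacements, in registration order, to a flat list
  // of value ids standing in for the fusion's uses/outputs.
  void apply(std::vector<int>& ids) {
    for (const auto& [old_id, new_id] : pending) {
      for (int& v : ids) {
        if (v == old_id) {
          v = new_id;
        }
      }
    }
  }
};
```

Deferring the mutation this way is what allows the visitor to keep iterating over a stable graph while still recording every change it wants to make.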
csrc/iter_visitor.h
Outdated
void registerRemoval(Val* val);

//! Register a Val for later replacement
inline void registerReplacement(Val* old_val, Val* new_val) {
Looks like this is just used from replaceVal. Since it's just one line, why not just combine them and name it as registerReplacement?
This just lets us return whether or not any modification was performed, which can give us a termination criterion in optimization.
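Returning whether a modification occurred gives a natural termination criterion: rerun the pass until it reports no change. A minimal sketch, with invented names (this is not nvFuser's pass driver):

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: rerun an optimization pass until it reports that no
// further modification was made, or until an iteration cap is hit. The pass
// is any callable returning true iff it changed something.
int runToFixedPoint(const std::function<bool()>& pass, int max_iters = 100) {
  int iters = 0;
  while (iters < max_iters && pass()) {
    ++iters;
  }
  return iters; // number of iterations that modified the IR
}
```

The iteration cap guards against a buggy pass that always reports a modification and would otherwise loop forever.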
A number of issues have come up when trying to process empty tensors (i.e. ones with at least one non-reduction axis with extent of zero) during scheduling and lowering. See: #264 #369 #269. Additionally, we now assume extents are positive (#440). Along with #543, this PR makes that a reality by removing all intermediate empty tensors. This PR:

- Marks a `Fusion` as dynamic if dynamic reshapes/resizes exist or if _any_ alive `TensorView` has a static size-zero extent or a dynamic extent, since it might be empty. **This means only Fusions with nothing but concrete non-zero sizes are static now.** That is, even if all static shapes are provided, the Fusion will be marked as dynamic and those `TensorView`s will be modified during concretization.
- Adds a pass, done during `getConcretizationInfo()`, that collects a vector of empty tensors which are not fusion inputs. It does not traverse their definitions, since there is nothing to compute for an empty tensor.
- During concretization, sets the size-0 extents of identified empty tensors to constant 0.

When encountering a new set of input sizes/scalars, we evaluate a minimal set of `Val`s (those that appear in dynamic extents), and only proceed with removing branches if any of these are zero. So there is a rather quick path to re-using concretizations in the common case where none of the extents are zero.

Even after #543, this PR does not guarantee that all tensors present in the Fusion during scheduling have non-zero extent. It does guarantee that any remaining empty tensors are either fusion inputs or outputs, and that empty tensors will have constant 0 extents in any empty dimensions. Stripping empty inputs and outputs from the Fusion could potentially be done at segmentation, but should only be done if it does not result in additional kernels being launched; that is left for another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.
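The quick-path check described above can be illustrated with a toy sketch (invented names and types, not the nvFuser implementation): evaluate only the symbolic extents that appear in dynamic shapes against the incoming input sizes, and fall back to the branch-removal path only if one of them is zero.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of the fast path: given the minimal set of symbolic
// extents appearing in dynamic shapes, evaluate only those against the
// incoming input sizes. If none are zero, no empty-tensor branches need to
// be removed and a cached concretization can be reused directly.
bool anyExtentIsZero(
    const std::vector<std::string>& dynamic_extents,
    const std::unordered_map<std::string, int64_t>& input_sizes) {
  for (const auto& e : dynamic_extents) {
    auto it = input_sizes.find(e);
    if (it != input_sizes.end() && it->second == 0) {
      return true; // this concretization must remove empty branches
    }
  }
  return false; // common case: quick reuse of an existing concretization
}
```

Only the handful of extent `Val`s are evaluated, so the common non-empty case avoids any traversal of the full Fusion.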
--------- Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
When replacing empty tensors with `full` in `RemoveEmptyPass` (#543), we use the `extent` to determine the output shape. This is a problem if that tensor was the output of an `expand` call, because then even a tensor that is not empty might appear to be so if an expanded extent is zero. This change addresses this issue by using `getMaybeExpandedExtent` in `RemoveEmptyPass` and during concretization for finding empty tensors. Fixes #603
This adds an optimization pass that detects `TensorView`s with constant zeros in some extents. These empty tensors do not require any computation, so their definitions can be DCE'd. This lets us, for example, replace a `sum(tv, {0})` with `full(shape, fusion->zeroVal())` where `shape` is the appropriate output shape. Currently we check `Fusion` outputs, as well as inputs to the following ops, which are able to take empty inputs without producing empty outputs:

- `ReductionOp`: replace with `full(shape, op->init())`.
- `WelfordOp`: similar to `ReductionOp`, but we replace `avg` and `var` with nan, and `N` with 0.
- `CatOp`: replace with `cat` of all non-empty inputs.
- `PadOp`: replace with `full`.

Until #449 or similar is merged, this PR does not guarantee that intermediate tensors will not be empty; this PR only guarantees that if intermediate tensors exist, their extents are symbolic during segmentation. We examine all intermediate `TensorView`s and warn once if we encounter any unhandled empty intermediate tensors.

This pass is implemented using a new reusable traversal class called `DeadCodeRemover`, which tracks live and dead `Statement`s and provides a `replaceTV()` method for easily redefining non-input `TensorView`s. Note that this PR also introduces a new debug dump option to show the `Fusion` math after the optimization passes: e.g. `NVFUSER_DUMP=fusion_ir_preseg`.