RemoveEmptyPass optimization pass to remove empty tensors #543
jacobhinkle merged 43 commits into main
Conversation
if (isDebugDumpEnabled(DebugDumpOption::FusionIrPreseg)) {
  std::cout << "Fusion IR after pre-segmenter optimization passes:"
            << std::endl;
  fusion->printMath();
}
New dump option fusion_ir_preseg to more easily monitor what the optimization passes are doing.
We remove from the front and push to the back to process Statements in FIFO order. This ensures we traverse in reverse topological order, so that we can safely remove any TensorViews downstream of the Statement we're looking at, as they have already been processed and should not appear later in the stack (though we should still check, because they might be an output).
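As a toy illustration (invented names, not the nvFuser traversal machinery), the pop-front/push-back discipline described above can be sketched with a `std::deque`:

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy sketch (not nvFuser code): visit nodes in FIFO order by popping from
// the front of a deque and pushing newly discovered successors to the back.
// Nodes are plain ints; next[n] lists the successors of node n.
std::vector<int> fifoVisit(int start, const std::vector<std::vector<int>>& next) {
  std::deque<int> to_visit{start};
  std::vector<bool> seen(next.size(), false);
  seen[start] = true;
  std::vector<int> order;
  while (!to_visit.empty()) {
    int n = to_visit.front();
    to_visit.pop_front(); // remove from the front...
    order.push_back(n);
    for (int m : next[n]) {
      if (!seen[m]) {
        seen[m] = true;
        to_visit.push_back(m); // ...and push to the back (FIFO order)
      }
    }
  }
  return order;
}
```

With this discipline a node is only emitted after everything queued before it, which is what makes it safe to act on already-visited downstream nodes.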
void PreSegmenter::runPass(Fusion* fusion) {
  // Replace TensorViews with zero extent. Outputs and inputs may still be empty
  OptimizationPass<RemoveEmptyPass>::runPass(fusion);
I placed this pass first since I assumed we may want to do any DCE passes before other patterns are matched.
I am going to abstract this out into a DeadCodeEliminator since this pattern will also be used for concretizing slice, and potentially we may want to combine multiple passes to share the traversal machinery.
csrc/optimization/remove_empty.cpp
Outdated
//! we traverse backwards, and we handle all active Expr outputs, this ensures
//! that removing an Expr will not result in erasing definitions of active
//! Expr outputs.
class DeadCodeRemover : BackwardVisitor {
Might belong in iter_visitor.cpp.
Good question. That issue actually came up for the Welford test so it is handled now.
How is it handled?
This renames doRemoval() to modifyFusion() and places ir_utils::replaceValInExpr() there. It performs these replacements in the same order as it receives them. Note that some other cleanup was also done in this commit.
!build
naoyam
left a comment
LGTM. Added a few suggestions for cleanup.
csrc/iter_visitor.h
Outdated
//! Replaces a Val in outputs, and in all uses.
//!
//! The argument old_val is always marked Dead by this method. If old_val is a
//! Fusion input, we do not replace it. If old_val's definition is non-null
//! and has other outputs which are not dead, we do not remove old_val.
//!
//! Returns whether old_val was registered for removal from the Fusion.
bool replaceVal(Val* old_val, Val* new_val);
Please mention this does not do the replacement right away but registers the replacement.
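The register-then-apply distinction the reviewer is asking to document can be sketched generically. This is a hypothetical illustration (the `ReplacementLog` type and integer ids are invented, not nvFuser's API): replacements are recorded during traversal and applied afterwards, so the IR is not mutated while it is being visited.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch of deferred mutation: registerReplacement() only
// records the intent; apply() later performs all replacements in the same
// order they were registered.
struct ReplacementLog {
  std::vector<std::pair<int, int>> pending; // (old_id, new_id) pairs

  void registerReplacement(int old_id, int new_id) {
    pending.emplace_back(old_id, new_id);
  }

  // Apply all recorded replacements, in registration order, to a flat list
  // of value ids standing in for the fusion's uses/outputs.
  void apply(std::vector<int>& ids) {
    for (const auto& [old_id, new_id] : pending) {
      for (int& v : ids) {
        if (v == old_id) {
          v = new_id;
        }
      }
    }
  }
};
```

Deferring the mutation this way is what allows the visitor to keep iterating over a stable graph while still recording every change it wants to make.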
csrc/iter_visitor.h
Outdated
void registerRemoval(Val* val);

//! Register a Val for later replacement
inline void registerReplacement(Val* old_val, Val* new_val) {
Looks like this is just used from replaceVal. Since it's just one line, why not just combine them and name it as registerReplacement?
This just lets us return whether or not any modification was performed, which can give us a termination criterion in optimization.
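Returning whether a modification occurred gives a natural termination criterion: rerun the pass until it reports no change. A minimal sketch, with invented names (this is not nvFuser's pass driver):

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: rerun an optimization pass until it reports that no
// further modification was made, or until an iteration cap is hit. The pass
// is any callable returning true iff it changed something.
int runToFixedPoint(const std::function<bool()>& pass, int max_iters = 100) {
  int iters = 0;
  while (iters < max_iters && pass()) {
    ++iters;
  }
  return iters; // number of iterations that modified the IR
}
```

The iteration cap guards against a buggy pass that always reports a modification and would otherwise loop forever.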
A number of issues have come up when trying to process empty tensors (i.e. ones with at least one non-reduction axis with extent of zero) during scheduling and lowering. See: #264 #369 #269. Additionally, we now assume extents are positive (#440). Along with #543, this PR makes that a reality by removing all intermediate empty tensors. This PR:

- Marks a `Fusion` as dynamic if dynamic reshapes/resizes exist or if _any_ alive `TensorView` has a static size-zero extent or a dynamic extent, since it might be empty. **This means only Fusions with nothing but concrete non-zero sizes are static now.** That is, even if all static shapes are provided, the Fusion will be marked as dynamic and those `TensorView`s will be modified during concretization.
- Adds a pass, done during `getConcretizationInfo()`, that collects a vector of empty tensors which are not fusion inputs. It does not traverse their definitions, since there is nothing to compute for an empty tensor.
- During concretization, sets the size-0 extents of identified empty tensors to constant 0.

When encountering a new set of input sizes/scalars, we evaluate a minimal set of `Val`s (those that appear in dynamic extents), and only proceed with removing branches if any of these are zero. So there is a rather quick path to re-using concretizations in the common case where none of the extents are zero.

Even after #543, this PR does not guarantee that all tensors present in the Fusion during scheduling have non-zero extent. It does guarantee that any remaining empty tensors are either fusion inputs or outputs, and that empty tensors will have constant 0 extents in any empty dimensions. Stripping empty inputs and outputs from the Fusion could potentially be done at segmentation, but should only be done if it does not result in additional kernels being launched; that is left for another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.
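The quick-path check described above can be illustrated with a toy sketch (invented names and types, not the nvFuser implementation): evaluate only the symbolic extents that appear in dynamic shapes against the incoming input sizes, and fall back to the branch-removal path only if one of them is zero.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of the fast path: given the minimal set of symbolic
// extents appearing in dynamic shapes, evaluate only those against the
// incoming input sizes. If none are zero, no empty-tensor branches need to
// be removed and a cached concretization can be reused directly.
bool anyExtentIsZero(
    const std::vector<std::string>& dynamic_extents,
    const std::unordered_map<std::string, int64_t>& input_sizes) {
  for (const auto& e : dynamic_extents) {
    auto it = input_sizes.find(e);
    if (it != input_sizes.end() && it->second == 0) {
      return true; // this concretization must remove empty branches
    }
  }
  return false; // common case: quick reuse of an existing concretization
}
```

Only the handful of extent `Val`s are evaluated, so the common non-empty case avoids any traversal of the full Fusion.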
--------- Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
When replacing empty tensors with `full` in `RemoveEmptyPass` (#543), we use the `extent` to determine the output shape. This is a problem if that tensor was the output of an `expand` call, because then even a tensor that is not empty might appear to be so if an expanded extent is zero. This change addresses this issue by using `getMaybeExpandedExtent` in `RemoveEmptyPass` and during concretization for finding empty tensors. Fixes #603
This adds an optimization pass that detects `TensorView`s with constant zeros in some extents. These empty tensors do not require any computation, so their definitions can be DCE'd. This lets us, for example, replace a `sum(tv, {0})` with `full(shape, fusion->zeroVal())` where `shape` is the appropriate output shape. Currently we check `Fusion` outputs, as well as inputs to the following ops, which are able to take empty inputs without producing empty outputs:

- `ReductionOp`: replace with `full(shape, op->init())`.
- `WelfordOp`: similar to `ReductionOp`, but we replace `avg` and `var` with nan, and `N` with 0.
- `CatOp`: replace with `cat` of all non-empty inputs.
- `PadOp`: replace with `full`.

Until #449 or similar is merged, this PR does not guarantee that intermediate tensors will not be empty; this PR only guarantees that if intermediate tensors exist, their extents are symbolic during segmentation. We examine all intermediate `TensorView`s and warn once if we encounter any unhandled empty intermediate tensors.

This pass is implemented using a new reusable traversal class called `DeadCodeRemover`, which tracks live and dead `Statement`s and provides a `replaceTV()` method for easily redefining non-input `TensorView`s. Note that this PR also introduces a new debug dump option to show the `Fusion` math after the optimization passes: e.g. `NVFUSER_DUMP=fusion_ir_preseg`.