Skip to content

Fix getPointwiseHeuristics with empty tensors#369

Closed
jacobhinkle wants to merge 9 commits intomainfrom
projected_extent_size0
Closed

Fix getPointwiseHeuristics with empty tensors#369
jacobhinkle wants to merge 9 commits intomainfrom
projected_extent_size0

Conversation

@jacobhinkle
Copy link
Collaborator

Fixes #365.

Note that this is not really safe as it will set zero_ = false even if
it's still zero because of zero input numerator.
@jacobhinkle
Copy link
Collaborator Author

After removing the failing assertions, I'm hitting

it != broadcast_origin_map_.end() INTERNAL ASSERT FAILED at "/opt/pytorch/
nvfuser/csrc/device_lower/analysis/trivial_broadcast.cpp":98, please report a bug to PyTorch. Broadcast origin info not found for producer broadcast domain: bS2{1}rf of T4_l[ bS2{1}rf ]

This is a known issue being addressed in #358 and comes from the broadcast coming from the resize. If instead, we use a size-2 slice instead of size-1, the test passes. I don't think it's necessarily safe to just uncomment the failing check in ProjectedExtent though, so I'll verify that class works properly for size-0 extents next.

Comment on lines +252 to +255
if (ext->isConstInt() && ext->evaluateInt() == 0) {
// A size-zero extent ID will be predicated out always, so it should
// not affect the projected extent calculation
continue;
Copy link
Collaborator Author

@jacobhinkle jacobhinkle May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zasdfgbnm I know you've been looking at this code lately also, in relation to #393 . I am trying to avoid bad behavior when an empty TensorView with size zero extent is encountered. I currently am just pretending it doesn't exist here for the purposes of computing the projected extent. But I'm a little uncertain of whether that is enough, or if it's even safe. I think I have an idea of how this class supposed to work but I'd appreciate you having a look if you have some time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should clean up this entire thing with expr simplifier, and I am thinking about doing so after #393.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the case of zero extent, as long as this does not throw an error, whatever value it returns does not matter, because we will not do any memory read/write anyway. Am I correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a tensor has zero elements, should we just stop propagation and assume that tensor is OK with arbitrary vectorization size?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am refactoring this part of code for allocation domain support. So I don't want you to invest too much time digging into the codebase on how to do it best. The workaround here makes sense if it can let the test pass.

@jacobhinkle
Copy link
Collaborator Author

Closing in favor of #449

@zasdfgbnm zasdfgbnm deleted the projected_extent_size0 branch June 22, 2023 15:50
jacobhinkle added a commit that referenced this pull request Jul 14, 2023
A number of issues have come up when trying to process empty tensors
(i.e. ones with at least one non-reduction axis with extent of zero)
during scheduling and lowering. See: #264 #369 #269. Additionally, we
now assume extents are positive (#440). Along with #543, this PR makes
that a reality by removing all intermediate empty tensors.

This PR:
- Marks a `Fusion` as dynamic if dynamic reshapes/resizes exist or if
_any_ alive `TensorView` has a static size-zero extent or a dynamic
extent, since it might be empty. **This is means only Fusions with
nothing but concrete non-zero sizes are static now.** That is, even if
all static shapes are provided, it will be marked as a dynamic Fusion
and those `TensorView`s will be modified during concretization.
- Adds a pass done during `getConcretizationInfo()` that collects a
vector of empty tensors which are not fusion inputs. It does not
traverse their definitions, since there is nothing to compute for an
empty tensor.
- During concretization, sets the size-0 extents of identified empty
tensors to constant 0.

When encountering a new set of input sizes/scalars, we evaluate a
minimal set of `Val`s (those that appear in dynamic extents), and only
proceed with removing branches if any of these are zero. So there is a
rather quick path to re-using concretizations in the common case where
none of the extents are zero.

Even after #543, this PR does not guarantee that all tensors present in
the Fusion during scheduling have non-zero extent. It does guarantee
that any remaining empty tensors are either fusion inputs or outputs,
and that empty tensors will have constant 0 extents in any empty
dimensions. Stripping empty inputs and outputs from the Fusion could
potentially be done at segmentation, but should only be done if it does
not result in additional kernels being launched; that is left for
another PR (see #448).

Fixes #365 and fixes #264. This replaces PRs #369 and #269.

---------

Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multiple slices with one of them zero-length fails assertion

2 participants