Merged

**Collaborator (Author)** commented:
> !build

**naoyam** reviewed on Jun 7, 2023:
> This currently fails at lowering due to infinite recursion in `nvfuser::prove::lessEqual` when trying to simplify index expressions for index hoisting.
**jacobhinkle** (Collaborator, Author) commented on Jun 7, 2023:
> Closing in favor of #511.
**jacobhinkle** commented on Sep 11, 2023:

```diff
 const int64_t slice_offset = 4;
-const std::vector<int64_t> shape({1024 * 1024});
+const std::vector<int64_t> shape({1024L * 1024L});
```

**Collaborator (Author)**:
> Silencing clang-tidy.
**jacobhinkle** commented on Sep 14, 2023:
Comment on lines -898 to +899:

```diff
-std::cout << "Fusion IR after pre-segmenter optimization passes:"
-          << std::endl;
+debug() << "Fusion IR after pre-segmenter optimization passes:"
+        << std::endl;
```

**Collaborator (Author)**:
> Unrelated to this PR. Just found the wrong ostream in this debug dump.
**zasdfgbnm** reviewed on Sep 18, 2023

**Collaborator (Author)** commented:
> !build

**Collaborator (Author)** commented:
> !build
**zasdfgbnm** reviewed on Sep 26, 2023
`test/test_resize.cpp` (Outdated):

```cpp
// Test slice with a variety of constant ranges
TEST_F(NVFuserTest, FusionResizeSliceConstantShmoo_CUDA) {
  for (auto [start, stop] : std::vector<std::pair<int64_t, int64_t>>(
```

**zasdfgbnm** (Collaborator):
> Should we use the same set of slices as FusionResizeSliceInputShmoo_CUDA?

**jacobhinkle** (Collaborator, Author):
> Yes, that's done now. The reason I didn't do it originally is that it slows down the test a lot, since we need to recompile for each slice.
**jacobhinkle** commented on Sep 26, 2023

**zasdfgbnm** reviewed on Sep 26, 2023:
`test/test_resize.cpp` (Outdated):

```cpp
fe.compileFusion(&fusion);

auto t0 = at::randn(shape, options);
for (auto [start, stop] : std::vector<std::pair<int64_t, int64_t>>(
```

**zasdfgbnm** (Collaborator):
> Should we pull this set of slices out of the test and reuse it for all three tests?
**zasdfgbnm** approved these changes on Sep 26, 2023
This PR normalizes the inputs to `slice` in order to mimic the semantics of numpy/PyTorch slicing. For an axis with extent `ext`, if we receive a slice of `(start, stop, step)`, we normalize it to `(norm_start, norm_stop, step)` following the usual numpy/PyTorch rules: negative `start` and `stop` values are offset by `ext`, and the results are clamped to the range `[0, ext]`.

Specific changes in this PR:

- Normalize the inputs to the `slice` op.

The simple Fusion in the input range test prints like this:

resulting in the following CUDA kernel:

This PR does NOT simplify these expressions for non-constant inputs. That can be done at concretization, which is left for a follow-up PR.

Stacked on #892 and #895.

Fixes #439. Fixes #52.