
Use GetMetaData for stride computation#649

Merged
zasdfgbnm merged 42 commits into main from metadata-for-stride-inference
Jul 28, 2023

Conversation

Collaborator

@zasdfgbnm zasdfgbnm commented Jul 26, 2023

The definition of Tensor in runtime has been changed as

template <typename T, int Dims, int AllocDims = Dims>
struct Tensor {
  __device__ T& operator[](nvfuser_index_t ind) {
    return data[ind];
  };

  T* data;
  Array<nvfuser_index_t, Dims, 1> logical_size;
  Array<nvfuser_index_t, AllocDims, 1> alloc_stride;
};

That is, we are now explicit about whether we are referring to the sizes/strides of the allocation domain or of the rFactor domain. On the host, the PolymorphicValue of tensor metadata now stores all four of logical_size, logical_stride, alloc_size, and alloc_stride.

The utility that converted sizes and strides w.r.t. the rFactor domain into those of the allocation domain was inferAndValidateAllocationSizesAndStrides; this function has been moved to tensor_metadata.cpp and is now private to that file. The new way to compute the sizes and strides of the allocation domain is:

expression_evaluator.evaluate(IrBuilder::metadataExpr(tv));

SchedulerRuntimeInfo::SchedulerRuntimeInfo, getKernelArgument, and validateAlignedVectorizedFusionInputOutput have been changed to use this new approach.

With this change, getTensorArg and KernelArgumentHolder::getBuffer are no longer used. We should be ready to remove all the subclasses of ArgAbstract, but I am not doing that cleanup here; it will be left for a follow-up PR.

@zasdfgbnm zasdfgbnm changed the title Metadata for stride inference Use GetMetaData for stride computation Jul 26, 2023
Base automatically changed from kernel_inputs_executor to main July 26, 2023 19:05
// Forward traverse from rFactor domain to allocation domain, compute frontier
// sizes and strides, validate that splits are divisible and merges are
// contiguous, and update active_ids_ correspondingly.
class ForwardTraverseFromRFactorToAlloc {
Collaborator Author

Moved to tensor_metadata.cpp unchanged.

};

// Similar to ForwardTraverseFromRFactorToAlloc, but in the opposite direction.
class BackwardTraverseFromRFactorToAlloc {
Collaborator Author

Moved to tensor_metadata.cpp unchanged.

// is [I1, I2], and the tensor's size is [15] and stride is [7], and the extent
// of I2 is 5, then the resulting size will be [3, 5] and stride will be [35, 7]
std::vector<std::pair<int64_t, int64_t>>
inferAndValidateAllocationSizesAndStrides(
Collaborator Author

Moved to tensor_metadata.cpp slightly changed.

@zasdfgbnm
Collaborator Author

!build

@zasdfgbnm zasdfgbnm marked this pull request as ready for review July 28, 2023 06:21
@zasdfgbnm zasdfgbnm requested a review from jacobhinkle July 28, 2023 06:21
@zasdfgbnm zasdfgbnm assigned naoyam and unassigned naoyam Jul 28, 2023
@zasdfgbnm zasdfgbnm requested a review from naoyam July 28, 2023 06:22
@zasdfgbnm
Collaborator Author

!build

Collaborator

@jacobhinkle jacobhinkle left a comment

LGTM. The changes are mostly mechanical, due to the name change and moving code. I made one note of a no-longer-used method and member of TensorArg, but since you'll be removing all ArgAbstract subclasses soon anyway, cleaning up the interface is unimportant.

TORCH_INTERNAL_ASSERT(
(size_t)instance_.nAllocationDims() == sizes_strides.size());
for (auto i : c10::irange((int64_t)sizes_strides.size())) {
alloc_sizes.at(i) = sizes_strides.at(i).first;
Collaborator

Since this is removed, where does alloc_sizes get set now?

Collaborator

OK, I see; it's not used anymore since it is all done in the metadata. So this means we can also get rid of alloc_sizes along with getAllocSize().

Collaborator Author

Yes, this change breaks TensorArg, but since it is no longer used, it does not matter that it is broken.

@zasdfgbnm zasdfgbnm merged commit 87f046d into main Jul 28, 2023
@zasdfgbnm zasdfgbnm deleted the metadata-for-stride-inference branch July 28, 2023 15:00
@naoyam
Copy link
Collaborator

naoyam commented Jul 28, 2023

@zasdfgbnm Could you remind me again what the logical stride means? Is that just the strides computed from the logical sizes of a logically contiguous tensor?

@zasdfgbnm
Copy link
Collaborator Author

zasdfgbnm commented Jul 28, 2023

Logical stride is just the stride in terms of the rFactor domain, that is, the "raw" strides from PyTorch's tensor.stride(). In comparison, alloc_stride is the stride in terms of the allocation domain, which we compute from the logical stride.
