
Use GetMetaData for stride computation#649

Merged
zasdfgbnm merged 42 commits into main from metadata-for-stride-inference
Jul 28, 2023

Conversation

Collaborator

@zasdfgbnm zasdfgbnm commented Jul 26, 2023

The definition of Tensor in runtime has been changed as

template <typename T, int Dims, int AllocDims = Dims>
struct Tensor {
  __device__ T& operator[](nvfuser_index_t ind) {
    return data[ind];
  };

  T* data;
  Array<nvfuser_index_t, Dims, 1> logical_size;
  Array<nvfuser_index_t, AllocDims, 1> alloc_stride;
};

That is, we are now explicit about whether we are referring to the sizes/strides of the allocation domain or of the rFactor domain. On the host, the PolymorphicValue of tensor metadata now stores all four of logical_size, logical_stride, alloc_size, and alloc_stride.

The utility that converted sizes and strides w.r.t. the rFactor domain into those of the allocation domain was inferAndValidateAllocationSizesAndStrides; this function has been moved to tensor_metadata.cpp and is now private to that file. The new way to compute the sizes and strides of the allocation domain is:

expression_evaluator.evaluate(IrBuilder::metadataExpr(tv));

SchedulerRuntimeInfo::SchedulerRuntimeInfo, getKernelArgument, and validateAlignedVectorizedFusionInputOutput have been changed to use this new approach.

With this change, getTensorArg and KernelArgumentHolder::getBuffer are no longer used. We should be ready to remove all the subclasses of ArgAbstract, but I am not doing that cleanup here; it will be left for a follow-up PR.

@zasdfgbnm zasdfgbnm changed the title Metadata for stride inference Use GetMetaData for stride computation Jul 26, 2023
Base automatically changed from kernel_inputs_executor to main July 26, 2023 19:05
// Forward traverse from rFactor domain to allocation domain, compute frontier
// sizes and strides, validate that splits are divisible and merges are
// contiguous, and update active_ids_ correspondingly.
class ForwardTraverseFromRFactorToAlloc {
Collaborator Author

Moved to tensor_metadata.cpp unchanged.

};

// Similar to ForwardTraverseFromRFactorToAlloc, but in the opposite direction.
class BackwardTraverseFromRFactorToAlloc {
Collaborator Author

Moved to tensor_metadata.cpp unchanged.

// is [I1, I2], and the tensor's size is [15] and stride is [7], and the extent
// of I2 is 5, then the resulting size will be [3, 5] and stride will be [35, 7]
std::vector<std::pair<int64_t, int64_t>>
inferAndValidateAllocationSizesAndStrides(
Collaborator Author

Moved to tensor_metadata.cpp slightly changed.

@zasdfgbnm
Collaborator Author

!build

@zasdfgbnm zasdfgbnm marked this pull request as ready for review July 28, 2023 06:21
@zasdfgbnm zasdfgbnm requested a review from jacobhinkle July 28, 2023 06:21
@zasdfgbnm zasdfgbnm assigned naoyam and unassigned naoyam Jul 28, 2023
@zasdfgbnm zasdfgbnm requested a review from naoyam July 28, 2023 06:22
@zasdfgbnm
Collaborator Author

!build

Collaborator

@jacobhinkle jacobhinkle left a comment

LGTM. The changes are mostly mechanical, due to the name change and moving code. I made one note of a no-longer-used method and member of TensorArg, but since you'll be removing all ArgAbstract subclasses soon anyway, cleaning up the interface is unimportant.

TORCH_INTERNAL_ASSERT(
(size_t)instance_.nAllocationDims() == sizes_strides.size());
for (auto i : c10::irange((int64_t)sizes_strides.size())) {
alloc_sizes.at(i) = sizes_strides.at(i).first;
Collaborator

Since this is removed, where does alloc_sizes get set now?

Collaborator

OK, I see; it's not used anymore since it is all done in the metadata. So this means we can also get rid of alloc_sizes along with getAllocSize().

Collaborator Author

Yes, this change breaks TensorArg, but since it is no longer used, it does not matter that it is broken.

@zasdfgbnm zasdfgbnm merged commit 87f046d into main Jul 28, 2023
@zasdfgbnm zasdfgbnm deleted the metadata-for-stride-inference branch July 28, 2023 15:00
@naoyam
Copy link
Collaborator

naoyam commented Jul 28, 2023

@zasdfgbnm Could you remind me again what the logical stride means? Is that just the strides computed from the logical sizes of a logically contiguous tensor?

@zasdfgbnm
Copy link
Collaborator Author

zasdfgbnm commented Jul 28, 2023

Logical stride is just the stride in terms of the rFactor domain, that is, the "raw" strides from PyTorch's tensor.stride(). In comparison, alloc_stride is the stride in terms of the allocation domain, which we compute from the logical stride.
