
[FeatureRequest] codegen reshape/view on python API #22

@jjsjann123

Description


Background

reshape/view in nvfuser does not imply a memory alias, so we'll refer to the operation as reshape throughout this issue to keep the conversation simple and accurate.

nvfuser reshape is implemented by translating it into a series of keep, merge, and split transformations:

Fuser/csrc/ops/alias.cpp

Lines 20 to 63 in 86d5dd3

//! Transform TensorView according to keep, merge, and split transformations.
//! Squeeze and broadcast transformations are handled separately.
//! It is recommended to use the composite ops view function, which will call
//! the analyzeView function to generate the appropriate transformations.
//!
//! For example:
//! original sizes = [2, 10, 40]
//! new_size = [2, 10, 2, 20]
//! auto analysis = analyzeView(TV0, original_sizes, new_sizes)
//! auto TV1 = TV0->view(analysis.transforms);
//!
//! Transforms = [(Keep I0), (Keep I1), (Split I2 by 2)]
//! Before: TV0[I0, I1, I2]
//! After: TV0[I0, I1, 2, ceilDiv(I2, 2)]
//!
//! orig_tv is the tensor view originally coming in from the user for the view
//! operation. This is the tensor view all of the view analysis is relative to.
//! View might be doing squeezes before sending into the view operation, so we
//! want the actual input to the view operation to be potentially after the
//! original view operation.
TensorView* applyViewTransforms(
    TensorView* orig_tv,
    TensorView* post_reduce_tv,
    const AnalyzeViewResult& view_analysis) {
  TORCH_INTERNAL_ASSERT(orig_tv != nullptr, "Input is invalid.");
  TORCH_INTERNAL_ASSERT(post_reduce_tv != nullptr, "Input is invalid.");
  TORCH_INTERNAL_ASSERT(
      !post_reduce_tv->hasComputeAt(),
      "Cannot modify rfactor domain after compute at has been set.");
  TORCH_INTERNAL_ASSERT(
      post_reduce_tv->nDims() > 0, "Tried to view a 0-dim TensorView");
  TORCH_INTERNAL_ASSERT(!view_analysis.transforms.empty());
  TensorView* consumer = IrBuilder::create<TensorView>(
      orig_tv->container(),
      orig_tv->domain()->view(view_analysis),
      orig_tv->getDataType().value());
  IrBuilder::create<ViewOp>(orig_tv->container(), consumer, post_reduce_tv);
  return consumer;
}
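To make the keep/merge/split decomposition concrete, here is a small Python sketch of how a transform list can be derived from an original and a new shape, mirroring the example in the comment above. The helper name `analyze_view` and the greedy matching strategy are illustrative assumptions, not the real `analyzeView`, which works on the fusion IR and also handles squeeze/broadcast cases.

```python
def analyze_view(original_sizes, new_sizes):
    """Greedily match static sizes into keep/split/merge transforms (sketch)."""
    transforms = []
    i = j = 0  # i walks original dims, j walks new dims
    while i < len(original_sizes) and j < len(new_sizes):
        if original_sizes[i] == new_sizes[j]:
            transforms.append(("keep", i))
            i += 1
            j += 1
        elif original_sizes[i] > new_sizes[j]:
            # split: one original dim is factored into several new dims
            prod, factors = new_sizes[j], [new_sizes[j]]
            j += 1
            while prod < original_sizes[i]:
                factors.append(new_sizes[j])
                prod *= new_sizes[j]
                j += 1
            assert prod == original_sizes[i], "sizes are not view-compatible"
            transforms.append(("split", i, factors))
            i += 1
        else:
            # merge: several original dims collapse into one new dim
            prod, group = original_sizes[i], [i]
            i += 1
            while prod < new_sizes[j]:
                prod *= original_sizes[i]
                group.append(i)
                i += 1
            assert prod == new_sizes[j], "sizes are not view-compatible"
            transforms.append(("merge", group))
            j += 1
    return transforms
```

For the documented example, `analyze_view([2, 10, 40], [2, 10, 2, 20])` yields two keeps and one split of the last dimension, matching `[(Keep I0), (Keep I1), (Split I2 by 2)]`.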

nvfuser reshape support in TorchScript

Currently we rely on runtime checks to ensure that the reshape parsing, i.e. the ViewOp in the fusion, is still semantically correct. This works fine for our TorchScript integration, where we can rely on a guard operator that queries the backend API

auto new_constraints = nvfuser::analyzeViewConstraint(
    tensor_sizes_int_vec, view_sizes_int_vec);

and rejects the fusion when the constraints no longer hold.
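In spirit, such a guard records a constraint derived at compile time and re-derives it from the actual sizes at run time, rejecting the fusion on mismatch. The sketch below is a hypothetical shapes-only stand-in; the real `analyzeViewConstraint` computes a much richer constraint from the view analysis.

```python
class ViewGuard:
    """Toy guard: accept a call only if it falls in the same 'shape class'."""

    def __init__(self, tensor_sizes, view_sizes):
        # constraint captured when the fusion was first compiled
        self.constraint = self._constraint(tensor_sizes, view_sizes)

    @staticmethod
    def _constraint(tensor_sizes, view_sizes):
        # Hypothetical stand-in: the shape class is which input dims are
        # size-1 (alias/squeeze hazards) plus the requested output rank.
        return (tuple(s == 1 for s in tensor_sizes), len(view_sizes))

    def check(self, tensor_sizes, view_sizes):
        # run-time re-derivation; mismatch means "reject the fusion"
        return self._constraint(tensor_sizes, view_sizes) == self.constraint
```

For example, a guard built for `[2, 10, 40] -> [2, 10, 2, 20]` would still accept `[4, 10, 40] -> [4, 10, 2, 20]`, but reject `[1, 10, 40]`, where the leading size-1 dimension changes how the view must be lowered.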

python API and cache

This workflow is harder to do with our python integration, though. There are a few reasons:

  1. The lack of shape inference in our python API makes it tricky to validate the runtime tensor shapes flowing into reshape ops.
  2. The FusionRecord design assumes that each leaf node in the trie structure maps to a single, unique fusion object. If a reshape node in FusionRecord could lower to different fusions depending on input shapes, supporting that would mean some nasty patching of the design. cc'ing @kevinstephano @jacobhinkle for reference.
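The cache assumption can be pictured with a toy trie (hypothetical, not the real FusionCache): each recorded op is an edge, and a leaf holds exactly one compiled fusion. A shape-dependent reshape breaks the invariant that one leaf equals one fusion.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.fusion = None  # leaf invariant: at most one compiled fusion


class FusionCache:
    """Toy trie cache keyed purely on the recorded op sequence."""

    def __init__(self):
        self.root = TrieNode()

    def get_or_create(self, record):
        # record is a tuple of hashable op descriptors, e.g.
        # (("add",), ("reshape", (-1, 20)))
        node = self.root
        for op in record:
            node = node.children.setdefault(op, TrieNode())
        if node.fusion is None:
            node.fusion = ("compiled", record)  # placeholder for compilation
        return node.fusion
```

The problem: a record like `("reshape", (-1, 20))` reaches exactly one leaf, yet the correct lowering may differ between input shapes such as `(2, 10, 40)` and `(1, 10, 40)`, so a single cached fusion object per leaf is no longer sufficient.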

current plan

IIUC, we are moving forward with more plumbing to support our reshape logic in the python API; a few on-going items (cc'ing @csarofeen @naoyam for reference):

  • @naoyam is working on more APIs to make expression evaluation accessible from the python API, so that we'll be able to infer input shapes to reshape ops.
  • We are plumbing nvfuser::analyzeViewConstraint into our cache system, so that the inferred shapes can be used to select the right fusion object.

This is a lot of refactoring that needs to happen for the new workflow to work. It feels like we are doing quite a lot of plumbing on the codegen as well as the python API side just to mimic a reshape op in the codegen.
But in the end, we are not doing anything more than a decomposition. A decomposition would be much easier to perform and validate at program acquisition time. IIUC, the missing piece that stops us from doing that is just shape inference in our integration.
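As a sketch of what decomposition at acquisition time could look like: with known input shapes, the reshape output shape, including `-1` wildcard inference, can be computed while the trace is being built, with no backend expression evaluation in the loop. The helper name below is hypothetical, not an existing frontend API.

```python
import math


def infer_reshape_shape(input_shape, new_shape):
    """Resolve a reshape target shape at trace time (sketch).

    Supports at most one -1 wildcard, inferred from the remaining
    element count, mirroring the usual reshape semantics.
    """
    numel = math.prod(input_shape)
    known = 1
    wildcard = None
    for idx, s in enumerate(new_shape):
        if s == -1:
            assert wildcard is None, "at most one -1 is allowed"
            wildcard = idx
        else:
            known *= s
    out = list(new_shape)
    if wildcard is not None:
        assert numel % known == 0, "shapes are not reshape-compatible"
        out[wildcard] = numel // known
    assert math.prod(out) == numel, "element counts must match"
    return tuple(out)
```

With this in hand, `infer_reshape_shape((2, 10, 40), (2, 10, -1, 20))` resolves to `(2, 10, 2, 20)` at acquisition time, which is exactly the information the decomposition and the cache lookup need.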

I know this is mostly a design decision, and we are pushing to expose nvfuser expression evaluation through client-facing APIs. I'm not sure we can really expect expression evaluation to replace a shape inference mechanism in our integration, simply because nvfuser op coverage is limited, and because of the awkward program flow where expression evaluation only becomes available after we have a fusion IR.
