
[FeatureRequest] codegen reshape/view on python API #22

@jjsjann123

Description


Background

reshape/view in nvfuser does not imply a memory alias, so we'll refer to the operation as reshape throughout this issue to keep the conversation simple and accurate.

nvfuser reshape is implemented by translating it into a series of keep, merge, and split transformations:

Fuser/csrc/ops/alias.cpp

Lines 20 to 63 in 86d5dd3

//! Transform TensorView according to keep, merge, and split transformations.
//! Squeeze and broadcast transformations are handled separately.
//! It is recommended to use the composite ops view function, which will call
//! the analyzeView function to generate the appropriate transformations.
//!
//! For example:
//! original sizes = [2, 10, 40]
//! new_size = [2, 10, 2, 20]
//! auto analysis = analyzeView(TV0, original_sizes, new_sizes)
//! auto TV1 = TV0->view(analysis.transforms);
//!
//! Transforms = [(Keep I0), (Keep I1), (Split I2 by 2)]
//! Before: TV0[I0, I1, I2]
//! After: TV0[I0, I1, 2, ceilDiv(I2, 2)]
//!
//! orig_tv is the tensor view originally coming in from the user for the view
//! operation. This is the tensor view all of the view analysis is relative to.
//! View might be doing squeezes before sending into the view operation, so we
//! want the actual input to the view operation to be potentially after the
//! original view operation.
TensorView* applyViewTransforms(
    TensorView* orig_tv,
    TensorView* post_reduce_tv,
    const AnalyzeViewResult& view_analysis) {
  TORCH_INTERNAL_ASSERT(orig_tv != nullptr, "Input is invalid.");
  TORCH_INTERNAL_ASSERT(post_reduce_tv != nullptr, "Input is invalid.");
  TORCH_INTERNAL_ASSERT(
      !post_reduce_tv->hasComputeAt(),
      "Cannot modify rfactor domain after compute at has been set.");
  TORCH_INTERNAL_ASSERT(
      post_reduce_tv->nDims() > 0, "Tried to view a 0-dim TensorView");
  TORCH_INTERNAL_ASSERT(!view_analysis.transforms.empty());
  TensorView* consumer = IrBuilder::create<TensorView>(
      orig_tv->container(),
      orig_tv->domain()->view(view_analysis),
      orig_tv->getDataType().value());
  IrBuilder::create<ViewOp>(orig_tv->container(), consumer, post_reduce_tv);
  return consumer;
}
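To make the keep/merge/split decomposition concrete, here is a small Python sketch of how a transform list can be derived from an original and a new shape, mirroring the example in the comment above. The helper name `analyze_view` and the greedy matching strategy are illustrative assumptions, not the real `analyzeView`, which works on the fusion IR and also handles squeeze/broadcast cases.

```python
def analyze_view(original_sizes, new_sizes):
    """Greedily match static sizes into keep/split/merge transforms (sketch)."""
    transforms = []
    i = j = 0  # i walks original dims, j walks new dims
    while i < len(original_sizes) and j < len(new_sizes):
        if original_sizes[i] == new_sizes[j]:
            transforms.append(("keep", i))
            i += 1
            j += 1
        elif original_sizes[i] > new_sizes[j]:
            # split: one original dim is factored into several new dims
            prod, factors = new_sizes[j], [new_sizes[j]]
            j += 1
            while prod < original_sizes[i]:
                factors.append(new_sizes[j])
                prod *= new_sizes[j]
                j += 1
            assert prod == original_sizes[i], "sizes are not view-compatible"
            transforms.append(("split", i, factors))
            i += 1
        else:
            # merge: several original dims collapse into one new dim
            prod, group = original_sizes[i], [i]
            i += 1
            while prod < new_sizes[j]:
                prod *= original_sizes[i]
                group.append(i)
                i += 1
            assert prod == new_sizes[j], "sizes are not view-compatible"
            transforms.append(("merge", group))
            j += 1
    return transforms
```

For the documented example, `analyze_view([2, 10, 40], [2, 10, 2, 20])` yields two keeps and one split of the last dimension, matching `[(Keep I0), (Keep I1), (Split I2 by 2)]`.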

nvfuser reshape support in TorchScript

Currently we rely on runtime checks to ensure that the reshape parsing, i.e. the ViewOp in the fusion, is still semantically correct. This works fine for our TorchScript integration, where we can rely on a guard operator that queries the backend API

auto new_constraints = nvfuser::analyzeViewConstraint(
    tensor_sizes_int_vec, view_sizes_int_vec);

and rejects the fusion when the constraints no longer hold.
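In spirit, such a guard records a constraint derived at compile time and re-derives it from the actual sizes at run time, rejecting the fusion on mismatch. The sketch below is a hypothetical shapes-only stand-in; the real `analyzeViewConstraint` computes a much richer constraint from the view analysis.

```python
class ViewGuard:
    """Toy guard: accept a call only if it falls in the same 'shape class'."""

    def __init__(self, tensor_sizes, view_sizes):
        # constraint captured when the fusion was first compiled
        self.constraint = self._constraint(tensor_sizes, view_sizes)

    @staticmethod
    def _constraint(tensor_sizes, view_sizes):
        # Hypothetical stand-in: the shape class is which input dims are
        # size-1 (alias/squeeze hazards) plus the requested output rank.
        return (tuple(s == 1 for s in tensor_sizes), len(view_sizes))

    def check(self, tensor_sizes, view_sizes):
        # run-time re-derivation; mismatch means "reject the fusion"
        return self._constraint(tensor_sizes, view_sizes) == self.constraint
```

For example, a guard built for `[2, 10, 40] -> [2, 10, 2, 20]` would still accept `[4, 10, 40] -> [4, 10, 2, 20]`, but reject `[1, 10, 40]`, where the leading size-1 dimension changes how the view must be lowered.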

python API and cache

This workflow is harder to do with our python integration, though. There are a few reasons:

  1. The lack of shape inference in our python API makes it tricky to validate the runtime tensor shapes flowing into reshape ops.
  2. The FusionRecord design assumes that each leaf node in the trie structure maps to a single, unique fusion object. If a reshape node in FusionRecord could lower to different fusions depending on input shapes, supporting that would mean some nasty patching of the design. cc'ing @kevinstephano @jacobhinkle for reference.
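The cache assumption can be pictured with a toy trie (hypothetical, not the real FusionCache): each recorded op is an edge, and a leaf holds exactly one compiled fusion. A shape-dependent reshape breaks the invariant that one leaf equals one fusion.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.fusion = None  # leaf invariant: at most one compiled fusion


class FusionCache:
    """Toy trie cache keyed purely on the recorded op sequence."""

    def __init__(self):
        self.root = TrieNode()

    def get_or_create(self, record):
        # record is a tuple of hashable op descriptors, e.g.
        # (("add",), ("reshape", (-1, 20)))
        node = self.root
        for op in record:
            node = node.children.setdefault(op, TrieNode())
        if node.fusion is None:
            node.fusion = ("compiled", record)  # placeholder for compilation
        return node.fusion
```

The problem: a record like `("reshape", (-1, 20))` reaches exactly one leaf, yet the correct lowering may differ between input shapes such as `(2, 10, 40)` and `(1, 10, 40)`, so a single cached fusion object per leaf is no longer sufficient.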

current plan

IIUC, we are moving forward with more plumbing to support our reshape logic in the python API; a few on-going items (cc'ing @csarofeen @naoyam for reference):

  • @naoyam is working on more APIs to make expression evaluation accessible from the python API, so that we'll be able to infer input shapes to reshape ops.
  • We are plumbing nvfuser::analyzeViewConstraint into our cache system, so that the inferred shapes can be used to select the right fusion object.

This is a lot of refactoring that needs to happen for the new workflow to work. It feels like we are doing quite a lot of plumbing on the codegen as well as the python API side just to mimic a reshape op in the codegen.
But in the end, we are not doing anything more than a decomposition. A decomposition would be much easier to perform and validate at program acquisition time. IIUC, the missing piece that stops us from doing that is just shape inference in our integration.
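As a sketch of what decomposition at acquisition time could look like: with known input shapes, the reshape output shape, including `-1` wildcard inference, can be computed while the trace is being built, with no backend expression evaluation in the loop. The helper name below is hypothetical, not an existing frontend API.

```python
import math


def infer_reshape_shape(input_shape, new_shape):
    """Resolve a reshape target shape at trace time (sketch).

    Supports at most one -1 wildcard, inferred from the remaining
    element count, mirroring the usual reshape semantics.
    """
    numel = math.prod(input_shape)
    known = 1
    wildcard = None
    for idx, s in enumerate(new_shape):
        if s == -1:
            assert wildcard is None, "at most one -1 is allowed"
            wildcard = idx
        else:
            known *= s
    out = list(new_shape)
    if wildcard is not None:
        assert numel % known == 0, "shapes are not reshape-compatible"
        out[wildcard] = numel // known
    assert math.prod(out) == numel, "element counts must match"
    return tuple(out)
```

With this in hand, `infer_reshape_shape((2, 10, 40), (2, 10, -1, 20))` resolves to `(2, 10, 2, 20)` at acquisition time, which is exactly the information the decomposition and the cache lookup need.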

I know this is mostly a design decision, and we are pushing to expose nvfuser expression evaluation through client-facing APIs. I'm not sure we can really expect expression evaluation to replace a shape inference mechanism in our integration, simply because nvfuser op coverage is limited, and because of the awkward program flow where expression evaluation only becomes available after we have a fusion IR.
