[Unity][Transform] Introduce data-dependent operation of reshape and its constant folding #14282
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot
please update via
```python
def test_op_tensor_to_shape():
    out_shape = run_cpu(
        TensorToShapeTest, "run_tensor_to_shape", tvm.nd.array(np.array([1, 2, 3]).astype("int64"))
    )
```
I think the demonstrated behavior is correct. We are converting the rank-1 tensor [1, 2, 3] into a shape (1, 2, 3). I think the problem here is that ndim means different things for tensors (where it means the rank) and shapes (where it means the number of dimensions denoted by the shape).
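A minimal illustration of that distinction using the Relax struct-info constructors (the variable names here are illustrative):

```python
from tvm import relax

# The rank-1 int64 tensor holding [1, 2, 3]: its ndim is the tensor's rank.
tensor_sinfo = relax.TensorStructInfo(shape=[3], dtype="int64")  # ndim == 1
# The shape (1, 2, 3) it converts to: its ndim counts the dimensions it denotes.
shape_sinfo = relax.ShapeStructInfo(ndim=3)                      # ndim == 3
```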
```c++
ICHECK(call->args.size() == 1);
ICHECK(call->args[0]->struct_info_.defined());
const auto* tsinfo = GetStructInfoAs<TensorStructInfoNode>(call->args[0]);
ICHECK(tsinfo && tsinfo->shape.defined());
```
I think this condition is more restrictive than necessary. It should be okay for the shape not to be known at compile time, returning ShapeStructInfo(ndim=-1) when the input rank/shape is unknown.
Edit: I guess you use the rank in vm_builtin_lower.cc, but I don't see why it has to be implemented that way. You could check all the necessary properties dynamically (inside the builtin).
Do you mean these lines?

```c++
// define symbolic variables
Array<PrimExpr> shape_var;
for (int i = 0; i < sinfo->ndim; i++) {
  shape_var.push_back(tir::Var("x", DataType::Int(64)));
}
```
Initially, I also wanted to support unknown rank, but it turned out to be trickier than I thought.
The problem is that these symbolic variables need to be inserted at compile time, so we need this info.
What exactly are the new shape vars needed for?
To MatchCast and put them in the ShapeExpr.

```c++
// define symbolic variables
Array<PrimExpr> shape_var;
for (int i = 0; i < sinfo->ndim; i++) {
  shape_var.push_back(tir::Var("x", DataType::Int(64)));
}
// bind symbolic variables to the shape tuple
relax::Var var("y", ShapeStructInfo(shape_var));
builder_->EmitNormalized(MatchCast(var, call, ShapeStructInfo(shape_var)));
return ShapeExpr(shape_var);
```
That much I understand; what I am more curious about is why we need to construct this shape expression.
Why does lowering require the dimensions to be bound to a variable? That doesn't seem like a restriction that needs to exist.
@slyubomirsky To pre-allocate memory for TIR functions? TE compute takes the output shape as `Array<PrimExpr>` in its first argument:

```c++
Tensor compute(Array<PrimExpr> shape, FCompute fcompute, std::string name, std::string tag,
               Map<String, ObjectRef> attrs);
```
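As a concrete illustration on the Python side (a minimal sketch; the names `n`, `A`, `B` are illustrative), `te.compute` needs every output dimension as a `PrimExpr` up front, which is why the lowering has to materialize symbolic variables at compile time:

```python
import tvm
from tvm import te

n = te.var("n")  # symbolic dimension, analogous to tir::Var("x") above
A = te.placeholder((n,), dtype="float32", name="A")
# The output shape must be known (at least symbolically) when compute is called.
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
```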
Great discussion, folks! Looks like the limitation for shape_expr comes from TE, interesting.
Hmmmm, I think we should put this on the list to fix later. Let's please note this limitation in a comment, since it is not obvious from looking at the code.
Added this in the comment. Would it be good to go now?
```c++
TVM_REGISTER_GLOBAL("vm.builtin.tensor_to_shape").set_body_typed([](NDArray data) {
  NDArray arr = data;
  // Copy the shape tensor to the host so its values can be read.
  if (data->device.device_type != kDLCPU) {
    arr = data.CopyTo(DLDevice{kDLCPU, 0});
  }
```
Does this copy fail if the NDArray is on another device? I'm a little hesitant to have an op that just will not work depending on the device.
Is it possible that we have a shape tensor that is not on the host device?
Not sure where this might come up. I wouldn't be surprised if, say, we import an ONNX model (this is where this use-case first came up) that we mean to run on GPU and every tensor (including those that stand for shapes) is stored on GPU.
Based on my current understanding, we want to keep shape-related computation on the host side since its result can feed into memory planning. Even for general GPU execution beyond this case, we run the output shape computation (inserted by the VMShapeLower pass) on the host side.
This PR extends existing
Hzfengsy left a comment
LGTM except for a naming issue
```c++
  Map<Expr, Expr> batch_norm_map_;
};

class OpDecomposer : public ExprMutator {
```
The name OpDecomposer is too generic since it's only for tensor_to_shape.
We can extend this class to add other composite ops like erf and attention, which will be upcoming soon.
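For a sense of that extension pattern, here is a hedged sketch using the Python ExprMutator bindings (the class name and the rewrite body are illustrative, not the PR's actual code):

```python
import tvm
from tvm import relax
from tvm.relax.expr_functor import mutator, PyExprMutator

@mutator
class CompositeOpDecomposer(PyExprMutator):
    """Rewrites composite ops (tensor_to_shape, and later erf, attention, ...)
    into lower-level constructs, mirroring the C++ OpDecomposer above."""

    def visit_call_(self, call: relax.Call) -> relax.Expr:
        call = self.visit_expr_post_order(call)
        if call.op == tvm.ir.Op.get("relax.tensor_to_shape"):
            # Emit MatchCast to fresh symbolic vars and return the ShapeExpr,
            # as in the C++ snippet discussed earlier in this thread.
            ...
        return call
```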
@sunggg Hi, actually we also do some changes in
@yongwww @slyubomirsky would you take another look?
@sunggg it would be good to rebase it via
yongwww left a comment
LGTM, thanks for addressing my concerns!
slyubomirsky left a comment
My concerns have been addressed, thank you
Thanks @sunggg for the thoughtful design and quick implementation!
…its constant folding (#14282)

* FEAT: Support data-dependent operation of reshape
* FEAT: Support constant folding with data-dependent reshape
* fix
* remove empty line
* reflect feedback
* Lift the lowering of tensor_to_shape from builtin to DecomposeCompositeOps pass
* fix and comment
* fix
* add comments
* reflect feedback
* add comment
* fix
PR apache/tvm#14282 refactors the pass `SimplifyNorm` to `DecomposeOps`. The last rebase leaves conflicts to be fixed, and this PR merges apache/tvm#14282 and #162 together. The changes mainly include:

- func pass -> module pass (because sometimes we don't want to simplify all functions in a module)
- Add a `mode` argument to indicate whether it is a training simplification or an eval simplification.
As discussed in the previous Unity open dev meeting, this PR implements the data-dependent operation of reshape.
Representative examples
This PR realizes the data-dependent operation of reshape as follows.
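A minimal sketch of the intended pattern (illustrative shapes and names; the exact TVMScript spelling may differ slightly from the PR's tests): the target shape arrives as a runtime tensor, is converted with `tensor_to_shape`, bound to symbolic dimensions via `match_cast`, and then fed to `reshape`.

```python
import tvm
from tvm.script import relax as R, tir as T

@tvm.script.ir_module
class Module:
    @R.function
    def main(x: R.Tensor((6,), "float32"), t: R.Tensor((2,), "int64")):
        m = T.int64()
        n = T.int64()
        # Convert the runtime tensor holding the target shape into a Shape value.
        s = R.tensor_to_shape(t)
        # Bind the (unknown) shape values to the symbolic vars m, n.
        s2 = R.match_cast(s, R.Shape([m, n]))
        # reshape now accepts a Var bound to a ShapeExpr (per this PR).
        y = R.reshape(x, s2)
        return y
```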
Also, `FoldConstant` is extended to support `tensor_to_shape`.

Summary of changes

- Introduce the builtin `tensor_to_shape`.
- `reshape` currently takes `ShapeExpr | Array[PrimExpr]`. This PR extends it to take a `Var`, but only when it is bound to a `ShapeExpr`.
- Extend the `FoldConstant` pass to support `tensor_to_shape`.
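For the constant-folding part, a hedged sketch of what the extension enables (module contents are illustrative): when the shape tensor is a compile-time constant, `FoldConstant` can replace the `tensor_to_shape` call with a concrete shape.

```python
import numpy as np
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class Before:
    @R.function
    def main(x: R.Tensor((6,), "float32")):
        # The shape tensor is a compile-time constant here.
        s = R.tensor_to_shape(R.const(np.array([2, 3], dtype="int64")))
        return s

# After folding, s should become the concrete shape (2, 3).
After = relax.transform.FoldConstant()(Before)
```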
03/18/2023 Update

It turned out that if we implement `tensor_to_shape` as a builtin, its lowering happens too late and we cannot lower the following `reshape`-like ops, since they need at least a symbolic shape to legalize. Therefore, `tensor_to_shape` should be lowered before we lower `reshape`. The new change extends the existing `SimplifyNormInference` pass to serve as a generic pass that decomposes composite operators like `tensor_to_shape`, `attention`, `erf`, etc.

Follow-up PRs