[Tensorize] Fix tensorize error while reusing compute #7879
Conversation
I agree with you. Not only in this example; we can avoid reusing in any case. But fundamentally, reusing has never been a special case for expression analysis or tensorize matching. Should TVM complain about that usage? If reusing should be avoided, I think we still need a clearer error message to tell the user how to correct the wrong code.
@llehtahw @comaniac The modification does not consider this reusing case. :( Substituting the LHS (aka the provide) should solve the reusing case. The reason is that after the Normalize step, IterVar `i` is rebased, but only the RHS (aka the intrin) has been updated.

```cpp
// src/te/operation/tensorize.cc:330
PrimExpr lhs = ana.Simplify(Substitute(body[i], value_map));
```

I think it is okay to reuse compute, since this is a way to reuse compute concepts to describe the behavior of a HW intrinsic. Actually, we also do some reusing, but we don't use the same compute directly; what we do is as follows.
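A minimal plain-Python sketch (not TVM internals) of why substituting only one side breaks the match after an IterVar is rebased: the `Var`/`Add` classes and the `substitute` helper below are hypothetical stand-ins for TVM's expression nodes and its `Substitute` pass.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    a: object
    b: object

def substitute(expr, vmap):
    """Recursively replace Vars according to vmap (structural rewrite)."""
    if isinstance(expr, Var):
        return vmap.get(expr, expr)
    if isinstance(expr, Add):
        return Add(substitute(expr.a, vmap), substitute(expr.b, vmap))
    return expr

i = Var("i")
body = Add(i, Var("x"))        # compute body shared by the op and the intrin

# Normalize rebases i -> i.rebased, but only the intrin side is updated.
vmap = {i: Var("i.rebased")}
rhs = substitute(body, vmap)   # intrin body after rebase
lhs = body                     # provide body, not substituted

print(lhs == rhs)                    # False: structural match fails
print(substitute(lhs, vmap) == rhs)  # True: substituting the LHS too restores it
```

The point is only that rewriting one side of a shared body desynchronizes a structural comparison; substituting both sides with the same map restores it.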
@llehtahw Thanks for pointing this out. :) One more thing: why not use `call_extern`, etc., to achieve the same goal?
What I need to explain further is that in our case we use compute to define the HW intrinsic; it may be the same as for operators such as element-wise ops. But it is not the same compute (I mean the same addresses of all nodes), only the same concept. :) For example, an Add op has 4 dims (i0, i1, i2, i3); the three innermost axes (i1, i2, i3) will be split, and then after reorder, the region of the inner axes (i1.i, i2.i, i3.i) will be tensorized to an Intrin_Add. In this situation, the Add op and the Intrin_Add share the compute. So, in the reusing case, it is more natural to
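A plain-Python sketch of the loop transformation described above: the three innermost axes of a 4-D Add are split, and the inner (i1.i, i2.i, i3.i) triple loop is the region that tensorize would replace with a single Intrin_Add call. The shape and split factors here are made up for illustration.

```python
import itertools

shape = (2, 4, 4, 4)
f1, f2, f3 = 2, 2, 2  # hypothetical split factors for i1, i2, i3

visited = set()
for i0 in range(shape[0]):
    for i1o in range(shape[1] // f1):
        for i2o in range(shape[2] // f2):
            for i3o in range(shape[3] // f3):
                # This inner region is what tensorize would hand to Intrin_Add.
                for i1i in range(f1):
                    for i2i in range(f2):
                        for i3i in range(f3):
                            visited.add((i0,
                                         i1o * f1 + i1i,
                                         i2o * f2 + i2i,
                                         i3o * f3 + i3i))

# The split/reordered loops cover exactly the original iteration space.
full = set(itertools.product(*(range(s) for s in shape)))
print(visited == full)  # True
```

This shows why the op and the intrinsic can legitimately share one compute concept: the split only re-indexes the same iteration space.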
In fact, my example was just drafted for showing the
Actually, I use
+1 Thanks for your explanation @leeexyz
After #7497, this snippet fails:

When using the same compute body in both `decl_tensor_intrin` and `create_schedule`, after this `Substitute` some nodes' addresses changed, and then the check failed. I haven't figured out why this happens when reusing the compute body, but I think we can drop the comparison if the two exprs are exactly the same one.
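A hedged sketch of the proposed fix idea: skip the structural comparison entirely when the two expression nodes are literally the same object (Python `is` here; in TVM's C++ this would be a same-node/pointer check). `Expr`, `exprs_match`, and `structurally_equal` are hypothetical stand-ins, not TVM APIs.

```python
class Expr:
    """Stand-in for an expression node (identity matters, like a C++ pointer)."""
    def __init__(self, desc):
        self.desc = desc

def structurally_equal(lhs, rhs):
    # Placeholder for a deep comparison (TVM simplifies, then compares).
    return lhs.desc == rhs.desc

def exprs_match(lhs, rhs):
    if lhs is rhs:  # same node reused in op and intrin: trivially equal
        return True
    return structurally_equal(lhs, rhs)

shared = Expr("A[i] + B[i]")
print(exprs_match(shared, shared))              # True via the identity shortcut
print(exprs_match(Expr("A[i]"), Expr("B[i]")))  # False
```

The identity check sidesteps the address-change problem entirely for the reused-compute case, since no rewrite of either side can desynchronize an expression from itself.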