-
Notifications
You must be signed in to change notification settings - Fork 3.8k
[BugFix] resolve integer 32. ~ 64. mismatch by casting #9582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Mousius
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ganler,
Could you please construct a test case representing this case?
|
It is a bit strange that this change seems to influence meta scheduler's results (according to CI). I am not sure if those test oracles are related to correctness. |
|
TVM has a pretty fragile system of using i32 vs i64; I personally experienced it a few times before... The meta schedule unittest is about extracting tasks from Relay. If it fails, it's probably because the number of graph partitions in the relay lowering pipeline goes wrong. |
|
Sorry, I think my prior idea is bad. I came up with a more compatible fix that simply matches key-value types in ir substitution. Thanks @junrushao1994 for the suggestions. |
In most cases I think it's fine since when we call That said, in IR substitution, it is ill-formed to have |
|
@YuchenJin @junrushao1994 Would you help review this PR if you are available (The CI finally works...) |
|
Hi @ganler, thanks for the work! Is this bug found by your fuzzer? :) For the regression test case, it is now implemented as compiling an e2e relay model. If this bug is specific to a pass (layout transformation pass in this case), it might be more clear and effective to write a unit test for the layout transformation pass alone. |
|
This bug is found by using an e2e model generated by PyTorch. For simplicity I converted it to Relay. I think I have located the root cause which is that the implementation of expression substitution did not make sure the type match. :-) |
|
I will try only using one specific pass to minimize the unit test. Thank you for the suggestion! @YuchenJin |
|
@YuchenJin It seems that this bug is triggered by the combination of 2 passes after my minimization (the de facto order of O3 optimizaiton): with tvm.transform.PassContext(opt_level=3):
with tvm.target.Target('llvm'):
mod = relay.transform.CanonicalizeOps()(mod)
mod = relay.transform.AlterOpLayout()(mod) |
YuchenJin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ganler, thanks for narrowing down the cause of this bug. LGTM and thanks for digging into it!
Environment & Flags
uname -a:Linux ise-manjaro 5.10.70-1-MANJARO #1 SMP PREEMPT Thu Sep 30 15:29:01 UTC 2021 x86_64 GNU/LinuxBug Description
When compiling this model, TVM tends to do some type (shape) inference related to NCHWc. In NCHWc->NCHW shape inference,
c + Cwill be evaluated.However, model shape axis values are int64 but some other values initialized by default is int32. So here's a data type mismatch. This data type mismatch will cause TVM to fail in
ExprMutatorsince binary operators require matched types. for operands.Fix
Initialize those artificial constants as int64 instead of int32 (
PrimExpr(0)will be regarded as int32.).Full failure log [click to expand]
cc: @YuchenJin @junrushao1994