PR #7348 removed `broadcast_to` before `batch_matmul` because `batch_matmul` already supports implicit broadcasting. However, the cuBLAS implementation was not updated accordingly, which causes the following case to fail:
```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

sa = (4, 128, 768)
sb = (1, 768, 768)
a = relay.var("a", shape=sa)
b = relay.var("b", shape=sb)
c = relay.nn.batch_matmul(a, b)
f = relay.Function([a, b], c)
mod = tvm.ir.IRModule.from_expr(f)
mod = relay.transform.InferType()(mod)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda")  # changing target to "cuda -libs=cublas" fails
ctx = tvm.gpu(0)
m = graph_runtime.GraphModule(lib["default"](ctx))
p = np.random.uniform(0, 1, sa)
q = np.random.uniform(0, 1, sb)
m.set_input("a", p)
m.set_input("b", q)
ftimer = m.module.time_evaluator("run", ctx, number=1, repeat=10)
prof_res = np.array(ftimer().results) * 1000
print(np.mean(prof_res))
```

I guess we need to either add the `broadcast_to` back or support implicit broadcasting in the cuBLAS implementation.
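For reference, the implicit broadcasting the repro relies on can be sketched in plain NumPy (a hedged illustration of the intended semantics, not TVM's implementation; it assumes `nn.batch_matmul`'s convention of computing `A @ B^T` per batch, with a batch dimension of 1 broadcast against the other operand):

```python
import numpy as np

# Shapes from the repro above: batch 4 vs. batch 1.
a = np.random.uniform(0, 1, (4, 128, 768)).astype("float32")
b = np.random.uniform(0, 1, (1, 768, 768)).astype("float32")

# Explicit broadcast_to: what the Relay graph contained before PR #7348.
b_explicit = np.broadcast_to(b, (4, 768, 768))
out_explicit = np.matmul(a, b_explicit.transpose(0, 2, 1))

# Implicit broadcasting: np.matmul broadcasts the batch dimension itself,
# which is what batch_matmul is expected to do after the PR.
out_implicit = np.matmul(a, b.transpose(0, 2, 1))

assert out_implicit.shape == (4, 128, 768)
assert np.allclose(out_explicit, out_implicit)
```

Both paths produce identical results, which is why dropping the explicit `broadcast_to` is valid for backends that honor the broadcast; the cuBLAS path currently does not.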