Consider the following program, in which we split a loop by a factor of 48 and then split the resulting inner loop (which has extent 48) by a factor of 32.
import tvm

m = 100
A = tvm.placeholder((m,), name='A')
C = tvm.compute((m,), lambda *index: A(*index), name='C')
s = tvm.create_schedule(C.op)
# Split the axis by 48, then split the resulting inner loop (extent 48) by 32.
do, di = s[C].split(C.op.axis[0], 48)
di0, di1 = s[C].split(di, 32)
print(tvm.lower(s, [A, C], simple_mode=True))

The generated Halide IR is:
produce C {
  for (i0.outer, 0, 3) {
    for (i0.inner.outer, 0, 2) {
      for (i0.inner.inner, 0, 32) {
        if (likely(((i0.outer*48) < ((100 - i0.inner.inner) - (i0.inner.outer*32))))) {
          C[(((i0.outer*48) + (i0.inner.outer*32)) + i0.inner.inner)] = A[(((i0.outer*48) + (i0.inner.outer*32)) + i0.inner.inner)]
        }
      }
    }
  }
}
The generated code is functionally correct, but it is inefficient: some points of the iteration space are visited more than once. When i0.outer is 0, the assignment executes for points [0-63]; when i0.outer is 1, for points [48-99]; and when i0.outer is 2, for points [96-99]. Points [48-63] and [96-99] are therefore each written twice, so the loop nest performs 120 assignments for 100 elements.
Adding a predicate that relates i0.inner.outer and i0.inner.inner (i.e., i0.inner.outer*32 + i0.inner.inner < 48) would solve the problem, because it confines each i0.outer iteration to its own 48-element range.
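The overlap can be checked without TVM by replaying the lowered loop nest in plain Python. This is only a sketch of the iteration space, not TVM code: the function name `visit_counts` and the `extra_predicate` flag are made up for this illustration, and the loop extents are assumed to be the ceiling divisions seen in the IR above (3 and 2).

```python
from collections import Counter

M = 100            # extent of the original axis
OUTER_FACTOR = 48  # first split factor
INNER_FACTOR = 32  # second split factor


def visit_counts(extra_predicate):
    """Replay the lowered loop nest and count writes per output index.

    extra_predicate is a hypothetical switch for this sketch: when True,
    the proposed check (i0.inner.outer*32 + i0.inner.inner < 48) is applied
    in addition to the bound check that the IR already contains.
    """
    counts = Counter()
    outer_extent = -(-M // OUTER_FACTOR)                  # ceil(100 / 48) = 3
    inner_outer_extent = -(-OUTER_FACTOR // INNER_FACTOR)  # ceil(48 / 32) = 2
    for outer in range(outer_extent):                  # i0.outer
        for inner_outer in range(inner_outer_extent):  # i0.inner.outer
            for inner_inner in range(INNER_FACTOR):    # i0.inner.inner
                offset = inner_outer * INNER_FACTOR + inner_inner
                index = outer * OUTER_FACTOR + offset
                if index >= M:
                    continue  # existing predicate: stay inside the tensor
                if extra_predicate and offset >= OUTER_FACTOR:
                    continue  # proposed predicate: stay inside the 48-wide tile
                counts[index] += 1
    return counts


current = visit_counts(extra_predicate=False)
proposed = visit_counts(extra_predicate=True)

duplicated = sorted(i for i, n in current.items() if n > 1)
print("current: ", sum(current.values()), "writes, duplicated:", duplicated)
print("proposed:", sum(proposed.values()), "writes, max per index:",
      max(proposed.values()))
```

Under these assumptions the replay reports 120 writes for the current nest, with indices 48-63 and 96-99 written twice, and exactly 100 writes (one per index) once the extra predicate is applied.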