Skip to content

Conversation

@anijain2305
Copy link
Contributor

@anijain2305 anijain2305 commented Oct 2, 2018

The following PR accomplishes 2 items in the context of loop partitioning pass

  • Removing if conditions from the AST if they can be removed
  • Prev implementation was not partitioning loops on multiple loop variables because the other loop variable candidates were becoming stale as we partition for just one loop variable

Illustrative example
Input

realize compute([0, 60]) {
  produce compute {
    for (i.outer, 0, 4) {
      for (i.inner, 0, 16) {
        if (likely(((i.inner + (i.outer*16)) < 60))) {
          if (likely(((i.inner + (i.outer*16)) < 60))) {
            compute((i.inner + (i.outer*16))) =(A((i.inner + (i.outer*16))) + B((i.inner + (i.outer*16))))
          }
        }
      }
    }
  }
}

Output

// attr [compute(compute, 0x7f83a88706c0)] realize_scope = ""
realize compute([0, 60]) {
  produce compute {
    for (i.outer, 0, 3) {
      for (i.inner, 0, 16) {
        compute((i.inner + (i.outer*16))) =(A((i.inner + (i.outer*16))) + B((i.inner + (i.outer*16))))
      }
    }
    for (i.inner, 0, 12) {
      compute((i.inner + 48)) =(A((i.inner + 48)) + B((i.inner + 48)))
    }
  }
}


Example with multiple if conditions
Input

// attr [compute(compute, 0x7ffd9e4a0970)] realize_scope = ""
realize compute([0, 94], [0, 62]) {
  produce compute {
    for (i.outer, 0, 6) {
      for (j.outer, 0, 4) {
        for (i.inner, 0, 16) {
          for (j.inner, 0, 16) {
            if (likely(((i.inner + (i.outer*16)) < 94))) {
              if (likely(((i.inner + (i.outer*16)) < 94))) {
                if (likely(((j.inner + (j.outer*16)) < 62))) {
                  if (likely(((j.inner + (j.outer*16)) < 62))) {
                    compute((i.inner + (i.outer*16)), (j.inner + (j.outer*16))) =(A((i.inner + (i.outer*16)), (j.inner + (j.outer*16))) + B((i.inner + (i.outer*16)), (j.inner + (j.outer*16))))
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Output

// attr [compute(compute, 0x7ffd9e4a0970)] realize_scope = ""
realize compute([0, 94], [0, 62]) {
  produce compute {
    for (i.outer, 0, 5) {
      for (j.outer, 0, 3) {
        for (i.inner, 0, 16) {
          for (j.inner, 0, 16) {
            compute((i.inner + (i.outer*16)), (j.inner + (j.outer*16))) =(A((i.inner + (i.outer*16)), (j.inner + (j.outer*16))) + B((i.inner + (i.outer*16)), (j.inner + (j.outer*16))))
          }
        }
      }
      for (i.inner, 0, 16) {
        for (j.inner, 0, 14) {
          compute((i.inner + (i.outer*16)), (j.inner + 48)) =(A((i.inner + (i.outer*16)), (j.inner + 48)) + B((i.inner + (i.outer*16)), (j.inner + 48)))
        }
      }
    }
    for (j.outer, 0, 3) {
      for (i.inner, 0, 14) {
        for (j.inner, 0, 16) {
          compute((i.inner + 80), (j.inner + (j.outer*16))) =(A((i.inner + 80), (j.inner + (j.outer*16))) + B((i.inner + 80), (j.inner + (j.outer*16))))
        }
      }
    }
    for (i.inner, 0, 14) {
      for (j.inner, 0, 14) {
        compute((i.inner + 80), (j.inner + 48)) =(A((i.inner + 80), (j.inner + 48)) + B((i.inner + 80), (j.inner + 48)))
      }
    }
  }
}

@anijain2305
Copy link
Contributor Author

@xqdan @tqchen @ZihengJiang

@anijain2305
Copy link
Contributor Author

Ping

@xqdan
Copy link
Contributor

xqdan commented Oct 9, 2018

We are testing for this patch, will give feedback soon.

@anijain2305
Copy link
Contributor Author

Thanks @xqdan, please feel free to provide more testcases if something is of your interest and doesn't work with this patch.

@xqdan
Copy link
Contributor

xqdan commented Oct 9, 2018

@xqdan
Copy link
Contributor

xqdan commented Oct 9, 2018

more cases:
test_cce_loop_1: fail to partition
test_cce_loop_2: ok
test_cce_loop_3: fail to partition

import tvm

def test_cce_loop_1():
  ib = tvm.ir_builder.create()
  len = 112
  tile = 32
  loop = (len + tile - 1) // tile
  with ib.for_range(0, loop, 'i') as i:
    head = i * tile
    tail = tvm.select(head + tile > len, len, head + tile)
    ib.emit(tvm.call_extern('float32', "cce_intrisic", head, tail))

  stmt = ib.get()
  stmt = tvm.ir_pass.LoopPartition(stmt, True)
  stmt = tvm.ir_pass.Simplify(stmt)
  print stmt

def test_cce_loop_2():
  ib = tvm.ir_builder.create()
  len = 112
  tile = 32
  loop = (len + tile - 1) // tile
  with ib.for_range(0, loop, 'i') as i:
    head = i * tile
    with ib.if_scope(ib.likely(head + tile > len)):
      tail = len
      ib.emit(tvm.call_extern('float32', "cce_intrisic", head, tail))
    with ib.else_scope():
      tail = head + tile
      ib.emit(tvm.call_extern('float32', "cce_intrisic", head, tail))

  stmt = ib.get()
  stmt = tvm.ir_pass.LoopPartition(stmt, True)
  stmt = tvm.ir_pass.Simplify(stmt)
  print stmt

def test_cce_loop_3():
  ib = tvm.ir_builder.create()
  len = 112
  tile = 32
  loop = (len + tile - 1) // tile
  with ib.for_range(0, loop, 'i') as i:
    head = i * tile
    with ib.if_scope(ib.likely(head + tile > len)):
      tail = len
      with ib.if_scope(ib.likely(i > 0)):
        init = 0
        ib.emit(tvm.call_extern('float32', "cce_intrisic", head, tail, init))
      with ib.else_scope():
        init = 1
        ib.emit(tvm.call_extern('float32', "cce_intrisic", head, tail, init))
    with ib.else_scope():
      tail = head + tile
      with ib.if_scope(ib.likely(i > 0)):
        init = 0
        ib.emit(tvm.call_extern('float32', "cce_intrisic", head, tail, init))
      with ib.else_scope():
        init = 1
        ib.emit(tvm.call_extern('float32', "cce_intrisic", head, tail, init))

  stmt = ib.get()
  stmt = tvm.ir_pass.LoopPartition(stmt, True)
  stmt = tvm.ir_pass.Simplify(stmt)
  print stmt

if __name__ == "__main__":
  test_cce_loop_1()
  test_cce_loop_2()
  test_cce_loop_3()

@anijain2305
Copy link
Contributor Author

@xqdan Thanks for providing the test cases. Current situation is

  1. https://discuss.tvm.ai/t/simplify-if-stmt-within-for/830 --> If you add ib.likely in the conditions, it gets partitioned. Currently, the loop partitioning only works for likely primitives.

  2. cce_loop_1 --> select not supported for loop partitioning

  3. cce_loop_2 --> Seems good

  4. cce_loop_3 --> Will need to dig deeper. I think 2 conditions dependent on i back to back is throwing code off.

  with ib.for_range(0, loop, 'i') as i:
    head = i * tile
    with ib.if_scope(ib.likely(head + tile > len)):

@RichardFido
Copy link

hi,
here's another case as bellow fail to partition, please consider:

import tvm

def test():
ib = tvm.ir_builder.create()
loop1=4
loop2=9998
tile=39991
with ib.for_range(0,loop2,'i') as i:
with ib.for_range(0,loop1,'j') as j:
head1 = i
head2 = j
with ib.if_scope(ib.likely(head1loop1 < tile - head2)):
with ib.if_scope(ib.likely(head1
loop1 < tile - head2)):
ib.emit(tvm.call_extern('float16',"cce_intrinsic",head1))

stmt = ib.get()
print stmt
print "==="
stmt = tvm.ir_pass.LoopPartition(stmt,True)
print stmt

test()

Stmt post_body = Substitute(body, {{Var{var}, var + post_doubt_begin}});
post_stmt = MakeFor(node, max - post_doubt_begin + 1, post_body);
Stmt post_body;
// If the loop is going from 0 to 1, then replace the loop var with min value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove then

@anijain2305
Copy link
Contributor Author

@RichardFido Can you please edit the code to print it nicely? It doesn't compile as of now

@RichardFido
Copy link

sorry about that. @anijain2305
please check it again:


def test():
    ib = tvm.ir_builder.create()
    loop1 = 4 
    loop2 = 9998
    tile = 39991
    with ib.for_range(0,loop2,'i') as i:
        with ib.for_range(0,loop1,'j') as j:
            head1 = i 
            head2 = j 
            with ib.if_scope(ib.likely(head1*loop1 < tile - head2)):
                with ib.if_scope(ib.likely(head1*loop1 < tile - head2)):
                    ib.emit(tvm.call_extern('float16',"cce_intrisic",head1))

    stmt = ib.get()
    stmt = tvm.ir_pass.LoopPartition(stmt,True)
    print stmt

test()

if (*as_const_int(max) == *as_const_int(post_doubt_begin)) {
post_body = Substitute(body, {{Var{var}, post_doubt_begin}});
post_stmt = post_body;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just trying to understand, why this need to be handled separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example usecase is as follows. This is what statement looks in between the CLP is running

for (i = 0, 1)
  for(j = 0, 8)
    if(i + j < 4)
      Do_something

If the loop bounds are going from 0 to 1, then its better to replace i with 0 because then it simplifies the conditions, resulting in better partitioning.

@anijain2305
Copy link
Contributor Author

@RichardFido The example doesn't work because of the rigid structure of how bounds are deduced for the conditionals. I do not understand that portion of the code. There is an easy workaround for now.

Line causing issue - with ib.if_scope(ib.likely(head1*loop1 < tile - head2)):
Replace  - with ib.if_scope(ib.likely(head1*loop1 + head2 < tile)):

If the constant is not present on the right side alone, then the deduce_bound function does not work correctly. Solving for any combination of conditions will require decent changes in deduce_bounds functions or Simplify pass. I do not plan to work on that in this PR.

@xqdan
Copy link
Contributor

xqdan commented Oct 29, 2018

I suggest we can merge this PR first, we can send more PR to support other cases someone may need.

@tqchen
Copy link
Member

tqchen commented Oct 29, 2018

OK, reviewer, please take another round of look and https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

@tqchen tqchen merged commit dd1558a into apache:master Oct 30, 2018
@tqchen
Copy link
Member

tqchen commented Oct 30, 2018

THanks @xqdan @MarisaKirisame @anijain2305 @yzhliu this is now merged

@anijain2305 anijain2305 deleted the conditional_loop_partitioning branch June 14, 2019 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants