compute_with() and async() don't work well together #5179

@vksnk

The following example produces incorrect results:

Func producer1("producer1"), producer2("producer2"), consumer("consumer");
Var x("x"), y("y");

producer1(x, y) = x + y;
producer2(x, y) = 3 * x + 2 * y;
consumer(x, y) = producer1(x, y - 1) + producer1(x, y + 1) + producer2(x, y - 1) + producer2(x, y + 1);
consumer.compute_root();

producer1.compute_root().async();
producer2.compute_root().compute_with(producer1, y);

consumer.bound(x, 0, 16).bound(y, 0, 16);
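
For reference, the expected output can be worked out by hand: the two producer1 taps sum to 2x + 2y and the two producer2 taps sum to 6x + 4y, so consumer(x, y) should equal 8x + 6y. A minimal plain C++ sketch of that reference follows. The helper names (`producer1_ref`, `producer2_ref`, `consumer_ref`, `count_mismatches`) and the row-major 16x16 layout are assumptions made for illustration; they are not part of Halide or of the original report.

```cpp
#include <cstddef>
#include <vector>

// Plain C++ mirrors of the two producers from the report.
constexpr int producer1_ref(int x, int y) { return x + y; }
constexpr int producer2_ref(int x, int y) { return 3 * x + 2 * y; }

// Mirror of the consumer definition; algebraically this is 8 * x + 6 * y.
constexpr int consumer_ref(int x, int y) {
    return producer1_ref(x, y - 1) + producer1_ref(x, y + 1) +
           producer2_ref(x, y - 1) + producer2_ref(x, y + 1);
}

// Counts elements of a 16x16 result that differ from the reference.
// Assumption: `out` is row-major, i.e. element (x, y) lives at out[y * 16 + x].
inline int count_mismatches(const std::vector<int> &out) {
    int mismatches = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            if (out[static_cast<std::size_t>(y) * 16 + x] != consumer_ref(x, y)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Copying the realized output into such a vector and calling `count_mismatches` would give a nonzero count for the failing schedules and zero for a correct one.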

From the IR, it looks like the problem is that there are two separate allocations for producer2, one used by the producer and one by the consumer, so the consumer ends up reading data that was never written.
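
The failure mode can be illustrated outside Halide with a small plain C++ analogy. This is a sketch of the symptom only, not of the generated code: the function name `unwritten_seen_by_consumer`, the flag `share_allocation`, and the sentinel `kUnwritten` are hypothetical, and the sentinel stands in for uninitialized memory so that the behavior is deterministic.

```cpp
#include <cstddef>
#include <vector>

// Sentinel marking elements that were never written (stand-in for garbage).
constexpr int kUnwritten = -999999;

// Returns how many elements the consumer side sees as never written.
// share_allocation == false models the reported bug: the producer writes
// into its own allocation while the consumer reads a different one.
inline int unwritten_seen_by_consumer(bool share_allocation) {
    const std::size_t size = 16 * 18;
    std::vector<int> consumer_side(size, kUnwritten);     // allocation the consumer reads
    std::vector<int> producer_private(size, kUnwritten);  // second allocation of the same buffer

    // The producer fills whichever allocation it was handed.
    std::vector<int> &target = share_allocation ? consumer_side : producer_private;
    for (std::size_t i = 0; i < size; i++) {
        target[i] = static_cast<int>(i);
    }

    // The consumer always reads its own allocation.
    int unwritten = 0;
    for (int v : consumer_side) {
        if (v == kUnwritten) {
            unwritten++;
        }
    }
    return unwritten;
}
```

With separate allocations every element the consumer reads is still the sentinel; with a shared allocation none are.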

Similarly, the result is also incorrect if async() is applied to both producers:

producer1.compute_root().async();
producer2.compute_root().compute_with(producer1, y).async();

This one is even more interesting, because it ends up with an almost empty fork block for producer2:

 fork {
  allocate producer2[int32 * 16 * 18]
  produce producer1 {
   for (producer1.s0.fused.y, -1, 18) {
    for (producer1.s0.x, 0, 16) {
     producer1[((producer1.s0.fused.y*16) + producer1.s0.x) + 16] = producer1.s0.fused.y + producer1.s0.x
    } // for producer1.s0.x
    let t8 = producer1.s0.fused.y*16
    let t7 = producer1.s0.fused.y*2
    for (producer2.s0.x, 0, 16) {
     producer2[(producer2.s0.x + t8) + 16] = (producer2.s0.x*3) + t7
    } // for producer2.s0.x
   } // for producer1.s0.fused.y
   free producer2
   halide_semaphore_release(producer1.semaphore_0, 1)
  }
 } {
  allocate producer2[int32 * 16 * 18]
  let producer2.semaphore_0 = (halide_semaphore_t *)alloca(16)
  halide_semaphore_init(producer2.semaphore_0, 0)
  fork {
   produce producer2 {
    halide_semaphore_release(producer2.semaphore_0, 1)
   }
  } {
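
As a side note on reading the IR above: the fused y loop starts at -1 with extent 18 because the consumer reads y - 1 and y + 1 over y in [0, 16), and the stores add 16 so that row -1 lands at offset 0 of the 16 * 18 allocation. A minimal sketch of that index arithmetic follows; the function name `producer_flat_index` and the constants are hypothetical names introduced here, with values taken from the IR.

```cpp
#include <cassert>

constexpr int kWidth = 16;    // extent of x in the IR
constexpr int kYMin = -1;     // loop min of producer1.s0.fused.y
constexpr int kYExtent = 18;  // loop extent of producer1.s0.fused.y

// Flat index used by the stores in the IR: (y * 16 + x) + 16.
inline int producer_flat_index(int x, int y) {
    assert(x >= 0 && x < kWidth);
    assert(y >= kYMin && y < kYMin + kYExtent);
    return (y * kWidth + x) + 16;
}
```

Under these assumptions the first element (x = 0, y = -1) maps to index 0 and the last one (x = 15, y = 16) maps to 16 * 18 - 1, so the stores exactly cover the allocation.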

The only schedule that works correctly is the one where async() is applied only to producer2 (though I am not sure this holds if the fused group has more than two functions):

producer1.compute_root();
producer2.compute_root().compute_with(producer1, y).async();
