Ignore trivial loops in memory aliasing pass #766
As I mentioned in the original PR, can you please make sure there's no invalid |
We should never get an invalid access due to |
naoyam
left a comment
Feel free to merge once the check with the generated code is done
|
Manually checked diffs in codegen. There is still quite a bit of non-determinism, which I believe is coming from
I think this is safe to merge. |
Did you see diffs with the benchmarks or the tests, or both? Last time I checked I didn't see any diff with the benchmarks. I did see some minor diffs with a few benchmarks. |
|
Hmm, could you please create an issue with a repro? |

Trivial `kir::ForLoop`s are ones that appear in the kernel IR but do not appear in the generated CUDA kernel. This can happen for a number of reasons: for example, if that dimension is vectorized, or if it is parallelized with a stop value equal to the extent of a dimension. We can test this with `kir::ForLoop::isTrivial()`. Consider an example:

In this case, all of the parallelized for loops are trivial, and only the `FOR i806` loop appears in the generated code. That means the actual lifetimes of T7 and T8 overlap, and those of T8 and T9 overlap, but not those of T7 and T9.

In the aliasing pass, we define outer live intervals as those at the scope of the allocation. In the above case, it will set the outer live interval of all three allocations equal to the start and end of the `blockIdx.x` loop.

This PR ignores trivial loops in this analysis, so that outer live intervals are defined at the scope that will be realized in the CUDA kernel at the level of the `Allocate` expression. In the above example, this means the outer live intervals for T7 and T9 will no longer overlap, so they are eligible for memory re-use.
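To make the effect on interval overlap concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this PR and does not use nvFuser's types: `Loop`, `Use`, `Interval`, `widen`, `outerLiveInterval`, `overlaps`, and `exampleIntervals` are all hypothetical names, and the expression positions are made up. It assumes that each use of a buffer is widened to the outermost loop that lies below the allocation scope, and that the fix amounts to skipping loops flagged as trivial when picking that loop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a kernel IR loop; NOT nvFuser's kir::ForLoop.
struct Loop {
  int start;     // position of the first expression in the loop body
  int end;       // position of the last expression in the loop body
  bool trivial;  // true if the loop is not emitted in the generated kernel
};

struct Interval {
  int first;
  int last;
};

// One read or write of a buffer: its position, plus the loops that enclose
// the use but not the buffer's allocation, ordered outermost to innermost.
struct Use {
  int pos;
  std::vector<Loop> loops_below_alloc;
};

// Widen a use to the outermost loop below the allocation scope.
// With ignore_trivial set, loops flagged as trivial are skipped.
Interval widen(const Use& use, bool ignore_trivial) {
  for (const Loop& loop : use.loops_below_alloc) {
    if (ignore_trivial && loop.trivial) continue;
    return {loop.start, loop.end};
  }
  return {use.pos, use.pos};  // no (non-trivial) loop: just the use itself
}

// Union of the widened intervals of all uses of one buffer.
Interval outerLiveInterval(const std::vector<Use>& uses, bool ignore_trivial) {
  assert(!uses.empty());
  Interval result = widen(uses.front(), ignore_trivial);
  for (const Use& use : uses) {
    const Interval w = widen(use, ignore_trivial);
    result.first = std::min(result.first, w.first);
    result.last = std::max(result.last, w.last);
  }
  return result;
}

bool overlaps(const Interval& a, const Interval& b) {
  return a.first <= b.last && b.first <= a.last;
}

// Hypothetical schedule: positions 0..9 sit inside one trivial parallel loop,
// and positions 3..6 sit inside a serial loop that is actually emitted.
std::vector<Interval> exampleIntervals(bool ignore_trivial) {
  const Loop par{0, 9, true};
  const Loop ser{3, 6, false};
  const std::vector<Use> a{{1, {par}}, {4, {par, ser}}};
  const std::vector<Use> b{{4, {par, ser}}, {7, {par}}};
  const std::vector<Use> c{{7, {par}}, {8, {par}}};
  return {outerLiveInterval(a, ignore_trivial),
          outerLiveInterval(b, ignore_trivial),
          outerLiveInterval(c, ignore_trivial)};
}
```

With `ignore_trivial` set to `false`, every buffer in this toy schedule is widened to the whole parallel loop, so all three intervals overlap. With it set to `true`, the first and third buffers no longer overlap, which mirrors the pattern described for T7 and T9 above.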