Conversation
```cpp
// constant folding
ASSERT_TRUE(simplifyExpr("ceilDiv( 5 , 3 ) * 5"_)->sameAs("10"_));

ASSERT_TRUE(simplifyExpr("1 * i"_)->sameAs("i"_));
```
I replaced some TORCH_CHECK here with ASSERT_TRUE.
```cpp
// Simplify dim_map_
for (auto& [k, v] : dim_map_) {
  v = simplifyExpr(v);
// Well, this isn't really correct, but we need this assumption to better
```
But doesn't this actually break if empty tensors are given?
I know we haven't been very strict with empty tensors, so it's highly likely other things would also break with empty tensors; in that sense this isn't a new issue.
Maybe we want to add a property to Kernel asserting there's no empty tensor? Similar to what we do with the index type, we could lower a fusion with an optional assertion (based on the actual input tensor sizes) that it is safe to assume no tensor is empty. That seems to work well enough, doesn't it?
Hmm, I don't know if anything really breaks today. If a tensor size is 0 and it appears in a parallel dim, this usually means we want to launch 0 blocks or 0 threads, which in turn usually means all of the fusion's outputs are empty tensors, and the lowering of that case is already skipped. What I am really worried about is: if some extent is `T0.size[0] + 1`, then an empty input does not imply an empty output. I don't think we currently have such a schedule, but it might become a pitfall in the future.
Regarding the non-empty-tensor property in Kernel, I am not sure. Isn't that a big project to implement? For example, with such a property you need to know whether to set it to true or false so that you can enable as many simplifications as possible. And doesn't that require modifying our cache system to handle 0 as a special number (just like how we treat 1 as broadcast)? If we see an empty tensor, we need to make sure we don't use the cache entry compiled for non-empty tensors.
Oh, I didn't pay close attention to the fact that this is only for vals that represent parallel dimensions. Yes, in that case I see no concern.
> Regarding the property for non-empty tensor in Kernel, I am not sure. But isn't that a big project to implement it? For example, if you have such a property, you need to know whether to set it as true or false so that you can enable more simplifications as much as possible. And doesn't that requires modifying our cache system to handle 0 as a special number (just like how we treated 1 as broadcast), and if I see an empty tensor, we need to make sure we don't use the cache for non-empty tensor?
I don't think that's too different from what we already do with the index type. Granted, we do have a couple of issues with it, but all we would need to do is assume tensors can be empty by default and allow the non-empty assumption only via an optional flag that would be part of CompileOptions. Not necessary for this PR, but I think this approach would fit well, following the same pattern as the index-type specialization.
Let's leave this PR as is and figure out our strategy when we actually see a bug with empty tensors. Matmul strongly relies on the assumption that tensors are not empty, but I don't see any other case where this matters. Still, I feel uneasy about the logic "if matmul, then assume non-empty, otherwise don't". Whether to assume non-empty should be based on the inputs, not on whether the fusion is a matmul.
```cpp
  }
}
Bool* result = nullptr;
for (auto ts : tensor_sizes) {
```
`tensor_sizes` can redundantly contain the same tensor-size named scalar multiple times. Are you assuming the redundancy will be eliminated by the expr simplification?
Thanks for catching this! They will not be eliminated by the expr simplifier, because the assumptions argument is not simplified. I don't think this will cause any issue, but we should still get rid of the redundancy. I will update this PR.
This PR is stacked on #105; it fixes #95. There are mainly two effective changes:
- In `lower_scalar_hoist.cpp`, add assumptions about parallel indices.
- In `predicate_compute.cpp`, check if the extent is the same as the parallel dimension.

Either change alone is sufficient to fix the bug, but I did both to be doubly safe. cc: @mmigdal-nv @drzejan2
Other changes in this PR:
- Add `assume`, which contains utilities to generate boolean expressions stating certain things we want to assume.
- Simplify `max(a, b) -> a` if `a >= b`, and `min(a, b) -> b` if `a >= b`.
- Constant folding for `ceilDiv`.
- Update `stupid_simple_compiler` in `test_expr_simplifier.cpp` to support function call.

This PR allows me to simplify the parallel dimension from

into