
Some misc expr simplification #105

Merged
zasdfgbnm merged 15 commits into main from more-expr-simplification on Mar 31, 2023
Conversation

@zasdfgbnm (Collaborator, author) commented Mar 30, 2023

  • Add a new namespace assume, which contains utilities to generate boolean expressions stating things we want to assume.
  • Simplification: max(a, b) -> a if a >= b, and min(a, b) -> b if a >= b.
  • Positivity proof of ceilDiv.
  • Extend stupid_simple_compiler in test_expr_simplifier.cpp to support function calls.

This PR allows me to simplify the parallel dimension from

max(ceilDiv(T0.size[0], 128), ceilDiv(T0.size[0], 128) * 4, 4)

into

ceilDiv(T0.size[0], 128) * 4
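The chain of reasoning behind this simplification can be sketched numerically. This is a minimal illustration, not nvFuser's actual simplifier; `ceilDiv` is reimplemented here for the example, and `simplifiedParallelDim` and `checkSimplification` are hypothetical helper names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative reimplementation of ceilDiv for positive integers.
int64_t ceilDiv(int64_t a, int64_t b) {
  return (a + b - 1) / b;
}

// The positivity proof gives ceilDiv(size, 128) >= 1 for size >= 1, so
// d * 4 dominates both d and the constant 4. By the new rule
// max(a, b) -> a if a >= b, the whole max collapses to d * 4.
int64_t simplifiedParallelDim(int64_t size) {
  return ceilDiv(size, 128) * 4;
}

// Numeric spot-check that the simplified form matches the original max.
bool checkSimplification(int64_t size) {
  int64_t d = ceilDiv(size, 128);
  int64_t unsimplified = std::max({d, d * 4, int64_t{4}});
  return unsimplified == simplifiedParallelDim(size);
}
```

Note that the collapse is only sound because all three operands can be ordered under the positivity assumption; without a proof that `ceilDiv(size, 128) >= 1`, the max cannot be simplified.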

@zasdfgbnm changed the title from "More expr simplification" to "Some misc expr simplification" on Mar 30, 2023
// constant folding
ASSERT_TRUE(simplifyExpr("ceilDiv( 5 , 3 ) * 5"_)->sameAs("10"_));

ASSERT_TRUE(simplifyExpr("1 * i"_)->sameAs("i"_));
@zasdfgbnm (Collaborator, author) commented:

I replaced some TORCH_CHECK calls here with ASSERT_TRUE.

@zasdfgbnm zasdfgbnm requested a review from naoyam March 30, 2023 23:56
// Simplify dim_map_
for (auto& [k, v] : dim_map_) {
v = simplifyExpr(v);
// Well, this isn't really correct, but we need this assumption to better
Collaborator commented:

But doesn't this actually break if empty tensors are given?

I know we haven't been very strict with empty tensors, so it's highly likely other things would also break with them; this isn't a new issue.

Maybe we want to add a property to Kernel asserting there's no empty tensor? Similar to what we do with the index type, we could lower a fusion with an optional assertion (based on actual input tensor sizes) that it is safe to assume no tensor is empty. That seems to work well enough, doesn't it?

@zasdfgbnm (Collaborator, author) commented:

Hmm, I don't know if anything really breaks today. If a tensor size is 0 and it appears in the parallel dim, this usually means we want to launch 0 blocks or 0 threads, which in turn usually means all the fusion's outputs are empty tensors, and lowering is already skipped in that case. What I am really worried about is this: if some extent is T0.size[0] + 1, then an empty input does not imply an empty output. I don't think we currently have such a schedule, but it might become a pitfall in the future.

Regarding the property for non-empty tensors in Kernel, I am not sure. Isn't that a big project to implement? For example, with such a property, you would need to decide whether to set it true or false so that you can enable as many simplifications as possible. And doesn't that require modifying our cache system to treat 0 as a special number (just like how we treat 1 as broadcast), so that if I see an empty tensor, we make sure we don't use the cache entry built for non-empty tensors?
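The pitfall with extents like T0.size[0] + 1 can be made concrete with a toy model (hypothetical helper names, not actual nvFuser IR): for a plain tensor-size extent, an empty input gives an empty output, but for an affine extent the output extent stays nonzero even when the input is empty, so skipping the launch would drop real output elements.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of two kinds of extents. An extent that is a direct tensor
// size is zero for an empty tensor; an affine extent like size + 1 is not.
int64_t directExtent(int64_t size) { return size; }
int64_t shiftedExtent(int64_t size) { return size + 1; }

// "Empty input implies empty output" holds only when the extent vanishes
// at size == 0.
bool emptyInputImpliesEmptyOutput(int64_t (*extent)(int64_t)) {
  return extent(/*size=*/0) == 0;
}
```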

Collaborator commented:

Oh, I didn't notice that this only applies to vals that represent parallel dimensions. In that case, I see no concern.

Collaborator commented:

> Regarding the property for non-empty tensors in Kernel, I am not sure. Isn't that a big project to implement? For example, with such a property, you would need to decide whether to set it true or false so that you can enable as many simplifications as possible. And doesn't that require modifying our cache system to treat 0 as a special number (just like how we treat 1 as broadcast), so that if I see an empty tensor, we make sure we don't use the cache entry built for non-empty tensors?

I don't think that's too different from what we already do with the index type. Granted, we do have a couple of issues with it, but all we would need is to assume tensors can be empty by default and allow the non-empty-tensor assumption only via an optional flag that is part of CompileOptions. Not necessary for this PR, but I think this would fit well, following the same approach as the index type specialization.

@zasdfgbnm (Collaborator, author) commented:

Let's leave this PR as is and figure out our strategy when we see a bug with empty tensors. Matmul strongly relies on the assumption that tensors are not empty, but I don't see any other case where this is important. Still, I feel weird about the logic "if matmul, then assume nonempty; otherwise don't." Whether to assume nonempty should be based on the inputs, not on whether the fusion is a matmul.

}
}
Bool* result = nullptr;
for (auto ts : tensor_sizes) {
Collaborator commented:

tensor_sizes can redundantly contain tensor-size named scalars. Are you assuming the redundancy will be eliminated by the expr simplification?

@zasdfgbnm (Collaborator, author) commented:

Thanks for catching this! They will not be eliminated by the expr simplifier, because the assumptions argument is not simplified. I don't think this will cause any issues, but we should still get rid of it. I will update this PR.
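One way to drop the redundant size scalars before building the assumptions, sketched here with plain strings standing in for nvFuser's named scalars (a hypothetical `deduplicate` helper, not the code in this PR):

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Deduplicate tensor-size scalars by name, keeping first-seen order so the
// generated assumptions stay deterministic.
std::vector<std::string> deduplicate(
    const std::vector<std::string>& tensor_sizes) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> unique;
  for (const auto& ts : tensor_sizes) {
    // insert(...).second is true only the first time a name is seen.
    if (seen.insert(ts).second) {
      unique.push_back(ts);
    }
  }
  return unique;
}
```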

@zasdfgbnm (Collaborator, author) commented:

Fixed.

@zasdfgbnm zasdfgbnm requested a review from naoyam March 31, 2023 20:40
@naoyam (Collaborator) left a review:

LGTM

@zasdfgbnm zasdfgbnm merged commit 680195d into main Mar 31, 2023
@zasdfgbnm zasdfgbnm deleted the more-expr-simplification branch March 31, 2023 21:13
zasdfgbnm added a commit that referenced this pull request Apr 1, 2023
This PR is stacked on #105; it fixes #95.

There are two main effective changes:
- In `lower_scalar_hoist.cpp`, add assumptions about parallel indices
- In `predicate_compute.cpp`, check whether the extent is the same as the parallel
dimension

Either change alone is sufficient to fix the bug, but I did both to be doubly safe.

cc: @mmigdal-nv @drzejan2