[Unity] Implement LowerAllocTensor to remove R.builtin.alloc_tensor #15809
Conversation
passes.append(relax.transform.LowerAllocTensor())

if tvm.transform.PassContext.current().config.get("relax.backend.use_cuda_graph", False):
    passes.append(relax.transform.RewriteCUDAGraph())
Did you look into what assumptions relax.transform.RewriteCUDAGraph makes about static vs. dynamic allocations, to ensure the expectations match before and after this change? RewriteCUDAGraph has some dependence on alloc_tensor.
Thank you for the reminder. I had assumed that the tests in tests/python/relax/test_transform_rewrite_cuda_graph.py and tests/python/relax/test_vm_cuda_graph.py would be sufficient. On closer inspection, it turns out that the former provides the input to RewriteCUDAGraph, the latter checks the output from RewriteCUDAGraph, and neither tests the behavior of the pass as it exists within a lowering flow. After writing a quick end-to-end test, there is an issue that occurs within the CUDA graph rewriting pass.
It appears to be a bug in RewriteCUDAGraph, which occurs when an R.memory.alloc_storage result is then used in a trivial var-to-var rebinding.
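To make the failing pattern concrete, here is a minimal sketch on a toy binding list (the names and tuple structure are illustrative, not TVM's real IR). A rewrite that only checks the directly-bound variable misses the alias as a handle to the storage; one that follows trivial rebindings finds it:

```python
# Toy illustration of the aliasing pattern (illustrative data structures,
# not TVM code): a storage allocation followed by a trivial var-to-var
# rebinding, which a pass must resolve to find the underlying allocation.
bindings = [
    ("storage", "R.memory.alloc_storage", ()),
    ("alias", "var", ("storage",)),  # trivial rebinding: alias = storage
    ("tensor", "R.memory.alloc_tensor", ("alias",)),
]

def storage_vars(bindings):
    """Collect variables that transitively refer to a storage allocation,
    following trivial var-to-var rebindings."""
    known = set()
    for var, op, args in bindings:
        if op == "R.memory.alloc_storage":
            known.add(var)
        elif op == "var" and args[0] in known:
            known.add(var)  # the alias refers to the same storage
    return known
```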
This bug ended up being trickier to track down than I had expected. It will be much simpler to solve after #15810 lands, since the .kill_* methods won't yet have been inserted. For now, I've re-ordered the passes so that LowerAllocTensor occurs after RewriteCUDAGraph.
Awesome, many thanks for looking into this @Lunderberg; changing the ordering until the fix is in makes sense. Let's land this and #15810.
The `StaticPlanBlockMemory` transform is provided a module that expresses all allocations with `R.builtin.alloc_tensor`, and produces a module that uses `R.memory.alloc_storage` and `R.memory.alloc_tensor` to express static allocations, while dynamic allocations continue to use `R.builtin.alloc_tensor`. Prior to this commit, this mixed output was handled as part of `VMBuiltinLower`. This commit extracts the lowering of `R.builtin.alloc_tensor` to a new pass, `LowerAllocTensor`. This pass runs after `StaticPlanBlockMemory`, and replaces any remaining `R.builtin.alloc_tensor` with calls to `R.memory.alloc_storage` and `R.memory.alloc_tensor`.
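As a rough illustration of the replacement this new pass performs, the following sketch applies the same rewrite to a toy binding list. The `Binding` tuple and helper names here are invented for this example; TVM's actual pass operates on Relax IR:

```python
from typing import List, Tuple

# Toy IR: each binding is (result_var, op_name, args). This is an
# illustrative stand-in for Relax bindings, not TVM's real classes.
Binding = Tuple[str, str, tuple]

def lower_alloc_tensor(bindings: List[Binding]) -> List[Binding]:
    """Replace each remaining R.builtin.alloc_tensor with an explicit
    R.memory.alloc_storage + R.memory.alloc_tensor pair, mirroring the
    lowering that LowerAllocTensor performs."""
    out: List[Binding] = []
    counter = 0
    for var, op, args in bindings:
        if op == "R.builtin.alloc_tensor":
            shape, dtype = args
            storage = f"storage_{counter}"
            counter += 1
            # Allocate backing storage for the tensor...
            out.append((storage, "R.memory.alloc_storage", (shape, dtype)))
            # ...then view it as a tensor at offset 0.
            out.append((var, "R.memory.alloc_tensor", (storage, 0, shape, dtype)))
        else:
            out.append((var, op, args))
    return out

before = [("x", "R.builtin.alloc_tensor", ((16,), "float32")), ("y", "add", ("x",))]
after = lower_alloc_tensor(before)
```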
Force-pushed from 1cf89ad to 253e013.
The `R.memory.alloc_storage` produced by `LowerAllocTensor` must be present in order to be appropriately deleted by `KillAfterLastUse`.
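The interaction with `KillAfterLastUse` can be sketched on the same toy binding representation (again illustrative names only, not the real pass): a kill marker is inserted immediately after the last binding that uses each storage variable, which is only possible if the `R.memory.alloc_storage` is explicit in the IR.

```python
# Toy sketch of kill insertion after last use (illustrative, not TVM's
# actual KillAfterLastUse pass). Each binding is (result_var, op, arg_vars).
def insert_kills(bindings):
    """Insert a kill_storage marker immediately after the last binding
    that defines or uses each storage variable."""
    storages = {var for var, op, _ in bindings if op == "alloc_storage"}
    # Index of the last definition or use of each storage variable.
    last_use = {}
    for i, (var, _, args) in enumerate(bindings):
        for s in storages:
            if s == var or s in args:
                last_use[s] = i
    out = []
    for i, binding in enumerate(bindings):
        out.append(binding)
        for s, j in last_use.items():
            if i == j:
                out.append((None, "kill_storage", (s,)))
    return out

demo = [
    ("storage", "alloc_storage", ()),
    ("tensor", "alloc_tensor", ("storage",)),
    ("result", "add", ("tensor",)),
]
lowered = insert_kills(demo)
```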