
Layout propagation (Part 2) - Enable #1755

Merged
jjsjann123 merged 78 commits into main from layout_propagation_enable
Feb 21, 2024

Conversation

@jjsjann123
Collaborator

@jjsjann123 jjsjann123 commented Feb 13, 2024

Stacked PRs:
==== #1755 enabling layout propagation through runtime <- this one
#1792 propagation rule for broadcast
#1790 propagation rule for binary op
#1788 adding layout inference pass

What's in this PR:
Enabling the MemoryFormat optimization pass in the runtime. The pass is run as part of the pre_segment optimization pass.
Adding a cpp test to verify the optimization behavior.

Quick design doc: #1756

TODOs:

  • rebase with the new inference rule PRs.

@jjsjann123 jjsjann123 changed the title Layout propagation enable [WIP] Layout propagation enable Feb 13, 2024
@jjsjann123 jjsjann123 mentioned this pull request Feb 13, 2024
2 tasks
@jjsjann123 jjsjann123 added the allocation domain issues related to allocation domain support label Feb 13, 2024
@jjsjann123
Collaborator Author

jjsjann123 commented Feb 13, 2024

48d3ee162b1cc7b2fe9796869e3a19dad3e36c52 local failing tests:

[  FAILED  ] 3 tests, listed below:
[  FAILED  ] NVFuserTest.FusionAvoidRedundantWrite_CUDA
[  FAILED  ] AliasTest.TrivialInputForwarding_ScalarTensor
[  FAILED  ] NoOpTest.FusionNullScheduler3

These test failures are not caused by this code change and have been cleaned up.

@jjsjann123 jjsjann123 marked this pull request as draft February 19, 2024 08:06
@jjsjann123 jjsjann123 changed the base branch from layout_propagation to layout_propagation_pr_0 February 20, 2024 03:29
@jjsjann123 jjsjann123 marked this pull request as ready for review February 20, 2024 06:27
@jjsjann123 jjsjann123 requested a review from zasdfgbnm February 20, 2024 06:28
Base automatically changed from layout_propagation_pr_0 to main February 20, 2024 23:42
jjsjann123 added a commit that referenced this pull request Feb 20, 2024
Stacked PRs:
#1755 enabling layout propagation through runtime
#1792 propagation rule for broadcast
#1790 propagation rule for binary op
==== #1788 adding layout inference pass **_<- this one_**

What's in this PR:
An inferenceAllocationOrder pass that works on an entire Fusion:
It computes the AllocationOrder of inputs by looking at each TensorView's
allocation_domain and rfactor_domain;
It uses predefined rules (in AllocationOrderInferencer) to traverse and
propagate AllocationOrder from the inputs to the entire fusion.

Note that the pass itself doesn't mutate the fusion IR. It's just a
utility function that suggests ways to specify allocation domain to be
used by other optimization passes.

- [x] adding inferenceAllocationOrder pass function;
- [x] adding propagation rule for unary op;
- [x] adding cpp test to verify propagation rule;
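The idea of computing an AllocationOrder from allocation_domain and rfactor_domain can be sketched as a permutation lookup. This is an illustrative toy snippet under assumed names (`allocationOrder`, iterdomains modeled as plain ids), not nvFuser's actual data structures:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hedged sketch (not nvFuser's real API): the allocation order is the
// permutation mapping positions in the rfactor (logical) domain to
// positions in the allocation domain. IterDomains are modeled as ids.
std::vector<int64_t> allocationOrder(
    const std::vector<int64_t>& rfactor_ids, // ids in rfactor domain order
    const std::vector<int64_t>& alloc_ids) { // same ids in allocation order
  std::vector<int64_t> order;
  order.reserve(alloc_ids.size());
  for (int64_t id : alloc_ids) {
    // position of this allocation-domain id within the rfactor domain
    auto it = std::find(rfactor_ids.begin(), rfactor_ids.end(), id);
    order.push_back(static_cast<int64_t>(it - rfactor_ids.begin()));
  }
  return order;
}
```

For example, an NCHW rfactor domain paired with an NHWC allocation domain yields the order {0, 2, 3, 1}.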

Quick design doc: #1756

Future Work:
* expanding the propagation rules to cover more operations;

---------

Co-authored-by: Jacob Hinkle <1454944+jacobhinkle@users.noreply.github.com>
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
@jjsjann123
Collaborator Author

!build

@jjsjann123
Collaborator Author

The CI failure is unrelated; merging as-is.

@jjsjann123 jjsjann123 merged commit 302d634 into main Feb 21, 2024
@jjsjann123 jjsjann123 deleted the layout_propagation_enable branch February 21, 2024 07:20
jjsjann123 added a commit that referenced this pull request Feb 21, 2024
Stacked PRs:
#1755 enabling layout propagation through runtime
#1792 propagation rule for broadcast
==== #1790 propagation rule for binary op **_<- this one_**
#1788 adding layout inference pass

What's in this PR:

BinaryOp propagation tries to merge the allocation orders of both inputs:
* when only one operand is a tensor, we simply forward its recorded
allocation order;
* when both operands are tensors, we resolve the conflict by:
    i. prioritizing the tensor with fewer broadcast iterdomains;
    ii. otherwise, propagating the allocation order of the lhs.

- [x] adding propagation rule for binary op;
- [x] handling two scalar operands;
- [x] handling intermediate tensors (factory tensors);
- [x] adding cpp test to verify propagation rule;
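The two-tensor resolution above can be sketched as follows. Names (`OperandLayout`, `resolveBinaryOp`) and the plain-vector representation are illustrative assumptions, not nvFuser's actual types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hedged sketch: each tensor operand carries its propagated allocation
// order plus per-iterdomain broadcast flags.
struct OperandLayout {
  std::vector<int64_t> order;      // allocation order (a permutation)
  std::vector<bool> is_broadcast;  // which iterdomains are broadcasts
};

std::size_t countBroadcasts(const OperandLayout& op) {
  std::size_t n = 0;
  for (bool b : op.is_broadcast) {
    n += b ? 1 : 0;
  }
  return n;
}

// Resolve the output allocation order of a binary op:
// i.  prefer the operand with fewer broadcast iterdomains;
// ii. on a tie, fall back to the lhs.
std::vector<int64_t> resolveBinaryOp(
    const OperandLayout& lhs, const OperandLayout& rhs) {
  return countBroadcasts(rhs) < countBroadcasts(lhs) ? rhs.order : lhs.order;
}
```

Preferring the operand with fewer broadcasts follows the intuition that a concrete (non-broadcast) layout is a stronger hint about the desired memory format than a layout dominated by size-1 dimensions.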

---------

Co-authored-by: Jacob Hinkle <1454944+jacobhinkle@users.noreply.github.com>
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
jjsjann123 added a commit that referenced this pull request Feb 21, 2024
Stacked PRs:
#1755 enabling layout propagation through runtime
==== #1792 propagation rule for broadcast **_<- this one_**
#1790 propagation rule for binary op
#1788 adding layout inference pass

What's in this PR:

BroadcastOp propagation tries to push all new broadcast iterdomains to the
outer dimensions of the output tensor.

- [x] adding propagation rule for broadcast op;
- [x] adding cpp test to verify propagation rule;
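The rule can be sketched as building the output allocation order with new broadcast dimensions placed outermost, followed by the input dimensions in their existing allocation order. `propagateBroadcast` and the `out_pos` mapping are illustrative assumptions, not nvFuser's actual API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hedged sketch of BroadcastOp propagation: out_pos[i] gives the output
// position of input dim i; output dims not covered by out_pos are the
// newly created broadcast iterdomains, pushed to the outermost slots.
std::vector<int64_t> propagateBroadcast(
    const std::vector<int64_t>& in_order,  // input allocation order
    const std::vector<int64_t>& out_pos,   // output position per input dim
    int64_t out_rank) {
  std::vector<bool> covered(out_rank, false);
  for (int64_t p : out_pos) {
    covered[p] = true;
  }
  std::vector<int64_t> out_order;
  // new broadcast dims first (outermost)
  for (int64_t d = 0; d < out_rank; ++d) {
    if (!covered[d]) {
      out_order.push_back(d);
    }
  }
  // then the input dims, keeping the input's allocation order
  for (int64_t i : in_order) {
    out_order.push_back(out_pos[i]);
  }
  return out_order;
}
```

For example, broadcasting a 2D input with allocation order {1, 0} into a 3D output where the input dims land at output positions 1 and 2 yields the output order {0, 2, 1}.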

---------

Co-authored-by: Jacob Hinkle <1454944+jacobhinkle@users.noreply.github.com>
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>