
Visit extent scalars in SegmentCandidateFinder::resolveScalarsInGroup #840

Merged
jacobhinkle merged 27 commits into main from scalar_seg_edges on Dec 20, 2023

Conversation

@jacobhinkle
Collaborator

@jacobhinkle jacobhinkle commented Sep 5, 2023

During segmentation, groups are defined as lists of Exprs. When finalizing the segmentation, scalar inputs that might be needed are sprinkled in using resolveScalarsInGroup. The method resolveScalarsInGroup attempts to find all scalars needed in the group and adds their defining exprs to the group; in this way scalars can be recomputed for multiple groups if needed. When the SegmentedGroup is finalized, all expression inputs in all the Exprs in the group are added as inputs (deduplicated, removing constants, etc.).

Currently, resolveScalarsInGroup only processes the inputs of the group's Exprs, looking for scalars via input->isScalar(). As seen in #656, in some cases we also need to look at output extents. This PR modifies the method to also visit intermediate IterDomain expressions and their attributes, searching more aggressively for scalars.
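The change can be sketched with toy types (Val, Expr, and resolveScalarsInGroup here are simplified stand-ins, not nvfuser's real classes):

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

struct Expr;
struct Val {
  bool is_scalar = false;
  Expr* definition = nullptr;
  std::vector<Val*> extents; // non-empty for tensors: one extent scalar per axis
};
struct Expr {
  std::vector<Val*> inputs;
  std::vector<Val*> outputs;
};

// Collect every scalar the group needs, following defining expressions so
// scalars can be recomputed inside the group.
void resolveScalarsInGroup(const std::vector<Expr*>& group,
                           std::unordered_set<Val*>& scalars) {
  std::vector<Val*> stack;
  for (Expr* e : group) {
    // Previous behavior: scan only scalar inputs of each Expr.
    for (Val* in : e->inputs)
      if (in->is_scalar) stack.push_back(in);
    // This PR's addition: also scan output extents, so a scalar that appears
    // only in an output's domain is still found.
    for (Val* out : e->outputs)
      for (Val* ext : out->extents) stack.push_back(ext);
  }
  while (!stack.empty()) {
    Val* v = stack.back();
    stack.pop_back();
    if (!scalars.insert(v).second) continue; // already visited
    if (v->definition)
      for (Val* in : v->definition->inputs)
        if (in->is_scalar) stack.push_back(in);
  }
}
```

In this toy model, an extent scalar that never appears among an Expr's inputs is still discovered through the output scan, and its defining inputs are pulled in transitively.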

Fixes #656.

@jacobhinkle jacobhinkle reopened this Sep 6, 2023
@jacobhinkle
Collaborator Author

!build

Previously, we only called mutate(TensorView*), which itself mutates
IterDomains. We need to do this instead of mutating IterDomains in the
usual topo order since we need to do exact map propagation before
root->rfactor propagation. However, we also need to propagate mutations
through scalars and scalar expressions, and for that we need to call
mutate at some point.

Previously, scalar expressions were never mutated, resulting in
straggler scalars being left in the concretized fusion even though they
had been registered for mutation. This change fixes that particular
problem, meaning ReshapeToSlice now works properly.
@jacobhinkle
Collaborator Author

!build

for (auto output : expr->outputs()) {
// We must be able to compute output extents for expression, so here we
// ensure the scalars involved are all available to this group
if (auto tv = dynamic_cast<TensorView*>(output)) {
Collaborator

Do we have tests to exercise this case?

Collaborator Author

The ReshapeToPad test hits this since the missing scalar appeared only in the root domain of the pad output in the second segment.

Collaborator

Hmm, then the input tensor of the pad op should have the same extent, so shouldn't it use the same scalar?

Collaborator Author

In the unsegmented fusion the input has the same extent as you say. But when it is segmented the inputs have their roots set as the rfactor with new input scalars for the extents.

Collaborator

> But when it is segmented the inputs have their roots set as the rfactor with new input scalars for the extents.

Are you referring to the inputs of a fusion segment?

Collaborator Author

Yes. In the test we have a segment with a single pad expression. In that case the expression input is a segment input so it has had its extents replaced.

Collaborator

OK, so the pad input tensor becomes a segment input. It originally has the root and rfactor domains as it's the output of the reshape op, but we only keep the rfactor domain for segment inputs, so the input only has the root domain that was originally the rfactor domain. Am I right?

Still not sure why the pad output would be a problem...

Collaborator Author

I think you're right; the pad example does not need output scalars. However, the slice example from #656 (comment) does need this. In that case the slice output contains i5 which is not found in the segment and is exact mapped to the remapped rfactor->root of T3. The only way to find it is to traverse root->rfactor in the output of the SliceOp. In fact, I think since we have the segment input scalars and since every input to a segment expr is either a segment input or the output of another expr in the segment, it should suffice to look at only attributes and outputs of all expressions in the segment...
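The observation in this comment can be sketched with toy types (these are illustrative stand-ins, not nvfuser's IR classes): a scalar like i5 may appear only as an attribute of the IterDomain expressions (e.g. a Resize offset or Split factor) between an output's root and rfactor domains, so finding it requires walking those expressions rather than the op's inputs.

```cpp
#include <cassert>
#include <vector>

struct Scalar { const char* name; };

// One IterDomain expression between root and rfactor, carrying scalar
// attributes (e.g. a split factor) that are not op inputs.
struct IterDomainExpr { std::vector<Scalar*> attributes; };

// Gather every scalar attribute along an output's root->rfactor path.
std::vector<Scalar*> scalarsFromDomainExprs(
    const std::vector<IterDomainExpr>& exprs) {
  std::vector<Scalar*> found;
  for (const IterDomainExpr& e : exprs)
    for (Scalar* s : e.attributes) found.push_back(s);
  return found;
}
```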

Collaborator Author

@jacobhinkle jacobhinkle Sep 21, 2023

In #912 we see that even when we do not replace sizes with ceilDiv expressions in reshape, we can hit this problem. In that case there is no longer a problem in the second segment but the first segment is

Inputs:
  T0_g[ iS0{i0}, iS1{i2}, iS2{i3}, iS3{i4} ], float
Outputs:
  T8_g[ iS40{i0}, iS47{( i2 / i5 )}rf, iS48{( (nvfuser_index_t)(i5) )}rf, iS42{i3}, iS43{i4} ], float

%kernel_math {
T8_g[ iS40{i0}, iS47{( i2 / i5 )}rf, iS48{( (nvfuser_index_t)(i5) )}rf, iS42{i3}, iS43{i4} ] = view( T0_g[ iS0{i0}, iS1{i2}, iS2{i3}, iS3{i4} ] )
}

and since i5 is not an input to the ViewOp, it is not included as a segment input. We could of course have the new shapes included in ViewOp::inputs() instead, which would also address this particular case.

Collaborator

This makes sense and seems reasonable. Can you please leave a little more detailed comment why this is necessary?

@jacobhinkle
Collaborator Author

!build

Collaborator

@naoyam naoyam left a comment

LGTM. Thanks for the fix.

@jacobhinkle jacobhinkle marked this pull request as draft October 2, 2023 13:42
@jacobhinkle
Collaborator Author

I converted this back to draft since the current method is including more scalars than needed. Any root domain extent does not need to be included if it is exact mapped to an extent that is already computable. We could temporarily bind inputs to an ExpressionEvaluator and find InputsOf uncomputable scalars but that is a bit heavy handed. I will work on this then mark ready when this doesn't impact existing tests.
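The "bind inputs, then find uncomputable scalars" idea mentioned above can be sketched with toy types (not nvfuser's actual ExpressionEvaluator or InputsOf): a scalar is computable if it is bound directly or if every input of its defining expression is computable; only the uncomputable extents would need to become segment inputs.

```cpp
#include <unordered_set>
#include <vector>

struct Expr;
struct Val { Expr* definition = nullptr; };
struct Expr { std::vector<Val*> inputs; };

// Returns true if v is bound or derivable from bound values.
// Assumes definitions form an acyclic graph.
bool isComputable(Val* v, const std::unordered_set<Val*>& bound) {
  if (bound.count(v)) return true;
  if (!v->definition) return false; // free scalar with no definition
  for (Val* in : v->definition->inputs)
    if (!isComputable(in, bound)) return false;
  return true;
}
```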

jacobhinkle added a commit that referenced this pull request Oct 4, 2023
I have been chasing down codegen changes in #840 and #947 and have
needed to dig through a lot of spurious diffs. I decided to extend the
codegen diff tool to output HTML, and to also modify the diffing a bit.
This PR:

- Changes `tools/compare_codegen.sh` to output env information as well
as add `ptxas_verbose` dump option.
- Changes the diffs performed by that tool to ignore both the kernel
name and the preamble. The preamble is estimated by skipping the typedef
of `nvfuser_index_t`. If preambles between two runs differ, we report
that with a warning and show the diff in the output.
- Adds an `--html` option to `tools/diff_codegen_nvfuser_tests.py` which
will write a self-contained HTML file holding all the differing kernels
and diffs. To use this option you must have previously run `pip install
jinja2`.
- Adds a `--json` option to `tools/diff_codegen_nvfuser_tests.py` which
writes a JSON file containing all the information contained in the HTML
file in an easier-to-parse format.
- Changes the default to not printing the diffs to STDOUT. This can be
re-enabled with the `--show-diffs` argument.

This lets us communicate code differences easily by sharing these files,
which could be generated by our CI. An example output is attached.

Github doesn't support uploading html so I have uploaded a zipped
example:

[codediff_f7786819_feda1e1e_binary_tests.html.zip](https://github.com/NVIDIA/Fuser/files/12793721/codediff_f7786819_feda1e1e_binary_tests.html.zip)

Note that this file is probably typical for a medium sized change: it
results in a zipped file size of 184KB and unzipped it is 2.1MB.

Some ideas left out of this PR that might be nice in the future:
- Handle not just `nvfuser_tests` output but also `nvfuser_bench` and
`pytest` output. We could also fall back to arbitrary command output
where we just group everything to one big "test" if we can't associate
each kernel with a specific test/benchmark.
- Show multiple commands in one HTML file. Especially if the first
bullet is addressed, then we could have a single summary for our whole
suite.
- Include benchmark results. This could be done in another hidden div
with a "benchmarks" button. It might be tricky especially if the number
of benchmark items associated to each kernel is changed between commits,
but it might also be handy to refer to benchmark regressions and have
the codegen output one click away.

Fixes #1007
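The preamble-skipping rule described above can be illustrated with a minimal sketch (the real tool is tools/diff_codegen_nvfuser_tests.py; this toy version only shows the idea of treating everything up to and including the nvfuser_index_t typedef as preamble and comparing what follows):

```cpp
#include <sstream>
#include <string>

// Strip the preamble: drop every line up to and including the
// `typedef ... nvfuser_index_t;` line, returning only the kernel body.
std::string stripPreamble(const std::string& kernel) {
  std::istringstream in(kernel);
  std::string line, body;
  bool past_preamble = false;
  while (std::getline(in, line)) {
    if (past_preamble) {
      body += line + "\n";
    } else if (line.find("typedef") != std::string::npos &&
               line.find("nvfuser_index_t") != std::string::npos) {
      past_preamble = true;
    }
  }
  return body;
}
```

With this rule, two kernels whose preambles differ but whose bodies match compare as equal.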
Note this is kind of a slow test so we might want to remove it or reduce it
@jacobhinkle
Collaborator Author

!build --diff

@jacobhinkle
Collaborator Author

> I converted this back to draft since the current method is including more scalars than needed. Any root domain extent does not need to be included if it is exact mapped to an extent that is already computable. We could temporarily bind inputs to an ExpressionEvaluator and find InputsOf uncomputable scalars but that is a bit heavy handed. I will work on this then mark ready when this doesn't impact existing tests.

http://nv/e5M/nvfuser_github_ci/codegen_diff_p11175548_j76052171_1701449937649638555_codediff_a693a66_bdd502e_custom_command_20231201_164454.html
The differing test there shows this issue. The first kernel there has inputs T0 and T4, so we should not also need to bind i0, i1, i2, i3, which are the dimensions of T0, but we do after the change in this PR. This is because we are adding purely scalar expressions in resolveScalarsInGroup, which then leads to adding their inputs in resolveInputsInGroup.

@jacobhinkle
Collaborator Author

!build --diff

@jacobhinkle
Collaborator Author

!build --diff

This happened in ResizePadToBroadcastDynamic_CUDA, where a seg edge was
placed at the pad output, whose rfactor expressions included the
original input sizes. Those sizes were added as inputs to the mul
segment, but were unneeded. This change fixes that type of situation.
@jacobhinkle
Collaborator Author

!build --diff

@jacobhinkle jacobhinkle marked this pull request as ready for review December 19, 2023 14:39
@jacobhinkle
Collaborator Author

!build --diff

@jacobhinkle
Collaborator Author

!build --diff

@jacobhinkle
Collaborator Author

!build --diff

@jacobhinkle
Collaborator Author

jacobhinkle commented Dec 20, 2023

No test failures, and verified that this is finally never increasing the number of kernel args. Merging.

@jacobhinkle jacobhinkle merged commit 42bf555 into main Dec 20, 2023
@jacobhinkle jacobhinkle deleted the scalar_seg_edges branch December 20, 2023 17:56
jacobhinkle added a commit that referenced this pull request Dec 20, 2023
Fixes #1277. The bug was actually fixed in #840, but the comment was
inaccurate.
jjsjann123 pushed a commit that referenced this pull request Dec 26, 2023
Fixes #1277. The bug was actually fixed in #840, but the comment was
inaccurate.
Successfully merging this pull request may close these issues.

More lost scalars during segmentation for dynamic reshape
