
Fix non-determinism in mma_utils::getTensorsRoles#947

Merged
jacobhinkle merged 5 commits into main from matmul_tensorroles_determinism
Sep 26, 2023

@jacobhinkle jacobhinkle commented Sep 26, 2023

This sorts the output of mma_utils::getTensorsRoles so that the matmul scheduler is repeatable. This should fix the false positives in codegen diff CI jobs. 🤞
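As a rough illustration of the idea (not the actual change in this PR), the sketch below groups tensors by role and then sorts each group by a stable integer id. `Tensor`, `Role`, `RolesMap`, and `getRolesSorted` are hypothetical stand-ins rather than nvFuser's real types, and it assumes the non-determinism came from iterating a pointer-keyed container, whose order can change from run to run.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for illustration; these are not nvFuser's real types.
enum class Role { INPUT_A, INPUT_B, INPUT_C, OUTPUT_D };

struct Tensor {
  int name;  // assumed: a stable id that does not depend on memory addresses
  Role role;
};

using RolesMap = std::map<Role, std::vector<const Tensor*>>;

// Group tensors by role, then sort each group by stable id so the result
// does not depend on the iteration order of the pointer-keyed input set.
RolesMap getRolesSorted(const std::unordered_set<const Tensor*>& tensors) {
  RolesMap roles;
  for (const Tensor* t : tensors) {
    assert(t != nullptr);
    roles[t->role].push_back(t);
  }
  for (auto& entry : roles) {
    std::sort(
        entry.second.begin(),
        entry.second.end(),
        [](const Tensor* a, const Tensor* b) { return a->name < b->name; });
  }
  return roles;
}
```

With this approach, any consumer that walks a role's vector sees the tensors in the same order on every run, regardless of where they were allocated.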

Fixes #799.

@jacobhinkle jacobhinkle marked this pull request as ready for review September 26, 2023 16:10
@zasdfgbnm zasdfgbnm left a comment

Thanks for fixing!

@jacobhinkle jacobhinkle commented

There should be only one A, B, or D tensor; see https://github.com/NVIDIA/Fuser/blob/main/csrc/scheduler/matmul.cpp#L725-L728. However, there can be multiple C tensors (epilogue producers), and the RolesMap determines the order in which we cache them, hence the different numbering for cached tensors in some matmul fusions.
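To see why the visiting order changes the numbering, here is a minimal model. It assumes each newly created cache tensor simply receives the next free integer id. `assignCacheIds` and its parameters are hypothetical and only mimic that counter; they are not part of nvFuser.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical model: caching a tensor creates a new tensor that takes the
// next free id, so ids depend on the order in which C tensors are visited.
// Returns a map from each C tensor's id to the id of its cache.
std::map<int, int> assignCacheIds(
    std::vector<int> c_names,
    int next_id,
    bool sort_first) {
  if (sort_first) {
    // Sorting by stable id makes the visiting order repeatable.
    std::sort(c_names.begin(), c_names.end());
  }
  std::map<int, int> cache_of;
  for (int name : c_names) {
    cache_of[name] = next_id++;
  }
  return cache_of;
}
```

In this model, visiting C tensors `{7, 4}` versus `{4, 7}` without sorting swaps the ids given to their caches. That kind of swap would appear as a spurious codegen diff even though the kernels are otherwise equivalent. Sorting first gives both orders the same assignment.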

@jacobhinkle jacobhinkle merged commit 079b58d into main Sep 26, 2023
@jacobhinkle jacobhinkle deleted the matmul_tensorroles_determinism branch September 26, 2023 18:46
jacobhinkle added a commit that referenced this pull request Oct 4, 2023
I have been chasing down codegen changes in #840 and #947 and have
needed to dig through a lot of spurious diffs. I decided to extend the
codegen diff tool to output HTML, and to also modify the diffing a bit.
This PR:

- Changes `tools/compare_codegen.sh` to output env information and to
add a `ptxas_verbose` dump option.
- Changes the diffs performed by that tool to ignore both the kernel
name and the preamble. The preamble is estimated by skipping the typedef
of `nvfuser_index_t`. If preambles between two runs differ, we report
that with a warning and show the diff in the output.
- Adds an `--html` option to `tools/diff_codegen_nvfuser_tests.py` which
will write a self-contained HTML file holding all the differing kernels
and diffs. To use this option you must have previously run `pip install
jinja2`.
- Adds a `--json` option to `tools/diff_codegen_nvfuser_tests.py` which
writes a JSON file containing all the information contained in the HTML
file in an easier-to-parse format.
- Changes the default so that diffs are no longer printed to STDOUT.
Printing can be re-enabled with the `--show-diffs` argument.

This lets us communicate code differences easily by sharing these files,
which could be generated by our CI. An example output is attached.

GitHub doesn't support uploading HTML, so I have uploaded a zipped
example:

[codediff_f7786819_feda1e1e_binary_tests.html.zip](https://github.com/NVIDIA/Fuser/files/12793721/codediff_f7786819_feda1e1e_binary_tests.html.zip)

Note that this file is probably typical for a medium-sized change: it
results in a zipped file size of 184KB, and unzipped it is 2.1MB.

Some ideas left out of this PR that might be nice in the future:
- Handle not just `nvfuser_tests` output but also `nvfuser_bench` and
`pytest` output. We could also fall back to arbitrary command output
where we just group everything to one big "test" if we can't associate
each kernel with a specific test/benchmark.
- Show multiple commands in one HTML file. Especially if the first
bullet is addressed, then we could have a single summary for our whole
suite.
- Include benchmark results. This could be done in another hidden div
with a "benchmarks" button. It might be tricky especially if the number
of benchmark items associated to each kernel is changed between commits,
but it might also be handy to refer to benchmark regressions and have
the codegen output one click away.

Fixes #1007

Development

Successfully merging this pull request may close these issues.

Repeatability in matmul scheduler

2 participants