Measure JIT TP in superpmi-diffs #68292
Conversation
Tagging subscribers to this area: @JulieLeeMSFT
The tables are already pretty small, so let's just always show them.
cc @dotnet/jit-contrib, this is an example of how this looks (the run includes a reversion of #67395 so that something actually shows up): https://dev.azure.com/dnceng/public/_build/results?buildId=1730378&view=ms.vss-build-web.run-extensions-tab I've opted not to add a collapsible section around the tables since they are small (compared to the ASM diffs).

For x64 the extra SPMI runs measuring TP add around 15 minutes to the SPMI job. For x86 it's worse since 64-bit addition is much more expensive; there it adds around 30 minutes. There are also two extra JIT builds, but those are done in parallel and only take ~8 minutes. I think the extra time taken for the job is worth it to consistently be able to see TP impact on JIT PRs.

The one outstanding problem is that the baseline JIT comes from the rolling build. That means there is the potential for some drift in the compiler version used to compile the baseline and the diff, which impacts TP measurements. This is worse locally since the CI machines are usually behind the compilers we are using locally -- I see around 1.5% fewer instructions executed with my locally built JIT compared to the CI-built JIT. However, when things run in CI I don't think this is a problem, except during periods where the CI machines are being updated. A few ways forward:

I am partial to option 3, which seems simple to do and should be robust.
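For context, the figure these tables report is the relative change in instructions executed between the base and diff JITs. A minimal sketch of that arithmetic (the function name is illustrative, not from the PR):

```python
def format_tp_diff(base_instructions, diff_instructions):
    # Relative change in instructions executed; positive means the diff JIT
    # executes more instructions, i.e. a throughput regression. Lower is better.
    pdiff = (diff_instructions - base_instructions) / base_instructions * 100
    return "{}{:.2f}%".format("+" if pdiff > 0 else "", pdiff)
```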
```python
os_name = "universal" if arch_name.startswith("arm") else os_name
base_jit_path = os.path.join(coreclr_args.base_jit_directory, 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
diff_jit_path = os.path.join(coreclr_args.diff_jit_directory, 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
base_checked_jit_path = os.path.join(coreclr_args.base_jit_directory, "checked", 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
```
I wouldn't do the tp-diff as part of superpmi_asmdiffs.py but would write a separate script exclusively for tp-diff. That way you can have a separate CI pipeline flow just for tpdiff without penalizing the asmdiff execution time.
I wouldn't even recommend having a parameter like asmdiff vs. tpdiff for this file, because it just pollutes the logic and makes it hard to refactor in the future.
I see no reason to introduce yet another pipeline if we want to measure TP impact on every PR. It would require even more resources (e.g. redownloading all the collections again), it would force JIT devs to look in even more places to evaluate their change, and it does not scale (e.g. I want to add memory stats measurements in the future as well).
I really hope instead to rename the pipeline's "asmdiffs" to "diffs" (in fact, I have done this in a couple of places) and do all the base-diff measurements in this pipeline.
FWIW, these pipelines spend way more time checking out sources, copying files, and building than they do inside superpmi_asmdiffs.py. Currently (before this change) the x64 build spends only around 25 minutes in Helix, so only around 1/3 of the build time is actually spent inside superpmi_asmdiffs.py. So I do not think it is a bad idea to let it do more work and amortize some of the other cost.
> it will force JIT devs to look in even more places to evaluate their change
+1 on that. Yeah, that justification sounds reasonable for having a single pipeline with both asmdiff/TPdiff. Also, while you are there, and if you have time, I would really like to surface any SPMI replay failures to the extension page so we don't have to scan through the log.
Let me look into it separately from this PR. I think we should report all of this back from helix/superpmi as JSON so that the summarize script can do a much better job of laying things out, but it will probably take a bit of work.
kunalspathak left a comment:
> the extra SPMI runs
Have you considered having a separate CI pipeline for this TP-diff?
> I am partial to option 3, which seems simple to do and should be robust.

Agreed. I am even fine with option 4.
```python
""")
md_files = []

overall_md_asmdiff_summary_file_target = os.path.join(log_directory, "superpmi_diff_summary_{}_{}.md".format(platform_name, arch_name))
```
Regarding the output table, I am only interested in the actual diff rather than the raw numbers. I would suggest just displaying the diff % (if possible, in tabular form) and then having an expansion that shows the raw numbers along with the diffs. That way, anyone who is looking at the extension page can quickly scan through the numbers to know whether TP regressed or not. Also, leave a note that "lower is better".
Good idea, let me see if I can do that in a nice way. It is complicated a little by how we communicate the summaries back (as markdown files instead of just the data).
This is how it looks now: https://dev.azure.com/dnceng/public/_build/results?buildId=1731173&view=ms.vss-build-web.run-extensions-tab
What do you think?
Much better. Looking at the title "Throughput impact on Windows x64", it makes me wonder: is this the right way to measure the throughput impact for binaries compiled for Linux platforms? On Linux we use LLVM, which would generate different instructions, while here we are measuring the instruction count of the cross-compiled clrjit generated by MSVC.
Yes, it is somewhat misleading. However I spoke to Bruce about this and we agreed it was better than not having it at all. Bruce pointed out that cross-compilation is still a scenario (for crossgen2) and that it is likely that regressions/improvements with MSVC transfer to Clang/GCC as well.
It is on my backlog to clean up how we lay this file out. I think when I do that I will sort the entries here and point out which ones are cross-compiled to make sure there is less confusion there. But let me leave that for that change.
```diff
 if coreclr_args.base_git_hash is None:
     # We've got the current hash; figure out the baseline hash.
-    command = [ "git", "merge-base", current_hash, "main" ]
+    command = [ "git", "branch", "-r", "--sort=-committerdate", "-v", "--list", "*/main" ]
```
The old method does not work for my workflow; I never have a 'main' branch. The new method is to use the newest commit of any branch that matches "*/main".
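For anyone reading along, a hedged sketch of how this lookup could be consumed (the output parsing here is an assumption about `git branch -r -v` formatting, not the PR's actual code):

```python
import subprocess

def find_newest_main_commit():
    # List remote-tracking branches matching */main, newest commit first.
    command = ["git", "branch", "-r", "--sort=-committerdate", "-v", "--list", "*/main"]
    output = subprocess.run(command, capture_output=True, text=True, check=True).stdout
    # Each line looks like "  origin/main  abc1234  commit subject"; the second
    # field of the first line is the abbreviated hash of the newest commit.
    return output.splitlines()[0].split()[1]
```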
This is ready for review. https://dev.azure.com/dnceng/public/_build/results?buildId=1731173&view=ms.vss-build-web.run-extensions-tab is an example of how it currently looks. cc @dotnet/jit-contrib
Could you send a PR that renames …
src/coreclr/scripts/superpmi.py
```python
throughput_parser.add_argument("-base_jit_path", help="Path to baseline clrjit. Defaults to release baseline JIT from rolling build, by computing baseline git hash.")
throughput_parser.add_argument("-diff_jit_path", help="Path to diff clrjit. Defaults to release Core_Root JIT.")
throughput_parser.add_argument("-git_hash", help="Use this git hash as the current hash for use to find a baseline JIT. Defaults to current git hash of source tree.")
throughput_parser.add_argument("-base_git_hash", help="Use this git hash as the baseline JIT hash. Default: search for the baseline hash.")
throughput_parser.add_argument("-base_jit_option", action="append", help="Option to pass to the baseline JIT. Format is key=value, where key is the option name without leading COMPlus_...")
throughput_parser.add_argument("-diff_jit_option", action="append", help="Option to pass to the diff JIT. Format is key=value, where key is the option name without leading COMPlus_...")
```
Looks like you could add a "diff_parser" parent for shared things between asm_diff_parser and throughput_parser, but I'm ok either way.
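For illustration, the argparse "parent parser" pattern this comment refers to looks roughly like the sketch below (the arguments shown are a subset for illustration; the actual superpmi.py wiring differs):

```python
import argparse

# Arguments shared by both modes live on a parent parser.
diff_parser = argparse.ArgumentParser(add_help=False)
diff_parser.add_argument("-base_jit_path", help="Path to baseline clrjit.")
diff_parser.add_argument("-diff_jit_path", help="Path to diff clrjit.")

parser = argparse.ArgumentParser(description="superpmi")
subparsers = parser.add_subparsers(dest="mode")
# Each subcommand inherits the shared arguments via parents=[...].
asm_diff_parser = subparsers.add_parser("asmdiffs", parents=[diff_parser])
throughput_parser = subparsers.add_parser("tpdiff", parents=[diff_parser])
```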
Done.
```python
with open(clrjit_path, "rb") as fh:
    contents = fh.read()
```
So this isn't actually reading the string via the export added in #68329; it's actually reading the entire JIT native code file and parsing for this string. That seems... indirect (slow, also?).
I wonder if it would be better to do something like what was done to determine the JIT-EE version the JIT was built with by adding -printJITEEVersion as an argument to the SPMI utility mcs (#42238). The idea being, the JIT and mcs will be built in the same build by the same toolset, so we can use mcs as a proxy for how the JIT is built. Then, here you could do something like run mcs -printBuildString. This is already done in this file in determine_jit_ee_version().
Seems unlikely to me that this kind of search would be slower than actually creating a process, mapping the images, etc. It is a simple text search through 1.5 MB of text. And is invoking mcs.exe to have it load the library to get the export really any more direct?
I agree that ideally we would parse the library, but in this case I was just being pragmatic about it -- this is simple and works in a handful of lines. The main reason I made it an export was just to make sure it is not removed by the linker (and also, it sort of makes sense to have it as an export for when it does get consumed as one).
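As a rough illustration of this pragmatic approach, the search could look like the sketch below (the marker string and helper name are hypothetical; the real embedded string's layout may differ):

```python
def read_jit_build_string(clrjit_path):
    # Read the whole native binary and scan for a known sentinel that
    # precedes the embedded build string (hypothetical marker).
    with open(clrjit_path, "rb") as fh:
        contents = fh.read()
    marker = b"JitBuildString: "
    index = contents.find(marker)
    if index == -1:
        return None
    start = index + len(marker)
    end = contents.find(b"\0", start)  # native strings are NUL-terminated
    return contents[start:end].decode("ascii", errors="replace")
```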
> The idea being, the JIT and mcs will be built in the same build by the same toolset, so we can use mcs as a proxy for how the JIT is built.
I'm not really sure how it works with the JIT-EE GUID right now, but I don't see how we would get the compiler version of the baseline this way.
The way you've done it is reasonable, and pragmatic. I suppose I could have done something similar with the JIT-EE GUID as a text string stashed in the JIT dll, as an exported string and/or an embedded string with enough unique context such that it's parseable. Having it as output from running mcs means I can look at it by easily running a command-line tool, which is nice.
The GUID is used to determine the MCH files to use; it's unrelated to finding a baseline.
src/coreclr/scripts/superpmi.py
```python
write_fh.write("|Collection|PDIFF|\n")
write_fh.write("|---|---|\n")
for mch_file, base_instructions, diff_instructions in tp_diffs:
    write_fh.write("|{}|{}{:.2f}%|\n".format(
        mch_file,
        "+" if diff_instructions > base_instructions else "",
        (diff_instructions - base_instructions) / base_instructions * 100))
```
I wonder if this table should be a "significant summary" and only include collections where the PDIFF shown (at the .2f significant digits) is non-zero. The "detailed" can always include full data. This would make the table smaller. If there are no "significant" data it could say that, e.g., "No significant throughput differences found".
Makes sense to me, let me work on that.
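A sketch of what that "significant summary" filter could look like on top of the loop above (assuming the same `tp_diffs` tuples; the structure is illustrative, not the PR's final code):

```python
# Keep only collections whose PDIFF is non-zero at two decimal places.
significant = [(mch_file, base, diff) for (mch_file, base, diff) in tp_diffs
               if round((diff - base) / base * 100, 2) != 0.0]
if significant:
    write_fh.write("|Collection|PDIFF|\n")
    write_fh.write("|---|---|\n")
    for mch_file, base, diff in significant:
        pdiff = (diff - base) / base * 100
        write_fh.write("|{}|{}{:.2f}%|\n".format(mch_file, "+" if pdiff > 0 else "", pdiff))
else:
    write_fh.write("No significant throughput differences found\n")
```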
Latest run after the renaming: https://dev.azure.com/dnceng/public/_build/results?buildId=1733433&view=ms.vss-build-web.run-extensions-tab @kunalspathak You're still set as "changes requested"; did I address those concerns?
```python
return

pin_dir_path = get_pintools_path(coreclr_args)
pintools_rel_path = "{}/{}/{}.zip".format(az_pintools_root_folder, pintools_current_version, coreclr_args.host_os.lower())
```
I believe this will be uploaded manually once in a while?
Yep, this is just uploaded manually to Azure Storage. I've versioned it to be safe although that probably wasn't necessary.
kunalspathak left a comment:
LGTM. Thanks for doing this.
Latest run looks good: https://dev.azure.com/dnceng/public/_build/results?buildId=1733713&view=ms.vss-build-web.run-extensions-tab Will merge.