
Conversation

@jakobbotsch (Member)

No description provided.

@ghost ghost assigned jakobbotsch Apr 20, 2022
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 20, 2022
@ghost commented Apr 20, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: jakobbotsch
Assignees: jakobbotsch
Labels: area-CodeGen-coreclr
Milestone: -

@jakobbotsch (Member, Author)

cc @dotnet/jit-contrib, this is an example of how this looks (the run includes a reversion of #67395 so that something actually shows up): https://dev.azure.com/dnceng/public/_build/results?buildId=1730378&view=ms.vss-build-web.run-extensions-tab
Ignore the <empty> entries for the ASM diffs; that's a small bug I will fix.

I've opted not to add a collapsible section around the tables since they are small (compared to the ASM diffs).

For x64 the extra SPMI runs measuring TP add around 15 minutes to the SPMI job. For x86 it's worse since 64-bit addition is much more expensive, there it adds around 30 minutes. There are also two extra JIT builds, but those are done in parallel and only take ~8 minutes. I think the extra time taken for the job is worth it to consistently be able to see TP impact on JIT PRs.

The one outstanding problem is that the baseline JIT comes from the rolling build. That means there is potential for drift between the compiler versions used to build the baseline and the diff JITs, which impacts TP measurements. This is worse locally, since the compilers on the CI machines usually lag behind the ones we use locally -- I see around 1.5% fewer instructions executed with my locally built JIT compared to the CI-built JIT. However, when things run in CI I don't think this is a problem, except during periods where the CI machines are being updated.

A few ways forward:

  1. Build both baseline JITs and diff JITs in the CI job. This probably requires quite a bit more work as this is pretty new territory for our CI jobs. Also takes more resources.
  2. Try to find the versions of the tools used to build the JITs in superpmi.py and report a warning back when they do not match. In particular link.exe records its own version inside the PE header, but I don't think cl.exe versions together with link.exe.
  3. Like above, but we change the JIT to embed _MSC_FULL_VER (and similar on other platforms) somewhere that superpmi.py can find it.
  4. Do nothing and just accept that the versions may drift once in a while and we will see TP impact that may not be because of the PR being tested.

I am partial to option 3, which seems simple to do and should be robust.
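To illustrate what the superpmi.py side of options 2/3 might look like, here is a minimal sketch (the function name and message wording are hypothetical, not the PR's code) of warning when the recorded toolchain versions of the base and diff JITs differ:

```python
import logging

def check_toolchain_match(base_build_string, diff_build_string):
    """Hypothetical sketch: warn when the base and diff JITs were built by
    different toolchain versions, since the TP diff then partly reflects
    compiler drift rather than the change under test."""
    if base_build_string is None or diff_build_string is None:
        logging.warning("Could not determine a JIT build string; TP diffs "
                        "may be affected by toolchain drift.")
        return False
    if base_build_string != diff_build_string:
        logging.warning("Base JIT was built with %s but diff JIT with %s; "
                        "TP diffs may partly reflect the toolchain change.",
                        base_build_string, diff_build_string)
        return False
    return True
```

A caller would treat a `False` result as "take the TP numbers with a grain of salt" rather than as a hard failure.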

```python
os_name = "universal" if arch_name.startswith("arm") else os_name
base_jit_path = os.path.join(coreclr_args.base_jit_directory, 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
diff_jit_path = os.path.join(coreclr_args.diff_jit_directory, 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
base_checked_jit_path = os.path.join(coreclr_args.base_jit_directory, "checked", 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
```
Contributor:

I wouldn't do the tp-diff as part of superpmi_asmdiffs.py but would write a separate script exclusively for tp-diff. That way you can have a separate CI pipeline flow just to tpdiff without penalizing the asmdiff execution time.

I wouldn't even recommend having a parameter like asmdiff vs. tpdiff in this file, because it just pollutes the logic and makes it hard to refactor in the future.

Member Author (@jakobbotsch) commented Apr 21, 2022:

I see no reason to introduce yet another pipeline if we want to run TP impact on every PR. It will require even more resources (e.g. redownloading all collections again), it will force JIT devs to look in even more places to evaluate their change, and it does not scale (e.g. I want to add memory stats measurements in the future as well).

I really hope instead to rename the pipeline's "asmdiffs" to "diffs" (in fact, I have done this in a couple of places) and do all the base-diff measurements in this pipeline.

Member Author:

FWIW, these pipelines spend way more time checking out sources, copying files and building than they do inside superpmi_asmdiffs.py. Currently (before this change) the x64 build spends only around 25 minutes in helix, so only around 1/3 of the build is actually spent inside superpmi_asmdiffs.py. So I do not think it is a bad idea to let it do more work and amortize some of all the other cost.

Contributor:

> it will force JIT devs to look in even more places to evaluate their change

+1 on that. Yeah, that justification sounds reasonable for having a single pipeline with both asmdiff/TPdiff. Also, while you are there, and if you have time, I would really like to surface any SPMI replay failures to the extension page so we don't have to scan through the log.

Member Author:

Let me look into it separately from this PR. I think we should report all of this back from helix/superpmi as JSON so that the summarize script can do a much better job of laying things out, but it will probably take a bit of work.

Contributor (@kunalspathak) left a comment:

> the extra SPMI runs

Have you considered having a separate CI pipeline for this TP-diff?

> I am partial to do 3 which seems simple to do and should be robust.

Agree. I am even fine with 4.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 21, 2022
```python
""")
md_files = []

overall_md_asmdiff_summary_file_target = os.path.join(log_directory, "superpmi_diff_summary_{}_{}.md".format(platform_name, arch_name))
```
Contributor:

Regarding the output table, I am only interested in the actual diff rather than raw numbers. I would suggest just displaying the diff % (if possible, in tabular form) and then having an expansion that shows the raw numbers along with the diffs. That way, anyone looking at the extension page can quickly scan through the numbers to know whether TP regressed or not. Also, leave a note that "lower is better".

Member Author:

Good idea, let me see if I can do that in a nice way. It is complicated a little by how we communicate the summaries back (as markdown files instead of just the data).


Contributor:

Much better. Looking at the title "Throughput impact on Windows x64", it makes me wonder: is this the right way to measure throughput impact for binaries compiled for Linux platforms? On Linux we use LLVM, which would generate different instructions, while here we are measuring the instruction counts of a cross-compiled clrjit built with MSVC.

Member Author:

Yes, it is somewhat misleading. However, I spoke to Bruce about this and we agreed it was better than not having it at all. Bruce pointed out that cross-compilation is still a scenario (for crossgen2) and that regressions/improvements with MSVC likely transfer to Clang/GCC as well.

Member Author:

It is on my backlog to clean up how we lay this file out. I think when I do that I will sort the entries here and point out which ones are cross-compiled to make sure there is less confusion there. But let me leave that for that change.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 21, 2022
```python
if coreclr_args.base_git_hash is None:
    # We've got the current hash; figure out the baseline hash.
    command = [ "git", "merge-base", current_hash, "main" ]  # old method
    command = [ "git", "branch", "-r", "--sort=-committerdate", "-v", "--list", "*/main" ]  # new method
```
Member Author:

The old method does not work for my workflow; I never have a local 'main' branch.
The new method uses the newest commit of any remote branch matching "*/main".
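That logic can be sketched as follows (assuming git is on PATH; this is an illustrative sketch, not the exact superpmi.py code):

```python
import subprocess

def find_baseline_hash(current_hash):
    """Sketch of the new baseline search: pick the remote branch matching
    '*/main' with the newest commit, then merge-base against it."""
    out = subprocess.run(
        ["git", "branch", "-r", "--sort=-committerdate", "-v",
         "--list", "*/main"],
        capture_output=True, text=True, check=True).stdout
    if not out.strip():
        return None  # no remote main branch found
    # Each line looks like "  origin/main <hash> <subject>"; branches are
    # sorted newest commit first, so take the first one.
    newest_main = out.strip().splitlines()[0].split()[0]
    return subprocess.run(
        ["git", "merge-base", current_hash, newest_main],
        capture_output=True, text=True, check=True).stdout.strip()
```

This avoids depending on the local branch name entirely, at the cost of requiring at least one remote-tracking main branch.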


@jakobbotsch jakobbotsch marked this pull request as ready for review April 21, 2022 20:53
@jakobbotsch (Member, Author)

This is ready for review. https://dev.azure.com/dnceng/public/_build/results?buildId=1731173&view=ms.vss-build-web.run-extensions-tab is an example of how it currently looks.

cc @dotnet/jit-contrib

@jakobbotsch jakobbotsch linked an issue Apr 21, 2022 that may be closed by this pull request
@kunalspathak (Contributor)

Could you send a PR that renames superpmi-asmdiff.* to superpmi-diff.* and then rebase this PR on top of it? I don't want you to do it in this PR; otherwise, your changes will not be visible in the renamed files.

@jakobbotsch jakobbotsch changed the title Measure JIT TP in superpmi-asmdiffs Measure JIT TP in superpmi-diffs Apr 22, 2022
Comment on lines 348 to 353
```python
throughput_parser.add_argument("-base_jit_path", help="Path to baseline clrjit. Defaults to release baseline JIT from rolling build, by computing baseline git hash.")
throughput_parser.add_argument("-diff_jit_path", help="Path to diff clrjit. Defaults to release Core_Root JIT.")
throughput_parser.add_argument("-git_hash", help="Use this git hash as the current hash for use to find a baseline JIT. Defaults to current git hash of source tree.")
throughput_parser.add_argument("-base_git_hash", help="Use this git hash as the baseline JIT hash. Default: search for the baseline hash.")
throughput_parser.add_argument("-base_jit_option", action="append", help="Option to pass to the baseline JIT. Format is key=value, where key is the option name without leading COMPlus_...")
throughput_parser.add_argument("-diff_jit_option", action="append", help="Option to pass to the diff JIT. Format is key=value, where key is the option name without leading COMPlus_...")
```
Contributor:

Looks like you could add a "diff_parser" parent for shared things between asm_diff_parser and throughput_parser, but I'm ok either way.
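The suggestion can be sketched with argparse's parent-parser mechanism (parser and argument names mirror the snippet above but the structure is illustrative):

```python
import argparse

# Shared arguments hoisted into a parent parser. add_help=False is required
# so the parent's -h does not clash with the child parsers' own -h.
diff_parent_parser = argparse.ArgumentParser(add_help=False)
diff_parent_parser.add_argument("-base_jit_path", help="Path to baseline clrjit.")
diff_parent_parser.add_argument("-diff_jit_path", help="Path to diff clrjit.")
diff_parent_parser.add_argument("-git_hash", help="Current git hash.")
diff_parent_parser.add_argument("-base_git_hash", help="Baseline git hash.")

parser = argparse.ArgumentParser(description="diff driver (sketch)")
subparsers = parser.add_subparsers(dest="mode")
# Both sub-commands inherit the shared arguments from the parent.
subparsers.add_parser("asmdiffs", parents=[diff_parent_parser])
subparsers.add_parser("tpdiff", parents=[diff_parent_parser])

args = parser.parse_args(["tpdiff", "-base_git_hash", "abc123"])
```

Each sub-command then accepts the shared options without duplicating the `add_argument` calls.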

Member Author:

Done.

Comment on lines +509 to +510
```python
with open(clrjit_path, "rb") as fh:
    contents = fh.read()
```
Contributor:

So this isn't actually reading the string via the export added in #68329; it's actually reading the entire JIT native code file and parsing for this string. That seems... indirect (slow, also?).

I wonder if it would be better to do something like what was done to determine the JIT-EE version the JIT was built with by adding -printJITEEVersion as an argument to the SPMI utility mcs (#42238). The idea being, the JIT and mcs will be built in the same build by the same toolset, so we can use mcs as a proxy for how the JIT is built. Then, here you could do something like run mcs -printBuildString. This is already done in this file in determine_jit_ee_version().

Member Author (@jakobbotsch) commented Apr 22, 2022:

Seems unlikely to me that this kind of search would be slower than actually creating a process, mapping the images, etc. It is a simple text search through 1.5 MB of text. And is invoking mcs.exe to have it load the library to get the export really any more direct?

I agree that ideally we would parse the library, but in this case I was just being pragmatic about it -- this is simple and works in a handful of lines. The main reason I made it an export was just to make sure it is not removed by the linker (and also, it sort of makes sense to have it when it does get consumed as if it was an export).
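A sketch of this pragmatic approach (the marker string here is hypothetical; the PR's actual embedded build string format may differ):

```python
def find_build_string(clrjit_path, marker=b"JitBuildString: "):
    """Read the whole JIT binary and text-search for the NUL-terminated
    build string that follows a known marker."""
    with open(clrjit_path, "rb") as fh:
        contents = fh.read()
    start = contents.find(marker)
    if start == -1:
        return None  # marker not present in this binary
    start += len(marker)
    end = contents.find(b"\0", start)  # strings are NUL-terminated in the image
    if end == -1:
        end = len(contents)
    return contents[start:end].decode("ascii", errors="replace")
```

A linear scan through a ~1.5 MB file is effectively instantaneous, which is the point being made above.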

Member Author:

> The idea being, the JIT and mcs will be built in the same build by the same toolset, so we can use mcs as a proxy for how the JIT is built.

I'm not really sure how it works with the JIT-EE GUID right now, but I don't see how we would get the compiler version of the baseline this way.

Contributor:

The way you've done it is reasonable, and pragmatic. I suppose I could have done something similar with the JIT-EE GUID as a text string stashed in the JIT dll, as an exported string and/or an embedded string with enough unique context such that it's parseable. Having it as output from running mcs means I can look at it by easily running a command-line tool, which is nice.

The GUID is used to determine the MCH files to use; it's unrelated to finding a baseline.

Comment on lines 1975 to 1981
```python
write_fh.write("|Collection|PDIFF|\n")
write_fh.write("|---|---|\n")
for mch_file, base_instructions, diff_instructions in tp_diffs:
    write_fh.write("|{}|{}{:.2f}%|\n".format(
        mch_file,
        "+" if diff_instructions > base_instructions else "",
        (diff_instructions - base_instructions) / base_instructions * 100))
```
Contributor:

I wonder if this table should be a "significant summary" and only include collections where the PDIFF shown (at the .2f significant digits) is non-zero. The "detailed" can always include full data. This would make the table smaller. If there are no "significant" data it could say that, e.g., "No significant throughput differences found".
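That suggestion could be sketched as follows, reusing the `tp_diffs` shape from the snippet above (the helper names are illustrative, not the PR's code):

```python
def significant_tp_rows(tp_diffs):
    """Keep only collections whose PDIFF is non-zero at two decimals."""
    rows = []
    for mch_file, base_instructions, diff_instructions in tp_diffs:
        pdiff = (diff_instructions - base_instructions) / base_instructions * 100
        if round(pdiff, 2) != 0:
            rows.append("|{}|{}{:.2f}%|".format(
                mch_file, "+" if pdiff > 0 else "", pdiff))
    return rows

def write_tp_summary(write_fh, tp_diffs):
    rows = significant_tp_rows(tp_diffs)
    if not rows:
        write_fh.write("No significant throughput differences found.\n")
        return
    write_fh.write("|Collection|PDIFF|\n")
    write_fh.write("|---|---|\n")
    for row in rows:
        write_fh.write(row + "\n")
```

The detailed (collapsible) section would still list every collection with raw instruction counts.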

Member Author:

Makes sense to me, let me work on that.

@jakobbotsch (Member, Author) commented Apr 22, 2022

Latest run after the renaming: https://dev.azure.com/dnceng/public/_build/results?buildId=1733433&view=ms.vss-build-web.run-extensions-tab
There are some TP diffs currently, I assume because the baseline release JIT it is finding is a few commits old. It should right itself once we get a JIT change.

@kunalspathak You're still set as "change requested", did I address those concerns?

```python
    return

pin_dir_path = get_pintools_path(coreclr_args)
pintools_rel_path = "{}/{}/{}.zip".format(az_pintools_root_folder, pintools_current_version, coreclr_args.host_os.lower())
```
Contributor:

I believe this will be uploaded manually once in a while?

Member Author:

Yep, this is just uploaded manually to Azure Storage. I've versioned it to be safe although that probably wasn't necessary.

Contributor (@kunalspathak) left a comment:

LGTM. Thanks for doing this.


@jakobbotsch jakobbotsch merged commit db56ff8 into dotnet:main Apr 23, 2022
@jakobbotsch jakobbotsch deleted the superpmi-measure-throughput branch April 23, 2022 05:55
@ghost ghost locked as resolved and limited conversation to collaborators May 23, 2022

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI


Development

Successfully merging this pull request may close these issues.

pin over SPMI has unexpectedly high variance

3 participants