Measure JIT TP in superpmi-diffs #68292
Conversation
Tagging subscribers to this area: @JulieLeeMSFT
The tables are already pretty small, so let's just always show them.
cc @dotnet/jit-contrib, this is an example of how this looks (the run includes a reversion of #67395 so that something actually shows up): https://dev.azure.com/dnceng/public/_build/results?buildId=1730378&view=ms.vss-build-web.run-extensions-tab I've opted not to add a collapsible section around the tables since they are small (compared to the ASM diffs).

For x64 the extra SPMI runs measuring TP add around 15 minutes to the SPMI job. For x86 it's worse since 64-bit addition is much more expensive; there it adds around 30 minutes. There are also two extra JIT builds, but those are done in parallel and only take ~8 minutes. I think the extra time taken for the job is worth it to consistently be able to see TP impact on JIT PRs.

The one outstanding problem is that the baseline JIT comes from the rolling build. That means there is the potential for some drift in the compiler version used to compile the baseline and the diff, which impacts TP measurements. This is worse locally since the CI machines are usually behind the compilers we are using locally -- I see around 1.5% fewer instructions executed with my locally built JIT compared to the CI-built JIT. However, when things run in CI I don't think this is a problem, except during periods where the CI machines are being updated. A few ways forward:

I am partial to option 3, which seems simple to do and should be robust.
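For context, the figure these tables report is the relative change in instructions executed between the base and diff JITs. A minimal sketch of that arithmetic (the function name is illustrative, not from the PR):

```python
def format_tp_diff(base_instructions, diff_instructions):
    # Relative change in instructions executed; positive means the diff JIT
    # executes more instructions, i.e. a throughput regression. Lower is better.
    pdiff = (diff_instructions - base_instructions) / base_instructions * 100
    return "{}{:.2f}%".format("+" if pdiff > 0 else "", pdiff)
```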
```python
os_name = "universal" if arch_name.startswith("arm") else os_name
base_jit_path = os.path.join(coreclr_args.base_jit_directory, 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
diff_jit_path = os.path.join(coreclr_args.diff_jit_directory, 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
base_checked_jit_path = os.path.join(coreclr_args.base_jit_directory, "checked", 'clrjit_{}_{}_{}.dll'.format(os_name, arch_name, host_arch_name))
```
I wouldn't do the tp-diff as part of superpmi_asmdiffs.py but would write a separate script exclusively for tp-diff. That way you can have a separate CI pipeline flow just for tpdiff without penalizing the asmdiff execution time.
I wouldn't even recommend having a parameter like asmdiff vs. tpdiff for this file, because it just pollutes the logic and makes it hard to refactor in the future.
I see no reason to introduce yet another pipeline if we want to measure TP impact on every PR. It would require even more resources (e.g. redownloading all the collections again), it would force JIT devs to look in even more places to evaluate their change, and it does not scale (e.g. I want to add memory stats measurements in the future as well).
I really hope instead to rename the pipeline's "asmdiffs" to "diffs" (in fact, I have done this in a couple of places) and do all the base-diff measurements in this pipeline.
FWIW, these pipelines spend way more time checking out sources, copying files, and building than they do inside superpmi_asmdiffs.py. Currently (before this change) the x64 build spends only around 25 minutes in Helix, so only around 1/3 of the build time is actually spent inside superpmi_asmdiffs.py. So I do not think it is a bad idea to let it do more work and amortize some of the other cost.
> it will force JIT devs to look in even more places to evaluate their change
+1 on that. Yeah, that justification sounds reasonable for having a single pipeline with both asmdiff/TPdiff. Also, while you are there, and if you have time, I would really like to surface any SPMI replay failures to the extension page so we don't have to scan through the log.
Let me look into it separately from this PR. I think we should report all of this back from helix/superpmi as JSON so that the summarize script can do a much better job of laying things out, but it will probably take a bit of work.
kunalspathak left a comment:
> the extra SPMI runs
Have you considered having a separate CI pipeline for this TP-diff?
> I am partial to option 3, which seems simple to do and should be robust.

Agreed. I am even fine with option 4.
```python
""")
md_files = []

overall_md_asmdiff_summary_file_target = os.path.join(log_directory, "superpmi_diff_summary_{}_{}.md".format(platform_name, arch_name))
```
Regarding the output table, I am only interested in the actual diff rather than the raw numbers. I would suggest just displaying the diff % (if possible, in tabular form) and then having an expansion that shows the raw numbers along with the diffs. That way, anyone who is looking at the extension page can quickly scan through the numbers to know whether TP regressed or not. Also, leave a note that "lower is better".
Good idea, let me see if I can do that in a nice way. It is complicated a little by how we communicate the summaries back (as markdown files instead of just the data).
This is how it looks now: https://dev.azure.com/dnceng/public/_build/results?buildId=1731173&view=ms.vss-build-web.run-extensions-tab
What do you think?
Much better. Looking at the title "Throughput impact on Windows x64", it makes me wonder: is this the right way to measure the throughput impact for binaries compiled for Linux platforms? On Linux we use LLVM, which would generate different instructions, while here we are measuring the instruction count of the cross-compiled clrjit generated by MSVC.
Yes, it is somewhat misleading. However I spoke to Bruce about this and we agreed it was better than not having it at all. Bruce pointed out that cross-compilation is still a scenario (for crossgen2) and that it is likely that regressions/improvements with MSVC transfer to Clang/GCC as well.
It is on my backlog to clean up how we lay this file out. I think when I do that I will sort the entries here and point out which ones are cross-compiled to make sure there is less confusion there. But let me leave that for that change.
```diff
 if coreclr_args.base_git_hash is None:
     # We've got the current hash; figure out the baseline hash.
-    command = [ "git", "merge-base", current_hash, "main" ]
+    command = [ "git", "branch", "-r", "--sort=-committerdate", "-v", "--list", "*/main" ]
```
The old method does not work for my workflow; I never have a 'main' branch. The new method is to use the newest commit of any branch that matches "*/main".
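For anyone reading along, a hedged sketch of how this lookup could be consumed (the output parsing here is an assumption about `git branch -r -v` formatting, not the PR's actual code):

```python
import subprocess

def find_newest_main_commit():
    # List remote-tracking branches matching */main, newest commit first.
    command = ["git", "branch", "-r", "--sort=-committerdate", "-v", "--list", "*/main"]
    output = subprocess.run(command, capture_output=True, text=True, check=True).stdout
    # Each line looks like "  origin/main  abc1234  commit subject"; the second
    # field of the first line is the abbreviated hash of the newest commit.
    return output.splitlines()[0].split()[1]
```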
This is ready for review. https://dev.azure.com/dnceng/public/_build/results?buildId=1731173&view=ms.vss-build-web.run-extensions-tab is an example of how it currently looks. cc @dotnet/jit-contrib
Could you send a PR that renames …
src/coreclr/scripts/superpmi.py
```python
throughput_parser.add_argument("-base_jit_path", help="Path to baseline clrjit. Defaults to release baseline JIT from rolling build, by computing baseline git hash.")
throughput_parser.add_argument("-diff_jit_path", help="Path to diff clrjit. Defaults to release Core_Root JIT.")
throughput_parser.add_argument("-git_hash", help="Use this git hash as the current hash for use to find a baseline JIT. Defaults to current git hash of source tree.")
throughput_parser.add_argument("-base_git_hash", help="Use this git hash as the baseline JIT hash. Default: search for the baseline hash.")
throughput_parser.add_argument("-base_jit_option", action="append", help="Option to pass to the baseline JIT. Format is key=value, where key is the option name without leading COMPlus_...")
throughput_parser.add_argument("-diff_jit_option", action="append", help="Option to pass to the diff JIT. Format is key=value, where key is the option name without leading COMPlus_...")
```
Looks like you could add a "diff_parser" parent for shared things between asm_diff_parser and throughput_parser, but I'm ok either way.
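For illustration, the argparse "parent parser" pattern this comment refers to looks roughly like the sketch below (the arguments shown are a subset for illustration; the actual superpmi.py wiring differs):

```python
import argparse

# Arguments shared by both modes live on a parent parser.
diff_parser = argparse.ArgumentParser(add_help=False)
diff_parser.add_argument("-base_jit_path", help="Path to baseline clrjit.")
diff_parser.add_argument("-diff_jit_path", help="Path to diff clrjit.")

parser = argparse.ArgumentParser(description="superpmi")
subparsers = parser.add_subparsers(dest="mode")
# Each subcommand inherits the shared arguments via parents=[...].
asm_diff_parser = subparsers.add_parser("asmdiffs", parents=[diff_parser])
throughput_parser = subparsers.add_parser("tpdiff", parents=[diff_parser])
```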
Done.
```python
with open(clrjit_path, "rb") as fh:
    contents = fh.read()
```
So this isn't actually reading the string via the export added in #68329; it's actually reading the entire JIT native code file and parsing for this string. That seems... indirect (slow, also?).
I wonder if it would be better to do something like what was done to determine the JIT-EE version the JIT was built with by adding -printJITEEVersion as an argument to the SPMI utility mcs (#42238). The idea being, the JIT and mcs will be built in the same build by the same toolset, so we can use mcs as a proxy for how the JIT is built. Then, here you could do something like run mcs -printBuildString. This is already done in this file in determine_jit_ee_version().
Seems unlikely to me that this kind of search would be slower than actually creating a process, mapping the images, etc. It is a simple text search through 1.5 MB of text. And is invoking mcs.exe to have it load the library to get the export really any more direct?
I agree that ideally we would parse the library, but in this case I was just being pragmatic about it -- this is simple and works in a handful of lines. The main reason I made it an export was just to make sure it is not removed by the linker (and also, it sort of makes sense to have it as an export for when it does get consumed as one).
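As a rough illustration of this pragmatic approach, the search could look like the sketch below (the marker string and helper name are hypothetical; the real embedded string's layout may differ):

```python
def read_jit_build_string(clrjit_path):
    # Read the whole native binary and scan for a known sentinel that
    # precedes the embedded build string (hypothetical marker).
    with open(clrjit_path, "rb") as fh:
        contents = fh.read()
    marker = b"JitBuildString: "
    index = contents.find(marker)
    if index == -1:
        return None
    start = index + len(marker)
    end = contents.find(b"\0", start)  # native strings are NUL-terminated
    return contents[start:end].decode("ascii", errors="replace")
```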
> The idea being, the JIT and mcs will be built in the same build by the same toolset, so we can use mcs as a proxy for how the JIT is built.
I'm not really sure how it works with the JIT-EE GUID right now, but I don't see how we would get the compiler version of the baseline this way.
The way you've done it is reasonable, and pragmatic. I suppose I could have done something similar with the JIT-EE GUID as a text string stashed in the JIT dll, as an exported string and/or an embedded string with enough unique context such that it's parseable. Having it as output from running mcs means I can look at it by easily running a command-line tool, which is nice.
The GUID is used to determine the MCH files to use; it's unrelated to finding a baseline.
src/coreclr/scripts/superpmi.py
```python
write_fh.write("|Collection|PDIFF|\n")
write_fh.write("|---|---|\n")
for mch_file, base_instructions, diff_instructions in tp_diffs:
    write_fh.write("|{}|{}{:.2f}%|\n".format(
        mch_file,
        "+" if diff_instructions > base_instructions else "",
        (diff_instructions - base_instructions) / base_instructions * 100))
```
I wonder if this table should be a "significant summary" and only include collections where the PDIFF shown (at the .2f significant digits) is non-zero. The "detailed" can always include full data. This would make the table smaller. If there are no "significant" data it could say that, e.g., "No significant throughput differences found".
Makes sense to me, let me work on that.
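A sketch of what that "significant summary" filter could look like on top of the loop above (assuming the same `tp_diffs` tuples; the structure is illustrative, not the PR's final code):

```python
# Keep only collections whose PDIFF is non-zero at two decimal places.
significant = [(mch_file, base, diff) for (mch_file, base, diff) in tp_diffs
               if round((diff - base) / base * 100, 2) != 0.0]
if significant:
    write_fh.write("|Collection|PDIFF|\n")
    write_fh.write("|---|---|\n")
    for mch_file, base, diff in significant:
        pdiff = (diff - base) / base * 100
        write_fh.write("|{}|{}{:.2f}%|\n".format(mch_file, "+" if pdiff > 0 else "", pdiff))
else:
    write_fh.write("No significant throughput differences found\n")
```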
Latest run after the renaming: https://dev.azure.com/dnceng/public/_build/results?buildId=1733433&view=ms.vss-build-web.run-extensions-tab @kunalspathak You're still set as "changes requested"; did I address those concerns?
```python
return

pin_dir_path = get_pintools_path(coreclr_args)
pintools_rel_path = "{}/{}/{}.zip".format(az_pintools_root_folder, pintools_current_version, coreclr_args.host_os.lower())
```
I believe this will be uploaded manually once in a while?
Yep, this is just uploaded manually to Azure Storage. I've versioned it to be safe although that probably wasn't necessary.
kunalspathak left a comment:
LGTM. Thanks for doing this.
Latest run looks good: https://dev.azure.com/dnceng/public/_build/results?buildId=1733713&view=ms.vss-build-web.run-extensions-tab Will merge.