unionfind: robustly avoid changing `Idx`s in the GVN map by meithecatte · Pull Request #7746 · bytecodealliance/wasmtime

meithecatte · 2024-01-04T11:01:43Z

Stop hoping that keeping the smallest Idx as canonical will yield good results, and instead explicitly inform the UnionFind of what we expect not to move.

Fixes #6126

I attempted running the sightglass benchmark on this, but I think my setup was too noisy – any differences in performance disappeared when I re-ran the specific benchmark on which the difference was reported (though I only did it manually for the few first reported differences).

meithecatte · 2024-01-04T11:09:26Z

cc @cfallin

fitzgen

I don't feel super confident reviewing this on my own. I think it needs a second pair of eyes from @cfallin (on parental leave till end of February IIRC). So I think this PR might need to wait for a bit before it can be merged.

That said, the pinning feels a bit funky to me, in that it breaks down when we union two pinned ids. Would it be possible to turn that warning into a panic? Or is it expected that this will happen and we just have to deal with it as best as we can?

fitzgen · 2024-01-04T20:59:47Z

I attempted running the sightglass benchmark on this, but I think my setup was too noisy – any differences in performance disappeared when I re-ran the specific benchmark on which the difference was reported (though I only did it manually for the few first reported differences).

Are you on linux? If so you can try measuring instructions retired (pass -m insts-retired), which is much much less noisy. I can usually get good results with just a single process and ten iterations.

jameysharp · 2024-01-04T22:03:29Z

Although I haven't carefully reviewed this PR, I'm cautiously optimistic about it. The filetest changes look promising and the general approach feels plausible. Making use of a magic value for "rank" is a clever way to get this behavior almost for free out of an otherwise-standard union-find implementation.

I agree with @fitzgen that it feels unsatisfying to have the case of merging two pinned IDs just print a warning. But it's better than the status quo where we get the same result without any indication that something went wrong.

At worst, this PR would be a good starting point for evaluating how often we have this problem in practice. My guess is we don't see #6126 come up very often in normal use of Cranelift, in which case it's good that this PR is a very small change, algorithmically speaking. But maybe my guess is wrong and it'd be nice to discover that.

I would also be interested to know what fraction of nodes in the union-find do not get pinned with this PR. If we pinned everything before merging, this approach would just warn on every merge. If we merge things and then pin them afterward though, that's fine. I can't really focus right now to picture how this works out but it'd be nice to develop a better understanding.

@meithecatte, if you're using Linux and want lower-variance performance measurements, I've documented the rather extreme measures I use to isolate a CPU core for benchmarks. You don't need performance measurements to justify merging this PR, but if there's a measurable improvement, that's definitely a nice extra argument in favor.

meithecatte · 2024-01-05T00:02:06Z

I agree with @fitzgen that it feels unsatisfying to have the case of merging two pinned IDs just print a warning. But it's better than the status quo where we get the same result without any indication that something went wrong.

If it's any consolation, this happening is in the "missed optimization" category, and won't cause a miscompilation or ICE. See #6126 (comment)

I would also be interested to know what fraction of nodes in the union-find do not get pinned with this PR. If we pinned everything before merging, this approach would just warn on every merge. If we merge things and then pin them afterward though, that's fine. I can't really focus right now to picture how this works out but it'd be nice to develop a better understanding.

The test suite passes if I replace the log::warn with a panic. I've convinced myself that the only situation in which we union two pinned eclasses, is if there's a rewrite rule (or a chain of them) that lets us derive A -> B, and separately A -> C, and yet there's no way to go B -> C nor C -> B in isolation. Intuitively, it seems unlikely that we would have rules where this could arise, but the problem of analyzing a ruleset to reliably decide if it's possible seems like a thesis.

meithecatte · 2024-01-05T00:21:14Z

So, it looks like the warn! isn't shown during cargo test, and in normal runs needs an explicit RUST_LOG=warn or higher. Sounds like the likelyhood of the warning being seen is quite low. I suppose that one option would be to have this be a debug_assert! – scream if we're in debug mode (i.e. someone's probably observing), carry on if we're in release mode (i.e. probably deployed somewhere).

I also tried running the benchmarks as -m insts-retired, as you suggested. This ended as follows:

~/src/sightglass$ time cargo run -- benchmark -m insts-retired --small-workloads --processes 1 --engine /tmp/wasmtime_main.so --engine ~/dev/wasmtime/target/release/libwasmtime_bench_api.so -- benchmarks/all.suite
warning: virtual workspace defaulting to `resolver = "1"` despite one or more workspace members being on edition 2021 which implies `resolver = "2"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
note: for more details see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/sightglass-cli benchmark -m insts-retired --small-workloads --processes 1 --engine /tmp/wasmtime_main.so --engine /home/maya/dev/wasmtime/target/release/libwasmtime_bench_api.so -- benchmarks/all.suite`
Error: The variance of one of the samples is zero

real	8m15.667s
user	13m56.404s
sys	4m19.762s

elliottt

I like this approach! The trick of using the rank to pick the parent is great. I agree with the comments about using a warning when merging two pinned eclasses, and really wish that we could confidently make that a panic instead.

elliottt · 2024-01-05T00:43:33Z

cranelift/codegen/src/unionfind.rs

+/// won't change (unless, of course, it gets unioned with another pinned
+/// eclass).


Given the discussion on this PR, should this comment change? Maybe pointing out that

two pinned eclasses aren't likely to be merged

a rough outline of when that might happen, and

that the effect of that would be at worst the behavior prior to pinning.

I expanded upon the comments I've added. Let me know if they're up to your expectations now.

cranelift/codegen/src/unionfind.rs

cfallin · 2024-01-05T01:04:48Z

Thanks for tackling this @meithecatte ! As @fitzgen noted, I'm still on parental leave until Feb 12 so I can't review anything properly until then, but I'm watching email and will try not to block things when my input is useful!

I'd agree that this is generally a good approach, with the caveat that we must not warn, and especially not debug-assert, if two pinned values are union'd. As @meithecatte noted, analyzing the rules to prove this can't occur is a hard problem and above the abstraction level we likely want to work at. This case shouldn't happen in normal use (values numbered sequentially as built, rules generally rewriting toward better expressions), but compilation is still correct if it does happen (as noted in this comment that @meithecatte linked), and moreover the entire point of aegraphs is to approximate egraphs with lower cost -- this means that we can expect suboptimal outcomes in some cases (this is the price we pay for no fixpoint loop).

So, since it can happen in normal operation, and is not incorrect, we must not panic (even in a debug build), and we probably shouldn't warn -- that has the potential to create a log-storm issue at the most inopportune time (e.g. a DoS vector in production where the compiler was happy before). My vote would be to choose the lower ID to break the tie, as we do today, but either choice should be valid I think.

meithecatte · 2024-01-05T01:17:18Z

So, since it can happen in normal operation, and is not incorrect, we must not panic (even in a debug build), and we probably shouldn't warn -- that has the potential to create a log-storm issue at the most inopportune time (e.g. a DoS vector in production where the compiler was happy before).

Good point. Perhaps something like the Linux kernel's WARN_ONCE?

My vote would be to choose the lower ID to break the tie, as we do today, but either choice should be valid I think.

Perhaps it indeed makes sense to use this as a heuristic.

I'd agree that this is generally a good approach, with the caveat that we must not warn, and especially not debug-assert, if two pinned values are union'd. As @meithecatte noted, analyzing the rules to prove this can't occur is a hard problem and above the abstraction level we likely want to work at. This case shouldn't happen in normal use (values numbered sequentially as built, rules generally rewriting toward better expressions), but compilation is still correct if it does happen (as noted in #6126 (comment) that @meithecatte linked), and moreover the entire point of aegraphs is to approximate egraphs with lower cost -- this means that we can expect suboptimal outcomes in some cases (this is the price we pay for no fixpoint loop).

Indeed, though I'd argue that this case triggering would probably mean that there's a reasonable rewrite rule that we haven't implemented. I don't know about you, but I'd like to know if it happened.

cfallin · 2024-01-05T01:29:00Z

Indeed, though I'd argue that this case triggering would probably mean that there's a reasonable rewrite rule that we haven't implemented. I don't know about you, but I'd like to know if it happened.

Sure, agreed it would be nice to get this information! In my mind, it looks more like a "compiler stat", like a regalloc spill, or a worst-case fallback isel rule firing: regrettable, but not rising to the level of warn!. Remember that we have a lot of embedders of wasmtime in, e.g., server runtimes and game engines and such, and folks may, e.g., pipe warnings to devops folks (Slack channels and such); an "I should address this" missing optimization to the compiler engineer is a far cry from a "disk is almost full" or "frame dropped" message or something like that.

Or another way perhaps: it is an implementation quality signal rather than a runtime urgency signal. Warnings should occur when the person operating the software might want to take some action, not when the compiler engineer might want to add an optimization.

We have a nascent "stats framework" idea, which appears in the aegraphs framework and in regalloc2 as well; perhaps "two pinned values union'd" is another stat event and whenever we go fishing for optimization work to do, we notice if the value is nonzero?

meithecatte · 2024-01-09T07:11:59Z

We have a nascent "stats framework" idea, which appears in the aegraphs framework and in regalloc2 as well; perhaps "two pinned values union'd" is another stat event and whenever we go fishing for optimization work to do, we notice if the value is nonzero?

Okay, you've convinced me, I made it a stat. It's a bit of a hack – should I pass in a &mut Stats as a separate argument to union instead? Doing it this way seemed less ugly...

jameysharp · 2024-01-09T22:37:48Z

Okay, you've convinced me, I made it a stat. It's a bit of a hack – should I pass in a &mut Stats as a separate argument to union instead? Doing it this way seemed less ugly...

Yeah, I think your approach is perfectly reasonable, and better than trying to pass the whole stats struct in repeatedly.

cfallin · 2024-01-11T16:22:15Z

Looks good to me too! I'll allow someone else to do the detailed review since I'm still technically on leave but all my concerns are addressed :-)

cfallin · 2024-02-11T19:13:14Z

@meithecatte would you mind rebasing on the latest main, just to be sure everything is still good (since some other egraphs-related changes have been made in the meantime)?

Sorry for the delay on this; I'm back from leave officially tomorrow and will review and merge if tests are green.

Stop hoping that keeping the smallest Idx as canonical will yield good results, and instead explicitly inform the UnionFind of what we expect not to move. Fixes bytecodealliance#6126

Emitting a warning in this situation is too much.

meithecatte · 2024-02-12T03:46:17Z

@meithecatte would you mind rebasing on the latest main, just to be sure everything is still good (since some other egraphs-related changes have been made in the meantime)?

Sure thing! No conflicts, and the tests passed on my end.

Sorry for the delay on this; I'm back from leave officially tomorrow and will review and merge if tests are green.

Don't worry, I don't mind.

cfallin

LGTM, thanks!

meithecatte requested a review from a team as a code owner January 4, 2024 11:01

meithecatte requested review from fitzgen and removed request for a team January 4, 2024 11:01

meithecatte force-pushed the gvn-reasoned-merge branch from 2dc8e4d to c68f5b6 Compare January 4, 2024 11:06

github-actions bot added the cranelift Issues related to the Cranelift code generator label Jan 4, 2024

fitzgen reviewed Jan 4, 2024

View reviewed changes

elliottt self-requested a review January 4, 2024 22:09

elliottt reviewed Jan 5, 2024

View reviewed changes

meithecatte force-pushed the gvn-reasoned-merge branch 2 times, most recently from 9e2a348 to 581e6e0 Compare January 9, 2024 07:09

meithecatte added 2 commits February 12, 2024 04:04

unionfind: robustly avoid changing Idxs in the GVN map

07601be

Stop hoping that keeping the smallest Idx as canonical will yield good results, and instead explicitly inform the UnionFind of what we expect not to move. Fixes bytecodealliance#6126

unionfind: track unions of pinned indices as stats

d31a3e5

Emitting a warning in this situation is too much.

meithecatte force-pushed the gvn-reasoned-merge branch from 581e6e0 to d31a3e5 Compare February 12, 2024 03:04

cfallin approved these changes Feb 12, 2024

View reviewed changes

cfallin added this pull request to the merge queue Feb 12, 2024

Merged via the queue into bytecodealliance:main with commit e7ab3a4 Feb 12, 2024

		/// won't change (unless, of course, it gets unioned with another pinned
		/// eclass).

Conversation

meithecatte commented Jan 4, 2024

Uh oh!

meithecatte commented Jan 4, 2024

Uh oh!

fitzgen left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fitzgen commented Jan 4, 2024

Uh oh!

jameysharp commented Jan 4, 2024

Uh oh!

meithecatte commented Jan 5, 2024

Uh oh!

meithecatte commented Jan 5, 2024

Uh oh!

elliottt left a comment

Choose a reason for hiding this comment

Uh oh!

elliottt Jan 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

meithecatte Jan 9, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cfallin commented Jan 5, 2024

Uh oh!

meithecatte commented Jan 5, 2024

Uh oh!

cfallin commented Jan 5, 2024

Uh oh!

meithecatte commented Jan 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jameysharp commented Jan 9, 2024

Uh oh!

cfallin commented Jan 11, 2024

Uh oh!

cfallin commented Feb 11, 2024

Uh oh!

meithecatte commented Feb 12, 2024

Uh oh!

cfallin left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

fitzgen left a comment •

edited

Loading

elliottt Jan 5, 2024 •

edited

Loading

meithecatte commented Jan 9, 2024 •

edited

Loading