Skip to content

unionfind: robustly avoid changing Idxs in the GVN map#7746

Merged
cfallin merged 2 commits intobytecodealliance:mainfrom
meithecatte:gvn-reasoned-merge
Feb 12, 2024
Merged

unionfind: robustly avoid changing Idxs in the GVN map#7746
cfallin merged 2 commits intobytecodealliance:mainfrom
meithecatte:gvn-reasoned-merge

Conversation

@meithecatte
Copy link
Contributor

Stop hoping that keeping the smallest Idx as canonical will yield good results, and instead explicitly inform the UnionFind of what we expect not to move.

Fixes #6126

I attempted running the sightglass benchmark on this, but I think my setup was too noisy – any differences in performance disappeared when I re-ran the specific benchmark on which the difference was reported (though I only did it manually for the few first reported differences).

@meithecatte meithecatte requested a review from a team as a code owner January 4, 2024 11:01
@meithecatte meithecatte requested review from fitzgen and removed request for a team January 4, 2024 11:01
@meithecatte
Copy link
Contributor Author

cc @cfallin

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Jan 4, 2024
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel super confident reviewing this on my own. I think it needs a second pair of eyes from @cfallin (on parental leave till end of February IIRC). So I think this PR might need to wait for a bit before it can be merged.

That said, the pinning feels a bit funky to me, in that it breaks down when we union two pinned ids. Would it be possible to turn that warning into a panic? Or is it expected that this will happen and we just have to deal with it as best as we can?

@fitzgen
Copy link
Member

fitzgen commented Jan 4, 2024

I attempted running the sightglass benchmark on this, but I think my setup was too noisy – any differences in performance disappeared when I re-ran the specific benchmark on which the difference was reported (though I only did it manually for the few first reported differences).

Are you on linux? If so you can try measuring instructions retired (pass -m insts-retired), which is much much less noisy. I can usually get good results with just a single process and ten iterations.

@jameysharp
Copy link
Contributor

Although I haven't carefully reviewed this PR, I'm cautiously optimistic about it. The filetest changes look promising and the general approach feels plausible. Making use of a magic value for "rank" is a clever way to get this behavior almost for free out of an otherwise-standard union-find implementation.

I agree with @fitzgen that it feels unsatisfying to have the case of merging two pinned IDs just print a warning. But it's better than the status quo where we get the same result without any indication that something went wrong.

At worst, this PR would be a good starting point for evaluating how often we have this problem in practice. My guess is we don't see #6126 come up very often in normal use of Cranelift, in which case it's good that this PR is a very small change, algorithmically speaking. But maybe my guess is wrong and it'd be nice to discover that.

I would also be interested to know what fraction of nodes in the union-find do not get pinned with this PR. If we pinned everything before merging, this approach would just warn on every merge. If we merge things and then pin them afterward though, that's fine. I can't really focus right now to picture how this works out but it'd be nice to develop a better understanding.

@meithecatte, if you're using Linux and want lower-variance performance measurements, I've documented the rather extreme measures I use to isolate a CPU core for benchmarks. You don't need performance measurements to justify merging this PR, but if there's a measurable improvement, that's definitely a nice extra argument in favor.

@elliottt elliottt self-requested a review January 4, 2024 22:09
@meithecatte
Copy link
Contributor Author

I agree with @fitzgen that it feels unsatisfying to have the case of merging two pinned IDs just print a warning. But it's better than the status quo where we get the same result without any indication that something went wrong.

If it's any consolation, this happening is in the "missed optimization" category, and won't cause a miscompilation or ICE. See #6126 (comment)

I would also be interested to know what fraction of nodes in the union-find do not get pinned with this PR. If we pinned everything before merging, this approach would just warn on every merge. If we merge things and then pin them afterward though, that's fine. I can't really focus right now to picture how this works out but it'd be nice to develop a better understanding.

The test suite passes if I replace the log::warn with a panic. I've convinced myself that the only situation in which we union two pinned eclasses, is if there's a rewrite rule (or a chain of them) that lets us derive A -> B, and separately A -> C, and yet there's no way to go B -> C nor C -> B in isolation. Intuitively, it seems unlikely that we would have rules where this could arise, but the problem of analyzing a ruleset to reliably decide if it's possible seems like a thesis.

@meithecatte
Copy link
Contributor Author

So, it looks like the warn! isn't shown during cargo test, and in normal runs needs an explicit RUST_LOG=warn or higher. Sounds like the likelyhood of the warning being seen is quite low. I suppose that one option would be to have this be a debug_assert! – scream if we're in debug mode (i.e. someone's probably observing), carry on if we're in release mode (i.e. probably deployed somewhere).

I also tried running the benchmarks as -m insts-retired, as you suggested. This ended as follows:

~/src/sightglass$ time cargo run -- benchmark -m insts-retired --small-workloads --processes 1 --engine /tmp/wasmtime_main.so --engine ~/dev/wasmtime/target/release/libwasmtime_bench_api.so -- benchmarks/all.suite
warning: virtual workspace defaulting to `resolver = "1"` despite one or more workspace members being on edition 2021 which implies `resolver = "2"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
note: for more details see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/sightglass-cli benchmark -m insts-retired --small-workloads --processes 1 --engine /tmp/wasmtime_main.so --engine /home/maya/dev/wasmtime/target/release/libwasmtime_bench_api.so -- benchmarks/all.suite`
Error: The variance of one of the samples is zero

real	8m15.667s
user	13m56.404s
sys	4m19.762s

Copy link
Member

@elliottt elliottt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach! The trick of using the rank to pick the parent is great. I agree with the comments about using a warning when merging two pinned eclasses, and really wish that we could confidently make that a panic instead.

Comment on lines +14 to +15
/// won't change (unless, of course, it gets unioned with another pinned
/// eclass).
Copy link
Member

@elliottt elliottt Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the discussion on this PR, should this comment change? Maybe pointing out that

  1. two pinned eclasses aren't likely to be merged
  2. a rough outline of when that might happen, and
  3. that the effect of that would be at worst the behavior prior to pinning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded upon the comments I've added. Let me know if they're up to your expectations now.

@cfallin
Copy link
Member

cfallin commented Jan 5, 2024

Thanks for tackling this @meithecatte ! As @fitzgen noted, I'm still on parental leave until Feb 12 so I can't review anything properly until then, but I'm watching email and will try not to block things when my input is useful!

I'd agree that this is generally a good approach, with the caveat that we must not warn, and especially not debug-assert, if two pinned values are union'd. As @meithecatte noted, analyzing the rules to prove this can't occur is a hard problem and above the abstraction level we likely want to work at. This case shouldn't happen in normal use (values numbered sequentially as built, rules generally rewriting toward better expressions), but compilation is still correct if it does happen (as noted in this comment that @meithecatte linked), and moreover the entire point of aegraphs is to approximate egraphs with lower cost -- this means that we can expect suboptimal outcomes in some cases (this is the price we pay for no fixpoint loop).

So, since it can happen in normal operation, and is not incorrect, we must not panic (even in a debug build), and we probably shouldn't warn -- that has the potential to create a log-storm issue at the most inopportune time (e.g. a DoS vector in production where the compiler was happy before). My vote would be to choose the lower ID to break the tie, as we do today, but either choice should be valid I think.

@meithecatte
Copy link
Contributor Author

So, since it can happen in normal operation, and is not incorrect, we must not panic (even in a debug build), and we probably shouldn't warn -- that has the potential to create a log-storm issue at the most inopportune time (e.g. a DoS vector in production where the compiler was happy before).

Good point. Perhaps something like the Linux kernel's WARN_ONCE?

My vote would be to choose the lower ID to break the tie, as we do today, but either choice should be valid I think.

Perhaps it indeed makes sense to use this as a heuristic.

I'd agree that this is generally a good approach, with the caveat that we must not warn, and especially not debug-assert, if two pinned values are union'd. As @meithecatte noted, analyzing the rules to prove this can't occur is a hard problem and above the abstraction level we likely want to work at. This case shouldn't happen in normal use (values numbered sequentially as built, rules generally rewriting toward better expressions), but compilation is still correct if it does happen (as noted in #6126 (comment) that @meithecatte linked), and moreover the entire point of aegraphs is to approximate egraphs with lower cost -- this means that we can expect suboptimal outcomes in some cases (this is the price we pay for no fixpoint loop).

Indeed, though I'd argue that this case triggering would probably mean that there's a reasonable rewrite rule that we haven't implemented. I don't know about you, but I'd like to know if it happened.

@cfallin
Copy link
Member

cfallin commented Jan 5, 2024

Indeed, though I'd argue that this case triggering would probably mean that there's a reasonable rewrite rule that we haven't implemented. I don't know about you, but I'd like to know if it happened.

Sure, agreed it would be nice to get this information! In my mind, it looks more like a "compiler stat", like a regalloc spill, or a worst-case fallback isel rule firing: regrettable, but not rising to the level of warn!. Remember that we have a lot of embedders of wasmtime in, e.g., server runtimes and game engines and such, and folks may, e.g., pipe warnings to devops folks (Slack channels and such); an "I should address this" missing optimization to the compiler engineer is a far cry from a "disk is almost full" or "frame dropped" message or something like that.

Or another way perhaps: it is an implementation quality signal rather than a runtime urgency signal. Warnings should occur when the person operating the software might want to take some action, not when the compiler engineer might want to add an optimization.

We have a nascent "stats framework" idea, which appears in the aegraphs framework and in regalloc2 as well; perhaps "two pinned values union'd" is another stat event and whenever we go fishing for optimization work to do, we notice if the value is nonzero?

@meithecatte meithecatte force-pushed the gvn-reasoned-merge branch 2 times, most recently from 9e2a348 to 581e6e0 Compare January 9, 2024 07:09
@meithecatte
Copy link
Contributor Author

meithecatte commented Jan 9, 2024

We have a nascent "stats framework" idea, which appears in the aegraphs framework and in regalloc2 as well; perhaps "two pinned values union'd" is another stat event and whenever we go fishing for optimization work to do, we notice if the value is nonzero?

Okay, you've convinced me, I made it a stat. It's a bit of a hack – should I pass in a &mut Stats as a separate argument to union instead? Doing it this way seemed less ugly...

@jameysharp
Copy link
Contributor

Okay, you've convinced me, I made it a stat. It's a bit of a hack – should I pass in a &mut Stats as a separate argument to union instead? Doing it this way seemed less ugly...

Yeah, I think your approach is perfectly reasonable, and better than trying to pass the whole stats struct in repeatedly.

@cfallin
Copy link
Member

cfallin commented Jan 11, 2024

Looks good to me too! I'll allow someone else to do the detailed review since I'm still technically on leave but all my concerns are addressed :-)

@cfallin
Copy link
Member

cfallin commented Feb 11, 2024

@meithecatte would you mind rebasing on the latest main, just to be sure everything is still good (since some other egraphs-related changes have been made in the meantime)?

Sorry for the delay on this; I'm back from leave officially tomorrow and will review and merge if tests are green.

Stop hoping that keeping the smallest Idx as canonical will yield good
results, and instead explicitly inform the UnionFind of what we expect
not to move.

Fixes bytecodealliance#6126
Emitting a warning in this situation is too much.
@meithecatte
Copy link
Contributor Author

@meithecatte would you mind rebasing on the latest main, just to be sure everything is still good (since some other egraphs-related changes have been made in the meantime)?

Sure thing! No conflicts, and the tests passed on my end.

Sorry for the delay on this; I'm back from leave officially tomorrow and will review and merge if tests are green.

Don't worry, I don't mind.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@cfallin cfallin added this pull request to the merge queue Feb 12, 2024
Merged via the queue into bytecodealliance:main with commit e7ab3a4 Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cranelift: GVN depends on value definition order matching visit order

5 participants