egraph: Ensure eclass members are all available in the same block by jameysharp · Pull Request #8006 · bytecodealliance/wasmtime

jameysharp · 2024-02-28T02:08:05Z

During elaboration, we want the property that it's safe to pick any member of an eclass to implement that eclass. To establish that property, this commit defines a notion of the "available block" for every value, and enforces that all values in an eclass have the same available block.

Closes #7891 where we devised this plan.

TODO: this needs more comments, especially documenting the available_block structure and the linear-time filtering loop in optimize_pure_enode. If somebody else wants to write those I would really appreciate it.

Future work: immediately after optimized_values.push(orig_value), we should try optimized_values.sort_unstable() and then optimized_values.dedup(). That avoids creating union nodes for duplicate values, which should save a little time downstream in the compiler pipeline, but causes values to get renumbered in some filetests so they need more inspection. It also means we can delete the if optimized_value == orig_value test in the loop afterward. But it's an O(n log n) pass over a potentially large list so we should benchmark the change too.

cc: @elliottt @cfallin @lpereira

During elaboration, we want the property that it's safe to pick any member of an eclass to implement that eclass. To establish that property, this commit defines a notion of the "available block" for every value, and enforces that all values in an eclass have the same available block. Closes bytecodealliance#7891 where we devised this plan. Co-authored-by: L Pereira <l.pereira@fastly.com>

fitzgen · 2024-02-28T14:34:59Z

cranelift/codegen/src/egraph.rs

        // maybe just one new value marked as "subsuming" the
        // original, if present.)
-        let mut union_value = orig_value;
+        let mut union_value = optimized_values.pop().unwrap();


Rather than the indirect pushing and then popping, this could be

Suggested change

let mut union_value = optimized_values.pop().unwrap();

let mut union_value = optimized_values.pop().unwrap_or(orig_value);

Then, as a reader, I don't have to wonder if the push is going to trigger a realloc or not (probably not because it is a smallvec and we limit the number of returns from ISLE in ISLE, but I think it is possible if we return the max ISLE returns and then push orig_vlaue?)

We need the ability to filter out the original value, not just the optimized values. We could add an extra check after the loop to compare the original value's available block against best_block, but that means pretty much duplicating the body of the loop, and I think the cleanest thing to do then would still involve pushing the original value sometimes.

Besides, this smallvec is allocated with size MATCHES_LIMIT, which is 5, but ISLE truncates at MAX_ISLE_RETURNS, which is 8, so we already might heap-allocate. Now that I've learned that I have a suggestion for simplifying ISLE and fixing that discrepancy at the same time, but that should be a separate issue.

The simplest way to ensure that this push can't cause a realloc is to push before calling constructor_simplify, which does not require that the provided vector is empty on entry. I can do that if you prefer.

fitzgen · 2024-02-28T14:42:32Z

cranelift/codegen/src/egraph.rs

+        let mut available_block: SecondaryMap<Value, Block> =
+            SecondaryMap::with_default(Block::reserved_value());


Might be worth pre-reserving space here, since IIUC we are going to end up filling this in for every single value.

Pre-allocating space for this map is not totally trivial for two reasons.

We add more values to the data-flow graph while the egraph pass is running and I don't know what bound to use if we want to avoid reallocating later.

SecondaryMap has with_default and with_capacity constructors, but no constructor that does both, and types like Block don't implement Default. This is mostly just an annoyance since we can call resize after construction, but resize initializes all of the backing memory, while with_capacity doesn't, so it might make a difference.

I'll go ahead and do an initial resize to the number of values present before the start of the egraph pass, but if you have any suggestions for estimating a reasonable larger size, let me know.

cfallin

Very nice overall! Just some comment nits below.

Also, could we have a test case that triggers the trimming behavior, perhaps with our subsume-for-correctness disabled? Actually, a thought on that: we could (with this PR, not before!) add an option disable_egraph_subsume to Cranelift; keep it off by default, but add it to a test we know triggers this auto-subsume; then fuzz with the option; then eventually turn it on and remove our careful subsume cases.

cfallin · 2024-02-28T15:49:14Z

cranelift/codegen/src/egraph.rs

+                if self.domtree.dominates(x, y) {
+                    Ordering::Less
+                } else {
+                    debug_assert!(self.domtree.dominates(y, x));


Can we add a comment here about the SSA property we're relying on? Something like: "Note that the def-point of all arguments to an instruction in SSA lie on a line of direct ancestors in the domtree, and so do their available-blocks. This means that either x dom y or y dom x; they cannot be unordered wrt each other."

cfallin · 2024-02-28T16:00:55Z

cranelift/codegen/src/egraph.rs


+        optimized_values.push(orig_value);
+
+        let mut best_block = self.available_block[*optimized_values.last().unwrap()];


This loop looks correct to me (very slick in-place compaction!) but could we have a comment describing what it's doing? It took me a little bit, with knowledge of what it's supposed to be doing, to grok it. Perhaps something like: "Remove any values from optimized_values that do not have the highest possible available block in the domtree." at the top, and "This value's available block is higher; all higher indices we've already scanned have lower blocks and need to be removed, so truncate after this element." in the first case, and "This value's available block is lower, and we want to skip it; remove it from the list and keep going (order does not matter so swap_remove is OK)." in the second?

One of the things I'm not sure here is that, when swap_remove() is called, the element that was at the end of optimized_values is put in the idx-th position, but then it's never used again to lookup this_block for the next iteration. Is this the desirable behavior, or am I missing something here?

Edit: yeah, this seems fine because the loop goes in reverse.

(Re: your edit) yep, indeed; I think it might be good to add a comment with the loop invariant here, for clarity. Something like: "Loop scans in reverse; all values at indices >= idx have the same available block, which is the best (highest in domtree) available block seen so far."

- add documentation comments - push orig_value onto optimized_values before calling simplify so it can't cause a heap reallocation

jameysharp

I think I've addressed all the current review comments and added comments in all the places I intended to add them. I expect this will still fail CI as I haven't investigated why it failed before yet.

jameysharp · 2024-03-04T17:08:18Z

cranelift/codegen/src/egraph.rs

        // maybe just one new value marked as "subsuming" the
        // original, if present.)
-        let mut union_value = orig_value;
+        let mut union_value = optimized_values.pop().unwrap();


We need the ability to filter out the original value, not just the optimized values. We could add an extra check after the loop to compare the original value's available block against best_block, but that means pretty much duplicating the body of the loop, and I think the cleanest thing to do then would still involve pushing the original value sometimes.

Besides, this smallvec is allocated with size MATCHES_LIMIT, which is 5, but ISLE truncates at MAX_ISLE_RETURNS, which is 8, so we already might heap-allocate. Now that I've learned that I have a suggestion for simplifying ISLE and fixing that discrepancy at the same time, but that should be a separate issue.

The simplest way to ensure that this push can't cause a realloc is to push before calling constructor_simplify, which does not require that the provided vector is empty on entry. I can do that if you prefer.

jameysharp · 2024-03-04T20:11:19Z

cranelift/codegen/src/egraph.rs

+        let mut available_block: SecondaryMap<Value, Block> =
+            SecondaryMap::with_default(Block::reserved_value());


Pre-allocating space for this map is not totally trivial for two reasons.

We add more values to the data-flow graph while the egraph pass is running and I don't know what bound to use if we want to avoid reallocating later.

SecondaryMap has with_default and with_capacity constructors, but no constructor that does both, and types like Block don't implement Default. This is mostly just an annoyance since we can call resize after construction, but resize initializes all of the backing memory, while with_capacity doesn't, so it might make a difference.

I'll go ahead and do an initial resize to the number of values present before the start of the egraph pass, but if you have any suggestions for estimating a reasonable larger size, let me know.

cfallin

LGTM wrt my comments at least; thanks very much for working on this!

jameysharp

I forgot about Chris' request for some new test cases, so I guess I haven't actually addressed all review comments yet.

And I think I don't want to address Nick's request, to avoid a possible realloc, in this PR. It's an easy change but it changes a bunch of filetests unnecessarily because it means that the optimized_values and orig_value are visited in a different order.

Also, and more importantly, there's something wrong with GVN.

jameysharp · 2024-03-04T23:57:37Z

cranelift/codegen/src/egraph.rs

+                debug_assert_eq!(
+                    self.get_available_block(inst),
+                    self.available_block[orig_result]
+                );


This debug-assert is failing in a few of the wasm spec tests and I don't understand why yet.

In every case that's failed, the GVN map says the instruction we looked up is equivalent to an existing instruction, but the available-block for the existing instruction is the entry block, while the available-block for the new instruction is somewhere deeper in the dominator tree.

I believed that GVN would always yield the same available-block for pure instructions so this is concerning.

I'm still reasoning through this. For GVN to match, both instructions must be identical, except that each operand is only required to be in the same set in the union-find rather than actually equal. The assertion failing means that for some operand in the instruction, the two instructions consume values which are in the same union-find set but have different available-blocks.

I had been concerned about this specific situation but Chris and I reasoned that it was okay (#7891 (comment)) and now I'm confused about whether we were right and the assertion is overly strict. If so, what is the correct available-block to use here?

I suspect what is happening here is that the instruction gets rewritten the first time it is seen and the GVN-map entry points to the rewritten result (which could be e.g. an iconst available everywhere), even though the instruction per its arguments is only available in one block. So I think the assertion that we want is that the available block of orig_result (which we're going to use as our value here) dominates the available block of inst, rather than is exactly the same. I think that resolves the issue?

That does make the assert pass in all the tests we have. I think you're correct but the reason why was obscured by the surrounding code.

First off, the GVN-hit case did not always return orig_result. If we were looking up a NewOrExistingInst::Existing instruction, insert_pure_enode actually returned the result of that instruction. But that case only happens when this function was called from remove_pure_and_optimize, which ignores the return value. The return value only matters for make_inst_ctor, which always passes NewOrExistingInst::New, in which case we were returning orig_result.

Therefore it's safe to change this function to always return orig_result, which I think is what we probably intended in the first place, and based on your comment is what you thought we were already doing. 😁

The next observation is that if the result of inst never appears in the values of the value_to_opt_value map, then it will never get looked up in available_block (or in the union-find, for that matter). So we don't actually need an answer to the question of what block this instruction should be available in. In fact it's better to leave it at the "reserved" default value so we'll get assertion failures if we accidentally do look it up.

I've made these changes; could you review again when you have a chance? Hopefully CI will pass now too.

Because the duplicate instruction is never used, it never matters what block it might be available in. For the same reason it also doesn't need to be in the union-find.

cfallin

Changes LGTM. OK to merge with a test-case added that exercises the "auto-subsume" behavior (or if you've seen that it's already covered by one of the existing tests, note that here and that's fine too). Thanks!

cfallin · 2024-03-06T04:52:12Z

(Or, I guess that could be slightly tricky with our current rules around subsume; OK to punt test to followup removal of the subsumes if you like.)

jameysharp force-pushed the egraph-available-block branch from f410c1f to 211c5f9 Compare February 28, 2024 02:25

github-actions bot added the cranelift Issues related to the Cranelift code generator label Feb 28, 2024

fitzgen reviewed Feb 28, 2024

View reviewed changes

cfallin approved these changes Feb 28, 2024

View reviewed changes

Review comments

b27de68

- add documentation comments - push orig_value onto optimized_values before calling simplify so it can't cause a heap reallocation

jameysharp commented Mar 4, 2024

View reviewed changes

cfallin approved these changes Mar 4, 2024

View reviewed changes

Delay pushing orig_value so filetests don't change

cd4be43

jameysharp commented Mar 5, 2024

View reviewed changes

Don't set available-block for instructions removed by GVN

5e850f2

Because the duplicate instruction is never used, it never matters what block it might be available in. For the same reason it also doesn't need to be in the union-find.

cfallin approved these changes Mar 6, 2024

View reviewed changes

Add tests

ed4443f

jameysharp marked this pull request as ready for review March 7, 2024 17:37

jameysharp requested a review from a team as a code owner March 7, 2024 17:37

jameysharp requested review from elliottt and removed request for a team March 7, 2024 17:37

jameysharp enabled auto-merge March 7, 2024 17:39

jameysharp added this pull request to the merge queue Mar 7, 2024

Merged via the queue into bytecodealliance:main with commit 8341738 Mar 7, 2024

jameysharp deleted the egraph-available-block branch March 7, 2024 18:18

jameysharp mentioned this pull request May 2, 2024

Move CLIF legalization into ISLE #7321

Open

	let mut union_value = optimized_values.pop().unwrap();
	let mut union_value = optimized_values.pop().unwrap_or(orig_value);

		let mut available_block: SecondaryMap<Value, Block> =
		SecondaryMap::with_default(Block::reserved_value());


		optimized_values.push(orig_value);

		let mut best_block = self.available_block[*optimized_values.last().unwrap()];

Conversation

jameysharp commented Feb 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cfallin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lpereira Feb 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jameysharp left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cfallin left a comment

Choose a reason for hiding this comment

Uh oh!

jameysharp left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cfallin left a comment

Choose a reason for hiding this comment

Uh oh!

cfallin commented Mar 6, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jameysharp commented Feb 28, 2024 •

edited

Loading

lpereira Feb 28, 2024 •

edited

Loading