egraph: Ensure eclass members are all available in the same block#8006
Conversation
During elaboration, we want the property that it's safe to pick any member of an eclass to implement that eclass. To establish that property, this commit defines a notion of the "available block" for every value, and enforces that all values in an eclass have the same available block. Closes bytecodealliance#7891 where we devised this plan. Co-authored-by: L Pereira <l.pereira@fastly.com>
f410c1f to
211c5f9
Compare
| // maybe just one new value marked as "subsuming" the | ||
| // original, if present.) | ||
| let mut union_value = orig_value; | ||
| let mut union_value = optimized_values.pop().unwrap(); |
There was a problem hiding this comment.
Rather than the indirect pushing and then popping, this could be
| let mut union_value = optimized_values.pop().unwrap(); | |
| let mut union_value = optimized_values.pop().unwrap_or(orig_value); |
Then, as a reader, I don't have to wonder if the push is going to trigger a realloc or not (probably not because it is a smallvec and we limit the number of returns from ISLE in ISLE, but I think it is possible if we return the max ISLE returns and then push orig_vlaue?)
There was a problem hiding this comment.
We need the ability to filter out the original value, not just the optimized values. We could add an extra check after the loop to compare the original value's available block against best_block, but that means pretty much duplicating the body of the loop, and I think the cleanest thing to do then would still involve pushing the original value sometimes.
Besides, this smallvec is allocated with size MATCHES_LIMIT, which is 5, but ISLE truncates at MAX_ISLE_RETURNS, which is 8, so we already might heap-allocate. Now that I've learned that I have a suggestion for simplifying ISLE and fixing that discrepancy at the same time, but that should be a separate issue.
The simplest way to ensure that this push can't cause a realloc is to push before calling constructor_simplify, which does not require that the provided vector is empty on entry. I can do that if you prefer.
| let mut available_block: SecondaryMap<Value, Block> = | ||
| SecondaryMap::with_default(Block::reserved_value()); |
There was a problem hiding this comment.
Might be worth pre-reserving space here, since IIUC we are going to end up filling this in for every single value.
There was a problem hiding this comment.
Pre-allocating space for this map is not totally trivial for two reasons.
- We add more values to the data-flow graph while the egraph pass is running and I don't know what bound to use if we want to avoid reallocating later.
SecondaryMaphaswith_defaultandwith_capacityconstructors, but no constructor that does both, and types likeBlockdon't implementDefault. This is mostly just an annoyance since we can callresizeafter construction, butresizeinitializes all of the backing memory, whilewith_capacitydoesn't, so it might make a difference.
I'll go ahead and do an initial resize to the number of values present before the start of the egraph pass, but if you have any suggestions for estimating a reasonable larger size, let me know.
cfallin
left a comment
There was a problem hiding this comment.
Very nice overall! Just some comment nits below.
Also, could we have a test case that triggers the trimming behavior, perhaps with our subsume-for-correctness disabled? Actually, a thought on that: we could (with this PR, not before!) add an option disable_egraph_subsume to Cranelift; keep it off by default, but add it to a test we know triggers this auto-subsume; then fuzz with the option; then eventually turn it on and remove our careful subsume cases.
| if self.domtree.dominates(x, y) { | ||
| Ordering::Less | ||
| } else { | ||
| debug_assert!(self.domtree.dominates(y, x)); |
There was a problem hiding this comment.
Can we add a comment here about the SSA property we're relying on? Something like: "Note that the def-point of all arguments to an instruction in SSA lie on a line of direct ancestors in the domtree, and so do their available-blocks. This means that either x dom y or y dom x; they cannot be unordered wrt each other."
|
|
||
| optimized_values.push(orig_value); | ||
|
|
||
| let mut best_block = self.available_block[*optimized_values.last().unwrap()]; |
There was a problem hiding this comment.
This loop looks correct to me (very slick in-place compaction!) but could we have a comment describing what it's doing? It took me a little bit, with knowledge of what it's supposed to be doing, to grok it. Perhaps something like: "Remove any values from optimized_values that do not have the highest possible available block in the domtree." at the top, and "This value's available block is higher; all higher indices we've already scanned have lower blocks and need to be removed, so truncate after this element." in the first case, and "This value's available block is lower, and we want to skip it; remove it from the list and keep going (order does not matter so swap_remove is OK)." in the second?
There was a problem hiding this comment.
One of the things I'm not sure here is that, when swap_remove() is called, the element that was at the end of optimized_values is put in the idx-th position, but then it's never used again to lookup this_block for the next iteration. Is this the desirable behavior, or am I missing something here?
Edit: yeah, this seems fine because the loop goes in reverse.
There was a problem hiding this comment.
(Re: your edit) yep, indeed; I think it might be good to add a comment with the loop invariant here, for clarity. Something like: "Loop scans in reverse; all values at indices >= idx have the same available block, which is the best (highest in domtree) available block seen so far."
- add documentation comments - push orig_value onto optimized_values before calling simplify so it can't cause a heap reallocation
jameysharp
left a comment
There was a problem hiding this comment.
I think I've addressed all the current review comments and added comments in all the places I intended to add them. I expect this will still fail CI as I haven't investigated why it failed before yet.
| // maybe just one new value marked as "subsuming" the | ||
| // original, if present.) | ||
| let mut union_value = orig_value; | ||
| let mut union_value = optimized_values.pop().unwrap(); |
There was a problem hiding this comment.
We need the ability to filter out the original value, not just the optimized values. We could add an extra check after the loop to compare the original value's available block against best_block, but that means pretty much duplicating the body of the loop, and I think the cleanest thing to do then would still involve pushing the original value sometimes.
Besides, this smallvec is allocated with size MATCHES_LIMIT, which is 5, but ISLE truncates at MAX_ISLE_RETURNS, which is 8, so we already might heap-allocate. Now that I've learned that I have a suggestion for simplifying ISLE and fixing that discrepancy at the same time, but that should be a separate issue.
The simplest way to ensure that this push can't cause a realloc is to push before calling constructor_simplify, which does not require that the provided vector is empty on entry. I can do that if you prefer.
| let mut available_block: SecondaryMap<Value, Block> = | ||
| SecondaryMap::with_default(Block::reserved_value()); |
There was a problem hiding this comment.
Pre-allocating space for this map is not totally trivial for two reasons.
- We add more values to the data-flow graph while the egraph pass is running and I don't know what bound to use if we want to avoid reallocating later.
SecondaryMaphaswith_defaultandwith_capacityconstructors, but no constructor that does both, and types likeBlockdon't implementDefault. This is mostly just an annoyance since we can callresizeafter construction, butresizeinitializes all of the backing memory, whilewith_capacitydoesn't, so it might make a difference.
I'll go ahead and do an initial resize to the number of values present before the start of the egraph pass, but if you have any suggestions for estimating a reasonable larger size, let me know.
cfallin
left a comment
There was a problem hiding this comment.
LGTM wrt my comments at least; thanks very much for working on this!
jameysharp
left a comment
There was a problem hiding this comment.
I forgot about Chris' request for some new test cases, so I guess I haven't actually addressed all review comments yet.
And I think I don't want to address Nick's request, to avoid a possible realloc, in this PR. It's an easy change but it changes a bunch of filetests unnecessarily because it means that the optimized_values and orig_value are visited in a different order.
Also, and more importantly, there's something wrong with GVN.
cranelift/codegen/src/egraph.rs
Outdated
| debug_assert_eq!( | ||
| self.get_available_block(inst), | ||
| self.available_block[orig_result] | ||
| ); |
There was a problem hiding this comment.
This debug-assert is failing in a few of the wasm spec tests and I don't understand why yet.
In every case that's failed, the GVN map says the instruction we looked up is equivalent to an existing instruction, but the available-block for the existing instruction is the entry block, while the available-block for the new instruction is somewhere deeper in the dominator tree.
I believed that GVN would always yield the same available-block for pure instructions so this is concerning.
There was a problem hiding this comment.
I'm still reasoning through this. For GVN to match, both instructions must be identical, except that each operand is only required to be in the same set in the union-find rather than actually equal. The assertion failing means that for some operand in the instruction, the two instructions consume values which are in the same union-find set but have different available-blocks.
I had been concerned about this specific situation but Chris and I reasoned that it was okay (#7891 (comment)) and now I'm confused about whether we were right and the assertion is overly strict. If so, what is the correct available-block to use here?
There was a problem hiding this comment.
I suspect what is happening here is that the instruction gets rewritten the first time it is seen and the GVN-map entry points to the rewritten result (which could be e.g. an iconst available everywhere), even though the instruction per its arguments is only available in one block. So I think the assertion that we want is that the available block of orig_result (which we're going to use as our value here) dominates the available block of inst, rather than is exactly the same. I think that resolves the issue?
There was a problem hiding this comment.
That does make the assert pass in all the tests we have. I think you're correct but the reason why was obscured by the surrounding code.
First off, the GVN-hit case did not always return orig_result. If we were looking up a NewOrExistingInst::Existing instruction, insert_pure_enode actually returned the result of that instruction. But that case only happens when this function was called from remove_pure_and_optimize, which ignores the return value. The return value only matters for make_inst_ctor, which always passes NewOrExistingInst::New, in which case we were returning orig_result.
Therefore it's safe to change this function to always return orig_result, which I think is what we probably intended in the first place, and based on your comment is what you thought we were already doing. 😁
The next observation is that if the result of inst never appears in the values of the value_to_opt_value map, then it will never get looked up in available_block (or in the union-find, for that matter). So we don't actually need an answer to the question of what block this instruction should be available in. In fact it's better to leave it at the "reserved" default value so we'll get assertion failures if we accidentally do look it up.
I've made these changes; could you review again when you have a chance? Hopefully CI will pass now too.
Because the duplicate instruction is never used, it never matters what block it might be available in. For the same reason it also doesn't need to be in the union-find.
cfallin
left a comment
There was a problem hiding this comment.
Changes LGTM. OK to merge with a test-case added that exercises the "auto-subsume" behavior (or if you've seen that it's already covered by one of the existing tests, note that here and that's fine too). Thanks!
|
(Or, I guess that could be slightly tricky with our current rules around subsume; OK to punt test to followup removal of the |
During elaboration, we want the property that it's safe to pick any member of an eclass to implement that eclass. To establish that property, this commit defines a notion of the "available block" for every value, and enforces that all values in an eclass have the same available block.
Closes #7891 where we devised this plan.
TODO: this needs more comments, especially documenting the
available_blockstructure and the linear-time filtering loop inoptimize_pure_enode. If somebody else wants to write those I would really appreciate it.Future work: immediately after
optimized_values.push(orig_value), we should tryoptimized_values.sort_unstable()and thenoptimized_values.dedup(). That avoids creating union nodes for duplicate values, which should save a little time downstream in the compiler pipeline, but causes values to get renumbered in some filetests so they need more inspection. It also means we can delete theif optimized_value == orig_valuetest in the loop afterward. But it's an O(n log n) pass over a potentially large list so we should benchmark the change too.cc: @elliottt @cfallin @lpereira