egraphs: Undo changes to union find and gvn map structures when backtracking#7891
egraphs: Undo changes to union find and gvn map structures when backtracking#7891
Conversation
This reverts commit 7464bbc.
Undo changes to the union find structure and gvn_map when popping the node stack in `remove_pure_and_optimize`, ensuring that we don't propagate values in a way that would invalidate the invariant that the `ValueData::Union` structure provides. Additionally, remove all uses of the union find structure from elaboration, as we have already computed the best values at that point. Co-authored-by: Jamey Sharp <jsharp@fastly.com> Co-authored-by: L Pereira <l.pereira@fastly.com> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
|
Re: the "scoping" approach, an efficient implementation (I think!) occurred to me a few days ago. It would probably get at what I think y'all are trying to get at here. Basically what I think we need to do is:
This should I think have O(1) cost (edit: per node; O(n) over the program), require storage of an integer pair per node (maybe a (I think I have a proof sketch that all this works; more next week if it doesn't break in the meantime :-) ) |
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
I'm trying to make sure I follow your reasoning here so I want to restate this in a way that doesn't refer to preorder/postorder intervals, which I think may be a clever implementation detail rather than a necessary aspect of the solution. All of an instruction's operands must dominate the instruction. The SSA properties imply that dominance imposes a total order on those values which are used together as operands in the same instruction, and in particular one operand must be dominated by all the others. Therefore, the instruction can be placed anywhere in the subtree dominated by that one operand. Is that right? If so we need to be able to quickly evaluate which block is dominated by all the operands, and we can use the pre/post-order intervals as an implementation detail to identify one operand whose block is the right choice using constant-time comparisons. We can use the same trick to optimize dominance comparisons elsewhere in Cranelift so this may be a useful trick to apply more broadly. |
|
If I have that right then the neat follow-up trick that I think works is to record a block ID for each value rather than an interval per value. We just need to compare values according to their associated block's interval, we don't need to compute new intervals or map computed intervals back to blocks. And the block ID for a value is just the block where the value is actually defined, so we can get it from the layout rather than needing to store anything extra per value. I'm not sure how to extend this to union nodes, but I'm also not clear on whether we need to answer dominance questions about union nodes. If your sketch of an algorithm is correct about union nodes, Chris, and if I understand it correctly, then the block associated with a union node is whichever block dominates all the values in the eclass identified by that union node. I'm not sure what that tells us. By the way, I said this verbally to somebody recently but we should write it down: In the aegraph, every |
|
That all sounds basically correct to me! A few thoughts:
The last bit is pretty subtle: it wasn't apparent to me at first that this would always work. Consider e.g. block B1 dominating two sub-blocks B2 and B3 in the domtree, and two defs of a value v1, v2 in B2 and v3 in B3 (i.e., v1 = union v2, v3.) Maybe It turns out that the answer is that this can't happen: the highest-possible-block is sort of a max of a set of "fixed-location" (impure) inputs, and so the only way for Said another way, I think the invariant is that all members of an eclass have highest-available-block that lie along a path from some block upward toward the root of the domtree; there is always some unique answer for "block that dominates all other blocks" in this set of blocks. That block is what we hope to propagate forward through union nodes. |
|
Okay, @elliottt and @lpereira and I just spent a couple hours trying to reason through this and I, for one, am toast. But I'm going to see if I can write down what we learned anyway. As I was writing this I became uncertain that we investigated the right questions, but for reference, the approach we were planning was to filter the result of the I wonder if #7922 might have already fixed the underlying issue that #7879 introduced more The remaining part of this draft PR that was not covered in #7922 is the changes in The only test which failed was from Side note: this proof-carrying-code RLE test does not appear to test what it was supposed to because the So the bug we see when we undo changes to the union-find but not to the GVN map only occurs when there are two different branches of the dominance tree containing instructions which have equivalent I want to think more carefully about whether this situation can lead to bugs outside of this specific situation (undoing changes to the union-find but not to the GVN map). I guess the question is, what happens with a function like the following? (I haven't tested this.) If we don't undo changes in either the GVN map or the union-find, then we'll have I am way out of energy at this point but I think this will be a good test case to study when we pick this up again next week, and I think it might be made safe by the plan we discussed today. |
|
@jameysharp most of this sounds basically right to me and agrees with my understanding; thank you all for looking into this more deeply! One thought on the "trying to revert..." and experiments in general: I may be misunderstanding the intent here (if so, my apologies), but at least to me it doesn't seem too important to understand exactly which combination of partial fixes is just enough to mask issues; and the empirical approach ("does this particular bug occur on this particular test-case with these particular algorithm knobs turned this way") to me feels like it muddies the waters a bit. I suspect maybe y'all are trying to map the landscape better? Or is there a question of finding some other fix or minimize changes somehow? In any case, I think the test case you wrote out, combined with worst-case cost extraction (say that we always prefer |
|
(Actually, that gives me an idea: maybe a chaos-mode knob, or just an |
|
I love the idea of a chaos mode cost function! Much of what we discussed this afternoon was that when choosing the best value from |
Yeah, exactly. I won't speak for anyone else but I certainly didn't feel like I had a firm grasp on which invariants are important for the bug we investigated a couple weeks ago. So we tried breaking various implicit invariants to see what would fail and whether that would give us better intuitions about what's necessary. I think these experiments did help us get closer. I also love the idea of the chaos-mode cost function, so maybe we can work on that next week. |
|
In the interests of trying to make Cranelift give correct results without relying on the This test currently passes, but with two small changes to the egraph pass it fails.
With those changes, the verifier rejects the output of the egraph pass: This shows that we're currently relying on either |
|
@jameysharp this aligns with my understanding from our discussion yesterday; thanks for putting together the concrete test-case! Just to ensure I understand your path from here (and to have it written down for others): is the current plan to implement the domtree-scoping idea per node (from here)? |
|
Yes, roughly, @cfallin, but we've discussed enough variations on that idea that I'm going to try writing it down again explicitly. The first piece is to compute, for every value in the
So the second piece of this plan is to establish this new invariant. The only time we construct union nodes is in We'll choose to do it by discarding any alternative with an available block that is strictly dominated by the available block of some other alternative. In other words we choose only alternatives that all have the same available block, where that is the block which is closest, of all the alternatives, to the entry block in the dominator tree. One way to think of this is like an eagerly-evaluated cost function, applied to prune the egraph while we're still building it. This is a form of The third piece to consider here is the union-find data structure and the GVN map. There are three general cases where we union values together in the union-find.
So there's at least one hole in this plan that I haven't sorted out yet. Here's a slightly simpler test case that occurred to me while thinking through this plan. When It's not immediately obvious to me that this plan is correct with regard to side-effecting instructions, but I think it's okay. The question is, why is an "available block" precise enough, rather than an "available instruction"? What if we construct an eclass where one alternative is legal before a I think the answer is that the load, and any earlier instruction in the side-effecting skeleton, can only reference versions of eclasses which contain values that were originally defined before that instruction. So we can only extract a "best" instruction from an eclass that transitively depends on the load when we're elaborating later instructions in the side-effecting skeleton. That reasoning only holds now that we merged #7922 though, because before that I think the use of the union-find structure could smuggle later versions of eclasses to earlier instructions during elaboration. |
|
Thanks for writing this all out! One question:
I'm not sure I understand why the icmp-against-load would still be in the eclass: given your invariant, and checks on union to maintain it, wouldn't the rewriting while still in |
|
Ah and re:
is it enough that we do a forward scan through the block and process in that order? (In other words, what is the bad case that happens with dominance? It occurs because we move back up the domtree, but retain some state from our deep traversal. That doesn't happen within a single block.) |
I was describing what happens with that test case today, when I break both
Yeah, I think that's getting at the same thing I was trying to say. But scanning in forward order is only sufficient if we're also careful not to let any eclass travel "back in time" during elaboration, like what we fixed in #7922. |
|
After reasoning this through more with @cfallin, we're both pretty convinced that my worries about the union-find and GVN map are not necessary, and that just pruning the result of I was worried that if an instruction was added to the GVN map in one branch, and matched in an unrelated branch, that using the same value could lead to using an eclass that transitively references values that aren't available in the new context. (That's what's happening in the two test cases I've posted in this thread.) However, Chris pointed out that the new invariant I described means that the only values which can be added to the GVN map are valid everywhere that the instruction could appear. The most important thing about that is it makes the original motivation of this draft PR irrelevant: We don't need to undo changes in the union-find or in the GVN map during backtracking. If anyone has more questions about this, please ask, because attempting to answer them will probably help me straighten this out more in my head. |
During elaboration, we want the property that it's safe to pick any member of an eclass to implement that eclass. To establish that property, this commit defines a notion of the "available block" for every value, and enforces that all values in an eclass have the same available block. Closes bytecodealliance#7891 where we devised this plan. Co-authored-by: L Pereira <l.pereira@fastly.com>
) * egraph: Ensure eclass members are all available in the same block During elaboration, we want the property that it's safe to pick any member of an eclass to implement that eclass. To establish that property, this commit defines a notion of the "available block" for every value, and enforces that all values in an eclass have the same available block. Closes #7891 where we devised this plan. Co-authored-by: L Pereira <l.pereira@fastly.com> * Review comments - add documentation comments - push orig_value onto optimized_values before calling simplify so it can't cause a heap reallocation * Delay pushing orig_value so filetests don't change * Don't set available-block for instructions removed by GVN Because the duplicate instruction is never used, it never matters what block it might be available in. For the same reason it also doesn't need to be in the union-find. * Add tests --------- Co-authored-by: L Pereira <l.pereira@fastly.com>
This is not an efficient implementation, and is meant to test a theory.
Explicitly cache and restore the values of the union find and gvn map structures when backtracking while traversing the dominator graph. This avoids accidentally violating the invariant provided by
Unionvalues: they should only union together values that locally are valid substitutions.Additionally, remove uses of the union find structure during elaboration, as we have already computed the best values prior to elaboration.