StackSignature LUBs by tlively · Pull Request #3543 · WebAssembly/binaryen

tlively · 2021-02-04T02:20:56Z

Add a utility for calculating the least upper bounds of two StackSignatures,
taking into account polymorphic unreachable behavior. This will important in the
finalization and validation of Poppy IR blocks, where a block is allowed to
directly produce fewer values than the branches that target it carry if the
difference can be made up for by polymorphism due to an unreachable instruction
in the block.

Also removes the Poppy IR documentation from stack-utils.h because that
documentation is moved to the Stackify pass in #3541.

Add a utility for calculating the least upper bounds of two StackSignatures, taking into account polymorphic unreachable behavior. This will important in the finalization and validation of Poppy IR blocks, where a block is allowed to directly produce fewer values than the branches that target it carry if the difference can be made up for by polymorphism due to an unreachable instruction in the block. Also removes the Poppy IR documentation from stack-utils.h because that documentation is moved to the Stackify pass in WebAssembly#3541.

kripken · 2021-02-04T17:45:57Z

test/example/stack-utils.cpp

+  std::cout << ";; Test stack signature lub\n";
+  {
+    StackSignature a{Type::none, Type::none, false};
+    StackSignature b{Type::none, Type::none, false};


separate from this PR, the bool at the end makes this harder to read I think. An enum of Reachable/Unreachable might be clearer?

Agreed. I will do this as a follow-up.

kripken · 2021-02-04T17:47:59Z

src/ir/stack-utils.h

  }
+
+  // Returns the LUB of `a` and `b`. Assumes that a LUB exists.
+  static StackSignature getLeastUpperBound(StackSignature a, StackSignature b);


Should it be obvious what an LUB means here? The effects of unreachability in particular are not obvious enough to me without reading the implementation's source code. Maybe I am just missing an intuition for how/where an LUB would be used.

I'll add a comment clarifying the definition. This will be used in finalizing a block like this:

(block (i32.const 0 (ref.null extern) (br 0 (pop i32 extern) ) (ref.null func) )

In this case the signatures are []->[i32, externref] and []->[funcref]{unreachable} and the LUB would be []->[i32, anyref].

In what situation would this arise? That is, when will we have such a block but not already know its type?

Part of validation involves refinalizing a node then checking that it's actual type matches its refinalized type. So even though we already know the type of the node, we have to recalculate it to make sure we are correct about what the type is.

Oh, right, we don't use the existing type there.

I think we might be able to, though, if it would make all this simpler?

Some basic question about the Binaryen block validation rules that are not necessarily related to the Poppy IR or this PR:
In your example, even if the br is an unconditional branch, can you remind me why we still consider the type of the last element in the block?

Oh, right, we don't use the existing type there.

I think we might be able to, though, if it would make all this simpler?

AFAICT, The validator has no choice to recalculate the type of each expression because it needs to be able to detect expressions with incorrect types. I'm not sure how that could be changed without making the validator insensitive to that kind of typing problem.

Some basic question about the Binaryen block validation rules that are not necessarily related to the Poppy IR or this PR:
In your example, even if the br is an unconditional branch, can you remind me why we still consider the type of the last element in the block?

Good question! When Binaryen IR is emitted, unreachable blocks are unconditionally given block signature []->[].

binaryen/src/wasm/wasm-stack.cpp

Lines 25 to 27 in cf51ca4

void BinaryInstWriter::emitResultType(Type type) {

if (type == Type::unreachable) {

parent.writeType(Type::none);

That means that only blocks that could have type none can be candidates to be unreachable. In contrast, my old local branch has Poppy IR go through a type inference pass that assigns concrete types to all unreachable blocks before it is emitted, so before that pass it is ok for blocks of any type to be marked unreachable. We could loosen the restrictions on Binaryen IR by running it through a similar type inference pass before emitting it.

Sorry I'm not sure if I understand. I was mostly asking about the current AST's validation rule (not Poppy IR), in Block::finalize. If a br is in the middle of a block, there shouldn't be a chance that we fall through at the end of the block, but should we still consider the type of the last element in computing LUB?

In contrast, my old local branch has Poppy IR go through a type inference pass that assigns concrete types to all unreachable blocks before it is emitted, so before that pass it is ok for blocks of any type to be marked unreachable. We could loosen the restrictions on Binaryen IR by running it through a similar type inference pass before emitting it.

Where is your old local branch? Have we seen that type inference pass or have you uploaded it as a PR?

Consider what would happen right now if this Binaryen IR block were given type unreachable:

(block (unreachable) (i32.const 0) )

If this block were unreachable, the current code in BinaryInstWriter::emitResultType would emit it into a binary with signature []->[]. But that would be an invalid Wasm binary because the block returns an i32. Binaryen avoids emitting invalid blocks like this by forcing the blocks to have concrete types in Binaryen IR so that they will be emitted with the correct type rather than with Type::none.

Here is the type inference pass: https://github.com/WebAssembly/binaryen/pull/2784/files#diff-47b86a03ca3d74296cfc57759de43be1cee592a25325824de449a0bf8f026a33R1-R180.

kripken · 2021-02-04T17:49:37Z

src/ir/stack-utils.cpp

+  // Assert an approximation of the conditions for having a LUB
+  assert(a.unreachable || b.unreachable ||
+         (a.params.size() == b.params.size() &&
+          a.results.size() == b.results.size()));


In what sense is an LUB an LUB if the number of params or results are shorter? Do we fill the missing slots in the unreachable one as needed, so it can be shorter (and it must be the shorter one, if so - we could assert on that)?

Exactly, missing items can be filled in only on unreachable stack signatures. I'll add an assertion as you suggest.

aheejin

So is your definition of LUB is, if there are two sigs that can satisfy a condition, a reachable one is considered as LUB? Because none -> none, {unreachable} can satisfy everything.

src/ir/stack-utils.h

aheejin · 2021-02-05T17:53:59Z

src/ir/stack-utils.cpp

+    results.push_back(lub);
+  }
+  return StackSignature{
+    Type(params), Type(results), a.unreachable && b.unreachable};


Why is the LUB unreachable when both of the signatures are unreachable?

Being unreachable means that more elements can be appended to the params or prepended to the results as necessary. If two stack signatures have this property, then their LUB would have this property as well. If either signature was not unreachable and could not be extended, then the LUB should not be able to be extended either. This would be important for calculating the LUB of more than two unreachable stack signatures so that the number of params and results is not "frozen" by whatever the first two signatures to be combined are.

I'm still not sure if I understand the concept of LUB here. In the usual typing sense LUB of two types is a supertype of the two but the lowest one in the type hierarchy. First I'm not sure how unreachability is considered in this type hierarchy, I could be wrong, but I feel, for the two signatures with the same type input and output [a]->[b], if one of them is unreachable, that means that type has more flexibility and it should be considered as a supertype of the other.

But here it is possible that we have a LUB of two types that is not unreachable but one of those two child types can be unreachable. I'm not sure if I explained my confusion very well, but I'd appreciate a bit more explanation, maybe with an example.

Here's the subtype relation on stack signatures, which answers the question "can an instruction sequence with the LHS stack signature be typed with the RHS stack signature?"

[t1*] -> [t2*] <: [s* t1'*] -> [s* t2'*] iff

t1'_i <: t1_i

t2_i <: t2'_i

[t1*] -> [t2*] {u} <: [s1* t1'*] -> [s2* t2'*] {u?} iff

[t1*] -> [t2*] <: [t1'*] -> [t2'*]

As an example of the first rule, consider the instruction sequence

ref.as_func drop i32.add

The most specific type you could give this sequence is [i32, i32, anyref] -> [i32]. But you could also use it in a context that expects [i32, i32, funcref] -> [i32] because ref.as_func can accept funcref or any other subtype of anyref. That's where the contravariance comes from. You could also use this instruction sequence anywhere that expects [f32, i32, i32, anyref] -> [f32, anyref] because the f32 simply stays on the stack throughout the sequence. That's where the the prefix extension comes from.

For the second rule, consider this sequence:

ref.as_func drop i32.add unreachable i32.const 0

This instruction sequence has the specific type [i32, i32, anyref] -> [i32] {u}. It can be used in any situation the previous block can be used in, but can additionally be used in contexts that expect something like [f32, i32, i32, anyref] -> [v128, i32]. Because of the unreachable polymorphism, the additional prefixes on the params and results do not need to match.

Note that a reachable stack signature (without a {u}) is never a subtype of any unreachable stack signature (with a {u}). This makes sense because a sequence of instructions that has no polymorphic unreachable behavior cannot be given a type that says it does have polymorphic unreachable behavior.

Also, [] -> [] {u} is the bottom type here; it is a subtype of every other stack signature. This corresponds to (unreachable) being able to be given any stack signature.

Having written this out, I realize the implementation does not correctly handle the contravariance of the params. I will fix that, although it won't matter until we support block inputs. I also screwed up the parameter extension, which should be a prefix, not a suffix.

aheejin · 2021-02-05T18:06:16Z

src/ir/stack-utils.h

  }
+
+  // Returns the LUB of `a` and `b`. Assumes that a LUB exists.
+  static StackSignature getLeastUpperBound(StackSignature a, StackSignature b);


Some basic question about the Binaryen block validation rules that are not necessarily related to the Poppy IR or this PR:
In your example, even if the br is an unconditional branch, can you remind me why we still consider the type of the last element in the block?

Co-authored-by: Heejin Ahn <aheejin@gmail.com>

tlively · 2021-02-06T02:55:25Z

I hopefully made this less confusing now by documenting the subtype relationship and replacing the satisfies method with an implementation of subtyping.

src/ir/stack-utils.h

kripken · 2021-02-08T20:39:38Z

src/ir/stack-utils.h

+  //
+  //  - t1'_i <: t1_i
+  //  - t2_i <: t2'_i
+  //  - s1_i <: s2_i


do the _is here help? might be clearer without them (while meaning the same thing)

I think the subscripts make it clearer that we need to check element-wise subtyping and that the number of elements needs to match. It would be even clearer to write that out explicitly, though, so I'll do that and remove the messy-looking subscripts.

aheejin

Thanks a lot for your patient explanation. I think I better understand (or remind) the concept now. But I still don't have a firm grasp on the rules about the untouched arguments on the stack (and its relationship w/ unreachability). Sorry for many questions, and I think this version is certainly easier to understand.

aheejin · 2021-02-09T17:12:16Z

src/ir/stack-utils.h

+  //
+  //  - t1'_i <: t1_i
+  //  - t2_i <: t2'_i
+  //  - s1_i <: s2_i


It'd be good to mention how the number of elements in the input and output figures in the subtype. _i seems to suggest that the numbers have to exactly match, but it doesn't seem to be the case, right?

Why is it ok to be s1 <: s2? If those types just sit on stack untouched, shouldn't they be the same instead?

Does this rule about s1 and s2 only apply if a and b are not unreachable? Because if a or both are unreachable, those don't seem to matter. For example,

a = [i32] -> [i32], unreachable b = [f32, f64, i32] -> [anyref, funcref, i32], unreachability doesn't matter

Even if b does not satisfy the s1 <: s2 rule, because a is unreachable, it can consume f32 and f64 and produce anyref and funref, no?

It'd be good to mention how the number of elements in the input and output figures in the subtype. _i seems to suggest that the numbers have to exactly match, but it doesn't seem to be the case, right?

Yes, the number of elements there need to match exactly. Any extra elements on the right hand side would be in the s part and the left side is not allowed to have extra elements. I will write that out explicitly.

Why is it ok to be s1 <: s2? If those types just sit on stack untouched, shouldn't they be the same instead?

For example, it is valid for an empty block to have type [funcref] -> [anyref] because nothing needs to happen to the funcref it takes as input for it to be interpreted as an anyref instead.

Does this rule about s1 and s2 only apply if a and b are not unreachable? Because if a or both are unreachable, those don't seem to matter. For example...

Yes, that's right. If a is unreachable, the next rule applies and allows any s1* and s2* on the right hand side.

src/ir/stack-utils.h

aheejin · 2021-02-09T19:21:58Z

test/example/stack-utils.cpp

+    // assert(StackSignature::isSubType(a, b));
+    // assert(!StackSignature::isSubType(b, a));


Why are these disabled? We can't compute GLB now, but we don't need to compute it in isSubType.

Good catch, thanks!

aheejin · 2021-02-09T19:51:00Z

src/ir/stack-utils.cpp

+  // Any additional provided params will pass through untouched, so they must be
+  // compatible with the additional produced results.
+  return std::equal(b.params.begin(),
+                    b.params.end() - a.params.size(),
+                    b.results.begin(),
+                    b.results.end() - a.results.size(),
+                    [](const Type& provided, const Type& expected) {
+                      return Type::isSubType(provided, expected);
+                    });


Do we still need to check this even if a is unreachable? I asked the same question in your formal definition of the rules in the comments.

Nope, in that case we would do an early return before this final check: https://github.com/WebAssembly/binaryen/pull/3543/files#diff-bf06899150c7e36bf606885aeb35046d00448bc3db1413a2a02ce5c0a093900cR151-R155

Co-authored-by: Heejin Ahn <aheejin@gmail.com>

kripken · 2021-02-09T22:48:03Z

src/ir/stack-utils.h

+  //
+  //  - t1' <: t1
+  //  - t2 <: t2'
+  //  - s1 <: s2


Why is the ' in the LHS in the first of the three clauses? That seems to allow [anyref] -> [] <: [funcref] -> [] but the LHS cannot be used in a context that expects the RHS?

[anyref] -> [] can be used anywhere [funcref] -> [] is expected because funcref can be consumed by anything that expects an anyref. This is contravariance explained in one of the examples below.

I apologize, my confusion is almost certainly a matter of language. But I still don't get it:

[anyref] -> [] can be used anywhere [funcref] -> [] is expected because funcref can be consumed by anything that expects an anyref.

Your statement here uses "expected" in two places. First you say [funcref] -> [] is expected, and then you say expects an anyref. To me it seems like you're using the term "expected" in two different (reversed) ways, as they are about the exact opposite things, or that you have a typo - ?

No worries, I find that I basically have to turn by brain inside out to wrap it around contravariance, so its no wonder everything looks reversed :)

Consider a context that expects an instruction sequence with type [funcref] -> []. It looks like this:

<prior context> ;; [t* funcref] on stack [insert our block here] ;; [t*] on stack <later context>

The block we insert needs to consume a funcref and produce no values on the stack. What if we actually plug in a block with type [anyref] -> []?

<prior context> ;; [t* funcref] on stack [consume `anyref`, produce nothing] ;; [t*] on stack <later context>

Our block expects an anyref, but it was provided a funcref by the context. This is fine, though, because funcref can be used by anything that expects an anyref.

The block of type [anyref] -> [] is valid to use in a context that expects a block of type [funcref] -> [], so the former type is a subtype of the latter type.

The analogy I found useful to understand is, basically, if A <: B, A can act like it is of type B, e.g., Cat can claim it is an Animal and be used it in place anywhere an Animal is expected.

When A = [anyref] -> [funcref] and B = [funcref] -> [anyref], even though A takes anyref, it can act like it takes funcref, because it can process all funcref arguments with no problem. (It is called contravariance in parameter types)
Also it can act like it returns anyref instead of funcref, because all values it returns will fit the condition of anyref. (It is called covariance in result types)
So here A is a subtype of B.

I see, thanks! So in [x] -> [y] ? [a] -> [b] the relation of x, a or y, b is reversed depending on if the values are consumed or produced... which does in fact make total sense. And then so does the rest.

aheejin

Thanks for your patience! Now I think I understand the concept of LUB, covariance, and contravariance between function types ;)

aheejin · 2021-02-10T14:34:53Z

src/ir/stack-utils.cpp

+  // Params and results may only be shorter on unreachable stack signatures.
+  if (!a.unreachable && (a.params.size() < b.params.size() ||
+                         a.results.size() < b.results.size())) {
+    return false;
+  }
+  if (!b.unreachable && (b.params.size() < a.params.size() ||
+                         b.results.size() < a.results.size())) {
+    return false;
+  }


I think there's no test case that tests this condition? It'd be good to add one.

Actually, I just realized that these conditions are incorrect. Pushing a fix with tests.

tlively · 2021-02-10T20:33:34Z

@kripken, does this make sense now? If not, I would be happy to chat more and further clarify the examples in the docs.

tlively requested review from aheejin and kripken February 4, 2021 02:20

kripken reviewed Feb 4, 2021

View reviewed changes

tlively added 2 commits February 4, 2021 10:38

Fix unreachability propagation and expand comment

6164c59

Improve asserts

1911ae1

aheejin reviewed Feb 5, 2021

View reviewed changes

tlively and others added 2 commits February 5, 2021 10:38

Stackify => Poppify

26a629e

Co-authored-by: Heejin Ahn <aheejin@gmail.com>

Document and implement subtype relation

0a5bde0

kripken reviewed Feb 8, 2021

View reviewed changes

aheejin reviewed Feb 9, 2021

View reviewed changes

tlively and others added 2 commits February 9, 2021 13:05

anyref => i32 in docs

d72ba69

Co-authored-by: Heejin Ahn <aheejin@gmail.com>

Improve doc comment and enable contravariance tests

abfe007

kripken reviewed Feb 9, 2021

View reviewed changes

aheejin approved these changes Feb 10, 2021

View reviewed changes

kripken approved these changes Feb 10, 2021

View reviewed changes

Fix extension condition

de06f9e

tlively enabled auto-merge (squash) February 11, 2021 01:16

tlively merged commit e283300 into WebAssembly:main Feb 11, 2021

tlively deleted the stack-signature-lub branch February 11, 2021 01:31

tlively mentioned this pull request Feb 27, 2021

Use enum instead of bool for StackSignature kind #3625

Merged

	void BinaryInstWriter::emitResultType(Type type) {
	if (type == Type::unreachable) {
	parent.writeType(Type::none);

		// assert(StackSignature::isSubType(a, b));
		// assert(!StackSignature::isSubType(b, a));

Conversation

tlively commented Feb 4, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively Feb 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively commented Feb 6, 2021

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin left a comment

Choose a reason for hiding this comment

Uh oh!

aheejin Feb 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively Feb 5, 2021 •

edited

Loading

aheejin Feb 9, 2021 •

edited

Loading

aheejin Feb 10, 2021 •

edited

Loading

kripken Feb 10, 2021 •

edited

Loading