'retval' proposal re: calls and multiple return values #280

@kg

In #278, Dan suggested some good refinements for our thinking about calls and return values. Implied in them is the potential for multiple return values to show up in the future. I think it's more or less inevitable that multiple return values will creep in eventually, even if C/C++ never use them. If they do, that has some nasty implications for our current model:

  • We need a way to represent the idea of an expression yielding multiple distinct values of different types.
  • We need a way to handle those value tuples: assign them out into locals, etc.
  • A function needs a way to construct those value tuples so it can return them.

My suggestion for tackling this and some other issues is that we should define a distinct concept separate from local variables that I'll refer to as 'retval'. Effectively, retval is a tiny set of registers that contain the return value(s) of the most recent function call, with various restrictions to make it simple to reason about.

A rough sketch of how retval would work:

  • Every function call produces 0 to N return values. In the MVP, a function call is either void (no retvals) or has a single retval of type T.
  • After a function call, retval N is valid so long as N < the number of return values of the most recent function call. The type of a retval N expression depends on the signature of that call, and it evaluates to the contents of that numbered return value register.
  • retval N can only be used once per return value per invocation; it consumes the result. To use a return value multiple times, store it into a local.
  • Crossing a branch invalidates the return values from the most recent function call.
    • This is probably a pain from a compiler perspective but seems valuable in terms of making the runtime side simpler. No need for SSA or phis or whatever.
  • Call expressions with a single return value evaluate to the equivalent of (call @a ...), (retval 0) - essentially this is a simple transform that can happen in the compiler, decoder, or native runtime. I'd probably put it in the decoder.
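
To make the rules above concrete, here is a minimal sketch in Python of how a runtime might model the retval registers. It is only an illustration of the consume-once and branch-invalidation rules; the names RetvalRegisters, after_call, retval, and cross_branch are made up for this sketch and are not part of any spec or existing implementation.

```python
import math


class RetvalError(Exception):
    """Raised when a retval access breaks one of the sketched rules."""


class RetvalRegisters:
    """Toy model of the retval registers (hypothetical names throughout)."""

    def __init__(self):
        # One [value, consumed] pair per return value of the latest call.
        self._slots = []

    def after_call(self, values):
        """Record the return values of the call that just finished."""
        self._slots = [[v, False] for v in values]

    def retval(self, n):
        """Read return value n, consuming it."""
        if not 0 <= n < len(self._slots):
            raise RetvalError(f"retval {n} is not valid here")
        slot = self._slots[n]
        if slot[1]:
            raise RetvalError(f"retval {n} was already consumed")
        slot[1] = True
        return slot[0]

    def cross_branch(self):
        """Crossing a branch invalidates every pending return value."""
        self._slots = []


regs = RetvalRegisters()
regs.after_call([math.sin(0.75)])  # stands in for: call @sin (const 0.75f)
value = regs.retval(0)             # stands in for: (retval 0)
```

A second regs.retval(0) at this point would raise RetvalError, as would any retval read after regs.cross_branch(); a value that is needed more than once has to be copied into a local first.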

So, (complete strawman), existing code that looks like this:

local @a
setlocal @a (call @sin (const 0.75f))
call @print (getlocal @a)

would look something like this:

call @sin (const 0.75f)
call @print (retval 0)

(EDIT) or this:

call @print (call @sin (const 0.75f))

So, why? Various reasons:

  • Without this design change, function calls end up producing a temporary local for every return value. This bloats the number of locals in large functions tremendously, unless compilers pool temporary locals, in which case...
    • Now static analysis is more complicated because you have to SSA or GVN or some other horrifying three-letter-acronym to figure out the lifetime of return values, when stack/register space is needed, etc.
  • Transforming between nested and non-nested calls is simpler.
    • In the old model, hoisting a nested function call out into a statement would require you to introduce temporary locals for its return values. In this model, you just undo the return value transform described above (call @a (call @b ...) -> call @b ...; call @a (retval 0)).
  • The lifetime of return values is trivial to reason about.
    • It becomes trivial to identify unused return values. Just look for a call whose return values are never read by a retval.
    • It becomes easier to do optimizations in the vein of 'move the call to the location where its result is used, if it's safe', etc
    • It's way easier (IMO) to write a naive runtime with reasonable performance this way - you don't need to aggressively nuke temporaries quite so badly.
    • It's easier to figure out whether to shove return values into a register.
  • Forwarding multiple return values into a call immediately afterwards becomes easier.
    • call @a (call @b) is pretty easy to expand in the case that @b returns say, 3 values: call @b; call @a (retval 0) (retval 1) (retval 2).
  • Deduplication (nullary macros) becomes way more effective in this model.
    • Common sub-expressions that operate on the result(s) of a function will trivially deduplicate, where before the expressions were operating on arbitrary numbered locals (or on function call expressions directly), which needs parameterized macros to deduplicate. Yuck.
    • A compiler can generate whatever it wants, and a smart 'wasm optimizer' can attempt to convert calls between the two representations (inline vs retval) to see whether one of the two produces better deduplication results (and thus, smaller file size).
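
As a rough illustration of the hoisting and forwarding points above, the following Python sketch rewrites call @a (call @b ...) into the two-statement retval form, expanding to one retval per return value of @b. The tuple encoding, the arity table, and both function names are assumptions made for this sketch. It deliberately handles only a single nested call with otherwise plain arguments, since hoisting a second call would overwrite the first call's retvals.

```python
# Expressions are plain tuples (a made-up encoding for this sketch):
#   ("call", name, [args])   ("const", value)   ("retval", index)

def hoist_nested_call(outer, arity):
    """Rewrite call @a (call @b ...) as [call @b ..., call @a (retval 0) ...].

    `arity` maps a function name to its number of return values. The inner
    call is not rewritten recursively.
    """
    kind, name, args = outer
    assert kind == "call"
    nested = [i for i, a in enumerate(args) if a[0] == "call"]
    uses_retval = any(a[0] == "retval" for a in args)
    if len(nested) != 1 or uses_retval:
        return [outer]  # nothing to hoist, or not safe in this sketch
    i = nested[0]
    inner = args[i]
    retvals = [("retval", n) for n in range(arity[inner[1]])]
    return [inner, ("call", name, args[:i] + retvals + args[i + 1:])]


def calls_with_unused_results(statements, arity):
    """Names of statement-level calls whose results the next statement ignores."""
    unused = []
    for pos, stmt in enumerate(statements):
        if stmt[0] != "call" or arity[stmt[1]] == 0:
            continue
        nxt = statements[pos + 1] if pos + 1 < len(statements) else None
        read = (nxt is not None and nxt[0] == "call"
                and any(a[0] == "retval" for a in nxt[2]))
        if not read:
            unused.append(stmt[1])
    return unused


arity = {"@a": 0, "@b": 3, "@sin": 1, "@print": 0}
forwarded = hoist_nested_call(("call", "@a", [("call", "@b", [])]), arity)
# forwarded is: call @b; call @a (retval 0) (retval 1) (retval 2)
```

The unused-result check here only looks at the immediately following statement, which is enough for straight-line code under the rule that each call replaces the previous call's retvals; it does not track partially used result tuples.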

I think this is especially important to think about, because many languages that we (probably) want to target wasm in the future have multiple return values or something equivalent. Go uses multiple return values extensively, and C# makes fairly heavy use of something roughly equivalent (out parameters).

As far as returning multiple values from a function at once goes - I think we punt on that entirely until after the MVP, but this model probably makes sense there too (opcodes for storing values into your result registers, or something similar).

If people like this proposal I can try to draft up a PR for it later.
