Update discussion of indirect calls and function pointers#392

Merged
titzer merged 2 commits into master from add_func_ptrs on Oct 8, 2015

@titzer titzer commented Oct 6, 2015

No description provided.

@titzer titzer mentioned this pull request Oct 6, 2015
AstSemantics.md Outdated
Member

I think "used" might be interpreted as saying that call_indirect will be relative to the module (which isn't true) when what I think you're getting at is that the as-yet-unspecified address-of expressions which refer to the dylib's static local indirect function table will be automatically adjusted so that they correctly refer to the index of that function in the dynamic instance indirect function table at runtime. If that explanation grows unwieldy, perhaps you want to move it to a new section in DynamicLinking.md and link to that section from here?
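The adjustment described in this comment can be pictured with a small model. This is only a sketch under assumptions: the class `Instance` and its methods `link`, `address_of`, and `call_indirect` are hypothetical names, and the concatenation scheme is one plausible reading of the discussion, not the specified behavior.

```python
class Instance:
    """Hypothetical dynamic instance with one concatenated indirect function table."""

    def __init__(self):
        self.table = []   # instance-wide indirect function table
        self.bases = {}   # module name -> offset of that module's first entry

    def link(self, name, local_table):
        # Append the module's local indirect table and remember where it starts.
        self.bases[name] = len(self.table)
        self.table.extend(local_table)

    def address_of(self, name, local_index):
        # A module-local index is rebased into an instance-wide index.
        return self.bases[name] + local_index

    def call_indirect(self, index, *args):
        # The call itself is never module-relative; it uses the instance-wide index.
        return self.table[index](*args)


inst = Instance()
inst.link("main", [lambda x: x + 1, lambda x: x * 2])
inst.link("dylib", [lambda x: x - 1])

ptr = inst.address_of("dylib", 0)      # local index 0 inside the dylib
result = inst.call_indirect(ptr, 10)
```

In this model only `address_of` knows about modules, which matches the point that `call_indirect` is not relative to the module.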

Author

I've updated this section to be a bit more clear.

@lukewagner
Member

Other than small nit, lgtm!

Is the main function table local to each module?

Why has the indirect function table been introduced?

Are the elements of the indirect function table observable to the wasm code? Can it be accessed to load the index?

No provision has been made for homogeneous function tables. These seem important for code that wants to avoid a runtime signature check.

Author

On Wed, Oct 7, 2015 at 8:55 AM, JSStats notifications@github.com wrote:

In AstSemantics.md
#392 (comment):

- * addressof: obtain a function pointer value for a given function

-Function-pointer values are comparable for equality and the addressof operator
-is monomorphic. Function-pointer values can be explicitly coerced to and from
-integers (which, in particular, is necessary when loading/storing to memory
-since memory only provides integer types). For security and safety reasons,
-the integer value of a coerced function-pointer value is an abstract index and

-does not reveal the actual machine code address of the target function.

-In the MVP, function pointer values are local to a single module. The
-dynamic linking feature is necessary for
-two modules to pass function pointers back and forth.
+
+Functions from the main function table are made addressable by defining an
+indirect function table that consists of a sequence of indices
+into the module's main function table. A function from the main table may appear more than

Is the main function table local to each module?

Yes, each module will have its own (local) main function table that
declares the functions in that module.

Why has the indirect function table been introduced?

The indirect function table allows the module to declare which functions
are addressable and arrange them into a table. Thus the assignment of
integers to function pointers is under the control of the module. Since
the indirect function table allows functions to appear more than once, it
allows, e.g., a compiler to map vtables into the one big indirect table. So
a C++ table dispatch can be as simple as:

call_indirect(i32_add(i32_load(obj, #0), #meth_num), ... args ...)

So C++ objects store the "vtable base" in the object header, add a
method-specific offset, and then call that.

This has the advantage that vtables can be placed outside the linear memory
for safety (and performance--one less memory access).
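To make the dispatch pattern concrete, here is a toy model of it. Everything in it is an assumption made for illustration: the list `memory` stands in for linear memory, `INDIRECT_TABLE` for the indirect function table, and the classes and method numbers are invented.

```python
def animal_is_alive(obj):
    return True

def dog_speak(obj):
    return "woof"

def cat_speak(obj):
    return "meow"

# One big indirect function table; each vtable is a contiguous run of entries.
# animal_is_alive appears twice because both vtables inherit it.
INDIRECT_TABLE = [
    dog_speak, animal_is_alive,   # Dog vtable, base 0
    cat_speak, animal_is_alive,   # Cat vtable, base 2
]
DOG_VTABLE, CAT_VTABLE = 0, 2
SPEAK, IS_ALIVE = 0, 1            # method numbers (offsets within a vtable)

memory = [0] * 16                 # toy linear memory of integer cells

def new_object(addr, vtable_base):
    memory[addr] = vtable_base    # object header at offset 0 holds the vtable base
    return addr

def call_method(obj, meth_num):
    # Mirrors call_indirect(i32_add(i32_load(obj, #0), #meth_num), ... args ...)
    index = memory[obj + 0] + meth_num
    return INDIRECT_TABLE[index](obj)

dog = new_object(0, DOG_VTABLE)
cat = new_object(4, CAT_VTABLE)
```

Only the integer vtable base lives in `memory`; the function references themselves stay in the table, outside the memory the program can write to.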

Are the elements of the indirect function table observable to the wasm
code? Can it be accessed to load the index?

No, wasm code cannot directly read the indirect function table.

No provision has been made for homogeneous function tables. These seem
important for code that wants to avoid a runtime signature check.

That's an optimization that we can consider adding in the future, e.g. by
denoting an expected range within the single table where all the functions
have the same signature. AFAICT, that trades one branch (signature check)
for a subtract before the bounds check. Seems like it could be worthwhile,
but let's add that when we have data.
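The trade-off mentioned here (a signature check versus a subtract before the bounds check) can be sketched as follows. The table layout, the string signatures, and the names `call_checked` and `call_in_range` are hypothetical; this is an illustration of the idea, not a proposed encoding.

```python
class Trap(Exception):
    """Stands in for a wasm trap in this toy model."""

# Each entry pairs a signature with a function.
TABLE = [
    ("i32->i32", lambda x: x + 1),            # 0
    ("i32,i32->i32", lambda a, b: a + b),     # 1, start of the homogeneous range
    ("i32,i32->i32", lambda a, b: a * b),     # 2
    ("->i32", lambda: 7),                     # 3
]
RANGE_START, RANGE_LEN = 1, 2   # entries 1..2 all have signature "i32,i32->i32"

def call_checked(index, sig, *args):
    # General path: bounds check plus a signature comparison (one extra branch).
    if not 0 <= index < len(TABLE):
        raise Trap("index out of bounds")
    entry_sig, func = TABLE[index]
    if entry_sig != sig:
        raise Trap("signature mismatch")
    return func(*args)

def call_in_range(index, *args):
    # Range path: subtract the range start, then bounds check against the
    # range length. No signature comparison is needed inside the range.
    rel = index - RANGE_START
    if not 0 <= rel < RANGE_LEN:
        raise Trap("index outside homogeneous range")
    return TABLE[RANGE_START + rel][1](*args)
```

Both paths reject a bad index; the range path simply folds the signature guarantee into the bounds check.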


Reply to this email directly or view it on GitHub
https://github.com/WebAssembly/design/pull/392/files#r41356981.

@rossberg
Member

rossberg commented Oct 7, 2015

lgtm

Member

Should this be i64 for wasm64? I don't have a specific use case that needs more than 4 billion indirect function table entries, but since functions can appear multiple times in the table, one could imagine possibilities.

Member

On the other hand, having the a priori i32 limit would allow the C++ compiler to simply make func-ptrs 32-bit. (Is that easy to do in LLVM, or is that baked into the LLP model?)

Author

On Wed, Oct 7, 2015 at 3:06 PM, Dan Gohman notifications@github.com wrote:

In AstSemantics.md
#392 (comment):

  • call_import : call imported function directly

-Indirect calls may be made to a value of function-pointer type. A
-function-pointer value may be obtained for a given function as specified by its index
-in the function table.
+Indirect calls allow calling target functions that are unknown at compile time.
+The target function is an expression of local type i32 and is always the first

Should this be i64 for wasm64? I don't have a specific use case that
needs more than 4 billion indirect function table entries, but since
functions can appear multiple times in the table, one could imagine
possibilities.

The tables would have to be sparse somehow; otherwise, declaring 4 billion
indirect table entries is going to make for a pretty big wasm module :-)


Reply to this email directly or view it on GitHub
https://github.com/WebAssembly/design/pull/392/files#r41386881.

Member

That's a nice way to put it. That addresses my only nit.

@sunfishcode
Member

lgtm; we can discuss whether i64 function pointers for wasm64 make sense later.

@ghost

ghost commented Oct 7, 2015

@titzer Thank you for the explanation - some of those points would be useful to have in the patch text too, such as making it clear whether tables are local to modules, and whether tables are dense or sparse arrays.

  1. If the module-local indirect function table allows a function to occur 'more than once' then when concatenated together to form an instance indirect function table the index will not be unique which was a use case mentioned for the C language - to be able to compare function pointers. Perhaps the vtables should be separate from the instance indirect function table.
  2. The indirect function tables need to be dense arrays for efficiency in performance and space. An indirect call to an index from a homogeneous function table needs to be able to avoid the signature check and index into a dense array. If the space of indirect function indexes needs to be partitioned into sub-domains with a homogeneous index then this makes the table sparse (there needs to be room to concatenate), plus the call pattern would be unnecessarily complex, having to mask off the high bits that correspond to the signature domain and logical-or in the high bits for the known signature domain so that the compiler could recognise it as a known signature call.

Perhaps the module signature table could reserve index 0 to represent an undefined signature, then `addressOf(#, )` could return an index either from a heterogeneous table (for a signature index of zero) or from a homogeneous function table (for a signature index greater than zero). The runtime would maintain separate concatenated indirect tables for both the instance heterogeneous indirect function array and the homogeneous indirect function arrays. A module might only use one or a few homogeneous indirect function tables, so the burden would not be too high, and proportional to the need. A separate operation to make an indirect homogeneous function call would be fine, and the signature index would then refer to the homogeneous function table to use.

@titzer
Author

titzer commented Oct 7, 2015

On Wed, Oct 7, 2015 at 10:28 PM, JSStats notifications@github.com wrote:

@titzer https://github.com/titzer Thank you for the explanation - some
of those points would be useful to have in the patch text too, such as
making it clear whether tables are local to modules, and whether tables are
dense or sparse arrays.

If the module-local indirect function table allows a function to occur
'more than once' then when concatenated together to form an instance
indirect function table the index will not be unique which was a use case
mentioned for the C language - to be able to compare function pointers.
Perhaps the vtables should be separate from the instance indirect function
table.

For C-style direct function pointers, the module can still always use the
first occurrence of a function in the indirect table as the "canonical"
index for that function.
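A minimal sketch of that convention, assuming a toy table of Python functions; the helper name `canonical_indices` is hypothetical.

```python
def canonical_indices(indirect_table):
    """Map each function to the index of its first occurrence in the table."""
    canonical = {}
    for index, func in enumerate(indirect_table):
        # setdefault keeps the first index seen and ignores later duplicates.
        canonical.setdefault(func, index)
    return canonical


def f():
    return "f"

def g():
    return "g"

# f and g appear twice, e.g. once as plain entries and once inside a vtable.
indirect_table = [f, g, f, g]
canonical = canonical_indices(indirect_table)

# Taking the address of f in two different places yields the same integer,
# so C-style pointer equality still works despite the duplicate entries.
ptr_a = canonical[f]
ptr_b = canonical[f]
```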

The indirect function tables need to be dense arrays for efficiency in
performance and space. An indirect call to an index from a homogeneous
function table needs to be able to avoid the signature check and index into
a dense array. If the space of indirect function indexes needs to be
partitioned into sub-domains with a homogeneous index then this makes the
table sparse (there needs to be room to concatenate), plus the call pattern
would be unnecessarily complex having to mask off the high bits that
correspond to the signature domain, and logical-or in the high bits for the
known signature domain, so that the compiler could recognise it as a known
signature call.

Perhaps the module signature table could reserve index 0 to represent an
undefined signature, then `addressOf(#, )` could return an index either from
a heterogeneous table (for an index of zero), or an index from a homogeneous
function table (index greater than zero). The runtime would maintain
separate concatenated indirect tables for both the instance heterogeneous
indirect function array and the homogeneous indirect function arrays. A
module might only use one or a few homogeneous indirect function tables, so
the burden would not be too high, and proportional to the need. A separate
operation to make an indirect homogeneous function call would be fine, and
the signature index would then refer to the homogeneous function table to
use.

I can see how the concatenation semantics become a problem for homogeneous
signatures in the presence of dynamic linking. In fact, that was one of the
reasons why we couldn't make "only homogeneous tables" work, since all of
them have to be concatenated, as you point out. When we get to dynamic
linking, maybe we can allow modules themselves to pre-allocate holes in the
main module's indirect function table for the purpose of filling in slots
in the homogeneous ranges later.

There are other implementation techniques that we've discussed, such as
adding an inline cache (i.e. self-modifying code) or a single-element
cache, both of which avoid the signature check when the cache has a high
hit rate. And there is the big hammer of dynamic inlining, which probably
has an even higher performance potential.
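The single-element cache idea might look roughly like this. It is a sketch under assumptions: the class `CallSite` is hypothetical, signatures are plain strings, and the table is assumed never to change after the call site is created (otherwise the cache would need invalidation).

```python
class CallSite:
    """Hypothetical call site that caches the last target that passed the check."""

    def __init__(self, table, expected_sig):
        self.table = table                # list of (signature, function); assumed immutable
        self.expected_sig = expected_sig
        self.cached_index = None
        self.cached_func = None
        self.signature_checks = 0         # counts slow-path checks, for illustration only

    def call(self, index, *args):
        # Fast path: same index as last time, so the signature is already known to match.
        if index == self.cached_index:
            return self.cached_func(*args)
        # Slow path: bounds check and signature check, then refill the cache.
        if not 0 <= index < len(self.table):
            raise IndexError("indirect call index out of bounds")
        sig, func = self.table[index]
        self.signature_checks += 1
        if sig != self.expected_sig:
            raise TypeError("indirect call signature mismatch")
        self.cached_index, self.cached_func = index, func
        return func(*args)


table = [("i32->i32", lambda x: x + 1), ("i32->i32", lambda x: x * 2)]
site = CallSite(table, "i32->i32")
results = [site.call(0, 5), site.call(0, 6), site.call(1, 7)]
```

The second call to index 0 skips the signature check entirely, which is where a high hit rate would pay off.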



@ghost

ghost commented Oct 7, 2015

@titzer Wasm has a fundamental design property that the signature must match between the caller and callee. This leads to more efficient indirect function calls. This requires homogeneous function tables.

The use case that C code can compare function pointers for equality should not compromise the efficiency of the wasm design. This use case alone results in the need to do runtime signature checking. It needs to be a special case, not the general case.

Concatenation semantics are only a problem for homogeneous ranges in heterogeneous tables, not for dense homogeneous tables, and this is entirely a product of a C use case. Sorry, these tables need to be dense arrays. If these need to be caches etc., it is DOA.

If C code really needs to distinguish functions based on their signature then the onus is on C code to store both the signature and the homogeneous table index and compare both, to explicitly check the signature if it needs to, or propose some other scheme in addition to the core wasm support.
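One way to read this suggestion is that a C-level function pointer becomes a pair instead of a single integer. The sketch below assumes that reading; `FuncPtr`, `HOMOGENEOUS_TABLES`, and `call_homogeneous` are hypothetical names, and signatures are represented by small integers.

```python
from typing import NamedTuple


class FuncPtr(NamedTuple):
    sig: int     # index into a hypothetical signature table
    index: int   # index into the dense homogeneous table for that signature


# One dense table per signature; no entry needs a per-call signature check.
HOMOGENEOUS_TABLES = {
    0: [lambda x: x + 1, lambda x: x * 2],   # signature 0: (i32) -> i32
    1: [lambda a, b: a + b],                 # signature 1: (i32, i32) -> i32
}


def call_homogeneous(ptr, *args):
    # The signature selects the table; the index selects the entry.
    return HOMOGENEOUS_TABLES[ptr.sig][ptr.index](*args)


inc = FuncPtr(sig=0, index=0)
add = FuncPtr(sig=1, index=0)
```

Here `inc` and `add` share table index 0 but compare unequal, because the comparison covers both fields of the pair.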

@titzer
Author

titzer commented Oct 8, 2015

Merging based on lgtm's above

titzer pushed a commit that referenced this pull request Oct 8, 2015
Update discussion of indirect calls and function pointers
@titzer titzer merged commit 5b5ffa4 into master Oct 8, 2015
@titzer
Author

titzer commented Oct 8, 2015

We can revisit the homogenous signature case when we are closer to a full end-to-end system from C++ -> LLVM -> wasm. A long-term goal is to have strongly-typed function pointers as part of wasm as well, which would not be integers.

4 participants