Binary function names #47
Description
Cretonne compiles functions independently, so function names are used differently than in LLVM. They serve two purposes:
- In .cton test cases, the function names are all ASCII, and are used to identify individual test cases in the same file.
- When Cretonne is embedded as a JIT compiler, function names are used to identify other functions that may be called. This identifier can be any sequence of bytes; it doesn't have to be ASCII or UTF-8. Cretonne doesn't interpret these function names; they are opaque identifiers.
The binary function names are not well supported. They get printed out as quoted ASCII with a bunch of backslash escapes.
- The parser doesn't support the quoted format of function names.
- If the name is a binary encoding of some identifier, ASCII with escapes is not a good representation.
- Right now, function names like v7 or ebb4 get printed without quotes, and the lexer recognizes them as value and EBB tokens.
Alternative function name syntax.
Over in #24, I proposed two new identifier/data tokens: %nnnn and #xxxx. We should use these to represent function names everywhere:
- If the function name consists entirely of ASCII alphanumerical characters and _, use the %nnnn notation. Note that this also allows for names like %0. There is no need to give special treatment to the first character.
- If the function name contains any other characters, use the #xxxx hexadecimal representation of the function name bytes.
- If the name is empty, use a special syntax, maybe noname (no %).
With these changes, the parser should stop accepting unquoted identifiers as function names.
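
The three printing rules above could be sketched roughly as follows. This is only an illustration: format_func_name is a hypothetical helper, not an existing Cretonne function, and the choice of lowercase hex digits for the #xxxx form is an assumption that the proposal does not spell out.

```rust
use std::fmt::Write;

/// Render a function name (raw bytes) in the proposed notation.
/// `format_func_name` is a hypothetical helper, not part of Cretonne.
fn format_func_name(name: &[u8]) -> String {
    if name.is_empty() {
        // Empty names use the special `noname` spelling (no `%`).
        return "noname".to_string();
    }
    if name.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'_') {
        // Only ASCII alphanumerics and `_`: use the %nnnn form.
        // No special treatment of the first character, so `%0` is allowed.
        let mut s = String::with_capacity(name.len() + 1);
        s.push('%');
        s.extend(name.iter().map(|&b| b as char));
        return s;
    }
    // Anything else: use the #xxxx form, two hex digits per byte.
    // Lowercase digits are an assumption made for this sketch.
    let mut s = String::with_capacity(1 + 2 * name.len());
    s.push('#');
    for b in name {
        write!(s, "{:02x}", b).unwrap();
    }
    s
}
```

With this sketch, the name v7 would print as %v7, so the lexer no longer sees a value token, while the bytes of a-b would print as #612d62.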
Binary name representation.
Currently, the FunctionName struct contains a String:
pub struct FunctionName(String);
This restricts us to names in UTF-8 form. We should accept any sequence of bytes, so change this to:
pub struct FunctionName(Vec<u8>);
Allocation-free representation
For extra credit: Cretonne tries to minimize heap allocations everywhere, so an internal short-string optimization would make a lot of sense:
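enum NameRepr {
    // Names of up to 12 bytes are stored inline; only the first `length` bytes are valid.
    Short {
        length: u8,
        bytes: [u8; 12],
    },
    // Longer names fall back to a heap allocation.
    Long(Vec<u8>),
}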
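
To make the idea concrete, here is a minimal sketch of how FunctionName might wrap this representation. The 12-byte inline capacity comes from the enum above; the method names new, as_bytes, and is_inline are hypothetical and only chosen for illustration.

```rust
/// Inline capacity taken from the proposed `bytes: [u8; 12]` field.
const SHORT_CAP: usize = 12;

enum NameRepr {
    Short { length: u8, bytes: [u8; SHORT_CAP] },
    Long(Vec<u8>),
}

/// Opaque function name. The methods below are hypothetical.
pub struct FunctionName(NameRepr);

impl FunctionName {
    /// Build a name from arbitrary bytes, allocating only if it
    /// does not fit in the inline buffer.
    pub fn new(name: &[u8]) -> Self {
        if name.len() <= SHORT_CAP {
            let mut bytes = [0u8; SHORT_CAP];
            bytes[..name.len()].copy_from_slice(name);
            FunctionName(NameRepr::Short {
                length: name.len() as u8,
                bytes,
            })
        } else {
            FunctionName(NameRepr::Long(name.to_vec()))
        }
    }

    /// View the name as a byte slice, whichever variant holds it.
    pub fn as_bytes(&self) -> &[u8] {
        match &self.0 {
            NameRepr::Short { length, bytes } => &bytes[..*length as usize],
            NameRepr::Long(v) => v.as_slice(),
        }
    }

    /// True if the name is stored without a heap allocation.
    pub fn is_inline(&self) -> bool {
        matches!(self.0, NameRepr::Short { .. })
    }
}
```

Keeping the enum private behind FunctionName means callers only ever see a byte slice, so the inline threshold could change later without affecting them.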