This repository was archived by the owner on Jun 26, 2020. It is now read-only.

Binary function names #47

@stoklund

Cretonne compiles functions independently, so function names are used differently than in LLVM. They serve two purposes:

  1. In .cton test cases, the function names are all ASCII, and are used to identify individual test cases in the same file.
  2. When Cretonne is embedded as a JIT compiler, function names are used to identify other functions that may be called. This identifier can be any sequence of bytes; it doesn't have to be ASCII or UTF-8. Cretonne doesn't interpret these function names; they are opaque identifiers.

The binary function names are not well supported. They get printed out as quoted ASCII with a bunch of backslash escapes.

  • The parser doesn't support the quoted format of function names.
  • If the name is a binary encoding of some identifier, ASCII with escapes is not a good representation.
  • Right now, function names like v7 or ebb4 get printed without quotes, and the lexer recognizes them as value and EBB tokens.

Alternative function name syntax.

Over in #24, I proposed two new identifier/data tokens: %nnnn and #xxxx. We should use these to represent function names everywhere:

  1. If the function name consists entirely of ASCII alphanumeric characters and _, use the %nnnn notation. Note that this also allows for names like %0. There is no need to give special treatment to the first character.
  2. If the function name contains any other characters, use the #xxxx hexadecimal representation of the function name bytes.
  3. If the name is empty, use a special syntax, maybe noname (no %).

With these changes, the parser should stop accepting unquoted identifiers as function names.
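
As a rough illustration of the three rules above, here is a minimal sketch of a printer and a matching parser. The helper names `format_name` and `parse_name` are hypothetical and not part of Cretonne, and `noname` is used for the empty name only because it is the tentative spelling suggested above.

```rust
/// Hypothetical helper: render an opaque function name using the proposed rules.
fn format_name(bytes: &[u8]) -> String {
    // Rule 3: the empty name gets a special spelling (tentatively `noname`).
    if bytes.is_empty() {
        return "noname".to_string();
    }
    // Rule 1: ASCII alphanumerics and `_` only, so print as %nnnn.
    if bytes.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'_') {
        let mut s = String::with_capacity(bytes.len() + 1);
        s.push('%');
        s.extend(bytes.iter().map(|&b| b as char));
        return s;
    }
    // Rule 2: anything else is printed as #xxxx, two hex digits per byte.
    let mut s = String::with_capacity(1 + 2 * bytes.len());
    s.push('#');
    for b in bytes {
        s.push_str(&format!("{:02x}", b));
    }
    s
}

/// Hypothetical helper: parse the textual form back into bytes.
/// Unquoted identifiers such as `v7` are rejected.
fn parse_name(text: &str) -> Option<Vec<u8>> {
    if text == "noname" {
        return Some(Vec::new());
    }
    if let Some(rest) = text.strip_prefix('%') {
        let ok = !rest.is_empty()
            && rest.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_');
        return if ok { Some(rest.as_bytes().to_vec()) } else { None };
    }
    if let Some(hex) = text.strip_prefix('#') {
        // Require a non-empty, even-length run of hex digits.
        let ok = !hex.is_empty()
            && hex.len() % 2 == 0
            && hex.bytes().all(|b| b.is_ascii_hexdigit());
        if !ok {
            return None;
        }
        return hex
            .as_bytes()
            .chunks(2)
            .map(|pair| {
                let digits = std::str::from_utf8(pair).ok()?;
                u8::from_str_radix(digits, 16).ok()
            })
            .collect();
    }
    None
}
```

Under this sketch a name that happens to be spelled `noname` still prints as `%noname`, so it cannot collide with the empty name.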

Binary name representation.

Currently, the FunctionName struct contains a String:

pub struct FunctionName(String);

This restricts us to names in UTF-8 form. We should accept any sequence of bytes, so change this to:

pub struct FunctionName(Vec<u8>);
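
To show what the byte-based representation allows, here is a minimal sketch. The `new` constructor and `as_bytes` accessor are assumptions made for illustration; the issue only specifies the field type.

```rust
/// Opaque function name: any sequence of bytes, not necessarily UTF-8.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct FunctionName(Vec<u8>);

impl FunctionName {
    /// Hypothetical constructor: accepts anything convertible to `Vec<u8>`,
    /// such as `&str`, `String`, `&[u8]`, or `Vec<u8>`.
    pub fn new<T: Into<Vec<u8>>>(bytes: T) -> Self {
        FunctionName(bytes.into())
    }

    /// Hypothetical accessor: the raw name bytes, uninterpreted.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

With this shape, `FunctionName::new("main")` and `FunctionName::new(vec![0xff, 0xfe])` are both valid, even though the second is not valid UTF-8.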

Allocation-free representation

For extra credit: Cretonne tries to minimize heap allocations everywhere, so an internal short-string optimization would make a lot of sense:

enum NameRepr {
    Short {
        length: u8,
        bytes: [u8; 12],
    },
    Long(Vec<u8>),
}
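
A minimal sketch of how such a representation might be constructed and read back, assuming names of up to 12 bytes are stored inline and longer ones fall back to the heap. The `SHORT_CAP` constant and the `new` and `as_bytes` methods are hypothetical names, not part of the proposal.

```rust
/// Inline capacity, matching the 12-byte array in the proposal.
const SHORT_CAP: usize = 12;

#[derive(Debug, Clone)]
enum NameRepr {
    Short {
        length: u8,
        bytes: [u8; SHORT_CAP],
    },
    Long(Vec<u8>),
}

impl NameRepr {
    /// Store short names inline; only longer names allocate.
    fn new(name: &[u8]) -> Self {
        if name.len() <= SHORT_CAP {
            let mut bytes = [0u8; SHORT_CAP];
            bytes[..name.len()].copy_from_slice(name);
            NameRepr::Short {
                // Cannot truncate: name.len() <= 12 here.
                length: name.len() as u8,
                bytes,
            }
        } else {
            NameRepr::Long(name.to_vec())
        }
    }

    /// View the name as a byte slice regardless of representation.
    fn as_bytes(&self) -> &[u8] {
        match self {
            NameRepr::Short { length, bytes } => &bytes[..*length as usize],
            NameRepr::Long(v) => v.as_slice(),
        }
    }
}
```

Because callers only ever see the slice returned by `as_bytes`, the public `FunctionName` type could switch to this representation internally without changing its interface.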
