UTF-8 for all string encodings #989

@jfbastien

Currently:

  • We use var[u]int for most of WebAssembly's binary integer encoding. Consistency is good.
  • We use length + bytes for all "strings" such as import / export names, and we let the embedder apply extra restrictions as it sees fit (as JS.md does). Separation of concerns and leeway for embedders are good.

#984 opens a can of worms w.r.t. using UTF-8 for strings. We could either:

  • Use a varuint for the length in bytes, followed by that many bytes of UTF-8; or
  • Use a varuint for the number of codepoints, followed by the UTF-8 encoding of each codepoint.
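
To make the difference between the two options concrete, here is a minimal Python sketch. The function names (`encode_varuint`, `encode_name_byte_length`, `encode_name_codepoint_count`) are hypothetical and only for illustration; the varuint is assumed to be unsigned LEB128, and nothing here is meant as the actual binary format.

```python
def encode_varuint(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (assumed varuint format)."""
    if value < 0:
        raise ValueError("varuint cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_name_byte_length(name: str) -> bytes:
    """Option 1 (hypothetical helper): prefix is the number of UTF-8 bytes."""
    payload = name.encode("utf-8")
    return encode_varuint(len(payload)) + payload


def encode_name_codepoint_count(name: str) -> bytes:
    """Option 2 (hypothetical helper): prefix is the number of codepoints."""
    payload = name.encode("utf-8")
    # len() on a Python str counts codepoints, not bytes.
    return encode_varuint(len(name)) + payload


if __name__ == "__main__":
    # "é" is one codepoint (U+00E9) but two UTF-8 bytes (0xC3 0xA9).
    print(encode_name_byte_length("é").hex())      # 02c3a9
    print(encode_name_codepoint_count("é").hex())  # 01c3a9
```

For pure ASCII names the two encodings are identical. They differ only when a name contains a multi-byte codepoint. With a byte-length prefix a decoder can skip a string without inspecting its contents, whereas with a codepoint count it has to walk the UTF-8 sequence to find the end.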

I'm not opposed to it—UTF-8 is super simple and doesn't imply Unicode—but I want the discussion to be a stand-alone thing. This issue is that discussion.

Let's discuss arguments for / against UTF-8 for all strings (not Unicode) in this issue, and vote 👍 or 👎 on the issue for general sentiment.
