Consolidate UTF8 character encoding implementations #78490

@am11

We have two character encoding implementations for UTF-8 <-> UTF-16 <-> UCS-4 in the runtime, located at:

  • src/coreclr/pal/src/locale/{utf8,unicode}.cpp
  • src/mono/mono/eglib/giconv.c

The coreclr/pal implementation was ported from the C# implementation in 2016: dotnet/coreclr#3809. The C# implementation has since diverged and improved so much, with spanified and intrinsified APIs, that the two are now barely recognizable as related.

We can move eglib/giconv.c from mono to src/native/minipal and create a neutral C header, so that both the coreclr PAL and mono glib can rely on unified APIs. There are also some unused methods in these files (e.g. g_utf8_to_ucs4_fast) which can be cleaned up at the same time.
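
To make the "neutral C header" idea concrete, here is a rough sketch of what one such shared entry point could look like. It is only an illustration: the names `sketch_utf8_to_utf16` and `SKETCH_CONVERT_ERROR`, the signature, and the error convention are all hypothetical and are not taken from giconv.c, the PAL, or any existing minipal header. The point is that the function depends only on `<stddef.h>` and `<stdint.h>`, so neither a PAL nor a glib type leaks into the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error value for this sketch (not an existing runtime constant). */
#define SKETCH_CONVERT_ERROR ((size_t)-1)

/* Decode one code point from UTF-8. Returns the number of bytes consumed,
 * or 0 if the sequence is truncated, overlong, a surrogate, or out of range. */
static size_t sketch_decode_utf8(const unsigned char* s, size_t len, uint32_t* out)
{
    uint32_t cp, min;
    size_t n, i;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *out = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; min = 0x10000; }
    else return 0; /* stray continuation byte or invalid lead byte */

    if (len < n) return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    *out = cp;
    return n;
}

/* Hypothetical neutral API: converts srcLen bytes of UTF-8 to UTF-16.
 * Returns the number of UTF-16 code units produced. If dst is NULL, nothing
 * is written and the required length is returned. Returns
 * SKETCH_CONVERT_ERROR on invalid input or if dst is too small. */
size_t sketch_utf8_to_utf16(const char* src, size_t srcLen, uint16_t* dst, size_t dstLen)
{
    const unsigned char* s = (const unsigned char*)src;
    size_t in = 0, out = 0;

    while (in < srcLen) {
        uint32_t cp;
        size_t n = sketch_decode_utf8(s + in, srcLen - in, &cp);
        size_t units;

        if (n == 0) return SKETCH_CONVERT_ERROR;
        units = (cp >= 0x10000) ? 2 : 1;
        if (dst != NULL) {
            if (out + units > dstLen) return SKETCH_CONVERT_ERROR;
            if (units == 1) {
                dst[out] = (uint16_t)cp;
            } else {
                /* Encode a supplementary code point as a surrogate pair. */
                cp -= 0x10000;
                dst[out] = (uint16_t)(0xD800 + (cp >> 10));
                dst[out + 1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
            }
        }
        out += units;
        in += n;
    }
    return out;
}
```

With a shape like this, a caller would typically call once with a NULL `dst` to get the required length, allocate, and call again. Both the coreclr PAL and mono could then wrap the same function behind their existing entry points instead of carrying separate decoders.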
