
Compiler does not handle non-BMP characters in identifiers #9600

@Serentty

Description


The compiler uses methods on Char extensively to check character properties, even in recent code such as #9199. Because a Char is a single UTF-16 code unit, these checks break down on non-BMP characters, which are encoded as surrogate pairs. The properties should instead be checked on full Unicode code points (for example, as 32-bit values). A similar bug exists in Roslyn, but it has been acknowledged as a bug (the C# standard allows non-BMP characters in identifiers) and is in the process of being fixed. Handling these identifiers properly is therefore needed for interoperability with C#, even setting aside the many minority languages around the world whose scripts lie outside the BMP and the many CJK personal and place names outside the BMP.
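Java's char, like .NET's System.Char, is a single UTF-16 code unit, so the same pitfall can be reproduced outside the F# compiler. The following is a minimal Java sketch, with Java chosen only for illustration and the class and method names hypothetical: a letter check on the first code unit of 𐌰 (U+10330) sees a high surrogate and rejects it, while the same check on the decoded code point accepts it. In .NET, System.Text.Rune (.NET Core 3.0 and later) provides code-point-based checks such as Rune.IsLetter.

```java
// Hypothetical sketch: why per-char (UTF-16 code unit) checks reject non-BMP letters.
// Java's char, like .NET's System.Char, holds one UTF-16 code unit.
final class CodeUnitVsCodePoint {
    // U+10330 GOTHIC LETTER AHSA; outside the BMP, so it becomes a surrogate pair.
    static final String GOTHIC_AHSA = new String(Character.toChars(0x10330));

    // Looks only at the first UTF-16 code unit, as a char-based API does.
    static boolean isLetterByCodeUnit(String s) {
        return Character.isLetter(s.charAt(0));
    }

    // Combines the surrogate pair into a full code point before checking.
    static boolean isLetterByCodePoint(String s) {
        return Character.isLetter(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String s = GOTHIC_AHSA;
        System.out.println("UTF-16 code units: " + s.length());                          // 2
        System.out.println("code points: " + s.codePointCount(0, s.length()));           // 1
        System.out.println("high surrogate? " + Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println("letter by code unit: " + isLetterByCodeUnit(s));             // false
        System.out.println("letter by code point: " + isLetterByCodePoint(s));           // true
    }
}
```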

Example of the issue

let δ𐌰 = 5

Although the Greek letter δ is handled properly, the Gothic letter 𐌰 causes the compiler to complain of an unexpected character in the pattern.
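To illustrate how the example above can fail, here is a hypothetical, simplified identifier scanner in Java. It assumes simplified rules (a letter or underscore first, then letters, digits, underscores or apostrophes), not the F# lexer's actual ones. Stepping one UTF-16 code unit at a time rejects δ𐌰 at the high surrogate of 𐌰, whereas stepping by whole code points with codePointAt and Character.charCount accepts it.

```java
// Hypothetical, simplified identifier check (NOT the F# lexer's actual rules):
// first character a letter or '_', the rest letters, digits, '_' or apostrophe.
final class IdentifierScan {
    static boolean isStart(int cp) {
        return Character.isLetter(cp) || cp == '_';
    }

    static boolean isPart(int cp) {
        return Character.isLetterOrDigit(cp) || cp == '_' || cp == '\'';
    }

    // Buggy variant: steps one UTF-16 code unit at a time, so each half of a
    // surrogate pair is classified on its own (and surrogates are not letters).
    // Returns the index of the first rejected code unit, or -1 if all pass.
    static int firstBadIndexByCodeUnit(String id) {
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            boolean ok = (i == 0) ? isStart(c) : isPart(c);
            if (!ok) return i;
        }
        return -1;
    }

    // Fixed variant: decodes surrogate pairs and advances by whole code points.
    // Returns the code-unit index where the first rejected code point starts, or -1.
    static int firstBadIndexByCodePoint(String id) {
        int i = 0;
        while (i < id.length()) {
            int cp = id.codePointAt(i);
            boolean ok = (i == 0) ? isStart(cp) : isPart(cp);
            if (!ok) return i;
            i += Character.charCount(cp); // 2 for non-BMP code points, else 1
        }
        return -1;
    }

    public static void main(String[] args) {
        // Greek small delta (U+03B4) followed by GOTHIC LETTER AHSA (U+10330).
        String id = "\u03B4" + new String(Character.toChars(0x10330));
        System.out.println(firstBadIndexByCodeUnit(id));  // 1 (the high surrogate)
        System.out.println(firstBadIndexByCodePoint(id)); // -1 (accepted)
    }
}
```

Advancing by Character.charCount(cp) instead of by 1 is the essential change; the character-class predicates can stay the same as long as they take an int code point rather than a char.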
