Unicode escapes: support u{N...} #2823
<tr>
<td><code>\UNNNNNN</code></td>
<td>hexadecimal 24-bit Unicode character code UTF-8 encoded (6 digits)</td>
<td><code>\u{NNNNNN}</code></td>
Not sure what the clearest way to write this is. Could also be something like:
\u{N...}
I think the "1 or more digits" you have below is sufficient
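For readers following the thread, here is a minimal sketch of what "1 or more digits" means for the `\u{N...}` form. It is not the tokenizer code from this PR: `hex_digit_value` and `parse_unicode_escape` are hypothetical names, and the sketch assumes the whole escape is handed over as one string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses text of the form \u{N...} with one or more hex digits.
// Returns false for malformed input or for a value above 0x10ffff.
bool parse_unicode_escape(const std::string &s, uint32_t *out) {
    assert(out != nullptr);
    // Shortest valid form is \u{N}, which is 5 characters.
    if (s.size() < 5 || s.compare(0, 3, "\\u{") != 0 || s[s.size() - 1] != '}') {
        return false;
    }
    uint32_t value = 0;
    for (size_t i = 3; i + 1 < s.size(); i++) {
        int d = hex_digit_value(s[i]);
        if (d < 0) return false;
        value = value * 16 + static_cast<uint32_t>(d);
        // Checking inside the loop also keeps value from overflowing.
        if (value > 0x10ffff) return false;
    }
    *out = value;
    return true;
}
```

Under these assumptions `\u{0}` and `\u{1f4a9}` are accepted, while `\u{}` and `\u{110000}` are rejected.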
},

State.CharLiteralHexEscape => switch (c) {
    '0'...'9', 'a'...'z', 'A'...'F' => {
I assume this was a bug (found when new tests were added)
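To illustrate why the `'a'...'z'` range looks like a bug: a character such as `g` would pass the switch and then reach the digit conversion. The sketch below is a C++ illustration only, with hypothetical helper names, and is not the Zig tokenizer code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
// Only 'a'..'f' and 'A'..'F' are letters that count as hex digits.
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Appends one hex digit to an accumulated code.
// The caller must have filtered out non-hex characters first;
// a range like 'a'..'z' in the caller would violate that.
uint32_t push_hex_digit(uint32_t acc, char c) {
    int d = hex_digit_value(c);
    assert(d >= 0);
    return acc * 16 + static_cast<uint32_t>(d);
}
```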
},
},

State.CharLiteralUnicodeInvalid => switch (c) {
I got a little creative here because I thought this behavior might prevent some confusing error output. If it doesn't actually help, I'd be totally fine removing this special state.
Let's run with this and see what happens.
break;
}
if (t.char_code > 0x10ffff) {
    tokenize_error(&t, "unicode value out of range: %x", t.char_code);
Move this down to the else below?
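As a rough illustration of where the range check sits relative to encoding, the sketch below rejects values above 0x10ffff before emitting UTF-8 bytes. `encode_utf8` is a hypothetical standalone function, not the stage1 code, and the caller is assumed to supply a buffer of at least 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Encodes cp as UTF-8 into out (which must hold at least 4 bytes).
// Returns the number of bytes written, or 0 if cp is above 0x10ffff.
size_t encode_utf8(uint32_t cp, uint8_t *out) {
    assert(out != nullptr);
    if (cp > 0x10ffff) return 0; // out of range: nothing is written
    if (cp < 0x80) {
        out[0] = static_cast<uint8_t>(cp);
        return 1;
    }
    if (cp < 0x800) {
        out[0] = static_cast<uint8_t>(0xc0 | (cp >> 6));
        out[1] = static_cast<uint8_t>(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = static_cast<uint8_t>(0xe0 | (cp >> 12));
        out[1] = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3f));
        out[2] = static_cast<uint8_t>(0x80 | (cp & 0x3f));
        return 3;
    }
    out[0] = static_cast<uint8_t>(0xf0 | (cp >> 18));
    out[1] = static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3f));
    out[2] = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3f));
    out[3] = static_cast<uint8_t>(0x80 | (cp & 0x3f));
    return 4;
}
```

Note that this sketch does not reject surrogate values; it only mirrors the upper-bound check shown in the hunk.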
Neither stage1 nor stage2 rejects UTF-16 surrogate code points, 0xd800 - 0xdfff.
@shawnl The purpose of this PR was to change the grammar, not introduce new validation logic.
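For reference, the validation being discussed would amount to something like the sketch below. It is not part of this PR and the function names are hypothetical; it only shows what rejecting the 0xd800 to 0xdfff range in addition to the 0x10ffff upper bound could look like.

```cpp
#include <cassert>
#include <cstdint>

// True for the UTF-16 surrogate range, which is not valid as a standalone code point.
bool is_surrogate(uint32_t cp) {
    return cp >= 0xd800 && cp <= 0xdfff;
}

// True if cp is a Unicode scalar value: at most 0x10ffff and not a surrogate.
bool is_unicode_scalar_value(uint32_t cp) {
    return cp <= 0x10ffff && !is_surrogate(cp);
}

// Small self-check, callable from a test or from main().
void check_scalar_values() {
    assert(is_unicode_scalar_value(0x41));
    assert(is_unicode_scalar_value(0xd7ff));
    assert(!is_unicode_scalar_value(0xd800));
    assert(!is_unicode_scalar_value(0xdfff));
    assert(is_unicode_scalar_value(0xe000));
    assert(!is_unicode_scalar_value(0x110000));
}
```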
Closes #2129
TODO
Notes
0x10ffff. \uNNNN and \UNNNNNN syntaxes were removed.