Unicode escapes: support u{N...} by hryx · Pull Request #2823 · ziglang/zig

hryx · 2019-07-05T06:00:31Z

TODO

stage2 tokenizer
stage2 parser test
stage1 tokenizer
behavior tests
update documentation examples and grammar
update grammar in zig-spec: "Update rule for unicode escape" (zig-spec#8)

Notes

One or more hex digits are allowed inside the braces, with no upper bound on the digit count. The stage1 tokenizer retains an upper limit of 0x10ffff on the character value.
The old \uNNNN and \UNNNNNN syntaxes were removed.
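
To make the rule in these notes concrete, here is a minimal C++ sketch of parsing one complete `\u{N...}` escape. It is not the code from this PR: `parse_unicode_escape` is a hypothetical standalone helper, whereas the real tokenizers are state machines that consume one character at a time. The range check runs after every digit so that arbitrarily long digit runs cannot overflow the accumulator.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not from the Zig sources): parses a complete
// "\u{N...}" escape and returns its value, or nullopt if it is malformed
// or exceeds 0x10ffff.
std::optional<std::uint32_t> parse_unicode_escape(std::string_view s) {
    // Shortest valid form is 5 characters: backslash, 'u', '{', one digit, '}'.
    if (s.size() < 5 || s.substr(0, 3) != "\\u{" || s.back() != '}') {
        return std::nullopt;
    }
    std::uint32_t value = 0;
    // Walk only the characters between the braces.
    for (char c : s.substr(3, s.size() - 4)) {
        std::uint32_t digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return std::nullopt;  // not a hex digit
        value = value * 16 + digit;
        // Checking here keeps value small, so leading zeros are harmless
        // and long inputs cannot wrap around.
        if (value > 0x10ffff) return std::nullopt;
    }
    return value;
}
```

Under these assumptions, `\u{41}` and `\u{000041}` both yield 0x41, while `\u{110000}`, `\u{}`, and the removed `\u0041` form are all rejected.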

hryx · 2019-07-05T06:01:42Z

doc/langref.html.in

-        <tr>
-            <td><code>\UNNNNNN</code></td>
-          <td>hexadecimal 24-bit Unicode character code UTF-8 encoded (6 digits)</td>
+            <td><code>\u{NNNNNN}</code></td>


Not sure what the clearest way to write this is. Could also be something like:

\u{N...}

hryx · 2019-07-05T06:02:31Z

std/zig/tokenizer.zig

                },

                State.CharLiteralHexEscape => switch (c) {
-                    '0'...'9', 'a'...'z', 'A'...'F' => {


I assume this was a bug (found when new tests were added)

hryx · 2019-07-05T06:07:01Z

std/zig/tokenizer.zig

+                    },
+                },
+
+                State.CharLiteralUnicodeInvalid => switch (c) {


I got a little creative here because I thought this behavior might prevent some confusing error output. If it doesn't actually help, I'd be totally fine removing this special state.

daurnimator · 2019-07-05T06:24:36Z

src/tokenizer.cpp

+                            break;
+                        }
+                        if (t.char_code > 0x10ffff) {
+                            tokenize_error(&t, "unicode value out of range: %x", t.char_code);


Move this down to the else below?

andrewrk

Looks great, easy merge

andrewrk · 2019-07-06T17:10:48Z

doc/langref.html.in

-        <tr>
-            <td><code>\UNNNNNN</code></td>
-          <td>hexadecimal 24-bit Unicode character code UTF-8 encoded (6 digits)</td>
+            <td><code>\u{NNNNNN}</code></td>


I think the "1 or more digits" you have below is sufficient

andrewrk · 2019-07-06T17:12:08Z

std/zig/tokenizer.zig

                },

                State.CharLiteralHexEscape => switch (c) {
-                    '0'...'9', 'a'...'z', 'A'...'F' => {


yep. thanks!
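
For readers skimming this thread, the bug is that the removed range `'a'...'z'` accepts letters such as `g` that are not hex digits. The C++ sketch below contrasts the two checks. Both function names are hypothetical, and the corrected range `'a'...'f'` is an assumption, since the replacement line is not shown in the hunk above.

```cpp
#include <cassert>

// Hypothetical helper with the assumed corrected ranges: 0-9, a-f, A-F.
bool is_hex_digit(char c) {
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

// The ranges from the removed line, kept only to show what they wrongly accept.
bool is_hex_digit_buggy(char c) {
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') ||  // too wide: also matches 'g' through 'z'
           (c >= 'A' && c <= 'F');
}
```

With the old ranges, an escape such as `\xzz` would pass the digit check in the tokenizer state, which is presumably what the new tests caught.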

andrewrk · 2019-07-06T17:13:25Z

std/zig/tokenizer.zig

+                    },
+                },
+
+                State.CharLiteralUnicodeInvalid => switch (c) {


Let's run with this and see what happens.

shawnl · 2019-07-20T13:55:52Z

Neither the stage1 nor the stage2 tokenizer rejects UTF-16 surrogates, 0xd800 - 0xdfff.

hryx · 2019-07-20T20:13:24Z

@shawnl The purpose of this PR was to change the grammar, not introduce new validation logic.
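
For context on what shawnl is asking for, here is a minimal C++ sketch of such a check. It is not part of this PR, which adds no surrogate validation, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate: true for code points reserved for UTF-16 surrogates.
bool is_surrogate(std::uint32_t code_point) {
    return code_point >= 0xd800 && code_point <= 0xdfff;
}

// A Unicode scalar value is any code point up to 0x10ffff that is not a
// surrogate. Only scalar values can be encoded as well-formed UTF-8.
bool is_unicode_scalar_value(std::uint32_t code_point) {
    return code_point <= 0x10ffff && !is_surrogate(code_point);
}
```

A tokenizer that wanted this behavior could call something like `is_unicode_scalar_value` next to the existing 0x10ffff range check.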

hryx added 3 commits July 4, 2019 14:48

Unicode escapes: stage2 tokenizer and parser test

8365a7a

Unicode escapes: stage1 tokenizer and behavior tests

6bfa854

Unicode escapes: documentation and grammar

e35d49c

hryx commented Jul 5, 2019

daurnimator suggested changes Jul 5, 2019

andrewrk approved these changes Jul 6, 2019

andrewrk merged commit 21c6092 into ziglang:master Jul 6, 2019

hryx deleted the unicode-escape branch July 20, 2019 19:59

Techatrix mentioned this pull request Sep 12, 2025

Update the Unicode escape syntax ziglang/zig.vim#87

Closed
