From 295dab6eb5af1f05ce614cbfe7fd55161e2e40e3 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Sun, 12 Oct 2025 08:49:03 -0700 Subject: [PATCH 1/2] Change underscore to be a keyword This changes underscore from being a punctuation character to a keyword. This is intended to help better align with proc-macros, which treat it as an [`Ident`](https://doc.rust-lang.org/proc_macro/struct.Ident.html). Note one unusual rule is inline assembly `ParamName` which is `IDENTIFIER_OR_KEYWORD`. From what I can tell, it does accept `_`, but the fmt template does not. Templates are not specified in great detail in the std docs, and don't touch on this fact. Closes https://github.com/rust-lang/reference/issues/1236 Closes https://github.com/rust-lang/reference/issues/2020 --- src/identifiers.md | 8 ++------ src/keywords.md | 1 + src/macros-by-example.md | 4 ++-- src/tokens.md | 19 ++++++++----------- 4 files changed, 13 insertions(+), 19 deletions(-) diff --git a/src/identifiers.md b/src/identifiers.md index 27016a5171..7ee390c8bf 100644 --- a/src/identifiers.md +++ b/src/identifiers.md @@ -3,15 +3,13 @@ r[ident] r[ident.syntax] ```grammar,lexer -IDENTIFIER_OR_KEYWORD -> - XID_Start XID_Continue* - | `_` XID_Continue+ +IDENTIFIER_OR_KEYWORD -> ( XID_Start | `_` ) XID_Continue* XID_Start -> <`XID_Start` defined by Unicode> XID_Continue -> <`XID_Continue` defined by Unicode> -RAW_IDENTIFIER -> `r#` IDENTIFIER_OR_KEYWORD _except `crate`, `self`, `super`, `Self`_ +RAW_IDENTIFIER -> `r#` IDENTIFIER_OR_KEYWORD _except `crate`, `self`, `super`, `Self`, `_`_ NON_KEYWORD_IDENTIFIER -> IDENTIFIER_OR_KEYWORD _except a [strict][lex.keywords.strict] or [reserved][lex.keywords.reserved] keyword_ @@ -37,8 +35,6 @@ The profile used from UAX #31 is: * Continue := [`XID_Continue`] * Medial := empty -with the additional constraint that a single underscore character is not an identifier. 
- > [!NOTE] > Identifiers starting with an underscore are typically used to indicate an identifier that is intentionally unused, and will silence the unused warning in `rustc`. diff --git a/src/keywords.md b/src/keywords.md index cbcfbcf4cb..f7edfa3150 100644 --- a/src/keywords.md +++ b/src/keywords.md @@ -26,6 +26,7 @@ be used as the names of: r[lex.keywords.strict.list] The following keywords are in all editions: +- `_` - `as` - `async` - `await` diff --git a/src/macros-by-example.md b/src/macros-by-example.md index 4edad491c3..8de2aecdaf 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -25,7 +25,7 @@ MacroMatcher -> MacroMatch -> Token _except `$` and [delimiters][lex.token.delim]_ | MacroMatcher - | `$` ( IDENTIFIER_OR_KEYWORD _except `crate`_ | RAW_IDENTIFIER | `_` ) `:` MacroFragSpec + | `$` ( IDENTIFIER_OR_KEYWORD _except `crate`_ | RAW_IDENTIFIER ) `:` MacroFragSpec | `$` `(` MacroMatch+ `)` MacroRepSep? MacroRepOp MacroFragSpec -> @@ -134,7 +134,7 @@ Valid fragment specifiers are: * `block`: a [BlockExpression] * `expr`: an [Expression] * `expr_2021`: an [Expression] except [UnderscoreExpression] and [ConstBlockExpression] (see [macro.decl.meta.edition2024]) - * `ident`: an [IDENTIFIER_OR_KEYWORD], [RAW_IDENTIFIER], or [`$crate`] + * `ident`: an [IDENTIFIER_OR_KEYWORD] except `_`, [RAW_IDENTIFIER], or [`$crate`] * `item`: an [Item] * `lifetime`: a [LIFETIME_TOKEN] * `literal`: matches `-`?[LiteralExpression] diff --git a/src/tokens.md b/src/tokens.md index 1a8755148e..019937c9f6 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -116,7 +116,7 @@ A suffix is a sequence of characters following the primary part of a literal (wi r[lex.token.literal.suffix.syntax] ```grammar,lexer -SUFFIX -> IDENTIFIER_OR_KEYWORD +SUFFIX -> IDENTIFIER_OR_KEYWORD _except `_`_ SUFFIX_NO_E -> SUFFIX _not beginning with `e` or `E`_ ``` @@ -762,7 +762,6 @@ r[lex.token.life.syntax] ```grammar,lexer LIFETIME_TOKEN -> `'` IDENTIFIER_OR_KEYWORD _not immediately 
followed by `'`_ - | `'_` _not immediately followed by `'`_ | RAW_LIFETIME LIFETIME_OR_LABEL -> @@ -770,7 +769,7 @@ LIFETIME_OR_LABEL -> | RAW_LIFETIME RAW_LIFETIME -> - `'r#` IDENTIFIER_OR_KEYWORD _except `crate`, `self`, `super`, `Self` and not immediately followed by `'`_ + `'r#` IDENTIFIER_OR_KEYWORD _except `crate`, `self`, `super`, `Self`, `_` and not immediately followed by `'`_ RESERVED_RAW_LIFETIME -> `'r#_` _not immediately followed by `'`_ ``` @@ -845,7 +844,6 @@ PUNCTUATION -> | `#` | `$` | `?` - | `_` | `{` | `}` | `[` @@ -891,7 +889,6 @@ usages and meanings are defined in the linked pages. | `>=` | Ge | [Greater than or equal to][comparison], [Generics] | `<=` | Le | [Less than or equal to][comparison] | `@` | At | [Subpattern binding] -| `_` | Underscore | [Wildcard patterns], [Inferred types], Unnamed items in [constants], [extern crates], [use declarations], and [destructuring assignment] | `.` | Dot | [Field access][field], [Tuple index] | `..` | DotDot | [Range][range], [Struct expressions], [Patterns], [Range Patterns][rangepat] | `...` | DotDotDot | [Variadic functions][extern], [Range patterns] @@ -947,23 +944,23 @@ r[lex.token.reserved-prefix] r[lex.token.reserved-prefix.syntax] ```grammar,lexer RESERVED_TOKEN_DOUBLE_QUOTE -> - ( IDENTIFIER_OR_KEYWORD _except `b` or `c` or `r` or `br` or `cr`_ | `_` ) `"` + IDENTIFIER_OR_KEYWORD _except `b` or `c` or `r` or `br` or `cr`_ `"` RESERVED_TOKEN_SINGLE_QUOTE -> - ( IDENTIFIER_OR_KEYWORD _except `b`_ | `_` ) `'` + IDENTIFIER_OR_KEYWORD _except `b`_ `'` RESERVED_TOKEN_POUND -> - ( IDENTIFIER_OR_KEYWORD _except `r` or `br` or `cr`_ | `_` ) `#` + IDENTIFIER_OR_KEYWORD _except `r` or `br` or `cr`_ `#` RESERVED_TOKEN_LIFETIME -> - `'` ( IDENTIFIER_OR_KEYWORD _except `r`_ | `_` ) `#` + `'` IDENTIFIER_OR_KEYWORD _except `r`_ `#` ``` r[lex.token.reserved-prefix.intro] Some lexical forms known as _reserved prefixes_ are reserved for future use. 
r[lex.token.reserved-prefix.id] -Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix. +Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix. r[lex.token.reserved-prefix.raw-token] Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix. @@ -972,7 +969,7 @@ r[lex.token.reserved-prefix.strings] Similarly the `r`, `b`, `br`, `c`, and `cr` prefixes used in raw string literals, byte literals, byte string literals, raw byte string literals, C string literals, and raw C string literals are not interpreted as reserved prefixes. r[lex.token.reserved-prefix.life] -Source input which would otherwise be lexically interpreted as a non-raw lifetime (or a keyword or `_`) which is immediately followed by a `#` character (without intervening whitespace) is identified as a reserved lifetime prefix. +Source input which would otherwise be lexically interpreted as a non-raw lifetime (or a keyword) which is immediately followed by a `#` character (without intervening whitespace) is identified as a reserved lifetime prefix. r[lex.token.reserved-prefix.edition2021] > [!EDITION-2021] From 0234184cf1317189399c2015f831bf6e047c0811 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Tue, 14 Oct 2025 11:44:49 -0700 Subject: [PATCH 2/2] Rework RESERVED_RAW_IDENTIFIER and RESERVED_RAW_LIFETIME This reworks the reserved raw identifier and lifetimes to hopefully more clearly express what they mean. The "except" clauses in the raw identifier were intended to mean a set subtraction, not an explicit "and it is an error if it is specified". 
Using set subtraction isn't correct because that would mean `r#crate` would be interpreted as 3 tokens (since RAW_IDENTIFIER did not match it, but IDENTIFIER_OR_KEYWORD PUNCTUATION IDENTIFIER_OR_KEYWORD would). I also reordered Token, since the intent is that the first production in an alternation that matches wins. The idea here is to make the reserved tokens high priority, so that they clearly match first and cause an error. (I did not exhaustively analyze the rest of the rules to see if they follow that behavior; that is for another day.) It would be nice to document the rationale for the restrictions (https://github.com/rust-lang/reference/issues/2042), but that is a bigger ask. --- src/identifiers.md | 6 +++--- src/tokens.md | 13 ++++++------- 2 files changed, 9 insertions(+), 10 deletions(-) diff --git a/src/identifiers.md b/src/identifiers.md index 7ee390c8bf..6165160a71 100644 --- a/src/identifiers.md +++ b/src/identifiers.md @@ -9,13 +9,13 @@ XID_Start -> <`XID_Start` defined by Unicode> XID_Continue -> <`XID_Continue` defined by Unicode> -RAW_IDENTIFIER -> `r#` IDENTIFIER_OR_KEYWORD _except `crate`, `self`, `super`, `Self`, `_`_ +RAW_IDENTIFIER -> `r#` IDENTIFIER_OR_KEYWORD NON_KEYWORD_IDENTIFIER -> IDENTIFIER_OR_KEYWORD _except a [strict][lex.keywords.strict] or [reserved][lex.keywords.reserved] keyword_ IDENTIFIER -> NON_KEYWORD_IDENTIFIER | RAW_IDENTIFIER -RESERVED_RAW_IDENTIFIER -> `r#_` +RESERVED_RAW_IDENTIFIER -> `r#` (`_` | `crate` | `self` | `Self` | `super`) ``` @@ -72,7 +72,7 @@ Unlike a normal identifier, a raw identifier may be any strict or reserved keyword except the ones listed above for `RAW_IDENTIFIER`. r[ident.raw.reserved] -It is an error to use the [RESERVED_RAW_IDENTIFIER] token `r#_` in order to avoid confusion with the [WildcardPattern]. +It is an error to use the [RESERVED_RAW_IDENTIFIER] token. 
[`extern crate`]: items/extern-crates.md [`no_mangle`]: abi.md#the-no_mangle-attribute diff --git a/src/tokens.md b/src/tokens.md index 019937c9f6..c825d93eda 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -4,7 +4,7 @@ r[lex.token] r[lex.token.syntax] ```grammar,lexer Token -> - IDENTIFIER_OR_KEYWORD + RESERVED_TOKEN | RAW_IDENTIFIER | CHAR_LITERAL | STRING_LITERAL @@ -18,7 +18,7 @@ Token -> | FLOAT_LITERAL | LIFETIME_TOKEN | PUNCTUATION - | RESERVED_TOKEN + | IDENTIFIER_OR_KEYWORD ``` r[lex.token.intro] @@ -769,9 +769,9 @@ LIFETIME_OR_LABEL -> | RAW_LIFETIME RAW_LIFETIME -> - `'r#` IDENTIFIER_OR_KEYWORD _except `crate`, `self`, `super`, `Self`, `_` and not immediately followed by `'`_ + `'r#` IDENTIFIER_OR_KEYWORD _not immediately followed by `'`_ -RESERVED_RAW_LIFETIME -> `'r#_` _not immediately followed by `'`_ +RESERVED_RAW_LIFETIME -> `'r#` (`_` | `crate` | `self` | `Self` | `super`) _not immediately followed by `'`_ ``` r[lex.token.life.intro] @@ -786,7 +786,7 @@ r[lex.token.life.raw.allowed] Unlike a normal lifetime, a raw lifetime may be any strict or reserved keyword except the ones listed above for `RAW_LIFETIME`. r[lex.token.life.raw.reserved] -It is an error to use the RESERVED_RAW_LIFETIME token `'r#_` in order to avoid confusion with the [placeholder lifetime]. +It is an error to use the [RESERVED_RAW_LIFETIME] token. r[lex.token.life.raw.edition2021] > [!EDITION-2021] @@ -922,7 +922,7 @@ r[lex.token.reserved] ## Reserved tokens r[lex.token.reserved.intro] -Several token forms are reserved for future use. It is an error for the source input to match one of these forms. +Several token forms are reserved for future use or to avoid confusion. It is an error for the source input to match one of these forms. 
r[lex.token.reserved.syntax] ```grammar,lexer @@ -1058,7 +1058,6 @@ r[lex.token.reserved-guards.edition2024] [numeric types]: types/numeric.md [paths]: paths.md [patterns]: patterns.md -[placeholder lifetime]: lifetime-elision.md [question]: expressions/operator-expr.md#the-try-propagation-expression [range]: expressions/range-expr.md [rangepat]: patterns.md#range-patterns
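
Illustrative note (not part of the patch): the `ident` fragment-specifier wording changed in src/macros-by-example.md ("an [IDENTIFIER_OR_KEYWORD] except `_`") can be sketched with a small macro. The macro name `kind` and its return strings are hypothetical, chosen only for this illustration. The sketch assumes the usual `macro_rules!` behaviour of trying rules in order.

```rust
// Hypothetical macro for illustration: reports which rule matched its input.
macro_rules! kind {
    // `ident` accepts identifiers, keywords, and raw identifiers, but not `_`.
    ($i:ident) => { "ident" };
    // A literal `_` token in the matcher; reached because the rule above rejects `_`.
    (_) => { "underscore" };
    // Any other single token tree.
    ($t:tt) => { "other token" };
}

fn main() {
    assert_eq!(kind!(foo), "ident");
    assert_eq!(kind!(match), "ident"); // keywords match `ident`
    assert_eq!(kind!(r#match), "ident"); // so do raw identifiers
    assert_eq!(kind!(_), "underscore"); // `_` is excluded from `ident`
    assert_eq!(kind!(+), "other token");
}
```

Without the explicit exception in the fragment description, treating `_` as a keyword would make it look as though `$i:ident` should accept `_`, which is presumably why the patch adds the "except `_`" wording.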