Draft
1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -6,6 +6,7 @@

- [Lexical structure](lexical-structure.md)
- [Input format](input-format.md)
- [Frontmatter](frontmatter.md)
- [Keywords](keywords.md)
- [Identifiers](identifiers.md)
- [Comments](comments.md)
53 changes: 53 additions & 0 deletions src/frontmatter.md
Contributor Author

There are two categories of feedback I am concerned about for this PR

Please keep in mind that I am not contributing to this for my own interest in the Reference (I was interested in that at one point but no longer) but to fulfill a requirement of another team.

How about an alternative route: this PR gets re-framed as a perma-draft that is a content dump without any editorial changes (keeping changes to the content as surgical as possible). Once we've agreed on the technical content, a T-reference person can then copy the commits, make any editorial changes they wish, and post their own PR.

@@ -0,0 +1,53 @@
r[frontmatter]
# Frontmatter

r[frontmatter.syntax]
```grammar,lexer
@root FRONTMATTER ->
    FRONTMATTER_FENCE HORIZONTAL_WHITESPACE* INFOSTRING? HORIZONTAL_WHITESPACE* LF
    (FRONTMATTER_LINE LF )*
    FRONTMATTER_FENCE[^matched-fence] HORIZONTAL_WHITESPACE* LF
Contributor

It's a bit unusual to have load-bearing footnotes in the grammar. Would it be possible to define this recursively so that isn't necessary? Or is there maybe some other way around it?

Contributor

Just FYI, TC and I have been trying to come up with a grammar here, but we have not yet been able to land on something. I'm still working on it.

Contributor Author
@epage, Dec 12, 2025

I worry that being too clever in encoding this could make it harder for users to understand.

Another route is to extend the grammar to support inline context notes, like how *? gets a "(non-greedy)" note and ! gets a warning icon with "with the exception of".


FRONTMATTER_FENCE -> `---` `-`{..=255}

INFOSTRING -> (XID_Start | `_`) ( XID_Continue | `-` | `.` )*

FRONTMATTER_LINE -> (~INVALID_FRONTMATTER_LINE_START (~INVALID_FRONTMATTER_LINE_CONTINUE)*)?

INVALID_FRONTMATTER_LINE_START -> (FRONTMATTER_FENCE[^escaped-fence] | LF)

INVALID_FRONTMATTER_LINE_CONTINUE -> LF
```

[^matched-fence]: The closing fence must have the same number of `-` as the opening fence.
[^escaped-fence]: A `FRONTMATTER_FENCE` at the beginning of a `FRONTMATTER_LINE` is invalid only if it has at least as many `-` as the opening fence.
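
As a rough illustration of the two footnotes, the sketch below classifies a single line relative to the opening fence. It is not taken from rustc. The names `LineKind`, `leading_dashes`, and `classify_line` are hypothetical, and each line is assumed to be passed without its trailing `LF`.

```rust
/// Number of leading `-` characters on a line.
fn leading_dashes(line: &str) -> usize {
    line.bytes().take_while(|&b| b == b'-').count()
}

/// How a line inside frontmatter relates to the opening fence (hypothetical type).
#[derive(Debug, PartialEq)]
enum LineKind {
    /// An ordinary `FRONTMATTER_LINE`.
    Body,
    /// A fence with exactly as many `-` as the opening fence,
    /// followed only by horizontal whitespace.
    ClosingFence,
    /// Starts with at least as many `-` as the opening fence
    /// but is not a matching closing fence.
    Invalid,
}

fn classify_line(line: &str, opening_dashes: usize) -> LineKind {
    let dashes = leading_dashes(line);
    // Fewer dashes than the opening fence: the line is plain body content.
    if dashes < opening_dashes {
        return LineKind::Body;
    }
    // `-` is ASCII, so slicing at the dash count is a valid char boundary.
    let rest = &line[dashes..];
    let only_horizontal_ws = rest.chars().all(|c| c == ' ' || c == '\t');
    if dashes == opening_dashes && only_horizontal_ws {
        LineKind::ClosingFence
    } else {
        LineKind::Invalid
    }
}
```

Under this reading, a `---` line inside frontmatter opened with `----` is ordinary body content, while a `----` line inside frontmatter opened with `---` is neither body nor a matching close.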

r[frontmatter.intro]
Frontmatter is an optional section for content intended for external tools without requiring these tools to have full knowledge of the Rust grammar.
Contributor

We always use "intro" for the introduction of something.

Also, perhaps this intro could be moved to the top of the file?

Suggested change
-Frontmatter is an optional section for content intended for external tools without requiring these tools to have full knowledge of the Rust grammar.
+r[frontmatter.intro]
+Frontmatter is an optional section for content intended for external tools without requiring these tools to have full knowledge of the Rust grammar.

Can you also include an example here?

Contributor Author

> Also, perhaps this intro could be moved to the top of the file?

All other Lexical structure pages with a grammar open with the grammar. Most (but not all) other pages I randomly sampled did the same.

Contributor Author

The intro reference and an example were added.

Contributor Author

CI identified two problems with the example

  • A feature flag must be present to build
    • How do we handle the feature flag in the example when feature flags are not within the scope of the reference?
    • Would we be removing it from the example when this is stabilized and released? Unfortunate that we have to wait until then
  • It doesn't understand frontmatter and fails to build
    • I think it is important for examples to be practical so I made a small, runnable cargo script


```rust
#!/usr/bin/env cargo
---
[dependencies]
fastrand = "2"
---

fn main() {
    let num = fastrand::i32(..);
    println!("{num}");
}
```

r[frontmatter.document]
Frontmatter may only be preceded by a [shebang] and whitespace.

r[frontmatter.fence]
The delimiters are referred to as a *fence*. The opening and closing fences must be at the start of a line. They must be a matching pair of three or more hyphens (`-`). A fence may be followed by horizontal whitespace.

r[frontmatter.infostring]
The opening fence may be followed by an infostring that identifies the intent of the contained content. An infostring may be followed by horizontal whitespace.

r[frontmatter.body]
The body of the frontmatter may contain any content except for a line that starts with at least as many hyphens (`-`) as the fences.
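
To make the fence and infostring rules concrete, here is a minimal sketch that parses an opening fence line. It is an illustration only, not the compiler's implementation: `parse_opening_fence` and `is_infostring` are hypothetical names, the line is assumed to arrive without its trailing `LF`, and `XID_Start`/`XID_Continue` are approximated with ASCII letters, digits, and `_` so the example needs no external crate.

```rust
/// ASCII-only approximation of `INFOSTRING` (assumption: the real rule
/// uses the Unicode `XID_Start` and `XID_Continue` properties).
fn is_infostring(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.'))
}

/// Parses an opening fence line.
/// Returns the number of `-` in the fence and the infostring, if any.
fn parse_opening_fence(line: &str) -> Option<(usize, Option<&str>)> {
    let dashes = line.bytes().take_while(|&b| b == b'-').count();
    // `FRONTMATTER_FENCE` is `---` followed by at most 255 more `-`.
    if !(3..=258).contains(&dashes) {
        return None;
    }
    // Horizontal whitespace may surround the optional infostring.
    let info = line[dashes..].trim_matches(|c: char| c == ' ' || c == '\t');
    if info.is_empty() {
        Some((dashes, None))
    } else if is_infostring(info) {
        Some((dashes, Some(info)))
    } else {
        None
    }
}
```

The returned dash count is what a caller would later compare against candidate closing fences.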

[shebang]: input-format.md#shebang-removal
6 changes: 6 additions & 0 deletions src/input-format.md
@@ -59,6 +59,11 @@ This prevents an [inner attribute] at the start of a source file being removed.
> [!NOTE]
Contributor Author

From @traviscross at rust-lang/rust#148051 (comment), related to r[input.shebang.inner-attribute] (just above this):

> Let's please document the behavior of frontmatter removal, including how it interacts with the documented shebang removal exception

There is no change to the shebang vs attribute detection. I've been trying to think this through and so far I've not come up with a way to include frontmatter that doesn't feel artificial (saying what doesn't affect this is an open set).

> The standard library [`include!`] macro applies byte order mark removal, CRLF normalization, and shebang removal to the file it reads. The [`include_str!`] and [`include_bytes!`] macros do not.
Contributor

What is the impact on these macros?

Contributor Author

Waiting on rust-lang/rust#146377 to update this

r[input.frontmatter]
## Frontmatter removal

After some whitespace, [frontmatter] may next appear in the input.
Contributor

Can this be worded more explicitly how it relates to the items around it (particularly shebang)? This rule isn't quite standing on its own and it isn't quite clear how it fits.

For example, something like: "after the optional [shebang] and then optional [whitespace], [frontmatter] may appear next in the input".

Contributor Author

> This rule isn't quite standing on its own and it isn't quite clear how it fits.

At least shebang is written in an additive manner:

> If the remaining sequence begins with the characters `#!`,

Seems like it would be good to be consistent. I could make the additive nature more explicit.


r[input.tokenization]
## Tokenization

@@ -69,4 +74,5 @@ The resulting sequence of characters is then converted into tokens as described
[comments]: comments.md
[Crates and source files]: crates-and-source-files.md
[_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
[frontmatter]: frontmatter.md
[whitespace]: whitespace.md
2 changes: 1 addition & 1 deletion src/items/modules.md
@@ -152,7 +152,7 @@ r[items.mod.attributes]
r[items.mod.attributes.intro]
Modules, like all items, accept outer attributes. They also accept inner
attributes: either after `{` for a module with a body, or at the beginning of the
-source file, after the optional BOM and shebang.
+source file, after the optional BOM, shebang, and frontmatter.

r[items.mod.attributes.supported]
The built-in attributes that have meaning on a module are [`cfg`],
43 changes: 26 additions & 17 deletions src/whitespace.md
@@ -4,23 +4,32 @@ r[lex.whitespace]
r[whitespace.syntax]
```grammar,lexer
@root WHITESPACE ->
-      U+0009 // Horizontal tab, `'\t'`
-    | U+000A // Line feed, `'\n'`
-    | U+000B // Vertical tab
-    | U+000C // Form feed
-    | U+000D // Carriage return, `'\r'`
-    | U+0020 // Space, `' '`
-    | U+0085 // Next line
-    | U+200E // Left-to-right mark
-    | U+200F // Right-to-left mark
-    | U+2028 // Line separator
-    | U+2029 // Paragraph separator
-TAB -> U+0009 // Horizontal tab, `'\t'`
-LF -> U+000A // Line feed, `'\n'`
-CR -> U+000D // Carriage return, `'\r'`
+      END_OF_LINE
+    | IGNORABLE_CODE_POINT
+    | HORIZONTAL_WHITESPACE
+END_OF_LINE ->
+      U+000A // line feed, `'\n'`
Contributor

Can you please keep the original comments and same comment style in these rules?

Contributor Author

There is some existing discussion on the comment style at #1974 (comment)

The intent of the comment style is to match the linked-to standard for easier comparison.

+    | U+000B // vertical tabulation
+    | U+000C // form feed
+    | U+000D // carriage return, `'\r'`
+    | U+0085 // next line
+    | U+2028 // LINE SEPARATOR
+    | U+2029 // PARAGRAPH SEPARATOR
+IGNORABLE_CODE_POINT ->
+      U+200E // LEFT-TO-RIGHT MARK
+    | U+200F // RIGHT-TO-LEFT MARK
+HORIZONTAL_WHITESPACE ->
+      U+0009 // horizontal tab, `'\t'`
+    | U+0020 // space, `' '`
+TAB -> U+0009 // horizontal tab, `'\t'`
+LF -> U+000A // line feed, `'\n'`
+CR -> U+000D // carriage return, `'\r'`
```
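
The three-way split of `WHITESPACE` in the revised grammar can be sketched as a small classifier. This is only an illustration of the grouping shown in the diff. The enum `WhitespaceClass` and the function `whitespace_class` are hypothetical names, not part of the Reference or the compiler.

```rust
/// The three categories that make up `WHITESPACE` in the revised grammar.
#[derive(Debug, PartialEq, Clone, Copy)]
enum WhitespaceClass {
    EndOfLine,
    IgnorableCodePoint,
    HorizontalWhitespace,
}

/// Returns the category of `c`, or `None` if `c` is not `WHITESPACE`.
fn whitespace_class(c: char) -> Option<WhitespaceClass> {
    use WhitespaceClass::*;
    match c {
        // END_OF_LINE
        '\u{000A}' | '\u{000B}' | '\u{000C}' | '\u{000D}' | '\u{0085}' | '\u{2028}'
        | '\u{2029}' => Some(EndOfLine),
        // IGNORABLE_CODE_POINT
        '\u{200E}' | '\u{200F}' => Some(IgnorableCodePoint),
        // HORIZONTAL_WHITESPACE
        '\u{0009}' | '\u{0020}' => Some(HorizontalWhitespace),
        _ => None,
    }
}
```

The union of the three categories is the same set of eleven code points that the previous flat `WHITESPACE` rule listed.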

r[lex.whitespace.intro]