Conversation
Reviewer's Guide

Adds a new flow diagram for error recovery in the parser:

```mermaid
flowchart TD
    A[Parser encounters error] --> B[Emit N_ERROR node]
    B --> C[Skip to synchronisation point]
    C --> D[Continue parsing]
    D --> E[Final output: AST root + CST with error nodes]
```
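The recovery flow above can be sketched in a few lines. This is a minimal, hand-rolled illustration, not the plan's actual chumsky integration: the `Event` type, the single-character tokens, and the choice of `;` and `}` as synchronisation points are all assumptions made for the example.

```rust
// Hypothetical sketch of the recovery step: on a syntax error, record an
// N_ERROR node, then skip tokens until a synchronisation point so parsing
// can continue. Token and event types are illustrative stand-ins.
#[derive(Debug, PartialEq)]
enum Event {
    Node(&'static str), // e.g. "N_ERROR"
    Token(char),        // a skipped token, kept so the CST stays lossless
}

/// Skip tokens until a sync point (';' or '}' in this sketch), recording an
/// N_ERROR node that covers the skipped span. Returns the resume position.
fn recover(tokens: &[char], mut pos: usize, events: &mut Vec<Event>) -> usize {
    events.push(Event::Node("N_ERROR"));
    while pos < tokens.len() && !matches!(tokens[pos], ';' | '}') {
        events.push(Event::Token(tokens[pos]));
        pos += 1;
    }
    pos // parsing continues from the sync point
}

fn main() {
    let tokens: Vec<char> = "@@;x".chars().collect();
    let mut events = Vec::new();
    let resume = recover(&tokens, 0, &mut events);
    assert_eq!(resume, 2); // stopped at the ';'
    println!("resumed at {resume}, events: {events:?}");
}
```

Because the skipped tokens are recorded rather than dropped, the CST can remain a lossless view of the source even across error regions, which matches the final output shown in the diagram.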
File-Level Changes
Important: Review skipped

Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.

Walkthrough

A new documentation file, "Parser Porting Plan," has been introduced. It outlines a comprehensive, step-by-step approach for porting a Haskell parser to Rust, detailing strategies for tokenisation, grammar mapping, AST and CST construction, error recovery, and testing, as well as integration steps within the existing pipeline.

Changes
Gates Passed
6 Quality Gates Passed
See analysis details in CodeScene
Quality Gate Profile: Pay Down Tech Debt
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.
Hey @leynos - I've reviewed your changes - here's some feedback:
- Add a dedicated strategy for handling left-recursive grammar productions (e.g. refactoring or explicit recursion elimination) since Chumsky doesn’t support left recursion natively.
- Introduce a performance benchmarks section to measure and compare parsing speed and memory usage against the existing Haskell parser to catch regressions early.
- Clarify how trivia tokens (whitespace and comments) will be modeled in the CST so they don’t leak into AST-based semantic analyses.
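The left-recursion point in the first bullet can be made concrete. A rule such as `expr ::= expr '-' term | term` loops forever in a naive combinator or recursive-descent parser, but rewrites mechanically into `term ('-' term)*` with a left fold. A minimal hand-rolled sketch of that rewrite follows; the single-digit tokens and `i64` "AST" are simplifying assumptions, and this is not the chumsky API itself:

```rust
// Illustrative left-recursion elimination: `expr ::= expr '-' term | term`
// becomes `term ('-' term)*`, folded left so "8-3-2" parses as (8-3)-2.
fn parse_expr(input: &[char], pos: &mut usize) -> i64 {
    let mut lhs = parse_term(input, pos);
    while *pos < input.len() && input[*pos] == '-' {
        *pos += 1; // consume '-'
        let rhs = parse_term(input, pos);
        lhs -= rhs; // left fold preserves left associativity
    }
    lhs
}

// A term is a single digit in this sketch.
fn parse_term(input: &[char], pos: &mut usize) -> i64 {
    let d = input[*pos].to_digit(10).expect("digit") as i64;
    *pos += 1;
    d
}

fn main() {
    let input: Vec<char> = "8-3-2".chars().collect();
    let mut pos = 0;
    // (8-3)-2 = 3; a right-associative parse would give 8-(3-2) = 7.
    assert_eq!(parse_expr(&input, &mut pos), 3);
    println!("ok");
}
```

The same shape is expressible with repetition-plus-fold combinators in a parser library; the key point is that the recursion on `expr` is replaced by iteration over the operator tail.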
## Individual Comments
### Comment 1
<location> `docs/parser-plan.md:31` </location>
<code_context>
+provided by `num-derive`. This allows `rowan` to store `SyntaxKind` tags
+transparently.
+
+## 3. Build a Tokeniser
+
+Use `chumsky`'s text utilities (or integrate a `logos` lexer if preferred) to
</code_context>
<issue_to_address>
Consider changing 'Tokeniser' to 'Tokenizer' for consistency.
The document uses American English elsewhere, so using 'Tokenizer' would align with that convention.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
## 3. Build a Tokeniser
Use `chumsky`'s text utilities (or integrate a `logos` lexer if preferred) to
=======
## 3. Build a Tokenizer
Use `chumsky`'s text utilities (or integrate a `logos` lexer if preferred) to
>>>>>>> REPLACE
</suggested_fix>
### Comment 2
<location> `docs/parser-plan.md:40` </location>
<code_context>
+
+## 4. Construct the Parser with `chumsky`
+
+1. Express each grammar rule using `chumsky` combinators. The parser should
+ return both an AST node and instructions for the CST builder.
+2. Wrap every recognised token into its `SyntaxKind` and push it into a
+ `GreenNodeBuilder` from `rowan`.
+3. For syntactic errors, emit an `N_ERROR` node and recover by skipping to a
</code_context>
<issue_to_address>
Consider changing 'recognised' to 'recognized' for consistency.
Using 'recognized' will ensure spelling consistency throughout the document.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
2. Wrap every recognised token into its `SyntaxKind` and push it into a
`GreenNodeBuilder` from `rowan`.
=======
2. Wrap every recognized token into its `SyntaxKind` and push it into a
`GreenNodeBuilder` from `rowan`.
>>>>>>> REPLACE
</suggested_fix>
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/parser-plan.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/parser-plan.md
[style] ~14-~14: The serial comma (Oxford comma, Harvard comma) is missing.
Context: ...e lexical elements, including keywords, operators and punctuation. These will become token...
(SERIAL_COMMA_ON)
[style] ~31-~31: Would you like to use the Oxford spelling “Tokenizer”? The spelling ‘Tokeniser’ is also correct.
Context: ...indtags transparently. ## 3. Build a Tokeniser Usechumsky`'s text utilities (or int...
(OXFORD_SPELLING_Z_NOT_S)
[style] ~42-~42: Would you like to use the Oxford spelling “recognized”? The spelling ‘recognised’ is also correct.
Context: ...ions for the CST builder. 2. Wrap every recognised token into its SyntaxKind and push it...
(OXFORD_SPELLING_Z_NOT_S)
[style] ~45-~45: Would you like to use the Oxford spelling “synchronization”? The spelling ‘synchronisation’ is also correct.
Context: ...e and recover by skipping to a known synchronisation point, as recommended by the design ...
(OXFORD_SPELLING_Z_NOT_S)
🔇 Additional comments (1)
docs/parser-plan.md (1)
1-8: Comprehensive porting plan outlined successfully.
The document clearly describes each step for migrating the Haskell parser to Rust using Chumsky and Rowan.
> provided by `num-derive`. This allows `rowan` to store `SyntaxKind` tags
> transparently.
>
> ## 3. Build a Tokeniser
🧹 Nitpick (assertive)
Unify spelling for “Tokeniser”.
Choose one variant—e.g. use American “Tokenizer” to match most Rust docs—or stick with British throughout.
🧰 Tools
🪛 LanguageTool
[style] ~31-~31: Would you like to use the Oxford spelling “Tokenizer”? The spelling ‘Tokeniser’ is also correct.
Context: ...indtags transparently. ## 3. Build a Tokeniser Usechumsky`'s text utilities (or int...
(OXFORD_SPELLING_Z_NOT_S)
🤖 Prompt for AI Agents
In docs/parser-plan.md at line 31, the term "Tokeniser" is used inconsistently
with the rest of the documentation. Choose either the American spelling
"Tokenizer" or the British "Tokeniser" and update all occurrences in the
document to use the selected variant consistently.
> 1. Express each grammar rule using `chumsky` combinators. The parser should
>    return both an AST node and instructions for the CST builder.
> 2. Wrap every recognised token into its `SyntaxKind` and push it into a
🧹 Nitpick (assertive)
Unify spelling for “recognised”.
Either use British (“recognised”) everywhere or switch to American (“recognized”) for consistency.
🧰 Tools
🪛 LanguageTool
[style] ~42-~42: Would you like to use the Oxford spelling “recognized”? The spelling ‘recognised’ is also correct.
Context: ...ions for the CST builder. 2. Wrap every recognised token into its SyntaxKind and push it...
(OXFORD_SPELLING_Z_NOT_S)
🤖 Prompt for AI Agents
In docs/parser-plan.md at line 42, the word "recognised" is used inconsistently
with other parts of the document. Choose either British English ("recognised")
or American English ("recognized") and update all occurrences in the document to
use the same spelling for consistency.
> 2. Wrap every recognised token into its `SyntaxKind` and push it into a
>    `GreenNodeBuilder` from `rowan`.
> 3. For syntactic errors, emit an `N_ERROR` node and recover by skipping to a
>    known synchronisation point, as recommended by the design
🧹 Nitpick (assertive)
Unify spelling for “synchronisation”.
Consider using either British (“synchronisation”) or American (“synchronization”) consistently.
🧰 Tools
🪛 LanguageTool
[style] ~45-~45: Would you like to use the Oxford spelling “synchronization”? The spelling ‘synchronisation’ is also correct.
Context: ...e and recover by skipping to a known synchronisation point, as recommended by the design ...
(OXFORD_SPELLING_Z_NOT_S)
🤖 Prompt for AI Agents
In docs/parser-plan.md at line 45, the word "synchronisation" is used
inconsistently with other occurrences. Choose either the British spelling
"synchronisation" or the American spelling "synchronization" and update all
instances in the document to use the chosen spelling consistently.
> - Create a `SyntaxKind` enum covering every token and node in the grammar.
> - Derive `FromPrimitive` and `ToPrimitive` and use `#[repr(u16)]` so that each
>   kind can map to `rowan::SyntaxKind`. The design notes this pattern
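The `SyntaxKind` mapping quoted above follows a common rowan pattern. A minimal sketch is shown below; `RawKind(u16)` is a local stand-in for `rowan::SyntaxKind` (itself a `u16` newtype) so the example compiles without the crate, and the variant names are illustrative, not the real grammar's:

```rust
// Sketch of the #[repr(u16)] mapping: each enum variant converts losslessly
// to and from a raw u16 kind tag. RawKind stands in for rowan::SyntaxKind.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(u16)]
enum SyntaxKind {
    TIdent = 0,
    TKeyword = 1,
    NError = 2,
    NRoot = 3,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct RawKind(u16); // stand-in for rowan::SyntaxKind

impl From<SyntaxKind> for RawKind {
    fn from(kind: SyntaxKind) -> Self {
        RawKind(kind as u16) // repr(u16) makes this cast lossless
    }
}

// In real code, num-derive's FromPrimitive would generate this conversion.
fn from_raw(raw: RawKind) -> SyntaxKind {
    match raw.0 {
        0 => SyntaxKind::TIdent,
        1 => SyntaxKind::TKeyword,
        2 => SyntaxKind::NError,
        _ => SyntaxKind::NRoot,
    }
}

fn main() {
    let raw: RawKind = SyntaxKind::NError.into();
    assert_eq!(raw, RawKind(2));
    assert_eq!(from_raw(raw), SyntaxKind::NError);
    println!("ok");
}
```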
🧹 Nitpick (assertive)
Standardise footnote link formatting.
Replace the custom footnote marker `【F:docs/ddlint-design-and-road-map.md†L71-L122】` with a standard Markdown link, e.g.:

…pattern explicitly ([design document](docs/ddlint-design-and-road-map.md#L71-L122)).

🤖 Prompt for AI Agents
In docs/parser-plan.md around lines 21 to 23, replace the custom footnote marker
`【F:docs/ddlint-design-and-road-map.md†L71-L122】` with a standard Markdown link
format. Use the syntax `([design
document](docs/ddlint-design-and-road-map.md#L71-L122))` to ensure consistent
and standard footnote link formatting throughout the document.
> 1. Study the token definitions and parser entry points in the Haskell source.
> 2. Identify each grammar production and note its corresponding structure in the
>    AST.
> 3. Enumerate lexical elements, including keywords, operators and punctuation.
🧹 Nitpick (assertive)
Serial comma missing in list.
In the item Enumerate lexical elements, including keywords, operators and punctuation., add an Oxford comma:
- including keywords, operators and punctuation
+ including keywords, operators, and punctuation📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- 3. Enumerate lexical elements, including keywords, operators and punctuation.
+ 3. Enumerate lexical elements, including keywords, operators, and punctuation.
🧰 Tools
🪛 LanguageTool
[style] ~14-~14: The serial comma (Oxford comma, Harvard comma) is missing.
Context: ...e lexical elements, including keywords, operators and punctuation. These will become token...
(SERIAL_COMMA_ON)
🤖 Prompt for AI Agents
In docs/parser-plan.md at line 14, the list "keywords, operators and
punctuation" is missing the Oxford comma. Add a comma after "operators" so the
list reads "keywords, operators, and punctuation" to improve clarity.
Summary
Testing
- `cargo fmt --all`
- `mdformat-all`
- `cargo clippy --all-targets --all-features -- -D warnings` (failed: lint_groups_priority and print-stdout)
- `cargo test --all-targets --all-features`
- `markdownlint --fix docs/parser-plan.md`

https://chatgpt.com/codex/tasks/task_e_68599e7c0d9c83228c3e1d2421e6d058
Summary by Sourcery
Documentation:
Summary by CodeRabbit