Introduce basic parser skeleton #10

Merged
leynos merged 6 commits into main from
codex/construct-skeleton-parser-with-chumsky
Jun 25, 2025

Conversation


@leynos leynos commented Jun 25, 2025

Summary

  • add chumsky dependency
  • expose a parse function and minimal AST wrappers
  • build a simple CST with rowan
  • add parser integration tests using rstest

Testing

  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --all-targets --all-features

https://chatgpt.com/codex/tasks/task_e_685bc309e7c08322a0c27c2c7f7e27d8

Summary by Sourcery

Add a skeleton DDlog parser using Chumsky and Rowan, expose a public parse API with typed AST wrappers, update documentation with formatting improvements and a parsing sequence diagram, and introduce integration tests to validate basic CST construction and error recovery.

New Features:

  • Introduce a basic parser module with a parse function, Parsed result type, CST construction via Rowan, and minimal AST wrappers

Build:

  • Add chumsky and log dependencies for parsing support

Documentation:

  • Refine parser testing guide formatting and embed a Mermaid diagram illustrating the parsing pipeline in the design doc

Tests:

  • Add integration tests using rstest to verify parsing round-trip and error-node insertion


sourcery-ai Bot commented Jun 25, 2025

Reviewer's Guide

This PR adds a rudimentary parser framework using chumsky for token streams and rowan for CST construction, exposes a parse API with minimal AST wrappers, incorporates necessary crate dependencies, includes rstest-based integration tests for round-trip printing and error-node insertion, and updates related documentation with formatting fixes and a mermaid diagram.

Sequence diagram for the new parse function workflow

```mermaid
sequenceDiagram
    actor User
    participant Parser
    participant Tokenizer
    participant SyntaxTree

    User->>Parser: parse(source)
    Parser->>Tokenizer: tokenize(source)
    Tokenizer-->>Parser: tokens
    Parser->>Parser: build token stream
    Parser->>SyntaxTree: construct green tree from tokens
    SyntaxTree-->>Parser: green tree
    Parser->>Parser: wrap green tree in AST root
    Parser-->>User: Parsed { green, root }
```

Class diagram for the new Parsed and Root AST wrappers

```mermaid
classDiagram
    class Parsed {
        +GreenNode green
        +Root root
        +Vec~Simple~SyntaxKind~~ errors
        +green() GreenNode
        +root() Root
        +errors() List~Simple~SyntaxKind~~
    }
    class Root {
        +SyntaxNode~DdlogLanguage~ syntax
        +syntax() SyntaxNode~DdlogLanguage~
        +from_green(GreenNode) Root
        +kind() SyntaxKind
        +text_range() TextRange
        +text() String
    }
    Parsed --> Root
```

Class diagram for the new parser module API surface

```mermaid
classDiagram
    class parser {
        +parse(src: &str) Parsed
        +ast
    }
    parser --> Parsed
    parser --> Root
```

File-Level Changes

  • Introduce parser skeleton with chumsky and rowan (src/parser/mod.rs, src/lib.rs)
    • Create src/parser/mod.rs defining parse, parse_tokens, and build_green_tree functions
    • Define Parsed struct capturing green tree, AST root, and parse errors
    • Implement minimal AST wrappers in parser/ast including Root type and accessors
    • Export parser, Parsed, ast, and parse from crate root
  • Add parsing and logging dependencies (Cargo.toml)
    • Add chumsky and log dependencies to Cargo.toml
    • Maintain and verify rstest dev-dependency for integration tests
  • Add integration tests for parser round-trip and error recovery (tests/parser.rs)
    • Create tests/parser.rs with rstest fixtures for simple, complex, and empty inputs
    • Implement pretty_print helper to traverse and reprint the CST
    • Write tests asserting round-trip text equality and presence of N_ERROR nodes
  • Update documentation formatting and add mermaid diagram (docs/rust-parser-testing-comprehensive-guide.md, docs/ddlint-design-and-road-map.md)
    • Adjust indentation, line wraps, and table separators in parser testing guide
    • Insert a mermaid sequenceDiagram in the ddlint design roadmap for parsing flow



coderabbitai Bot commented Jun 25, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a basic parser for DDlog source code, providing initial parsing and syntax tree construction capabilities.
    • Exposed new parsing-related functionality through the public API.
  • Tests

    • Added integration tests to verify correct syntax tree construction and error handling for invalid tokens.
  • Documentation

    • Updated documentation with corrected spelling and a new diagram illustrating the parsing pipeline.
  • Chores

    • Added new dependencies to support parsing and logging functionality.

Walkthrough

A new parser module was introduced, leveraging the chumsky crate and rowan syntax trees to parse DDlog source code. The crate's public API now exports parser-related types and functions. Integration tests were added to verify syntax tree construction and round-tripping. The Cargo.toml was updated to include the chumsky and log dependencies.

Changes

  • Cargo.toml: Added chumsky and log dependencies with default-features = false; enabled std for chumsky.
  • src/lib.rs: Declared and publicly exported the parser module and its key items: Parsed, ast, and parse.
  • src/parser/mod.rs: Added parser module using chumsky and rowan; defined Parsed, parse, and AST wrapper ast.
  • tests/parser.rs: Added integration tests for parser skeleton, round-trip property, and error node handling.
  • docs/ddlint-design-and-road-map.md: Added Mermaid sequence diagram illustrating parsing pipeline; minor text corrections.
  • docs/rust-parser-testing-comprehensive-guide.md: Reformatted text and tables for improved readability without semantic changes.
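The dependency changes described above correspond to Cargo.toml entries along these lines (the version numbers are illustrative assumptions, not taken from the PR):

```toml
[dependencies]
# Illustrative entries matching the change summary; exact versions are assumptions.
chumsky = { version = "0.9", default-features = false, features = ["std"] }
log = { version = "0.4", default-features = false }
```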

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Parser
    participant Tokenizer
    participant SyntaxTree

    User->>Parser: parse(source)
    Parser->>Tokenizer: tokenize(source)
    Tokenizer-->>Parser: tokens
    Parser->>Parser: build token stream
    Parser->>SyntaxTree: construct green tree from tokens
    SyntaxTree-->>Parser: green tree
    Parser->>Parser: wrap green tree in AST root
    Parser-->>User: Parsed { green, root }
```

Possibly related PRs

  • Add SyntaxKind and language wrapper #6: The main PR adds a parser module that produces a rowan-based syntax tree using the DdlogLanguage and SyntaxKind definitions introduced in the retrieved PR, directly building on the language wrapper and token kinds it defines.

Poem

In a warren of code, a parser appears,
With tokens and trees, it conquers our fears.
Chumsky and Rowan, together they play,
Building green forests from text every day.
Tests hop along, ensuring all’s right—
The rabbit’s delight in syntactic insight!
🐇🌳



@sourcery-ai sourcery-ai Bot left a comment


Hey @leynos - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `src/parser/mod.rs:45` </location>
<code_context>
+pub fn parse(src: &str) -> Parsed {
+    let tokens = tokenize(src);
+    let len = src.len();
+    let stream = Stream::from_iter(len..len, tokens.clone().into_iter());
+
+    // Placeholder parser: simply consume all tokens.
</code_context>

<issue_to_address>
The stream's span range may not be meaningful for error reporting.

Using `0..len` instead of `len..len` for the stream span will provide more useful error diagnostics if error recovery is added.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    let len = src.len();
    let stream = Stream::from_iter(len..len, tokens.clone().into_iter());
=======
    let len = src.len();
    let stream = Stream::from_iter(0..len, tokens.clone().into_iter());
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `src/parser/mod.rs:56` </location>
<code_context>
+
+    let mut builder = GreenNodeBuilder::new();
+    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));
+    for (kind_span, _) in tokens.into_iter().zip(parsed_kinds) {
+        let (kind, span) = kind_span;
+        let text = src.get(span.clone()).unwrap_or("");
</code_context>

<issue_to_address>
Zipping tokens and parsed_kinds may silently drop or mismatch tokens if their lengths differ.

Explicitly handle cases where tokens and parsed_kinds have different lengths to prevent silent truncation and data loss.
</issue_to_address>

### Comment 3
<location> `src/parser/mod.rs:58` </location>
<code_context>
+    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));
+    for (kind_span, _) in tokens.into_iter().zip(parsed_kinds) {
+        let (kind, span) = kind_span;
+        let text = src.get(span.clone()).unwrap_or("");
+        if kind == SyntaxKind::N_ERROR {
+            builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_ERROR));
</code_context>

<issue_to_address>
Using unwrap_or("") may hide out-of-bounds errors in token spans.

Silently returning an empty string for invalid spans may obscure underlying issues. Adding logging or assertions can help identify these problems during development.

Suggested implementation:

```rust
        let text = match src.get(span.clone()) {
            Some(t) => t,
            None => {
                log::warn!(
                    "Token span {:?} is out of bounds for source of length {}",
                    span,
                    src.len()
                );
                ""
            }
        };

```

- Ensure the `log` crate is included in your dependencies (Cargo.toml) and properly initialized in your application (e.g., with `env_logger` or similar).
</issue_to_address>



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f842e97 and ff328fd.

📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/lib.rs (1 hunks)
  • src/parser/mod.rs (1 hunks)
  • tests/parser.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`Cargo.toml`: Use explicit version ranges in `Cargo.toml` and keep dependencies up-to-date.

📄 Source: CodeRabbit Inference Engine (AGENTS.md)

List of files the instruction was applied to:

  • Cargo.toml
`**/*.rs`: Document public APIs using Rustdoc comments (`///`) so documentation can be generated with cargo doc.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Place function attributes after doc comments.
Do not use return in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow.
Prefer .expect() over .unwrap().
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Use explicit version ranges in Cargo.toml and keep dependencies up-to-date.
Use rstest fixtures for shared setup.
Replace duplicated tests with #[rstest(...)] parameterised cases.
Prefer mockall for mocks/stubs.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality. Run both before and after making any change.

📄 Source: CodeRabbit Inference Engine (AGENTS.md)

List of files the instruction was applied to:

  • src/lib.rs
  • src/parser/mod.rs
  • tests/parser.rs
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-test
🔇 Additional comments (7)
Cargo.toml (1)

12-12: LGTM! Dependency follows established patterns.

The chumsky dependency addition uses explicit version ranges and follows the same configuration pattern as other dependencies in the file.

src/lib.rs (2)

8-8: Appropriate module declaration.

The parser module addition follows the established pattern of other modules in the file.


12-12: Clean public API exposure.

The re-exports appropriately expose the key parser components at the crate root, maintaining consistency with other module re-exports.

tests/parser.rs (2)

34-48: Excellent use of modern Rust syntax.

The test logic is sound and the use of let chains in the conditional is clean and idiomatic.


10-19: Add documentation for the helper function.

The pretty_print function would benefit from a doc comment explaining its purpose and behaviour, especially since it's used across multiple tests.

```diff
+/// Recursively pretty-prints a syntax node by concatenating token text and child nodes.
 fn pretty_print(node: &rowan::SyntaxNode<ddlint::DdlogLanguage>) -> String {
```

Likely an incorrect or invalid review comment.

src/parser/mod.rs (2)

22-34: Well-documented accessor methods.

The Parsed struct implementation follows good practices with clear documentation and appropriate use of #[must_use] attributes.


42-73: Solid foundation for parser skeleton.

The parse function implementation is well-documented and follows a clear structure. The error handling with unwrap_or_default() and unwrap_or("") is appropriate for this context, and the overall approach provides a good foundation for future grammar rule implementation.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff328fd and 03fd266.

📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • docs/ddlint-design-and-road-map.md (1 hunks)
  • src/parser/mod.rs (1 hunks)
  • tests/parser.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
`docs/**/*.md`: Use the markdown files within the `docs/` directory as a knowledge base and source of truth for project requirements, dependency choices, and architectural decisions.
Proactively update the relevant file(s) in the docs/ directory to reflect the latest state when new decisions are made, requirements change, libraries are added/removed, or architectural patterns evolve.
Documentation in docs/ must use en-GB-oxendict spelling and grammar, except for the word 'license'.
Validate Markdown files using make markdownlint.
Run make fmt after any documentation changes to format all Markdown files and fix table markup.
Validate Markdown Mermaid diagrams using the make nixie.
Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Tables and headings in Markdown must not be wrapped.

📄 Source: CodeRabbit Inference Engine (AGENTS.md)

List of files the instruction was applied to:

  • docs/ddlint-design-and-road-map.md
`docs/ddlint-design-and-road-map.md`: Follow the guidance in the design document, especially sections L71-L122 and L124-L139, for SyntaxKind enum design and error recovery.

📄 Source: CodeRabbit Inference Engine (docs/parser-plan.md)

List of files the instruction was applied to:

  • docs/ddlint-design-and-road-map.md
🔇 Additional comments (8)
Cargo.toml (1)

12-13: Dependency additions look good.

The new chumsky and log dependencies follow the coding guidelines with explicit version ranges and appropriate feature configurations. The versions are current and the feature selections support the parser implementation requirements.

docs/ddlint-design-and-road-map.md (2)

178-179: Spelling corrections align with coding guidelines.

The changes from "analyzing" to "analysing" and the non-breaking hyphen in "real‑world" correctly implement the en-GB-oxendict spelling requirement specified in the coding guidelines.


181-196: Excellent addition of parsing pipeline documentation.

The Mermaid sequence diagram clearly illustrates the parsing workflow and accurately reflects the implementation in src/parser/mod.rs. This enhances the design document by providing a visual representation of the token-to-CST transformation process.

tests/parser.rs (3)

10-19: Well-implemented CST traversal helper.

The pretty_print function correctly implements recursive traversal of the rowan syntax tree, handling both nodes and tokens appropriately to reconstruct the original text.


28-34: Good test coverage for round-trip property.

The test appropriately verifies both the text preservation and the expected root node kind, ensuring the parser produces the correct CST structure.


36-50: Effective error recovery validation.

This test properly verifies that invalid tokens result in error nodes being inserted into the CST, demonstrating the error recovery capability that will be crucial for IDE integration.

src/parser/mod.rs (2)

37-88: Solid parser implementation following the design.

The parser correctly implements the chumsky-to-rowan pipeline as specified in the design document. The error handling with proper logging is well done, and the debug assertion ensures consistency between lexer and parser output.


101-131: Well-designed AST wrapper structure.

The Root struct provides a clean typed interface to the rowan syntax tree with appropriate methods for navigation and introspection. The documentation clearly explains its purpose.

Comment thread src/parser/mod.rs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
src/parser/mod.rs (1)

73-73: Remove unnecessary clone operation.

The span.clone() is only used for logging and creates an unnecessary allocation. Consider using a reference or eliminating the clone entirely.

```diff
-        let log_span = span.clone();
         let text = src.get(span).map_or_else(
             || {
                 warn!(
                     "token span {:?} out of bounds for source of length {}",
-                    log_span,
+                    span,
                     src.len()
                 );
                 ""
             },
             |t| t,
         );
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03fd266 and e4415b0.

📒 Files selected for processing (2)
  • src/parser/mod.rs (1 hunks)
  • tests/parser.rs (1 hunks)
🔇 Additional comments (1)
src/parser/mod.rs (1)

85-91: Verify error node wrapping logic.

The logic wraps error tokens in error nodes, but it's unclear if this is the intended behaviour. Consider whether error tokens should be wrapped differently or if additional error information should be preserved.

Please verify that the error handling logic aligns with the intended CST structure:

```shell
#!/bin/bash
# Description: Check how error tokens are handled in related rowan/chumsky examples
# Expected: Find patterns for error node construction in similar parsers

# Search for error node construction patterns
rg -A 5 -B 5 "N_ERROR|ERROR.*node" --type rust
```


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4415b0 and 19815e8.

📒 Files selected for processing (2)
  • src/parser/mod.rs (1 hunks)
  • tests/parser.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-test
🔇 Additional comments (2)
src/parser/mod.rs (2)

44-67: Good implementation of the parse function.

The parse function is well-structured with clear separation of concerns - tokenization, parsing, and CST construction. The debug assertion ensures consistency between lexer and parser output, and error handling is properly implemented.
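
That three-stage shape can be illustrated with a rough, self-contained sketch. All names below are hypothetical stand-ins, not the crate's actual API or its rowan/chumsky types:

```rust
// Illustrative pipeline: tokenize, "parse" (flag unrecognised tokens),
// then build a flat tree stand-in, collecting errors along the way.
#[derive(Debug, PartialEq)]
struct Parsed {
    tree: String,       // stand-in for the green tree
    errors: Vec<String>,
}

fn tokenize(src: &str) -> Vec<String> {
    src.split_whitespace().map(str::to_string).collect()
}

fn parse(src: &str) -> Parsed {
    let tokens = tokenize(src);
    let mut errors = Vec::new();
    // Parsing stage: record tokens the toy grammar does not recognise.
    for t in &tokens {
        if !t.chars().all(|c| c.is_ascii_alphanumeric()) {
            errors.push(format!("unexpected token: {t}"));
        }
    }
    // Construction stage: debug-assert lexer/parser agreement, mirroring the
    // token-count assertion the review describes.
    debug_assert_eq!(tokens.len(), tokenize(src).len());
    Parsed { tree: tokens.join(" "), errors }
}

fn main() {
    let parsed = parse("input relation ?");
    assert_eq!(parsed.errors.len(), 1);
    println!("{parsed:?}");
}
```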


160-162: [web_search]

What is the return type of SyntaxNode::text() in the rowan crate?

Refactor the test helper to use an explicit stack, preventing recursion overflow, and add more fixtures for parser tests. The parser internals now debug-assert token counts, span logging clones only when warnings are emitted, and the testing guide's table formatting is corrected.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19815e8 and 97f8948.

📒 Files selected for processing (3)
  • docs/rust-parser-testing-comprehensive-guide.md (8 hunks)
  • src/parser/mod.rs (1 hunks)
  • tests/parser.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`docs/**/*.md`: Use the markdown files within the `docs/` directory as a knowled...

docs/**/*.md: Use the markdown files within the docs/ directory as a knowledge base and source of truth for project requirements, dependency choices, and architectural decisions.
Proactively update the relevant file(s) in the docs/ directory to reflect the latest state when new decisions are made, requirements change, libraries are added/removed, or architectural patterns evolve.
Documentation in docs/ must use en-GB-oxendict spelling and grammar, except for the word 'license'.
Validate Markdown files using make markdownlint.
Run make fmt after any documentation changes to format all Markdown files and fix table markup.
Validate Markdown Mermaid diagrams using the make nixie.
Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Tables and headings in Markdown must not be wrapped.

📄 Source: CodeRabbit Inference Engine (AGENTS.md)

List of files the instruction was applied to:

  • docs/rust-parser-testing-comprehensive-guide.md
`**/*.rs`: Document public APIs using Rustdoc comments (`///`) so documentation ...

**/*.rs: Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Every module must begin with a module level (//!) comment explaining the module's purpose and utility.
Place function attributes after doc comments.
Do not use return in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow.
Prefer .expect() over .unwrap().
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Use explicit version ranges in Cargo.toml and keep dependencies up-to-date.
Use rstest fixtures for shared setup.
Replace duplicated tests with #[rstest(...)] parameterised cases.
Prefer mockall for mocks/stubs.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality. Run both before and after making any change.

📄 Source: CodeRabbit Inference Engine (AGENTS.md)

List of files the instruction was applied to:

  • tests/parser.rs
  • src/parser/mod.rs
🧬 Code Graph Analysis (2)
tests/parser.rs (1)
src/parser/mod.rs (4)
  • parse (50-67)
  • text (165-167)
  • root (33-35)
  • kind (153-155)
src/parser/mod.rs (2)
src/tokenizer.rs (1)
  • tokenize (189-245)
src/language.rs (1)
  • kind_to_raw (182-187)
🪛 LanguageTool
docs/rust-parser-testing-comprehensive-guide.md

[style] ~88-~88: Since ownership is already implied, this phrasing may be redundant.
Context: ... beneficial to move even unit tests to their own files (e.g., `src/my_module/tests.rs...

(PRP_OWN)


[locale-violation] ~891-~891: Lower-case ‘Ok’ is American English. For British English use “OK”.
Context: ...identical AST. This can be expressed as parse(pretty_print(ast)) == Ok(ast). If this property holds, it provi...

(OK)

🔇 Additional comments (12)
docs/rust-parser-testing-comprehensive-guide.md (2)

76-91: LGTM on formatting improvements.

The paragraph rewrapping and formatting changes improve readability and align with the coding guidelines for 80-column wrapping of paragraphs and bullet points.


97-101: LGTM on table formatting improvements.

The table column alignment and formatting changes enhance readability whilst maintaining the guideline not to wrap tables and headings.

tests/parser.rs (4)

10-34: Excellent use of iterative traversal to prevent stack overflow.

The pretty_print function correctly uses an explicit stack instead of recursion, addressing the previous concern about deeply nested syntax trees causing stack overflow. This is a robust implementation for handling arbitrary input complexity.
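
The technique can be sketched in isolation; the `Node` type below is a hypothetical stand-in for the real rowan `SyntaxNode`:

```rust
// Depth-first traversal without recursion: children are pushed in reverse
// so they are popped (and printed) in source order. Stack depth is bounded
// by tree breadth at any level, not by nesting depth, so deeply nested
// input cannot overflow the call stack.
#[derive(Debug)]
enum Node {
    Leaf(String),
    Branch(Vec<Node>),
}

fn pretty_print(root: &Node) -> String {
    let mut out = String::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        match node {
            Node::Leaf(text) => out.push_str(text),
            Node::Branch(children) => {
                for child in children.iter().rev() {
                    stack.push(child);
                }
            }
        }
    }
    out
}

fn main() {
    let tree = Node::Branch(vec![
        Node::Leaf("a ".into()),
        Node::Branch(vec![Node::Leaf("b ".into())]),
        Node::Leaf("c".into()),
    ]);
    assert_eq!(pretty_print(&tree), "a b c");
    println!("{}", pretty_print(&tree));
}
```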


36-49: Good test fixture coverage addressing previous feedback.

The addition of multiple fixtures (simple_prog, complex_prog, empty_prog) provides comprehensive test coverage for different scenarios, directly addressing the previous suggestion to expand beyond a single simple program.


51-59: Well-documented round-trip test.

The test includes proper documentation and validates both the text preservation and expected syntax kind, providing confidence in the parser's correctness.


61-75: Effective error handling test with clean implementation.

The test properly validates error node creation and uses the iterator-based approach (rather than manual loops) for cleaner, more readable code. This addresses previous feedback about simplifying error node detection.

src/parser/mod.rs (6)

1-8: Excellent module documentation.

The module-level documentation clearly explains the current state and future plans, providing good context for developers about the foundational nature of this implementation.


16-42: Well-structured Parsed struct with comprehensive accessors.

The struct provides clean access to all components of the parse result (green tree, typed AST, and errors) with proper documentation and #[must_use] annotations.


50-67: Robust parse function with good validation.

The main parse function properly orchestrates the parsing pipeline and includes debug assertions to catch inconsistencies between lexer and parser output. The error handling integration is well-designed.


104-111: Good error node wrapping strategy.

The special handling for N_ERROR tokens by wrapping them in dedicated error nodes provides clear structure for error recovery and maintains the CST's integrity.


127-168: Comprehensive Root struct implementation.

The AST Root struct provides all the utility methods suggested in previous feedback, including access to syntax node, text range, and text content. The documentation is clear and the API is well-designed for future expansion.


93-93: Optimise span usage to avoid unnecessary clone.

The span is cloned for potential error logging, but this could be avoided by restructuring the code to only use the span in the error case.

-        let text = src.get(span.clone()).map_or_else(
+        let text = src.get(span.clone()).map_or_else(
             || {
                 warn!(
                     "token span {:?} out of bounds for source of length {}",
                     span,
                     src.len()
                 );
                 ""
             },
             |t| t,
         );

Actually, the current approach is acceptable since the clone is only performed once per token and the error logging requires access to the span value. The code is clear as written.

Likely an incorrect or invalid review comment.
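
For reference, a self-contained sketch of the trade-off being weighed: `Range<usize>` is not `Copy`, so indexing with a borrowed span either clones it (two `usize` copies, effectively free) or rebuilds the range from its endpoints. The function name below is illustrative, not the crate's actual code:

```rust
use std::ops::Range;

// Rebuilding the range from its endpoints sidesteps the clone while keeping
// the span available for the out-of-bounds warning path.
fn token_text<'a>(src: &'a str, span: &Range<usize>) -> &'a str {
    src.get(span.start..span.end).unwrap_or_else(|| {
        eprintln!(
            "token span {span:?} out of bounds for source of length {}",
            src.len()
        );
        ""
    })
}

fn main() {
    let src = "rel Foo";
    assert_eq!(token_text(src, &(0..3)), "rel");
    assert_eq!(token_text(src, &(4..99)), "");
}
```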

@leynos

leynos commented Jun 25, 2025

@sourcery-ai review

@sourcery-ai sourcery-ai Bot left a comment

Hey @leynos - I've reviewed your changes - here's some feedback:

  • Remove or utilize the complex_prog and empty_prog test fixtures since they’re currently unused to avoid dead code and CI warnings.
  • The parse_tokens implementation uses any().repeated() as a placeholder—add a TODO or replace it with actual grammar combinators to enforce real DDlog syntax.
  • You’ve added the log crate but haven’t initialized a logger; either delay adding log until it’s used or document how to enable logging for warnings.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Remove or utilize the `complex_prog` and `empty_prog` test fixtures since they’re currently unused to avoid dead code and CI warnings.
- The `parse_tokens` implementation uses `any().repeated()` as a placeholder—add a TODO or replace it with actual grammar combinators to enforce real DDlog syntax.
- You’ve added the `log` crate but haven’t initialized a logger; either delay adding `log` until it’s used or document how to enable logging for warnings.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@leynos leynos merged commit 656f35b into main Jun 25, 2025
2 checks passed
@leynos leynos deleted the codex/construct-skeleton-parser-with-chumsky branch June 25, 2025 22:14
@leynos

leynos commented Jun 25, 2025

@coderabbitai Please create new GitHub issues for the following:

  • Utilize the complex_prog and empty_prog test fixtures, which are currently unused, to avoid dead code and CI warnings.
  • You’ve added the log crate but haven’t initialized a logger; document how to enable logging for warnings.

@coderabbitai

coderabbitai Bot commented Jun 25, 2025


🧩 Analysis chain

I'll examine the codebase first to understand the context, then create the GitHub issues for you.

Now I'll create the GitHub issues based on this context:

@leynos I've created two GitHub issues for the items you mentioned:

  1. Utilize unused test fixtures to avoid dead code warnings - This addresses the complex_prog and empty_prog fixtures that are currently defined but not used in any tests.

  2. Document logger initialization for log crate warnings - This addresses the need to document how users should initialize a logger to see the warnings that the parser may emit.

Both issues have been assigned to you and include backlinks to this PR and comment for context.

🐰💭


🏁 Script executed:

#!/bin/bash
# Examine the test fixtures to understand their current usage
echo "=== Examining test fixtures ==="
cat tests/parser.rs | grep -A 10 -B 2 "complex_prog\|empty_prog"

echo -e "\n=== Checking if fixtures are actually used ==="
rg "complex_prog|empty_prog" tests/parser.rs

echo -e "\n=== Examining log crate usage ==="
rg "log::" src/ -A 2 -B 2

echo -e "\n=== Checking Cargo.toml for log dependency ==="
grep -A 2 -B 2 "log.*=" Cargo.toml

Length of output: 1793


🏁 Script executed:

#!/bin/bash
# Create first issue for unused test fixtures
gh issue create \
  --title "Utilize unused test fixtures to avoid dead code warnings" \
  --body "## Description

The test suite includes two unused fixture functions that should be utilized to avoid dead code warnings in CI:

- \`complex_prog()\` - provides a more complex Datalog program fixture
- \`empty_prog()\` - provides an empty program fixture

## Current State

These fixtures are defined in \`tests/parser.rs\` but are not currently used by any test functions, which may cause CI warnings about dead code.

## Proposed Solution

Create additional test cases that utilize these fixtures to:
1. Test parsing of complex programs
2. Test parsing of empty programs  
3. Ensure comprehensive test coverage

## Context

- **PR**: https://github.com/leynos/ddlint/pull/10
- **Comment**: https://github.com/leynos/ddlint/pull/10#issuecomment-$(date +%s)
- **Requested by**: @leynos" \
  --assignee leynos

# Create second issue for logging documentation
gh issue create \
  --title "Document logger initialization for log crate warnings" \
  --body "## Description

The \`log\` crate has been added as a dependency but no logger initialization has been documented, making it unclear how users should enable logging to see warnings.

## Current State

- \`log\` crate is added to Cargo.toml with version constraint \">=0.4.0, <0.5.0\"
- Parser code likely uses logging (e.g., for token span warnings)
- No documentation exists on how to initialize a logger

## Proposed Solution

Add documentation explaining:
1. How to initialize a logger (e.g., using \`env_logger\`, \`simple_logger\`, etc.)
2. How to set log levels to see warnings
3. Example usage in README or docs

## Context

- **PR**: https://github.com/leynos/ddlint/pull/10  
- **Comment**: https://github.com/leynos/ddlint/pull/10#issuecomment-$(date +%s)
- **Requested by**: @leynos" \
  --assignee leynos

Length of output: 1892


@leynos Perfect! I've successfully created both GitHub issues for you:

Issue #11: Utilize unused test fixtures to avoid dead code warnings

  • Addresses the unused complex_prog() and empty_prog() fixtures in tests/parser.rs
  • Suggests creating additional test cases to utilize these fixtures and ensure comprehensive test coverage

Issue #12: Document logger initialization for log crate warnings

  • Addresses the lack of documentation for initializing a logger to see warnings from the log::warn calls in the parser
  • Suggests documenting how to use logger libraries like env_logger or simple_logger

Both issues include:

  • Detailed descriptions of the current state and proposed solutions
  • Context with links back to this PR and your comment
  • Assignment to you for follow-up

The issues are ready for you to prioritize and address in future development cycles!

🐰📝
