
Refactor CST construction into module #79

Merged
leynos merged 8 commits into main from codex/extract-parser-cst_builder-module
Jul 19, 2025

Conversation

Owner

@leynos leynos commented Jul 18, 2025

Summary

  • move Parsed and ParsedSpans into a new cst_builder module
  • expose a Parsed::new constructor
  • update parse to use the new module
  • add unit tests for cst_builder

Testing

  • make lint
  • make test

https://chatgpt.com/codex/tasks/task_e_687a941c05d08322bc54073e5cc31937

Summary by Sourcery

Refactor the concrete syntax tree (CST) construction by moving span validation and green tree building into a new cst_builder module, expose a Parsed::new constructor and a builder API for ParsedSpans, update the parser to use these abstractions, and add related unit tests.

New Features:

  • Add ParsedSpansBuilder for constructing validated span lists
  • Expose a public Parsed::new constructor

Enhancements:

  • Extract CST construction and span validation into a dedicated parser::cst_builder module
  • Update parse and span_scanner to leverage the new cst_builder API and remove inlined logic

Tests:

  • Add unit tests for ParsedSpansBuilder span validation and for round-trip CST tree building

Contributor

sourcery-ai Bot commented Jul 18, 2025

Reviewer's Guide

This PR refactors the concrete syntax tree (CST) construction by extracting parsing types and tree-building logic into a dedicated cst_builder module, introduces a builder-based API for managing statement spans, centralizes span iteration with reusable abstractions, simplifies the top-level parse function to delegate to the new module, and adds targeted unit tests for span validation and round-trip tree construction.

Sequence diagram for CST construction using the new cst_builder module

sequenceDiagram
    participant Parser
    participant cst_builder as CSTBuilder
    participant ParsedSpansBuilder
    participant ParsedSpans
    participant build_green_tree
    Parser->>ParsedSpansBuilder: builder().imports(...).typedefs(...).relations(...).indexes(...).functions(...).transformers(...).rules(...)
    ParsedSpansBuilder->>ParsedSpans: build()
    Parser->>build_green_tree: build_green_tree(tokens, src, &spans)
    build_green_tree-->>Parser: GreenNode
    Parser->>CSTBuilder: Parsed::new(green, root, errors)
    CSTBuilder-->>Parser: Parsed

Class diagram for refactored CST construction (Parsed, ParsedSpans, ParsedSpansBuilder)

classDiagram
    class Parsed {
        +green: GreenNode
        +root: ast::Root
        +errors: Vec<Simple<SyntaxKind>>
        +green() GreenNode
        +root() ast::Root
        +errors() Vec<Simple<SyntaxKind>>
        +new(green, root, errors)
    }
    class ParsedSpans {
        +imports: Vec<Span>
        +typedefs: Vec<Span>
        +relations: Vec<Span>
        +indexes: Vec<Span>
        +functions: Vec<Span>
        +transformers: Vec<Span>
        +rules: Vec<Span>
        +imports() &[Span]
        +typedefs() &[Span]
        +relations() &[Span]
        +indexes() &[Span]
        +functions() &[Span]
        +transformers() &[Span]
        +rules() &[Span]
        +builder() ParsedSpansBuilder
    }
    class ParsedSpansBuilder {
        +imports(Vec<Span>) ParsedSpansBuilder
        +typedefs(Vec<Span>) ParsedSpansBuilder
        +relations(Vec<Span>) ParsedSpansBuilder
        +indexes(Vec<Span>) ParsedSpansBuilder
        +functions(Vec<Span>) ParsedSpansBuilder
        +transformers(Vec<Span>) ParsedSpansBuilder
        +rules(Vec<Span>) ParsedSpansBuilder
        +build() ParsedSpans
    }
    ParsedSpansBuilder --|> ParsedSpans : builds
    Parsed --* ParsedSpans : uses
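
The builder shape in the class diagram can be sketched in a few lines. This is a reduced illustration, not the PR's code: it assumes `Span` is `Range<usize>`, keeps only two of the seven span categories, and uses a hypothetical helper `is_sorted_non_overlapping` for the debug-only check.

```rust
use std::ops::Range;

/// Assumption: a span is a half-open byte range into the source text.
type Span = Range<usize>;

/// Reduced stand-in for `ParsedSpans` with two of the seven categories.
#[derive(Debug, Default, PartialEq)]
pub struct ParsedSpans {
    imports: Vec<Span>,
    rules: Vec<Span>,
}

impl ParsedSpans {
    /// Start building a validated set of span lists.
    #[must_use]
    pub fn builder() -> ParsedSpansBuilder {
        ParsedSpansBuilder::default()
    }

    #[must_use]
    pub fn imports(&self) -> &[Span] {
        &self.imports
    }

    #[must_use]
    pub fn rules(&self) -> &[Span] {
        &self.rules
    }
}

#[derive(Debug, Default)]
pub struct ParsedSpansBuilder {
    imports: Vec<Span>,
    rules: Vec<Span>,
}

impl ParsedSpansBuilder {
    /// Fluent setter: takes the builder by value and hands it back.
    #[must_use]
    pub fn imports(mut self, spans: Vec<Span>) -> Self {
        self.imports = spans;
        self
    }

    #[must_use]
    pub fn rules(mut self, spans: Vec<Span>) -> Self {
        self.rules = spans;
        self
    }

    /// Consume the builder; ordering is checked only in debug builds.
    #[must_use]
    pub fn build(self) -> ParsedSpans {
        debug_assert!(is_sorted_non_overlapping(&self.imports), "imports unsorted");
        debug_assert!(is_sorted_non_overlapping(&self.rules), "rules unsorted");
        ParsedSpans { imports: self.imports, rules: self.rules }
    }
}

/// Hypothetical helper: true when each span ends at or before the next one starts.
fn is_sorted_non_overlapping(spans: &[Span]) -> bool {
    spans.windows(2).all(|w| w[0].end <= w[1].start)
}
```

Because each setter consumes and returns the builder, a call chain such as `ParsedSpans::builder().imports(..).rules(..).build()` reads the same way as the sequence diagram, and the validation runs once at `build()`.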

Class diagram for CST builder module structure

classDiagram
    class cst_builder {
        +Parsed
        +ParsedSpans
        +build_green_tree()
    }
    class spans {
        +ParsedSpans
        +ParsedSpansBuilder
    }
    class tree {
        +build_green_tree()
    }
    cst_builder o-- spans
    cst_builder o-- tree
    cst_builder ..> ParsedSpans : re-exports
    cst_builder ..> build_green_tree : re-exports

File-Level Changes

Change: Extract CST construction logic into a dedicated cst_builder module
Details:
  • Moved Parsed and ParsedSpans definitions and methods from parser/mod.rs into cst_builder/spans.rs and cst_builder/mod.rs
  • Relocated the build_green_tree implementation into cst_builder/tree.rs
  • Updated parser/mod.rs to re-export Parsed and ParsedSpans and import build_green_tree
Files:
  • src/parser/mod.rs
  • src/parser/cst_builder/mod.rs
  • src/parser/cst_builder/spans.rs
  • src/parser/cst_builder/tree.rs

Change: Introduce ParsedSpansBuilder and builder-based API for span lists
Details:
  • Added ParsedSpansBuilder in spans.rs with fluent setter methods and debug assertions
  • Replaced ParsedSpans::new constructor calls with ParsedSpans::builder().… .build()
  • Updated span_scanner to use the new builder methods instead of direct construction
Files:
  • src/parser/cst_builder/spans.rs
  • src/parser/span_scanner.rs

Change: Unify span iteration in tree building via SpanCursor abstractions
Details:
  • Created SpanCursor and SpanCursors types in tree.rs to manage span iterators
  • Removed legacy helper functions (advance_span_iter, maybe_start, finish_nodes, etc.)
  • Updated build_green_tree to leverage SpanCursors for starting and finishing nodes
Files:
  • src/parser/cst_builder/tree.rs

Change: Simplify top-level parse to delegate to cst_builder and re-export types
Details:
  • Eliminated inline CST construction code from parser/mod.rs
  • Modified parse() to call cst_builder::build_green_tree and wrap results with Parsed::new
  • Cleaned up imports in parser/mod.rs and exposed cst_builder types
Files:
  • src/parser/mod.rs

Change: Add and relocate unit tests for span validation and CST building
Details:
  • Removed outdated span-validation tests from parser/mod.rs and introduced new tests in cst_builder/spans.rs
  • Added a round-trip CST construction test in cst_builder/tree.rs
  • Verified builder panics and error reporting in debug builds
Files:
  • src/parser/mod.rs
  • src/parser/cst_builder/spans.rs
  • src/parser/cst_builder/tree.rs


Contributor

coderabbitai Bot commented Jul 18, 2025

Summary by CodeRabbit

  • Refactor
    • Restructured the parser module by moving concrete syntax tree (CST) construction and span management into a dedicated submodule, resulting in a cleaner and more maintainable codebase.
  • New Features
    • Introduced builder-pattern APIs for managing parsed span categories, with improved validation for span ordering and overlap (active in debug mode).
  • Bug Fixes
    • Enhanced debug-time validation to catch and report overlapping or unsorted statement spans.
  • Tests
    • Added tests to ensure correct span validation and CST construction behaviour.

Walkthrough

Refactor the parser module by extracting all concrete syntax tree (CST) construction and parsed span management logic into a new cst_builder submodule. Replace inline struct definitions and methods with re-exports from this submodule. Update the parser to use builder patterns and new validation for span storage.

Changes

| File(s) | Change Summary |
| --- | --- |
| src/parser/mod.rs | Remove direct CST and span logic; re-export Parsed, ParsedSpans, and build_green_tree from cst_builder. Update parse to use new APIs. Simplify imports. |
| src/parser/span_scanner.rs | Switch from direct ParsedSpans::new construction to builder pattern for creating ParsedSpans. |
| src/parser/cst_builder/mod.rs | Add new module defining Parsed, its constructor, and accessors. Re-export ParsedSpans and build_green_tree. |
| src/parser/cst_builder/spans.rs | Introduce ParsedSpans struct with builder pattern, validation logic, and accessor methods. Add unit tests for span validation. |
| src/parser/cst_builder/tree.rs | Add green tree construction logic using span cursors and token processing. Include tests for tree building. |

Sequence Diagram(s)

sequenceDiagram
    participant Parser
    participant SpanScanner
    participant CSTBuilder

    Parser->>SpanScanner: parse_tokens(tokens)
    SpanScanner-->>Parser: ParsedSpans (via builder)
    Parser->>CSTBuilder: build_green_tree(tokens, src, ParsedSpans)
    CSTBuilder-->>Parser: GreenNode
    Parser->>CSTBuilder: Parsed::new(GreenNode, AST root, errors)
    CSTBuilder-->>Parser: Parsed

Possibly related PRs

  • Refactor parser span handling #48: Initiated span grouping and ParsedSpans struct, which this PR modularises further by extracting CST building and span management into cst_builder.
  • Encapsulate ParsedSpans #64: Encapsulated ParsedSpans with private fields, accessors, and sortedness validation; directly related to the builder pattern and validation logic introduced here.
  • Fix build_green_tree token ownership #63: Modified the build_green_tree function signature to accept token slices, aligning with the refactor of CST construction in this PR.

Poem

In the parser’s heart, a shuffle anew,
CSTs and spans in a submodule grew.
Builders now shape, with validation keen,
Ensuring all spans are sorted and clean.
The code walks lighter, the logic more neat—
A refactor complete, and a parser replete!
🦀🌳


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 098e012 and c25ab5b.

📒 Files selected for processing (1)
  • src/parser/cst_builder/tree.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🔇 Additional comments (6)
src/parser/cst_builder/tree.rs (6)

1-8: Module documentation follows coding guidelines correctly.

The module begins with the required //! doc comment explaining its purpose. The imports are appropriately organised and necessary for the module's functionality.


10-11: Named constant improves maintainability.

Good use of a named constant instead of a magic number, addressing the previous review feedback appropriately.


46-77: SpanCursors coordination logic is correct.

The SpanCursors struct properly aggregates the individual cursors and provides coordinated methods for advancing and managing node lifecycle. The array size uses the named constant appropriately.


117-136: Token handling logic is well-structured.

The push_token function appropriately handles out-of-bounds spans with proper logging and delegates error token wrapping to a separate function, addressing previous review feedback about extracting error token logic.


138-142: Error token wrapping function improves readability.

The extracted push_error_wrapped function cleanly handles the special case of error tokens, improving the overall readability of the token handling logic.


144-160: Test coverage validates core functionality.

The round-trip test appropriately validates that CST construction preserves the original source text. The test follows good practices by using real tokenisation and span parsing rather than mocked data.
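
The round-trip property itself is easy to state in isolation: if the token spans cover the source without gaps, concatenating the text behind each span must reproduce the input. The sketch below illustrates that property only. `toy_tokenize` and `reassemble` are hypothetical names, the tokenizer is a toy that splits on whitespace boundaries, and `Span` is assumed to be `Range<usize>`; the PR's actual test goes through its own tokenizer and the green tree.

```rust
use std::ops::Range;

/// Assumption: a span is a half-open byte range into the source text.
type Span = Range<usize>;

/// Toy lossless tokenizer (hypothetical): alternating runs of whitespace
/// and non-whitespace, so every byte of the input belongs to one token.
fn toy_tokenize(src: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut start = 0;
    let mut prev_ws: Option<bool> = None;
    for (idx, ch) in src.char_indices() {
        let ws = ch.is_whitespace();
        // A change of character class closes the current run.
        if matches!(prev_ws, Some(p) if p != ws) {
            spans.push(start..idx);
            start = idx;
        }
        prev_ws = Some(ws);
    }
    if start < src.len() {
        spans.push(start..src.len());
    }
    spans
}

/// Rebuild the text by concatenating the slice behind every token span.
fn reassemble(src: &str, tokens: &[Span]) -> String {
    tokens.iter().map(|span| &src[span.clone()]).collect()
}
```

A test built on this idea asserts `reassemble(src, &toy_tokenize(src)) == src`; any token that is dropped or given empty text shows up as a mismatch.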



@codescene-delta-analysis codescene-delta-analysis Bot left a comment


Code Health Improved (1 files improve in Code Health)

Gates Failed
Enforce advisory code health rules (1 file with Code Duplication, Excess Number of Function Arguments)

Gates Passed
5 Quality Gates Passed

See analysis details in CodeScene

Reason for failure
| Enforce advisory code health rules | Violations | Code Health Impact |
| --- | --- | --- |
| cst_builder.rs | 3 advisory rules | 10.00 → 9.10 |

View Improvements

| File | Code Health Impact | Categories Improved |
| --- | --- | --- |
| mod.rs | 5.59 → 7.30 | Low Cohesion, Code Duplication, Excess Number of Function Arguments, Code Duplication |

Quality Gate Profile: Pay Down Tech Debt

Comment thread src/parser/cst_builder.rs Outdated
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey @leynos - I've reviewed your changes - here's some feedback:

  • Consider replacing the cfg!(debug_assertions) + ensure_span_lists_sorted/assert pattern with a single debug_assert! in ParsedSpans::new to simplify the debug‐only span validation.
  • The start_nodes and finish_nodes helpers share very similar iteration logic—consider unifying them or extracting a small iterator struct to reduce duplication in build_green_tree.
  • The new cst_builder module is quite large; splitting span validation, token pushing, and CST construction into smaller submodules or files could improve readability and maintainability.
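
For the first point, the two forms behave the same: the standard library's `debug_assert!` runs its `assert!` only when `cfg!(debug_assertions)` is true. A minimal sketch of the comparison follows, assuming `Span` is `Range<usize>`; the helper names `is_sorted`, `validate_with_cfg`, and `validate_with_debug_assert` are hypothetical.

```rust
use std::ops::Range;

/// Assumption: a span is a half-open byte range into the source text.
type Span = Range<usize>;

/// Hypothetical check: each span ends at or before the start of the next.
fn is_sorted(spans: &[Span]) -> bool {
    spans.windows(2).all(|w| w[0].end <= w[1].start)
}

/// The pattern the review describes: an explicit `cfg!` guard around `assert!`.
fn validate_with_cfg(spans: &[Span]) {
    if cfg!(debug_assertions) {
        assert!(is_sorted(spans), "spans must be sorted and non-overlapping");
    }
}

/// The suggested replacement: one `debug_assert!` with the same behaviour.
fn validate_with_debug_assert(spans: &[Span]) {
    debug_assert!(is_sorted(spans), "spans must be sorted and non-overlapping");
}
```

In a release build both functions skip the check, so the choice is about brevity rather than behaviour.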
## Individual Comments

### Comment 1
<location> `src/parser/cst_builder.rs:314` </location>
<code_context>
+    assert!(errors.is_empty(), "{}", errors.join("\n"));
+}
+
+fn push_token(builder: &mut GreenNodeBuilder, kind: SyntaxKind, span: &Span, src: &str) {
+    let text = src.get(span.clone()).map_or_else(
+        || {
</code_context>

<issue_to_address>
The `push_token` function clones the span for `str::get`, which may be unnecessary if `Span` is already `Copy`.

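If `Span` implements `Copy`, pass it by value instead of cloning. Cloning a range-style span copies its two offsets and does not allocate, so the benefit is a simpler signature rather than a saved allocation. If not, consider whether `Span` should be `Copy`.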

Suggested implementation:

```rust
fn push_token(builder: &mut GreenNodeBuilder, kind: SyntaxKind, span: Span, src: &str) {
    let text = src.get(span).map_or_else(
        || {
            warn!(
                "token span {:?} out of bounds for source of length {}",
                span,
                src.len()
            );
            ""
        },
        |t| t,
    );

```

- Ensure that `Span` implements `Copy`. If it does not, derive or implement `Copy` and `Clone` for `Span` if appropriate.
- Update all call sites of `push_token` in your codebase to pass `span` by value instead of by reference.
</issue_to_address>
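
Whether the by-value signature compiles depends on what `Span` is. The sketch below assumes `Span` is `std::ops::Range<usize>`, which the `span.start` and `span.end` accesses and the `str::get` call are consistent with but which this thread does not state. Under that assumption `Span` is `Clone` but not `Copy`, and `Copy` cannot be added to a standard-library type from outside, so a span that is used again after `str::get` still has to be cloned. `token_text` is a hypothetical helper, and `eprintln!` stands in for the `warn!` call to keep the example dependency-free.

```rust
use std::ops::Range;

/// Assumption: `Span` is `Range<usize>`, which is `Clone` but not `Copy`.
type Span = Range<usize>;

/// Return the text covered by `span`, or an empty string when the span is
/// out of bounds or does not fall on UTF-8 character boundaries.
fn token_text<'a>(src: &'a str, span: &Span) -> &'a str {
    // `str::get` takes its index by value, so a borrowed span is cloned.
    // The clone copies two `usize` offsets; nothing is heap-allocated.
    src.get(span.clone()).unwrap_or_else(|| {
        eprintln!(
            "token span {:?} out of bounds for source of length {}",
            span,
            src.len()
        );
        ""
    })
}
```

With this shape the caller keeps ownership of its spans, and the out-of-bounds case degrades to an empty token instead of a panic.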

### Comment 2
<location> `src/parser/cst_builder.rs:163` </location>
<code_context>
+/// Span lists must be sorted and non-overlapping so that tokens are wrapped
+/// into well-formed nodes. Validation occurs in debug builds when
+/// [`ParsedSpans`] is created.
+pub(super) fn build_green_tree(
+    tokens: &[(SyntaxKind, Span)],
+    src: &str,
</code_context>

<issue_to_address>
Consider refactoring the span iterator logic into a single SpanCursor type and two compact loops to eliminate duplicated code and simplify build_green_tree.

Here’s one way to collapse all seven iterators into a single, homogeneous “cursor” type and turn your three almost-identical loops into just two small `for` loops.  Once you have that, you can delete `advance_span_iter`, `maybe_start`, `maybe_finish`, `start_nodes`, and `finish_nodes` entirely.

```rust
use std::iter::Peekable;
use std::slice::Iter;

// 1) A single SpanCursor that knows its SyntaxKind
struct SpanCursor<'a> {
    iter: Peekable<Iter<'a, Span>>,
    kind: SyntaxKind,
}

impl<'a> SpanCursor<'a> {
    fn new(spans: &'a [Span], kind: SyntaxKind) -> Self {
        Self {
            iter: spans.iter().peekable(),
            kind,
        }
    }

    // advance past any spans that end <= pos
    fn advance_to(&mut self, pos: usize) {
        while matches!(self.iter.peek(), Some(s) if pos >= s.end) {
            self.iter.next();
        }
    }

    // if the next span starts exactly at pos, open a node
    fn start_if(&mut self, builder: &mut GreenNodeBuilder, pos: usize) {
        if matches!(self.iter.peek(), Some(s) if pos == s.start) {
            builder.start_node(DdlogLanguage::kind_to_raw(self.kind));
        }
    }

    // if the next span ends at or before pos, close the node
    fn finish_if(&mut self, builder: &mut GreenNodeBuilder, pos: usize) {
        if matches!(self.iter.peek(), Some(s) if pos >= s.end) {
            builder.finish_node();
            self.iter.next();
        }
    }
}
```

Then your `build_green_tree` collapses down to:

```rust
pub(super) fn build_green_tree(
    tokens: &[(SyntaxKind, Span)],
    src: &str,
    spans: &ParsedSpans,
) -> GreenNode {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));

    // 2) build an array of cursors instead of seven separate iters
    let mut cursors = [
        SpanCursor::new(spans.imports(),    SyntaxKind::N_IMPORT_STMT),
        SpanCursor::new(spans.typedefs(),   SyntaxKind::N_TYPE_DEF),
        SpanCursor::new(spans.relations(),  SyntaxKind::N_RELATION_DECL),
        SpanCursor::new(spans.indexes(),    SyntaxKind::N_INDEX),
        SpanCursor::new(spans.functions(),  SyntaxKind::N_FUNCTION),
        SpanCursor::new(spans.transformers(), SyntaxKind::N_TRANSFORMER),
        SpanCursor::new(spans.rules(),      SyntaxKind::N_RULE),
    ];

    for &(kind, ref span) in tokens {
        // advance & possibly start all spanning nodes
        for cur in &mut cursors {
            cur.advance_to(span.start);
            cur.start_if(&mut builder, span.start);
        }

        push_token(&mut builder, kind, span, src);

        // possibly finish all spanning nodes
        for cur in &mut cursors {
            cur.finish_if(&mut builder, span.end);
        }
    }

    builder.finish_node();
    builder.finish()
}
```

This:

- Eliminates all the near-duplication in advance/start/finish helpers
- Reduces seven mutable locals down to one compact array
- Keeps exactly the same order & semantics but with a single inner loop per phase
- Deletes the now-unused `start_nodes`/`finish_nodes`/`advance_span_iter`/etc.
</issue_to_address>
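
The cursor idea from Comment 2 can be tried without the tree-building library. The following is a sketch under stated assumptions, not the code that was merged: `Span` is taken to be `Range<usize>`, a hypothetical `Event` enum replaces the `GreenNodeBuilder` calls, node kinds are plain strings, and only two span categories are modelled.

```rust
use std::iter::Peekable;
use std::ops::Range;
use std::slice::Iter;

/// Assumption: a span is a half-open byte range into the source text.
type Span = Range<usize>;

/// Hypothetical stand-in for the builder calls the real code would make.
#[derive(Debug, PartialEq)]
enum Event {
    Start(&'static str),
    Token(String),
    Finish,
}

/// One cursor per span category, remembering which node kind it opens.
struct SpanCursor<'a> {
    iter: Peekable<Iter<'a, Span>>,
    kind: &'static str,
}

impl<'a> SpanCursor<'a> {
    fn new(spans: &'a [Span], kind: &'static str) -> Self {
        Self { iter: spans.iter().peekable(), kind }
    }

    /// Skip spans that end at or before `pos`.
    fn advance_to(&mut self, pos: usize) {
        while matches!(self.iter.peek(), Some(s) if pos >= s.end) {
            self.iter.next();
        }
    }

    /// Open a node when the next span starts exactly at `pos`.
    fn start_if(&mut self, out: &mut Vec<Event>, pos: usize) {
        if matches!(self.iter.peek(), Some(s) if pos == s.start) {
            out.push(Event::Start(self.kind));
        }
    }

    /// Close the node when the next span ends at or before `pos`.
    fn finish_if(&mut self, out: &mut Vec<Event>, pos: usize) {
        if matches!(self.iter.peek(), Some(s) if pos >= s.end) {
            out.push(Event::Finish);
            self.iter.next();
        }
    }
}

/// Walk the tokens once, wrapping them in start/finish events per span.
fn build_events(tokens: &[Span], src: &str, imports: &[Span], rules: &[Span]) -> Vec<Event> {
    let mut out = Vec::new();
    let mut cursors = [SpanCursor::new(imports, "import"), SpanCursor::new(rules, "rule")];
    for span in tokens {
        for cur in &mut cursors {
            cur.advance_to(span.start);
            cur.start_if(&mut out, span.start);
        }
        out.push(Event::Token(src[span.clone()].to_string()));
        for cur in &mut cursors {
            cur.finish_if(&mut out, span.end);
        }
    }
    out
}
```

One caveat the sketch makes visible: `finish_if` fires whenever a span's end is reached, even if `start_if` never matched, so the approach relies on statement spans beginning and ending on token boundaries.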


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21747b9 and 3222f50.

📒 Files selected for processing (2)
  • src/parser/cst_builder.rs (1 hunks)
  • src/parser/mod.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md

⚙️ CodeRabbit Configuration File

🧬 Code Graph Analysis (2)
src/parser/mod.rs (4)
src/tokenizer.rs (1)
  • tokenize (201-257)
src/parser/cst_builder.rs (6)
  • build_green_tree (163-221)
  • new (22-32)
  • new (83-113)
  • green (36-38)
  • root (42-44)
  • errors (48-50)
src/parser/token_stream.rs (1)
  • new (41-47)
src/parser/span_collector.rs (1)
  • new (44-50)
src/parser/cst_builder.rs (4)
src/parser/mod.rs (9)
  • imports (148-154)
  • relations (186-192)
  • indexes (205-211)
  • functions (224-230)
  • transformers (243-249)
  • rules (262-268)
  • kind (121-123)
  • text (133-135)
  • from_green (113-117)
src/language.rs (1)
  • kind_to_raw (185-188)
src/tokenizer.rs (1)
  • tokenize (201-257)
src/parser/span_scanner.rs (1)
  • parse_tokens (19-50)
🔇 Additional comments (11)
src/parser/mod.rs (4)

9-9: Clean import list after refactoring.

The simplified import list appropriately reflects the module's reduced responsibilities after extracting CST construction logic.


20-22: Well-structured module organisation.

The CST construction logic is properly extracted into a dedicated submodule, with clear separation of concerns. The public re-exports maintain the existing API surface.


47-47: Appropriate use of constructor pattern.

Using Parsed::new constructor improves encapsulation compared to direct struct literal construction.


1022-1022: Test imports correctly updated.

The test imports properly reflect the module's dependencies after the refactoring.

src/parser/cst_builder.rs (7)

1-5: Module documentation meets requirements.

The module-level documentation clearly explains the module's purpose and utility as required.


13-51: Well-designed Parsed struct with proper encapsulation.

The struct provides a clean API with appropriate constructor and accessor methods. Good use of #[must_use] attributes.
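
As a rough illustration of the encapsulation being praised, the sketch below keeps the fields private and exposes a constructor plus `#[must_use]` accessors. The three tuple structs are placeholders invented for this example so that it compiles on its own; the PR uses its real `GreenNode`, `ast::Root`, and error types, and its accessor return types may differ.

```rust
/// Placeholder for the `GreenNode` type used in the PR (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct GreenNode(pub String);

/// Placeholder for `ast::Root` (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Root(pub String);

/// Placeholder for the parser's error type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct ParseError(pub String);

/// Result of a parse; fields are private so callers go through the API.
#[derive(Debug)]
pub struct Parsed {
    green: GreenNode,
    root: Root,
    errors: Vec<ParseError>,
}

impl Parsed {
    /// Single construction point, so any future invariant lives in one place.
    #[must_use]
    pub fn new(green: GreenNode, root: Root, errors: Vec<ParseError>) -> Self {
        Self { green, root, errors }
    }

    #[must_use]
    pub fn green(&self) -> &GreenNode {
        &self.green
    }

    #[must_use]
    pub fn root(&self) -> &Root {
        &self.root
    }

    #[must_use]
    pub fn errors(&self) -> &[ParseError] {
        &self.errors
    }
}
```

`#[must_use]` makes the compiler warn when a caller discards the result of an accessor or of `new`, which catches calls made only for a side effect that does not exist.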


223-271: Well-structured helper functions for span management.

The helper functions provide clear separation of concerns and handle edge cases appropriately.


273-312: Robust span validation with comprehensive error reporting.

The validation logic properly checks for overlaps and sorting issues, collecting all errors before reporting. Good implementation of the custom error type.


314-335: Robust token insertion with proper error handling.

Good defensive programming with bounds checking and appropriate warning for invalid spans.


337-442: Comprehensive test coverage with good edge case handling.

The tests thoroughly cover validation logic, error reporting, and round-trip CST construction. Good use of rstest for the round-trip test.


76-113: Remove thread-safety concern

Panic in ensure_span_lists_sorted occurs only on the calling thread. All inspected calls to ParsedSpans::new (in src/parser/span_scanner.rs and src/parser/cst_builder.rs) execute in a single-threaded parsing context.

Comment thread src/parser/cst_builder.rs Outdated
Comment on lines +163 to +221
pub(super) fn build_green_tree(
    tokens: &[(SyntaxKind, Span)],
    src: &str,
    spans: &ParsedSpans,
) -> GreenNode {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));

    let mut import_iter = spans.imports().iter().peekable();
    let mut typedef_iter = spans.typedefs().iter().peekable();
    let mut relation_iter = spans.relations().iter().peekable();
    let mut index_iter = spans.indexes().iter().peekable();
    let mut function_iter = spans.functions().iter().peekable();
    let mut transformer_iter = spans.transformers().iter().peekable();
    let mut rule_iter = spans.rules().iter().peekable();

    for &(kind, ref span) in tokens {
        advance_span_iter(&mut import_iter, span.start);
        advance_span_iter(&mut typedef_iter, span.start);
        advance_span_iter(&mut relation_iter, span.start);
        advance_span_iter(&mut index_iter, span.start);
        advance_span_iter(&mut function_iter, span.start);
        advance_span_iter(&mut transformer_iter, span.start);
        advance_span_iter(&mut rule_iter, span.start);

        start_nodes(
            &mut builder,
            &mut [
                (&mut import_iter, SyntaxKind::N_IMPORT_STMT),
                (&mut typedef_iter, SyntaxKind::N_TYPE_DEF),
                (&mut relation_iter, SyntaxKind::N_RELATION_DECL),
                (&mut index_iter, SyntaxKind::N_INDEX),
                (&mut function_iter, SyntaxKind::N_FUNCTION),
                (&mut transformer_iter, SyntaxKind::N_TRANSFORMER),
                (&mut rule_iter, SyntaxKind::N_RULE),
            ],
            span.start,
        );

        push_token(&mut builder, kind, span, src);

        finish_nodes(
            &mut builder,
            &mut [
                &mut import_iter,
                &mut typedef_iter,
                &mut relation_iter,
                &mut index_iter,
                &mut function_iter,
                &mut transformer_iter,
                &mut rule_iter,
            ],
            span.end,
        );
    }

    builder.finish_node();
    builder.finish()
}
Contributor

🧹 Nitpick (assertive)

Extract span iterator management to reduce complexity.

The function manages 7 different span iterators with repeated patterns. Extract the iterator creation and advancement into helper methods to improve maintainability.

Consider creating a struct to encapsulate all span iterators:

+struct SpanIterators<'a> {
+    import: SpanIter<'a>,
+    typedef: SpanIter<'a>,
+    relation: SpanIter<'a>,
+    index: SpanIter<'a>,
+    function: SpanIter<'a>,
+    transformer: SpanIter<'a>,
+    rule: SpanIter<'a>,
+}
+
+impl<'a> SpanIterators<'a> {
+    fn new(spans: &'a ParsedSpans) -> Self {
+        Self {
+            import: spans.imports().iter().peekable(),
+            typedef: spans.typedefs().iter().peekable(),
+            relation: spans.relations().iter().peekable(),
+            index: spans.indexes().iter().peekable(),
+            function: spans.functions().iter().peekable(),
+            transformer: spans.transformers().iter().peekable(),
+            rule: spans.rules().iter().peekable(),
+        }
+    }
+
+    fn advance_all(&mut self, pos: usize) {
+        advance_span_iter(&mut self.import, pos);
+        advance_span_iter(&mut self.typedef, pos);
+        advance_span_iter(&mut self.relation, pos);
+        advance_span_iter(&mut self.index, pos);
+        advance_span_iter(&mut self.function, pos);
+        advance_span_iter(&mut self.transformer, pos);
+        advance_span_iter(&mut self.rule, pos);
+    }
+}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/parser/cst_builder.rs between lines 163 and 221, the build_green_tree
function manages seven different span iterators with repetitive creation and
advancement code. To reduce complexity and improve maintainability, create a
helper struct that encapsulates all these span iterators and provides methods to
advance them collectively. Refactor the function to initialize this struct once
and use its methods to advance and manage the iterators instead of handling each
iterator separately.
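
The suggestion above calls `advance_span_iter` without showing it. As a rough sketch only, assuming `Span` is a `std::ops::Range<usize>` and that the helper's job is to drop spans that end at or before the current position (both are assumptions, not confirmed by this thread), it could look like this:

```rust
use std::iter::Peekable;
use std::ops::Range;
use std::slice::Iter;

// Assumption: spans are byte ranges; the crate's real `Span` alias may differ.
type Span = Range<usize>;
type SpanIter<'a> = Peekable<Iter<'a, Span>>;

/// Hypothetical helper: skip every span that ends at or before `pos`,
/// leaving the iterator parked on the first span that can still contain `pos`.
fn advance_span_iter(iter: &mut SpanIter<'_>, pos: usize) {
    // `next_if` only consumes the peeked span when the predicate holds.
    while iter.next_if(|s| s.end <= pos).is_some() {}
}
```

With this shape, `advance_all` in the proposed `SpanIterators` struct is just seven calls to the same function, which is why grouping the iterators removes the repetition from `build_green_tree`.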

Comment thread src/parser/cst_builder.rs Outdated
Comment thread src/parser/cst_builder.rs Outdated
Comment thread src/parser/cst_builder.rs Outdated
@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Code Health Improved (1 files improve in Code Health)

Gates Failed
Enforce advisory code health rules (1 file with Excess Number of Function Arguments)

Gates Passed
5 Quality Gates Passed

See analysis details in CodeScene

Reason for failure
Enforce advisory code health rules Violations Code Health Impact
cst_builder.rs 1 advisory rule 10.00 → 9.69 Suppress
View Improvements
File Code Health Impact Categories Improved
mod.rs 5.59 → 7.30 Low Cohesion, Code Duplication, Excess Number of Function Arguments, Code Duplication

Quality Gate Profile: Pay Down Tech Debt
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

Comment thread src/parser/cst_builder.rs Outdated
Comment on lines +169 to +199
fn new(
    imports: Vec<Span>,
    typedefs: Vec<Span>,
    relations: Vec<Span>,
    indexes: Vec<Span>,
    functions: Vec<Span>,
    transformers: Vec<Span>,
    rules: Vec<Span>,
) -> Self {
    if cfg!(debug_assertions) {
        ensure_span_lists_sorted(&[
            ("imports", &imports),
            ("typedefs", &typedefs),
            ("relations", &relations),
            ("indexes", &indexes),
            ("functions", &functions),
            ("transformers", &transformers),
            ("rules", &rules),
        ]);
    }

    Self {
        imports,
        typedefs,
        relations,
        indexes,
        functions,
        transformers,
        rules,
    }
}
Owner Author

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix.

Contributor

@leynos, the issue is valid - 7 arguments is excessive for any function. Here's the fix:

Suggested Fix:
Refactor ParsedSpans::new() to accept the ParsedSpansBuilder instead of individual parameters, reducing arguments from 7 to 1.

AI Coding Agent Prompt:

Refactor the ParsedSpans::new() function in src/parser/cst_builder.rs to reduce the number of parameters from 7 to 1.

Current issue: ParsedSpans::new() takes 7 individual Vec<Span> parameters which exceeds the 4-parameter threshold.

Required changes:
1. Modify ParsedSpans::new() to accept a single ParsedSpansBuilder parameter instead of 7 individual Vec<Span> parameters
2. Update the function signature from:
   fn new(imports: Vec<Span>, typedefs: Vec<Span>, relations: Vec<Span>, indexes: Vec<Span>, functions: Vec<Span>, transformers: Vec<Span>, rules: Vec<Span>) -> Self
   to:
   fn new(builder: ParsedSpansBuilder) -> Self
3. Update the function body to extract the individual vectors from the builder parameter
4. Update the call site in ParsedSpansBuilder::build() to pass self instead of individual fields
5. Ensure the span validation logic remains unchanged
6. Maintain all existing functionality and public API

The builder pattern is already the public interface, so this change only affects internal implementation without breaking existing code.

This fix reduces complexity while maintaining the existing public API and all validation functionality.
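
A minimal sketch of the shape this refactor could take, trimmed to two of the seven span lists for brevity. It assumes `Span` is a `std::ops::Range<usize>`, and `spans_in_order` is a hypothetical stand-in for the existing `ensure_span_lists_sorted` check, so treat it as an illustration rather than the actual implementation:

```rust
use std::ops::Range;

// Assumption: spans are byte ranges; the crate's real `Span` alias may differ.
type Span = Range<usize>;

/// Hypothetical check: each span starts at or after the end of the previous one.
fn spans_in_order(spans: &[Span]) -> bool {
    spans.windows(2).all(|w| w[0].end <= w[1].start)
}

#[derive(Default)]
struct ParsedSpansBuilder {
    imports: Vec<Span>,
    rules: Vec<Span>,
}

struct ParsedSpans {
    imports: Vec<Span>,
    rules: Vec<Span>,
}

impl ParsedSpansBuilder {
    #[must_use]
    fn imports(mut self, spans: Vec<Span>) -> Self {
        self.imports = spans;
        self
    }

    #[must_use]
    fn rules(mut self, spans: Vec<Span>) -> Self {
        self.rules = spans;
        self
    }

    /// The builder hands itself over, so `new` needs one parameter.
    #[must_use]
    fn build(self) -> ParsedSpans {
        ParsedSpans::new(self)
    }
}

impl ParsedSpans {
    fn new(builder: ParsedSpansBuilder) -> Self {
        // Destructure once; validation then works on the owned vectors.
        let ParsedSpansBuilder { imports, rules } = builder;
        debug_assert!(spans_in_order(&imports), "imports not sorted");
        debug_assert!(spans_in_order(&rules), "rules not sorted");
        Self { imports, rules }
    }
}
```

Callers keep using the fluent builder, for example `ParsedSpansBuilder::default().imports(v).build()`, so only the internal constructor signature changes.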

Comment thread src/parser/cst_builder.rs Outdated
Comment on lines +169 to +199
fn new(
    imports: Vec<Span>,
    typedefs: Vec<Span>,
    relations: Vec<Span>,
    indexes: Vec<Span>,
    functions: Vec<Span>,
    transformers: Vec<Span>,
    rules: Vec<Span>,
) -> Self {
    if cfg!(debug_assertions) {
        ensure_span_lists_sorted(&[
            ("imports", &imports),
            ("typedefs", &typedefs),
            ("relations", &relations),
            ("indexes", &indexes),
            ("functions", &functions),
            ("transformers", &transformers),
            ("rules", &rules),
        ]);
    }

    Self {
        imports,
        typedefs,
        relations,
        indexes,
        functions,
        transformers,
        rules,
    }
}
Owner Author

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix.

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Code Health Improved (1 files improve in Code Health)

Gates Failed
Enforce advisory code health rules (1 file with Excess Number of Function Arguments)

Gates Passed
5 Quality Gates Passed

See analysis details in CodeScene

Reason for failure
Enforce advisory code health rules Violations Code Health Impact
spans.rs 1 advisory rule 10.00 → 9.69 Suppress
View Improvements
File Code Health Impact Categories Improved
mod.rs 5.59 → 7.30 Low Cohesion, Code Duplication, Excess Number of Function Arguments, Code Duplication

Quality Gate Profile: Pay Down Tech Debt
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

Comment thread src/parser/cst_builder/spans.rs Outdated
Comment thread src/parser/cst_builder/spans.rs Outdated
@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Code Health Improved (1 files improve in Code Health)

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

View Improvements
File Code Health Impact Categories Improved
mod.rs 5.59 → 7.30 Low Cohesion, Code Duplication, Excess Number of Function Arguments, Code Duplication

Quality Gate Profile: Pay Down Tech Debt
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/parser/cst_builder/spans.rs (1)

104-135: Eliminate function with excessive arguments.

The ParsedSpans::new function accepts 7 parameters via the builder destructuring, exceeding the threshold of 4. Move the validation logic directly into ParsedSpansBuilder::build and remove this function entirely.

 impl ParsedSpansBuilder {
     /// Build the [`ParsedSpans`].
     #[must_use]
     pub fn build(self) -> ParsedSpans {
-        ParsedSpans::new(self)
+        let result = validate_span_lists_sorted(&[
+            ("imports", &self.imports),
+            ("typedefs", &self.typedefs),
+            ("relations", &self.relations),
+            ("indexes", &self.indexes),
+            ("functions", &self.functions),
+            ("transformers", &self.transformers),
+            ("rules", &self.rules),
+        ]);
+        debug_assert!(result.is_ok(), "{}", result.err().unwrap_or_default());
+
+        ParsedSpans {
+            imports: self.imports,
+            typedefs: self.typedefs,
+            relations: self.relations,
+            indexes: self.indexes,
+            functions: self.functions,
+            transformers: self.transformers,
+            rules: self.rules,
+        }
     }
 }

-impl ParsedSpans {
-    pub(super) fn new(builder: ParsedSpansBuilder) -> Self {
-        // Remove this entire function
-    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4307277 and b80cc3b.

📒 Files selected for processing (2)
  • src/parser/cst_builder/spans.rs (1 hunks)
  • src/parser/cst_builder/tree.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

Instructions used from:

Sources:
📄 CodeRabbit Inference Engine

  • AGENTS.md

⚙️ CodeRabbit Configuration File

🔇 Additional comments (9)
src/parser/cst_builder/tree.rs (3)

1-8: Module documentation and imports look good.

The module doc comment clearly explains the purpose, and imports are well-organised.


10-41: SpanCursor implementation is well-structured.

The struct encapsulates span iteration logic cleanly with appropriate methods for advancing and managing syntax node boundaries.


122-138: Test coverage is adequate.

The round-trip test effectively validates the core functionality.

src/parser/cst_builder/spans.rs (6)

1-3: Module documentation is clear and concise.

The purpose is well-explained in the opening comment.


5-26: Struct design follows best practices.

The use of #[non_exhaustive] and private fields with builder pattern is appropriate for API evolution.


28-95: Builder pattern implementation is well-executed.

All methods are properly marked with #[must_use] and follow the fluent interface pattern correctly.


137-178: Accessor methods are properly implemented.

All getters return slices correctly and are marked with #[must_use].


180-223: Validation logic is comprehensive and well-structured.

The error types implement appropriate traits, and the validation functions handle edge cases correctly.


225-295: Test coverage is thorough and well-designed.

Tests cover all validation scenarios including edge cases, error conditions, and panic behaviour. The assert_panic_with_message helper is particularly well-implemented.
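
The `assert_panic_with_message` helper itself is not shown in this thread. For readers unfamiliar with the technique, here is a hedged sketch of how such a helper is commonly built on `std::panic::catch_unwind`; the name `panics_with_message` and its boolean return are illustrative choices, not the project's API:

```rust
use std::panic::{self, UnwindSafe};

/// Hypothetical helper: run `f`, expect it to panic, and report whether the
/// panic message contains `needle`. Returns `false` if `f` does not panic.
fn panics_with_message<F>(f: F, needle: &str) -> bool
where
    F: FnOnce() + UnwindSafe,
{
    match panic::catch_unwind(f) {
        Ok(()) => false,
        Err(payload) => {
            // Panic payloads are usually a `String` (formatted message)
            // or a `&'static str` (plain literal); check both.
            let msg = payload
                .downcast_ref::<String>()
                .map(String::as_str)
                .or_else(|| payload.downcast_ref::<&'static str>().copied());
            msg.map_or(false, |m| m.contains(needle))
        }
    }
}
```

Because `debug_assert!` compiles to nothing in release builds, a test built this way only makes sense when debug assertions are enabled.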

Comment thread src/parser/cst_builder/tree.rs
Comment thread src/parser/cst_builder/tree.rs
Comment thread src/parser/cst_builder/tree.rs
@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Code Health Improved (1 files improve in Code Health)

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

View Improvements
File Code Health Impact Categories Improved
mod.rs 5.59 → 7.30 Low Cohesion, Code Duplication, Excess Number of Function Arguments, Code Duplication

Quality Gate Profile: Pay Down Tech Debt
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

@leynos
Owner Author

leynos commented Jul 18, 2025

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Jul 18, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b80cc3b and 566c8d7.

📒 Files selected for processing (1)
  • src/parser/cst_builder/spans.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🔇 Additional comments (5)
src/parser/cst_builder/spans.rs (5)

5-26: Well-designed struct with appropriate attributes.

The struct design is excellent with proper use of #[non_exhaustive] for API stability and clear documentation directing users to the builder pattern.


40-124: Excellent builder pattern implementation.

The builder follows consistent patterns with proper use of #[must_use] attributes and clean destructuring in the build method. The debug-only validation strikes the right balance between correctness and performance.


126-176: Clean accessor implementation with appropriate constructor removal.

The accessor methods are well-designed with consistent patterns and proper documentation. The comment noting constructor removal aligns with the previous review feedback about eliminating excessive parameters.


178-221: Robust validation logic with clear error reporting.

The validation functions correctly detect overlapping and unsorted spans. The use of windows(2) with array destructuring is elegant, and the error collection provides comprehensive feedback for debugging.


223-293: Comprehensive test coverage with excellent edge case handling.

The test suite thoroughly covers all validation scenarios including edge cases, panic behaviour, and multiple error reporting. The assert_panic_with_message helper is well-crafted for testing debug assertions.
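
To illustrate the `windows(2)` technique mentioned in the validation comment above, here is a small sketch that collects every ordering problem in one pass. It assumes `Span` is a `std::ops::Range<usize>`, and `check_sorted` is a hypothetical name, so it is not a copy of the code under review:

```rust
use std::ops::Range;

// Assumption: spans are byte ranges; the crate's real `Span` alias may differ.
type Span = Range<usize>;

/// Returns one message per adjacent pair that overlaps or is out of order.
fn check_sorted(name: &str, spans: &[Span]) -> Vec<String> {
    spans
        .windows(2)
        .filter_map(|pair| {
            // `windows(2)` always yields slices of length 2, so this matches.
            let [prev, next] = pair else { return None };
            // In a sorted, non-overlapping list, `next` starts at or after `prev.end`.
            (next.start < prev.end)
                .then(|| format!("{name}: {next:?} starts before {prev:?} ends"))
        })
        .collect()
}
```

Collecting messages instead of returning on the first failure is what lets a single assertion report all offending pairs at once.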

Comment thread src/parser/cst_builder/spans.rs Outdated
@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Code Health Improved (1 files improve in Code Health)

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

View Improvements
File Code Health Impact Categories Improved
mod.rs 5.59 → 7.30 Low Cohesion, Code Duplication, Excess Number of Function Arguments, Code Duplication

Quality Gate Profile: Pay Down Tech Debt
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Code Health Improved (1 files improve in Code Health)

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

View Improvements
File Code Health Impact Categories Improved
mod.rs 5.59 → 7.30 Low Cohesion, Code Duplication, Excess Number of Function Arguments, Code Duplication

Quality Gate Profile: Pay Down Tech Debt
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

@leynos
Owner Author

leynos commented Jul 19, 2025

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Jul 19, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Jul 19, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Code Health Improved (1 files improve in Code Health)

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

View Improvements
File Code Health Impact Categories Improved
mod.rs 5.59 → 7.30 Low Cohesion, Code Duplication, Excess Number of Function Arguments, Code Duplication

Quality Gate Profile: Pay Down Tech Debt
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

@leynos
Owner Author

leynos commented Jul 19, 2025

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai Bot left a comment

Hey @leynos - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `src/parser/cst_builder/spans.rs:98` </location>
<code_context>
+
+    /// Build the [`ParsedSpans`].
+    #[must_use]
+    pub fn build(self) -> ParsedSpans {
+        let Self {
+            imports,
</code_context>

<issue_to_address>
Debug assertion for span validation may be insufficient for release builds.

Since validation is only performed in debug builds, invalid spans could go undetected in release builds, potentially causing subtle CST bugs. Consider enforcing these checks in all builds if correctness is critical.
</issue_to_address>
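
One way to enforce the check in every build profile is to return a `Result` instead of relying on `debug_assert!`. The sketch below is illustrative only: it assumes `Span` is a `std::ops::Range<usize>`, covers two span lists rather than seven, and the names `SpanOrderError`, `validate` and `try_build` are hypothetical:

```rust
use std::ops::Range;

// Assumption: spans are byte ranges; the crate's real `Span` alias may differ.
type Span = Range<usize>;

/// Hypothetical error: which list failed and the index of the offending span.
#[derive(Debug, PartialEq)]
struct SpanOrderError {
    list: &'static str,
    index: usize,
}

/// Runs in debug and release builds alike.
fn validate(list: &'static str, spans: &[Span]) -> Result<(), SpanOrderError> {
    match spans.windows(2).position(|w| w[1].start < w[0].end) {
        // `position` gives the pair index; the offending span is the second of the pair.
        Some(i) => Err(SpanOrderError { list, index: i + 1 }),
        None => Ok(()),
    }
}

/// Stand-in for a fallible `build`: validation failure becomes an `Err`.
fn try_build(
    imports: Vec<Span>,
    rules: Vec<Span>,
) -> Result<(Vec<Span>, Vec<Span>), SpanOrderError> {
    validate("imports", &imports)?;
    validate("rules", &rules)?;
    Ok((imports, rules))
}
```

The trade-off is an O(n) scan on every build in release mode and a fallible signature that call sites must handle.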

### Comment 2
<location> `src/parser/cst_builder/tree.rs:38` </location>
<code_context>
+        }
+    }
+
+    fn finish_if(&mut self, builder: &mut GreenNodeBuilder, pos: usize) {
+        if matches!(self.iter.peek(), Some(s) if pos >= s.end) {
+            builder.finish_node();
</code_context>

<issue_to_address>
finish_if may call finish_node multiple times for the same position.

Verify that handling multiple spans ending at the same position does not cause panics or corrupt the tree structure.
</issue_to_address>

### Comment 3
<location> `src/parser/cst_builder/spans.rs:36` </location>
<code_context>
+
+/// Builder for [`ParsedSpans`].
+#[derive(Default)]
+pub struct ParsedSpansBuilder {
+    imports: Vec<Span>,
+    typedefs: Vec<Span>,
</code_context>

<issue_to_address>
Consider replacing the builder and its repetitive setters with a single try_new constructor that handles sorting, validation, and construction in one place.

You can collapse those seven nearly-identical setters + a builder into one `try_new` constructor that inlines sorting & validation—and delete the whole boilerplate builder. For example:

```rust
impl ParsedSpans {
    pub fn try_new(
        mut imports: Vec<Span>,
        mut typedefs: Vec<Span>,
        mut relations: Vec<Span>,
        mut indexes: Vec<Span>,
        mut functions: Vec<Span>,
        mut transformers: Vec<Span>,
        mut rules: Vec<Span>,
    ) -> Result<Self, String> {
        let mut errors = Vec::new();
        for (name, spans) in &mut [
            ("imports", &mut imports),
            ("typedefs", &mut typedefs),
            ("relations", &mut relations),
            ("indexes", &mut indexes),
            ("functions", &mut functions),
            ("transformers", &mut transformers),
            ("rules", &mut rules),
        ] {
            spans.sort_unstable_by_key(|s| s.start);
            if let Err(e) = validate_spans_sorted(spans) {
                errors.push(format!("{name} not sorted: {e}"));
            }
        }
        if !errors.is_empty() {
            return Err(errors.join("\n"));
        }
        Ok(ParsedSpans { imports, typedefs, relations, indexes, functions, transformers, rules })
    }
}
```

Then:

- Remove `ParsedSpansBuilder` and all its setters.
- Change call sites from  
  `ParsedSpans::builder().imports(...).typedefs(...).build()`  
  to  
  `ParsedSpans::try_new(imports, typedefs, relations, indexes, functions, transformers, rules)?`.

This keeps the same debug‐build assertions, drops ~200 lines of boilerplate, and still preserves all functionality.
</issue_to_address>

### Comment 4
<location> `src/parser/cst_builder/tree.rs:13` </location>
<code_context>
+/// Number of cursor categories managed during CST construction.
+const SPAN_CURSOR_COUNT: usize = 7;
+
+struct SpanCursor<'a> {
+    iter: std::iter::Peekable<std::slice::Iter<'a, Span>>,
+    kind: SyntaxKind,
</code_context>

<issue_to_address>
Consider replacing the multiple SpanCursor iterators with a single sorted list of start/end events to streamline span handling.

Here’s an approach that keeps exactly the same behavior but collapses your seven‐iterator machinery into one sorted list of “start”/“end” events.  This cuts out the fixed‐size array, the two nested loops per token, and the per‐kind boilerplate:

```rust
// 1) Define an event enum
enum SpanEvent {
    Start { pos: usize, kind: SyntaxKind },
    End   { pos: usize, kind: SyntaxKind },
}

// 2) Build and sort a flat events list once at the top of build_green_tree
pub(crate) fn build_green_tree(
    tokens: &[(SyntaxKind, Span)],
    src: &str,
    spans: &ParsedSpans,
) -> GreenNode {
    // collect all span‐start and span‐end events
    let mut events = Vec::with_capacity(spans.total_count() * 2);
    for &(slice, kind) in &[
        (spans.imports(),       SyntaxKind::N_IMPORT_STMT),
        (spans.typedefs(),      SyntaxKind::N_TYPE_DEF),
        (spans.relations(),     SyntaxKind::N_RELATION_DECL),
        (spans.indexes(),       SyntaxKind::N_INDEX),
        (spans.functions(),     SyntaxKind::N_FUNCTION),
        (spans.transformers(),  SyntaxKind::N_TRANSFORMER),
        (spans.rules(),         SyntaxKind::N_RULE),
    ] {
        for span in slice {
            events.push( SpanEvent::Start { pos: span.start, kind } );
            events.push( SpanEvent::End   { pos: span.end,   kind } );
        }
    }
    events.sort_by_key(|evt| match evt {
        SpanEvent::Start { pos, .. } => (*pos, 0),
        SpanEvent::End   { pos, .. } => (*pos, 1),
    });

    // Initialize builder and an event‐index
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));
    let mut ev_i = 0;

    // 3) Single pass over tokens, advancing the sorted events
    for &(tok_kind, ref span) in tokens {
        // start any spans at this position
        while ev_i < events.len() {
            match &events[ev_i] {
                SpanEvent::Start { pos, kind } if *pos == span.start => {
                    builder.start_node(DdlogLanguage::kind_to_raw(*kind));
                    ev_i += 1;
                }
                _ => break,
            }
        }

        // push the token itself
        push_token(&mut builder, tok_kind, span.clone(), src);

        // close any spans ending at this position
        while ev_i < events.len() {
            match &events[ev_i] {
                SpanEvent::End { pos, .. } if *pos == span.end => {
                    builder.finish_node();
                    ev_i += 1;
                }
                _ => break,
            }
        }
    }

    // finish the top‐level node
    builder.finish_node();
    builder.finish()
}
```

Steps:

1. Drop `SpanCursor`/`SpanCursors`.
2. Build a single `Vec<SpanEvent>` of all start/end markers up front.
3. Sort it once by position (ending events after starting at the same position).
4. In your main token loop, just advance the single `events` index to start/finish nodes. 

This reduces per‐token work to two small `while` loops instead of seven, and collapses all the repetitive `advance_to`/`start_if`/`finish_if` logic into a single, easy‐to-follow pass.
</issue_to_address>
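
One detail worth checking before adopting the event-list idea is the tie-break at equal positions. The sort key in the snippet above ranks `Start` before `End`, so when one span ends exactly where the next begins, the first span's `End` sits behind the second span's `Start`, and the end-of-token loop would stop at that `Start` without closing the first node. The sketch below uses hypothetical names (`Edge`, `span_events`, `max_depth`), assumes `Span` is a `std::ops::Range<usize>`, and orders ends first:

```rust
use std::ops::Range;

// Assumption: spans are byte ranges; the crate's real `Span` alias may differ.
type Span = Range<usize>;

/// Variant order matters: the derived `Ord` makes `End` sort before `Start`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Edge {
    End,
    Start,
}

/// Flatten spans into (position, edge) events, ends first at equal positions.
fn span_events(spans: &[Span]) -> Vec<(usize, Edge)> {
    let mut events: Vec<(usize, Edge)> = spans
        .iter()
        .flat_map(|s| [(s.start, Edge::Start), (s.end, Edge::End)])
        .collect();
    events.sort();
    events
}

/// Replay the events; returns the deepest nesting reached, or `None` if an
/// `End` arrives while nothing is open (an unbalanced sequence).
fn max_depth(events: &[(usize, Edge)]) -> Option<usize> {
    let mut depth = 0usize;
    let mut deepest = 0usize;
    for &(_, edge) in events {
        match edge {
            Edge::Start => {
                depth += 1;
                deepest = deepest.max(depth);
            }
            Edge::End => depth = depth.checked_sub(1)?,
        }
    }
    Some(deepest)
}
```

Ends-first ordering has its own caveat: a zero-length span would place its `End` before its `Start`, so empty spans would need to be filtered out or given a separate tie-break.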

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/parser/cst_builder/spans.rs
Comment thread src/parser/cst_builder/tree.rs
Comment thread src/parser/cst_builder/spans.rs
Comment thread src/parser/cst_builder/tree.rs
@leynos
Owner Author

leynos commented Jul 19, 2025

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Jul 19, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@leynos
Owner Author

leynos commented Jul 19, 2025

@coderabbitai Please create new GitHub issues for the following:

Comment 1

src/parser/cst_builder/spans.rs:98
<code_context>
+
+    /// Build the [`ParsedSpans`].
+    #[must_use]
+    pub fn build(self) -> ParsedSpans {
+        let Self {
+            imports,
</code_context>

<issue_to_address>
Debug assertion for span validation may be insufficient for release builds.

Since validation is only performed in debug builds, invalid spans could go undetected in release builds, potentially causing subtle CST bugs. Consider enforcing these checks in all builds if correctness is critical.
</issue_to_address>

Comment 2

src/parser/cst_builder/tree.rs:38
<code_context>
+        }
+    }
+
+    fn finish_if(&mut self, builder: &mut GreenNodeBuilder, pos: usize) {
+        if matches!(self.iter.peek(), Some(s) if pos >= s.end) {
+            builder.finish_node();
</code_context>

<issue_to_address>
finish_if may call finish_node multiple times for the same position.

Verify that handling multiple spans ending at the same position does not cause panics or corrupt the tree structure.
</issue_to_address>

Comment 3

src/parser/cst_builder/spans.rs:36
<code_context>
+
+/// Builder for [`ParsedSpans`].
+#[derive(Default)]
+pub struct ParsedSpansBuilder {
+    imports: Vec<Span>,
+    typedefs: Vec<Span>,
</code_context>

<issue_to_address>
Consider replacing the builder and its repetitive setters with a single try_new constructor that handles sorting, validation, and construction in one place.

You can collapse those seven nearly-identical setters + a builder into one try_new constructor that inlines sorting & validation—and delete the whole boilerplate builder. For example:

impl ParsedSpans {
    pub fn try_new(
        mut imports: Vec<Span>,
        mut typedefs: Vec<Span>,
        mut relations: Vec<Span>,
        mut indexes: Vec<Span>,
        mut functions: Vec<Span>,
        mut transformers: Vec<Span>,
        mut rules: Vec<Span>,
    ) -> Result<Self, String> {
        let mut errors = Vec::new();
        for (name, spans) in &mut [
            ("imports", &mut imports),
            ("typedefs", &mut typedefs),
            ("relations", &mut relations),
            ("indexes", &mut indexes),
            ("functions", &mut functions),
            ("transformers", &mut transformers),
            ("rules", &mut rules),
        ] {
            spans.sort_unstable_by_key(|s| s.start);
            if let Err(e) = validate_spans_sorted(spans) {
                errors.push(format!("{name} not sorted: {e}"));
            }
        }
        if !errors.is_empty() {
            return Err(errors.join("\n"));
        }
        Ok(ParsedSpans { imports, typedefs, relations, indexes, functions, transformers, rules })
    }
}

Then:

  • Remove ParsedSpansBuilder and all its setters.
  • Change call sites from
    ParsedSpans::builder().imports(...).typedefs(...).build()
    to
    ParsedSpans::try_new(imports, typedefs, relations, indexes, functions, transformers, rules)?.

This keeps the same debug‐build assertions, drops ~200 lines of boilerplate, and still preserves all functionality.
</issue_to_address>

Comment 4

src/parser/cst_builder/tree.rs:13
<code_context>
+/// Number of cursor categories managed during CST construction.
+const SPAN_CURSOR_COUNT: usize = 7;
+
+struct SpanCursor<'a> {
+    iter: std::iter::Peekable<std::slice::Iter<'a, Span>>,
+    kind: SyntaxKind,
</code_context>

<issue_to_address>
Consider replacing the multiple SpanCursor iterators with a single sorted list of start/end events to streamline span handling.

Here’s an approach that keeps exactly the same behavior but collapses your seven‐iterator machinery into one sorted list of “start”/“end” events. This cuts out the fixed‐size array, the two nested loops per token, and the per‐kind boilerplate:

// 1) Define an event enum
enum SpanEvent {
    Start { pos: usize, kind: SyntaxKind },
    End   { pos: usize, kind: SyntaxKind },
}

// 2) Build and sort a flat events list once at the top of build_green_tree
pub(crate) fn build_green_tree(
    tokens: &[(SyntaxKind, Span)],
    src: &str,
    spans: &ParsedSpans,
) -> GreenNode {
    // collect all span‐start and span‐end events
    let mut events = Vec::with_capacity(spans.total_count() * 2);
    for &(slice, kind) in &[
        (spans.imports(),       SyntaxKind::N_IMPORT_STMT),
        (spans.typedefs(),      SyntaxKind::N_TYPE_DEF),
        (spans.relations(),     SyntaxKind::N_RELATION_DECL),
        (spans.indexes(),       SyntaxKind::N_INDEX),
        (spans.functions(),     SyntaxKind::N_FUNCTION),
        (spans.transformers(),  SyntaxKind::N_TRANSFORMER),
        (spans.rules(),         SyntaxKind::N_RULE),
    ] {
        for span in slice {
            events.push( SpanEvent::Start { pos: span.start, kind } );
            events.push( SpanEvent::End   { pos: span.end,   kind } );
        }
    }
    // Sort by position. At equal positions an End sorts before a Start, so a
    // span that closes where the next one opens is finished first (this
    // assumes non-empty spans).
    events.sort_by_key(|evt| match evt {
        SpanEvent::End   { pos, .. } => (*pos, 0),
        SpanEvent::Start { pos, .. } => (*pos, 1),
    });

    // Initialize builder and an event‐index
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));
    let mut ev_i = 0;

    // 3) Single pass over tokens, advancing the sorted events
    for &(tok_kind, ref span) in tokens {
        // start any spans at this position
        while ev_i < events.len() {
            match &events[ev_i] {
                SpanEvent::Start { pos, kind } if *pos == span.start => {
                    builder.start_node(DdlogLanguage::kind_to_raw(*kind));
                    ev_i += 1;
                }
                _ => break,
            }
        }

        // push the token itself
        push_token(&mut builder, tok_kind, span.clone(), src);

        // close any spans ending at this position
        while ev_i < events.len() {
            match &events[ev_i] {
                SpanEvent::End { pos, .. } if *pos == span.end => {
                    builder.finish_node();
                    ev_i += 1;
                }
                _ => break,
            }
        }
    }

    // finish the top‐level node
    builder.finish_node();
    builder.finish()
}

Steps:

  1. Drop SpanCursor/SpanCursors.
  2. Build a single Vec<SpanEvent> of all start/end markers up front.
  3. Sort it once by position (ending events before starting events at the same position, so adjacent spans close before the next one opens).
  4. In your main token loop, just advance the single events index to start/finish nodes.

This reduces per‐token work to two small while loops instead of seven, and collapses all the repetitive advance_to/start_if/finish_if logic into a single, easy‐to-follow pass.
</issue_to_address>
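
The event-list idea can be exercised without rowan. The sketch below is an illustration under stated assumptions: Span is taken to be `Range<usize>`, `Kind` is a hypothetical stand-in for SyntaxKind, and `trace` records what a GreenNodeBuilder would be asked to do instead of building a tree. It sorts End before Start at equal positions, which is what adjacent, non-empty spans need.

```rust
use std::ops::Range;

/// Assumed span type: a half-open byte range.
type Span = Range<usize>;

/// Hypothetical stand-in for the statement-level `SyntaxKind` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Import,
    Rule,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpanEvent {
    Start { pos: usize, kind: Kind },
    End { pos: usize },
}

/// Flattens every span list into one sorted event list.
fn sorted_events(groups: &[(&[Span], Kind)]) -> Vec<SpanEvent> {
    let mut events = Vec::new();
    for &(spans, kind) in groups {
        for span in spans {
            events.push(SpanEvent::Start { pos: span.start, kind });
            events.push(SpanEvent::End { pos: span.end });
        }
    }
    // At equal positions End sorts first, so a span closes before the
    // next one opens (assumes non-empty, non-nested spans).
    events.sort_by_key(|evt| match *evt {
        SpanEvent::End { pos } => (pos, 0),
        SpanEvent::Start { pos, .. } => (pos, 1),
    });
    events
}

/// Replays the events against the token spans and logs each builder action.
fn trace(tokens: &[Span], events: &[SpanEvent]) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    for tok in tokens {
        // Open every span that starts at this token.
        while let Some(&SpanEvent::Start { pos, kind }) = events.get(i) {
            if pos != tok.start {
                break;
            }
            out.push(format!("start {kind:?}"));
            i += 1;
        }
        out.push(format!("token {}..{}", tok.start, tok.end));
        // Close every span that ends with this token.
        while let Some(&SpanEvent::End { pos }) = events.get(i) {
            if pos != tok.end {
                break;
            }
            out.push("finish".to_string());
            i += 1;
        }
    }
    out
}
```

With an import at 0..5 directly followed by a rule at 5..10, the trace closes the import before opening the rule rather than nesting one inside the other.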

@coderabbitai
Contributor

coderabbitai Bot commented Jul 19, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

I'll create GitHub issues for each of the suggestions you've provided. Let me create these as follow-up items for the CST construction improvements.

@leynos I've created 4 GitHub issues to track the suggested improvements:

  1. Consider enforcing span validation in release builds - Addresses the debug-only validation concern in spans.rs
  2. Verify finish_if handling of multiple spans ending at same position - Ensures robustness of the finish_if method in tree.rs
  3. Consider replacing ParsedSpansBuilder with try_new constructor - Proposes simplifying the builder pattern with a single constructor
  4. Consider replacing multiple SpanCursor iterators with single sorted event list - Suggests optimizing the span handling approach

Each issue includes the specific location, code context, detailed description, and references back to this PR. They're all assigned to you for future consideration and implementation.

  (\   /)
 ( ._.)
o_(")(")  *thoughtful*

🏁 Script executed:

#!/bin/bash

# Create Issue 1: Debug assertion for span validation
gh issue create \
  --title "Consider enforcing span validation in release builds" \
  --body "## Issue Description

Debug assertion for span validation may be insufficient for release builds.

Since validation is only performed in debug builds, invalid spans could go undetected in release builds, potentially causing subtle CST bugs. Consider enforcing these checks in all builds if correctness is critical.

## Location
\`src/parser/cst_builder/spans.rs:98\`

## Code Context
\`\`\`rust
    /// Build the [\`ParsedSpans\`].
    #[must_use]
    pub fn build(self) -> ParsedSpans {
        let Self {
            imports,
\`\`\`

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

# Create Issue 2: finish_if multiple calls
gh issue create \
  --title "Verify finish_if handling of multiple spans ending at same position" \
  --body "## Issue Description

finish_if may call finish_node multiple times for the same position.

Verify that handling multiple spans ending at the same position does not cause panics or corrupt the tree structure.

## Location
\`src/parser/cst_builder/tree.rs:38\`

## Code Context
\`\`\`rust
    fn finish_if(&mut self, builder: &mut GreenNodeBuilder, pos: usize) {
        if matches!(self.iter.peek(), Some(s) if pos >= s.end) {
            builder.finish_node();
\`\`\`

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

# Create Issue 3: Builder pattern simplification
gh issue create \
  --title "Consider replacing ParsedSpansBuilder with try_new constructor" \
  --body "## Issue Description

Consider replacing the builder and its repetitive setters with a single try_new constructor that handles sorting, validation, and construction in one place.

You can collapse those seven nearly-identical setters + a builder into one \`try_new\` constructor that inlines sorting & validation—and delete the whole boilerplate builder.

## Location
\`src/parser/cst_builder/spans.rs:36\`

## Code Context
\`\`\`rust
/// Builder for [\`ParsedSpans\`].
#[derive(Default)]
pub struct ParsedSpansBuilder {
    imports: Vec<Span>,
    typedefs: Vec<Span>,
\`\`\`

## Proposed Solution
\`\`\`rust
impl ParsedSpans {
    pub fn try_new(
        mut imports: Vec<Span>,
        mut typedefs: Vec<Span>,
        mut relations: Vec<Span>,
        mut indexes: Vec<Span>,
        mut functions: Vec<Span>,
        mut transformers: Vec<Span>,
        mut rules: Vec<Span>,
    ) -> Result<Self, String> {
        let mut errors = Vec::new();
        for (name, spans) in &mut [
            (\"imports\", &mut imports),
            (\"typedefs\", &mut typedefs),
            (\"relations\", &mut relations),
            (\"indexes\", &mut indexes),
            (\"functions\", &mut functions),
            (\"transformers\", &mut transformers),
            (\"rules\", &mut rules),
        ] {
            spans.sort_unstable_by_key(|s| s.start);
            if let Err(e) = validate_spans_sorted(spans) {
                errors.push(format!(\"{name} not sorted: {e}\"));
            }
        }
        if !errors.is_empty() {
            return Err(errors.join(\"\n\"));
        }
        Ok(ParsedSpans { imports, typedefs, relations, indexes, functions, transformers, rules })
    }
}
\`\`\`

## Benefits
- Remove \`ParsedSpansBuilder\` and all its setters
- Change call sites from \`ParsedSpans::builder().imports(...).typedefs(...).build()\` to \`ParsedSpans::try_new(imports, typedefs, relations, indexes, functions, transformers, rules)?\`
- Keeps the same debug‐build assertions, drops ~200 lines of boilerplate, and preserves all functionality

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

# Create Issue 4: SpanCursor optimization
gh issue create \
  --title "Consider replacing multiple SpanCursor iterators with single sorted event list" \
  --body "## Issue Description

Consider replacing the multiple SpanCursor iterators with a single sorted list of start/end events to streamline span handling.

Here's an approach that keeps exactly the same behavior but collapses your seven‐iterator machinery into one sorted list of \"start\"/\"end\" events. This cuts out the fixed‐size array, the two nested loops per token, and the per‐kind boilerplate.

## Location
\`src/parser/cst_builder/tree.rs:13\`

## Code Context
\`\`\`rust
/// Number of cursor categories managed during CST construction.
const SPAN_CURSOR_COUNT: usize = 7;

struct SpanCursor<'a> {
    iter: std::iter::Peekable<std::slice::Iter<'a, Span>>,
    kind: SyntaxKind,
\`\`\`

## Proposed Solution

1. Define an event enum:
\`\`\`rust
enum SpanEvent {
    Start { pos: usize, kind: SyntaxKind },
    End   { pos: usize, kind: SyntaxKind },
}
\`\`\`

2. Build and sort a flat events list once at the top of build_green_tree
3. Single pass over tokens, advancing the sorted events

## Benefits
- Reduces per‐token work to two small \`while\` loops instead of seven
- Collapses all the repetitive \`advance_to\`/\`start_if\`/\`finish_if\` logic into a single, easy‐to-follow pass
- Eliminates fixed-size array and nested loops per token

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

echo "Successfully created 4 GitHub issues for the suggested improvements"

Length of output: 5637


Perfect! I've successfully created 4 GitHub issues to track the suggested improvements:

  1. Issue #80: Consider enforcing span validation in release builds
  2. Issue #81: Verify finish_if handling of multiple spans ending at same position
  3. Issue #82: Consider replacing ParsedSpansBuilder with try_new constructor
  4. Issue #83: Consider replacing multiple SpanCursor iterators with single sorted event list

Each issue includes:

  • Detailed description of the improvement suggestion
  • Specific file location and code context
  • Proposed solutions where applicable
  • Benefits of implementing the change
  • Backlinks to this PR and comment thread
  • Assignment to you for future consideration

These are excellent suggestions that can be addressed in future PRs to further improve the CST construction implementation!

  (\   /)
 ( ._.)
o_(")(")  *accomplished*

@github-actions

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Jul 19, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@leynos
Owner Author

leynos commented Jul 19, 2025

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai Bot left a comment

Hey @leynos - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `src/parser/cst_builder/spans.rs:98` </location>
<code_context>
+
+    /// Build the [`ParsedSpans`].
+    #[must_use]
+    pub fn build(self) -> ParsedSpans {
+        let Self {
+            imports,
</code_context>

<issue_to_address>
Validation of span order only occurs in debug builds.

Since validation only happens in debug builds, invalid spans may go undetected in release builds, potentially causing bugs. Consider enabling validation in release builds, either always or via a feature flag.

Suggested implementation:

```rust
        let result = validate_span_lists_sorted(&[
            ("imports", &imports),

```

```rust
        if let Err(e) = result {
            panic!("Span order validation failed: {e}");
        }

```
</issue_to_address>
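
As one possible shape for validation that runs in every build, the sketch below returns a typed error instead of relying on `debug_assert!`. It assumes Span is `Range<usize>`; `SpanOrderError`, `validate_span_list` and `validate_all` are hypothetical names for this example and are not the crate's `validate_span_lists_sorted`.

```rust
use std::ops::Range;

/// Assumed span type: a half-open byte range.
type Span = Range<usize>;

/// Hypothetical error: which list failed, and the index of the offending span.
#[derive(Debug, PartialEq, Eq)]
struct SpanOrderError {
    list: &'static str,
    index: usize,
}

/// Checks that each span starts at or after the end of its predecessor,
/// which implies the list is both sorted and non-overlapping.
fn validate_span_list(list: &'static str, spans: &[Span]) -> Result<(), SpanOrderError> {
    for (index, pair) in spans.windows(2).enumerate() {
        if pair[0].end > pair[1].start {
            return Err(SpanOrderError { list, index: index + 1 });
        }
    }
    Ok(())
}

/// Validates several named lists, stopping at the first failure.
fn validate_all(lists: &[(&'static str, &[Span])]) -> Result<(), SpanOrderError> {
    lists
        .iter()
        .try_for_each(|&(name, spans)| validate_span_list(name, spans))
}
```

A caller can then decide whether to propagate the error, panic, or gate the check behind a feature flag, which keeps the policy out of the validator itself.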

### Comment 2
<location> `src/parser/cst_builder/tree.rs:105` </location>
<code_context>
+
+    let mut cursors = SpanCursors::new(spans);
+
+    for &(kind, ref span) in tokens {
+        cursors.advance_and_start(&mut builder, span.start);
+
</code_context>

<issue_to_address>
Potential for panics if token spans are out of bounds.

Proceeding with an empty string after a span is out of bounds may hide underlying issues. Consider returning an error or panicking in debug builds to catch these problems early.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
+    let mut cursors = SpanCursors::new(spans);
+
+    for &(kind, ref span) in tokens {
+        cursors.advance_and_start(&mut builder, span.start);
+
=======
+    let mut cursors = SpanCursors::new(spans);
+
+    for &(kind, ref span) in tokens {
+        // Check for out-of-bounds spans
+        if span.start > span.end || span.end > src.len() {
+            #[cfg(debug_assertions)]
+            panic!(
+                "Token span out of bounds: span=({:?}, {:?}), src.len()={}",
+                span.start, span.end, src.len()
+            );
+            #[cfg(not(debug_assertions))]
+            {
+                warn!(
+                    "Token span out of bounds: span=({:?}, {:?}), src.len()={}",
+                    span.start, span.end, src.len()
+                );
+                continue;
+            }
+        }
+        cursors.advance_and_start(&mut builder, span.start);
+
>>>>>>> REPLACE

</suggested_fix>
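
An alternative to panicking or logging is to surface the bad span as an error value. This is only a sketch, assuming Span is `Range<usize>`; `TokenSpanError` and `token_text` are hypothetical names introduced for the example.

```rust
use std::ops::Range;

/// Assumed span type: a half-open byte range.
type Span = Range<usize>;

/// Hypothetical error describing a token span that does not fit the source.
#[derive(Debug, PartialEq, Eq)]
struct TokenSpanError {
    span: Span,
    src_len: usize,
}

/// Returns the token text, or an error when the span is reversed,
/// out of bounds, or not on a UTF-8 character boundary.
fn token_text<'a>(src: &'a str, span: &Span) -> Result<&'a str, TokenSpanError> {
    src.get(span.clone()).ok_or_else(|| TokenSpanError {
        span: span.clone(),
        src_len: src.len(),
    })
}
```

`str::get` already covers all three failure cases, so no separate bounds check is needed before the lookup.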

### Comment 3
<location> `src/parser/cst_builder/tree.rs:117` </location>
<code_context>
+    builder.finish()
+}
+
+fn push_token(builder: &mut GreenNodeBuilder, kind: SyntaxKind, span: Span, src: &str) {
+    let text = src.get(span.clone()).map_or_else(
+        || {
</code_context>

<issue_to_address>
Cloning the span for src.get may be unnecessary.

Consider using a reference or passing Span by value if it implements Copy, as this may improve performance and code clarity.
</issue_to_address>
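
If Span is an alias for `Range<usize>` (an assumption here), it is `Clone` but not `Copy`, so the clone can be avoided either by consuming the span or by rebuilding the range from a reference. Both helper names below are hypothetical.

```rust
use std::ops::Range;

/// Assumed span type: a half-open byte range.
type Span = Range<usize>;

/// Takes the span by value: it is moved straight into `get`, no clone needed.
fn text_by_value(src: &str, span: Span) -> &str {
    src.get(span).unwrap_or("")
}

/// Takes the span by reference: a fresh range is built from its two bounds.
fn text_by_ref<'a>(src: &'a str, span: &Span) -> &'a str {
    src.get(span.start..span.end).unwrap_or("")
}
```

The by-reference form lets the caller keep using the span afterwards, which matters when the same span also drives cursor bookkeeping.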

### Comment 4
<location> `src/parser/cst_builder/spans.rs:17` </location>
<code_context>
+/// are sorted and non-overlapping in debug builds.
+#[non_exhaustive]
+#[derive(Debug, Default, Clone, PartialEq)]
+pub struct ParsedSpans {
+    /// `import` statement spans.
+    imports: Vec<Span>,
</code_context>

<issue_to_address>
Consider using a macro to generate the repetitive fields, setters, getters, and builder logic for ParsedSpans and its builder.

Consider collapsing all of the repetitive fields, setters, getters and the build/validation logic into one `macro_rules!` instead of hand‐writing each. For example, you could do something like:

```rust
macro_rules! define_parsed_spans {
    ( $( $field:ident ),* $(,)? ) => {
        #[derive(Debug, Default, Clone, PartialEq)]
        pub struct ParsedSpans {
            $( pub(crate) $field: Vec<Span> ),*
        }

        #[derive(Default)]
        pub struct ParsedSpansBuilder {
            $( $field: Vec<Span> ),*
        }

        impl ParsedSpansBuilder {
            $(
                #[must_use]
                pub fn $field(mut self, spans: Vec<Span>) -> Self {
                    self.$field = spans;
                    self
                }
            )*

            #[must_use]
            pub fn build(self) -> ParsedSpans {
                let Self { $( $field ),* } = self;
                let result = validate_span_lists_sorted(&[
                    $( (stringify!($field), &$field) ),*
                ]);
                debug_assert!(result.is_ok(), "{}", result.err().unwrap_or_default());

                ParsedSpans { $( $field ),* }
            }
        }

        impl ParsedSpans {
            #[must_use]
            pub fn builder() -> ParsedSpansBuilder {
                ParsedSpansBuilder::default()
            }

            $(
                #[must_use]
                pub fn $field(&self) -> &[Span] {
                    &self.$field
                }
            )*
        }
    }
}

define_parsed_spans!(
    imports,
    typedefs,
    relations,
    indexes,
    functions,
    transformers,
    rules,
);
```

This preserves your build/validation logic verbatim but removes all of the near-duplicate setter/getter boilerplate.
</issue_to_address>
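
A reduced version of the macro idea, small enough to compile on its own, might look like the sketch below. It assumes Span is `Range<usize>`; `define_span_lists`, `DemoSpans` and `FIELD_NAMES` are hypothetical names, and the builder and validation parts are left out to keep the example short.

```rust
use std::ops::Range;

/// Assumed span type: a half-open byte range.
type Span = Range<usize>;

/// Generates a struct with one `Vec<Span>` per field, plus a getter for each.
macro_rules! define_span_lists {
    ( $name:ident { $( $field:ident ),* $(,)? } ) => {
        #[derive(Debug, Default, Clone, PartialEq)]
        pub struct $name {
            $( $field: Vec<Span> ),*
        }

        impl $name {
            $(
                /// Read-only view of this span list.
                #[must_use]
                pub fn $field(&self) -> &[Span] {
                    &self.$field
                }
            )*

            /// Names of all generated lists, in declaration order.
            pub const FIELD_NAMES: &'static [&'static str] = &[ $( stringify!($field) ),* ];
        }
    };
}

define_span_lists!(DemoSpans { imports, rules });
```

Adding a statement category then means adding one identifier to the invocation; the field, getter and name entry are generated together.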

### Comment 5
<location> `src/parser/cst_builder/tree.rs:46` </location>
<code_context>
+    }
+}
+
+struct SpanCursors<'a> {
+    cursors: [SpanCursor<'a>; SPAN_CURSOR_COUNT],
+}
</code_context>

<issue_to_address>
Consider replacing the fixed-size SpanCursors struct and constant with a Vec<SpanCursor> for a more flexible and data-driven approach.

Here’s a way to collapse the fixed‐size array, the `SPAN_CURSOR_COUNT` constant, and the hand‐rolled `SpanCursors` struct into a simple `Vec<SpanCursor>`. You still get the same “advance / start / finish” logic, but without hard‐coding the count or the shape of each statement category.

```rust
pub(crate) fn build_green_tree(
    tokens: &[(SyntaxKind, Span)],
    src: &str,
    spans: &ParsedSpans,
) -> GreenNode {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));

    // collect each (slice, kind) into a SpanCursor in a Vec
    let mut cursors: Vec<SpanCursor> = vec![
        (spans.imports(),      SyntaxKind::N_IMPORT_STMT),
        (spans.typedefs(),     SyntaxKind::N_TYPE_DEF),
        (spans.relations(),    SyntaxKind::N_RELATION_DECL),
        (spans.indexes(),      SyntaxKind::N_INDEX),
        (spans.functions(),    SyntaxKind::N_FUNCTION),
        (spans.transformers(), SyntaxKind::N_TRANSFORMER),
        (spans.rules(),        SyntaxKind::N_RULE),
    ]
    .into_iter()
    .map(|(slice, kind)| SpanCursor::new(slice, kind))
    .collect();

    for &(kind, ref span) in tokens {
        let pos = span.start;
        // one loop to advance & start
        for cur in &mut cursors {
            cur.advance_to(pos);
            cur.start_if(&mut builder, pos);
        }

        push_token(&mut builder, kind, span.clone(), src);

        // one loop to finish
        let pos = span.end;
        for cur in &mut cursors {
            cur.finish_if(&mut builder, pos);
        }
    }

    builder.finish_node();
    builder.finish()
}
```

Key changes:

  * Removed `SPAN_CURSOR_COUNT` and the fixed-size `[SpanCursor; N]`.
  * Build a `Vec<SpanCursor>` from a `Vec<(slice, kind)>` – adding new categories is now “just” another tuple.
  * Kept your `SpanCursor` impl completely unchanged.

This preserves all the existing behavior but is fully data-driven and unblocks adding/removing categories without touching a constant or a struct definition.
</issue_to_address>
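
To see the Vec-of-cursors loop in isolation, the sketch below replaces GreenNodeBuilder with a string log. Everything here is an assumption made for illustration: Span is taken to be `Range<usize>`, `kind` is a plain label rather than a SyntaxKind, and the bodies of `advance_to`, `start_if` and `finish_if` are guesses at plausible behaviour, not the PR's actual implementation.

```rust
use std::iter::Peekable;
use std::ops::Range;
use std::slice::Iter;

/// Assumed span type: a half-open byte range.
type Span = Range<usize>;

/// Cursor over one sorted span list; `kind` is a label standing in for `SyntaxKind`.
struct SpanCursor<'a> {
    iter: Peekable<Iter<'a, Span>>,
    kind: &'static str,
}

impl<'a> SpanCursor<'a> {
    fn new(spans: &'a [Span], kind: &'static str) -> Self {
        Self { iter: spans.iter().peekable(), kind }
    }

    /// Skips spans that ended at or before `pos`.
    fn advance_to(&mut self, pos: usize) {
        while matches!(self.iter.peek(), Some(s) if s.end <= pos) {
            self.iter.next();
        }
    }

    /// Logs a node start when the current span begins at `pos`.
    fn start_if(&mut self, log: &mut Vec<String>, pos: usize) {
        if matches!(self.iter.peek(), Some(s) if s.start == pos) {
            log.push(format!("start {}", self.kind));
        }
    }

    /// Logs a node finish and moves on when the current span ends by `pos`.
    fn finish_if(&mut self, log: &mut Vec<String>, pos: usize) {
        if matches!(self.iter.peek(), Some(s) if pos >= s.end) {
            log.push(format!("finish {}", self.kind));
            self.iter.next();
        }
    }
}

/// Drives all cursors over the token spans, one loop to start and one to finish.
fn replay(tokens: &[Span], imports: &[Span], rules: &[Span]) -> Vec<String> {
    let mut cursors: Vec<SpanCursor<'_>> = vec![(imports, "import"), (rules, "rule")]
        .into_iter()
        .map(|(spans, kind)| SpanCursor::new(spans, kind))
        .collect();
    let mut log = Vec::new();
    for tok in tokens {
        for cur in &mut cursors {
            cur.advance_to(tok.start);
            cur.start_if(&mut log, tok.start);
        }
        log.push(format!("token {}..{}", tok.start, tok.end));
        for cur in &mut cursors {
            cur.finish_if(&mut log, tok.end);
        }
    }
    log
}
```

Adding a category in this form is one more tuple in the `vec![...]` literal; no count constant or array length has to change.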



/// Build the [`ParsedSpans`].
#[must_use]
pub fn build(self) -> ParsedSpans {
Contributor

suggestion (bug_risk): Validation of span order only occurs in debug builds.

Since validation only happens in debug builds, invalid spans may go undetected in release builds, potentially causing bugs. Consider enabling validation in release builds, either always or via a feature flag.

Suggested implementation:

        let result = validate_span_lists_sorted(&[
            ("imports", &imports),
        if let Err(e) = result {
            panic!("Span order validation failed: {e}");
        }

Comment on lines +103 to +107
    let mut cursors = SpanCursors::new(spans);

    for &(kind, ref span) in tokens {
        cursors.advance_and_start(&mut builder, span.start);

Contributor

suggestion (bug_risk): Potential for panics if token spans are out of bounds.

Proceeding with an empty string after a span is out of bounds may hide underlying issues. Consider returning an error or panicking in debug builds to catch these problems early.

Suggested change
-    let mut cursors = SpanCursors::new(spans);
-    for &(kind, ref span) in tokens {
-        cursors.advance_and_start(&mut builder, span.start);
+    let mut cursors = SpanCursors::new(spans);
+
+    for &(kind, ref span) in tokens {
+        // Check for out-of-bounds spans
+        if span.start > span.end || span.end > src.len() {
+            #[cfg(debug_assertions)]
+            panic!(
+                "Token span out of bounds: span=({:?}, {:?}), src.len()={}",
+                span.start, span.end, src.len()
+            );
+            #[cfg(not(debug_assertions))]
+            {
+                warn!(
+                    "Token span out of bounds: span=({:?}, {:?}), src.len()={}",
+                    span.start, span.end, src.len()
+                );
+                continue;
+            }
+        }
+        cursors.advance_and_start(&mut builder, span.start);
+

builder.finish()
}

fn push_token(builder: &mut GreenNodeBuilder, kind: SyntaxKind, span: Span, src: &str) {
Contributor

nitpick: Cloning the span for src.get may be unnecessary.

Consider using a reference or passing Span by value if it implements Copy, as this may improve performance and code clarity.

/// are sorted and non-overlapping in debug builds.
#[non_exhaustive]
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ParsedSpans {
Contributor

issue (complexity): Consider using a macro to generate the repetitive fields, setters, getters, and builder logic for ParsedSpans and its builder.

Consider collapsing all of the repetitive fields, setters, getters and the build/validation logic into one macro_rules! instead of hand‐writing each. For example, you could do something like:

macro_rules! define_parsed_spans {
    ( $( $field:ident ),* $(,)? ) => {
        #[derive(Debug, Default, Clone, PartialEq)]
        pub struct ParsedSpans {
            $( pub(crate) $field: Vec<Span> ),*
        }

        #[derive(Default)]
        pub struct ParsedSpansBuilder {
            $( $field: Vec<Span> ),*
        }

        impl ParsedSpansBuilder {
            $(
                #[must_use]
                pub fn $field(mut self, spans: Vec<Span>) -> Self {
                    self.$field = spans;
                    self
                }
            )*

            #[must_use]
            pub fn build(self) -> ParsedSpans {
                let Self { $( $field ),* } = self;
                let result = validate_span_lists_sorted(&[
                    $( (stringify!($field), &$field) ),*
                ]);
                debug_assert!(result.is_ok(), "{}", result.err().unwrap_or_default());

                ParsedSpans { $( $field ),* }
            }
        }

        impl ParsedSpans {
            #[must_use]
            pub fn builder() -> ParsedSpansBuilder {
                ParsedSpansBuilder::default()
            }

            $(
                #[must_use]
                pub fn $field(&self) -> &[Span] {
                    &self.$field
                }
            )*
        }
    }
}

define_parsed_spans!(
    imports,
    typedefs,
    relations,
    indexes,
    functions,
    transformers,
    rules,
);

This preserves your build/validation logic verbatim but removes all of the near-duplicate setter/getter boilerplate.

}
}

struct SpanCursors<'a> {
Contributor

issue (complexity): Consider replacing the fixed-size SpanCursors struct and constant with a Vec<SpanCursor> for a more flexible and data-driven approach.

Here’s a way to collapse the fixed‐size array, the SPAN_CURSOR_COUNT constant, and the hand‐rolled SpanCursors struct into a simple Vec<SpanCursor>. You still get the same “advance / start / finish” logic, but without hard‐coding the count or the shape of each statement category.

pub(crate) fn build_green_tree(
    tokens: &[(SyntaxKind, Span)],
    src: &str,
    spans: &ParsedSpans,
) -> GreenNode {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));

    // collect each (slice, kind) into a SpanCursor in a Vec
    let mut cursors: Vec<SpanCursor> = vec![
        (spans.imports(),      SyntaxKind::N_IMPORT_STMT),
        (spans.typedefs(),     SyntaxKind::N_TYPE_DEF),
        (spans.relations(),    SyntaxKind::N_RELATION_DECL),
        (spans.indexes(),      SyntaxKind::N_INDEX),
        (spans.functions(),    SyntaxKind::N_FUNCTION),
        (spans.transformers(), SyntaxKind::N_TRANSFORMER),
        (spans.rules(),        SyntaxKind::N_RULE),
    ]
    .into_iter()
    .map(|(slice, kind)| SpanCursor::new(slice, kind))
    .collect();

    for &(kind, ref span) in tokens {
        let pos = span.start;
        // one loop to advance & start
        for cur in &mut cursors {
            cur.advance_to(pos);
            cur.start_if(&mut builder, pos);
        }

        push_token(&mut builder, kind, span.clone(), src);

        // one loop to finish
        let pos = span.end;
        for cur in &mut cursors {
            cur.finish_if(&mut builder, pos);
        }
    }

    builder.finish_node();
    builder.finish()
}

Key changes:

  • Removed SPAN_CURSOR_COUNT and the fixed-size [SpanCursor; N].
  • Build a Vec<SpanCursor> from a Vec<(slice, kind)> – adding new categories is now “just” another tuple.
  • Kept your SpanCursor impl completely unchanged.

This preserves all the existing behavior but is fully data-driven and unblocks adding/removing categories without touching a constant or a struct definition.

@leynos leynos merged commit 197a5e6 into main Jul 19, 2025
3 checks passed
@leynos leynos deleted the codex/extract-parser-cst_builder-module branch July 19, 2025 08:52
@leynos
Owner Author

leynos commented Jul 19, 2025

@coderabbitai Please create new GitHub issues for the following:

Comment 1

src/parser/cst_builder/spans.rs:98
<code_context>
+
+    /// Build the [`ParsedSpans`].
+    #[must_use]
+    pub fn build(self) -> ParsedSpans {
+        let Self {
+            imports,
</code_context>

<issue_to_address>
Validation of span order only occurs in debug builds.

Since validation only happens in debug builds, invalid spans may go undetected in release builds, potentially causing bugs. Consider enabling validation in release builds, either always or via a feature flag.

Suggested implementation:

        let result = validate_span_lists_sorted(&[
            ("imports", &imports),
        if let Err(e) = result {
            panic!("Span order validation failed: {e}");
        }

</issue_to_address>

Comment 2

src/parser/cst_builder/tree.rs:105
<code_context>
+
+    let mut cursors = SpanCursors::new(spans);
+
+    for &(kind, ref span) in tokens {
+        cursors.advance_and_start(&mut builder, span.start);
+
</code_context>

<issue_to_address>
Potential for panics if token spans are out of bounds.

Proceeding with an empty string after a span is out of bounds may hide underlying issues. Consider returning an error or panicking in debug builds to catch these problems early.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
+    let mut cursors = SpanCursors::new(spans);
+
+    for &(kind, ref span) in tokens {
+        cursors.advance_and_start(&mut builder, span.start);
+
=======
+    let mut cursors = SpanCursors::new(spans);
+
+    for &(kind, ref span) in tokens {
+        // Check for out-of-bounds spans
+        if span.start > span.end || span.end > src.len() {
+            #[cfg(debug_assertions)]
+            panic!(
+                "Token span out of bounds: span=({:?}, {:?}), src.len()={}",
+                span.start, span.end, src.len()
+            );
+            #[cfg(not(debug_assertions))]
+            {
+                warn!(
+                    "Token span out of bounds: span=({:?}, {:?}), src.len()={}",
+                    span.start, span.end, src.len()
+                );
+                continue;
+            }
+        }
+        cursors.advance_and_start(&mut builder, span.start);
+
>>>>>>> REPLACE

</suggested_fix>

Comment 3

src/parser/cst_builder/tree.rs:117
<code_context>
+    builder.finish()
+}
+
+fn push_token(builder: &mut GreenNodeBuilder, kind: SyntaxKind, span: Span, src: &str) {
+    let text = src.get(span.clone()).map_or_else(
+        || {
</code_context>

<issue_to_address>
Cloning the span for src.get may be unnecessary.

Consider using a reference or passing Span by value if it implements Copy, as this may improve performance and code clarity.
</issue_to_address>

Comment 4

src/parser/cst_builder/spans.rs:17
<code_context>
+/// are sorted and non-overlapping in debug builds.
+#[non_exhaustive]
+#[derive(Debug, Default, Clone, PartialEq)]
+pub struct ParsedSpans {
+    /// `import` statement spans.
+    imports: Vec<Span>,
</code_context>

<issue_to_address>
Consider using a macro to generate the repetitive fields, setters, getters, and builder logic for ParsedSpans and its builder.

Consider collapsing all of the repetitive fields, setters, getters and the build/validation logic into one macro_rules! instead of hand‐writing each. For example, you could do something like:

macro_rules! define_parsed_spans {
    ( $( $field:ident ),* $(,)? ) => {
        #[derive(Debug, Default, Clone, PartialEq)]
        pub struct ParsedSpans {
            $( pub(crate) $field: Vec<Span> ),*
        }

        #[derive(Default)]
        pub struct ParsedSpansBuilder {
            $( $field: Vec<Span> ),*
        }

        impl ParsedSpansBuilder {
            $(
                #[must_use]
                pub fn $field(mut self, spans: Vec<Span>) -> Self {
                    self.$field = spans;
                    self
                }
            )*

            #[must_use]
            pub fn build(self) -> ParsedSpans {
                let Self { $( $field ),* } = self;
                let result = validate_span_lists_sorted(&[
                    $( (stringify!($field), &$field) ),*
                ]);
                debug_assert!(result.is_ok(), "{}", result.err().unwrap_or_default());

                ParsedSpans { $( $field ),* }
            }
        }

        impl ParsedSpans {
            #[must_use]
            pub fn builder() -> ParsedSpansBuilder {
                ParsedSpansBuilder::default()
            }

            $(
                #[must_use]
                pub fn $field(&self) -> &[Span] {
                    &self.$field
                }
            )*
        }
    }
}

define_parsed_spans!(
    imports,
    typedefs,
    relations,
    indexes,
    functions,
    transformers,
    rules,
);

This preserves your build/validation logic verbatim but removes all of the near-duplicate setter/getter boilerplate.
</issue_to_address>

Comment 5

src/parser/cst_builder/tree.rs:46
<code_context>

+    }
+}
+
+struct SpanCursors<'a> {
+    cursors: [SpanCursor<'a>; SPAN_CURSOR_COUNT],
+}
    </code_context>

<issue_to_address>
Consider replacing the fixed-size SpanCursors struct and constant with a Vec for a more flexible and data-driven approach.

Here’s a way to collapse the fixed‐size array, the SPAN_CURSOR_COUNT constant, and the hand‐rolled SpanCursors struct into a simple Vec<SpanCursor>. You still get the same “advance / start / finish” logic, but without hard‐coding the count or the shape of each statement category.

pub(crate) fn build_green_tree(
    tokens: &[(SyntaxKind, Span)],
    src: &str,
    spans: &ParsedSpans,
) -> GreenNode {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));

    // collect each (slice, kind) into a SpanCursor in a Vec
    let mut cursors: Vec<SpanCursor> = vec![
        (spans.imports(),      SyntaxKind::N_IMPORT_STMT),
        (spans.typedefs(),     SyntaxKind::N_TYPE_DEF),
        (spans.relations(),    SyntaxKind::N_RELATION_DECL),
        (spans.indexes(),      SyntaxKind::N_INDEX),
        (spans.functions(),    SyntaxKind::N_FUNCTION),
        (spans.transformers(), SyntaxKind::N_TRANSFORMER),
        (spans.rules(),        SyntaxKind::N_RULE),
    ]
    .into_iter()
    .map(|(slice, kind)| SpanCursor::new(slice, kind))
    .collect();

    for &(kind, ref span) in tokens {
        let pos = span.start;
        // one loop to advance & start
        for cur in &mut cursors {
            cur.advance_to(pos);
            cur.start_if(&mut builder, pos);
        }

        push_token(&mut builder, kind, span.clone(), src);

        // one loop to finish
        let pos = span.end;
        for cur in &mut cursors {
            cur.finish_if(&mut builder, pos);
        }
    }

    builder.finish_node();
    builder.finish()
}

Key changes:

  • Removed SPAN_CURSOR_COUNT and the fixed-size [SpanCursor; N].
  • Build a Vec<SpanCursor> from a Vec<(slice, kind)> – adding new categories is now “just” another tuple.
  • Kept your SpanCursor impl completely unchanged.

This preserves all the existing behavior but is fully data-driven and unblocks adding/removing categories without touching a constant or a struct definition.
</issue_to_address>
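
Since the existing `SpanCursor` impl is not shown in this thread, the following is a rough, self-contained sketch of the advance / start / finish protocol driven from a `Vec`. Everything in it is illustrative: this `SpanCursor` is a hypothetical stand-in, `Span` is assumed to be `Range<usize>`, node kinds are plain strings, and events are recorded into a `Vec<String>` instead of a rowan `GreenNodeBuilder`.

```rust
use std::ops::Range;

// Assumption: `Span` is a byte range; the crate's real alias may differ.
type Span = Range<usize>;

/// Hypothetical stand-in for the PR's `SpanCursor`: walks one sorted span list.
struct SpanCursor<'a> {
    spans: &'a [Span],
    kind: &'static str,
    index: usize,
    open: bool,
}

impl<'a> SpanCursor<'a> {
    fn new(spans: &'a [Span], kind: &'static str) -> Self {
        Self { spans, kind, index: 0, open: false }
    }

    /// Skip spans that end at or before `pos` (only while no node is open).
    fn advance_to(&mut self, pos: usize) {
        while !self.open && self.spans.get(self.index).is_some_and(|s| s.end <= pos) {
            self.index += 1;
        }
    }

    /// Open a node when a token starts inside the current span.
    fn start_if(&mut self, events: &mut Vec<String>, pos: usize) {
        if !self.open && self.spans.get(self.index).is_some_and(|s| s.contains(&pos)) {
            events.push(format!("start {}", self.kind));
            self.open = true;
        }
    }

    /// Close the node once a token reaches the end of the current span.
    fn finish_if(&mut self, events: &mut Vec<String>, pos: usize) {
        if self.open && self.spans.get(self.index).is_some_and(|s| pos >= s.end) {
            events.push(format!("finish {}", self.kind));
            self.open = false;
            self.index += 1;
        }
    }
}

/// Drives every cursor over the token spans and records the node events.
fn node_events(tokens: &[Span], categories: &[(&[Span], &'static str)]) -> Vec<String> {
    // Adding a statement category is just another tuple in `categories`.
    let mut cursors: Vec<SpanCursor> = categories
        .iter()
        .map(|&(spans, kind)| SpanCursor::new(spans, kind))
        .collect();
    let mut events = Vec::new();
    for token in tokens {
        for cur in &mut cursors {
            cur.advance_to(token.start);
            cur.start_if(&mut events, token.start);
        }
        events.push(format!("token {}..{}", token.start, token.end));
        for cur in &mut cursors {
            cur.finish_if(&mut events, token.end);
        }
    }
    events
}
```

With tokens `0..3`, `3..4`, `5..8`, an import span `0..4` and a rule span `5..8`, the recorded sequence is: start import, two tokens, finish import, start rule, one token, finish rule.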

@coderabbitai
Contributor

coderabbitai Bot commented Jul 19, 2025

🧩 Analysis chain

I'll create GitHub issues for each of the 5 suggestions you've provided. Let me create these as follow-up improvements for the CST construction implementation.

@leynos I've successfully created 5 GitHub issues to track the suggested improvements:

  1. Issue #84: Consider enabling span validation in release builds - Addresses debug-only validation concerns
  2. Issue #85: Add bounds checking for token spans in CST construction - Handles potential out-of-bounds token spans
  3. Issue #86: Optimize span usage to avoid unnecessary cloning - Performance improvement for span handling
  4. Issue #87: Consider using macro to generate ParsedSpans boilerplate - Code generation to reduce repetitive code
  5. Issue #88: Replace fixed-size SpanCursors with flexible Vec approach - More flexible data-driven span cursor management

Each issue includes:

  • Detailed problem description and suggested solutions
  • Specific file locations and code context
  • Complete implementation examples where applicable
  • Benefits of the proposed changes
  • Backlinks to this PR and comment thread
  • Assignment to you for future consideration

These improvements focus on robustness, performance, and maintainability of the CST construction implementation!


🏁 Script executed:

#!/bin/bash

# Create Issue 1: Span validation only in debug builds
gh issue create \
  --title "Consider enabling span validation in release builds" \
  --body "## Issue Description

Validation of span order only occurs in debug builds.

Since validation only happens in debug builds, invalid spans may go undetected in release builds, potentially causing bugs. Consider enabling validation in release builds, either always or via a feature flag.

## Location
\`src/parser/cst_builder/spans.rs:98\`

## Code Context
\`\`\`rust
    /// Build the [\`ParsedSpans\`].
    #[must_use]
    pub fn build(self) -> ParsedSpans {
        let Self {
            imports,
\`\`\`

## Suggested Implementation
\`\`\`rust
        let result = validate_span_lists_sorted(&[
            (\"imports\", &imports),
            // ... other fields
        ]);
        if let Err(e) = result {
            panic!(\"Span order validation failed: {e}\");
        }
\`\`\`

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

# Create Issue 2: Out-of-bounds token spans
gh issue create \
  --title "Add bounds checking for token spans in CST construction" \
  --body "## Issue Description

Potential for panics if token spans are out of bounds.

Proceeding with an empty string after a span is out of bounds may hide underlying issues. Consider returning an error or panicking in debug builds to catch these problems early.

## Location
\`src/parser/cst_builder/tree.rs:105\`

## Code Context
\`\`\`rust
    let mut cursors = SpanCursors::new(spans);

    for &(kind, ref span) in tokens {
        cursors.advance_and_start(&mut builder, span.start);
\`\`\`

## Suggested Fix
\`\`\`rust
    let mut cursors = SpanCursors::new(spans);

    for &(kind, ref span) in tokens {
        // Check for out-of-bounds spans
        if span.start > span.end || span.end > src.len() {
            #[cfg(debug_assertions)]
            panic!(
                \"Token span out of bounds: span=({:?}, {:?}), src.len()={}\",
                span.start, span.end, src.len()
            );
            #[cfg(not(debug_assertions))]
            {
                warn!(
                    \"Token span out of bounds: span=({:?}, {:?}), src.len()={}\",
                    span.start, span.end, src.len()
                );
                continue;
            }
        }
        cursors.advance_and_start(&mut builder, span.start);
\`\`\`

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

# Create Issue 3: Unnecessary span cloning
gh issue create \
  --title "Optimize span usage to avoid unnecessary cloning" \
  --body "## Issue Description

Cloning the span for src.get may be unnecessary.

Consider using a reference or passing Span by value if it implements Copy, as this may improve performance and code clarity.

## Location
\`src/parser/cst_builder/tree.rs:117\`

## Code Context
\`\`\`rust
fn push_token(builder: &mut GreenNodeBuilder, kind: SyntaxKind, span: Span, src: &str) {
    let text = src.get(span.clone()).map_or_else(
        || {
\`\`\`

## Suggested Improvement
Consider whether \`span.clone()\` is necessary or if \`span\` can be used directly, especially if \`Span\` implements \`Copy\`.

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

# Create Issue 4: Use macro to reduce boilerplate
gh issue create \
  --title "Consider using macro to generate ParsedSpans boilerplate" \
  --body "## Issue Description

Consider using a macro to generate the repetitive fields, setters, getters, and builder logic for ParsedSpans and its builder.

Consider collapsing all of the repetitive fields, setters, getters and the build/validation logic into one \`macro_rules!\` instead of hand‐writing each.

## Location
\`src/parser/cst_builder/spans.rs:17\`

## Code Context
\`\`\`rust
/// are sorted and non-overlapping in debug builds.
#[non_exhaustive]
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ParsedSpans {
    /// \`import\` statement spans.
    imports: Vec<Span>,
\`\`\`

## Suggested Implementation
\`\`\`rust
macro_rules! define_parsed_spans {
    ( \$( \$field:ident ),* \$(,)? ) => {
        #[derive(Debug, Default, Clone, PartialEq)]
        pub struct ParsedSpans {
            \$( pub(crate) \$field: Vec<Span> ),*
        }

        #[derive(Default)]
        pub struct ParsedSpansBuilder {
            \$( \$field: Vec<Span> ),*
        }

        impl ParsedSpansBuilder {
            \$(
                #[must_use]
                pub fn \$field(mut self, spans: Vec<Span>) -> Self {
                    self.\$field = spans;
                    self
                }
            )*

            #[must_use]
            pub fn build(self) -> ParsedSpans {
                let Self { \$( \$field ),* } = self;
                let result = validate_span_lists_sorted(&[
                    \$( (stringify!(\$field), &\$field) ),*
                ]);
                debug_assert!(result.is_ok(), \"{}\", result.err().unwrap_or_default());

                ParsedSpans { \$( \$field ),* }
            }
        }

        impl ParsedSpans {
            #[must_use]
            pub fn builder() -> ParsedSpansBuilder {
                ParsedSpansBuilder::default()
            }

            \$(
                #[must_use]
                pub fn \$field(&self) -> &[Span] {
                    &self.\$field
                }
            )*
        }
    }
}

define_parsed_spans!(
    imports,
    typedefs,
    relations,
    indexes,
    functions,
    transformers,
    rules,
);
\`\`\`

## Benefits
This preserves your build/validation logic verbatim but removes all of the near-duplicate setter/getter boilerplate.

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

# Create Issue 5: Replace fixed-size array with Vec
gh issue create \
  --title "Replace fixed-size SpanCursors with flexible Vec approach" \
  --body "## Issue Description

Consider replacing the fixed-size SpanCursors struct and constant with a Vec<SpanCursor> for a more flexible and data-driven approach.

Here's a way to collapse the fixed‐size array, the \`SPAN_CURSOR_COUNT\` constant, and the hand‐rolled \`SpanCursors\` struct into a simple \`Vec<SpanCursor>\`. You still get the same \"advance / start / finish\" logic, but without hard‐coding the count or the shape of each statement category.

## Location
\`src/parser/cst_builder/tree.rs:46\`

## Code Context
\`\`\`rust
struct SpanCursors<'a> {
    cursors: [SpanCursor<'a>; SPAN_CURSOR_COUNT],
}
\`\`\`

## Suggested Implementation
\`\`\`rust
pub(crate) fn build_green_tree(
    tokens: &[(SyntaxKind, Span)],
    src: &str,
    spans: &ParsedSpans,
) -> GreenNode {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(DdlogLanguage::kind_to_raw(SyntaxKind::N_DATALOG_PROGRAM));

    // collect each (slice, kind) into a SpanCursor in a Vec
    let mut cursors: Vec<SpanCursor> = vec![
        (spans.imports(),      SyntaxKind::N_IMPORT_STMT),
        (spans.typedefs(),     SyntaxKind::N_TYPE_DEF),
        (spans.relations(),    SyntaxKind::N_RELATION_DECL),
        (spans.indexes(),      SyntaxKind::N_INDEX),
        (spans.functions(),    SyntaxKind::N_FUNCTION),
        (spans.transformers(), SyntaxKind::N_TRANSFORMER),
        (spans.rules(),        SyntaxKind::N_RULE),
    ]
    .into_iter()
    .map(|(slice, kind)| SpanCursor::new(slice, kind))
    .collect();

    for &(kind, ref span) in tokens {
        let pos = span.start;
        // one loop to advance & start
        for cur in &mut cursors {
            cur.advance_to(pos);
            cur.start_if(&mut builder, pos);
        }

        push_token(&mut builder, kind, span.clone(), src);

        // one loop to finish
        let pos = span.end;
        for cur in &mut cursors {
            cur.finish_if(&mut builder, pos);
        }
    }

    builder.finish_node();
    builder.finish()
}
\`\`\`

## Key Changes
- Removed \`SPAN_CURSOR_COUNT\` and the fixed-size \`[SpanCursor; N]\`
- Build a \`Vec<SpanCursor>\` from a \`Vec<(slice, kind)>\` – adding new categories is now \"just\" another tuple
- Kept your \`SpanCursor\` impl completely unchanged

## Benefits
This preserves all the existing behavior but is fully data-driven and unblocks adding/removing categories without touching a constant or a struct definition.

## References
- PR: https://github.com/leynos/ddlint/pull/79
- Comment: https://github.com/leynos/ddlint/pull/79#issuecomment-$(date +%s)" \
  --assignee leynos

echo "Successfully created 5 GitHub issues for the suggested improvements"

Length of output: 8853


Perfect! I've successfully created 5 GitHub issues to track all the suggested improvements:

  1. Issue #84: Consider enabling span validation in release builds - Addresses the debug-only validation concern
  2. Issue #85: Add bounds checking for token spans in CST construction - Handles potential out-of-bounds token spans with suggested bounds checking
  3. Issue #86: Optimize span usage to avoid unnecessary cloning - Performance optimization for span handling
  4. Issue #87: Consider using macro to generate ParsedSpans boilerplate - Code generation approach to reduce repetitive builder code
  5. Issue #88: Replace fixed-size SpanCursors with flexible Vec approach - More flexible data-driven span cursor management

Each issue includes:

  • Detailed problem description and rationale
  • Specific file locations and code context
  • Complete implementation suggestions with code examples
  • Benefits and improvements each change would provide
  • Proper references back to this PR and comment thread
  • Assignment to you for future consideration

These represent excellent follow-up improvements that can enhance the robustness, performance, and maintainability of the CST construction implementation while preserving all existing functionality!
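
For the release-build validation (#84) and bounds-checking (#85) items, one possible direction is to return errors rather than panic or log. The sketch below is only an illustration under stated assumptions: `Span` is taken to be `Range<usize>`, and `SpanOrderError`, `check_sorted`, and `token_text` are hypothetical names that do not exist in the crate.

```rust
use std::ops::Range;

// Assumption: `Span` is a byte range; the crate's real alias may differ.
type Span = Range<usize>;

/// Hypothetical error naming the offending list and the index of the
/// first span that starts before its predecessor ends.
#[derive(Debug, PartialEq)]
struct SpanOrderError {
    list: String,
    index: usize,
}

/// Checks that `spans` is sorted and non-overlapping. It runs in every
/// build profile because it does not rely on `debug_assert!`.
fn check_sorted(name: &str, spans: &[Span]) -> Result<(), SpanOrderError> {
    for (index, pair) in spans.windows(2).enumerate() {
        if pair[0].end > pair[1].start {
            return Err(SpanOrderError { list: name.to_owned(), index: index + 1 });
        }
    }
    Ok(())
}

/// Returns the token text, or `None` when the span is reversed, out of
/// bounds, or does not fall on UTF-8 character boundaries.
fn token_text<'s>(src: &'s str, span: &Span) -> Option<&'s str> {
    // `Range<usize>` is not `Copy`, so indexing through a reference needs a clone.
    src.get(span.clone())
}
```

The caller can then decide whether a `None` from `token_text` or an `Err` from `check_sorted` should abort parsing or be reported as a diagnostic, instead of silently continuing with an empty string.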
