Skip to content

Use parsers for relation clauses#124

Merged
leynos merged 15 commits intomainfrom
codex/replace-manual-scanning-with-chumsky-parsers
Aug 23, 2025
Merged

Use parsers for relation clauses#124
leynos merged 15 commits intomainfrom
codex/replace-manual-scanning-with-chumsky-parsers

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Aug 19, 2025

Summary

  • parse relation column lists and primary key clauses with chumsky
  • validate relation declarations via SpanCollector error handling
  • test relation parsing error cases

Testing

  • make fmt
  • make lint
  • make test

https://chatgpt.com/codex/tasks/task_e_68a4f8042954832293acb004414be087

Summary by Sourcery

Use chumsky-based parsers for relation column lists and primary key clauses in span collection, streamline error handling, and add tests for related parsing error cases.

New Features:

  • Add relation_columns parser for non-empty column lists in relation definitions
  • Add primary_key_clause parser for parsing primary key clauses

Enhancements:

  • Replace manual span-scanning logic in collect_relation_spans with the new parsers and centralized error handling
  • Remove now-unused paren_block_span export from parse_utils

Tests:

  • Add fixtures and tests to verify error cases for empty relation column lists
  • Add fixtures and tests to verify error cases for invalid primary key clauses

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Aug 19, 2025

Reviewer's Guide

This PR replaces ad-hoc scanning of relation clauses with dedicated chumsky parsers for column lists and primary key clauses, refactors span collection to leverage these parsers with structured error handling, adds targeted tests for empty-column and invalid-primary-key cases, and removes an obsolete export.

Class diagram for updated relation clause parsing

classDiagram
    class SpanCollector {
        +stream: Stream
        +extra: Extras
        +spans: Vec<Span>
        +parse_span(parser, start)
        +skip_line()
    }
    class relation_columns {
        +parse(tokens: Stream) Span | Error
    }
    class primary_key_clause {
        +parse(tokens: Stream) Span | Error
    }
    SpanCollector --> relation_columns : uses
    SpanCollector --> primary_key_clause : uses
Loading

Flow diagram for relation clause parsing and error handling

flowchart TD
    A[Start relation clause parsing] --> B{Parse relation columns}
    B -- Success --> C{Check for primary key clause}
    B -- Error --> F[Record error, skip line]
    C -- Found --> D{Parse primary key clause}
    D -- Success --> E[Record relation span]
    D -- Error --> F
    C -- Not found --> E
    E --> G[End]
    F --> G
Loading

File-Level Changes

Change Details Files
Introduce chumsky-based parsers for relation columns and primary key clauses
  • Add relation_columns() parser using inline whitespace, identifiers, and non-empty balanced blocks
  • Implement primary_key_clause() parser with custom primary and key keyword filters and balanced block parsing
  • Include inline examples and error cases for both parsers
src/parser/span_scanner.rs
Refactor collect_relation_spans to use new parsers and improve error handling
  • Remove manual skip_relation_columns, skip_primary_key_clause, and consume_paren_block functions
  • Use st.parse_span with relation_columns and primary_key_clause to parse spans
  • Streamline span recording logic and propagate parser errors via SpanCollector
src/parser/span_scanner.rs
Add tests for new relation parsing error scenarios
  • Define fixtures for empty column lists and invalid primary key syntax
  • Implement rstest cases asserting errors and no relations in parsed output
src/parser/tests/parser.rs
Remove unused paren_block_span export
  • Delete paren_block_span re-export from parse_utils mod
src/parser/ast/parse_utils/mod.rs

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Aug 19, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Summary by CodeRabbit

  • New Features

    • Added a nested, streaming delimiter extractor with improved error reporting (includes collected text on failure).
  • Bug Fixes

    • Relation parsing is stricter: invalid or incomplete definitions (empty/whitespace column lists, malformed primary key clauses) no longer produce relation entries.
  • Deprecation

    • Renamed public API to extract_delimited; extract_parenthesized is deprecated but still available as an alias.
  • Documentation

    • Updated guides and diagrams to reflect the new delimiter API and error model.
  • Tests

    • Added cases for invalid relation inputs to ensure errors are reported and no relations are emitted.

Walkthrough

Replace parser-combinator parenthesised extraction with a streaming extract_delimited that iterates SyntaxElement<DdlogLanguage> to collect inner text with nesting, return UnclosedDelimiterError on failures, re-export it (with a deprecated alias), refactor span scanning to parse column lists and optional primary-key clauses with explicit error accumulation, and add tests/docs updates.

Changes

Cohort / File(s) Change summary
Delimiter extractor
src/parser/ast/parse_utils/delimiter.rs
Replace parser-combinator paren_block_span with #[must_use] pub fn extract_delimited<I>(iter: &mut Peekable<I>, open_kind, close_kind) -> Result<String, UnclosedDelimiterError> that scans SyntaxElement stream, balances nested delimiters using a depth counter, collects inner text via a shared buffer helper, and returns UnclosedDelimiterError { collected, expected } on EOF. Add Display/Error impls and remove old crate-private parser API.
Parse utils re-exports
src/parser/ast/parse_utils/mod.rs
Export UnclosedDelimiterError and extract_delimited; add #[deprecated(...)] pub use delimiter::extract_delimited as extract_parenthesized alias for backward compatibility; remove paren_block_span export.
Span scanner refactor
src/parser/span_scanner.rs
Add helpers relation_columns, keyword, primary_key_clause; refactor collect_relation_spans to use record_relation flow that parses column lists and optional PK clauses, accumulate errors in Extras<'a>, and record relation spans only when column (and PK, if present) parse succeeds. Remove prior ad-hoc scanning helpers.
Relation parsing call sites
src/parser/ast/relation.rs, src/syntax_utils.rs
Replace extract_parenthesized calls with extract_delimited (LPAREN/RPAREN). Adjust local error handling where UnclosedDelimiterError exposes collected text.
Tests
src/parser/tests/parser.rs
Add three fixtures for invalid relation inputs and a parameterised test asserting parsing yields errors and no relations; note duplicates were inserted.
Documentation
docs/function-parsing-design.md
Update examples and diagrams to reference extract_delimited (document alias), revise diagram to Delim/DelimStack model and adjust text about unclosed-delimiter errors and span reporting.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller as Parser code
  participant Iter as Peekable<Iterator<SyntaxElement>>
  participant Error as UnclosedDelimiterError

  Caller->>Iter: call extract_delimited(open, close)
  Note right of Iter: iterate elements\npeek / next
  Iter-->>Iter: on open => depth += 1
  loop scan elements
    Iter->>Iter: consume element
    alt element == open
      Iter-->>Iter: depth += 1
    else element == close
      Iter-->>Iter: depth -= 1
      alt depth == 0
        Iter-->>Caller: return Ok(collected_text)
      end
    else
      Iter-->>Iter: append element text to buffer
    end
  end
  %% EOF reached without matching close
  Iter-->>Error: build UnclosedDelimiterError { collected, expected }
  Error-->>Caller: return Err(UnclosedDelimiterError)
Loading
sequenceDiagram
  autonumber
  participant Scanner as collect_relation_spans
  participant Extras as Extras<'a'>
  participant Parser as relation_columns / primary_key_clause

  Scanner->>Extras: create extras (src, errors)
  Scanner->>Parser: call relation_columns()
  alt columns parsed successfully
    Parser-->>Scanner: span_of_columns
    Scanner->>Parser: if next ident == "primary" then primary_key_clause()
    alt primary parsed successfully or absent
      Parser-->>Scanner: optional_pk_span or none
      Scanner->>Scanner: push relation span (start..end)
    else primary parse failed
      Parser-->>Extras: push pk error
      Scanner->>Scanner: skip line
    end
  else columns parse failed
    Parser-->>Extras: push column parse error
    Scanner->>Scanner: skip line
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Poem

A delimiter tracks the tokens' song,
Depth rises, nesting won't stay long.
Collected text held safe from loss,
Spans now strict — let tests emboss.
Parsers march onward; errors sing along.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/replace-manual-scanning-with-chumsky-parsers

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

codescene-delta-analysis[bot]

This comment was marked as outdated.

sourcery-ai[bot]

This comment was marked as resolved.

codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/parser/span_scanner.rs (1)

291-323: Simplify record_relation by composing parsers instead of manual peeking/flags

Compose relation_columns with an optional primary_key_clause and parse them in one shot. This removes the mutable ok flag and duplicated whitespace handling, and reduces cognitive load. This has been highlighted previously; address it now.

Apply this diff:

-    fn record_relation(st: &mut State<'_>, start: usize) {
-        let cols = relation_columns();
-
-        let (cols_res, cols_err) = st.parse_span(cols, start);
-        if let Some(sp) = cols_res {
-            st.stream.skip_until(sp.end);
-            st.stream.skip_ws_inline();
-            let mut ok = true;
-            if let Some((SyntaxKind::T_IDENT, pk_span)) = st.stream.peek().cloned()
-                && st.extra.src.get(pk_span.clone()) == Some("primary")
-            {
-                let (pk_res, pk_err) =
-                    st.parse_span(primary_key_clause(st.extra.src), pk_span.start);
-                if let Some(pk_sp) = pk_res {
-                    st.stream.skip_until(pk_sp.end);
-                } else {
-                    st.extra.errors.extend(pk_err);
-                    ok = false;
-                }
-            }
-
-            if ok {
-                let end = st.stream.line_end(st.stream.cursor());
-                st.stream.skip_until(end);
-                st.spans.push(start..end);
-            } else {
-                st.skip_line();
-            }
-        } else {
-            st.extra.errors.extend(cols_err);
-            st.skip_line();
-        }
-    }
+    fn record_relation(st: &mut State<'_>, start: usize) {
+        let parser = relation_columns()
+            .then(primary_key_clause(st.extra.src).or_not())
+            .map_with_span(|_, sp: Span| sp);
+
+        let (res, err) = st.parse_span(parser, start);
+        if let Some(sp) = res {
+            st.stream.skip_until(sp.end);
+            st.stream.skip_ws_inline();
+            let end = st.stream.line_end(st.stream.cursor());
+            st.stream.skip_until(end);
+            st.spans.push(start..end);
+        } else {
+            st.extra.extend(err);
+            st.skip_line();
+        }
+    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • Jira integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ce39ef3 and 7d47556.

📒 Files selected for processing (4)
  • src/parser/ast/parse_utils/delimiter.rs (1 hunks)
  • src/parser/ast/parse_utils/mod.rs (0 hunks)
  • src/parser/span_scanner.rs (3 hunks)
  • src/parser/tests/parser.rs (2 hunks)
💤 Files with no reviewable changes (1)
  • src/parser/ast/parse_utils/mod.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit Inference Engine (AGENTS.md)

**/*.rs: Keep any single Rust source file under 400 lines; split long switches/dispatch tables and move large test data to external files
Disallow Clippy warnings; do not silence lints except as a last resort
Any lint suppressions must be tightly scoped and include a clear reason; prefer expect over allow
Fix warnings emitted during tests in code instead of silencing them
Extract helper functions when functions are too long; group parameters into structs when they are too many
If returning a large error, consider using Arc to reduce returned data
Write unit and behavioural tests for new functionality and run before/after changes
Prefer immutable data; avoid unnecessary mut bindings
Handle errors with Result instead of panicking where feasible
Avoid unsafe code unless absolutely necessary and document any usage clearly
Place function attributes after doc comments
Do not use return in single-line functions
Use predicate functions for conditional criteria with more than two branches
Prefer .expect() over .unwrap()
Use concat!() for long string literals rather than escaping newlines with a backslash

Files:

  • src/parser/tests/parser.rs
  • src/parser/ast/parse_utils/delimiter.rs
  • src/parser/span_scanner.rs

⚙️ CodeRabbit Configuration File

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Files:

  • src/parser/tests/parser.rs
  • src/parser/ast/parse_utils/delimiter.rs
  • src/parser/span_scanner.rs
src/**/*.rs

📄 CodeRabbit Inference Engine (AGENTS.md)

src/**/*.rs: Every module must begin with a module-level //! comment explaining purpose and utility
Document public APIs with /// Rustdoc comments so cargo doc can generate docs

Files:

  • src/parser/tests/parser.rs
  • src/parser/ast/parse_utils/delimiter.rs
  • src/parser/span_scanner.rs
🧬 Code Graph Analysis (2)
src/parser/tests/parser.rs (4)
src/parser/mod.rs (1)
  • parse (41-49)
src/parser/cst_builder/mod.rs (2)
  • errors (51-53)
  • root (45-47)
src/parser/ast/parse_utils/errors.rs (1)
  • is_empty (144-146)
src/parser/cst_builder/spans.rs (2)
  • relations (66-69)
  • relations (168-170)
src/parser/span_scanner.rs (2)
src/parser/lexer_helpers.rs (3)
  • inline_ws (98-101)
  • ident (111-115)
  • balanced_block_nonempty (195-200)
src/parser/token_stream.rs (1)
  • src (118-120)
🔇 Additional comments (4)
src/parser/tests/parser.rs (1)

116-130: Good fixtures for invalid relation forms

Add fixtures for empty columns, whitespace-only columns, and an invalid primary key clause. This matches earlier review feedback and improves coverage.

src/parser/span_scanner.rs (3)

1-1: Add required module-level docs

Document the module’s purpose at the top. This satisfies the repo’s documentation guideline for //! headers.


227-245: Keyword parser with contextual errors looks good

Match identifiers by source text and include the found token text and span in the error. This improves diagnostics without complicating the parser combinators.


269-277: Primary key clause parser is clear and precise

Enforce primary key followed by a non-empty parenthesised list. The composition is readable and integrates with the shared balanced_block_nonempty helper.

Comment thread src/parser/ast/parse_utils/delimiter.rs
Comment thread src/parser/span_scanner.rs Outdated
Comment thread src/parser/tests/parser.rs Outdated
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/parser/ast/parse_utils/delimiter.rs (3)

128-145: Eliminate per-node heap allocations; push text directly into the buffer

Avoid allocating String for node text on every element. Push token/node text directly into buf. This reduces allocations and improves throughput on large trees.

Apply:

-fn process_element(
-    e: &SyntaxElement<DdlogLanguage>,
-    ctx: &mut DelimiterParseContext<'_>,
-) -> ElementResult {
-    let text = element_text(e);
-    match e.kind() {
-        k if k == ctx.open_kind => {
-            *ctx.depth += 1;
-            ctx.buf.push_str(text.as_ref());
-            ElementResult::Continue
-        }
-        k if k == ctx.close_kind => handle_close_delimiter(ctx, text.as_ref()),
-        _ => {
-            ctx.buf.push_str(text.as_ref());
-            ElementResult::Continue
-        }
-    }
-}
+fn process_element(
+    e: &SyntaxElement<DdlogLanguage>,
+    ctx: &mut DelimiterParseContext<'_>,
+) -> ElementResult {
+    match e.kind() {
+        k if k == ctx.open_kind => {
+            *ctx.depth += 1;
+            push_element_text(ctx.buf, e);
+            ElementResult::Continue
+        }
+        k if k == ctx.close_kind => {
+            let s = match e {
+                SyntaxElement::Token(t) => t.text(),
+                SyntaxElement::Node(n) => n.text().as_str(), // Node will not be a closer token
+            };
+            handle_close_delimiter(ctx, s)
+        }
+        _ => {
+            push_element_text(ctx.buf, e);
+            ElementResult::Continue
+        }
+    }
+}
 
-use std::borrow::Cow;
-
-fn element_text(e: &SyntaxElement<DdlogLanguage>) -> Cow<'_, str> {
-    match e {
-        SyntaxElement::Token(t) => Cow::Borrowed(t.text()),
-        SyntaxElement::Node(n) => Cow::Owned(n.text().to_string()),
-    }
-}
+fn push_element_text(buf: &mut String, e: &SyntaxElement<DdlogLanguage>) {
+    match e {
+        SyntaxElement::Token(t) => buf.push_str(t.text()),
+        SyntaxElement::Node(n) => buf.push_str(n.text().as_str()),
+    }
+}

Also applies to: 157-163


1-6: Align module docs with British English and current API surface

Keep “parenthesised” (British) and mention that the module handles arbitrary delimiter pairs, not only parentheses.

Apply:

-//! This module provides functions to parse parenthesised blocks and extract
-//! text content from within balanced delimiters, handling nested structures
-//! correctly.
+//! This module provides functions to extract text from within balanced, nested
+//! delimiters (e.g., parentheses), handling arbitrary opening/closing kinds.

62-87: Distinguish missing opening delimiter from unclosed

Return a distinct error when no opening delimiter is found instead of conflating it with an unclosed delimiter. This prevents callers that treat any Err as “absent” from silently swallowing real parse errors.

Points of attention:

  • src/parser/ast/parse_utils/delimiter.rs – update skip_to_opening_delimiter and extract_delimited
  • src/parser/ast/relation.rs – in fn extract_key_list, handle missing‐open vs unclosed cases instead of .ok()? for both

Proposed changes:

 fn skip_to_opening_delimiter<I>(iter: &mut Peekable<I>, open_kind: SyntaxKind) -> bool
 where
     I: Iterator<Item = SyntaxElement<DdlogLanguage>>,
 {
-    for e in iter.by_ref() {
-        if e.kind() == open_kind {
-            break;
-        }
-    }
-}
+    for e in iter.by_ref() {
+        if e.kind() == open_kind {
+            return true;
+        }
+    }
+    false
+}

@@ pub fn extract_delimited<I>(
-    skip_to_opening_delimiter(iter, open_kind);
+    if !skip_to_opening_delimiter(iter, open_kind) {
+        return Err(UnclosedDelimiterError {
+            collected: String::new(),
+            expected: close_kind,
+        });
+    }

Then, in src/parser/ast/relation.rs (inside extract_key_list), update:

-    let content = super::parse_utils::extract_delimited(iter, T_LPAREN, T_RPAREN).ok()?;
+    match super::parse_utils::extract_delimited(iter, T_LPAREN, T_RPAREN) {
+        Ok(text) => text,
+        Err(UnclosedDelimiterError { collected, expected }) => {
+            // handle unclosed delimiter as a real error
+            panic!("unclosed delimiter {}; got {:?}", expected, collected);
+        }
+        // missing opening delimiter yields None:
+        Err(_) => return None,
+    };
src/parser/ast/relation.rs (1)

98-119: Remove bespoke parenthesis skipping; reuse the shared extractor

Reduce duplicate delimiter logic by reusing extract_delimited to advance past the columns list. This centralises balancing behaviour and keeps semantics consistent with other consumers.

Example refactor:

-    fn skip_to_end_of_columns<I>(iter: &mut std::iter::Peekable<I>) -> Option<()>
+    fn skip_to_end_of_columns<I>(iter: &mut std::iter::Peekable<I>) -> Option<()>
     where
         I: Iterator<Item = rowan::SyntaxElement<DdlogLanguage>>,
     {
-        let mut depth = 0usize;
-        for e in iter.by_ref() {
-            match e.kind() {
-                SyntaxKind::T_LPAREN => depth += 1,
-                SyntaxKind::T_RPAREN => {
-                    if depth == 0 {
-                        return None;
-                    }
-                    depth -= 1;
-                    if depth == 0 {
-                        return Some(());
-                    }
-                }
-                _ => {}
-            }
-        }
-        None
+        // Consume the first parenthesised list and discard content.
+        super::skip_whitespace_and_comments(iter);
+        let _ = super::parse_utils::extract_delimited(iter, SyntaxKind::T_LPAREN, SyntaxKind::T_RPAREN).ok()?;
+        Some(())
     }
src/parser/span_scanner.rs (1)

1-6: Split this module; it exceeds the 400-line limit

Enforce the repository guideline (“Files must not exceed 400 lines”). Extract per-declaration collectors into dedicated modules (e.g., import.rs, typedef.rs, relation.rs, index.rs, function.rs, transformer.rs, rules.rs) and re-export from a small facade.

♻️ Duplicate comments (2)
src/parser/tests/parser.rs (1)

563-571: Parametrise invalid-relation assertions; ensure no duplicate blocks remain

Keep this parameterised test. Verify that earlier duplicate test blocks were removed to avoid redundant execution and longer CI times.

Run:

#!/bin/bash
# Fail if duplicated fixtures or tests exist.
set -euo pipefail
echo "Searching for duplicate invalid-relation fixtures/tests..."
rg -nP 'fn\s+relation_empty_columns\s*\(\)|fn\s+relation_whitespace_columns\s*\(\)|fn\s+relation_invalid_pk\s*\(\)' -C1
rg -nP 'fn\s+relation_invalid_is_error\s*\(' -C1
src/parser/span_scanner.rs (1)

291-321: Collapse manual control flow in record_relation into a single composed parser

Remove ad-hoc peeking, flags and duplicated whitespace management. Compose relation_columns() with optional primary_key_clause(..).or_not() and parse in one pass. This preserves behaviour (optional PK) while simplifying control flow and centralising error collection.

Apply this diff:

-    fn record_relation(st: &mut State<'_>, start: usize) {
-        let cols = relation_columns();
-
-        let (cols_res, cols_err) = st.parse_span(cols, start);
-        if let Some(sp) = cols_res {
-            st.stream.skip_until(sp.end);
-            st.stream.skip_ws_inline();
-            let mut ok = true;
-            if let Some((SyntaxKind::T_IDENT, pk_span)) = st.stream.peek().cloned()
-                && st.extra.src.get(pk_span.clone()) == Some("primary")
-            {
-                let (pk_res, pk_err) =
-                    st.parse_span(primary_key_clause(st.extra.src), pk_span.start);
-                if let Some(pk_sp) = pk_res {
-                    st.stream.skip_until(pk_sp.end);
-                } else {
-                    st.extra.errors.extend(pk_err);
-                    ok = false;
-                }
-            }
-
-            if ok {
-                let end = st.stream.line_end(st.stream.cursor());
-                st.stream.skip_until(end);
-                st.spans.push(start..end);
-            } else {
-                st.skip_line();
-            }
-        } else {
-            st.extra.errors.extend(cols_err);
-            st.skip_line();
-        }
-    }
+    fn record_relation(st: &mut State<'_>, start: usize) {
+        // Parse columns followed by an optional `primary key (...)` clause.
+        // `or_not()` yields `None` without error if `primary` is not present; if `primary` is present
+        // but malformed, the inner parser emits errors, matching the current behaviour.
+        let parser = relation_columns()
+            .then(primary_key_clause(st.extra.src).or_not())
+            .map_with_span(|_, sp: Span| sp);
+
+        let (res, errs) = st.parse_span(parser, start);
+        st.extra.errors.extend(errs);
+
+        if let Some(sp) = res {
+            let end = st.stream.line_end(sp.end);
+            st.stream.skip_until(end);
+            st.spans.push(start..end);
+        } else {
+            st.skip_line();
+        }
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • Jira integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d47556 and 769f1d4.

📒 Files selected for processing (7)
  • docs/function-parsing-design.md (1 hunks)
  • src/parser/ast/parse_utils/delimiter.rs (6 hunks)
  • src/parser/ast/parse_utils/mod.rs (1 hunks)
  • src/parser/ast/relation.rs (1 hunks)
  • src/parser/span_scanner.rs (3 hunks)
  • src/parser/tests/parser.rs (2 hunks)
  • src/syntax_utils.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Keep any single Rust source file under 400 lines; split long switches/dispatch tables and move large test data to external files
Disallow Clippy warnings; do not silence lints except as a last resort
Any lint suppressions must be tightly scoped and include a clear reason; prefer expect over allow
Fix warnings emitted during tests in code instead of silencing them
Extract helper functions when functions are too long; group parameters into structs when they are too many
If returning a large error, consider using Arc to reduce returned data
Write unit and behavioural tests for new functionality and run before/after changes
Prefer immutable data; avoid unnecessary mut bindings
Handle errors with Result instead of panicking where feasible
Avoid unsafe code unless absolutely necessary and document any usage clearly
Place function attributes after doc comments
Do not use return in single-line functions
Use predicate functions for conditional criteria with more than two branches
Prefer .expect() over .unwrap()
Use concat!() for long string literals rather than escaping newlines with a backslash

Files:

  • src/syntax_utils.rs
  • src/parser/span_scanner.rs
  • src/parser/tests/parser.rs
  • src/parser/ast/parse_utils/mod.rs
  • src/parser/ast/relation.rs
  • src/parser/ast/parse_utils/delimiter.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Files:

  • src/syntax_utils.rs
  • src/parser/span_scanner.rs
  • src/parser/tests/parser.rs
  • src/parser/ast/parse_utils/mod.rs
  • src/parser/ast/relation.rs
  • src/parser/ast/parse_utils/delimiter.rs
src/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.rs: Every module must begin with a module-level //! comment explaining purpose and utility
Document public APIs with /// Rustdoc comments so cargo doc can generate docs

Files:

  • src/syntax_utils.rs
  • src/parser/span_scanner.rs
  • src/parser/tests/parser.rs
  • src/parser/ast/parse_utils/mod.rs
  • src/parser/ast/relation.rs
  • src/parser/ast/parse_utils/delimiter.rs
docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

docs/**/*.md: Use docs/ as the reference source of truth; proactively update docs when requirements, dependencies, or architecture change
Documentation in docs/ must use en-GB-oxendict spelling and grammar

docs/**/*.md: The word “outwith” is acceptable in documentation
Keep US spelling when it appears in API names (e.g., color)
Use the Oxford comma
Treat company names as collective nouns (e.g., “Lille Industries are…”)
Write headings in sentence case
Use Markdown headings (#, ##, ###, …) in order without skipping levels
Follow markdownlint recommendations
Use standard Markdown syntax for code blocks and lists
Always provide a language identifier for fenced code blocks; use plaintext for non-code text
Use '-' as the first-level bullet and renumber ordered lists when items change
Prefer inline links text or angle brackets
Ensure blank lines before and after bulleted lists and fenced code blocks
Ensure tables have a delimiter line below the header row
Expand uncommon acronyms on first use (e.g., Continuous Integration (CI))
Wrap paragraphs at 80 columns
Wrap code (in docs) at 120 columns
Do not wrap tables
Use footnotes referenced with [^label]
Include Mermaid diagrams where they add clarity
Embed figures using Markdown image syntax alt text and provide brief, descriptive alt text
Add a short descriptive sentence before each Mermaid diagram for screen readers

Files:

  • docs/function-parsing-design.md
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Validate Markdown with make markdownlint and run make fmt to format Markdown (including tables) after changes
Validate Mermaid diagrams in Markdown by running make nixie
Wrap Markdown paragraphs and bullet points at 80 columns; code blocks at 120; do not wrap tables/headings; use dashes for list bullets; use GFM footnotes

Files:

  • docs/function-parsing-design.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/function-parsing-design.md
🧬 Code graph analysis (1)
src/parser/tests/parser.rs (4)
src/parser/mod.rs (1)
  • parse (41-49)
src/parser/cst_builder/mod.rs (2)
  • errors (51-53)
  • root (45-47)
src/parser/ast/parse_utils/errors.rs (1)
  • is_empty (144-146)
src/parser/cst_builder/spans.rs (2)
  • relations (66-69)
  • relations (168-170)
🔍 Remote MCP

Here are a few concrete details pulled from the Pull Request’s own summary to aid your review:

• In src/parser/ast/parse_utils/delimiter.rs, a new public function

#[must_use = "discarding the extracted text loses delimiter content"]
pub fn extract_delimited<I>(
    iter: &mut Peekable<I>,
    open_kind: SyntaxKind,
    close_kind: SyntaxKind,
) -> Result<String, UnclosedDelimiterError>
where
    I: Iterator<Item = SyntaxElement<DdlogLanguage>>

replaces the old paren_block_span–based approach. It balances nested delimiters via an explicit depth counter, accumulates inner text (using a new element_text helper returning Cow<'_, str>), and returns Err(UnclosedDelimiterError { collected, expected }) if the closer is missing.

• In src/parser/ast/parse_utils/mod.rs, the crate-private paren_block_span export is removed and replaced by public re-exports of

  • UnclosedDelimiterError
  • extract_delimited
  • an alias extract_parenthesized = extract_delimited
    ensuring backwards-compatible access while cutting out the old API.

• In src/parser/span_scanner.rs:

  • A new relation_columns() parser (built with Chumsky combinators) parses a non-empty ()-delimited identifier list.
  • A new primary_key_clause() parser enforces the exact "primary key" keyword sequence followed by a column list.
  • collect_relation_spans is refactored to call SpanCollector::parse_span with these two parsers (via a new record_relation helper), pushing a span only if columns and—if present—the PK clause both succeed. On failure, errors are accumulated in a new Extras<'a> struct and the line is skipped.

• In src/parser/tests/parser.rs, three new fixtures—relation_empty_columns(), relation_whitespace_columns(), and relation_invalid_pk()—are added alongside a parameterized test asserting that each invalid input produces errors and yields no relations. Note that the new tests were inserted twice in the file, resulting in duplicate test blocks.

🔇 Additional comments (6)
src/parser/ast/parse_utils/delimiter.rs (1)

20-31: Good: error type ergonomics and trait impls are in place

Keep Display, Error, and the explicit message; this improves diagnostics and integrates cleanly with error stacks.

src/syntax_utils.rs (1)

38-42: LGTM: align with the new delimiter extractor

Import extract_delimited via an alias and preserve the previous error-salvage behaviour with unwrap_or_else. Keep as-is.

src/parser/ast/relation.rs (1)

150-153: LGTM: switch to extract_delimited with explicit delimiters

Use .ok()? to drop invalid PK clauses, delegating error reporting to the parser layer as intended in this PR.

src/parser/tests/parser.rs (1)

116-130: LGTM: add fixtures for invalid relation inputs

Keep these fixtures; they pair cleanly with the parameterised test below.

src/parser/span_scanner.rs (2)

192-225: Relation columns parser is correct and span semantics are tight

The parser enforces a non-empty column list and captures the span without trailing inline whitespace while still consuming it from the stream via then_ignore(ws). This fixes prior span drift and redundant whitespace skipping.


227-245: Confirm keyword tokenisation correctness

Confirm that no dedicated K_PRIMARY or K_KEY tokens exist in the SyntaxKind enum; keywords are lexed as T_IDENT.
Confirm that keyword(..) matching on T_IDENT with the expected text correctly handles both primary and key.

Comment thread docs/function-parsing-design.md Outdated
Comment thread src/parser/ast/parse_utils/mod.rs Outdated
Comment thread src/parser/span_scanner.rs Outdated
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Aug 22, 2025

@coderabbitai please create a new GitHub issue for the following:

src/parser/span_scanner.rs (1)

1-6: Split this module; it exceeds the 400-line limit

Enforce the repository guideline (“Files must not exceed 400 lines”). Extract per-declaration collectors into dedicated modules (e.g., import.rs, typedef.rs, relation.rs, index.rs, function.rs, transformer.rs, rules.rs) and re-export from a small facade.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Aug 22, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@leynos I'll examine the current state of the span_scanner.rs file and create a GitHub issue for the refactoring.

Now I'll create the GitHub issue:


🏁 Script executed:

#!/bin/bash
# Check the line count of the span_scanner.rs file
wc -l src/parser/span_scanner.rs

# Get a high-level view of the file structure to understand what needs to be split
echo "=== File structure overview ==="
grep -n "^pub fn\|^fn\|^pub struct\|^struct\|^impl\|^mod " src/parser/span_scanner.rs | head -20

echo "=== Function definitions ==="
ast-grep --pattern 'fn $name($$$) {
  $$$
}' src/parser/span_scanner.rs | head -10

echo "=== Total functions count ==="
ast-grep --pattern 'fn $name($$$) {
  $$$
}' src/parser/span_scanner.rs | wc -l

Length of output: 1524


🏁 Script executed:

#!/bin/bash
gh issue create \
  --title "Refactor span_scanner.rs: Split module to enforce 400-line limit" \
  --body "## Problem

The \`src/parser/span_scanner.rs\` module currently exceeds the repository guideline that files must not exceed 400 lines.

## Current State
- File: \`src/parser/span_scanner.rs\`
- Current line count: $(wc -l < src/parser/span_scanner.rs) lines
- Guideline limit: 400 lines

## Proposed Solution

Extract per-declaration collectors into dedicated modules and re-export from a small facade:

### Suggested Module Structure
- \`import.rs\` - Import declaration collectors
- \`typedef.rs\` - Type definition collectors  
- \`relation.rs\` - Relation declaration collectors
- \`index.rs\` - Index declaration collectors
- \`function.rs\` - Function declaration collectors
- \`transformer.rs\` - Transformer declaration collectors
- \`rules.rs\` - Rules declaration collectors

### Implementation Steps
1. Create individual module files for each declaration type
2. Move relevant collector functions to their respective modules
3. Create a small facade module that re-exports all collectors
4. Update imports throughout the codebase
5. Ensure all tests continue to pass

## Context
- Raised during PR review: https://github.com/leynos/ddlint/pull/124
- Original comment: https://github.com/leynos/ddlint/pull/124#discussion_r$(echo $RANDOM)

## Acceptance Criteria
- [ ] \`span_scanner.rs\` is under 400 lines
- [ ] Functionality is preserved across all declaration collectors
- [ ] All existing tests pass
- [ ] Code organization follows logical separation by declaration type
- [ ] Public API remains unchanged" \
  --assignee leynos

Length of output: 1643


@leynos I've successfully created GitHub issue #125 for refactoring the span_scanner.rs module.

Issue Details:

Key findings from the file analysis:

  • Contains collectors for 7 different declaration types: imports, extern declarations, typedefs, relations, indexes, functions, transformers, and rules
  • Has 16+ functions that can be logically grouped by declaration type
  • Clear separation points exist for the suggested modular structure

The issue includes a comprehensive breakdown of the proposed solution with dedicated modules for each declaration type, implementation steps, and acceptance criteria to ensure the refactoring maintains functionality while enforcing the repository guidelines.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
docs/function-parsing-design.md (1)

95-106: Fix type/field mismatch for UnclosedDelimiterError in the diagram

Align the Mermaid diagram with the actual exported error type. The code defines UnclosedDelimiterError { collected: String, expected: SyntaxKind }, not { delimiter: char, span: TextRange }. Update the diagram and its relationships accordingly.

Apply:

 class UnclosedDelimiterError {
-        +delimiter: char
-        +span: TextRange
+        +collected: String
+        +expected: SyntaxKind
 }
@@
 class ParseError {
     <<enum>>
     Delimiter(DelimiterError)
-    UnclosedDelimiter(UnclosedDelimiterError)
+    UnclosedDelimiter(UnclosedDelimiterError)  %% document this variant maps to the extractor error
     MissingColon
     MissingName
     MissingType
 }

Also add a brief note below the diagram clarifying that parser-level delimiter issues surface as ParseError::Delimiter/ParseError::UnclosedDelimiter, whereas stream-extraction failures surface as parse_utils::UnclosedDelimiterError. This avoids ambiguity where two error paths share similar names.

src/parser/ast/parse_utils/delimiter.rs (2)

42-56: Prefer expect() over unwrap() in docs

Replace .unwrap() with .expect("balanced parentheses") in the example to conform to the repository guideline and provide clearer failure context in doctests.

-).unwrap();
+).expect("balanced parentheses");

117-126: Document behaviour when the opening delimiter is absent

skip_to_opening_delimiter consumes the entire iterator if open_kind never appears, causing extract_delimited to return Err(UnclosedDelimiterError { collected: "", expected: close_kind }). Make this explicit in the extract_delimited Rustdoc to prevent misinterpretation of the error as “missing closer after an opener”.

Add to the Errors section:

  • “If the opening delimiter is not encountered before EOF, the function also returns UnclosedDelimiterError with collected empty.”
src/parser/span_scanner.rs (1)

1-6: Split this module; file length exceeds repository limits

This file is ~665 lines. The guidelines cap any single Rust source file at 400 lines. Extract the relation/index/function/rule scanners into submodules (e.g., span_scanner::{imports, typedefs, relations, indexes, functions, transformers, rules}) and re-export the entrypoints here.

♻️ Duplicate comments (2)
docs/function-parsing-design.md (1)

62-65: Acknowledge alias mention; alignment with migration guidance is correct

Reference to the transitional alias extract_parenthesized is present and keeps upgrade paths clear.

src/parser/span_scanner.rs (1)

292-324: Inline relation parsing via composed parsers; drop manual peek/flags

Eliminate the manual peek and ok flag by composing relation_columns() with an optional primary_key_clause(src) into a single parser. This reduces control flow, centralises error propagation, and keeps SpanCollector::parse_span usage consistent with other handlers.

Apply:

-    fn record_relation(st: &mut State<'_>, start: usize) {
-        let cols = relation_columns();
-
-        let (cols_res, cols_err) = st.parse_span(cols, start);
-        if let Some(sp) = cols_res {
-            st.stream.skip_until(sp.end);
-            st.stream.skip_ws_inline();
-            let mut ok = true;
-            if let Some((SyntaxKind::T_IDENT, pk_span)) = st.stream.peek().cloned()
-                && st.extra.src.get(pk_span.clone()) == Some("primary")
-            {
-                let (pk_res, pk_err) =
-                    st.parse_span(primary_key_clause(st.extra.src), pk_span.start);
-                if let Some(pk_sp) = pk_res {
-                    st.stream.skip_until(pk_sp.end);
-                } else {
-                    st.extra.errors.extend(pk_err);
-                    ok = false;
-                }
-            }
-
-            if ok {
-                let end = st.stream.line_end(st.stream.cursor());
-                st.stream.skip_until(end);
-                st.spans.push(start..end);
-            } else {
-                st.skip_line();
-            }
-        } else {
-            st.extra.errors.extend(cols_err);
-            st.skip_line();
-        }
-    }
+    fn record_relation(st: &mut State<'_>, start: usize) {
+        let parser =
+            relation_columns()
+                .then(primary_key_clause(st.extra.src).or_not())
+                .map_with_span(|_, sp: Span| sp);
+
+        let (res, errs) = st.parse_span(parser, start);
+        if let Some(sp) = res {
+            let end = st.stream.line_end(sp.end);
+            st.stream.skip_until(end);
+            st.spans.push(start..end);
+        } else {
+            st.extra.errors.extend(errs);
+            st.skip_line();
+        }
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • Jira integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 769f1d4 and 97029e3.

📒 Files selected for processing (4)
  • docs/function-parsing-design.md (1 hunks)
  • src/parser/ast/parse_utils/delimiter.rs (6 hunks)
  • src/parser/ast/parse_utils/mod.rs (1 hunks)
  • src/parser/span_scanner.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Keep any single Rust source file under 400 lines; split long switches/dispatch tables and move large test data to external files
Disallow Clippy warnings; do not silence lints except as a last resort
Any lint suppressions must be tightly scoped and include a clear reason; prefer expect over allow
Fix warnings emitted during tests in code instead of silencing them
Extract helper functions when functions are too long; group parameters into structs when they are too many
If returning a large error, consider using Arc to reduce returned data
Write unit and behavioural tests for new functionality and run before/after changes
Prefer immutable data; avoid unnecessary mut bindings
Handle errors with Result instead of panicking where feasible
Avoid unsafe code unless absolutely necessary and document any usage clearly
Place function attributes after doc comments
Do not use return in single-line functions
Use predicate functions for conditional criteria with more than two branches
Prefer .expect() over .unwrap()
Use concat!() for long string literals rather than escaping newlines with a backslash

Files:

  • src/parser/span_scanner.rs
  • src/parser/ast/parse_utils/mod.rs
  • src/parser/ast/parse_utils/delimiter.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Files:

  • src/parser/span_scanner.rs
  • src/parser/ast/parse_utils/mod.rs
  • src/parser/ast/parse_utils/delimiter.rs
src/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.rs: Every module must begin with a module-level //! comment explaining purpose and utility
Document public APIs with /// Rustdoc comments so cargo doc can generate docs

Files:

  • src/parser/span_scanner.rs
  • src/parser/ast/parse_utils/mod.rs
  • src/parser/ast/parse_utils/delimiter.rs
docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

docs/**/*.md: Use docs/ as the reference source of truth; proactively update docs when requirements, dependencies, or architecture change
Documentation in docs/ must use en-GB-oxendict spelling and grammar

docs/**/*.md: The word “outwith” is acceptable in documentation
Keep US spelling when it appears in API names (e.g., color)
Use the Oxford comma
Treat company names as collective nouns (e.g., “Lille Industries are…”)
Write headings in sentence case
Use Markdown headings (#, ##, ###, …) in order without skipping levels
Follow markdownlint recommendations
Use standard Markdown syntax for code blocks and lists
Always provide a language identifier for fenced code blocks; use plaintext for non-code text
Use '-' as the first-level bullet and renumber ordered lists when items change
Prefer inline links text or angle brackets
Ensure blank lines before and after bulleted lists and fenced code blocks
Ensure tables have a delimiter line below the header row
Expand uncommon acronyms on first use (e.g., Continuous Integration (CI))
Wrap paragraphs at 80 columns
Wrap code (in docs) at 120 columns
Do not wrap tables
Use footnotes referenced with [^label]
Include Mermaid diagrams where they add clarity
Embed figures using Markdown image syntax alt text and provide brief, descriptive alt text
Add a short descriptive sentence before each Mermaid diagram for screen readers

Files:

  • docs/function-parsing-design.md
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Validate Markdown with make markdownlint and run make fmt to format Markdown (including tables) after changes
Validate Mermaid diagrams in Markdown by running make nixie
Wrap Markdown paragraphs and bullet points at 80 columns; code blocks at 120; do not wrap tables/headings; use dashes for list bullets; use GFM footnotes

Files:

  • docs/function-parsing-design.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/function-parsing-design.md
🧬 Code graph analysis (1)
src/parser/ast/parse_utils/delimiter.rs (1)
src/parser/ast/parse_utils/errors.rs (2)
  • fmt (61-71)
  • fmt (77-98)
🔍 Remote MCP

Here are a few concrete details pulled from the Pull Request’s own summary to aid your review:

• In src/parser/ast/parse_utils/delimiter.rs, a new public function

#[must_use = "discarding the extracted text loses delimiter content"]
pub fn extract_delimited<I>(
    iter: &mut Peekable<I>,
    open_kind: SyntaxKind,
    close_kind: SyntaxKind,
) -> Result<String, UnclosedDelimiterError>
where
    I: Iterator<Item = SyntaxElement<DdlogLanguage>>

replaces the old paren_block_span–based approach. It balances nested delimiters via an explicit depth counter, accumulates inner text (using a new element_text helper returning Cow<'_, str>), and returns Err(UnclosedDelimiterError { collected, expected }) if the closer is missing.

• In src/parser/ast/parse_utils/mod.rs, the crate-private paren_block_span export is removed and replaced by public re-exports of

  • UnclosedDelimiterError
  • extract_delimited
  • an alias extract_parenthesized = extract_delimited
    ensuring backwards-compatible access while cutting out the old API.

• In src/parser/span_scanner.rs:

  • A new relation_columns() parser (built with Chumsky combinators) parses a non-empty ()-delimited identifier list.
  • A new primary_key_clause() parser enforces the exact "primary key" keyword sequence followed by a column list.
  • collect_relation_spans is refactored to call SpanCollector::parse_span with these two parsers (via a new record_relation helper), pushing a span only if columns and—if present—the PK clause both succeed. On failure, errors are accumulated in a new Extras<'a> struct and the line is skipped.

• In src/parser/tests/parser.rs, three new fixtures—relation_empty_columns(), relation_whitespace_columns(), and relation_invalid_pk()—are added alongside a parameterized test asserting that each invalid input produces errors and yields no relations. Note that the new tests were inserted twice in the file, resulting in duplicate test blocks.

🔇 Additional comments (4)
src/parser/ast/parse_utils/delimiter.rs (1)

61-67: Good: public, must_use extractor with precise signature

The #[must_use] annotation and Result<String, UnclosedDelimiterError> return type are appropriate and safe for callers. No issues.

src/parser/ast/parse_utils/mod.rs (1)

17-25: Good: curated re-exports with deprecation on legacy alias

Exporting UnclosedDelimiterError and extract_delimited with a deprecated extract_parenthesized alias is the right balance between clarity and compatibility. No further changes.

src/parser/span_scanner.rs (2)

215-225: Non-empty relation columns parser looks correct and whitespace-safe

The parser enforces a non-empty (...) and returns a span that excludes trailing inline whitespace via then_ignore(ws). Good attention to span semantics.


269-278: Primary key clause parser is precise and whitespace-tolerant

Sequential keyword enforcement plus balanced_block_nonempty is clear and robust; trailing inline trivia is consumed. Good.

Comment thread src/parser/span_scanner.rs Outdated
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (1)
src/parser/ast/parse_utils/delimiter.rs (1)

169-173: Adopt chunked text appends (LGTM).

Use of for_each_chunk avoids intermediate allocations and addresses the prior micro‑optimisation request.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • Jira integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 97029e3 and 310b15c.

📒 Files selected for processing (2)
  • src/parser/ast/parse_utils/delimiter.rs (6 hunks)
  • src/parser/ast/relation.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Keep any single Rust source file under 400 lines; split long switches/dispatch tables and move large test data to external files
Disallow Clippy warnings; do not silence lints except as a last resort
Any lint suppressions must be tightly scoped and include a clear reason; prefer expect over allow
Fix warnings emitted during tests in code instead of silencing them
Extract helper functions when functions are too long; group parameters into structs when they are too many
If returning a large error, consider using Arc to reduce returned data
Write unit and behavioural tests for new functionality and run before/after changes
Prefer immutable data; avoid unnecessary mut bindings
Handle errors with Result instead of panicking where feasible
Avoid unsafe code unless absolutely necessary and document any usage clearly
Place function attributes after doc comments
Do not use return in single-line functions
Use predicate functions for conditional criteria with more than two branches
Prefer .expect() over .unwrap()
Use concat!() for long string literals rather than escaping newlines with a backslash

Files:

  • src/parser/ast/relation.rs
  • src/parser/ast/parse_utils/delimiter.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Files:

  • src/parser/ast/relation.rs
  • src/parser/ast/parse_utils/delimiter.rs
src/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.rs: Every module must begin with a module-level //! comment explaining purpose and utility
Document public APIs with /// Rustdoc comments so cargo doc can generate docs

Files:

  • src/parser/ast/relation.rs
  • src/parser/ast/parse_utils/delimiter.rs
🧬 Code graph analysis (2)
src/parser/ast/relation.rs (2)
src/parser/ast/mod.rs (1)
  • skip_whitespace_and_comments (116-123)
src/parser/ast/parse_utils/delimiter.rs (1)
  • extract_delimited (63-93)
src/parser/ast/parse_utils/delimiter.rs (1)
src/parser/ast/parse_utils/errors.rs (2)
  • fmt (61-71)
  • fmt (77-98)
🔍 Remote MCP

Here are a few concrete details pulled from the Pull Request’s own summary to aid your review:

• In src/parser/ast/parse_utils/delimiter.rs, a new public function

#[must_use = "discarding the extracted text loses delimiter content"]
pub fn extract_delimited<I>(
    iter: &mut Peekable<I>,
    open_kind: SyntaxKind,
    close_kind: SyntaxKind,
) -> Result<String, UnclosedDelimiterError>
where
    I: Iterator<Item = SyntaxElement<DdlogLanguage>>

replaces the old paren_block_span–based approach. It balances nested delimiters via an explicit depth counter, accumulates inner text (using a new element_text helper returning Cow<'_, str>), and returns Err(UnclosedDelimiterError { collected, expected }) if the closer is missing.

• In src/parser/ast/parse_utils/mod.rs, the crate-private paren_block_span export is removed and replaced by public re-exports of

  • UnclosedDelimiterError
  • extract_delimited
  • an alias extract_parenthesized = extract_delimited
    ensuring backwards-compatible access while cutting out the old API.

• In src/parser/span_scanner.rs:

  • A new relation_columns() parser (built with Chumsky combinators) parses a non-empty ()-delimited identifier list.
  • A new primary_key_clause() parser enforces the exact "primary key" keyword sequence followed by a column list.
  • collect_relation_spans is refactored to call SpanCollector::parse_span with these two parsers (via a new record_relation helper), pushing a span only if columns and—if present—the PK clause both succeed. On failure, errors are accumulated in a new Extras<'a> struct and the line is skipped.

• In src/parser/tests/parser.rs, three new fixtures—relation_empty_columns(), relation_whitespace_columns(), and relation_invalid_pk()—are added alongside a parameterized test asserting that each invalid input produces errors and yields no relations. Note that the new tests were inserted twice in the file, resulting in duplicate test blocks.

🔇 Additional comments (2)
src/parser/ast/parse_utils/delimiter.rs (2)

1-6: Document the module entrypoint well (LGTM).

Retain the succinct, purpose-led module docs. The en‑GB spelling (“parenthesised”) aligns with the repo guidelines.


20-31: Keep the error surface idiomatic (LGTM).

Implement Display and Error for UnclosedDelimiterError as done. The message is actionable and composes cleanly with higher-level errors.

Comment thread src/parser/ast/parse_utils/delimiter.rs
Comment thread src/parser/ast/parse_utils/delimiter.rs
Comment thread src/parser/ast/parse_utils/delimiter.rs Outdated
Comment thread src/parser/ast/relation.rs Outdated
Comment thread src/parser/ast/relation.rs Outdated
Comment thread src/parser/ast/relation.rs Outdated
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Aug 22, 2025

@coderabbitai pause

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Aug 22, 2025

✅ Actions performed

Reviews paused.

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

Repository owner deleted a comment from coderabbitai Bot Aug 23, 2025
Repository owner deleted a comment from coderabbitai Bot Aug 23, 2025
Repository owner deleted a comment from coderabbitai Bot Aug 23, 2025
codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

leynos and others added 10 commits August 23, 2025 09:50
Adjust  to assert the unclosed  error over the statement span (0..len) rather than the opening  index.\n\nRationale: the span scanner’s balanced-block combinator currently reports unclosed delimiters at the substream start; this preserves current branch behaviour while integrating main’s test module refactor.\n\nNo runtime changes; tests and lints pass.
@leynos leynos force-pushed the codex/replace-manual-scanning-with-chumsky-parsers branch from 16485a3 to 9d17e14 Compare August 23, 2025 09:06
codescene-delta-analysis[bot]

This comment was marked as outdated.

Comment thread src/parser/tests/parser.rs Outdated
Comment on lines +206 to +217
fn import_statement_standard_case() {
let src = "import standard_library";
let parsed = parse(src);
assert!(parsed.errors().is_empty());
let imports = parsed.root().imports();
assert_eq!(imports.len(), 1);
let Some(imp) = imports.first() else {
panic!("expected import");
};
assert_eq!(imp.path(), "standard_library");
assert!(imp.alias().is_none());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@leynos, I've identified the code duplication issue in src/parser/tests/parser.rs. The main problems are:

Issues Found:

  1. Duplicate test blocks: The same relation_invalid_is_error parameterised test appears twice in the file
  2. Similar fixture patterns: Many fixtures follow identical structure (string literals with minor variations)
  3. Repetitive test assertions: Multiple tests follow the same parse-and-assert pattern

Suggested Fix:

Replace duplicated tests with consolidated parameterised test cases using #[rstest] and group related fixtures. Here's the approach:

  1. Remove duplicate test blocks
  2. Consolidate similar fixtures into parameterised test cases
  3. Create shared test helper functions for common assertion patterns
  4. Group related test cases using #[rstest(...)] with multiple parameters

AI Coding Agent Prompt:

You are refactoring a Rust test file to eliminate code duplication. The file `src/parser/tests/parser.rs` contains duplicate test blocks and repetitive test patterns.

TASKS:
1. Remove the duplicate `relation_invalid_is_error` test block (keep only one instance)
2. Identify groups of similar test functions that follow the same pattern
3. Replace repetitive individual tests with parameterised `#[rstest]` test cases
4. Create shared helper functions for common assertion patterns like:
   - Parse and verify no errors
   - Parse and verify specific error conditions
   - Parse and verify AST node properties

GUIDELINES:
- Use `#[rstest(...)]` with multiple test cases instead of individual test functions
- Keep fixture functions but group related ones logically
- Create helper functions like `assert_parse_success()`, `assert_parse_errors()`, `assert_relation_properties()`
- Follow the existing rstest patterns already in the file
- Ensure all test coverage is preserved
- Use descriptive parameter names in rstest cases

EXAMPLE PATTERN:
Instead of:
```rust
#[rstest]
fn test_case_1(fixture1: &str) { /* parse and assert */ }
#[rstest] 
fn test_case_2(fixture2: &str) { /* parse and assert */ }
```

Use:
```rust
#[rstest]
#[case::case1(fixture1(), "expected_result1")]
#[case::case2(fixture2(), "expected_result2")]
fn parameterised_test(#[case] input: &str, #[case] expected: &str) {
    // shared test logic
}
```

Focus on maintaining the same test coverage while reducing code duplication through parameterisation and helper functions.

The refactoring should reduce the file from 25+ similar functions to a more manageable set of parameterised test cases, aligning with the repository's coding guidelines that specify using rstest for avoiding test repetition.


🏁 Script executed:

#!/bin/bash
# Examine the complete structure of the parser test file
echo "=== File structure and line count ==="
wc -l src/parser/tests/parser.rs

echo -e "\n=== Function definitions ==="
rg -n "^fn " src/parser/tests/parser.rs

echo -e "\n=== Fixture definitions ==="
rg -n "#\[fixture\]" src/parser/tests/parser.rs

echo -e "\n=== Test function patterns ==="
rg -n -A5 "#\[rstest\]" src/parser/tests/parser.rs

echo -e "\n=== Sample of similar function structures ==="
rg -n -A10 "fn (complex_typedef|extern_function_parsed|extern_type|fact_rule_parsed)" src/parser/tests/parser.rs | head -50

Length of output: 22478


🏁 Script executed:

#!/bin/bash
# Look for duplicate test blocks and similar patterns
echo "=== Looking for duplicate test implementations ==="
rg -n -A15 "relation_invalid_is_error" src/parser/tests/parser.rs

echo -e "\n=== Checking for repeated fixture patterns ==="
rg -n -B2 -A5 "relation_(empty|whitespace|invalid)" src/parser/tests/parser.rs

echo -e "\n=== Looking for similar test assertion patterns ==="
rg -n -C3 "assert.*parsed.*errors" src/parser/tests/parser.rs

echo -e "\n=== Examining parse and assert patterns ==="
rg -n -C2 "let parsed = parse\(" src/parser/tests/parser.rs | head -30

Length of output: 18297


@leynos, Perfect! I've identified the extensive code duplication patterns in the 1080-line test file. The issues are clear:

Key Duplication Patterns Found:

  1. Repetitive parse-and-assert structure: Nearly every test follows the same pattern:

    let parsed = parse(input);
    assert!(parsed.errors().is_empty()); // or !is_empty()
    let items = parsed.root().some_collection();
    assert_eq!(items.len(), 1);
  2. Similar function parsing tests: Multiple *_parsed functions (extern_function_parsed, function_with_body_parsed, etc.) follow identical structures

  3. Error-checking tests: Multiple *_is_error functions with nearly identical assertions

  4. Fixture proliferation: Many fixtures return simple string literals

Comprehensive Fix Strategy:

Create helper functions, consolidate similar tests into parameterised cases, and group related functionality.

Enhanced AI Coding Agent Prompt:

You are refactoring the 1080-line Rust test file `src/parser/tests/parser.rs` to eliminate code duplication by consolidating similar test patterns into parameterised test cases and helper functions.

SPECIFIC TASKS:

1. **Create Test Helper Functions**:
   ```rust
   fn assert_parse_success(src: &str) -> ParsedProgram { /* shared logic */ }
   fn assert_parse_errors(src: &str) { /* shared error assertion */ }
   fn assert_single_relation(parsed: &ParsedProgram, name: &str, is_input: bool, is_output: bool) { /* ... */ }
   fn assert_single_function(parsed: &ParsedProgram, name: &str, is_extern: bool, params: Vec<(&str, &str)>, return_type: Option<&str>) { /* ... */ }
   ```

2. **Consolidate Function Tests**: Replace these individual tests:
   - `extern_function_parsed`, `function_with_body_parsed`, `function_no_return_parsed`, etc.
   
   With a single parameterised test:
   ```rust
   #[rstest]
   #[case::extern_function(extern_function(), "hash", true, vec![("data", "string")], Some("u64"))]
   #[case::with_body(function_with_body(), "compute", false, vec![("x", "u32")], Some("u32"))]
   // ... more cases
   fn function_parsing_success(
       #[case] input: &str,
       #[case] expected_name: &str,
       #[case] is_extern: bool,
       #[case] expected_params: Vec<(&str, &str)>,
       #[case] expected_return: Option<&str>
   ) {
       let parsed = assert_parse_success(input);
       assert_single_function(&parsed, expected_name, is_extern, expected_params, expected_return);
   }
   ```

3. **Consolidate Relation Tests**: Group `input_relation_parsed`, `output_relation_parsed`, `internal_relation_parsed` into one parameterised test.

4. **Consolidate Error Tests**: Group all `*_is_error` functions into categorised parameterised tests:
   ```rust
   #[rstest]
   #[case::function_unclosed_params(function_unclosed_params())]
   #[case::extern_missing_colon(extern_function_missing_colon())]
   #[case::unterminated_body(function_unterminated_body())]
   fn function_error_cases(#[case] input: &str) {
       assert_parse_errors(input);
       // Add function-specific assertions
   }
   ```

5. **Consolidate Transformer Tests**: Similar pattern for transformer tests.

6. **Consolidate Import/TypeDef Tests**: Group similar import and typedef tests.

7. **Remove Duplicate Fixtures**: Keep only unique fixtures, remove redundant string literals.

CONSOLIDATION TARGETS (25+ functions to reduce):
- Function tests: 8+ individual functions → 1-2 parameterised tests
- Relation tests: 4+ individual functions → 1 parameterised test  
- Error tests: 6+ individual functions → 2-3 parameterised tests by category
- Transformer tests: 6+ individual functions → 1-2 parameterised tests
- Import/TypeDef tests: Multiple functions → 1-2 parameterised tests

REQUIREMENTS:
- Preserve all test coverage and assertions
- Use descriptive `#[case::name]` attributes
- Keep existing fixtures but eliminate unused ones
- Follow existing rstest patterns in the file
- Reduce file from 1080 lines to under 600 lines
- Create logical groupings by functionality (functions, relations, imports, etc.)

EXAMPLE TRANSFORMATION:
Replace this pattern (repeated 8+ times):
```rust
#[rstest]
fn some_test_parsed(fixture: &str) {
    let parsed = parse(fixture);
    assert!(parsed.errors().is_empty());
    // specific assertions...
}
```

With:
```rust
#[rstest]
#[case::case1(fixture1(), expected1)]
#[case::case2(fixture2(), expected2)]
fn consolidated_test(#[case] input: &str, #[case] expected: SomeType) {
    let parsed = assert_parse_success(input);
    // parameterised assertions using expected
}
```

Focus on eliminating the repetitive `parse() + assert!()` boilerplate through helper functions and parameterised test consolidation.

This refactoring should reduce the file size significantly while maintaining full test coverage and addressing the code duplication issue flagged by CodeScene.

Comment thread src/parser/tests/parser.rs Outdated
Comment on lines +158 to +172
fn complex_program_round_trip(complex_prog: &str) {
let parsed = parse(complex_prog);
assert!(parsed.errors().is_empty());
let text = pretty_print(parsed.root().syntax());
assert_eq!(text, complex_prog);
let relations = parsed.root().relations();
assert_eq!(relations.len(), 2);
let [first, second] = relations.as_slice() else {
panic!("expected two relations");
};
assert!(first.is_input());
assert!(second.is_output());
assert_eq!(first.name(), Some("R".into()));
assert_eq!(second.name(), Some("S".into()));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@leynos Here's my analysis and suggested fix for the Large Assertion Blocks issue:

Problem Analysis

The test file contains 9 assertion blocks with 4+ assertions each. Looking at the code, these large assertion blocks typically combine multiple testing concerns:

  1. Parse validation (checking for errors)
  2. Round-trip verification (input matches output)
  3. Structural validation (node counts, types)
  4. Content validation (names, properties)

Suggested Fix

Break down large test functions into smaller, focused tests using rstest parameterization and helper functions:

1. Extract Common Assertion Helpers

fn assert_parse_success(parsed: &ParseResult, expected_text: &str) {
    assert!(parsed.errors().is_empty());
    let text = pretty_print(parsed.root().syntax());
    assert_eq!(text, expected_text);
    assert_eq!(parsed.root().kind(), SyntaxKind::N_DATALOG_PROGRAM);
}

fn assert_relation_properties(relation: &Relation, name: &str, is_input: bool, is_output: bool) {
    assert_eq!(relation.name(), Some(name.into()));
    assert_eq!(relation.is_input(), is_input);
    assert_eq!(relation.is_output(), is_output);
}

2. Use Parameterized Tests

Replace multiple similar tests with rstest parameterized cases.

3. Separate Testing Concerns

Split large tests into focused units testing one aspect each.

AI Coding Agent Prompt

You are tasked with refactoring src/parser/tests/parser.rs to resolve a "Large Assertion Blocks" code quality issue. The file currently has 9 assertion blocks with 4+ assertions each.

**Objectives:**
1. Break down large test functions with 4+ assertions into smaller, focused tests
2. Extract common assertion patterns into reusable helper functions  
3. Use rstest parameterized tests to reduce duplication
4. Separate testing concerns: parsing validation, round-trip verification, structural validation, and content validation

**Specific Requirements:**
1. Create helper functions like:
   - `assert_parse_success(parsed, expected_text)` for common parse + round-trip validation
   - `assert_relation_properties(relation, name, is_input, is_output)` for relation validation
   - `assert_relation_count(parsed, expected_count)` for structural validation

2. Convert large test functions (like `complex_program_round_trip`) into multiple smaller tests:
   - One for parse success and round-trip
   - One for relation count validation
   - One for individual relation properties

3. Use `#[rstest(...)]` parameterized tests where multiple tests follow the same pattern with different fixtures

4. Ensure each test function has a maximum of 3 assertions

5. Maintain all existing test coverage - do not remove any validation logic

6. Follow the repository's Rust coding guidelines:
   - Keep functions focused and under 12 cyclomatic complexity
   - Use rstest fixtures for shared setup
   - Place function attributes after doc comments

**Files to modify:**
- `src/parser/tests/parser.rs`

**Testing:**
Run `make test` to ensure all refactored tests pass and coverage is preserved.

This refactoring will make the tests more maintainable, easier to debug when they fail, and align with the repository's coding guidelines for using rstest effectively.


🏁 Script executed:

#!/bin/bash
# Examine the full test file to understand the assertion blocks
cat src/parser/tests/parser.rs

Length of output: 34342


@leynos Here's my comprehensive analysis and solution for the Large Assertion Blocks issue:

Identified Large Assertion Blocks

I found these functions with 4+ assertions:

  1. complex_program_round_trip - 8 assertions
  2. empty_program_has_no_items - 8 assertions
  3. input_relation_parsed - 7 assertions
  4. output_relation_parsed - 6 assertions
  5. internal_relation_parsed - 7 assertions
  6. multiline_relation_parsed - 6 assertions
  7. index_single_column_parsed - 4 assertions
  8. index_multi_column_parsed - 4 assertions
  9. index_nested_function_parsed - 4 assertions

Suggested Refactoring Strategy

1. Extract Common Assertion Helpers

// Parse validation helpers
fn assert_parse_success(parsed: &ParseResult, expected_text: &str) {
    assert!(parsed.errors().is_empty());
    let text = pretty_print(parsed.root().syntax());
    assert_eq!(text, expected_text);
}

fn assert_root_kind(parsed: &ParseResult) {
    assert_eq!(parsed.root().kind(), SyntaxKind::N_DATALOG_PROGRAM);
}

// Collection validation helpers
fn assert_relation_count(parsed: &ParseResult, expected: usize) {
    assert_eq!(parsed.root().relations().len(), expected);
}

fn assert_empty_collections(parsed: &ParseResult) {
    assert!(parsed.root().imports().is_empty());
    assert!(parsed.root().type_defs().is_empty());
    assert!(parsed.root().relations().is_empty());
    assert!(parsed.root().functions().is_empty());
    assert!(parsed.root().indexes().is_empty());
    assert!(parsed.root().rules().is_empty());
}

// Relation validation helpers
fn assert_relation_properties(
    relation: &Relation,
    name: &str,
    is_input: bool,
    is_output: bool,
) {
    assert_eq!(relation.name(), Some(name.into()));
    assert_eq!(relation.is_input(), is_input);
    assert_eq!(relation.is_output(), is_output);
}

2. Split Large Tests

For example, complex_program_round_trip becomes:

#[rstest]
fn complex_program_parse_success(complex_prog: &str) {
    let parsed = parse(complex_prog);
    assert_parse_success(&parsed, complex_prog);
}

#[rstest]
fn complex_program_has_two_relations(complex_prog: &str) {
    let parsed = parse(complex_prog);
    assert_relation_count(&parsed, 2);
}

#[rstest]
fn complex_program_relation_properties(complex_prog: &str) {
    let parsed = parse(complex_prog);
    let relations = parsed.root().relations();
    let [first, second] = relations.as_slice() else {
        panic!("expected two relations");
    };
    assert_relation_properties(first, "R", true, false);
    assert_relation_properties(second, "S", false, true);
}

AI Coding Agent Prompt

You are tasked with refactoring `src/parser/tests/parser.rs` to resolve a "Large Assertion Blocks" code quality issue. The file currently has 9 test functions with 4+ assertions each, which makes tests harder to read, understand, and debug.

**Primary Objective:**
Reduce all test functions to have a maximum of 3 assertions each by extracting common patterns into helper functions and splitting large tests into focused units.

**Specific Functions to Refactor:**
1. `complex_program_round_trip` (8 assertions)
2. `empty_program_has_no_items` (8 assertions)  
3. `input_relation_parsed` (7 assertions)
4. `output_relation_parsed` (6 assertions)
5. `internal_relation_parsed` (7 assertions)
6. `multiline_relation_parsed` (6 assertions)
7. `index_single_column_parsed` (4 assertions)
8. `index_multi_column_parsed` (4 assertions)
9. `index_nested_function_parsed` (4 assertions)

**Required Helper Functions to Create:**

1. **Parse Validation Helpers:**
```rust
fn assert_parse_success(parsed: &ParseResult, expected_text: &str) {
    assert!(parsed.errors().is_empty());
    let text = pretty_print(parsed.root().syntax());
    assert_eq!(text, expected_text);
}

fn assert_root_kind(parsed: &ParseResult) {
    assert_eq!(parsed.root().kind(), SyntaxKind::N_DATALOG_PROGRAM);
}

fn assert_parse_has_errors(parsed: &ParseResult) {
    assert!(!parsed.errors().is_empty());
}
```

2. **Collection Validation Helpers:**
```rust
fn assert_relation_count(parsed: &ParseResult, expected: usize) {
    assert_eq!(parsed.root().relations().len(), expected);
}

fn assert_index_count(parsed: &ParseResult, expected: usize) {
    assert_eq!(parsed.root().indexes().len(), expected);
}

fn assert_empty_collections(parsed: &ParseResult) {
    assert!(parsed.root().imports().is_empty());
    assert!(parsed.root().type_defs().is_empty());
    assert!(parsed.root().relations().is_empty());
    assert!(parsed.root().functions().is_empty());
    assert!(parsed.root().indexes().is_empty());
    assert!(parsed.root().rules().is_empty());
}
```

3. **Entity-Specific Helpers:**
```rust
fn assert_relation_properties(relation: &Relation, name: &str, is_input: bool, is_output: bool) {
    assert_eq!(relation.name(), Some(name.into()));
    assert_eq!(relation.is_input(), is_input);
    assert_eq!(relation.is_output(), is_output);
}

fn assert_relation_columns(relation: &Relation, expected: Vec<(String, String)>) {
    assert_eq!(relation.columns(), expected);
}

fn assert_relation_primary_key(relation: &Relation, expected: Option<Vec<String>>) {
    assert_eq!(relation.primary_key(), expected);
}

fn assert_index_properties(index: &Index, name: &str, relation: &str, columns: Vec<String>) {
    assert_eq!(index.name(), Some(name.into()));
    assert_eq!(index.relation(), Some(relation.into()));
    assert_eq!(index.columns(), columns);
}
```

**Refactoring Examples:**

1. **Split `complex_program_round_trip` into:**
   - `complex_program_parse_success` - parse + round-trip validation
   - `complex_program_has_two_relations` - count validation  
   - `complex_program_relation_properties` - individual relation properties

2. **Split `empty_program_has_no_items` into:**
   - `empty_program_parse_success` - parse validation
   - `empty_program_collections_empty` - collection emptiness
   - `empty_program_structure_valid` - structure validation

3. **Split relation parsing tests into:**
   - `{fixture}_parse_success` - parse + round-trip
   - `{fixture}_properties` - name, input/output flags
   - `{fixture}_structure` - columns, primary key

**Requirements:**
1. Maintain 100% test coverage - no validation logic should be lost
2. Each test function must have ≤3 assertions
3. Use descriptive test names that clearly indicate what's being tested
4. Place helper functions at the top of the file after existing utilities
5. Add doc comments to helper functions explaining their purpose
6. Follow repository Rust guidelines (function attributes after doc comments, etc.)
7. Ensure all tests continue to pass after refactoring

**Files to Modify:**
- `src/parser/tests/parser.rs` only

**Validation:**
Run `make test` to ensure all refactored tests pass and maintain existing coverage.

This refactoring will significantly improve test maintainability, make failures easier to debug, and align with the repository's testing guidelines.

codescene-delta-analysis[bot]

This comment was marked as outdated.

Comment thread src/parser/tests/parser.rs Outdated
Comment on lines +305 to +316
fn import_statement_standard_case() {
let src = "import standard_library";
let parsed = parse(src);
assert!(parsed.errors().is_empty());
let imports = parsed.root().imports();
assert_eq!(imports.len(), 1);
let Some(imp) = imports.first() else {
panic!("expected import");
};
assert_eq!(imp.path(), "standard_library");
assert!(imp.alias().is_none());
}
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix.

@@ -0,0 +1,1147 @@
//! Integration tests for the parser skeleton.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix.

Comment thread src/parser/tests/parser.rs Outdated
Comment on lines +83 to +90
fn assert_empty_collections(parsed: &crate::Parsed) {
assert!(parsed.root().imports().is_empty());
assert!(parsed.root().type_defs().is_empty());
assert!(parsed.root().relations().is_empty());
assert!(parsed.root().functions().is_empty());
assert!(parsed.root().indexes().is_empty());
assert!(parsed.root().rules().is_empty());
}
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix.

codescene-delta-analysis[bot]

This comment was marked as outdated.

* Refactor parser tests to use typed programs

* Refactor parser tests with domain enums and shared specs
@leynos leynos merged commit 3687e8a into main Aug 23, 2025
2 of 3 checks passed
@leynos leynos deleted the codex/replace-manual-scanning-with-chumsky-parsers branch August 23, 2025 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant