Skip to content

Normalization into canonical formula model#99

Closed
leynos wants to merge 40 commits intomainfrom
4-1-5-normalization-into-canonical-formula-model
Closed

Normalization into canonical formula model#99
leynos wants to merge 40 commits intomainfrom
4-1-5-normalization-into-canonical-formula-model

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Apr 5, 2026

Summary by Sourcery

Normalize legacy and v2 Semgrep search rules into a shared canonical Formula model and wire YAML compilation to return real query plans using this representation.

New Features:

  • Introduce a canonical Formula and DecoratedFormula model in sempai_core to represent normalized query logic across legacy and v2 syntaxes.
  • Add a normalization pipeline in the sempai crate that lowers parsed YAML search rules into the canonical formula model, including language expansion into per-language rules.
  • Expose query plans from Engine::compile_yaml that carry the normalized canonical formula for each compiled search rule.

Enhancements:

  • Enforce semantic constraints during normalization, rejecting negation directly inside disjunctions and conjunctions without any positive match-producing terms via structured diagnostics.
  • Refine mode handling so only search-mode rules normalize to plans while other modes return deterministic E_SEMPAI_UNSUPPORTED_MODE errors.
  • Extend QueryPlan with accessors and storage for the canonical formula, enabling inspection in tests and future backends.

Documentation:

  • Update the query language design and user guide to describe the canonical Formula model, the normalization pipeline, and new semantic validation behaviour for compile_yaml.
  • Add an execplan document for roadmap item 4.1.5, capturing the design, constraints, and implementation stages for normalization into the canonical formula model.
  • Mark roadmap milestone 4.1.5 as complete now that normalization and semantic checks are implemented.

Tests:

  • Add unit tests for the canonical formula model, including positive-term classification helpers.
  • Add normalization tests and fixtures to ensure legacy and v2 rules normalize equivalently and that semantic errors produce the expected diagnostic codes.
  • Extend engine behaviour and BDD tests to assert successful compilation of valid search-mode YAML, rejection of unsupported modes, and the presence of real query plans.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 5, 2026

Overview

Implements milestone 4.1.5: normalisation into a canonical Formula model. Converts parsed YAML search rules (both legacy pattern* and v2 match syntaxes) into a unified semantic representation, enabling Engine::compile_yaml to return real executable QueryPlan objects for search-mode rules whilst enforcing deterministic semantic validation and error reporting.

Core Model Additions (sempai_core)

Introduces canonical types in new crates/sempai-core/src/formula.rs (242 lines):

  • Formula enum: Atom(Atom), Not(Box<DecoratedFormula>), Inside(Box<DecoratedFormula>), Anywhere(Box<DecoratedFormula>), And(Vec<DecoratedFormula>), Or(Vec<DecoratedFormula>); includes is_positive_term() method
  • Atom enum: Pattern(String), Regex(String)
  • DecoratedFormula struct: wraps Formula with optional where_clauses: Vec<WhereClause>, as_name: Option<String>, fix: Option<String>, span: Option<SourceSpan>; provides builder methods and positivity classification
  • WhereClause enum: Focus { metavariable }, MetavariablePattern { metavariable, formula }, MetavariableRegex { metavariable, regex }

Re-exported from sempai_core via updated lib.rs.

Normalisation Pipeline (sempai::normalize, 298 lines)

Public entrypoint normalize_rule_file(&RuleFile) -> Result<Vec<NormalizedSearchRule>, DiagnosticReport>:

  • Iterates over rules, filtering by RuleMode
  • Search-mode rules normalise the principal, validate semantics, expand by language
  • Non-search modes (taint, join, extract, other) deterministically fail with ESempaiUnsupportedMode

Legacy principal normalisation: maps LegacyFormula and nested LegacyValue/LegacyClause to canonical variants; special-cases pattern-not-insideNot(Inside(...)) and pattern-not-regexNot(regex(...)); constraint clauses return NotImplemented

v2 principal normalisation (normalize_v2_formula): recursively converts v2 match variants (AllAnd, AnyOr) and unary operators; preserves decorations (as_name, fix); parses where_clauses using new where-clause module with normalisation callback for nested formulas

Language expansion: parses language strings via normalize_languages; unknown values produce ESempaiSchemaInvalid with supported-language guidance

Semantic Validation (sempai::validate, 117 lines)

New module with public ValidationContext enum (Default, MetavariablePattern):

  • validate_formula_semantics(formula, context): runs both validators, returns first error
  • validate_invalid_not_in_or: rejects negation directly inside disjunctions; returns DiagnosticCode::ESempaiInvalidNotInOr
  • validate_positive_terms(formula, context): rejects conjunctions lacking positive match terms unless in metavariable-pattern context; returns DiagnosticCode::ESempaiMissingPositiveTermInAnd
  • Pre-order tree walk (walk_formula) with early exit on error

Positivity classification: Atom, Or are positive; Not, Inside, Anywhere are non-positive; And is positive if any child is positive.

Where Clause Module (sempai::where_clause, 128 lines + 372 lines submodule)

mod.rs:

  • parse_where_clauses(clauses, normalize_v2_formula_fn) and parse_where_clause(value, normalize_v2_formula_fn)
  • Validates schema (non-empty JSON object, exactly one key)
  • Dispatches by key: focusFocus variant; metavariableMetavariablePattern/MetavariableRegex via submodule; others → NotImplemented
  • focus accepts single string or single-element array, extracts metavariable via dollar-prefix stripping, rejects empty arrays/mismatched types

metavariable.rs (372 lines):

  • Helper extract_metavariable_name(mapping) with dollar-stripping
  • parse_metavariable_pattern_clause and parse_metavariable_regex_clause
  • parse_metavariable_clause(value, normalize_fn): main entrypoint dispatching to pattern/regex handlers; validates keys against METAVARIABLE_PATTERN_KEYS constant
  • build_match_formula_from_mapping(mapping): manual JSON parsing for MatchFormula; enforces exactly one operator among pattern, regex, all, any, not, inside, anywhere; rejects invalid combinations (e.g., mixing pattern/regex with type/analyzer)
  • parse_match_formula_array and parse_nested_match_formula for recursive formula parsing
  • Support for type, types, analyzer, comparison returns NotImplemented

QueryPlan Integration (sempai::engine, +37/-13)

  • QueryPlan struct field change: _plan: ()formula: Formula
  • New accessor pub const fn formula(&self) -> &Formula
  • New constructor pub(crate) fn with_formula(rule_id, language, formula)
  • Engine::compile_yaml: now calls normalize_rule_file, constructs QueryPlan per normalised rule with populated rule_id, language, formula; returns Ok(Vec<QueryPlan>) for successful search-mode compilation

Testing Coverage

Normalization tests (normalization_tests.rs, 271 lines):

  • Paired legacy/v2 fixture equivalence: simple_pattern, conjunction, disjunction, nested_context
  • Negative semantic validation: invalid_not_in_or, invalid_no_positive_term for both syntaxes
  • Metavariable-pattern exception validity via valid_metavariable_pattern_exception.yaml
  • v2 where-clause variant verification: Focus, MetavariableRegex, MetavariablePattern (including inner formula shape)
  • Unsupported comparison and multi-focus where-clause arrays expect NotImplemented with message checks

Normalize unit tests (normalize_tests.rs, 190 lines):

  • Legacy/v2 equivalence via helper parse_and_normalize
  • Invalid not-placement detection (both syntaxes) via expect_diagnostic
  • Missing positive-term detection (both syntaxes)
  • Unsupported mode validation
  • Multi-language expansion with formula preservation

BDD tests (behaviour.rs, +22/-5):

  • TestWorld.compile_result changed from Option<Result<(), DiagnosticReport>> to Option<Result<Vec<QueryPlan>, DiagnosticReport>>
  • New step: Then compilation succeeds with {count} plan asserting plan count
  • Generalised assert_diagnostic_code helper for any success type
  • when_compile_dsl wraps plan in vec to match new type

Engine tests (engine_tests.rs, +91/-7):

  • Replaced NOT_IMPLEMENTED placeholder test with successful compilation + plan inspection
  • New assert_compile_yaml_diagnostic(yaml, expected_code) helper
  • Parameterised negative cases via rstest: ESempaiInvalidNotInOr (pattern-either and match.any), ESempaiMissingPositiveTermInAnd (patterns and match.all), ESempaiUnsupportedMode (taint mode)

Test helpers introduced: parse_and_normalize, expect_diagnostic, assert_equivalent_formulas, assert_normalizes_to_single_pattern_atom, only_constraint_children, assert_compile_yaml_diagnostic, get_single_where_clause

Test Fixtures (15 YAML files)

Paired legacy/v2 examples: simple_pattern, conjunction, disjunction, nested_context
Invalid scenarios: invalid_not_in_or, invalid_no_positive_term (both syntaxes)
Metavariable support: valid_metavariable_pattern_exception.yaml
Where-clause variants: v2_where_focus.yaml, v2_where_focus_array.yaml, v2_where_metavariable_pattern.yaml, v2_where_metavariable_regex.yaml, v2_where_unsupported_comparison.yaml, v2_where_metavariable_invalid_not_in_or.yaml

Documentation

New execplan: docs/execplans/4-1-5-normalization-into-canonical-formula-model.md (638 lines) specifies normalisation stages (A–E), semantic constraint enforcement, test requirements, quality gates, acceptance criteria, and completion record

Updated guides:

  • Query language design: Formula/DecoratedFormula/WhereClause definitions; positivity rules; normalisation location references; semantic validation diagnostics
  • User guide: three-stage pipeline (parse, mode validation, normalisation); deterministic error codes; search-mode real plans; non-search modes fail with E_SEMPAI_UNSUPPORTED_MODE
  • Roadmap: milestone 4.1.5 marked complete; formatting normalised

Minor updates: AGENTS.md, SKILL.md, documentation style guide (footnote formatting), multiple execplans (indentation/wrapping adjustments)

Dependency Changes

  • Moved serde_json from [dev-dependencies] to [dependencies] in crates/sempai/Cargo.toml

Code Quality Notes

  • Refactored parse_metavariable_clause reducing cyclomatic complexity from 17 to <9 via extracted helpers and constant classification
  • Some test .unwrap() calls preserved under clippy suppression pending macroisation
  • Comments documenting design omissions (e.g., where-clauses deferred from NormalizedSearchRule to future milestone)
  • Open review suggestions: removing NormalizedRule::UnsupportedMode indirection, factoring traversal patterns (walk_formula, map_decorated)

Walkthrough

Introduced normalisation of legacy and v2 Semgrep rules into a canonical Formula model. The pipeline parses YAML rules, validates modes, normalises search-mode rules into canonical form, performs semantic validation, and populates QueryPlan with the resulting formula instead of returning NOT_IMPLEMENTED.

Changes

Cohort / File(s) Summary
Core Formula Model
crates/sempai-core/src/formula.rs, crates/sempai-core/src/lib.rs
Introduced canonical Formula, Atom, DecoratedFormula, and WhereClause types with is_positive_term() classification methods. Re-exported from lib.rs for use across crates.
Normalisation Pipeline
crates/sempai/src/normalize.rs, crates/sempai/src/lib.rs
Added normalize_rule_file() and normalize_v2_formula() functions that convert legacy pattern* and v2 match principals into canonical Formula instances, with language expansion per-rule.
Semantic Validation
crates/sempai/src/validate.rs
Implemented validate_formula_semantics() with ValidationContext to enforce deterministic rejection of invalid negation placements (InvalidNotInOr) and conjunctions without positive terms (MissingPositiveTermInAnd).
Where Clause Parsing
crates/sempai/src/where_clause/mod.rs, crates/sempai/src/where_clause/metavariable.rs
Added where-clause parsing for v2 rules, supporting focus, metavariable-pattern, and metavariable-regex constraints with fallback to not_implemented for comparison and type-based filters.
Engine Integration
crates/sempai/src/engine.rs
Replaced QueryPlan._plan: () placeholder with formula: Formula field. Modified compile_yaml to execute full normalisation pipeline for search-mode rules instead of returning NOT_IMPLEMENTED.
Test Framework Updates
crates/sempai/src/tests/behaviour.rs, crates/sempai/src/tests/engine_tests.rs, crates/sempai/src/tests/mod.rs
Updated BDD framework to capture compiled Vec<QueryPlan> results. Added negative test cases for unsupported modes and schema violations. Introduced normalization_tests and normalize_tests modules.
Normalisation Test Fixtures
crates/sempai/tests/fixtures/normalization/*.yaml
Added paired legacy/v2 rule fixtures for positive equivalence tests (simple_pattern, conjunction, disjunction, nested_context) and negative schema/validation tests (invalid_not_in_or, invalid_no_positive_term, valid_metavariable_pattern_exception, v2 where-clause variants).
Feature Specification
crates/sempai/tests/features/sempai_engine.feature
Replaced placeholder expectations with success assertions for valid search rules (Then compilation succeeds with 1 plan) and added unsupported-mode validation scenario for mode: taint.
Documentation & Roadmap
docs/execplans/4-1-5-normalization-into-canonical-formula-model.md, docs/roadmap.md, docs/sempai-query-language-design.md, docs/users-guide.md, AGENTS.md, agent-skill/weaver/SKILL.md
Completed ExecPlan 4.1.5 with full implementation documentation. Updated roadmap checklist and design guide to reflect non-generic DecoratedFormula and string-based Atom variants. Refreshed compile_yaml contract documentation with three-stage pipeline description and new diagnostic codes. Minor formatting/guidance updates.
Dependency Configuration
crates/sempai/Cargo.toml
Moved serde_json from [dev-dependencies] to [dependencies] to support where-clause JSON parsing in normalisation code.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Engine as Engine::compile_yaml()
    participant Parser as sempai_yaml::Parser
    participant Validator as Mode Validator
    participant Normaliser as normalize_rule_file()
    participant SchemaVal as validate_formula_semantics()
    participant Expander as Language Expander
    
    User->>Engine: invoke with YAML text
    Engine->>Parser: parse_rule_file(yaml)
    Parser-->>Engine: RuleFile | DiagnosticReport
    
    alt Parse Error
        Engine-->>User: Err(DiagnosticReport)
    else Parse Success
        Engine->>Validator: check rule.mode
        alt Non-search Mode
            Validator-->>Engine: Err(ESempaiUnsupportedMode)
            Engine-->>User: Err(DiagnosticReport)
        else Search Mode
            Validator-->>Engine: Ok()
            Engine->>Normaliser: normalize_rule_file(&file)
            Normaliser->>Normaliser: for each rule: normalise search principal
            Normaliser->>SchemaVal: validate_formula_semantics(&formula)
            alt Validation Fails
                SchemaVal-->>Normaliser: Err(ESempaiInvalidNotInOr|ESempaiMissingPositiveTermInAnd)
                Normaliser-->>Engine: Err(DiagnosticReport)
                Engine-->>User: Err(DiagnosticReport)
            else Validation Succeeds
                SchemaVal-->>Normaliser: Ok()
                Normaliser->>Expander: expand per language
                Expander-->>Normaliser: Vec<NormalizedSearchRule>
                Normaliser-->>Engine: Ok(Vec<NormalizedSearchRule>)
                Engine->>Engine: construct QueryPlan per rule with formula
                Engine-->>User: Ok(Vec<QueryPlan>)
            end
        end
    end
Loading

Possibly Related PRs

Poem

🎭 From patterns old to formulas new,
Legacy and v2 both pass through,
Canonical form now takes the stage,
Validation guards each branching page,
Search rules emerge, no more delay—
The normalisers' grand ballet!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 4-1-5-normalization-into-canonical-formula-model

codescene-delta-analysis[bot]

This comment was marked as outdated.

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented Apr 5, 2026

Reviewer's Guide

Introduce a canonical Formula model in sempai_core, add a normalization pipeline in sempai that lowers legacy and v2 search rules into this model with semantic validation, and update Engine::compile_yaml, tests, and docs to return real query plans for search-mode rules while gating unsupported modes.

Sequence diagram for Engine::compile_yaml normalization pipeline

sequenceDiagram
    actor Caller
    participant Engine
    participant YamlParser as SempaiYaml
    participant ModeValidator as ModeValidation
    participant Normalizer as Normalize

    Caller->>Engine: compile_yaml(yaml)
    Engine->>YamlParser: parse_rule_file(yaml, None)
    YamlParser-->>Engine: RuleFile or DiagnosticReport error
    alt YAML parse error
        Engine-->>Caller: Err(DiagnosticReport E_SEMPAI_YAML_PARSE or E_SEMPAI_SCHEMA_INVALID)
    else YAML parsed
        Engine->>ModeValidator: validate_supported_modes(RuleFile)
        ModeValidator-->>Engine: Ok or DiagnosticReport error
        alt Unsupported mode
            Engine-->>Caller: Err(DiagnosticReport E_SEMPAI_UNSUPPORTED_MODE)
        else Modes supported
            Engine->>Normalizer: normalize_rule_file(&RuleFile)
            Normalizer-->>Engine: Vec NormalizedSearchRule or DiagnosticReport error
            alt Semantic or normalization error
                Engine-->>Caller: Err(DiagnosticReport E_SEMPAI_* including E_SEMPAI_INVALID_NOT_IN_OR or E_SEMPAI_MISSING_POSITIVE_TERM_IN_AND)
            else Normalization succeeded
                loop For each NormalizedSearchRule
                    Engine->>Engine: QueryPlan::with_formula(rule_id, language, formula)
                end
                Engine-->>Caller: Ok(Vec QueryPlan)
            end
        end
    end
Loading

Class diagram for canonical Formula model and normalization outputs

classDiagram
    class Formula {
        <<enum>>
        +Atom atom
        +Not(Box~DecoratedFormula~) not
        +Inside(Box~DecoratedFormula~) inside
        +Anywhere(Box~DecoratedFormula~) anywhere
        +And(Vec~DecoratedFormula~) and
        +Or(Vec~DecoratedFormula~) or
        +bool is_positive_term()
    }

    class Atom {
        <<enum>>
        +Pattern(String)
        +Regex(String)
    }

    class DecoratedFormula {
        +Formula formula
        +Vec~WhereClause~ where_clauses
        +Option~String~ as_name
        +Option~String~ fix
        +Option~SourceSpan~ span
        +new(formula Formula) DecoratedFormula
        +with_where_clause(clause WhereClause) DecoratedFormula
        +with_as_name(name String) DecoratedFormula
        +with_fix(fix String) DecoratedFormula
        +with_span(span SourceSpan) DecoratedFormula
        +bool is_positive_term()
    }

    class WhereClause {
        <<enum>>
        +Focus(metavariable String)
        +MetavariablePattern(metavariable String, formula Formula)
        +MetavariableRegex(metavariable String, regex String)
    }

    class NormalizedSearchRule {
        +String rule_id
        +Language language
        +Formula formula
    }

    class QueryPlan {
        -String rule_id
        -Language language
        -Formula formula
        +language() Language
        +formula() &Formula
        +with_formula(rule_id String, language Language, formula Formula) QueryPlan
    }

    class Language {
        <<enum>>
        +Rust
        +Python
        +Typescript
        +Go
        +Hcl
    }

    class DiagnosticReport {
        +Vec~Diagnostic~ diagnostics
        +not_implemented(message String) DiagnosticReport
        +single_error(code DiagnosticCode, message String, span Option~SourceSpan~, notes Vec~String~) DiagnosticReport
    }

    Formula --> Atom : uses
    Formula --> DecoratedFormula : wraps
    DecoratedFormula --> Formula : contains
    DecoratedFormula --> WhereClause : has many
    NormalizedSearchRule --> Formula : contains
    NormalizedSearchRule --> Language : targets
    QueryPlan --> Formula : contains
    QueryPlan --> Language : targets
    DiagnosticReport ..> Formula : validation_errors_on
Loading

File-Level Changes

Change Details Files
Define and expose a canonical Formula model and helpers in sempai_core for normalized Semgrep queries.
  • Add Formula, Atom, DecoratedFormula, and WhereClause types with serde support and positivity helpers.
  • Provide builder-style APIs on DecoratedFormula and implement where-clause variants for focus and metavariable constraints.
  • Re-export the new formula types from the core lib and add focused unit tests for positivity classification and builders.
crates/sempai-core/src/formula.rs
crates/sempai-core/src/lib.rs
Implement normalization of parsed YAML search rules (legacy and v2) into the canonical Formula model with semantic validation and language expansion.
  • Add normalize_rule_file and NormalizedSearchRule to lower RuleFile/Rule into canonical Formula, including per-language expansion.
  • Implement legacy and v2 lowering functions to map pattern*, pattern-either, patterns, pattern-not(-inside/-regex), anywhere, and match.* constructs into Formula/DecoratedFormula.
  • Add semantic validation passes for InvalidNotInOr and MissingPositiveTermInAnd on the canonical form and normalize languages with diagnostic reporting for unsupported modes.
  • Add unit tests that exercise normalization, semantic error reporting, multi-language expansion, and unsupported-mode handling.
crates/sempai/src/normalize.rs
Wire Engine::compile_yaml and QueryPlan to use the new normalization pipeline and expose formulas, updating engine tests and BDD scenarios accordingly.
  • Change QueryPlan to store a Formula instead of a placeholder and add a with_formula constructor plus a formula accessor.
  • Update Engine::compile_yaml to parse YAML, validate modes, run normalize_rule_file, and return QueryPlan instances for search-mode rules while surfacing diagnostics for parse, mode, or semantic errors.
  • Adjust unit tests to expect successful compilation for valid search rules and deterministic unsupported-mode diagnostics, and extend BDD steps and feature scenarios to assert plan success and new mode gating behaviour.
crates/sempai/src/engine.rs
crates/sempai/src/tests/engine_tests.rs
crates/sempai/tests/features/sempai_engine.feature
crates/sempai/src/tests/behaviour.rs
crates/sempai/src/tests/mod.rs
Add fixture-based normalization tests to assert legacy/v2 equivalence and semantic failure behaviour.
  • Introduce normalization_tests that read YAML fixtures, run normalize_rule_file, and assert formula equivalence and expected diagnostic codes.
  • Create paired legacy/v2 fixture YAML files for simple patterns, conjunctions, disjunctions, and nested context cases, plus invalid fixtures for not-in-or and no-positive-term cases and a deferred metavariable-pattern exception fixture.
crates/sempai/src/tests/normalization_tests.rs
crates/sempai/tests/fixtures/normalization/simple_pattern_legacy.yaml
crates/sempai/tests/fixtures/normalization/simple_pattern_v2.yaml
crates/sempai/tests/fixtures/normalization/conjunction_legacy.yaml
crates/sempai/tests/fixtures/normalization/conjunction_v2.yaml
crates/sempai/tests/fixtures/normalization/disjunction_legacy.yaml
crates/sempai/tests/fixtures/normalization/disjunction_v2.yaml
crates/sempai/tests/fixtures/normalization/nested_context_legacy.yaml
crates/sempai/tests/fixtures/normalization/nested_context_v2.yaml
crates/sempai/tests/fixtures/normalization/invalid_not_in_or_legacy.yaml
crates/sempai/tests/fixtures/normalization/invalid_not_in_or_v2.yaml
crates/sempai/tests/fixtures/normalization/invalid_no_positive_term_legacy.yaml
crates/sempai/tests/fixtures/normalization/invalid_no_positive_term_v2.yaml
crates/sempai/tests/fixtures/normalization/valid_metavariable_pattern_exception.yaml
Update design, user, and roadmap documentation plus add an execplan document for normalization milestone 4.1.5.
  • Revise the query-language design doc to reflect the new Formula/DecoratedFormula shape, semantic validation rules, and compile_yaml normalization behaviour.
  • Update the users guide compile_yaml section to describe the full parse → mode validation → normalization pipeline and new E_SEMPAI_* diagnostics.
  • Mark roadmap item 4.1.5 as complete and add a detailed execution plan document for the normalization milestone.
docs/sempai-query-language-design.md
docs/users-guide.md
docs/roadmap.md
docs/execplans/4-1-5-normalization-into-canonical-formula-model.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 5, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

crates/sempai/src/normalize.rs

Comment on file

    fn normalize_simple_pattern_legacy() {
        let yaml = concat!(
            "rules:\n",
            "  - id: test.rule\n",
            "    message: test\n",
            "    languages: [rust]\n",
            "    severity: ERROR\n",
            "    pattern: fn $F($X)\n",
        );
        let file = sempai_yaml::parse_rule_file(yaml, None).unwrap();
        let result = normalize_rule_file(&file).unwrap();

        assert_eq!(result.len(), 1);
        assert_eq!(result[0].rule_id, "test.rule");
        assert_eq!(result[0].language, Language::Rust);
        assert!(matches!(result[0].formula, Formula::Atom(Atom::Pattern(_))));
    }

❌ New issue: Code Duplication
The module contains 8 functions with similar structure: tests.invalid_not_in_or_detected_legacy,tests.invalid_not_in_or_detected_v2,tests.missing_positive_term_detected_legacy,tests.missing_positive_term_detected_v2 and 4 more functions

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 5, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

crates/sempai/src/tests/normalization_tests.rs

Comment on file

//! Tests for legacy/v2 formula normalization into canonical `Formula` model.

❌ New issue: Duplicated Assertion Blocks
The test suite contains 4 functions with duplicated assertion blocks (conjunction_legacy_and_v2_normalize_to_equivalent_formulas,disjunction_legacy_and_v2_normalize_to_equivalent_formulas,nested_context_legacy_and_v2_normalize_to_equivalent_formulas,simple_pattern_legacy_and_v2_normalize_to_equivalent_formulas), threshold = 2

@coderabbitai

This comment was marked as resolved.

@coderabbitai

This comment was marked as resolved.

leynos and others added 2 commits April 5, 2026 12:14
Capture the 4.1.5 work in a self-contained plan so implementation can
proceed after approval. The plan defines normalization boundaries,
semantic checks, fixture coverage, documentation updates, and required
validation gates.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
… model

This change delivers the normalization layer for Sempai query compilation:

- Add canonical Formula model to sempai_core (Formula, Atom, DecoratedFormula,
  WhereClause) with semantic validation helpers

- Implement normalization pass in sempai::normalize that lowers both legacy
  pattern* syntax and v2 match syntax into the canonical representation

- Implement semantic validation:
  - InvalidNotInOr: rejects negation directly inside disjunctions
  - MissingPositiveTermInAnd: ensures conjunctions have match producers

- Update Engine::compile_yaml to return real QueryPlan values for search-mode
  rules instead of the previous NOT_IMPLEMENTED placeholder

- Add paired fixture tests (legacy/v2 equivalence) and BDD scenarios

- Update documentation: sempai-query-language-design.md, users-guide.md,
  roadmap.md (4.1.5 marked complete), and the execplan itself

- Add test lint suppressions for intentional panic patterns in tests

The metavariable-pattern exception for MissingPositiveTermInAnd is deferred
to a future milestone (test marked #[ignore]).

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@leynos leynos force-pushed the 4-1-5-normalization-into-canonical-formula-model branch from dfc6fc9 to e20e698 Compare April 5, 2026 12:34
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 5, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

Code Duplication

crates/sempai/src/normalize.rs:

What lead to degradation?

The module contains 8 functions with similar structure: tests.invalid_not_in_or_detected_legacy,tests.invalid_not_in_or_detected_v2,tests.missing_positive_term_detected_legacy,tests.missing_positive_term_detected_v2 and 4 more functions

Why does this problem occur?

Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health.

How to fix it?

A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More
Once you have identified the similarities across functions, look to extract and encapsulate the concept that varies into its own function(s). These shared abstractions can then be re-used, which minimizes the amount of duplication and simplifies change.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 5, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

Duplicated Assertion Blocks

crates/sempai/src/tests/normalization_tests.rs:

What lead to degradation?

The test suite contains 4 functions with duplicated assertion blocks (conjunction_legacy_and_v2_normalize_to_equivalent_formulas,disjunction_legacy_and_v2_normalize_to_equivalent_formulas,nested_context_legacy_and_v2_normalize_to_equivalent_formulas,simple_pattern_legacy_and_v2_normalize_to_equivalent_formulas), threshold = 2

Why does this problem occur?

Just like duplicated logic in application code is problematic, duplicated test criteria lead to code that's hard to maintain. Duplicated assertion blocks usually indicate a missing abstraction or a test suite that attempts to test too many things inside the same module.

How to fix it?

Consider to encapsulate the duplicated assertions (i.e. test criteria) in a custom assert statement that you can then re-use. We also recommend to consider the granularity of the tests; sometimes a single test tests too many things; extracting smaller tests can usually help you get rid of the duplication.

@coderabbitai

This comment was marked as resolved.

@coderabbitai

This comment was marked as resolved.

Refactor sempai normalize tests to eliminate duplication:
- Add parse_and_normalize helper for success-path tests
- Add expect_diagnostic helper for error-path tests
- Update all test functions to use new helpers

Refactor normalization_tests to eliminate duplication:
- Add assert_equivalent_formulas helper for fixture tests
- Update four equivalence test functions to use helper

Fix markdown lint violations:
- Wrap long lines in execplan checklist items
- Add blank lines before lists in execplan and design doc

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 5, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

Code Duplication

crates/sempai/src/normalize.rs:

What lead to degradation?

The module contains 8 functions with similar structure: tests.invalid_not_in_or_detected_legacy,tests.invalid_not_in_or_detected_v2,tests.missing_positive_term_detected_legacy,tests.missing_positive_term_detected_v2 and 4 more functions

Why does this problem occur?

Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health.

How to fix it?

A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More
Once you have identified the similarities across functions, look to extract and encapsulate the concept that varies into its own function(s). These shared abstractions can then be re-used, which minimizes the amount of duplication and simplifies change.

@coderabbitai

This comment was marked as resolved.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 6, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph.

If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced.

Code Duplication

crates/sempai/src/normalize.rs:

What lead to degradation?

The module contains 8 functions with similar structure: tests.invalid_not_in_or_detected_legacy,tests.invalid_not_in_or_detected_v2,tests.missing_positive_term_detected_legacy,tests.missing_positive_term_detected_v2 and 4 more functions

Why does this problem occur?

Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health.

How to fix it?

A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More
Once you have identified the similarities across functions, look to extract and encapsulate the concept that varies into its own function(s). These shared abstractions can then be re-used, which minimizes the amount of duplication and simplifies change.

@coderabbitai

This comment was marked as resolved.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos leynos marked this pull request as ready for review April 7, 2026 10:36
sourcery-ai[bot]

This comment was marked as resolved.

@coderabbitai coderabbitai Bot added the Roadmap label Apr 7, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa855ba909

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/sempai/src/normalize.rs Outdated
Comment thread crates/sempai/src/normalize.rs Outdated
Comment thread crates/sempai/src/normalize.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/sempai/src/engine.rs (1)

122-136: ⚠️ Potential issue | 🟡 Minor

Duplicate # Errors section in doc comment.

The doc comment for compile_yaml contains two # Errors sections (lines 124-126 and 132-135). Merge them into a single section:

     /// Compiles a YAML rule file into query plans.
     ///
     /// # Errors
     ///
-    /// Returns a diagnostic report if parsing or validation fails.
-    ///
     /// Successfully parsed search-mode rules are normalized into canonical
     /// formulas and returned as query plans. Other modes (taint, join, extract)
     /// return unsupported mode diagnostics.
-    ///
-    /// # Errors
     ///
     /// Returns a diagnostic report if parsing fails, if unsupported modes are
     /// encountered, or if normalization produces semantic validation errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sempai/src/engine.rs` around lines 122 - 136, Duplicate "# Errors"
sections exist in the doc comment for compile_yaml; merge them into one coherent
Errors section that mentions all failure cases (parsing, unsupported modes,
normalization/validation errors) and keep references to return types
DiagnosticReport and QueryPlan; update the doc comment above pub fn
compile_yaml(&self, yaml: &str) -> Result<Vec<QueryPlan>, DiagnosticReport> to
remove the extra "# Errors" header and consolidate the content into a single
paragraph under a single "# Errors" heading.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/sempai-core/src/formula.rs`:
- Around line 11-14: Update the docstring for the Formula enum to use British
spelling by changing the word "normalized" to "normalised" in the comment that
begins "A canonical formula representing a normalized Semgrep query." so the
sentence reads "A canonical formula representing a normalised Semgrep query.";
locate the comment above the Formula enum in formula.rs and make the single-word
spelling edit.
- Around line 1-5: Update the module docstring in formula.rs to use British
spelling "normalised" instead of "normalized" (i.e., change "normalized Semgrep
queries" to "normalised Semgrep queries") so the top-of-file documentation
matches the requested spelling convention; edit the doc comment block that
begins with "Canonical formula model for normalized Semgrep queries." and update
the subsequent mention as needed.

In `@crates/sempai/src/normalize.rs`:
- Around line 1-516: The file is too long; split out validation and tests:
create a new module file (e.g., validate.rs) containing the three functions
validate_formula_semantics, validate_invalid_not_in_or, and
validate_positive_terms (move their implementations verbatim) and make them
pub(crate) so normalize.rs can call them; in normalize.rs remove those function
definitions and add `mod validate; use validate::{validate_formula_semantics,
validate_invalid_not_in_or, validate_positive_terms};`. Also move the entire
#[cfg(test)] tests module into a new tests file (e.g., tests/normalize_tests.rs
or src/normalize_tests.rs) keeping the test contents unchanged but adjust
imports to `use crate::normalize::*;` (or `use super::*;` if placed as a sibling
module) and preserve the same expect annotations; update module declarations as
needed so the crate builds.
- Around line 84-85: The docstring for normalize_search_principal uses American
spelling ("Normalizes"); update it to British spelling by changing the leading
comment text to "Normalises a search query principal (legacy or v2) into a
canonical Formula." so the function comment uses "Normalises" and otherwise
leave the function signature and behavior unchanged.
- Around line 179-180: Change the docstring for the function
normalize_v2_formula to use British spelling: replace "Normalizes" with
"Normalises" (i.e., update the comment line "Normalizes a v2 match formula into
canonical form." to "Normalises a v2 match formula into canonical form.") so the
documentation follows British spelling conventions.
- Around line 55-56: Docstring for the function normalize_rule uses American
spelling ("Normalizes a single rule."); change it to British spelling by
updating the docstring to "Normalises a single rule." so the documentation for
fn normalize_rule matches the project's British English convention.
- Line 1: The module docstring at the top of the file uses American spelling
"Normalization"; change it to British spelling "Normalisation" so the line reads
"Normalisation of parsed YAML rules into canonical `Formula` model." Update the
module-level comment string in crates/sempai/src/normalize.rs (the top-of-file
doc comment) to use "Normalisation" to comply with en-GB-oxendict guidelines.
- Around line 108-109: Change the docstring for normalize_legacy_formula to use
British spelling: replace "Normalizes a legacy formula into canonical form."
with "Normalises a legacy formula into canonical form." Update the comment
immediately above the fn normalize_legacy_formula(formula: &LegacyFormula) ->
Result<Formula, DiagnosticReport> to reflect this spelling change.
- Around line 312-313: The docstring for function normalize_languages uses
American spelling ("Normalizes"); change it to British spelling by updating the
docstring to "Normalises language strings from the rule into Language enums" so
the function header and surrounding comment use "Normalises" instead of
"Normalizes" (update the comment immediately above fn normalize_languages).
- Around line 20-26: Update the docstring in crates/sempai/src/normalize.rs to
use British (en-GB) spelling: change occurrences of "Normalizes" to "Normalises"
and "normalized" to "normalised" in the function/module docstring (the comment
that begins with "Normalizes a rule file into a vector of normalized search
rules."). Ensure both the title line and any later instances in that docstring
are updated to the en-GB variants while leaving the rest of the text and
formatting unchanged.
- Around line 220-224: The MatchFormula::Decorated arm in normalize_v2_formula
currently drops where_clauses, as_name, and fix; update it to either (A)
propagate decorations by returning a Decorated variant that wraps the normalized
inner formula and preserves where_clauses, as_name, and fix (i.e., construct
MatchFormula::Decorated { formula: normalize_v2_formula(inner), where_clauses:
original.where_clauses.clone(), as_name: original.as_name.clone(), fix:
original.fix.clone(), .. }), or (B) emit a not-implemented diagnostic instead of
silently discarding them (call the project's diagnostic helper like
emit_not_implemented("v2 decorations") or return an appropriate Err/Diagnostic
referencing where_clauses/as_name/fix) so callers are aware; change only the
MatchFormula::Decorated match arm in normalize_v2_formula and ensure you
reference the original fields (where_clauses, as_name, fix) when implementing
the fix.
- Around line 155-169: The Constraint arm in normalize_legacy_clause currently
returns an empty conjunction (Formula::And(vec![])) which later triggers
validate_positive_terms (ESempaiMissingPositiveTermInAnd) and invalidates rules
with legacy metavariable constraints; change the Constraint handling to return a
not-implemented diagnostic instead of an empty DecoratedFormula: in
normalize_legacy_clause match on LegacyClause::Constraint and return an
Err(DiagnosticReport::not_implemented(...)) (or the project's equivalent helper)
describing that constraint normalization / WhereClause conversion is
unimplemented, referencing DecoratedFormula and the
ESempaiMissingPositiveTermInAnd validation to guide the message.
- Around line 171-172: The docstring for the function normalize_legacy_value
uses American spelling ("Normalizes"); update it to British spelling
("Normalises") so the comment reads "Normalises a legacy value (string or
formula) into a formula." Ensure you only change the docstring text for the fn
normalize_legacy_value signature and keep the rest of the comment content
intact.

In `@crates/sempai/src/tests/behaviour.rs`:
- Around line 113-124: The step handler then_compilation_succeeds currently
ignores the {count} parameter because TestWorld::compile_result is mapped to ()
in when_compile_yaml; either persist the plan count (or full Vec<QueryPlan>) in
TestWorld::compile_result so you can assert equality with the incoming count, or
change the step signature and feature to "compilation succeeds" and remove the
count parameter from the step; to implement the first option, modify
when_compile_yaml to store Result<usize, DiagnosticReport> or
Result<Vec<QueryPlan>, DiagnosticReport> in TestWorld::compile_result and update
then_compilation_succeeds to unwrap the stored value and assert the stored count
equals the count parameter (referencing TestWorld::compile_result and
when_compile_yaml), otherwise remove the count from the step definition and its
parameter in then_compilation_succeeds and update feature files accordingly.

In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 23-32: The test currently maps a filesystem read error from
fs::read_to_string into DiagnosticReport::single_error using
sempai_core::DiagnosticCode::ESempaiYamlParse which is semantically wrong;
update the error handling in normalization_tests.rs (the fs::read_to_string call
and its Err branch that calls DiagnosticReport::single_error) to either panic!
with a clear message including filename and the error (preferred for test-only
infrastructure failures) or use an appropriate filesystem/fixture DiagnosticCode
if one exists, and remove the ESempaiYamlParse usage for file I/O failures.

In `@crates/sempai/tests/features/sempai_engine.feature`:
- Around line 22-25: Rename the Gherkin scenario title to match the actual
expectation (successful compilation producing one plan) by editing the Scenario
header in the feature (the line that currently reads "Engine compile_yaml keeps
the placeholder for dependency search rules") to a clearer title such as "Engine
compile_yaml compiles dependency search rule into one plan" so the scenario name
reflects that compilation succeeds with 1 plan.

In `@docs/execplans/4-1-5-normalization-into-canonical-formula-model.md`:
- Around line 49-51: The phrase "normalised formula model" uses the "-ised"
spelling; update it to the repository's en-GB-oxendict convention by changing
"normalised" to "normalized" wherever the phrase "normalised formula model" (and
similar occurrences of "normalised") appears in the document so the text reads
"normalized formula model".
- Around line 519-525: Remove the unnecessary trailing commas at the end of the
bulleted list items by editing the two list lines that currently end with
commas: the line containing "unit tests and `rstest-bdd` scenarios cover happy,
unhappy, and edge cases," and the line containing
"`docs/sempai-query-language-design.md`, `docs/users-guide.md`, and
`docs/roadmap.md` match the delivered behaviour," — simply delete the trailing
commas so the markdown bullets end without punctuation.

In `@docs/sempai-query-language-design.md`:
- Around line 443-455: The implementation note for sempai::Engine::compile_yaml
mixes American and British spelling; update the wording to use British English
consistently (e.g., change "normalize/normalization" variants only if you prefer
British forms like "normalise/normalisation" across the note, and convert
American spellings like "behavior/behavioral" to "behaviour/behavioural" where
present). Edit the explanatory text that mentions mode names, diagnostics
(E_SEMPAI_UNSUPPORTED_MODE, E_SEMPAI_*, InvalidNotInOr,
MissingPositiveTermInAnd) and dates so all spelling conforms to British
conventions, preserving identifiers (function names, enum/diagnostic codes, and
types such as Formula, sempai_core::formula) exactly as-is.
- Line 365: Update the prose occurrences of the word "Normalization" to British
spelling "Normalisation" in the docs/sempai-query-language-design.md file (leave
all code and function names in crates/sempai/src/normalize.rs unchanged); search
for the heading or sentence that reads "Normalization logic:
`crates/sempai/src/normalize.rs`" and any other prose uses of "Normalization"
and replace them with "Normalisation" only in narrative text, not in identifiers
or code blocks.

---

Outside diff comments:
In `@crates/sempai/src/engine.rs`:
- Around line 122-136: Duplicate "# Errors" sections exist in the doc comment
for compile_yaml; merge them into one coherent Errors section that mentions all
failure cases (parsing, unsupported modes, normalization/validation errors) and
keep references to return types DiagnosticReport and QueryPlan; update the doc
comment above pub fn compile_yaml(&self, yaml: &str) -> Result<Vec<QueryPlan>,
DiagnosticReport> to remove the extra "# Errors" header and consolidate the
content into a single paragraph under a single "# Errors" heading.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: eb3b7f33-6f28-49f1-8b37-2f7cd807236e

📥 Commits

Reviewing files that changed from the base of the PR and between a68649e and aa855ba.

📒 Files selected for processing (27)
  • crates/sempai-core/src/formula.rs
  • crates/sempai-core/src/lib.rs
  • crates/sempai/src/engine.rs
  • crates/sempai/src/lib.rs
  • crates/sempai/src/normalize.rs
  • crates/sempai/src/tests/behaviour.rs
  • crates/sempai/src/tests/engine_tests.rs
  • crates/sempai/src/tests/mod.rs
  • crates/sempai/src/tests/normalization_tests.rs
  • crates/sempai/tests/features/sempai_engine.feature
  • crates/sempai/tests/fixtures/normalization/conjunction_legacy.yaml
  • crates/sempai/tests/fixtures/normalization/conjunction_v2.yaml
  • crates/sempai/tests/fixtures/normalization/disjunction_legacy.yaml
  • crates/sempai/tests/fixtures/normalization/disjunction_v2.yaml
  • crates/sempai/tests/fixtures/normalization/invalid_no_positive_term_legacy.yaml
  • crates/sempai/tests/fixtures/normalization/invalid_no_positive_term_v2.yaml
  • crates/sempai/tests/fixtures/normalization/invalid_not_in_or_legacy.yaml
  • crates/sempai/tests/fixtures/normalization/invalid_not_in_or_v2.yaml
  • crates/sempai/tests/fixtures/normalization/nested_context_legacy.yaml
  • crates/sempai/tests/fixtures/normalization/nested_context_v2.yaml
  • crates/sempai/tests/fixtures/normalization/simple_pattern_legacy.yaml
  • crates/sempai/tests/fixtures/normalization/simple_pattern_v2.yaml
  • crates/sempai/tests/fixtures/normalization/valid_metavariable_pattern_exception.yaml
  • docs/execplans/4-1-5-normalization-into-canonical-formula-model.md
  • docs/roadmap.md
  • docs/sempai-query-language-design.md
  • docs/users-guide.md

Comment thread crates/sempai-core/src/formula.rs
Comment thread crates/sempai-core/src/formula.rs
Comment thread crates/sempai/src/normalize.rs
Comment thread crates/sempai/src/normalize.rs
Comment thread crates/sempai/src/normalize.rs
Comment thread crates/sempai/tests/features/sempai_engine.feature Outdated
Comment thread docs/execplans/4-1-5-normalization-into-canonical-formula-model.md Outdated
Comment thread docs/execplans/4-1-5-normalization-into-canonical-formula-model.md
Comment thread docs/sempai-query-language-design.md
Comment thread docs/sempai-query-language-design.md Outdated
- Fix diagnostic message for InvalidNotInOr (remove 'directly' to match behavior)
- Fix BDD step 'compilation succeeds with {count} plan' to actually check count
- Add engine-level tests for semantic validation errors (InvalidNotInOr,
  MissingPositiveTermInAnd)
- Fix spelling: 'normalised' -> 'normalized' in execplan
- Replace .unwrap() with .expect() in normalize tests

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
docs/complexity-antipatterns-and-refactoring-strategies.md (1)

579-598: ⚠️ Potential issue | 🟠 Major

Realign footnote references with the claims in these sections.

Remap the inline citations so each claim points to a source that actually supports it.
The current mapping is cross-wired: for example, structural pattern-matching
claims cite [^19]-[^21], but those entries currently reference CQRS, God-class,
and Ravioli discussions rather than pattern matching.

Triage: [type:docstyle] [type:syntax/md]

As per coding guidelines, "docs/**/*.md: Follow the Documentation style guide for writing, formatting, grammar, and structure standards" and "**/*.md: Use GitHub-flavoured Markdown style for footnotes and endnotes."

Also applies to: 655-667, 856-878

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/complexity-antipatterns-and-refactoring-strategies.md` around lines 579
- 598, Update the misplaced footnote references for the "Structural pattern
matching" paragraph by remapping the inline citations [^19], [^20], and [^21] to
sources that actually support pattern matching and destructuring (replace the
current CQRS/God-class/Ravioli citations); ensure each specific claim about
match-case, destructuring benefits, and example comparisons points to the
correct corresponding footnote. Also scan the other referenced ranges (lines
~655-667 and ~856-878) and realign any cross-wired footnotes the same way so
that every assertion cites an appropriate source, using the same footnote
identifiers ([^19], [^20], [^21], etc.) consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/sempai/src/where_clause.rs`:
- Around line 182-240: The parse_metavariable_clause dispatcher currently
ignores unrecognized keys; update it to validate fields per chosen branch by
computing the allowed key set for each constraint and returning
ESempaiSchemaInvalid when extra keys are present. After determining metavariable
and which branch will be taken (pattern keys vs "regex" vs type/types vs
analyzer), build the corresponding allowed_keys (include metavariable identifier
key if present, METAVARIABLE_PATTERN_KEYS for pattern branch, "regex" for regex
branch, "type"/"types" for type branch, "analyzer" for analyzer branch), then
compare mapping.keys() against allowed_keys and return
DiagnosticReport::single_error listing unexpected fields if any; only call
parse_metavariable_pattern_clause, parse_metavariable_regex_clause, or the
not_implemented errors when the key validation passes.
- Around line 147-160: The parse_metavariable_pattern_clause function currently
returns a normalized formula without re-running semantic validation; update
parse_metavariable_pattern_clause to validate the decorated/normalized formula
(use validate_formula_semantics or the same validator used for top-level rules)
on the DecoratedFormula produced by normalize_v2_formula (or on its inner
MatchFormula) and return an Err(DiagnosticReport) when validation fails before
constructing WhereClause::MetavariablePattern; also add a unit/behavioral test
that parses a where.metavariable containing an any with a not and another that
is an all-negative conjunction to assert the parser rejects
ESempaiInvalidNotInOr and ESempaiMissingPositiveTermInAnd cases.

In `@docs/complexity-antipatterns-and-refactoring-strategies.md`:
- Around line 841-843: The footnote [^13] contains malformed punctuation "How to
*resist* refactoring a large spaghetti codebase? :" — fix it by removing the
extra separator so the title reads cleanly (e.g., change "How to *resist*
refactoring a large spaghetti codebase? :" to "How to *resist* refactoring a
large spaghetti codebase?" in the footnote text), updating the footnote entry
that contains that exact string.

---

Duplicate comments:
In `@docs/complexity-antipatterns-and-refactoring-strategies.md`:
- Around line 579-598: Update the misplaced footnote references for the
"Structural pattern matching" paragraph by remapping the inline citations [^19],
[^20], and [^21] to sources that actually support pattern matching and
destructuring (replace the current CQRS/God-class/Ravioli citations); ensure
each specific claim about match-case, destructuring benefits, and example
comparisons points to the correct corresponding footnote. Also scan the other
referenced ranges (lines ~655-667 and ~856-878) and realign any cross-wired
footnotes the same way so that every assertion cites an appropriate source,
using the same footnote identifiers ([^19], [^20], [^21], etc.) consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bfa4ba8f-16c7-41af-a1d1-1f0bb59ce767

📥 Commits

Reviewing files that changed from the base of the PR and between dfb0f48 and f02771a.

📒 Files selected for processing (4)
  • crates/sempai/src/normalize.rs
  • crates/sempai/src/tests/normalization_tests.rs
  • crates/sempai/src/where_clause.rs
  • docs/complexity-antipatterns-and-refactoring-strategies.md

Comment thread crates/sempai/src/where_clause/metavariable.rs
Comment thread crates/sempai/src/where_clause/metavariable.rs Outdated
Comment thread docs/complexity-antipatterns-and-refactoring-strategies.md Outdated
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 9, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph.

If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced.

crates/sempai/src/tests/normalization_tests.rs

Comment on file

fn v2_where_focus_is_parsed_correctly() {
    // Verify that the normalized rule has an Atom formula
    let result = normalize_fixture("v2_where_focus.yaml");
    let rules = result.expect("should parse v2 with focus where clause");
    assert_eq!(rules.len(), 1);

    let first_rule = rules.first().expect("should have at least one rule");
    match &first_rule.formula {
        Formula::Atom(_) => {}
        _ => panic!("expected Atom formula"),
    }

    // Verify that the where clause is correctly parsed by obtaining the DecoratedFormula
    let decorated = parse_v2_fixture_to_decorated("v2_where_focus.yaml");
    assert_eq!(
        decorated.where_clauses.len(),
        1,
        "expected exactly one where clause"
    );
    let first_where = decorated
        .where_clauses
        .first()
        .expect("expected at least one where clause");
    match first_where {
        WhereClause::Focus { metavariable } => {
            assert_eq!(metavariable, "FUNC", "expected focus on metavariable $FUNC");
        }
        _ => panic!("expected Focus where clause, got {first_where:?}"),
    }
}

❌ New issue: Code Duplication
The module contains 2 functions with similar structure: v2_where_focus_is_parsed_correctly,v2_where_metavariable_regex_is_parsed_correctly

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 9, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

crates/sempai/src/where_clause.rs

Comment on file

pub(crate) fn parse_metavariable_clause(
    value: &serde_json::Value,
    normalize_v2_formula: &mut dyn FnMut(
        &MatchFormula,
    ) -> Result<DecoratedFormula, DiagnosticReport>,
) -> Result<WhereClause, DiagnosticReport> {
    let mapping = value.as_object().ok_or_else(|| {
        DiagnosticReport::single_error(
            DiagnosticCode::ESempaiSchemaInvalid,
            "metavariable clause must be an object".to_owned(),
            None,
            vec![],
        )
    })?;

    let metavariable = extract_metavariable_name(mapping)?;

    // Detect mixed metavariable constraints (e.g., pattern + type, regex + analyzer)
    let has_pattern_or_regex = METAVARIABLE_PATTERN_KEYS
        .iter()
        .any(|&k| mapping.contains_key(k))
        || mapping.contains_key("regex");
    let has_type_constraint = mapping.contains_key("type") || mapping.contains_key("types");
    let has_analyzer = mapping.contains_key("analyzer");

    if has_pattern_or_regex && (has_type_constraint || has_analyzer) {
        return Err(DiagnosticReport::single_error(
            DiagnosticCode::ESempaiSchemaInvalid,
            "metavariable clause cannot mix pattern/regex with type/types or analyzer".to_owned(),
            None,
            vec!["use either pattern/regex OR type/types/analyzer, not both".to_owned()],
        ));
    }

    if METAVARIABLE_PATTERN_KEYS
        .iter()
        .any(|&k| mapping.contains_key(k))
    {
        parse_metavariable_pattern_clause(metavariable, mapping, normalize_v2_formula)
    } else if let Some(regex_value) = mapping.get("regex") {
        parse_metavariable_regex_clause(metavariable, regex_value)
    } else if has_type_constraint {
        Err(DiagnosticReport::not_implemented(
            "metavariable type constraint (type/types)",
        ))
    } else if has_analyzer {
        Err(DiagnosticReport::not_implemented(
            "metavariable analysis constraint (analyzer)",
        ))
    } else {
        Err(DiagnosticReport::single_error(
            DiagnosticCode::ESempaiSchemaInvalid,
            "metavariable clause must have pattern, regex, type, types, or analyzer".to_owned(),
            None,
            vec![],
        ))
    }
}

❌ New issue: Complex Method
parse_metavariable_clause has a cyclomatic complexity of 11, threshold = 9

@coderabbitai

This comment was marked as resolved.

@coderabbitai

This comment was marked as resolved.

…ate tests

normalize.rs:
- Extract MetavariableConstraintKind enum for constraint classification
- Extract classify_metavariable_constraint helper function with mixed
  constraint validation (pattern/regex + type/analyzer rejection)
- Extract extract_metavariable_name helper function
- Extract parse_metavariable_pattern_clause helper function
- Extract parse_metavariable_regex_clause helper function
- Reduce cyclomatic complexity of parse_metavariable_clause from 11 to 5
- Make normalize_v2_formula pub(crate) for test access

normalization_tests.rs:
- Add parse_v2_fixture_to_decorated helper to access DecoratedFormula
- Add get_single_where_clause helper to eliminate test duplication
- Simplify v2_where_focus_is_parsed_correctly using helper
- Simplify v2_where_metavariable_regex_is_parsed_correctly using helper
- Simplify v2_where_metavariable_pattern_is_parsed_correctly using helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 39-43: In parse_v2_fixture_to_decorated, remove the manual
fs::read_to_string and unwrap_or_else logic and call read_fixture(filename) to
obtain the YAML string, retaining the existing fixtures_dir().join(filename)
only to build uri via path.to_str().map(String::from); then pass the YAML and
uri.as_deref() into parse_rule_file as before (functions referenced:
parse_v2_fixture_to_decorated, read_fixture, fixtures_dir, parse_rule_file).
- Around line 57-64: Update the spelling in the get_single_where_clause helper:
change the doc comment words "Normalises" and "normalise" to "Normalizes" and
"normalize" respectively, and update the panic message string from "failed to
normalise" to "failed to normalize" (the panic is produced in the call to
normalize_fixture and the function returns a WhereClause); ensure only these
textual occurrences around get_single_where_clause, WhereClause, and
normalize_fixture are updated to match Oxford “-ize” spelling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e760db7f-57a9-4ea8-add7-510e17420be3

📥 Commits

Reviewing files that changed from the base of the PR and between f02771a and baaf90c.

📒 Files selected for processing (1)
  • crates/sempai/src/tests/normalization_tests.rs

Comment thread crates/sempai/src/tests/normalization_tests.rs
Comment thread crates/sempai/src/tests/normalization_tests.rs Outdated
Documentation fixes:
- Fix malformed punctuation in footnote [^13] (remove extra ":")
- Realign misplaced footnotes [^19]-[^21] in pattern matching section
- Fix footnotes [^20], [^21], [^23], [^24] in declarative programming section
- Fix footnotes [^19], [^21], [^23], [^24] in Ravioli code section
- Fix footnote [^17]→[^20] in God class section
- Fix footnote numbering in weaver-design.md (34 unique sequential footnotes)

Code fixes:
- parse_metavariable_clause: add field validation with allowed_keys per branch
- parse_metavariable_pattern_clause: add semantic validation via validate_formula_semantics
- parse_v2_fixture_to_decorated: use read_fixture() helper
- get_single_where_clause: use Oxford "-ize" spelling

Test additions:
- v2_where_metavariable_invalid_not_in_or_fails_with_semantic_error
- v2_where_metavariable_invalid_all_negative_fails_with_semantic_error
- Add test fixtures for invalid metavariable patterns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

- Fix cargo fmt issues in normalization_tests.rs and where_clause.rs
- Fix markdownlint issues in weaver-design.md (MD012, MD047)
- Add #[expect(clippy::too_many_lines)] to parse_metavariable_clause

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 10, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

crates/sempai/src/where_clause.rs

Comment on file

    let has_type = mapping.contains_key("type") || mapping.contains_key("types");
    let has_analyzer = mapping.contains_key("analyzer");

    if (has_pattern || has_regex) && (has_type || has_analyzer) {

❌ New issue: Complex Conditional
parse_metavariable_clause has 1 complex conditionals with 3 branches, threshold = 2

@coderabbitai

This comment was marked as resolved.

Lift sub-expressions in the complex conditional:
- has_match_constraint = has_pattern || has_regex
- has_type_or_analyzer = has_type || has_analyzer

Makes the mixed constraint check more readable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
docs/weaver-design.md (1)

61-2594: ⚠️ Potential issue | 🟠 Major

Fix the footnote mapping to eliminate reuse of labels for different topics.

The footnote labels are still reused for different topics across sections, producing mismatched definitions. For example, [^3] is cited at line 112 for Serena MCP clients and again at line 690 for Tree-sitter code slicing, but line 2564 defines [^3] as "Serena Model Context Protocol (MCP) Server as a stateful intermediary for LLM integration." This violates the GitHub-flavoured Markdown footnote contract, which allows only one definition per label.

Scan the entire document body from top to bottom and collect every in-text citation [^N] in the order it first appears, along with its surrounding prose context. Assign a fresh sequential footnote number to each unique source or topic, strictly in order of first appearance. Two citations pointing to the same source may share a number; citations pointing to different sources must receive different numbers even if they previously shared one. Replace every in-text [^N] marker throughout the document with the newly assigned number for that source. Rewrite the footnote definitions block at lines 2562–2594 so that each [^N]: definition uses the newly assigned number, appears in ascending numeric order, and contains a source-accurate description that matches what the citing sentence actually references. Verify that no label is defined more than once and that every label used in the body has exactly one corresponding definition.

Triage: [type:syntax/md] [type:docstyle]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `docs/weaver-design.md` across the entire document (lines 61–2594), the
footnote labels `[^1]` through `[^34]` are reused for different topics across
sections, producing mismatched definitions. Fix this as follows:

1. Scan the entire document body from top to bottom and collect every in-text
   citation `[^N]` in the order it first appears, along with its surrounding
   prose context (the sentence or clause it annotates).

2. Assign a fresh sequential footnote number to each unique source/topic,
   strictly in order of first appearance. Two citations pointing to the same
   source may share a number; citations pointing to different sources must
   receive different numbers even if they previously shared one.

3. Replace every in-text `[^N]` marker throughout the document with the
   newly assigned number for that source.

4. Rewrite the footnote definitions block at lines 2562–2594 so that each
   `[^N]:` definition:
   - uses the newly assigned number,
   - appears in ascending numeric order,
   - contains a source-accurate description that matches what the citing
     sentence actually references (e.g., if the citation is for Tree-sitter
     enabling code slicing, the definition should describe that Tree-sitter
     capability, not Serena MCP or LSP).

5. Verify that no label is defined more than once, and that every label used
   in the body has exactly one corresponding definition.

6. Do not change any prose other than the footnote markers themselves and
   the definitions block.

As per coding guidelines, "Use GitHub-flavoured numeric footnotes referenced as [^1] in Markdown documentation", "Number footnotes in order of appearance", and keep documentation accurate and current.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/weaver-design.md` around lines 61 - 2594, The document reuses numeric
footnote labels (the in-text markers `[^N]`) for different sources, causing
mismatched definitions; scan the entire Markdown body sequentially, collect each
in-text citation `[^N]` with its sentence context, then assign a fresh
sequential numeric footnote to every unique source/topic in order of first
appearance (allowing shared numbers only when citations truly reference the same
source); replace every in-text `[^N]` marker with the newly assigned numbers
throughout the file and rewrite the footnote definitions block so each `[^N]:`
appears exactly once, in ascending order, and contains the source-accurate
description matching the citing sentence (ensure no duplicate labels and that
every used label has one definition).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 179-191: The test was ignored because validate_positive_terms
doesn't yet implement the metavariable-pattern exception; update
validate_positive_terms to detect when validation is running inside a
metavariable-pattern context and allow conjunctions (patterns/all) with zero
positive terms in that case while still emitting MissingPositiveTermInAnd for
all other contexts; then unignore the test (remove #[ignore]) in
normalization_tests.rs so
normalize_fixture("valid_metavariable_pattern_exception.yaml") returns Ok as
expected. Ensure the detection uses the same context flag/type used by
parsing/normalization so other validators can rely on the same
metavariable-pattern scope.

In `@crates/sempai/src/where_clause.rs`:
- Around line 1-421: The module exceeds the 400-line limit; split out the large
metavariable and match-formula parsing into a new submodule. Create a new file
(e.g. where_clause_parts.rs) and move parse_metavariable_clause,
build_match_formula_from_mapping, parse_match_formula_array,
parse_nested_match_formula, parse_metavariable_pattern_clause,
parse_metavariable_regex_clause, extract_metavariable_name, and
METAVARIABLE_PATTERN_KEYS (and any tightly-coupled helpers like
strip_dollar_prefix if needed) into it, keeping their signatures intact; then
add a mod declaration and public or crate-level re-exports (use) so the original
where_clause.rs references the moved functions unchanged (update imports from
crate::where_clause_parts::{...} if necessary) and run tests to ensure no symbol
changes.
- Around line 296-384: The parser build_match_formula_from_mapping currently
ignores decoration keys (where/as/fix) on nested object formulas and drops them;
update it to detect decoration metadata (e.g., "where", "as", "fix", or whatever
keys normalize_v2_formula uses), parse the inner operator as before (using
parse_nested_match_formula / parse_match_formula_array), and if any decoration
keys are present wrap the resulting MatchFormula in MatchFormula::Decorated
(preserving the decoration data) instead of discarding them; ensure the same
logic is applied for the branches that call parse_nested_match_formula and
parse_match_formula_array (including the "inside"/"anywhere"/"not"/"all"/"any"
cases) so nested decorated formulas are preserved and passed through to
normalize_v2_formula.
- Around line 65-66: The code currently uses .expect() on mapping.iter().next()
in where_clause.rs which will panic when mapping is empty; replace that panic
with safe pattern matching and error propagation: change the line using
#[expect(...)] let (key, inner_value) = mapping.iter().next().expect(...) to a
let Some(((key, inner_value))) = mapping.iter().next() else { return Err(/*
construct a suitable error indicating empty mapping in the context of the
surrounding function */) }; or an equivalent match that returns a Result::Err
when mapping.len() == 0; ensure you use the surrounding function's error type
when returning the error and keep the existing logic for the mapping.len() > 1
validation.

---

Duplicate comments:
In `@docs/weaver-design.md`:
- Around line 61-2594: The document reuses numeric footnote labels (the in-text
markers `[^N]`) for different sources, causing mismatched definitions; scan the
entire Markdown body sequentially, collect each in-text citation `[^N]` with its
sentence context, then assign a fresh sequential numeric footnote to every
unique source/topic in order of first appearance (allowing shared numbers only
when citations truly reference the same source); replace every in-text `[^N]`
marker with the newly assigned numbers throughout the file and rewrite the
footnote definitions block so each `[^N]:` appears exactly once, in ascending
order, and contains the source-accurate description matching the citing sentence
(ensure no duplicate labels and that every used label has one definition).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 87409b3f-2b72-442a-bcb6-d9ece98f5b70

📥 Commits

Reviewing files that changed from the base of the PR and between baaf90c and e014434.

📒 Files selected for processing (6)
  • crates/sempai/src/tests/normalization_tests.rs
  • crates/sempai/src/where_clause.rs
  • crates/sempai/tests/fixtures/normalization/v2_where_metavariable_invalid_all_negative.yaml
  • crates/sempai/tests/fixtures/normalization/v2_where_metavariable_invalid_not_in_or.yaml
  • docs/complexity-antipatterns-and-refactoring-strategies.md
  • docs/weaver-design.md

Comment thread crates/sempai/src/tests/normalization_tests.rs Outdated
Comment thread crates/sempai/src/where_clause.rs Outdated
Comment thread crates/sempai/src/where_clause.rs Outdated
Comment on lines +65 to +66
#[expect(clippy::expect_used, reason = "length checked above")]
let (key, inner_value) = mapping.iter().next().expect("checked len == 1");
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

head -n 80 crates/sempai/src/where_clause.rs | tail -n 30

Repository: leynos/weaver

Length of output: 1109


🏁 Script executed:

rg -n '\.expect\(' crates/sempai/src/where_clause.rs -A 2

Repository: leynos/weaver

Length of output: 170


🏁 Script executed:

rg -n '#\[expect\(clippy::expect_used' crates/sempai/src/where_clause.rs -B 2 -A 2

Repository: leynos/weaver

Length of output: 237


🏁 Script executed:

wc -l crates/sempai/src/where_clause.rs

Repository: leynos/weaver

Length of output: 94


Remove .expect() from production code and handle the empty mapping case.

Replace this panic path with a let Some(..) else block or exhaustive match. Note that the current validation only checks mapping.len() > 1; it does not validate the len() == 0 case, which would panic. Ensure both cases are properly handled with error propagation:

Current code (lines 65–66)
    #[expect(clippy::expect_used, reason = "length checked above")]
    let (key, inner_value) = mapping.iter().next().expect("checked len == 1");

Per coding guidelines, .expect() and .unwrap() are forbidden outside tests. Errors must be propagated via Result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sempai/src/where_clause.rs` around lines 65 - 66, The code currently
uses .expect() on mapping.iter().next() in where_clause.rs which will panic when
mapping is empty; replace that panic with safe pattern matching and error
propagation: change the line using #[expect(...)] let (key, inner_value) =
mapping.iter().next().expect(...) to a let Some(((key, inner_value))) =
mapping.iter().next() else { return Err(/* construct a suitable error indicating
empty mapping in the context of the surrounding function */) }; or an equivalent
match that returns a Result::Err when mapping.len() == 0; ensure you use the
surrounding function's error type when returning the error and keep the existing
logic for the mapping.len() > 1 validation.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai Has this now been resolved in the latest commit?

Use codegraph analysis to determine your answer.

If this comment is now resolved, please mark it as such using the API. Otherwise, please provide an AI agent prompt for the remaining work to be done to address this comment.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current state of the relevant lines in where_clause.rs
sed -n '55,80p' crates/sempai/src/where_clause.rs

Length of output: 76


🏁 Script executed:

#!/bin/bash
# Find the file first
fd where_clause.rs

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check which branch/commit we're on and repo structure
git log --oneline -5
git status --short | head -20
tree crates/sempai/src/ 2>/dev/null || ls -la crates/sempai/src/ 2>/dev/null || echo "path not found"

Length of output: 651


🏁 Script executed:

#!/bin/bash
# Check for expect usage in the where_clause module
echo "=== where_clause/mod.rs ==="
cat crates/sempai/src/where_clause/mod.rs

echo ""
echo "=== where_clause/metavariable.rs ==="
cat crates/sempai/src/where_clause/metavariable.rs

Length of output: 17136


The fix is in place. Now I'll resolve the review comment via the API.


🏁 Script executed:

#!/bin/bash
# Find and resolve the review comment
gh api graphql -f query='
{
  repository(owner: "leynos", name: "weaver") {
    pullRequest(number: 99) {
      reviewThreads(first: 50) {
        nodes {
          id
          isResolved
          comments(first: 1) {
            nodes {
              body
              author { login }
            }
          }
        }
      }
    }
  }
}' | python3 -c "
import json, sys
data = json.load(sys.stdin)
threads = data['data']['repository']['pullRequest']['reviewThreads']['nodes']
for t in threads:
    comment = t['comments']['nodes'][0] if t['comments']['nodes'] else {}
    author = comment.get('author', {}).get('login', '')
    body = comment.get('body', '')
    if author == 'coderabbitai' and 'expect_used' in body and not t['isResolved']:
        print(f\"Thread ID: {t['id']}, resolved: {t['isResolved']}\")
"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# List all review threads - check resolved status broadly
gh api graphql -f query='
{
  repository(owner: "leynos", name: "weaver") {
    pullRequest(number: 99) {
      reviewThreads(first: 50) {
        nodes {
          id
          isResolved
          comments(first: 1) {
            nodes {
              databaseId
              body
              author { login }
            }
          }
        }
      }
    }
  }
}' | python3 -c "
import json, sys
data = json.load(sys.stdin)
threads = data['data']['repository']['pullRequest']['reviewThreads']['nodes']
for t in threads:
    comment = t['comments']['nodes'][0] if t['comments']['nodes'] else {}
    author = comment.get('author', {}).get('login', '')
    body = comment.get('body', '')[:80].replace('\n', ' ')
    print(f\"ID={t['id']} resolved={t['isResolved']} author={author} body={body!r}\")
"

Length of output: 7416


@leynos, yes — this is fully resolved in the latest commits. Here is what changed:

Code fix confirmed: The #[expect(clippy::expect_used, reason = "length checked above")] lint suppression and the .expect("checked len == 1") call have been removed entirely. They are replaced with a safe let … else pattern that properly propagates an error:

let Some((key, inner_value)) = mapping.iter().next() else {
    return Err(DiagnosticReport::single_error(
        DiagnosticCode::ESempaiSchemaInvalid,
        "where clause mapping is unexpectedly empty".to_owned(),
        None,
        vec![],
    ));
};

Module decomposition: where_clause.rs has additionally been decomposed into a where_clause/ module (mod.rs + metavariable.rs), consistent with the project's file-size guidelines. No .expect() calls remain in any production code path in either file — only safe .ok_or_else() chains and one strip_dollar_prefix that uses .unwrap_or() (not .expect()/.unwrap()).

The review thread is already marked as resolved on GitHub.


🧠 Learnings used
Learnt from: leynos
Repo: leynos/weaver PR: 41
File: crates/weaver-e2e/src/pyrefly.rs:42-45
Timestamp: 2025-12-27T04:00:30.560Z
Learning: Guideline: In a nightly Rust toolchain (e.g., nightly-2025-09-16) where const_mut_refs is available and missing_const_for_fn is set to deny, mark functions as const whenever they can be const, including methods with &mut self. This improves const-evaluation capabilities and consistency across the codebase. Apply this across Rust source files (all .rs) in the repository, not just a single file, and especially for methods that do not mutate internal state or rely on non-const operations. If a method cannot be const (due to mutation, non-const calls, or unsupported features), keep it non-const and document why.

Learnt from: leynos
Repo: leynos/weaver PR: 48
File: crates/weaver-cli/src/lifecycle/utils.rs:60-75
Timestamp: 2026-01-08T20:18:00.597Z
Learning: In Rust, when re-exporting an item (pub use … or pub(crate) use …), ensure the original item's visibility is at least as broad as the re-export. Do not widen visibility (e.g., a pub(super) item cannot be re-exported as pub(crate)). Review re-exports to keep visibility levels consistent with the original declaration across Rust files.

Learnt from: leynos
Repo: leynos/weaver PR: 55
File: crates/weaver-lsp-host/src/adapter/error.rs:69-78
Timestamp: 2026-01-22T11:05:27.489Z
Learning: In Rust, const fn can only use const-compatible types in stable Rust (primitives and simple aggregates without destructors). Types with heap allocations or destructors (e.g., String, Vec, Box) cannot be used in const contexts. Do not mark a function as const if it constructs or moves fields of non-const-compatible types. Apply this guideline to Rust source files across the repository.

Learnt from: leynos
Repo: leynos/weaver PR: 57
File: crates/weaverd/src/safety_harness/transaction.rs:66-95
Timestamp: 2026-02-07T01:56:58.071Z
Learning: In Rust, PathBuf::as_path() is not a const fn (as of Rust 1.91+). Do not write const fn methods that call as_path(); instead design APIs to take &Path or &OsStr, or use non-const code paths for conversions. If a const context is required, avoid as_path() and rely on const-friendly alternatives (e.g., pass &Path, use PathBuf methods that are const, or accept string literals/byte slices that don’t require as_path()).

Learnt from: leynos
Repo: leynos/weaver PR: 58
File: crates/weaver-plugins/src/runner/mod.rs:101-118
Timestamp: 2026-02-10T16:30:41.357Z
Learning: When reviewing Rust code, be aware that as of Rust 1.83+ (including 1.85 and Edition 2024), const fn may return mutable references (for example, const fn foo(&mut self) -> &mut T) and can use mutable references in const contexts. Do not rely on mutable references becoming part of a constant’s final value; ensure such patterns are still logically correct for const contexts. Use this guidance broadly for all Rust files in the repository to avoid incorrect assumptions about const evaluation behavior.

Learnt from: leynos
Repo: leynos/weaver PR: 59
File: crates/weaver-e2e/tests/refactor_rope_cli_snapshots.rs:203-211
Timestamp: 2026-02-16T21:57:26.358Z
Learning: In a workspace configured with let_underscore_must_use = "deny", discarding a Result via let _ = is a compile-time error. When you intentionally ignore non-critical errors (for example, diagnostic writes to stderr in tests), prefer consuming the Result with .ok() rather than assigning to _ or ignoring it entirely. This guideline applies to Rust code across the repository (all .rs files), especially in test code paths where ignoring errors could mask issues.

Learnt from: leynos
Repo: leynos/weaver PR: 84
File: crates/weaverd/src/dispatch/observe/enrich.rs:0-0
Timestamp: 2026-03-23T09:44:08.026Z
Learning: When matching on an enum imported from another crate that is annotated with `#[non_exhaustive]`, Rust will require a catch-all arm (e.g., `_ => ...`) to satisfy exhaustiveness. In review, do not flag that wildcard arm as unnecessary or suggest its removal when the match is over a `#[non_exhaustive]` external enum (since the behavior is mandated by Rust’s exhaustiveness rules).

Learnt from: leynos
Repo: leynos/weaver PR: 99
File: crates/sempai/src/normalize.rs:20-26
Timestamp: 2026-04-07T23:49:14.199Z
Learning: In this repository (leynos/weaver), follow en-GB-oxendict spelling rules: prefer Oxford-dict “-ize/-yse/-our” conventions. Therefore “-ize” forms such as “normalize/normalizes/normalized” and “recognize/organize” are considered correct and must not be flagged as spelling errors or suggested to change to “-ise”. Only treat “-ise” spellings as incorrect when the corresponding Oxford “-ize” form applies. Do not recommend changing “-ize” to “-ise” in Rust source comments, docs, or documentation files.

Comment thread crates/sempai/src/where_clause/metavariable.rs
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 10, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

Complex Method

crates/sempai/src/where_clause.rs: parse_metavariable_clause

What lead to degradation?

parse_metavariable_clause has a cyclomatic complexity of 17, threshold = 9

Why does this problem occur?

A Complex Method has a high cyclomatic complexity. The recommended threshold for the Rust language is a cyclomatic complexity lower than 9.

How to fix it?

There are many reasons for Complex Method. Sometimes, another design approach is beneficial such as a) modeling state using an explicit state machine rather than conditionals, or b) using table lookup rather than long chains of logic. In other scenarios, the function can be split using EXTRACT FUNCTION. Just make sure you extract natural and cohesive functions. Complex Methods can also be addressed by identifying complex conditional expressions and then using the DECOMPOSE CONDITIONAL refactoring.

Helpful refactoring examples

To get a general understanding of what this code health issue looks like - and how it might be addressed - we have prepared some diffs for illustrative purposes.

SAMPLE

# complex_method.js
 function postItem(item) {
   if (!item.id) {
-    if (item.x != null && item.y != null) {
-      post(item);
-    } else {
-      throw Error("Item must have x and y");
-    }
+    // extract a separate function for creating new item
+    postNew(item);
   } else {
-    if (item.x < 10 && item.y > 25) {
-      put(item);
-    } else {
-      throw Error("Item must have an x and y value between 10 and 25");
-    }
+    // and one for updating existing items
+    updateItem(item);
   }
 }
+
+function postNew(item) {
+  validateNew(item);
+  post(item);
+}
+
+function updateItem(item) {
+  validateUpdate(item);
+  put(item);
+}
+

@coderabbitai

This comment was marked as resolved.

- Add ValidationContext enum to distinguish Default vs MetavariablePattern contexts
- Update validate_formula_semantics and validate_positive_terms to accept context
- Use MetavariablePattern context in parse_metavariable_pattern_clause to allow
  conjunctions without positive terms
- Update valid_metavariable_pattern_exception.yaml to use v2 syntax
- Remove v2_where_metavariable_invalid_all_negative fixture and test (all-negative
  conjunctions ARE valid in metavariable-pattern context per spec)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

- Split where_clause.rs into where_clause/mod.rs and where_clause/metavariable.rs
  to satisfy clippy::self-named-module-files lint
- Add field validation to parse_metavariable_clause with proper error handling
- Add semantic validation in parse_metavariable_pattern_clause using validate_formula_semantics
- Update build_match_formula_from_mapping to handle decoration keys where/as/fix
- Fix weaver-design.md footnote numbering: assign unique sequential numbers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

Extract validation/classification logic into two private helpers
and a private enum to reduce cyclomatic complexity from 17 to <9:

- Add MetavariableConstraintKind enum for constraint classification
- Add check_no_unexpected_fields helper for field validation
- Add classify_and_validate_metavariable_constraint helper
- Simplify parse_metavariable_clause body to use match on kind
- Remove #[expect(clippy::too_many_lines)] attribute

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
docs/weaver-design.md (1)

2562-2596: ⚠️ Potential issue | 🟠 Major

Restore source-accurate citation mapping for numeric footnotes.

Rebuild the footnote map so each [^n] definition matches the source used at its first in-text occurrence. The current mapping is incorrect (for example, Line 1207 cites LSP standard as [^12], but [^12] is defined as call-graph fusion; Line 1407 cites jedi as [^23], but [^23] is defined as callgrind).

Triage: [type:syntax/md] [type:docstyle]

As per coding guidelines, “Use GitHub-flavoured numeric footnotes referenced as [^1] in documentation”, “Number footnotes in order of appearance in documentation”, and keep documentation accurate and current.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/weaver-design.md` around lines 2562 - 2596, The numeric footnote
definitions (e.g., markers like `[^1]`, `[^12]`, `[^23]`) are out of order and
do not match their first in-text occurrences (e.g., the LSP reference and jedi
citation); rebuild the footnote map so each `[^n]` definition is reordered to
correspond to the source first cited in the document, renumber all definitions
accordingly, update any inline usages to the new numbers, and verify examples
like "LSP"/"Language Server Protocol", "jedi", "call-graph fusion"/"Call Graph
Fusion Engine" now point to the correct `[^n]` entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/sempai/src/normalize.rs`:
- Around line 141-147: The helper normalize_legacy_clause still returns
Result<Option<DecoratedFormula>, DiagnosticReport> but no longer produces
Ok(None); change its signature to Result<DecoratedFormula, DiagnosticReport> and
remove the Option layer, then simplify callers (e.g. the LegacyFormula::Patterns
arm building Formula::And) to collect directly: replace the
clauses.iter().map(normalize_legacy_clause).filter_map(Result::transpose).collect()
dance with clauses.iter().map(normalize_legacy_clause).collect::<Result<Vec<_>,
_>>() (or equivalent) and pass children? into Formula::And; apply the same
signature and caller simplification to the other occurrence mentioned (around
the 196-213 region).

In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 140-176: Collapse the four duplicated tests into one parameterized
rstest that iterates over (fixture, expected_code); replace the functions
invalid_not_in_or_legacy_fails_with_semantic_error,
invalid_not_in_or_v2_fails_with_semantic_error,
invalid_no_positive_term_legacy_fails_with_semantic_error, and
invalid_no_positive_term_v2_fails_with_semantic_error with a single #[rstest]
test that for each tuple calls normalize_fixture(fixture).expect_err(...), gets
the first diagnostic, and asserts first.code() == expected_code using the
DiagnosticCode variants ESempaiInvalidNotInOr and
ESempaiMissingPositiveTermInAnd as the expected values. Ensure you import rstest
and use the same diagnostic extraction logic
(err.diagnostics().first().expect(...)) in the body.

In `@crates/sempai/src/validate.rs`:
- Around line 5-22: Add Rustdoc examples to the public validation API by
documenting ValidationContext, validate_formula_semantics,
validate_invalid_not_in_or, and validate_positive_terms with short runnable
examples that show one valid case and one failing diagnostic for each item;
place the examples in the triple-slash docs above each public item (use Rust
doctest blocks — for failing cases use compile_fail or assert matches on the
returned diagnostic/Error so examples remain runnable) and demonstrate calling
the actual functions (e.g., validate_formula_semantics(...),
validate_invalid_not_in_or(...), validate_positive_terms(...)) and inspecting
their success (Ok/empty diagnostics) and failure (Err/diagnostic contents) so
cargo test/doc will exercise them.

In `@crates/sempai/src/where_clause/metavariable.rs`:
- Around line 225-335: The function build_match_formula_from_mapping is doing
too many things and suppresses the clippy lint; split it into small helpers:
implement find_single_operator(mapping) -> Result<&str, DiagnosticReport> to
detect/validate exactly one operator (use the existing operator_keys list),
implement build_core_formula_from_operator(key, value) -> Result<MatchFormula,
DiagnosticReport> which contains the match arms for "pattern", "regex", "all",
"any", "not", "inside", "anywhere" and uses parse_match_formula_array /
parse_nested_match_formula as before, and implement
wrap_with_decorations(core_formula, mapping) -> MatchFormula to apply the
existing extract_decorations result (where_clauses, as_name, fix) when needed;
then remove the #[expect(clippy::too_many_lines)] on
build_match_formula_from_mapping and refactor that function to call
find_single_operator, lookup mapping.get(key), call
build_core_formula_from_operator, and finally call wrap_with_decorations before
returning. Use the existing symbol names (build_match_formula_from_mapping,
extract_decorations, parse_match_formula_array, parse_nested_match_formula,
MatchFormula) so diagnostics and error messages remain unchanged.
- Around line 369-371: strip_dollar_prefix currently normalises "$" and "" into
an empty name which lets malformed input slip through; change
strip_dollar_prefix to validate and reject empty identifiers by returning an
error (ESempaiSchemaInvalid) instead of returning an empty String, and update
its callers (extract_metavariable_name in this file and parse_focus_clause in
crates/sempai/src/where_clause/mod.rs) to propagate the Result/error;
specifically, make strip_dollar_prefix return Result<String,
ESempaiSchemaInvalid> (or the crate's error type), produce ESempaiSchemaInvalid
when mv == "" or mv == "$", and adjust extract_metavariable_name and
parse_focus_clause to handle/return the error rather than assuming a non-empty
name.

---

Duplicate comments:
In `@docs/weaver-design.md`:
- Around line 2562-2596: The numeric footnote definitions (e.g., markers like
`[^1]`, `[^12]`, `[^23]`) are out of order and do not match their first in-text
occurrences (e.g., the LSP reference and jedi citation); rebuild the footnote
map so each `[^n]` definition is reordered to correspond to the source first
cited in the document, renumber all definitions accordingly, update any inline
usages to the new numbers, and verify examples like "LSP"/"Language Server
Protocol", "jedi", "call-graph fusion"/"Call Graph Fusion Engine" now point to
the correct `[^n]` entries.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd39a6ad-589a-470e-b0f7-ad28d21ee450

📥 Commits

Reviewing files that changed from the base of the PR and between e014434 and 3b61194.

📒 Files selected for processing (7)
  • crates/sempai/src/normalize.rs
  • crates/sempai/src/tests/normalization_tests.rs
  • crates/sempai/src/validate.rs
  • crates/sempai/src/where_clause/metavariable.rs
  • crates/sempai/src/where_clause/mod.rs
  • crates/sempai/tests/fixtures/normalization/valid_metavariable_pattern_exception.yaml
  • docs/weaver-design.md

Comment on lines +141 to +147
LegacyFormula::Patterns(clauses) => {
let children: Result<Vec<_>, _> = clauses
.iter()
.map(normalize_legacy_clause)
.filter_map(Result::transpose)
.collect();
Ok(Formula::And(children?))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Drop the dead Option layer from normalize_legacy_clause.

Return Result<DecoratedFormula, DiagnosticReport> directly. This helper no longer has an Ok(None) branch, so the optional return type and the caller's filter_map(Result::transpose) path are now just incidental complexity.

Based on learnings: "Clarity over cleverness. Be concise, but favour explicit over terse or obscure idioms. Prefer code that's easy to follow."

Also applies to: 196-213

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sempai/src/normalize.rs` around lines 141 - 147, The helper
normalize_legacy_clause still returns Result<Option<DecoratedFormula>,
DiagnosticReport> but no longer produces Ok(None); change its signature to
Result<DecoratedFormula, DiagnosticReport> and remove the Option layer, then
simplify callers (e.g. the LegacyFormula::Patterns arm building Formula::And) to
collect directly: replace the
clauses.iter().map(normalize_legacy_clause).filter_map(Result::transpose).collect()
dance with clauses.iter().map(normalize_legacy_clause).collect::<Result<Vec<_>,
_>>() (or equivalent) and pass children? into Formula::And; apply the same
signature and caller simplification to the other occurrence mentioned (around
the 196-213 region).

Comment on lines +140 to +176
#[test]
fn invalid_not_in_or_legacy_fails_with_semantic_error() {
let result = normalize_fixture("invalid_not_in_or_legacy.yaml");
let err = result.expect_err("should fail with InvalidNotInOr");
let first = err.diagnostics().first().expect("should have diagnostic");
assert_eq!(first.code(), DiagnosticCode::ESempaiInvalidNotInOr);
}

#[test]
fn invalid_not_in_or_v2_fails_with_semantic_error() {
let result = normalize_fixture("invalid_not_in_or_v2.yaml");
let err = result.expect_err("should fail with InvalidNotInOr");
let first = err.diagnostics().first().expect("should have diagnostic");
assert_eq!(first.code(), DiagnosticCode::ESempaiInvalidNotInOr);
}

#[test]
fn invalid_no_positive_term_legacy_fails_with_semantic_error() {
let result = normalize_fixture("invalid_no_positive_term_legacy.yaml");
let err = result.expect_err("should fail with MissingPositiveTermInAnd");
let first = err.diagnostics().first().expect("should have diagnostic");
assert_eq!(
first.code(),
DiagnosticCode::ESempaiMissingPositiveTermInAnd
);
}

#[test]
fn invalid_no_positive_term_v2_fails_with_semantic_error() {
let result = normalize_fixture("invalid_no_positive_term_v2.yaml");
let err = result.expect_err("should fail with MissingPositiveTermInAnd");
let first = err.diagnostics().first().expect("should have diagnostic");
assert_eq!(
first.code(),
DiagnosticCode::ESempaiMissingPositiveTermInAnd
);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Collapse the duplicated semantic-failure tests into #[rstest] cases.

Replace this block with one parameterised test over (fixture, expected_code). The four tests differ only in inputs and expected diagnostics, so the current copy-paste is now pure maintenance overhead.

As per coding guidelines: "**/*.rs: Replace duplicated tests with #[rstest(...)] parameterized cases." and "**/*.rs: Use rstest fixtures for shared setup."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sempai/src/tests/normalization_tests.rs` around lines 140 - 176,
Collapse the four duplicated tests into one parameterized rstest that iterates
over (fixture, expected_code); replace the functions
invalid_not_in_or_legacy_fails_with_semantic_error,
invalid_not_in_or_v2_fails_with_semantic_error,
invalid_no_positive_term_legacy_fails_with_semantic_error, and
invalid_no_positive_term_v2_fails_with_semantic_error with a single #[rstest]
test that for each tuple calls normalize_fixture(fixture).expect_err(...), gets
the first diagnostic, and asserts first.code() == expected_code using the
DiagnosticCode variants ESempaiInvalidNotInOr and
ESempaiMissingPositiveTermInAnd as the expected values. Ensure you import rstest
and use the same diagnostic extraction logic
(err.diagnostics().first().expect(...)) in the body.

Comment on lines +5 to +22
/// Validation context to indicate where validation is occurring.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ValidationContext {
/// Standard validation for top-level formulas (default).
#[default]
Default,
/// Validation inside a metavariable-pattern context where conjunctions
/// without positive terms are allowed.
MetavariablePattern,
}

impl ValidationContext {
/// Returns true if this is a metavariable-pattern context.
#[must_use]
pub const fn is_metavariable_pattern(self) -> bool {
matches!(self, Self::MetavariablePattern)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add Rustdoc examples to the public validation API.

Document ValidationContext, validate_formula_semantics, validate_invalid_not_in_or, and validate_positive_terms with short runnable examples that show both a valid formula and a failing diagnostic. The current docs explain the rules, but they do not satisfy the repository requirement for example-driven API documentation.

As per coding guidelines: "**/*.rs: Function documentation must include clear examples demonstrating the usage and outcome of the function." and "**/*.rs: Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc."

Also applies to: 48-117

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sempai/src/validate.rs` around lines 5 - 22, Add Rustdoc examples to
the public validation API by documenting ValidationContext,
validate_formula_semantics, validate_invalid_not_in_or, and
validate_positive_terms with short runnable examples that show one valid case
and one failing diagnostic for each item; place the examples in the triple-slash
docs above each public item (use Rust doctest blocks — for failing cases use
compile_fail or assert matches on the returned diagnostic/Error so examples
remain runnable) and demonstrate calling the actual functions (e.g.,
validate_formula_semantics(...), validate_invalid_not_in_or(...),
validate_positive_terms(...)) and inspecting their success (Ok/empty
diagnostics) and failure (Err/diagnostic contents) so cargo test/doc will
exercise them.

Comment on lines +225 to +335
#[expect(
clippy::too_many_lines,
reason = "formula construction requires multiple branches"
)]
pub(crate) fn build_match_formula_from_mapping(
mapping: &serde_json::Map<String, serde_json::Value>,
) -> Result<MatchFormula, DiagnosticReport> {
// Extract decoration metadata first
let (where_clauses, as_name, fix) = extract_decorations(mapping);
let has_decorations = !where_clauses.is_empty() || as_name.is_some() || fix.is_some();

// Check for exactly one operator (pattern, regex, all, any, not, inside, anywhere)
let operator_keys = [
"pattern", "regex", "all", "any", "not", "inside", "anywhere",
];
let present_operators: Vec<_> = operator_keys
.iter()
.filter(|&&key| mapping.contains_key(key))
.copied()
.collect();

if present_operators.is_empty() {
return Err(DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
"metavariable pattern must have one of: pattern, regex, all, any, not, inside, anywhere".to_owned(),
None,
vec![],
));
}

if present_operators.len() > 1 {
return Err(DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
format!(
"metavariable pattern defines multiple operators: {}",
present_operators.join(", ")
),
None,
vec!["keep only one operator per pattern".to_owned()],
));
}

let key = *present_operators.first().ok_or_else(|| {
DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
"no operator found in match formula".to_owned(),
None,
vec![],
)
})?;
let value = mapping.get(key).ok_or_else(|| {
DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
format!("operator '{key}' was detected but value is missing"),
None,
vec![],
)
})?;

let core_formula = match key {
"pattern" => {
let pattern = value.as_str().ok_or_else(|| {
DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
"pattern must be a string".to_owned(),
None,
vec![],
)
})?;
MatchFormula::PatternObject(pattern.to_owned())
}
"regex" => {
let regex = value.as_str().ok_or_else(|| {
DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
"regex must be a string".to_owned(),
None,
vec![],
)
})?;
MatchFormula::Regex(regex.to_owned())
}
"all" => parse_match_formula_array(value).map(MatchFormula::All)?,
"any" => parse_match_formula_array(value).map(MatchFormula::Any)?,
"not" => parse_nested_match_formula(value).map(|f| MatchFormula::Not(Box::new(f)))?,
"inside" => parse_nested_match_formula(value).map(|f| MatchFormula::Inside(Box::new(f)))?,
"anywhere" => {
parse_nested_match_formula(value).map(|f| MatchFormula::Anywhere(Box::new(f)))?
}
_ => {
return Err(DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
format!("unexpected operator '{key}' in match formula"),
None,
vec![],
));
}
};

// Wrap in Decorated if decorations are present
if has_decorations {
Ok(MatchFormula::Decorated {
formula: Box::new(core_formula),
where_clauses,
as_name,
fix,
})
} else {
Ok(core_formula)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Split build_match_formula_from_mapping and drop the lint expectation.

Extract operator-specific builders and a small decoration wrapper. This function is still doing decoration extraction, operator discovery, branch parsing, and diagnostic shaping in one block, and #[expect(clippy::too_many_lines)] is hiding that complexity instead of removing it.

Based on learnings: "Functions must be small, clear in purpose, single responsibility, and obey command/query segregation." As per coding guidelines: "**/*.rs: Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS." and "**/*.rs: Lints must not be silenced except as a last resort."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sempai/src/where_clause/metavariable.rs` around lines 225 - 335, The
function build_match_formula_from_mapping is doing too many things and
suppresses the clippy lint; split it into small helpers: implement
find_single_operator(mapping) -> Result<&str, DiagnosticReport> to
detect/validate exactly one operator (use the existing operator_keys list),
implement build_core_formula_from_operator(key, value) -> Result<MatchFormula,
DiagnosticReport> which contains the match arms for "pattern", "regex", "all",
"any", "not", "inside", "anywhere" and uses parse_match_formula_array /
parse_nested_match_formula as before, and implement
wrap_with_decorations(core_formula, mapping) -> MatchFormula to apply the
existing extract_decorations result (where_clauses, as_name, fix) when needed;
then remove the #[expect(clippy::too_many_lines)] on
build_match_formula_from_mapping and refactor that function to call
find_single_operator, lookup mapping.get(key), call
build_core_formula_from_operator, and finally call wrap_with_decorations before
returning. Use the existing symbol names (build_match_formula_from_mapping,
extract_decorations, parse_match_formula_array, parse_nested_match_formula,
MatchFormula) so diagnostics and error messages remain unchanged.

Comment on lines +369 to +371
/// Strips the leading `$` from a metavariable name if present.
pub(crate) fn strip_dollar_prefix(mv: &str) -> String {
mv.strip_prefix('$').unwrap_or(mv).to_owned()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject empty metavariable identifiers.

Stop normalising "$" or "" into an empty name here. Both extract_metavariable_name in this file and parse_focus_clause in crates/sempai/src/where_clause/mod.rs trust this helper, so malformed input currently slips through as an empty binding instead of failing with ESempaiSchemaInvalid.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sempai/src/where_clause/metavariable.rs` around lines 369 - 371,
strip_dollar_prefix currently normalises "$" and "" into an empty name which
lets malformed input slip through; change strip_dollar_prefix to validate and
reject empty identifiers by returning an error (ESempaiSchemaInvalid) instead of
returning an empty String, and update its callers (extract_metavariable_name in
this file and parse_focus_clause in crates/sempai/src/where_clause/mod.rs) to
propagate the Result/error; specifically, make strip_dollar_prefix return
Result<String, ESempaiSchemaInvalid> (or the crate's error type), produce
ESempaiSchemaInvalid when mv == "" or mv == "$", and adjust
extract_metavariable_name and parse_focus_clause to handle/return the error
rather than assuming a non-empty name.

@leynos leynos closed this Apr 10, 2026
@leynos leynos deleted the 4-1-5-normalization-into-canonical-formula-model branch April 10, 2026 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant