Skip to content

4.1.4: Implement mode-aware validation and execution gating for Sempai#94

Merged
leynos merged 7 commits intomainfrom
implement-mode-aware-validation-j3m9gz
Apr 4, 2026
Merged

4.1.4: Implement mode-aware validation and execution gating for Sempai#94
leynos merged 7 commits intomainfrom
implement-mode-aware-validation-j3m9gz

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Mar 28, 2026

Summary

  • Adds mode-aware validation and execution gating in Sempai across YAML parsing and engine validation paths.
  • Introduces engine-side gating so only valid search rules proceed past validation; non-search modes (extract, taint, join, and unknown modes) produce deterministic errors.
  • Extends the YAML parser/model to carry mode-related span information and to preserve the compatibility key r2c-internal-project-depends-on for search mode validation.
  • Adds comprehensive tests (parser, engine, and behavior) and includes a new ExecPlan document for milestone 4.1.4.
  • Updates several design/user/docs artifacts to reflect the new mode-aware behavior and gating plan.

Changes

  • New file: docs/execplans/4-1-4-mode-aware-sempai-validation.md (added)
  • Code changes across crates:
    • crates/sempai-yaml/src/model.rs: added span fields (span, mode_span) and accessor methods for rule/span data.
    • crates/sempai-yaml/src/parser/builders.rs: support for r2c-internal-project-depends-on principal and updated error messaging when multiple principals are defined.
    • crates/sempai-yaml/src/parser/mod.rs: recognizes project_depends_on as part of search/legacy field checks.
    • crates/sempai-yaml/src/raw.rs: deserializes r2c-internal-project-depends-on into the RawRule.
    • crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs: added test for parsing project-depends-on search rule.
    • crates/sempai-yaml/tests/features/sempai_yaml.feature: added scenario that parses a valid dependency search rule.
    • crates/sempai/src/engine.rs: introduced mode-aware gating during YAML compile; validates modes and wires in span-based diagnostics.
    • crates/sempai/src/tests/engine_tests.rs: added tests for project-depends-on gating and unsupported modes (extract/unknown/custom modes).
    • crates/sempai/tests/features/sempai_engine.feature: added engine feature scenarios for mode-aware gating.
  • Documentation updates:
    • docs/roadmap.md: marks 4.1.4 as completed in the roadmap path.
    • docs/sempai-query-language-design.md: updated to reflect mode-handling boundary and project-depends-on compatibility path.
    • docs/users-guide.md: updated to describe new compile_yaml(...) mode-aware behavior and diagnostic expectations.
  • The changes collectively implement the 4.1.4 plan as a concrete feature, with tests and docs reflecting the new behavior.

Rationale

  • The prior 4.1.4 PR was planning-only for a parse-vs-engine boundary. The updated changes implement that boundary: parsing remains responsible for YAML shape and certain compatibility keys, while the engine now gates on mode validity and surfaces deterministic diagnostics for unsupported modes. This ensures observable behavior is well-defined and testable, without conflating parsing with execution semantics.

Testing plan

  • Unit and integration tests cover:
    • Parsing of r2c-internal-project-depends-on as a compatibility key (ProjectDependsOn principal).
    • Engine-side gating: compile_yaml(...) returns a not-implemented placeholder for valid search rules but with mode-aware validation for non-search modes that fail with E_SEMPAI_UNSUPPORTED_MODE.
    • Deterministic diagnostics location: mode span preferred when available, falling back to rule span.
  • Existing tests are extended to reflect the new behavior, and new tests verify behavior for both valid and invalid mode configurations.

Documentation plan

  • Update design docs to reflect the final boundary between parser validation and engine validation.
  • Update user docs to describe the new mode-aware behavior users will observe when calling compile_yaml(...).
  • Ensure the ExecPlan document remains a living guide and is updated with progress and outcomes.

Risks and constraints

  • The gating introduces new diagnostic codes and spans; care was taken to preserve the existing diagnostic surface while adding engine-side mode validation.
  • parser-vs-engine boundary must be maintained to avoid moving mode validation into YAML parsing prematurely.

Context

  • The changes align with the roadmap item 4.1.4 and lock in the mode-aware validation path for Sempai, including compatibility keys and test coverage.

Summary by Sourcery

Documentation: - Add a detailed ExecPlan documenting how Sempai will perform mode-aware validation and execution gating, including constraints, risks, progress tracking, and test and rollout strategy for milestone 4.1.4.

Summary by Sourcery

Introduce mode-aware validation and execution gating for Sempai YAML compilation, while preserving compatibility for dependency search rules and updating documentation and tests accordingly.

New Features:

  • Add support in the YAML parser and model for the Semgrep compatibility key r2c-internal-project-depends-on as a search principal.
  • Expose rule- and mode-level source spans on parsed rules to support precise diagnostics in downstream components.

Enhancements:

  • Add an engine-side validation pass that gates compile_yaml on supported modes, allowing only search rules to proceed and rejecting other modes with deterministic diagnostics.
  • Prefer mode-field spans, falling back to rule spans, when emitting unsupported-mode diagnostics to make failures more actionable.
  • Extend search principal validation to ensure exactly one of legacy search keys, match, or r2c-internal-project-depends-on is defined.

Documentation:

  • Document the new mode-aware compile_yaml behaviour and diagnostics in the user guide and design document, including the parser–engine validation boundary and handling of dependency rules.
  • Mark roadmap item 4.1.4 as complete and add an ExecPlan describing the implementation, risks, and verification steps for mode-aware validation.

Tests:

  • Add unit and BDD tests in both the YAML parser and engine crates to cover dependency search rules, unsupported modes, and the preserved not-implemented behaviour for valid search rules.

📎 Task: https://www.devboxer.com/task/541762c5-fca6-442c-ad49-410edc3daa24

Introduce mode-aware Sempai validation to distinguish between parseable and executable rules. Valid 'search' mode rules proceed to normalization placeholder with NOT_IMPLEMENTED, while non-search modes such as 'extract', 'taint', and 'join' fail with E_SEMPAI_UNSUPPORTED_MODE diagnostics. Enhance diagnostics with stable error codes and source span metadata. Add comprehensive unit and behavioural tests covering happy, unhappy, and edge cases. Update docs to reflect new validation boundaries, user-visible behaviour, and mark roadmap item 4.1.4 complete.

This change separates parser validation from engine validation, preserving compatibility and deterministic error handling per the ExecPlan 4.1.4 specifications.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented Mar 28, 2026

Reviewer's Guide

Implements mode-aware validation and execution gating in Sempai by extending the YAML model with rule/mode spans and a new ProjectDependsOn search principal, then adding an engine-side post-parse validator that only allows search rules past validation while deterministically erroring on other modes, with corresponding tests and docs updates (including an ExecPlan and roadmap status).

Sequence diagram for mode-aware compile_yaml validation

sequenceDiagram
    actor User
    participant Engine
    participant SempaiYaml as sempai_yaml_parser
    participant ModeValidator as mode_validator

    User->>Engine: compile_yaml(yaml)
    Engine->>SempaiYaml: parse_rule_file(yaml, None)
    SempaiYaml-->>Engine: RuleFile file or DiagnosticReport

    alt YAML_parse_error_or_schema_error
        SempaiYaml-->>User: DiagnosticReport with E_SEMPAI_YAML_PARSE or E_SEMPAI_SCHEMA_INVALID
    else YAML_parsed_successfully
        Engine->>ModeValidator: validate_supported_modes(&file)
        alt all_rules_have_search_mode
            ModeValidator-->>Engine: Ok
            Engine-->>User: DiagnosticReport::not_implemented("compile_yaml query-plan normalization")
        else first_rule_with_unsupported_mode_found
            ModeValidator-->>Engine: Err(DiagnosticReport::validation_error(ESempaiUnsupportedMode, ...))
            Engine-->>User: DiagnosticReport with E_SEMPAI_UNSUPPORTED_MODE
        end
    end
Loading

Updated class diagram for Sempai rule model and engine mode gating

classDiagram
    class Engine {
        +compile_yaml(yaml: &str) Result~Vec_QueryPlan_ , DiagnosticReport~
        -validate_supported_modes(file: &RuleFile) Result~() , DiagnosticReport~
        -unsupported_mode_diagnostic(rule: &Rule) Option~DiagnosticReport~
        -unsupported_mode_span(rule: &Rule) Option~SourceSpan~
        -rule_mode_name(mode: &RuleMode) &str
    }

    class RuleFile {
        +rules() &Vec~Rule~
    }

    class Rule {
        -id: String
        -mode: RuleMode
        -span: Option~SourceSpan~
        -mode_span: Option~SourceSpan~
        -message: Option~String~
        -languages: Vec~String~
        -severity: Option~RuleSeverity~
        +id() &String
        +rule_span() Option~&SourceSpan~
        +mode() &RuleMode
        +mode_span() Option~&SourceSpan~
        +message() Option~&str~
    }

    class RuleMode {
        <<enumeration>>
        Search
        Extract
        Join
        Taint
        Other(mode: String)
    }

    class SearchQueryPrincipal {
        <<enumeration>>
        Legacy(LegacyFormula)
        Match(MatchFormula)
        ProjectDependsOn(Value)
    }

    class RawRule {
        +id: Option~Spanned~String~~
        +mode: Option~Spanned~String~~
        +pattern: Option~Spanned~String~~
        +patterns: Option~Spanned~Vec~RawLegacyFormulaObject~~
        +pattern_either: Option~Spanned~Vec~RawLegacyFormulaObject~~
        +match_formula: Option~Spanned~RawMatchFormula~~
        +project_depends_on: Option~Spanned~Value~~
        +dest_language: Option~Spanned~String~~
        +extract: Option~Spanned~String~~
        +taint: Option~Spanned~String~~
        +join: Option~Spanned~String~~
    }

    class DiagnosticReport {
        +code: DiagnosticCode
        +message: String
        +primary_span: Option~SourceSpan~
        +notes: Vec~String~
        +not_implemented(feature: &str) DiagnosticReport
        +validation_error(code: DiagnosticCode, message: String, span: Option~SourceSpan~, notes: Vec~String~) DiagnosticReport
        +parser_error(code: DiagnosticCode, message: String, span: Option~SourceSpan~, notes: Vec~String~) DiagnosticReport
    }

    class DiagnosticCode {
        <<enumeration>>
        ESempaiYamlParse
        ESempaiSchemaInvalid
        ESempaiUnsupportedMode
        ENotImplemented
    }

    class SourceSpan {
        +start: usize
        +end: usize
    }

    class QueryPlan

    Engine --> RuleFile : uses
    Engine --> DiagnosticReport : returns
    Engine --> DiagnosticCode : uses
    Engine --> Rule : validates
    RuleFile --> Rule : contains
    Rule --> RuleMode : has
    Rule --> SearchQueryPrincipal : has
    RawRule --> SearchQueryPrincipal : builds
    DiagnosticReport --> DiagnosticCode : has
    Rule --> SourceSpan : uses
    RawRule --> SourceSpan : derives via Spanned
    Engine --> SourceSpan : uses for diagnostics
    Engine --> QueryPlan : will produce in future
Loading

File-Level Changes

Change Details Files
Add engine-side mode-aware validation/gating in compile_yaml so only search rules proceed and others fail with E_SEMPAI_UNSUPPORTED_MODE using precise spans.
  • Update Engine::compile_yaml to parse the rule file, then invoke a new validate_supported_modes step before returning the existing NOT_IMPLEMENTED normalization placeholder.
  • Introduce validate_supported_modes, unsupported_mode_diagnostic, unsupported_mode_span, and rule_mode_name helpers to scan rules and emit DiagnosticReport::validation_error with DiagnosticCode::ESempaiUnsupportedMode for non-search modes.
  • Prefer the mode field span for unsupported-mode diagnostics, falling back to the enclosing rule span when mode span is unavailable.
  • Extend unit tests to cover project-depends-on search rules reaching NOT_IMPLEMENTED, and extract/unknown modes failing with E_SEMPAI_UNSUPPORTED_MODE, plus corresponding BDD feature scenarios in the engine feature file.
crates/sempai/src/engine.rs
crates/sempai/src/tests/engine_tests.rs
crates/sempai/tests/features/sempai_engine.feature
Extend the YAML model and parser to carry rule/mode spans and support the r2c-internal-project-depends-on compatibility principal for search rules.
  • Augment the Rule model with span and mode_span fields and public accessors rule_span and mode_span so consumers (like the engine) can emit span-aware diagnostics.
  • Teach RawRule to deserialize the r2c-internal-project-depends-on field and treat it as a search principal via SearchQueryPrincipal::ProjectDependsOn(Value).
  • Update builders to count project_depends_on as a search principal, enforce exactly one top-level principal among legacy, match, and project_depends_on, and prioritize match, then project_depends_on, then legacy when constructing the principal.
  • Include project_depends_on in has_search_or_legacy_fields so dependency search rules are recognized as search mode.
  • Add parser unit tests and BDD scenarios asserting that dependency search rules parse successfully with the expected principal and mode.
crates/sempai-yaml/src/model.rs
crates/sempai-yaml/src/raw.rs
crates/sempai-yaml/src/parser/builders.rs
crates/sempai-yaml/src/parser/mod.rs
crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs
crates/sempai-yaml/tests/features/sempai_yaml.feature
Document the 4.1.4 mode-aware validation boundary, user-visible compile_yaml behaviour, and mark the roadmap item complete.
  • Add an ExecPlan document describing constraints, risks, decisions, and verification steps for 4.1.4 mode-aware Sempai validation and execution gating.
  • Update the Sempai query-language design doc to note preservation of r2c-internal-project-depends-on as an opaque principal and describe the new engine-side mode-aware validation (including error code, span selection, and first-unsupported-rule semantics).
  • Revise the user guide section on Engine methods to explain that compile_yaml performs parsing plus mode-aware validation, keeps NOT_IMPLEMENTED for valid search rules, and emits E_SEMPAI_UNSUPPORTED_MODE for other modes with mode-focused spans.
  • Mark roadmap item 4.1.4 as completed now that unsupported modes are deterministically rejected and search mode is gated appropriately.
docs/execplans/4-1-4-mode-aware-sempai-validation.md
docs/sempai-query-language-design.md
docs/users-guide.md
docs/roadmap.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

codescene-delta-analysis[bot]

This comment was marked as outdated.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 864c2a1b-30a0-480c-9bf3-41c844008951

📥 Commits

Reviewing files that changed from the base of the PR and between 0b7d368 and 2f35a71.

📒 Files selected for processing (1)
  • crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs

Mode-Aware Validation and Gating for Sempai

This PR implements deterministic mode-aware validation and execution gating across the parser–engine boundary so only search rules proceed; non-search modes (extract, taint, join, and unknown modes) now produce deterministic E_SEMPAI_UNSUPPORTED_MODE diagnostics.

Key Changes

YAML Model & Parser

  • Raw parsing now preserves the compatibility key r2c-internal-project-depends-on (deserialised into RawRule::project_depends_on) and re-exports a typed payload ProjectDependsOnPayload.
  • Added typed payload ProjectDependsOnPayload (validation via TryFrom<Value>, accessors, and conversion to/from Value).
  • SearchQueryPrincipal gains ProjectDependsOn(ProjectDependsOnPayload) to represent dependency-based search principals.
  • Rule now carries optional span and mode_span with accessors so diagnostics may prefer the mode field span.
  • build_search_principal refactored to accept the new compatibility key alongside legacy search keys and match_formula, enforcing explicit mutual-exclusion and precedence; emits an explicit missing-principal diagnostic when none of the required principal keys are present.
  • extract, taint, and join rule builders now reject r2c-internal-project-depends-on with mode-specific schema errors.

Engine-Side Validation & Gating

  • New engine helper module crates/sempai/src/mode_validation.rs (moved out of engine.rs) with pub(crate) fn validate_supported_modes(file: &RuleFile) -> Result<(), DiagnosticReport> that scans rules and returns the first unsupported-mode diagnostic.
  • Diagnostics for unsupported modes use DiagnosticCode::ESempaiUnsupportedMode, include a human-readable mode name, and prefer the mode field span, falling back to the rule span.
  • Engine::compile_yaml modified to run parsing then validate_supported_modes(&file) before returning the existing NOT_IMPLEMENTED placeholder for valid search rules.

API / Code Hygiene

  • Fixed an incorrect const usage: rule_mode_name is a regular fn (no const) to avoid calling non-const methods in const contexts.
  • Re-exported ProjectDependsOnPayload from crates/sempai-yaml/src/lib.rs as part of the public model.

Tests

  • Parser tests:
    • Added positive test for valid r2c-internal-project-depends-on search rules.
    • Added negative tests asserting schema invalid when legacy principals and r2c-internal-project-depends-on co-occur and when required fields in the compatibility payload are missing.
    • Introduced helper assert_schema_invalid(yaml: &str, expected_fragment: &str) to centralise parser diagnostic assertions. Some legacy tests were refactored/parameterised (rstest) per commit messages.
  • Engine tests:
    • Added helper assert_compile_yaml_unsupported_mode(yaml: &str, expected_mode_fragment: &str) to centralise engine diagnostic assertions and replaced duplicated assertions across tests.
    • Added tests asserting ESempaiUnsupportedMode for mode: extract and unknown modes, and a test showing NOT_IMPLEMENTED for valid dependency search rules that reach the normalization placeholder.
  • BDD feature scenarios:
    • New feature scenario to parse a valid dependency search rule; updated engine feature scenarios to assert deterministic unsupported-mode diagnostics and corrected spelling to “normalization”.

Documentation

  • Added execplan docs/execplans/4-1-4-mode-aware-sempai-validation.md (new living exec-plan for 4.1.4) documenting the parser–engine boundary, deterministic diagnostics, principal/key validation including r2c-internal-project-depends-on, and scope/out-of-scope items.
  • Updated docs/sempai-query-language-design.md, docs/users-guide.md, and docs/roadmap.md to reflect the new mode-aware validation pass, diagnostic selection rules (prefer mode span), and that task 4.1.4 is complete. Execplan explicitly spells out DSL/BDD acronyms at first use.

Rationale

Maintain a clean parser vs engine separation: parser validates YAML shape and preserves compatibility keys opaquely; engine enforces mode support and deterministic, first-failure-in-source-order diagnostics. This preserves Semgrep-compatible rules while gating unsupported execution modes at a single well-defined boundary.

Risks & Testing

  • Deterministic diagnostics are enforced by selecting the mode field span when available and returning the first unsupported-rule diagnostic in source order.
  • Tests cover parser acceptance/rejection of the compatibility key, cross-mode field rejection, engine-side gating, deterministic span selection, and schema error messages.
  • Out-of-scope items documented in the execplan: no normalization/query-plan generation or DSL execution implemented in this change.

Walkthrough

Add span metadata to parsed rules; preserve Semgrep-compatible r2c-internal-project-depends-on as ProjectDependsOnPayload; extend parser and tests to accept/reject it by mode; add engine-side mode gating that rejects non-search modes with deterministic diagnostics.

Changes

Cohort / File(s) Summary
Model Extensions
crates/sempai-yaml/src/model.rs
Add pub(crate) span: Option<SourceSpan> and pub(crate) mode_span: Option<SourceSpan> to Rule with pub const fn rule_span() and mode_span() accessors. Re-export ProjectDependsOnPayload and add SearchQueryPrincipal::ProjectDependsOn.
Project Depends-On Payload
crates/sempai-yaml/src/project_depends_on.rs
Introduce ProjectDependsOnPayload with namespace/package accessors, TryFrom<Value> validation, into_inner() and From<...> for Value serialization.
Parser Builders
crates/sempai-yaml/src/parser/builders.rs
Refactor build_search_principal to treat project_depends_on as a mutually-exclusive principal; add reject_project_depends_on helper to emit mode-specific schema errors for non-search modes.
Parser Core & Raw
crates/sempai-yaml/src/parser/mod.rs, crates/sempai-yaml/src/raw.rs
Deserialize r2c-internal-project-depends-on into RawRule.project_depends_on: Option<Spanned<Value>>. Compute and attach mode_span and span when constructing Rule. Extend search-field detection to include project_depends_on.
YAML Public API
crates/sempai-yaml/src/lib.rs
Re-export ProjectDependsOnPayload from the crate public API.
Engine Validation
crates/sempai/src/engine.rs, crates/sempai/src/mode_validation.rs
Capture parsed RuleFile and run validate_supported_modes(&file) before proceeding. Add validate_supported_modes to reject Extract, Join, Taint, and unknown Other(_) modes with ESempaiUnsupportedMode, using mode_span then rule_span for the diagnostic primary span.
Tests
crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs, crates/sempai-yaml/src/tests/parser_tests/mode_tests.rs, crates/sempai-yaml/tests/features/sempai_yaml.feature, crates/sempai/src/tests/engine_tests.rs, crates/sempai/tests/features/sempai_engine.feature
Add positive/negative tests for r2c-internal-project-depends-on parsing and mutual-exclusion; add mode rejection cases for extract/taint; add engine tests asserting ESempaiUnsupportedMode and NOT_IMPLEMENTED behaviour; update feature scenarios and messages (normalisation → normalization).
Docs & Roadmap
docs/execplans/4-1-4-mode-aware-sempai-validation.md, docs/roadmap.md, docs/sempai-query-language-design.md, docs/users-guide.md
Add ExecPlan 4.1.4 and mark done; update design and user docs to describe mode-aware compile-time validation, diagnostic selection, and treatment of r2c-internal-project-depends-on.

Sequence Diagram(s)

sequenceDiagram
    participant User as "User"
    participant Engine as "Engine::compile_yaml"
    participant Parser as "YAML Parser"
    participant Validator as "Mode Validator"
    participant Diagnostic as "Diagnostic Report"

    User->>Engine: Supply YAML
    Engine->>Parser: parse_rule_file(yaml)
    Parser->>Parser: Deserialize rules, capture spans\nParse `project_depends_on` into payload
    Parser-->>Engine: Return RuleFile (with rule_span, mode_span)
    Engine->>Validator: validate_supported_modes(&file)
    Validator->>Validator: Iterate rules, inspect RuleMode
    alt mode == Search
        Validator-->>Engine: Ok
        Engine-->>User: Return NOT_IMPLEMENTED normalization placeholder
    else mode != Search
        Validator->>Diagnostic: Build ESempaiUnsupportedMode (primary span = mode_span or rule_span)
        Diagnostic-->>Engine: Return DiagnosticReport (error)
        Engine-->>User: Report failure with targeted span
    end
Loading

Possibly related PRs

Poem

🔍 Attach spans and keep the old key true,
Preserve dependencies so searches pass through,
Gate modes at the engine, halt non-searched ways,
Point diagnostics at mode spans in steady blaze,
Ship docs and tests — let validation sing anew.

🚥 Pre-merge checks | ✅ 5 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Testing ⚠️ Warning Pull request introduces critical functionality (ProjectDependsOnPayload struct and mode_validation module) without direct unit test coverage for error paths and edge cases. Add unit tests for ProjectDependsOnPayload TryFrom error cases and mode_validation functions with isolated test coverage, including snapshot tests for diagnostic messages.
Developer Documentation ⚠️ Warning Pull request introduces significant new internal APIs and architectural boundaries but fails to update docs/developers-guide.md with necessary implementation guidance. Extend docs/developers-guide.md to document the Sempai parser-engine architecture, mode_validation module purpose, ProjectDependsOnPayload role, span metadata semantics, and mode-gating design decisions.
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: implementing mode-aware validation and execution gating for Sempai, which directly aligns with the primary objective across all changed files.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing mode-aware validation, engine-side gating, YAML parser extensions, test additions, and documentation updates—all present in the raw summary.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
User-Facing Documentation ✅ Passed The pull request comprehensively documents new mode-aware validation and execution gating functionality for the Sempai query engine in the users guide.
Module-Level Documentation ✅ Passed Both newly added modules carry clear, informative module-level docstrings explaining their purpose and utility.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch implement-mode-aware-validation-j3m9gz

Comment @coderabbitai help to get the list of available commands and usage tips.

…l-project-depends-on'

- Add support for opaque 'r2c-internal-project-depends-on' search mode in parser and model.
- Introduce source span tracking for rule and mode fields to improve error reporting.
- Implement execution gating in Engine::compile_yaml to reject unsupported modes with detailed diagnostics.
- Update tests and feature specs to cover dependency search rules and mode validation.
- Enhance documentation for new mode-aware validation behavior and error handling.

This enables the system to parse dependency search rules compatibly while gating execution support to supported modes only, improving robustness and user feedback.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
@leynos leynos changed the title 4.1.4: Implement mode-aware validation in Sempai 4.1.4: Implement mode-aware validation and gating for Sempai Mar 29, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Mar 29, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

crates/sempai/src/tests/engine_tests.rs

Comment on file

fn compile_yaml_returns_unsupported_mode_for_extract_rules() {
    let engine = default_engine();
    let result = engine.compile_yaml(concat!(
        "rules:\n",
        "  - id: demo.extract\n",
        "    mode: extract\n",
        "    message: extract foo\n",
        "    languages: [python]\n",
        "    severity: WARNING\n",
        "    dest-language: python\n",
        "    extract: foo($X)\n",
        "    pattern: source($X)\n",
    ));
    let (code, diag) = first_diagnostic_of_err(result);
    assert_eq!(code, DiagnosticCode::ESempaiUnsupportedMode);
    assert!(diag.message().contains("extract"));
    assert!(diag.primary_span().is_some());
}

❌ New issue: Code Duplication
The module contains 2 functions with similar structure: compile_yaml_returns_unsupported_mode_for_extract_rules,compile_yaml_returns_unsupported_mode_for_unknown_modes

@coderabbitai

This comment was marked as resolved.

Refactored compile_yaml_returns_unsupported_mode_for_extract_rules and compile_yaml_returns_unsupported_mode_for_unknown_modes tests to use a new helper function assert_compile_yaml_unsupported_mode. This centralizes code that checks for unsupported mode diagnostics, reduces duplication, and improves test readability.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos leynos marked this pull request as ready for review March 30, 2026 06:55
sourcery-ai[bot]

This comment was marked as resolved.

@coderabbitai coderabbitai Bot added the Roadmap label Mar 30, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e302ab400

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/sempai-yaml/src/parser/mod.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/sempai-yaml/src/model.rs`:
- Around line 228-229: Replace the untyped serde_json::Value in the public enum
variant ProjectDependsOn with a dedicated newtype (e.g.,
ProjectDependsOnPayload) and expose accessors/validation on that type;
specifically, create a tuple struct/newtype that wraps the internal Value (or
better, a concrete typed struct), implement/derive Serialize/Deserialize and
TryFrom<Value>/TryInto<Value> or a custom validate/normalize constructor, then
change the enum variant ProjectDependsOn(Value) to
ProjectDependsOn(ProjectDependsOnPayload) and update any
construction/deserialization sites to go through the newtype’s constructor or
TryFrom so future schema changes can validate/normalize without breaking the
public API.

In `@crates/sempai-yaml/src/parser/builders.rs`:
- Around line 238-260: The new project_depends_on branch currently accepts
r2c-internal-project-depends-on for any rule, causing non-search rules (e.g.,
build_extract_rule, build_join_rule, build_taint_rule) to silently drop the
dependency; change the logic in builders.rs so that before returning
SearchQueryPrincipal::ProjectDependsOn for raw.project_depends_on you validate
the rule's mode and reject it with a schema_error when the mode is not Search
(inspect the rule mode field on raw or the surrounding context), and also add
the same mode-specific validation to the other builders (build_extract_rule,
build_join_rule, build_taint_rule) so they either propagate/validate
raw.project_depends_on or surface an error instead of ignoring it; keep existing
error text style and use schema_error like the other branches.

In `@crates/sempai/src/tests/engine_tests.rs`:
- Around line 65-81: The test
compile_yaml_returns_not_implemented_for_project_depends_on_search_rule and the
diagnostic message in engine.rs use US spelling "normalization" but the BDD
expects en-GB "normalisation"; update the user-facing diagnostic strings in
engine.rs (the messages produced where the NotImplemented diagnostic is created
for project-depends/search rule) to use "normalisation" and then change the
assertion in the test
(compile_yaml_returns_not_implemented_for_project_depends_on_search_rule) to
assert that diag.message() contains "normalisation" so both the code and test
match the en-GB spelling.

In `@docs/sempai-query-language-design.md`:
- Around line 429-435: The doc uses inconsistent spelling ("normalisation" vs
"normalization"); update all occurrences to the Oxford -ization form within this
design note so it's consistent—search for and replace instances near the
`sempai::Engine::compile_yaml` implementation note and related mentions of
`NOT_IMPLEMENTED`, `E_SEMPAI_UNSUPPORTED_MODE`, `primary_span`, and `mode` so
the entire paragraph and neighbouring sentences uniformly use "normalization".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 21404e87-1b16-45a6-bcaa-5b681822a30d

📥 Commits

Reviewing files that changed from the base of the PR and between 57f50f0 and 0e302ab.

📒 Files selected for processing (13)
  • crates/sempai-yaml/src/model.rs
  • crates/sempai-yaml/src/parser/builders.rs
  • crates/sempai-yaml/src/parser/mod.rs
  • crates/sempai-yaml/src/raw.rs
  • crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs
  • crates/sempai-yaml/tests/features/sempai_yaml.feature
  • crates/sempai/src/engine.rs
  • crates/sempai/src/tests/engine_tests.rs
  • crates/sempai/tests/features/sempai_engine.feature
  • docs/execplans/4-1-4-mode-aware-sempai-validation.md
  • docs/roadmap.md
  • docs/sempai-query-language-design.md
  • docs/users-guide.md

Comment thread crates/sempai-yaml/src/model.rs Outdated
Comment thread crates/sempai-yaml/src/parser/builders.rs
Comment thread crates/sempai/src/tests/engine_tests.rs
Comment thread docs/sempai-query-language-design.md
…cy principal

- Introduce `ProjectDependsOnPayload` struct with validated `namespace` and `package` fields.
- Replace loosely typed `ProjectDependsOn(Value)` with strongly typed payload.
- Update parser and builders to deserialize and validate this new payload.
- Reject use of `r2c-internal-project-depends-on` in extract, join, and taint modes.
- Add unit tests checking valid and invalid usage of dependency principal.
- Keep legacy compatibility in search mode for dependency principal.

This change improves safety and clarity when handling internal dependency principals in Semgrep YAML rules.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/sempai-yaml/src/model.rs`:
- Around line 233-313: Move the ProjectDependsOnPayload type and its conversion
impls into a new module named project_depends_on: create a new module file
containing the struct ProjectDependsOnPayload, its methods (namespace, package,
into_inner), the impl TryFrom<Value> for ProjectDependsOnPayload and impl
From<ProjectDependsOnPayload> for Value, keeping the exact logic; then in the
original model module remove the type and conversion impls and re-export it with
a pub mod declaration and a pub use project_depends_on::ProjectDependsOnPayload
so the type remains available at the original module surface. Ensure the new
module has the same imports (serde derives and Value) so the conversions
compile.

In `@crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs`:
- Around line 55-91: Replace the two nearly identical tests
parse_project_depends_on_with_legacy_principal_fails and
parse_project_depends_on_requires_namespace_and_package with a single
parameterized rstest that enumerates both YAML inputs and their expected error
substrings; call first_err_diagnostic for each case and assert the
DiagnosticCode::ESempaiSchemaInvalid and that the returned message contains the
expected substring, keeping has_span assertions; ensure the new test references
the original YAML samples (or strings) and the helper first_err_diagnostic to
preserve behavior.

In `@docs/execplans/4-1-4-mode-aware-sempai-validation.md`:
- Around line 264-266: Update the doc's retrospective variant type for
r2c-internal-project-depends-on so it matches the implemented API: change the
opaque `SearchQueryPrincipal::ProjectDependsOn(serde_json::Value)` reference to
use the actual typed payload name used in code (e.g.
`SearchQueryPrincipal::ProjectDependsOn(ProjectDependsOnPayload)`), ensuring the
variant signature in the docs matches the real type returned/accepted by the
ProjectDependsOn implementation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 917fd672-17d8-4104-ac13-62264651b2a1

📥 Commits

Reviewing files that changed from the base of the PR and between 0e302ab and e1aa3a0.

📒 Files selected for processing (11)
  • crates/sempai-yaml/src/lib.rs
  • crates/sempai-yaml/src/model.rs
  • crates/sempai-yaml/src/parser/builders.rs
  • crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs
  • crates/sempai-yaml/src/tests/parser_tests/mode_tests.rs
  • crates/sempai/src/engine.rs
  • crates/sempai/src/lib.rs
  • crates/sempai/src/mode_validation.rs
  • crates/sempai/tests/features/sempai_engine.feature
  • docs/execplans/4-1-4-mode-aware-sempai-validation.md
  • docs/sempai-query-language-design.md

Comment thread crates/sempai-yaml/src/model.rs Outdated
Comment thread crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs Outdated
Comment thread docs/execplans/4-1-4-mode-aware-sempai-validation.md Outdated
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 3, 2026

@coderabbitai Have the following now been resolved?

Annul any requirements that violate the en-GB-oxendict spelling (-ize / -yse / -our) conventions (for example a request to replace "normalize" with "normalise" or "artefact" with "artifact"), or where the requirement unnecessarily increases cyclomatic complexity.

Please address the comments from this code review:

## Overall Comments
- The new mode-gating helpers (`validate_supported_modes`, `unsupported_mode_diagnostic`, `rule_mode_name`, etc.) make `engine.rs` more complex; consider moving them into a small dedicated validator module and keeping `Engine` focused on the façade wiring in line with the ExecPlan’s "keep files small; split validator code" constraint.
- In `build_search_principal`, the "exactly one top-level query principal" error now mentions `r2c-internal-project-depends-on`, but the "no principal provided" path still falls through legacy-building; consider emitting an explicit missing-principal diagnostic that also names `r2c-internal-project-depends-on` to keep search-mode errors consistent.

## Individual Comments

### Comment 1
<location path="crates/sempai/src/engine.rs" line_range="182-188" />
<code_context>
+        .or_else(|| rule.rule_span().cloned())
+}
+
+const fn rule_mode_name(mode: &RuleMode) -> &str {
+    match mode {
+        RuleMode::Search => "search",
+        RuleMode::Taint => "taint",
+        RuleMode::Join => "join",
+        RuleMode::Extract => "extract",
+        RuleMode::Other(other_mode) => other_mode.as_str(),
+    }
+}
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `const fn` here is not valid because it calls non-const methods like `String::as_str`.

This won’t compile on stable Rust because `String::as_str` isn’t const-evaluable. Since this helper is only used at runtime for diagnostics, it should just be a regular `fn`; removing `const` fixes the compilation error without changing behavior.
</issue_to_address>

### Comment 2
<location path="crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs" line_range="30-31" />
<code_context>
     });
 }

+#[test]
+fn parse_project_depends_on_search_rule() {
+    let yaml = concat!(
+        "rules:\n",
</code_context>
<issue_to_address>
**suggestion (testing):** Add a negative parser test for multiple query principals including `r2c-internal-project-depends-on`

The new test validates that `r2c-internal-project-depends-on` parses as a `ProjectDependsOn` principal and keeps the rule in `search` mode. Please also add a negative test where a rule includes both a legacy principal (e.g. `pattern` or `match`) and `r2c-internal-project-depends-on`, and assert that parsing fails with the expected schema error. This will cover the "exactly one principal" constraint for the new compatibility key.

Suggested implementation:

```rust
#[test]
fn parse_project_depends_on_search_rule() {
    let yaml = concat!(
        "rules:\n",
        "  - id: demo.depends\n",
        "    message: detect vulnerable dependency\n",
        "    languages: [python]\n",
        "    severity: WARNING\n",
        "    r2c-internal-project-depends-on:\n",
        "      namespace: pypi\n",
        "      package: requests\n",
    );

    check_first_rule(yaml, |rule| {
        // assert that the rule is in search mode
        assert_eq!(rule.mode, RuleMode::Search);

        // assert that the principal is parsed as ProjectDependsOn with the expected fields
        match &rule.principal {
            Principal::ProjectDependsOn { namespace, package } => {
                assert_eq!(namespace, "pypi");
                assert_eq!(package, "requests");
            }
            other => panic!("expected ProjectDependsOn principal, got {:?}", other),
        }
    });
}

#[test]
fn parse_project_depends_on_with_legacy_principal_fails() {
    let yaml = concat!(
        "rules:\n",
        "  - id: demo.depends\n",
        "    message: detect vulnerable dependency\n",
        "    languages: [python]\n",
        "    severity: WARNING\n",
        "    pattern: foo()\n",
        "    r2c-internal-project-depends-on:\n",
        "      namespace: pypi\n",
        "      package: requests\n",
    );

    // parsing should fail because the rule has more than one principal
    check_schema_error(
        yaml,
        "expected exactly one principal between pattern/match and r2c-internal-project-depends-on",
    );

```

The edit above assumes the following, which you may need to align with the actual codebase:

1. **Field / type names**
   - `rule.mode` and `RuleMode::Search`:
     - If the `Rule` struct exposes mode differently (e.g. `rule.mode()` or `rule.rule_mode`), update the assertion accordingly.
   - `rule.principal` and `Principal::ProjectDependsOn { namespace, package }`:
     - Adjust the field name (`principal`) and enum variant (`Principal::ProjectDependsOn`) to match the actual AST types used in the parser tests.
     - If the `ProjectDependsOn` struct has additional fields (e.g. version constraints), include them in the pattern and assertions as needed.

2. **Error helper**
   - The negative test uses `check_schema_error(yaml, ...)`:
     - If the test helpers use a different name (e.g. `check_invalid_rule`, `check_error`, `parse_and_expect_error`), swap it out.
     - Update the expected error string to match the real schema error message for the “exactly one principal” constraint; the important part is that the test asserts this specific violation (multiple principals: legacy + `r2c-internal-project-depends-on`).

3. **Debug formatting**
   - If `Principal` does not implement `Debug`, replace `panic!("expected ... got {:?}", other)` with a simpler message (e.g. `panic!("expected ProjectDependsOn principal")`) or any pattern the rest of the tests use for this kind of assertion.

4. **Imports**
   - Ensure `RuleMode` and `Principal` are in scope at the top of `legacy_tests.rs`:
     - e.g. `use crate::ast::{RuleMode, Principal};`
   - If they are already imported elsewhere in this file, no further changes are needed.

These adjustments will make the positive test assert both the `ProjectDependsOn` principal and `search` mode, and the negative test will enforce the “exactly one principal” rule when combining `pattern` (or `match`) with `r2c-internal-project-depends-on`.
</issue_to_address>

### Comment 3
<location path="docs/execplans/4-1-4-mode-aware-sempai-validation.md" line_range="76" />
<code_context>
+- Keep this milestone out of 4.1.5 scope:
+  do not normalize legacy and v2 principals into canonical `Formula`, and do
+  not produce real `QueryPlan` values yet.
+- Keep this milestone out of 4.1.6 scope:
+  do not implement the one-liner DSL parser as part of this work.
+- Add both unit tests and behavioural tests using `rstest-bdd` v0.5.0. Cover
</code_context>
<issue_to_address>
**suggestion (review_instructions):** The acronym “DSL” is used here without being defined on first use.

To follow the style guidance on acronyms, spell out DSL the first time it appears, for example:

"do not implement the one-liner domain-specific language (DSL) parser as part of this work."

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Define uncommon acronyms on first use.

</details>
</issue_to_address>

### Comment 4
<location path="docs/execplans/4-1-4-mode-aware-sempai-validation.md" line_range="328" />
<code_context>
+  - `r2c-internal-project-depends-on`
+- Add unhappy-path unit tests for search rules that declare the search header
+  but none of the allowed principal keys.
+- Extend the BDD feature file with one happy path proving the compatibility key
+  is accepted and one unhappy path proving missing-principal search rules still
+  fail with `E_SEMPAI_SCHEMA_INVALID`.
</code_context>
<issue_to_address>
**suggestion (review_instructions):** The acronym “BDD” is used without an earlier definition in this document.

Please expand BDD at first use, for example:

"Extend the behaviour-driven development (BDD) feature file with one happy path..."

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Define uncommon acronyms on first use.

</details>
</issue_to_address>

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 3, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph.

If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced.

crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs

Comment on file

fn parse_project_depends_on_with_legacy_principal_fails() {
    let yaml = concat!(
        "rules:\n",
        "  - id: demo.depends\n",
        "    message: detect vulnerable dependency\n",
        "    languages: [python]\n",
        "    severity: WARNING\n",
        "    pattern: foo()\n",
        "    r2c-internal-project-depends-on:\n",
        "      namespace: pypi\n",
        "      package: requests\n",
    );

    let (code, message, has_span) = first_err_diagnostic(yaml);
    assert_eq!(code, DiagnosticCode::ESempaiSchemaInvalid);
    assert!(message.contains("exactly one top-level query principal"));
    assert!(has_span);
}

❌ New issue: Code Duplication
The module contains 2 functions with similar structure: parse_project_depends_on_requires_namespace_and_package,parse_project_depends_on_with_legacy_principal_fails

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== mode_validation.rs ==="
cat crates/sempai/src/mode_validation.rs

echo ""
echo "=== legacy_tests.rs ==="
cat crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs

echo ""
echo "=== execplan doc (lines 70-90 and 320-340) ==="
sed -n '70,90p' docs/execplans/4-1-4-mode-aware-sempai-validation.md
echo "..."
sed -n '320,340p' docs/execplans/4-1-4-mode-aware-sempai-validation.md

echo ""
echo "=== build_search_principal in builders.rs ==="
cat crates/sempai-yaml/src/parser/builders.rs

Length of output: 21726


@leynos Yes — all six items are resolved. Here is the breakdown, after inspecting the files directly:

# Comment Status
Overall 1 Move mode-gating helpers out of engine.rs into a dedicated validator module crates/sempai/src/mode_validation.rs exists with an explicit //! doc comment stating it "keeps Engine focused on facade wiring"
Overall 2 Emit an explicit missing-principal diagnostic in build_search_principal that names r2c-internal-project-depends-on ✅ The query_principal_count == 0 branch now emits "search rule is missing a query principal" via search_principal_note(), which lists r2c-internal-project-depends-on as a valid option
1 const fn rule_mode_name won't compile on stable Rust const removed; the function is now a plain fn, suppressing the Clippy lint with #[expect(clippy::missing_const_for_fn, reason = "keep this helper runtime-only to avoid const-eval coupling in diagnostics")]
2 Add a negative parser test for combined legacy principal + r2c-internal-project-depends-on parse_project_depends_on_with_legacy_principal_fails is present in legacy_tests.rs, asserting ESempaiSchemaInvalid, message containing "exactly one top-level query principal", and a present span
3 Define "DSL" on first use in the ExecPlan doc ✅ Now reads "one-liner domain-specific language (DSL) parser"
4 Define "BDD" on first use in the ExecPlan doc ✅ Now reads "behaviour-driven development (BDD) tests" at first occurrence, and again at the later reference

No requirements are annulled: neither suggestion violates en-GB-oxendict spelling ("behaviour-driven development" is already correct en-GB; "domain-specific language" has no spelling conflict), and neither increases cyclomatic complexity.

@coderabbitai

This comment was marked as resolved.

Moved the ProjectDependsOnPayload struct and its implementations from model.rs into a new dedicated module project_depends_on.rs. Updated model.rs to re-export ProjectDependsOnPayload and reference the new module. This improves code organization and separation of concerns in the sempai-yaml crate.

Also updated related tests to use the new module properly and adapted execplan documentation to reflect the new type usage.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

…d helper

Replaced rstest parameterized cases in legacy_tests.rs with individual test functions
that use the assert_schema_invalid helper for clearer and more direct test assertions.
This simplifies test code and improves diagnostic message checking with explicit
assert messages.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs (1)

65-97: 🛠️ Refactor suggestion | 🟠 Major

Replace duplicated negative tests with one parameterized #[rstest].

Collapse Line 66 and Line 84 into one #[rstest] with two #[case] inputs, and keep assert_schema_invalid(...) as the shared assertion path.

♻️ Proposed refactor
-#[test]
-fn parse_project_depends_on_with_legacy_principal_fails() {
-    assert_schema_invalid(
-        concat!(
-            "rules:\n",
-            "  - id: demo.depends\n",
-            "    message: detect vulnerable dependency\n",
-            "    languages: [python]\n",
-            "    severity: WARNING\n",
-            "    pattern: foo()\n",
-            "    r2c-internal-project-depends-on:\n",
-            "      namespace: pypi\n",
-            "      package: requests\n",
-        ),
-        "exactly one top-level query principal",
-    );
-}
-
-#[test]
-fn parse_project_depends_on_requires_namespace_and_package() {
-    assert_schema_invalid(
-        concat!(
-            "rules:\n",
-            "  - id: demo.depends.invalid\n",
-            "    message: detect vulnerable dependency\n",
-            "    languages: [python]\n",
-            "    severity: WARNING\n",
-            "    r2c-internal-project-depends-on:\n",
-            "      namespace: pypi\n",
-        ),
-        "must define string `namespace` and `package` fields",
-    );
-}
+#[rstest]
+#[case::with_legacy_principal(
+    concat!(
+        "rules:\n",
+        "  - id: demo.depends\n",
+        "    message: detect vulnerable dependency\n",
+        "    languages: [python]\n",
+        "    severity: WARNING\n",
+        "    pattern: foo()\n",
+        "    r2c-internal-project-depends-on:\n",
+        "      namespace: pypi\n",
+        "      package: requests\n",
+    ),
+    "exactly one top-level query principal",
+)]
+#[case::missing_package(
+    concat!(
+        "rules:\n",
+        "  - id: demo.depends.invalid\n",
+        "    message: detect vulnerable dependency\n",
+        "    languages: [python]\n",
+        "    severity: WARNING\n",
+        "    r2c-internal-project-depends-on:\n",
+        "      namespace: pypi\n",
+    ),
+    "must define string `namespace` and `package` fields",
+)]
+fn reject_invalid_project_depends_on(#[case] yaml: &str, #[case] expected_fragment: &str) {
+    assert_schema_invalid(yaml, expected_fragment);
+}

As per coding guidelines, "Replace duplicated tests with #[rstest(...)] parameterised cases in Rust."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs` around lines 65 -
97, Replace the two duplicated tests
parse_project_depends_on_with_legacy_principal_fails and
parse_project_depends_on_requires_namespace_and_package with a single
parameterized rstest: add #[rstest] on a new test fn (e.g.,
parse_project_depends_on_invalid_cases) and supply two #[case] inputs containing
the two different YAML payloads and their respective expected error strings
("exactly one top-level query principal" and "must define string `namespace` and
`package` fields"); inside the test call assert_schema_invalid(case_yaml,
case_expected) for the shared assertion path and ensure the rstest crate is
imported/enabled for the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs`:
- Around line 65-97: Replace the two duplicated tests
parse_project_depends_on_with_legacy_principal_fails and
parse_project_depends_on_requires_namespace_and_package with a single
parameterized rstest: add #[rstest] on a new test fn (e.g.,
parse_project_depends_on_invalid_cases) and supply two #[case] inputs containing
the two different YAML payloads and their respective expected error strings
("exactly one top-level query principal" and "must define string `namespace` and
`package` fields"); inside the test call assert_schema_invalid(case_yaml,
case_expected) for the shared assertion path and ensure the rstest crate is
imported/enabled for the file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a367078-2f1c-4f4d-a926-b3df69184d8f

📥 Commits

Reviewing files that changed from the base of the PR and between e1aa3a0 and 0b7d368.

📒 Files selected for processing (4)
  • crates/sempai-yaml/src/model.rs
  • crates/sempai-yaml/src/project_depends_on.rs
  • crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs
  • docs/execplans/4-1-4-mode-aware-sempai-validation.md

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Apr 3, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph.

If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced.

crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs

Comment on file

fn parse_project_depends_on_with_legacy_principal_fails() {
    assert_schema_invalid(
        concat!(
            "rules:\n",
            "  - id: demo.depends\n",
            "    message: detect vulnerable dependency\n",
            "    languages: [python]\n",
            "    severity: WARNING\n",
            "    pattern: foo()\n",
            "    r2c-internal-project-depends-on:\n",
            "      namespace: pypi\n",
            "      package: requests\n",
        ),
        "exactly one top-level query principal",
    );
}

❌ New issue: Code Duplication
The module contains 2 functions with similar structure: parse_project_depends_on_requires_namespace_and_package,parse_project_depends_on_with_legacy_principal_fails

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs

Length of output: 5425


@leynos — the fix from the previous round has been fully applied: both functions are already irreducible single-call stubs to assert_schema_invalid. CodeScene is now flagging them purely because they share the same call shape (assert_schema_invalid(concat!(…), "…")), not because of any substantive repeated logic.

A further merge with #[rstest] is technically consistent with the file's existing conventions but trades two distinctly named, self-documenting tests for anonymous case slots — a mild detriment to test-failure triage. Both options are provided below.


File: crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs
Symbols: parse_project_depends_on_requires_namespace_and_package, parse_project_depends_on_with_legacy_principal_fails


Fix (if you prefer to act on the finding)

Merge the two tests into a single #[rstest]-parameterised function, consistent with reject_empty_languages in the same file.

#[rstest]
#[case::with_legacy_principal(
    concat!(
        "rules:\n",
        "  - id: demo.depends\n",
        "    message: detect vulnerable dependency\n",
        "    languages: [python]\n",
        "    severity: WARNING\n",
        "    pattern: foo()\n",
        "    r2c-internal-project-depends-on:\n",
        "      namespace: pypi\n",
        "      package: requests\n",
    ),
    "exactly one top-level query principal",
)]
#[case::missing_required_fields(
    concat!(
        "rules:\n",
        "  - id: demo.depends.invalid\n",
        "    message: detect vulnerable dependency\n",
        "    languages: [python]\n",
        "    severity: WARNING\n",
        "    r2c-internal-project-depends-on:\n",
        "      namespace: pypi\n",
    ),
    "must define string `namespace` and `package` fields",
)]
fn reject_invalid_project_depends_on(#[case] yaml: &str, #[case] expected_fragment: &str) {
    assert_schema_invalid(yaml, expected_fragment);
}

Remove the original parse_project_depends_on_with_legacy_principal_fails and parse_project_depends_on_requires_namespace_and_package functions entirely.


AI Coding Agent Prompt

File: crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs
Symbols: parse_project_depends_on_requires_namespace_and_package,
         parse_project_depends_on_with_legacy_principal_fails

Task: Merge the two #[test] functions into a single #[rstest]-parameterised
function, following the same pattern as `reject_empty_languages` in the same
file.

Steps:

1. Delete the two existing functions:
     parse_project_depends_on_with_legacy_principal_fails
     parse_project_depends_on_requires_namespace_and_package

2. In their place, insert:

     #[rstest]
     #[case::with_legacy_principal(
         concat!(
             "rules:\n",
             "  - id: demo.depends\n",
             "    message: detect vulnerable dependency\n",
             "    languages: [python]\n",
             "    severity: WARNING\n",
             "    pattern: foo()\n",
             "    r2c-internal-project-depends-on:\n",
             "      namespace: pypi\n",
             "      package: requests\n",
         ),
         "exactly one top-level query principal",
     )]
     #[case::missing_required_fields(
         concat!(
             "rules:\n",
             "  - id: demo.depends.invalid\n",
             "    message: detect vulnerable dependency\n",
             "    languages: [python]\n",
             "    severity: WARNING\n",
             "    r2c-internal-project-depends-on:\n",
             "      namespace: pypi\n",
             "      namespace: pypi\n",
         ),
         "must define string `namespace` and `package` fields",
     )]
     fn reject_invalid_project_depends_on(#[case] yaml: &str, #[case] expected_fragment: &str) {
         assert_schema_invalid(yaml, expected_fragment);
     }

   Note: `rstest` is already a dev-dependency of this crate and `#[rstest]` is
   already imported in the parent test module via `use super::*;`. Do not add
   any new imports unless the crate fails to compile.

3. Do not modify `assert_schema_invalid` or any other test in the file.
4. Verify the crate compiles and all tests pass.

Suppression message (if you prefer not to refactor further)

These two test functions — parse_project_depends_on_with_legacy_principal_fails and parse_project_depends_on_requires_namespace_and_package — were the subject of a prior deduplication round that introduced the shared assert_schema_invalid helper. Each function now consists of a single delegating call and tests a categorically distinct failure mode: one asserts that supplying two simultaneous query principals is rejected, while the other asserts that omitting required sub-fields is rejected. The structural similarity CodeScene detects is the unavoidable shape of any parameterised helper call — there is no meaningful logic left to consolidate without collapsing two independently named, self-documenting test entry points into anonymous #[rstest] case slots, which would impede failure triage. The finding should be suppressed as a false positive at this abstraction level.

…dation tests

Refactored existing legacy_tests.rs to use rstest parameterized test cases
for invalid 'r2c-internal-project-depends-on' rules. This replaces multiple
individual test functions with a single parameterized test covering both
legacy principal conflicts and missing required fields, improving test
maintainability and clarity.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
@leynos leynos changed the title 4.1.4: Implement mode-aware validation and gating for Sempai 4.1.4: Implement mode-aware validation and execution gating for Sempai Apr 3, 2026
@leynos leynos merged commit e850d9f into main Apr 4, 2026
11 checks passed
@leynos leynos deleted the implement-mode-aware-validation-j3m9gz branch April 4, 2026 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant