Conversation
OverviewImplements milestone 4.1.5: normalisation into a canonical Formula model. Converts parsed YAML search rules (both legacy Core Model Additions (
|
| Cohort / File(s) | Summary |
|---|---|
Core Formula Model crates/sempai-core/src/formula.rs, crates/sempai-core/src/lib.rs |
Introduced canonical Formula, Atom, DecoratedFormula, and WhereClause types with is_positive_term() classification methods. Re-exported from lib.rs for use across crates. |
Normalisation Pipeline crates/sempai/src/normalize.rs, crates/sempai/src/lib.rs |
Added normalize_rule_file() and normalize_v2_formula() functions that convert legacy pattern* and v2 match principals into canonical Formula instances, with language expansion per-rule. |
Semantic Validation crates/sempai/src/validate.rs |
Implemented validate_formula_semantics() with ValidationContext to enforce deterministic rejection of invalid negation placements (InvalidNotInOr) and conjunctions without positive terms (MissingPositiveTermInAnd). |
Where Clause Parsing crates/sempai/src/where_clause/mod.rs, crates/sempai/src/where_clause/metavariable.rs |
Added where-clause parsing for v2 rules, supporting focus, metavariable-pattern, and metavariable-regex constraints with fallback to not_implemented for comparison and type-based filters. |
Engine Integration crates/sempai/src/engine.rs |
Replaced QueryPlan._plan: () placeholder with formula: Formula field. Modified compile_yaml to execute full normalisation pipeline for search-mode rules instead of returning NOT_IMPLEMENTED. |
Test Framework Updates crates/sempai/src/tests/behaviour.rs, crates/sempai/src/tests/engine_tests.rs, crates/sempai/src/tests/mod.rs |
Updated BDD framework to capture compiled Vec<QueryPlan> results. Added negative test cases for unsupported modes and schema violations. Introduced normalization_tests and normalize_tests modules. |
Normalisation Test Fixtures crates/sempai/tests/fixtures/normalization/*.yaml |
Added paired legacy/v2 rule fixtures for positive equivalence tests (simple_pattern, conjunction, disjunction, nested_context) and negative schema/validation tests (invalid_not_in_or, invalid_no_positive_term, valid_metavariable_pattern_exception, v2 where-clause variants). |
Feature Specification crates/sempai/tests/features/sempai_engine.feature |
Replaced placeholder expectations with success assertions for valid search rules (Then compilation succeeds with 1 plan) and added unsupported-mode validation scenario for mode: taint. |
Documentation & Roadmap docs/execplans/4-1-5-normalization-into-canonical-formula-model.md, docs/roadmap.md, docs/sempai-query-language-design.md, docs/users-guide.md, AGENTS.md, agent-skill/weaver/SKILL.md |
Completed ExecPlan 4.1.5 with full implementation documentation. Updated roadmap checklist and design guide to reflect non-generic DecoratedFormula and string-based Atom variants. Refreshed compile_yaml contract documentation with three-stage pipeline description and new diagnostic codes. Minor formatting/guidance updates. |
Dependency Configuration crates/sempai/Cargo.toml |
Moved serde_json from [dev-dependencies] to [dependencies] to support where-clause JSON parsing in normalisation code. |
Sequence Diagram(s)
sequenceDiagram
actor User
participant Engine as Engine::compile_yaml()
participant Parser as sempai_yaml::Parser
participant Validator as Mode Validator
participant Normaliser as normalize_rule_file()
participant SchemaVal as validate_formula_semantics()
participant Expander as Language Expander
User->>Engine: invoke with YAML text
Engine->>Parser: parse_rule_file(yaml)
Parser-->>Engine: RuleFile | DiagnosticReport
alt Parse Error
Engine-->>User: Err(DiagnosticReport)
else Parse Success
Engine->>Validator: check rule.mode
alt Non-search Mode
Validator-->>Engine: Err(ESempaiUnsupportedMode)
Engine-->>User: Err(DiagnosticReport)
else Search Mode
Validator-->>Engine: Ok()
Engine->>Normaliser: normalize_rule_file(&file)
Normaliser->>Normaliser: for each rule: normalise search principal
Normaliser->>SchemaVal: validate_formula_semantics(&formula)
alt Validation Fails
SchemaVal-->>Normaliser: Err(ESempaiInvalidNotInOr|ESempaiMissingPositiveTermInAnd)
Normaliser-->>Engine: Err(DiagnosticReport)
Engine-->>User: Err(DiagnosticReport)
else Validation Succeeds
SchemaVal-->>Normaliser: Ok()
Normaliser->>Expander: expand per language
Expander-->>Normaliser: Vec<NormalizedSearchRule>
Normaliser-->>Engine: Ok(Vec<NormalizedSearchRule>)
Engine->>Engine: construct QueryPlan per rule with formula
Engine-->>User: Ok(Vec<QueryPlan>)
end
end
end
Possibly Related PRs
- Add sempai_yaml: YAML parser front-end, engine wiring, and tests #85: Introduced canonical
Formulatypes insempai_core; this PR extends that work by implementing the full normalisation pipeline and integrating it intocompile_yaml.
Poem
🎭 From patterns old to formulas new,
Legacy and v2 both pass through,
Canonical form now takes the stage,
Validation guards each branching page,
Search rules emerge, no more delay—
The normalisers' grand ballet!
✨ Finishing Touches
📝 Generate docstrings
- Create stacked PR
- Commit on current branch
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Commit unit tests in branch
4-1-5-normalization-into-canonical-formula-model
Reviewer's GuideIntroduce a canonical Formula model in sempai_core, add a normalization pipeline in sempai that lowers legacy and v2 search rules into this model with semantic validation, and update Engine::compile_yaml, tests, and docs to return real query plans for search-mode rules while gating unsupported modes. Sequence diagram for Engine::compile_yaml normalization pipelinesequenceDiagram
actor Caller
participant Engine
participant YamlParser as SempaiYaml
participant ModeValidator as ModeValidation
participant Normalizer as Normalize
Caller->>Engine: compile_yaml(yaml)
Engine->>YamlParser: parse_rule_file(yaml, None)
YamlParser-->>Engine: RuleFile or DiagnosticReport error
alt YAML parse error
Engine-->>Caller: Err(DiagnosticReport E_SEMPAI_YAML_PARSE or E_SEMPAI_SCHEMA_INVALID)
else YAML parsed
Engine->>ModeValidator: validate_supported_modes(RuleFile)
ModeValidator-->>Engine: Ok or DiagnosticReport error
alt Unsupported mode
Engine-->>Caller: Err(DiagnosticReport E_SEMPAI_UNSUPPORTED_MODE)
else Modes supported
Engine->>Normalizer: normalize_rule_file(&RuleFile)
Normalizer-->>Engine: Vec NormalizedSearchRule or DiagnosticReport error
alt Semantic or normalization error
Engine-->>Caller: Err(DiagnosticReport E_SEMPAI_* including E_SEMPAI_INVALID_NOT_IN_OR or E_SEMPAI_MISSING_POSITIVE_TERM_IN_AND)
else Normalization succeeded
loop For each NormalizedSearchRule
Engine->>Engine: QueryPlan::with_formula(rule_id, language, formula)
end
Engine-->>Caller: Ok(Vec QueryPlan)
end
end
end
Class diagram for canonical Formula model and normalization outputsclassDiagram
class Formula {
<<enum>>
+Atom atom
+Not(Box~DecoratedFormula~) not
+Inside(Box~DecoratedFormula~) inside
+Anywhere(Box~DecoratedFormula~) anywhere
+And(Vec~DecoratedFormula~) and
+Or(Vec~DecoratedFormula~) or
+bool is_positive_term()
}
class Atom {
<<enum>>
+Pattern(String)
+Regex(String)
}
class DecoratedFormula {
+Formula formula
+Vec~WhereClause~ where_clauses
+Option~String~ as_name
+Option~String~ fix
+Option~SourceSpan~ span
+new(formula Formula) DecoratedFormula
+with_where_clause(clause WhereClause) DecoratedFormula
+with_as_name(name String) DecoratedFormula
+with_fix(fix String) DecoratedFormula
+with_span(span SourceSpan) DecoratedFormula
+bool is_positive_term()
}
class WhereClause {
<<enum>>
+Focus(metavariable String)
+MetavariablePattern(metavariable String, formula Formula)
+MetavariableRegex(metavariable String, regex String)
}
class NormalizedSearchRule {
+String rule_id
+Language language
+Formula formula
}
class QueryPlan {
-String rule_id
-Language language
-Formula formula
+language() Language
+formula() &Formula
+with_formula(rule_id String, language Language, formula Formula) QueryPlan
}
class Language {
<<enum>>
+Rust
+Python
+Typescript
+Go
+Hcl
}
class DiagnosticReport {
+Vec~Diagnostic~ diagnostics
+not_implemented(message String) DiagnosticReport
+single_error(code DiagnosticCode, message String, span Option~SourceSpan~, notes Vec~String~) DiagnosticReport
}
Formula --> Atom : uses
Formula --> DecoratedFormula : wraps
DecoratedFormula --> Formula : contains
DecoratedFormula --> WhereClause : has many
NormalizedSearchRule --> Formula : contains
NormalizedSearchRule --> Language : targets
QueryPlan --> Formula : contains
QueryPlan --> Language : targets
DiagnosticReport ..> Formula : validation_errors_on
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/sempai/src/normalize.rs Comment on file fn normalize_simple_pattern_legacy() {
let yaml = concat!(
"rules:\n",
" - id: test.rule\n",
" message: test\n",
" languages: [rust]\n",
" severity: ERROR\n",
" pattern: fn $F($X)\n",
);
let file = sempai_yaml::parse_rule_file(yaml, None).unwrap();
let result = normalize_rule_file(&file).unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].rule_id, "test.rule");
assert_eq!(result[0].language, Language::Rust);
assert!(matches!(result[0].formula, Formula::Atom(Atom::Pattern(_))));
}❌ New issue: Code Duplication |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/sempai/src/tests/normalization_tests.rs Comment on file //! Tests for legacy/v2 formula normalization into canonical `Formula` model.
❌ New issue: Duplicated Assertion Blocks |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Capture the 4.1.5 work in a self-contained plan so implementation can proceed after approval. The plan defines normalization boundaries, semantic checks, fixture coverage, documentation updates, and required validation gates. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
… model This change delivers the normalization layer for Sempai query compilation: - Add canonical Formula model to sempai_core (Formula, Atom, DecoratedFormula, WhereClause) with semantic validation helpers - Implement normalization pass in sempai::normalize that lowers both legacy pattern* syntax and v2 match syntax into the canonical representation - Implement semantic validation: - InvalidNotInOr: rejects negation directly inside disjunctions - MissingPositiveTermInAnd: ensures conjunctions have match producers - Update Engine::compile_yaml to return real QueryPlan values for search-mode rules instead of the previous NOT_IMPLEMENTED placeholder - Add paired fixture tests (legacy/v2 equivalence) and BDD scenarios - Update documentation: sempai-query-language-design.md, users-guide.md, roadmap.md (4.1.5 marked complete), and the execplan itself - Add test lint suppressions for intentional panic patterns in tests The metavariable-pattern exception for MissingPositiveTermInAnd is deferred to a future milestone (test marked #[ignore]). Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
dfc6fc9 to
e20e698
Compare
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Code Duplicationcrates/sempai/src/normalize.rs: What lead to degradation?The module contains 8 functions with similar structure: tests.invalid_not_in_or_detected_legacy,tests.invalid_not_in_or_detected_v2,tests.missing_positive_term_detected_legacy,tests.missing_positive_term_detected_v2 and 4 more functions Why does this problem occur?Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health. How to fix it?A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Duplicated Assertion Blockscrates/sempai/src/tests/normalization_tests.rs: What lead to degradation?The test suite contains 4 functions with duplicated assertion blocks (conjunction_legacy_and_v2_normalize_to_equivalent_formulas,disjunction_legacy_and_v2_normalize_to_equivalent_formulas,nested_context_legacy_and_v2_normalize_to_equivalent_formulas,simple_pattern_legacy_and_v2_normalize_to_equivalent_formulas), threshold = 2 Why does this problem occur?Just like duplicated logic in application code is problematic, duplicated test criteria lead to code that's hard to maintain. Duplicated assertion blocks usually indicate a missing abstraction or a test suite that attempts to test too many things inside the same module. How to fix it?Consider to encapsulate the duplicated assertions (i.e. test criteria) in a custom assert statement that you can then re-use. We also recommend to consider the granularity of the tests; sometimes a single test tests too many things; extracting smaller tests can usually help you get rid of the duplication. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Refactor sempai normalize tests to eliminate duplication: - Add parse_and_normalize helper for success-path tests - Add expect_diagnostic helper for error-path tests - Update all test functions to use new helpers Refactor normalization_tests to eliminate duplication: - Add assert_equivalent_formulas helper for fixture tests - Update four equivalence test functions to use helper Fix markdown lint violations: - Wrap long lines in execplan checklist items - Add blank lines before lists in execplan and design doc Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Code Duplicationcrates/sempai/src/normalize.rs: What lead to degradation?The module contains 8 functions with similar structure: tests.invalid_not_in_or_detected_legacy,tests.invalid_not_in_or_detected_v2,tests.missing_positive_term_detected_legacy,tests.missing_positive_term_detected_v2 and 4 more functions Why does this problem occur?Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health. How to fix it?A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More |
This comment was marked as resolved.
This comment was marked as resolved.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph. If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced. Code Duplicationcrates/sempai/src/normalize.rs: What lead to degradation?The module contains 8 functions with similar structure: tests.invalid_not_in_or_detected_legacy,tests.invalid_not_in_or_detected_v2,tests.missing_positive_term_detected_legacy,tests.missing_positive_term_detected_v2 and 4 more functions Why does this problem occur?Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health. How to fix it?A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More |
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa855ba909
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/sempai/src/engine.rs (1)
122-136:⚠️ Potential issue | 🟡 MinorDuplicate
# Errorssection in doc comment.The doc comment for
compile_yamlcontains two# Errorssections (lines 124-126 and 132-135). Merge them into a single section:/// Compiles a YAML rule file into query plans. /// /// # Errors /// - /// Returns a diagnostic report if parsing or validation fails. - /// /// Successfully parsed search-mode rules are normalized into canonical /// formulas and returned as query plans. Other modes (taint, join, extract) /// return unsupported mode diagnostics. - /// - /// # Errors /// /// Returns a diagnostic report if parsing fails, if unsupported modes are /// encountered, or if normalization produces semantic validation errors.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sempai/src/engine.rs` around lines 122 - 136, Duplicate "# Errors" sections exist in the doc comment for compile_yaml; merge them into one coherent Errors section that mentions all failure cases (parsing, unsupported modes, normalization/validation errors) and keep references to return types DiagnosticReport and QueryPlan; update the doc comment above pub fn compile_yaml(&self, yaml: &str) -> Result<Vec<QueryPlan>, DiagnosticReport> to remove the extra "# Errors" header and consolidate the content into a single paragraph under a single "# Errors" heading.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai-core/src/formula.rs`:
- Around line 11-14: Update the docstring for the Formula enum to use British
spelling by changing the word "normalized" to "normalised" in the comment that
begins "A canonical formula representing a normalized Semgrep query." so the
sentence reads "A canonical formula representing a normalised Semgrep query.";
locate the comment above the Formula enum in formula.rs and make the single-word
spelling edit.
- Around line 1-5: Update the module docstring in formula.rs to use British
spelling "normalised" instead of "normalized" (i.e., change "normalized Semgrep
queries" to "normalised Semgrep queries") so the top-of-file documentation
matches the requested spelling convention; edit the doc comment block that
begins with "Canonical formula model for normalized Semgrep queries." and update
the subsequent mention as needed.
In `@crates/sempai/src/normalize.rs`:
- Around line 1-516: The file is too long; split out validation and tests:
create a new module file (e.g., validate.rs) containing the three functions
validate_formula_semantics, validate_invalid_not_in_or, and
validate_positive_terms (move their implementations verbatim) and make them
pub(crate) so normalize.rs can call them; in normalize.rs remove those function
definitions and add `mod validate; use validate::{validate_formula_semantics,
validate_invalid_not_in_or, validate_positive_terms};`. Also move the entire
#[cfg(test)] tests module into a new tests file (e.g., tests/normalize_tests.rs
or src/normalize_tests.rs) keeping the test contents unchanged but adjust
imports to `use crate::normalize::*;` (or `use super::*;` if placed as a sibling
module) and preserve the same expect annotations; update module declarations as
needed so the crate builds.
- Around line 84-85: The docstring for normalize_search_principal uses American
spelling ("Normalizes"); update it to British spelling by changing the leading
comment text to "Normalises a search query principal (legacy or v2) into a
canonical Formula." so the function comment uses "Normalises" and otherwise
leave the function signature and behavior unchanged.
- Around line 179-180: Change the docstring for the function
normalize_v2_formula to use British spelling: replace "Normalizes" with
"Normalises" (i.e., update the comment line "Normalizes a v2 match formula into
canonical form." to "Normalises a v2 match formula into canonical form.") so the
documentation follows British spelling conventions.
- Around line 55-56: Docstring for the function normalize_rule uses American
spelling ("Normalizes a single rule."); change it to British spelling by
updating the docstring to "Normalises a single rule." so the documentation for
fn normalize_rule matches the project's British English convention.
- Line 1: The module docstring at the top of the file uses American spelling
"Normalization"; change it to British spelling "Normalisation" so the line reads
"Normalisation of parsed YAML rules into canonical `Formula` model." Update the
module-level comment string in crates/sempai/src/normalize.rs (the top-of-file
doc comment) to use "Normalisation" to comply with en-GB-oxendict guidelines.
- Around line 108-109: Change the docstring for normalize_legacy_formula to use
British spelling: replace "Normalizes a legacy formula into canonical form."
with "Normalises a legacy formula into canonical form." Update the comment
immediately above the fn normalize_legacy_formula(formula: &LegacyFormula) ->
Result<Formula, DiagnosticReport> to reflect this spelling change.
- Around line 312-313: The docstring for function normalize_languages uses
American spelling ("Normalizes"); change it to British spelling by updating the
docstring to "Normalises language strings from the rule into Language enums" so
the function header and surrounding comment use "Normalises" instead of
"Normalizes" (update the comment immediately above fn normalize_languages).
- Around line 20-26: Update the docstring in crates/sempai/src/normalize.rs to
use British (en-GB) spelling: change occurrences of "Normalizes" to "Normalises"
and "normalized" to "normalised" in the function/module docstring (the comment
that begins with "Normalizes a rule file into a vector of normalized search
rules."). Ensure both the title line and any later instances in that docstring
are updated to the en-GB variants while leaving the rest of the text and
formatting unchanged.
- Around line 220-224: The MatchFormula::Decorated arm in normalize_v2_formula
currently drops where_clauses, as_name, and fix; update it to either (A)
propagate decorations by returning a Decorated variant that wraps the normalized
inner formula and preserves where_clauses, as_name, and fix (i.e., construct
MatchFormula::Decorated { formula: normalize_v2_formula(inner), where_clauses:
original.where_clauses.clone(), as_name: original.as_name.clone(), fix:
original.fix.clone(), .. }), or (B) emit a not-implemented diagnostic instead of
silently discarding them (call the project's diagnostic helper like
emit_not_implemented("v2 decorations") or return an appropriate Err/Diagnostic
referencing where_clauses/as_name/fix) so callers are aware; change only the
MatchFormula::Decorated match arm in normalize_v2_formula and ensure you
reference the original fields (where_clauses, as_name, fix) when implementing
the fix.
- Around line 155-169: The Constraint arm in normalize_legacy_clause currently
returns an empty conjunction (Formula::And(vec![])) which later triggers
validate_positive_terms (ESempaiMissingPositiveTermInAnd) and invalidates rules
with legacy metavariable constraints; change the Constraint handling to return a
not-implemented diagnostic instead of an empty DecoratedFormula: in
normalize_legacy_clause match on LegacyClause::Constraint and return an
Err(DiagnosticReport::not_implemented(...)) (or the project's equivalent helper)
describing that constraint normalization / WhereClause conversion is
unimplemented, referencing DecoratedFormula and the
ESempaiMissingPositiveTermInAnd validation to guide the message.
- Around line 171-172: The docstring for the function normalize_legacy_value
uses American spelling ("Normalizes"); update it to British spelling
("Normalises") so the comment reads "Normalises a legacy value (string or
formula) into a formula." Ensure you only change the docstring text for the fn
normalize_legacy_value signature and keep the rest of the comment content
intact.
In `@crates/sempai/src/tests/behaviour.rs`:
- Around line 113-124: The step handler then_compilation_succeeds currently
ignores the {count} parameter because TestWorld::compile_result is mapped to ()
in when_compile_yaml; either persist the plan count (or full Vec<QueryPlan>) in
TestWorld::compile_result so you can assert equality with the incoming count, or
change the step signature and feature to "compilation succeeds" and remove the
count parameter from the step; to implement the first option, modify
when_compile_yaml to store Result<usize, DiagnosticReport> or
Result<Vec<QueryPlan>, DiagnosticReport> in TestWorld::compile_result and update
then_compilation_succeeds to unwrap the stored value and assert the stored count
equals the count parameter (referencing TestWorld::compile_result and
when_compile_yaml), otherwise remove the count from the step definition and its
parameter in then_compilation_succeeds and update feature files accordingly.
In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 23-32: The test currently maps a filesystem read error from
fs::read_to_string into DiagnosticReport::single_error using
sempai_core::DiagnosticCode::ESempaiYamlParse which is semantically wrong;
update the error handling in normalization_tests.rs (the fs::read_to_string call
and its Err branch that calls DiagnosticReport::single_error) to either panic!
with a clear message including filename and the error (preferred for test-only
infrastructure failures) or use an appropriate filesystem/fixture DiagnosticCode
if one exists, and remove the ESempaiYamlParse usage for file I/O failures.
In `@crates/sempai/tests/features/sempai_engine.feature`:
- Around line 22-25: Rename the Gherkin scenario title to match the actual
expectation (successful compilation producing one plan) by editing the Scenario
header in the feature (the line that currently reads "Engine compile_yaml keeps
the placeholder for dependency search rules") to a clearer title such as "Engine
compile_yaml compiles dependency search rule into one plan" so the scenario name
reflects that compilation succeeds with 1 plan.
In `@docs/execplans/4-1-5-normalization-into-canonical-formula-model.md`:
- Around line 49-51: The phrase "normalised formula model" uses the "-ised"
spelling; update it to the repository's en-GB-oxendict convention by changing
"normalised" to "normalized" wherever the phrase "normalised formula model" (and
similar occurrences of "normalised") appears in the document so the text reads
"normalized formula model".
- Around line 519-525: Remove the unnecessary trailing commas at the end of the
bulleted list items by editing the two list lines that currently end with
commas: the line containing "unit tests and `rstest-bdd` scenarios cover happy,
unhappy, and edge cases," and the line containing
"`docs/sempai-query-language-design.md`, `docs/users-guide.md`, and
`docs/roadmap.md` match the delivered behaviour," — simply delete the trailing
commas so the markdown bullets end without punctuation.
In `@docs/sempai-query-language-design.md`:
- Around line 443-455: The implementation note for sempai::Engine::compile_yaml
mixes American and British spelling; update the wording to use British English
consistently (e.g., change "normalize/normalization" variants only if you prefer
British forms like "normalise/normalisation" across the note, and convert
American spellings like "behavior/behavioral" to "behaviour/behavioural" where
present). Edit the explanatory text that mentions mode names, diagnostics
(E_SEMPAI_UNSUPPORTED_MODE, E_SEMPAI_*, InvalidNotInOr,
MissingPositiveTermInAnd) and dates so all spelling conforms to British
conventions, preserving identifiers (function names, enum/diagnostic codes, and
types such as Formula, sempai_core::formula) exactly as-is.
- Line 365: Update the prose occurrences of the word "Normalization" to British
spelling "Normalisation" in the docs/sempai-query-language-design.md file (leave
all code and function names in crates/sempai/src/normalize.rs unchanged); search
for the heading or sentence that reads "Normalization logic:
`crates/sempai/src/normalize.rs`" and any other prose uses of "Normalization"
and replace them with "Normalisation" only in narrative text, not in identifiers
or code blocks.
---
Outside diff comments:
In `@crates/sempai/src/engine.rs`:
- Around line 122-136: Duplicate "# Errors" sections exist in the doc comment
for compile_yaml; merge them into one coherent Errors section that mentions all
failure cases (parsing, unsupported modes, normalization/validation errors) and
keep references to return types DiagnosticReport and QueryPlan; update the doc
comment above pub fn compile_yaml(&self, yaml: &str) -> Result<Vec<QueryPlan>,
DiagnosticReport> to remove the extra "# Errors" header and consolidate the
content into a single paragraph under a single "# Errors" heading.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: eb3b7f33-6f28-49f1-8b37-2f7cd807236e
📒 Files selected for processing (27)
crates/sempai-core/src/formula.rscrates/sempai-core/src/lib.rscrates/sempai/src/engine.rscrates/sempai/src/lib.rscrates/sempai/src/normalize.rscrates/sempai/src/tests/behaviour.rscrates/sempai/src/tests/engine_tests.rscrates/sempai/src/tests/mod.rscrates/sempai/src/tests/normalization_tests.rscrates/sempai/tests/features/sempai_engine.featurecrates/sempai/tests/fixtures/normalization/conjunction_legacy.yamlcrates/sempai/tests/fixtures/normalization/conjunction_v2.yamlcrates/sempai/tests/fixtures/normalization/disjunction_legacy.yamlcrates/sempai/tests/fixtures/normalization/disjunction_v2.yamlcrates/sempai/tests/fixtures/normalization/invalid_no_positive_term_legacy.yamlcrates/sempai/tests/fixtures/normalization/invalid_no_positive_term_v2.yamlcrates/sempai/tests/fixtures/normalization/invalid_not_in_or_legacy.yamlcrates/sempai/tests/fixtures/normalization/invalid_not_in_or_v2.yamlcrates/sempai/tests/fixtures/normalization/nested_context_legacy.yamlcrates/sempai/tests/fixtures/normalization/nested_context_v2.yamlcrates/sempai/tests/fixtures/normalization/simple_pattern_legacy.yamlcrates/sempai/tests/fixtures/normalization/simple_pattern_v2.yamlcrates/sempai/tests/fixtures/normalization/valid_metavariable_pattern_exception.yamldocs/execplans/4-1-5-normalization-into-canonical-formula-model.mddocs/roadmap.mddocs/sempai-query-language-design.mddocs/users-guide.md
- Fix diagnostic message for InvalidNotInOr (remove 'directly' to match behavior)
- Fix BDD step 'compilation succeeds with {count} plan' to actually check count
- Add engine-level tests for semantic validation errors (InvalidNotInOr,
MissingPositiveTermInAnd)
- Fix spelling: 'normalised' -> 'normalized' in execplan
- Replace .unwrap() with .expect() in normalize tests
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
docs/complexity-antipatterns-and-refactoring-strategies.md (1)
579-598:⚠️ Potential issue | 🟠 MajorRealign footnote references with the claims in these sections.
Remap the inline citations so each claim points to a source that actually supports it.
The current mapping is cross-wired: for example, structural pattern-matching
claims cite [^19]-[^21], but those entries currently reference CQRS, God-class,
and Ravioli discussions rather than pattern matching.Triage:
[type:docstyle][type:syntax/md]As per coding guidelines, "
docs/**/*.md: Follow the Documentation style guide for writing, formatting, grammar, and structure standards" and "**/*.md: Use GitHub-flavoured Markdown style for footnotes and endnotes."Also applies to: 655-667, 856-878
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/complexity-antipatterns-and-refactoring-strategies.md` around lines 579 - 598, Update the misplaced footnote references for the "Structural pattern matching" paragraph by remapping the inline citations [^19], [^20], and [^21] to sources that actually support pattern matching and destructuring (replace the current CQRS/God-class/Ravioli citations); ensure each specific claim about match-case, destructuring benefits, and example comparisons points to the correct corresponding footnote. Also scan the other referenced ranges (lines ~655-667 and ~856-878) and realign any cross-wired footnotes the same way so that every assertion cites an appropriate source, using the same footnote identifiers ([^19], [^20], [^21], etc.) consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai/src/where_clause.rs`:
- Around line 182-240: The parse_metavariable_clause dispatcher currently
ignores unrecognized keys; update it to validate fields per chosen branch by
computing the allowed key set for each constraint and returning
ESempaiSchemaInvalid when extra keys are present. After determining metavariable
and which branch will be taken (pattern keys vs "regex" vs type/types vs
analyzer), build the corresponding allowed_keys (include metavariable identifier
key if present, METAVARIABLE_PATTERN_KEYS for pattern branch, "regex" for regex
branch, "type"/"types" for type branch, "analyzer" for analyzer branch), then
compare mapping.keys() against allowed_keys and return
DiagnosticReport::single_error listing unexpected fields if any; only call
parse_metavariable_pattern_clause, parse_metavariable_regex_clause, or the
not_implemented errors when the key validation passes.
- Around line 147-160: The parse_metavariable_pattern_clause function currently
returns a normalized formula without re-running semantic validation; update
parse_metavariable_pattern_clause to validate the decorated/normalized formula
(use validate_formula_semantics or the same validator used for top-level rules)
on the DecoratedFormula produced by normalize_v2_formula (or on its inner
MatchFormula) and return an Err(DiagnosticReport) when validation fails before
constructing WhereClause::MetavariablePattern; also add a unit/behavioral test
that parses a where.metavariable containing an any with a not and another that
is an all-negative conjunction to assert the parser rejects
ESempaiInvalidNotInOr and ESempaiMissingPositiveTermInAnd cases.
In `@docs/complexity-antipatterns-and-refactoring-strategies.md`:
- Around line 841-843: The footnote [^13] contains malformed punctuation "How to
*resist* refactoring a large spaghetti codebase? :" — fix it by removing the
extra separator so the title reads cleanly (e.g., change "How to *resist*
refactoring a large spaghetti codebase? :" to "How to *resist* refactoring a
large spaghetti codebase?" in the footnote text), updating the footnote entry
that contains that exact string.
---
Duplicate comments:
In `@docs/complexity-antipatterns-and-refactoring-strategies.md`:
- Around line 579-598: Update the misplaced footnote references for the
"Structural pattern matching" paragraph by remapping the inline citations [^19],
[^20], and [^21] to sources that actually support pattern matching and
destructuring (replace the current CQRS/God-class/Ravioli citations); ensure
each specific claim about match-case, destructuring benefits, and example
comparisons points to the correct corresponding footnote. Also scan the other
referenced ranges (lines ~655-667 and ~856-878) and realign any cross-wired
footnotes the same way so that every assertion cites an appropriate source,
using the same footnote identifiers ([^19], [^20], [^21], etc.) consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bfa4ba8f-16c7-41af-a1d1-1f0bb59ce767
📒 Files selected for processing (4)
crates/sempai/src/normalize.rscrates/sempai/src/tests/normalization_tests.rscrates/sempai/src/where_clause.rsdocs/complexity-antipatterns-and-refactoring-strategies.md
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph. If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced. crates/sempai/src/tests/normalization_tests.rs Comment on file fn v2_where_focus_is_parsed_correctly() {
// Verify that the normalized rule has an Atom formula
let result = normalize_fixture("v2_where_focus.yaml");
let rules = result.expect("should parse v2 with focus where clause");
assert_eq!(rules.len(), 1);
let first_rule = rules.first().expect("should have at least one rule");
match &first_rule.formula {
Formula::Atom(_) => {}
_ => panic!("expected Atom formula"),
}
// Verify that the where clause is correctly parsed by obtaining the DecoratedFormula
let decorated = parse_v2_fixture_to_decorated("v2_where_focus.yaml");
assert_eq!(
decorated.where_clauses.len(),
1,
"expected exactly one where clause"
);
let first_where = decorated
.where_clauses
.first()
.expect("expected at least one where clause");
match first_where {
WhereClause::Focus { metavariable } => {
assert_eq!(metavariable, "FUNC", "expected focus on metavariable $FUNC");
}
_ => panic!("expected Focus where clause, got {first_where:?}"),
}
}❌ New issue: Code Duplication |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/sempai/src/where_clause.rs Comment on file pub(crate) fn parse_metavariable_clause(
value: &serde_json::Value,
normalize_v2_formula: &mut dyn FnMut(
&MatchFormula,
) -> Result<DecoratedFormula, DiagnosticReport>,
) -> Result<WhereClause, DiagnosticReport> {
let mapping = value.as_object().ok_or_else(|| {
DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
"metavariable clause must be an object".to_owned(),
None,
vec![],
)
})?;
let metavariable = extract_metavariable_name(mapping)?;
// Detect mixed metavariable constraints (e.g., pattern + type, regex + analyzer)
let has_pattern_or_regex = METAVARIABLE_PATTERN_KEYS
.iter()
.any(|&k| mapping.contains_key(k))
|| mapping.contains_key("regex");
let has_type_constraint = mapping.contains_key("type") || mapping.contains_key("types");
let has_analyzer = mapping.contains_key("analyzer");
if has_pattern_or_regex && (has_type_constraint || has_analyzer) {
return Err(DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
"metavariable clause cannot mix pattern/regex with type/types or analyzer".to_owned(),
None,
vec!["use either pattern/regex OR type/types/analyzer, not both".to_owned()],
));
}
if METAVARIABLE_PATTERN_KEYS
.iter()
.any(|&k| mapping.contains_key(k))
{
parse_metavariable_pattern_clause(metavariable, mapping, normalize_v2_formula)
} else if let Some(regex_value) = mapping.get("regex") {
parse_metavariable_regex_clause(metavariable, regex_value)
} else if has_type_constraint {
Err(DiagnosticReport::not_implemented(
"metavariable type constraint (type/types)",
))
} else if has_analyzer {
Err(DiagnosticReport::not_implemented(
"metavariable analysis constraint (analyzer)",
))
} else {
Err(DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
"metavariable clause must have pattern, regex, type, types, or analyzer".to_owned(),
None,
vec![],
))
}
}❌ New issue: Complex Method |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
…ate tests normalize.rs: - Extract MetavariableConstraintKind enum for constraint classification - Extract classify_metavariable_constraint helper function with mixed constraint validation (pattern/regex + type/analyzer rejection) - Extract extract_metavariable_name helper function - Extract parse_metavariable_pattern_clause helper function - Extract parse_metavariable_regex_clause helper function - Reduce cyclomatic complexity of parse_metavariable_clause from 11 to 5 - Make normalize_v2_formula pub(crate) for test access normalization_tests.rs: - Add parse_v2_fixture_to_decorated helper to access DecoratedFormula - Add get_single_where_clause helper to eliminate test duplication - Simplify v2_where_focus_is_parsed_correctly using helper - Simplify v2_where_metavariable_regex_is_parsed_correctly using helper - Simplify v2_where_metavariable_pattern_is_parsed_correctly using helper Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 39-43: In parse_v2_fixture_to_decorated, remove the manual
fs::read_to_string and unwrap_or_else logic and call read_fixture(filename) to
obtain the YAML string, retaining the existing fixtures_dir().join(filename)
only to build uri via path.to_str().map(String::from); then pass the YAML and
uri.as_deref() into parse_rule_file as before (functions referenced:
parse_v2_fixture_to_decorated, read_fixture, fixtures_dir, parse_rule_file).
- Around line 57-64: Update the spelling in the get_single_where_clause helper:
change the doc comment words "Normalises" and "normalise" to "Normalizes" and
"normalize" respectively, and update the panic message string from "failed to
normalise" to "failed to normalize" (the panic is produced in the call to
normalize_fixture and the function returns a WhereClause); ensure only these
textual occurrences around get_single_where_clause, WhereClause, and
normalize_fixture are updated to match Oxford “-ize” spelling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e760db7f-57a9-4ea8-add7-510e17420be3
📒 Files selected for processing (1)
crates/sempai/src/tests/normalization_tests.rs
Documentation fixes: - Fix malformed punctuation in footnote [^13] (remove extra ":") - Realign misplaced footnotes [^19]-[^21] in pattern matching section - Fix footnotes [^20], [^21], [^23], [^24] in declarative programming section - Fix footnotes [^19], [^21], [^23], [^24] in Ravioli code section - Fix footnote [^17]→[^20] in God class section - Fix footnote numbering in weaver-design.md (34 unique sequential footnotes) Code fixes: - parse_metavariable_clause: add field validation with allowed_keys per branch - parse_metavariable_pattern_clause: add semantic validation via validate_formula_semantics - parse_v2_fixture_to_decorated: use read_fixture() helper - get_single_where_clause: use Oxford "-ize" spelling Test additions: - v2_where_metavariable_invalid_not_in_or_fails_with_semantic_error - v2_where_metavariable_invalid_all_negative_fails_with_semantic_error - Add test fixtures for invalid metavariable patterns Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix cargo fmt issues in normalization_tests.rs and where_clause.rs - Fix markdownlint issues in weaver-design.md (MD012, MD047) - Add #[expect(clippy::too_many_lines)] to parse_metavariable_clause Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/sempai/src/where_clause.rs Comment on file let has_type = mapping.contains_key("type") || mapping.contains_key("types");
let has_analyzer = mapping.contains_key("analyzer");
if (has_pattern || has_regex) && (has_type || has_analyzer) {❌ New issue: Complex Conditional |
This comment was marked as resolved.
This comment was marked as resolved.
Lift sub-expressions in the complex conditional: - has_match_constraint = has_pattern || has_regex - has_type_or_analyzer = has_type || has_analyzer Makes the mixed constraint check more readable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
docs/weaver-design.md (1)
61-2594:⚠️ Potential issue | 🟠 MajorFix the footnote mapping to eliminate reuse of labels for different topics.
The footnote labels are still reused for different topics across sections, producing mismatched definitions. For example,
[^3]is cited at line 112 for Serena MCP clients and again at line 690 for Tree-sitter code slicing, but line 2564 defines[^3]as "Serena Model Context Protocol (MCP) Server as a stateful intermediary for LLM integration." This violates the GitHub-flavoured Markdown footnote contract, which allows only one definition per label.Scan the entire document body from top to bottom and collect every in-text citation
[^N]in the order it first appears, along with its surrounding prose context. Assign a fresh sequential footnote number to each unique source or topic, strictly in order of first appearance. Two citations pointing to the same source may share a number; citations pointing to different sources must receive different numbers even if they previously shared one. Replace every in-text[^N]marker throughout the document with the newly assigned number for that source. Rewrite the footnote definitions block at lines 2562–2594 so that each[^N]:definition uses the newly assigned number, appears in ascending numeric order, and contains a source-accurate description that matches what the citing sentence actually references. Verify that no label is defined more than once and that every label used in the body has exactly one corresponding definition.Triage:
[type:syntax/md][type:docstyle]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `docs/weaver-design.md` across the entire document (lines 61–2594), the footnote labels `[^1]` through `[^34]` are reused for different topics across sections, producing mismatched definitions. Fix this as follows: 1. Scan the entire document body from top to bottom and collect every in-text citation `[^N]` in the order it first appears, along with its surrounding prose context (the sentence or clause it annotates). 2. Assign a fresh sequential footnote number to each unique source/topic, strictly in order of first appearance. Two citations pointing to the same source may share a number; citations pointing to different sources must receive different numbers even if they previously shared one. 3. Replace every in-text `[^N]` marker throughout the document with the newly assigned number for that source. 4. Rewrite the footnote definitions block at lines 2562–2594 so that each `[^N]:` definition: - uses the newly assigned number, - appears in ascending numeric order, - contains a source-accurate description that matches what the citing sentence actually references (e.g., if the citation is for Tree-sitter enabling code slicing, the definition should describe that Tree-sitter capability, not Serena MCP or LSP). 5. Verify that no label is defined more than once, and that every label used in the body has exactly one corresponding definition. 6. Do not change any prose other than the footnote markers themselves and the definitions block.As per coding guidelines, "Use GitHub-flavoured numeric footnotes referenced as
[^1]in Markdown documentation", "Number footnotes in order of appearance", and keep documentation accurate and current.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/weaver-design.md` around lines 61 - 2594, The document reuses numeric footnote labels (the in-text markers `[^N]`) for different sources, causing mismatched definitions; scan the entire Markdown body sequentially, collect each in-text citation `[^N]` with its sentence context, then assign a fresh sequential numeric footnote to every unique source/topic in order of first appearance (allowing shared numbers only when citations truly reference the same source); replace every in-text `[^N]` marker with the newly assigned numbers throughout the file and rewrite the footnote definitions block so each `[^N]:` appears exactly once, in ascending order, and contains the source-accurate description matching the citing sentence (ensure no duplicate labels and that every used label has one definition).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 179-191: The test was ignored because validate_positive_terms
doesn't yet implement the metavariable-pattern exception; update
validate_positive_terms to detect when validation is running inside a
metavariable-pattern context and allow conjunctions (patterns/all) with zero
positive terms in that case while still emitting MissingPositiveTermInAnd for
all other contexts; then unignore the test (remove #[ignore]) in
normalization_tests.rs so
normalize_fixture("valid_metavariable_pattern_exception.yaml") returns Ok as
expected. Ensure the detection uses the same context flag/type used by
parsing/normalization so other validators can rely on the same
metavariable-pattern scope.
In `@crates/sempai/src/where_clause.rs`:
- Around line 1-421: The module exceeds the 400-line limit; split out the large
metavariable and match-formula parsing into a new submodule. Create a new file
(e.g. where_clause_parts.rs) and move parse_metavariable_clause,
build_match_formula_from_mapping, parse_match_formula_array,
parse_nested_match_formula, parse_metavariable_pattern_clause,
parse_metavariable_regex_clause, extract_metavariable_name, and
METAVARIABLE_PATTERN_KEYS (and any tightly-coupled helpers like
strip_dollar_prefix if needed) into it, keeping their signatures intact; then
add a mod declaration and public or crate-level re-exports (use) so the original
where_clause.rs references the moved functions unchanged (update imports from
crate::where_clause_parts::{...} if necessary) and run tests to ensure no symbol
changes.
- Around line 296-384: The parser build_match_formula_from_mapping currently
ignores decoration keys (where/as/fix) on nested object formulas and drops them;
update it to detect decoration metadata (e.g., "where", "as", "fix", or whatever
keys normalize_v2_formula uses), parse the inner operator as before (using
parse_nested_match_formula / parse_match_formula_array), and if any decoration
keys are present wrap the resulting MatchFormula in MatchFormula::Decorated
(preserving the decoration data) instead of discarding them; ensure the same
logic is applied for the branches that call parse_nested_match_formula and
parse_match_formula_array (including the "inside"/"anywhere"/"not"/"all"/"any"
cases) so nested decorated formulas are preserved and passed through to
normalize_v2_formula.
- Around line 65-66: The code currently uses .expect() on mapping.iter().next()
in where_clause.rs which will panic when mapping is empty; replace that panic
with safe pattern matching and error propagation: change the line using
#[expect(...)] let (key, inner_value) = mapping.iter().next().expect(...) to a
let Some(((key, inner_value))) = mapping.iter().next() else { return Err(/*
construct a suitable error indicating empty mapping in the context of the
surrounding function */) }; or an equivalent match that returns a Result::Err
when mapping.len() == 0; ensure you use the surrounding function's error type
when returning the error and keep the existing logic for the mapping.len() > 1
validation.
---
Duplicate comments:
In `@docs/weaver-design.md`:
- Around line 61-2594: The document reuses numeric footnote labels (the in-text
markers `[^N]`) for different sources, causing mismatched definitions; scan the
entire Markdown body sequentially, collect each in-text citation `[^N]` with its
sentence context, then assign a fresh sequential numeric footnote to every
unique source/topic in order of first appearance (allowing shared numbers only
when citations truly reference the same source); replace every in-text `[^N]`
marker with the newly assigned numbers throughout the file and rewrite the
footnote definitions block so each `[^N]:` appears exactly once, in ascending
order, and contains the source-accurate description matching the citing sentence
(ensure no duplicate labels and that every used label has one definition).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 87409b3f-2b72-442a-bcb6-d9ece98f5b70
📒 Files selected for processing (6)
crates/sempai/src/tests/normalization_tests.rscrates/sempai/src/where_clause.rscrates/sempai/tests/fixtures/normalization/v2_where_metavariable_invalid_all_negative.yamlcrates/sempai/tests/fixtures/normalization/v2_where_metavariable_invalid_not_in_or.yamldocs/complexity-antipatterns-and-refactoring-strategies.mddocs/weaver-design.md
| #[expect(clippy::expect_used, reason = "length checked above")] | ||
| let (key, inner_value) = mapping.iter().next().expect("checked len == 1"); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
head -n 80 crates/sempai/src/where_clause.rs | tail -n 30Repository: leynos/weaver
Length of output: 1109
🏁 Script executed:
rg -n '\.expect\(' crates/sempai/src/where_clause.rs -A 2Repository: leynos/weaver
Length of output: 170
🏁 Script executed:
rg -n '#\[expect\(clippy::expect_used' crates/sempai/src/where_clause.rs -B 2 -A 2Repository: leynos/weaver
Length of output: 237
🏁 Script executed:
wc -l crates/sempai/src/where_clause.rsRepository: leynos/weaver
Length of output: 94
Remove .expect() from production code and handle the empty mapping case.
Replace this panic path with a let Some(..) else block or exhaustive match. Note that the current validation only checks mapping.len() > 1; it does not validate the len() == 0 case, which would panic. Ensure both cases are properly handled with error propagation:
Current code (lines 65–66)
#[expect(clippy::expect_used, reason = "length checked above")]
let (key, inner_value) = mapping.iter().next().expect("checked len == 1");Per coding guidelines, .expect() and .unwrap() are forbidden outside tests. Errors must be propagated via Result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/where_clause.rs` around lines 65 - 66, The code currently
uses .expect() on mapping.iter().next() in where_clause.rs which will panic when
mapping is empty; replace that panic with safe pattern matching and error
propagation: change the line using #[expect(...)] let (key, inner_value) =
mapping.iter().next().expect(...) to a let Some(((key, inner_value))) =
mapping.iter().next() else { return Err(/* construct a suitable error indicating
empty mapping in the context of the surrounding function */) }; or an equivalent
match that returns a Result::Err when mapping.len() == 0; ensure you use the
surrounding function's error type when returning the error and keep the existing
logic for the mapping.len() > 1 validation.
There was a problem hiding this comment.
@coderabbitai Has this now been resolved in the latest commit?
Use codegraph analysis to determine your answer.
If this comment is now resolved, please mark it as such using the API. Otherwise, please provide an AI agent prompt for the remaining work to be done to address this comment.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check current state of the relevant lines in where_clause.rs
sed -n '55,80p' crates/sempai/src/where_clause.rs
Length of output: 76
🏁 Script executed:
#!/bin/bash
# Find the file first
fd where_clause.rs
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check which branch/commit we're on and repo structure
git log --oneline -5
git status --short | head -20
tree crates/sempai/src/ 2>/dev/null || ls -la crates/sempai/src/ 2>/dev/null || echo "path not found"
Length of output: 651
🏁 Script executed:
#!/bin/bash
# Check for expect usage in the where_clause module
echo "=== where_clause/mod.rs ==="
cat crates/sempai/src/where_clause/mod.rs
echo ""
echo "=== where_clause/metavariable.rs ==="
cat crates/sempai/src/where_clause/metavariable.rs
Length of output: 17136
The fix is in place. Now I'll resolve the review comment via the API.
🏁 Script executed:
#!/bin/bash
# Find and resolve the review comment
gh api graphql -f query='
{
repository(owner: "leynos", name: "weaver") {
pullRequest(number: 99) {
reviewThreads(first: 50) {
nodes {
id
isResolved
comments(first: 1) {
nodes {
body
author { login }
}
}
}
}
}
}
}' | python3 -c "
import json, sys
data = json.load(sys.stdin)
threads = data['data']['repository']['pullRequest']['reviewThreads']['nodes']
for t in threads:
comment = t['comments']['nodes'][0] if t['comments']['nodes'] else {}
author = comment.get('author', {}).get('login', '')
body = comment.get('body', '')
if author == 'coderabbitai' and 'expect_used' in body and not t['isResolved']:
print(f\"Thread ID: {t['id']}, resolved: {t['isResolved']}\")
"
Length of output: 0
🏁 Script executed:
#!/bin/bash
# List all review threads - check resolved status broadly
gh api graphql -f query='
{
repository(owner: "leynos", name: "weaver") {
pullRequest(number: 99) {
reviewThreads(first: 50) {
nodes {
id
isResolved
comments(first: 1) {
nodes {
databaseId
body
author { login }
}
}
}
}
}
}
}' | python3 -c "
import json, sys
data = json.load(sys.stdin)
threads = data['data']['repository']['pullRequest']['reviewThreads']['nodes']
for t in threads:
comment = t['comments']['nodes'][0] if t['comments']['nodes'] else {}
author = comment.get('author', {}).get('login', '')
body = comment.get('body', '')[:80].replace('\n', ' ')
print(f\"ID={t['id']} resolved={t['isResolved']} author={author} body={body!r}\")
"
Length of output: 7416
@leynos, yes — this is fully resolved in the latest commits. Here is what changed:
Code fix confirmed: The #[expect(clippy::expect_used, reason = "length checked above")] lint suppression and the .expect("checked len == 1") call have been removed entirely. They are replaced with a safe let … else pattern that properly propagates an error:
let Some((key, inner_value)) = mapping.iter().next() else {
return Err(DiagnosticReport::single_error(
DiagnosticCode::ESempaiSchemaInvalid,
"where clause mapping is unexpectedly empty".to_owned(),
None,
vec![],
));
};Module decomposition: where_clause.rs has additionally been decomposed into a where_clause/ module (mod.rs + metavariable.rs), consistent with the project's file-size guidelines. No .expect() calls remain in any production code path in either file — only safe .ok_or_else() chains and one strip_dollar_prefix that uses .unwrap_or() (not .expect()/.unwrap()).
The review thread is already marked as resolved on GitHub.
🧠 Learnings used
Learnt from: leynos
Repo: leynos/weaver PR: 41
File: crates/weaver-e2e/src/pyrefly.rs:42-45
Timestamp: 2025-12-27T04:00:30.560Z
Learning: Guideline: In a nightly Rust toolchain (e.g., nightly-2025-09-16) where const_mut_refs is available and missing_const_for_fn is set to deny, mark functions as const whenever they can be const, including methods with &mut self. This improves const-evaluation capabilities and consistency across the codebase. Apply this across Rust source files (all .rs) in the repository, not just a single file, and especially for methods that do not mutate internal state or rely on non-const operations. If a method cannot be const (due to mutation, non-const calls, or unsupported features), keep it non-const and document why.
Learnt from: leynos
Repo: leynos/weaver PR: 48
File: crates/weaver-cli/src/lifecycle/utils.rs:60-75
Timestamp: 2026-01-08T20:18:00.597Z
Learning: In Rust, when re-exporting an item (pub use … or pub(crate) use …), ensure the original item's visibility is at least as broad as the re-export. Do not widen visibility (e.g., a pub(super) item cannot be re-exported as pub(crate)). Review re-exports to keep visibility levels consistent with the original declaration across Rust files.
Learnt from: leynos
Repo: leynos/weaver PR: 55
File: crates/weaver-lsp-host/src/adapter/error.rs:69-78
Timestamp: 2026-01-22T11:05:27.489Z
Learning: In Rust, const fn can only use const-compatible types in stable Rust (primitives and simple aggregates without destructors). Types with heap allocations or destructors (e.g., String, Vec, Box) cannot be used in const contexts. Do not mark a function as const if it constructs or moves fields of non-const-compatible types. Apply this guideline to Rust source files across the repository.
Learnt from: leynos
Repo: leynos/weaver PR: 57
File: crates/weaverd/src/safety_harness/transaction.rs:66-95
Timestamp: 2026-02-07T01:56:58.071Z
Learning: In Rust, PathBuf::as_path() is not a const fn (as of Rust 1.91+). Do not write const fn methods that call as_path(); instead design APIs to take &Path or &OsStr, or use non-const code paths for conversions. If a const context is required, avoid as_path() and rely on const-friendly alternatives (e.g., pass &Path, use PathBuf methods that are const, or accept string literals/byte slices that don’t require as_path()).
Learnt from: leynos
Repo: leynos/weaver PR: 58
File: crates/weaver-plugins/src/runner/mod.rs:101-118
Timestamp: 2026-02-10T16:30:41.357Z
Learning: When reviewing Rust code, be aware that as of Rust 1.83+ (including 1.85 and Edition 2024), const fn may return mutable references (for example, const fn foo(&mut self) -> &mut T) and can use mutable references in const contexts. Do not rely on mutable references becoming part of a constant’s final value; ensure such patterns are still logically correct for const contexts. Use this guidance broadly for all Rust files in the repository to avoid incorrect assumptions about const evaluation behavior.
Learnt from: leynos
Repo: leynos/weaver PR: 59
File: crates/weaver-e2e/tests/refactor_rope_cli_snapshots.rs:203-211
Timestamp: 2026-02-16T21:57:26.358Z
Learning: In a workspace configured with let_underscore_must_use = "deny", discarding a Result via let _ = is a compile-time error. When you intentionally ignore non-critical errors (for example, diagnostic writes to stderr in tests), prefer consuming the Result with .ok() rather than assigning to _ or ignoring it entirely. This guideline applies to Rust code across the repository (all .rs files), especially in test code paths where ignoring errors could mask issues.
Learnt from: leynos
Repo: leynos/weaver PR: 84
File: crates/weaverd/src/dispatch/observe/enrich.rs:0-0
Timestamp: 2026-03-23T09:44:08.026Z
Learning: When matching on an enum imported from another crate that is annotated with `#[non_exhaustive]`, Rust will require a catch-all arm (e.g., `_ => ...`) to satisfy exhaustiveness. In review, do not flag that wildcard arm as unnecessary or suggest its removal when the match is over a `#[non_exhaustive]` external enum (since the behavior is mandated by Rust’s exhaustiveness rules).
Learnt from: leynos
Repo: leynos/weaver PR: 99
File: crates/sempai/src/normalize.rs:20-26
Timestamp: 2026-04-07T23:49:14.199Z
Learning: In this repository (leynos/weaver), follow en-GB-oxendict spelling rules: prefer Oxford-dict “-ize/-yse/-our” conventions. Therefore “-ize” forms such as “normalize/normalizes/normalized” and “recognize/organize” are considered correct and must not be flagged as spelling errors or suggested to change to “-ise”. Only treat “-ise” spellings as incorrect when the corresponding Oxford “-ize” form applies. Do not recommend changing “-ize” to “-ise” in Rust source comments, docs, or documentation files.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Complex Methodcrates/sempai/src/where_clause.rs: parse_metavariable_clause What lead to degradation?parse_metavariable_clause has a cyclomatic complexity of 17, threshold = 9 Why does this problem occur?A Complex Method has a high cyclomatic complexity. The recommended threshold for the Rust language is a cyclomatic complexity lower than 9. How to fix it?There are many reasons for Complex Method. Sometimes, another design approach is beneficial such as a) modeling state using an explicit state machine rather than conditionals, or b) using table lookup rather than long chains of logic. In other scenarios, the function can be split using EXTRACT FUNCTION. Just make sure you extract natural and cohesive functions. Complex Methods can also be addressed by identifying complex conditional expressions and then using the DECOMPOSE CONDITIONAL refactoring. Helpful refactoring examplesTo get a general understanding of what this code health issue looks like - and how it might be addressed - we have prepared some diffs for illustrative purposes. SAMPLE# complex_method.js
function postItem(item) {
if (!item.id) {
- if (item.x != null && item.y != null) {
- post(item);
- } else {
- throw Error("Item must have x and y");
- }
+ // extract a separate function for creating new item
+ postNew(item);
} else {
- if (item.x < 10 && item.y > 25) {
- put(item);
- } else {
- throw Error("Item must have an x and y value between 10 and 25");
- }
+ // and one for updating existing items
+ updateItem(item);
}
}
+
+function postNew(item) {
+ validateNew(item);
+ post(item);
+}
+
+function updateItem(item) {
+ validateUpdate(item);
+ put(item);
+}
+ |
This comment was marked as resolved.
This comment was marked as resolved.
- Add ValidationContext enum to distinguish Default vs MetavariablePattern contexts - Update validate_formula_semantics and validate_positive_terms to accept context - Use MetavariablePattern context in parse_metavariable_pattern_clause to allow conjunctions without positive terms - Update valid_metavariable_pattern_exception.yaml to use v2 syntax - Remove v2_where_metavariable_invalid_all_negative fixture and test (all-negative conjunctions ARE valid in metavariable-pattern context per spec) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Split where_clause.rs into where_clause/mod.rs and where_clause/metavariable.rs to satisfy clippy::self-named-module-files lint - Add field validation to parse_metavariable_clause with proper error handling - Add semantic validation in parse_metavariable_pattern_clause using validate_formula_semantics - Update build_match_formula_from_mapping to handle decoration keys where/as/fix - Fix weaver-design.md footnote numbering: assign unique sequential numbers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract validation/classification logic into two private helpers and a private enum to reduce cyclomatic complexity from 17 to <9: - Add MetavariableConstraintKind enum for constraint classification - Add check_no_unexpected_fields helper for field validation - Add classify_and_validate_metavariable_constraint helper - Simplify parse_metavariable_clause body to use match on kind - Remove #[expect(clippy::too_many_lines)] attribute Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
docs/weaver-design.md (1)
2562-2596:⚠️ Potential issue | 🟠 MajorRestore source-accurate citation mapping for numeric footnotes.
Rebuild the footnote map so each
[^n]definition matches the source used at its first in-text occurrence. The current mapping is incorrect (for example, Line 1207 cites LSP standard as[^12], but[^12]is defined as call-graph fusion; Line 1407 citesjedias[^23], but[^23]is defined ascallgrind).Triage:
[type:syntax/md][type:docstyle]As per coding guidelines, “Use GitHub-flavoured numeric footnotes referenced as
[^1]in documentation”, “Number footnotes in order of appearance in documentation”, and keep documentation accurate and current.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/weaver-design.md` around lines 2562 - 2596, The numeric footnote definitions (e.g., markers like `[^1]`, `[^12]`, `[^23]`) are out of order and do not match their first in-text occurrences (e.g., the LSP reference and jedi citation); rebuild the footnote map so each `[^n]` definition is reordered to correspond to the source first cited in the document, renumber all definitions accordingly, update any inline usages to the new numbers, and verify examples like "LSP"/"Language Server Protocol", "jedi", "call-graph fusion"/"Call Graph Fusion Engine" now point to the correct `[^n]` entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai/src/normalize.rs`:
- Around line 141-147: The helper normalize_legacy_clause still returns
Result<Option<DecoratedFormula>, DiagnosticReport> but no longer produces
Ok(None); change its signature to Result<DecoratedFormula, DiagnosticReport> and
remove the Option layer, then simplify callers (e.g. the LegacyFormula::Patterns
arm building Formula::And) to collect directly: replace the
clauses.iter().map(normalize_legacy_clause).filter_map(Result::transpose).collect()
dance with clauses.iter().map(normalize_legacy_clause).collect::<Result<Vec<_>,
_>>() (or equivalent) and pass children? into Formula::And; apply the same
signature and caller simplification to the other occurrence mentioned (around
the 196-213 region).
In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 140-176: Collapse the four duplicated tests into one parameterized
rstest that iterates over (fixture, expected_code); replace the functions
invalid_not_in_or_legacy_fails_with_semantic_error,
invalid_not_in_or_v2_fails_with_semantic_error,
invalid_no_positive_term_legacy_fails_with_semantic_error, and
invalid_no_positive_term_v2_fails_with_semantic_error with a single #[rstest]
test that for each tuple calls normalize_fixture(fixture).expect_err(...), gets
the first diagnostic, and asserts first.code() == expected_code using the
DiagnosticCode variants ESempaiInvalidNotInOr and
ESempaiMissingPositiveTermInAnd as the expected values. Ensure you import rstest
and use the same diagnostic extraction logic
(err.diagnostics().first().expect(...)) in the body.
In `@crates/sempai/src/validate.rs`:
- Around line 5-22: Add Rustdoc examples to the public validation API by
documenting ValidationContext, validate_formula_semantics,
validate_invalid_not_in_or, and validate_positive_terms with short runnable
examples that show one valid case and one failing diagnostic for each item;
place the examples in the triple-slash docs above each public item (use Rust
doctest blocks — for failing cases use compile_fail or assert matches on the
returned diagnostic/Error so examples remain runnable) and demonstrate calling
the actual functions (e.g., validate_formula_semantics(...),
validate_invalid_not_in_or(...), validate_positive_terms(...)) and inspecting
their success (Ok/empty diagnostics) and failure (Err/diagnostic contents) so
cargo test/doc will exercise them.
In `@crates/sempai/src/where_clause/metavariable.rs`:
- Around line 225-335: The function build_match_formula_from_mapping is doing
too many things and suppresses the clippy lint; split it into small helpers:
implement find_single_operator(mapping) -> Result<&str, DiagnosticReport> to
detect/validate exactly one operator (use the existing operator_keys list),
implement build_core_formula_from_operator(key, value) -> Result<MatchFormula,
DiagnosticReport> which contains the match arms for "pattern", "regex", "all",
"any", "not", "inside", "anywhere" and uses parse_match_formula_array /
parse_nested_match_formula as before, and implement
wrap_with_decorations(core_formula, mapping) -> MatchFormula to apply the
existing extract_decorations result (where_clauses, as_name, fix) when needed;
then remove the #[expect(clippy::too_many_lines)] on
build_match_formula_from_mapping and refactor that function to call
find_single_operator, lookup mapping.get(key), call
build_core_formula_from_operator, and finally call wrap_with_decorations before
returning. Use the existing symbol names (build_match_formula_from_mapping,
extract_decorations, parse_match_formula_array, parse_nested_match_formula,
MatchFormula) so diagnostics and error messages remain unchanged.
- Around line 369-371: strip_dollar_prefix currently normalises "$" and "" into
an empty name which lets malformed input slip through; change
strip_dollar_prefix to validate and reject empty identifiers by returning an
error (ESempaiSchemaInvalid) instead of returning an empty String, and update
its callers (extract_metavariable_name in this file and parse_focus_clause in
crates/sempai/src/where_clause/mod.rs) to propagate the Result/error;
specifically, make strip_dollar_prefix return Result<String,
ESempaiSchemaInvalid> (or the crate's error type), produce ESempaiSchemaInvalid
when mv == "" or mv == "$", and adjust extract_metavariable_name and
parse_focus_clause to handle/return the error rather than assuming a non-empty
name.
---
Duplicate comments:
In `@docs/weaver-design.md`:
- Around line 2562-2596: The numeric footnote definitions (e.g., markers like
`[^1]`, `[^12]`, `[^23]`) are out of order and do not match their first in-text
occurrences (e.g., the LSP reference and jedi citation); rebuild the footnote
map so each `[^n]` definition is reordered to correspond to the source first
cited in the document, renumber all definitions accordingly, update any inline
usages to the new numbers, and verify examples like "LSP"/"Language Server
Protocol", "jedi", "call-graph fusion"/"Call Graph Fusion Engine" now point to
the correct `[^n]` entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dd39a6ad-589a-470e-b0f7-ad28d21ee450
📒 Files selected for processing (7)
crates/sempai/src/normalize.rscrates/sempai/src/tests/normalization_tests.rscrates/sempai/src/validate.rscrates/sempai/src/where_clause/metavariable.rscrates/sempai/src/where_clause/mod.rscrates/sempai/tests/fixtures/normalization/valid_metavariable_pattern_exception.yamldocs/weaver-design.md
| LegacyFormula::Patterns(clauses) => { | ||
| let children: Result<Vec<_>, _> = clauses | ||
| .iter() | ||
| .map(normalize_legacy_clause) | ||
| .filter_map(Result::transpose) | ||
| .collect(); | ||
| Ok(Formula::And(children?)) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Drop the dead Option layer from normalize_legacy_clause.
Return Result<DecoratedFormula, DiagnosticReport> directly. This helper no longer has an Ok(None) branch, so the optional return type and the caller's filter_map(Result::transpose) path are now just incidental complexity.
Based on learnings: "Clarity over cleverness. Be concise, but favour explicit over terse or obscure idioms. Prefer code that's easy to follow."
Also applies to: 196-213
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/normalize.rs` around lines 141 - 147, The helper
normalize_legacy_clause still returns Result<Option<DecoratedFormula>,
DiagnosticReport> but no longer produces Ok(None); change its signature to
Result<DecoratedFormula, DiagnosticReport> and remove the Option layer, then
simplify callers (e.g. the LegacyFormula::Patterns arm building Formula::And) to
collect directly: replace the
clauses.iter().map(normalize_legacy_clause).filter_map(Result::transpose).collect()
dance with clauses.iter().map(normalize_legacy_clause).collect::<Result<Vec<_>,
_>>() (or equivalent) and pass children? into Formula::And; apply the same
signature and caller simplification to the other occurrence mentioned (around
the 196-213 region).
| #[test] | ||
| fn invalid_not_in_or_legacy_fails_with_semantic_error() { | ||
| let result = normalize_fixture("invalid_not_in_or_legacy.yaml"); | ||
| let err = result.expect_err("should fail with InvalidNotInOr"); | ||
| let first = err.diagnostics().first().expect("should have diagnostic"); | ||
| assert_eq!(first.code(), DiagnosticCode::ESempaiInvalidNotInOr); | ||
| } | ||
|
|
||
| #[test] | ||
| fn invalid_not_in_or_v2_fails_with_semantic_error() { | ||
| let result = normalize_fixture("invalid_not_in_or_v2.yaml"); | ||
| let err = result.expect_err("should fail with InvalidNotInOr"); | ||
| let first = err.diagnostics().first().expect("should have diagnostic"); | ||
| assert_eq!(first.code(), DiagnosticCode::ESempaiInvalidNotInOr); | ||
| } | ||
|
|
||
| #[test] | ||
| fn invalid_no_positive_term_legacy_fails_with_semantic_error() { | ||
| let result = normalize_fixture("invalid_no_positive_term_legacy.yaml"); | ||
| let err = result.expect_err("should fail with MissingPositiveTermInAnd"); | ||
| let first = err.diagnostics().first().expect("should have diagnostic"); | ||
| assert_eq!( | ||
| first.code(), | ||
| DiagnosticCode::ESempaiMissingPositiveTermInAnd | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn invalid_no_positive_term_v2_fails_with_semantic_error() { | ||
| let result = normalize_fixture("invalid_no_positive_term_v2.yaml"); | ||
| let err = result.expect_err("should fail with MissingPositiveTermInAnd"); | ||
| let first = err.diagnostics().first().expect("should have diagnostic"); | ||
| assert_eq!( | ||
| first.code(), | ||
| DiagnosticCode::ESempaiMissingPositiveTermInAnd | ||
| ); | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Collapse the duplicated semantic-failure tests into #[rstest] cases.
Replace this block with one parameterised test over (fixture, expected_code). The four tests differ only in inputs and expected diagnostics, so the current copy-paste is now pure maintenance overhead.
As per coding guidelines: "**/*.rs: Replace duplicated tests with #[rstest(...)] parameterized cases." and "**/*.rs: Use rstest fixtures for shared setup."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/tests/normalization_tests.rs` around lines 140 - 176,
Collapse the four duplicated tests into one parameterized rstest that iterates
over (fixture, expected_code); replace the functions
invalid_not_in_or_legacy_fails_with_semantic_error,
invalid_not_in_or_v2_fails_with_semantic_error,
invalid_no_positive_term_legacy_fails_with_semantic_error, and
invalid_no_positive_term_v2_fails_with_semantic_error with a single #[rstest]
test that for each tuple calls normalize_fixture(fixture).expect_err(...), gets
the first diagnostic, and asserts first.code() == expected_code using the
DiagnosticCode variants ESempaiInvalidNotInOr and
ESempaiMissingPositiveTermInAnd as the expected values. Ensure you import rstest
and use the same diagnostic extraction logic
(err.diagnostics().first().expect(...)) in the body.
| /// Validation context to indicate where validation is occurring. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] | ||
| pub enum ValidationContext { | ||
| /// Standard validation for top-level formulas (default). | ||
| #[default] | ||
| Default, | ||
| /// Validation inside a metavariable-pattern context where conjunctions | ||
| /// without positive terms are allowed. | ||
| MetavariablePattern, | ||
| } | ||
|
|
||
| impl ValidationContext { | ||
| /// Returns true if this is a metavariable-pattern context. | ||
| #[must_use] | ||
| pub const fn is_metavariable_pattern(self) -> bool { | ||
| matches!(self, Self::MetavariablePattern) | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add Rustdoc examples to the public validation API.
Document ValidationContext, validate_formula_semantics, validate_invalid_not_in_or, and validate_positive_terms with short runnable examples that show both a valid formula and a failing diagnostic. The current docs explain the rules, but they do not satisfy the repository requirement for example-driven API documentation.
As per coding guidelines: "**/*.rs: Function documentation must include clear examples demonstrating the usage and outcome of the function." and "**/*.rs: Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc."
Also applies to: 48-117
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/validate.rs` around lines 5 - 22, Add Rustdoc examples to
the public validation API by documenting ValidationContext,
validate_formula_semantics, validate_invalid_not_in_or, and
validate_positive_terms with short runnable examples that show one valid case
and one failing diagnostic for each item; place the examples in the triple-slash
docs above each public item (use Rust doctest blocks — for failing cases use
compile_fail or assert matches on the returned diagnostic/Error so examples
remain runnable) and demonstrate calling the actual functions (e.g.,
validate_formula_semantics(...), validate_invalid_not_in_or(...),
validate_positive_terms(...)) and inspecting their success (Ok/empty
diagnostics) and failure (Err/diagnostic contents) so cargo test/doc will
exercise them.
| #[expect( | ||
| clippy::too_many_lines, | ||
| reason = "formula construction requires multiple branches" | ||
| )] | ||
| pub(crate) fn build_match_formula_from_mapping( | ||
| mapping: &serde_json::Map<String, serde_json::Value>, | ||
| ) -> Result<MatchFormula, DiagnosticReport> { | ||
| // Extract decoration metadata first | ||
| let (where_clauses, as_name, fix) = extract_decorations(mapping); | ||
| let has_decorations = !where_clauses.is_empty() || as_name.is_some() || fix.is_some(); | ||
|
|
||
| // Check for exactly one operator (pattern, regex, all, any, not, inside, anywhere) | ||
| let operator_keys = [ | ||
| "pattern", "regex", "all", "any", "not", "inside", "anywhere", | ||
| ]; | ||
| let present_operators: Vec<_> = operator_keys | ||
| .iter() | ||
| .filter(|&&key| mapping.contains_key(key)) | ||
| .copied() | ||
| .collect(); | ||
|
|
||
| if present_operators.is_empty() { | ||
| return Err(DiagnosticReport::single_error( | ||
| DiagnosticCode::ESempaiSchemaInvalid, | ||
| "metavariable pattern must have one of: pattern, regex, all, any, not, inside, anywhere".to_owned(), | ||
| None, | ||
| vec![], | ||
| )); | ||
| } | ||
|
|
||
| if present_operators.len() > 1 { | ||
| return Err(DiagnosticReport::single_error( | ||
| DiagnosticCode::ESempaiSchemaInvalid, | ||
| format!( | ||
| "metavariable pattern defines multiple operators: {}", | ||
| present_operators.join(", ") | ||
| ), | ||
| None, | ||
| vec!["keep only one operator per pattern".to_owned()], | ||
| )); | ||
| } | ||
|
|
||
| let key = *present_operators.first().ok_or_else(|| { | ||
| DiagnosticReport::single_error( | ||
| DiagnosticCode::ESempaiSchemaInvalid, | ||
| "no operator found in match formula".to_owned(), | ||
| None, | ||
| vec![], | ||
| ) | ||
| })?; | ||
| let value = mapping.get(key).ok_or_else(|| { | ||
| DiagnosticReport::single_error( | ||
| DiagnosticCode::ESempaiSchemaInvalid, | ||
| format!("operator '{key}' was detected but value is missing"), | ||
| None, | ||
| vec![], | ||
| ) | ||
| })?; | ||
|
|
||
| let core_formula = match key { | ||
| "pattern" => { | ||
| let pattern = value.as_str().ok_or_else(|| { | ||
| DiagnosticReport::single_error( | ||
| DiagnosticCode::ESempaiSchemaInvalid, | ||
| "pattern must be a string".to_owned(), | ||
| None, | ||
| vec![], | ||
| ) | ||
| })?; | ||
| MatchFormula::PatternObject(pattern.to_owned()) | ||
| } | ||
| "regex" => { | ||
| let regex = value.as_str().ok_or_else(|| { | ||
| DiagnosticReport::single_error( | ||
| DiagnosticCode::ESempaiSchemaInvalid, | ||
| "regex must be a string".to_owned(), | ||
| None, | ||
| vec![], | ||
| ) | ||
| })?; | ||
| MatchFormula::Regex(regex.to_owned()) | ||
| } | ||
| "all" => parse_match_formula_array(value).map(MatchFormula::All)?, | ||
| "any" => parse_match_formula_array(value).map(MatchFormula::Any)?, | ||
| "not" => parse_nested_match_formula(value).map(|f| MatchFormula::Not(Box::new(f)))?, | ||
| "inside" => parse_nested_match_formula(value).map(|f| MatchFormula::Inside(Box::new(f)))?, | ||
| "anywhere" => { | ||
| parse_nested_match_formula(value).map(|f| MatchFormula::Anywhere(Box::new(f)))? | ||
| } | ||
| _ => { | ||
| return Err(DiagnosticReport::single_error( | ||
| DiagnosticCode::ESempaiSchemaInvalid, | ||
| format!("unexpected operator '{key}' in match formula"), | ||
| None, | ||
| vec![], | ||
| )); | ||
| } | ||
| }; | ||
|
|
||
| // Wrap in Decorated if decorations are present | ||
| if has_decorations { | ||
| Ok(MatchFormula::Decorated { | ||
| formula: Box::new(core_formula), | ||
| where_clauses, | ||
| as_name, | ||
| fix, | ||
| }) | ||
| } else { | ||
| Ok(core_formula) | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Split build_match_formula_from_mapping and drop the lint expectation.
Extract operator-specific builders and a small decoration wrapper. This function is still doing decoration extraction, operator discovery, branch parsing, and diagnostic shaping in one block, and #[expect(clippy::too_many_lines)] is hiding that complexity instead of removing it.
Based on learnings: "Functions must be small, clear in purpose, single responsibility, and obey command/query segregation." As per coding guidelines: "**/*.rs: Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS." and "**/*.rs: Lints must not be silenced except as a last resort."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/where_clause/metavariable.rs` around lines 225 - 335, The
function build_match_formula_from_mapping is doing too many things and
suppresses the clippy lint; split it into small helpers: implement
find_single_operator(mapping) -> Result<&str, DiagnosticReport> to
detect/validate exactly one operator (use the existing operator_keys list),
implement build_core_formula_from_operator(key, value) -> Result<MatchFormula,
DiagnosticReport> which contains the match arms for "pattern", "regex", "all",
"any", "not", "inside", "anywhere" and uses parse_match_formula_array /
parse_nested_match_formula as before, and implement
wrap_with_decorations(core_formula, mapping) -> MatchFormula to apply the
existing extract_decorations result (where_clauses, as_name, fix) when needed;
then remove the #[expect(clippy::too_many_lines)] on
build_match_formula_from_mapping and refactor that function to call
find_single_operator, lookup mapping.get(key), call
build_core_formula_from_operator, and finally call wrap_with_decorations before
returning. Use the existing symbol names (build_match_formula_from_mapping,
extract_decorations, parse_match_formula_array, parse_nested_match_formula,
MatchFormula) so diagnostics and error messages remain unchanged.
| /// Strips the leading `$` from a metavariable name if present. | ||
| pub(crate) fn strip_dollar_prefix(mv: &str) -> String { | ||
| mv.strip_prefix('$').unwrap_or(mv).to_owned() |
There was a problem hiding this comment.
Reject empty metavariable identifiers.
Stop normalising "$" or "" into an empty name here. Both extract_metavariable_name in this file and parse_focus_clause in crates/sempai/src/where_clause/mod.rs trust this helper, so malformed input currently slips through as an empty binding instead of failing with ESempaiSchemaInvalid.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/where_clause/metavariable.rs` around lines 369 - 371,
strip_dollar_prefix currently normalises "$" and "" into an empty name which
lets malformed input slip through; change strip_dollar_prefix to validate and
reject empty identifiers by returning an error (ESempaiSchemaInvalid) instead of
returning an empty String, and update its callers (extract_metavariable_name in
this file and parse_focus_clause in crates/sempai/src/where_clause/mod.rs) to
propagate the Result/error; specifically, make strip_dollar_prefix return
Result<String, ESempaiSchemaInvalid> (or the crate's error type), produce
ESempaiSchemaInvalid when mv == "" or mv == "$", and adjust
extract_metavariable_name and parse_focus_clause to handle/return the error
rather than assuming a non-empty name.
Summary by Sourcery
Normalize legacy and v2 Semgrep search rules into a shared canonical
Formulamodel and wire YAML compilation to return real query plans using this representation.New Features:
FormulaandDecoratedFormulamodel insempai_coreto represent normalized query logic across legacy and v2 syntaxes.sempaicrate that lowers parsed YAML search rules into the canonical formula model, including language expansion into per-language rules.Engine::compile_yamlthat carry the normalized canonical formula for each compiled search rule.Enhancements:
E_SEMPAI_UNSUPPORTED_MODEerrors.QueryPlanwith accessors and storage for the canonical formula, enabling inspection in tests and future backends.Documentation:
Formulamodel, the normalization pipeline, and new semantic validation behaviour forcompile_yaml.Tests: