Implement normalization into canonical Formula with semantic checks#104
Implement normalization into canonical Formula with semantic checks#104
Conversation
This adds a new normalization step that converts legacy 'pattern*' and v2 'match' syntax into a unified canonical 'Formula' model defined in sempai_core. It includes semantic validation for structural constraints like InvalidNotInOr and MissingPositiveTermInAnd, replacing the previous NOT_IMPLEMENTED stub in Engine::compile_yaml with real normalized query plans. Legacy and v2 rules now produce structurally equivalent formulas, and invalid formulas emit deterministic diagnostics. Additional changes include new formula-related types and validation logic in sempai_core and wiring normalization into the compile pipeline. Extensive tests and documentation updates accompany this change to ensure correctness and guide users. No breaking API changes or new external dependencies introduced. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
Canonical Formula Model with Normalization and Semantic ValidationThis PR implements a canonical Formula model, an infallible normalization pipeline that unifies legacy pattern* and v2 match syntaxes into a single Decorated representation, and deterministic semantic checks. See design/execplan: docs/execplans/4-1-5-normalization-into-canonical-formula-model.md (new). Core Changes
Tests and Behaviour
Documentation
Public API Surface and Compatibility
Diagnostics
For ReviewersFocus review on canonical model correctness, normalization semantics (legacy → canonical and v2 → canonical), semantic-check recursion and positive-term logic and span anchoring, engine wiring in compile_yaml, and test coverage/parity with legacy fixtures. Relevant design/execplan: docs/execplans/4-1-5-normalization-into-canonical-formula-model.md. WalkthroughIntroduce a public canonical Formula AST and Decorated wrapper; add normalization from legacy Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Engine as Engine
participant Parser as Parser
participant Normaliser as Normaliser
participant Validator as Validator
participant Compiler as Compiler
participant QueryPlan as QueryPlan
Client->>Engine: compile_yaml(yaml)
Engine->>Parser: parse YAML -> Vec<Rule>
Parser-->>Engine: rules with SearchQueryPrincipal
loop For each search-mode Rule
Engine->>Normaliser: normalize_search_principal(principal, span)
Normaliser-->>Engine: Decorated<Formula>
Engine->>Validator: validate_formula(Decorated<Formula>)
alt Validation passes
Validator-->>Engine: Ok(())
Engine->>Compiler: compile_rule_plans(rule, formula)
Compiler->>QueryPlan: new(rule_id, language, formula)
QueryPlan-->>Compiler: QueryPlan instance
Compiler-->>Engine: Vec(QueryPlan)
else Validation fails
Validator-->>Engine: Err(DiagnosticReport)
Engine-->>Client: Err(DiagnosticReport)
end
end
Engine-->>Client: Ok(Vec(QueryPlan))
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 7 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (7 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Reviewer's GuideImplements a canonical Formula model in sempai-core and wires a full normalization + semantic validation pipeline from YAML search rules through to Engine::compile_yaml, replacing the NOT_IMPLEMENTED placeholder with real QueryPlan construction and updating tests and docs accordingly. Sequence diagram for Engine::compile_yaml normalization and planningsequenceDiagram
actor User
participant Engine
participant YamlParser as sempai_yaml_parse_rule_file
participant ModeValidator as validate_supported_modes
participant Normalizer as normalize_search_principal
participant Semantic as validate_formula
participant Planner as compile_rule_plans
User->>Engine: compile_yaml(yaml)
Engine->>YamlParser: parse_rule_file(yaml, None)
YamlParser-->>Engine: RuleFile or DiagnosticReport
alt parse_error
Engine-->>User: Err(DiagnosticReport)
else parse_ok
Engine->>ModeValidator: validate_supported_modes(file)
ModeValidator-->>Engine: Ok or DiagnosticReport
alt mode_error
Engine-->>User: Err(DiagnosticReport)
else modes_ok
loop for each Rule in file.rules()
Engine->>Engine: principal = rule.principal()
alt principal_is_search
Engine->>Normalizer: normalize_search_principal(principal, rule.rule_span())
Normalizer-->>Engine: Decorated_Formula or DiagnosticReport
alt normalization_error
Engine-->>User: Err(DiagnosticReport)
else normalized
Engine->>Semantic: validate_formula(&formula)
Semantic-->>Engine: Ok or DiagnosticReport
alt semantic_error
Engine-->>User: Err(DiagnosticReport)
else validated
Engine->>Planner: compile_rule_plans(rule, &formula)
Planner-->>Engine: List~QueryPlan~ or DiagnosticReport
alt plan_error
Engine-->>User: Err(DiagnosticReport)
else plans_ok
Engine->>Engine: append plans to result
end
end
end
else non_search_principal
Engine->>Engine: skip or unsupported mode handled earlier
end
end
Engine-->>User: Ok(List~QueryPlan~)
end
end
Class diagram for canonical Formula model and decoratorsclassDiagram
class Formula {
+Atom atom
+Not not
+Inside inside
+Anywhere anywhere
+And and
+Or or
}
class Atom {
+Pattern pattern
+Regex regex
+TreeSitterQuery tree_sitter_query
}
class PatternAtom {
+String text
}
class RegexAtom {
+String pattern
}
class TreeSitterQueryAtom {
+String query
}
class Decorated_Formula {
+Formula node
+List~WhereClause~ where_clauses
+Option~String~ as_name
+Option~String~ fix
+Option~SourceSpan~ span
}
class WhereClause {
+Value raw
}
class SourceSpan
Formula --> Atom : uses
Atom --> PatternAtom : has
Atom --> RegexAtom : has
Atom --> TreeSitterQueryAtom : has
Decorated_Formula --> Formula : wraps
Decorated_Formula --> WhereClause : contains
Decorated_Formula --> SourceSpan : optional
Class diagram for Engine normalization and QueryPlanclassDiagram
class Engine {
-EngineConfig config
+compile_yaml(yaml : &str) Result~List~QueryPlan~~, DiagnosticReport~
}
class QueryPlan {
-String rule_id
-Language language
-Decorated_Formula formula
+new(rule_id : String, language : Language, formula : Decorated_Formula) QueryPlan
+rule_id() String
+language() Language
+formula() &Decorated_Formula
}
class Decorated_Formula {
+Formula node
+List~WhereClause~ where_clauses
+Option~String~ as_name
+Option~String~ fix
+Option~SourceSpan~ span
}
class Formula
class WhereClause
class SourceSpan
class EngineConfig
class DiagnosticReport
class Language
class Rule {
+id() &str
+languages() &List~String~
+rule_span() Option~SourceSpan~
+principal() &RulePrincipal
}
class RulePrincipal {
+Search search
+Other other
}
class SearchQueryPrincipal
class NormalizeModule {
+normalize_search_principal(principal : &SearchQueryPrincipal, rule_span : Option~&SourceSpan~) Result~Decorated_Formula, DiagnosticReport~
}
class SemanticCheckModule {
+validate_formula(formula : &Decorated_Formula) Result~(), DiagnosticReport~
}
Engine --> EngineConfig : has
Engine ..> QueryPlan : creates
Engine ..> Rule : uses
Engine ..> NormalizeModule : calls
Engine ..> SemanticCheckModule : calls
Engine ..> DiagnosticReport : returns
QueryPlan --> Language : has
QueryPlan --> Decorated_Formula : has
Rule --> RulePrincipal : has
RulePrincipal --> SearchQueryPrincipal : may_wrap
NormalizeModule ..> Decorated_Formula : returns
NormalizeModule ..> DiagnosticReport : error
SemanticCheckModule ..> DiagnosticReport : error
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
…model - Introduce a canonical `Formula` enum in sempai-core for normalized queries. - Implement normalization from legacy (`pattern*`) and v2 (`match`) syntaxes into the shared formula model. - Add semantic validation for formulas enforcing constraints on `Or` and `And` compositions. - Update engine to compile normalized formulas into query plans per language. - Normalize compatibility-only `r2c-internal-project-depends-on` rules into a degenerate formula. - Enhance diagnostics with semantic validation errors and maintain accurate error spans. - Add extensive tests covering normalization and semantic validation. - Update documentation to reflect the new normalization and validation implementation. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: crates/sempai/src/tests/normalization_tests.rs Comment on file fn legacy_pattern_normalizes_to_atom() {
let legacy = LegacyFormula::Pattern(String::from("foo($X)"));
let principal = SearchQueryPrincipal::Legacy(legacy);
let result = normalize_search_principal(&principal, None).expect("should normalize");
assert_eq!(
result.node,
Formula::Atom(Atom::Pattern(PatternAtom {
text: String::from("foo($X)")
}))
);
}❌ New issue: Code Duplication |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: crates/sempai/src/normalize.rs Comment on file fn normalize_match(
formula: &MatchFormula,
fallback_span: Option<SourceSpan>,
) -> Decorated<Formula> {
match formula {
MatchFormula::Pattern(text) | MatchFormula::PatternObject(text) => Decorated {
node: Formula::Atom(Atom::Pattern(PatternAtom { text: text.clone() })),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
},
MatchFormula::Regex(pattern) => Decorated {
node: Formula::Atom(Atom::Regex(RegexAtom {
pattern: pattern.clone(),
})),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
},
MatchFormula::All(branches) => {
let normalized_branches = branches
.iter()
.map(|branch| normalize_match(branch, fallback_span.clone()))
.collect();
Decorated {
node: Formula::And(normalized_branches),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
MatchFormula::Any(branches) => {
let normalized_branches = branches
.iter()
.map(|branch| normalize_match(branch, fallback_span.clone()))
.collect();
Decorated {
node: Formula::Or(normalized_branches),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
MatchFormula::Not(inner) => {
let normalized_inner = normalize_match(inner, fallback_span.clone());
Decorated {
node: Formula::Not(Box::new(normalized_inner)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
MatchFormula::Inside(inner) => {
let normalized_inner = normalize_match(inner, fallback_span.clone());
Decorated {
node: Formula::Inside(Box::new(normalized_inner)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
MatchFormula::Anywhere(inner) => {
let normalized_inner = normalize_match(inner, fallback_span.clone());
Decorated {
node: Formula::Anywhere(Box::new(normalized_inner)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
MatchFormula::Decorated {
formula: inner_formula,
where_clauses: raw_where,
as_name,
fix,
} => {
let mut normalized = normalize_match(inner_formula, fallback_span);
normalized.where_clauses = raw_where
.iter()
.map(|raw| WhereClause { raw: raw.clone() })
.collect();
normalized.as_name.clone_from(as_name);
normalized.fix.clone_from(fix);
normalized
}
}
}❌ New issue: Large Method |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: crates/sempai/src/normalize.rs Comment on file fn normalize_legacy(
formula: &LegacyFormula,
fallback_span: Option<SourceSpan>,
) -> Decorated<Formula> {
match formula {
LegacyFormula::Pattern(text) => Decorated {
node: Formula::Atom(Atom::Pattern(PatternAtom { text: text.clone() })),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
},
LegacyFormula::PatternRegex(pattern) => Decorated {
node: Formula::Atom(Atom::Regex(RegexAtom {
pattern: pattern.clone(),
})),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
},
LegacyFormula::Patterns(clauses) => normalize_legacy_patterns(clauses, fallback_span),
LegacyFormula::PatternEither(branches) => {
let normalized_branches = branches
.iter()
.map(|branch| normalize_legacy(branch, fallback_span.clone()))
.collect();
Decorated {
node: Formula::Or(normalized_branches),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
LegacyFormula::PatternNot(value) => {
let inner = normalize_legacy_value(value, fallback_span.clone());
Decorated {
node: Formula::Not(Box::new(inner)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
LegacyFormula::PatternInside(value) => {
let inner = normalize_legacy_value(value, fallback_span.clone());
Decorated {
node: Formula::Inside(Box::new(inner)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
LegacyFormula::PatternNotInside(value) => {
let inner = normalize_legacy_value(value, fallback_span.clone());
let inside = Decorated {
node: Formula::Inside(Box::new(inner)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span.clone(),
};
Decorated {
node: Formula::Not(Box::new(inside)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
LegacyFormula::PatternNotRegex(pattern) => {
let regex_atom = Decorated {
node: Formula::Atom(Atom::Regex(RegexAtom {
pattern: pattern.clone(),
})),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span.clone(),
};
Decorated {
node: Formula::Not(Box::new(regex_atom)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
LegacyFormula::Anywhere(value) => {
let inner = normalize_legacy_value(value, fallback_span.clone());
Decorated {
node: Formula::Anywhere(Box::new(inner)),
where_clauses: vec![],
as_name: None,
fix: None,
span: fallback_span,
}
}
}
}❌ New issue: Large Method |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
…n length Add a private `bare()` helper function to construct Decorated nodes without metadata, replacing repeated struct literals throughout normalize.rs. This reduces `normalize_legacy` from 102 to 51 lines and `normalize_match` from 94 to 58 lines, bringing both under the 70-line clippy threshold. Changes: - Add `bare()` helper function with clear documentation - Replace all inline Decorated struct literals in normalize_legacy - Replace all inline Decorated struct literals in normalize_match - Apply bare() to normalize_legacy_value and normalize_dependency_principal - Remove #[expect(clippy::too_many_lines)] attributes no longer needed - Add #[expect(clippy::missing_const_for_fn)] to bare() (Vec requires runtime) - Fix import ordering in engine.rs per cargo fmt No logic changes. All tests pass. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Introduce helper functions to reduce repetition and improve code quality: normalize.rs changes: - Add normalize_unary() helper for Not/Inside/Anywhere arms - Add normalize_branches() helper for All/Any arms - Refactor normalize_match to use helpers, reducing from 58 to 43 lines - All helper functions properly handle Option<SourceSpan> ergonomics normalization_tests.rs changes: - Add normalize_legacy() and normalize_v2() test helpers - Eliminate repeated 3-line boilerplate from all 11 tests - Replace verbose match/panic! assertions with concise matches! macro - Reduce file from 160 to 141 lines No logic changes. All 33 tests pass. Clippy and formatting checks pass. Addresses CodeScene Large Method and code duplication violations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Hey - I've found 9 issues, and left some high level feedback:
- In
semantic_check.rs, theMissingPositiveTermInAnddiagnostic always uses the ambientspanrather than the most specific available branch span, which makes error locations less precise than they could be; consider preferring theAndnode’s own span or one of its children’s spans when available. - In
normalize_search_principaland the normalization helpers you currently never return an error but still wrap results inResultwith#[expect(clippy::unnecessary_wraps)]; it would be cleaner either to introduce at least one real normalization error path now (e.g., for obviously malformed payloads) or to keep the API simple and return the bareDecorated<Formula>until you actually need fallible normalization.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `semantic_check.rs`, the `MissingPositiveTermInAnd` diagnostic always uses the ambient `span` rather than the most specific available branch span, which makes error locations less precise than they could be; consider preferring the `And` node’s own span or one of its children’s spans when available.
- In `normalize_search_principal` and the normalization helpers you currently never return an error but still wrap results in `Result` with `#[expect(clippy::unnecessary_wraps)]`; it would be cleaner either to introduce at least one real normalization error path now (e.g., for obviously malformed payloads) or to keep the API simple and return the bare `Decorated<Formula>` until you actually need fallible normalization.
## Individual Comments
### Comment 1
<location path="crates/sempai-core/src/formula.rs" line_range="40-49" />
<code_context>
+/// raw: json!({"metavariable-regex": {"metavariable": "$X", "regex": "foo.*"}}),
+/// };
+/// ```
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct WhereClause {
+ /// The raw JSON value of the constraint.
+ pub raw: Value,
</code_context>
<issue_to_address>
**issue (bug_risk):** Deriving `Eq` for `WhereClause` will fail because `serde_json::Value` does not implement `Eq`.
Because `serde_json::Value` only implements `PartialEq`, `#[derive(Eq)]` on `WhereClause` will not compile. Please either remove `Eq` (and any `WhereClause: Eq` bounds such as on `Decorated<T>`) or switch `raw` to a type that can soundly implement `Eq`.
</issue_to_address>
### Comment 2
<location path="crates/sempai/src/tests/engine_tests.rs" line_range="68" />
<code_context>
#[test]
-fn compile_yaml_returns_not_implemented_after_successful_parse() {
+fn compile_yaml_normalizes_and_returns_query_plans() {
let engine = default_engine();
let result = engine.compile_yaml(
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests that cover semantic validation errors surfaced through `compile_yaml` (e.g. invalid-not-in-or, missing-positive-term-in-and).
Current `compile_yaml_*` tests only cover successful cases. Since `compile_yaml` now calls `validate_formula`, please add tests that build YAML rules which normalize to:
- an `Or` with a negated branch, and
- an `And` with only constraint-style terms (`not`, `inside`, `anywhere`) and no positive term.
These should assert that `compile_yaml` returns an `Err`, and that the first diagnostic codes are `ESempaiInvalidNotInOr` and `ESempaiMissingPositiveTermInAnd`, to confirm semantic errors are exposed via the public API.
Suggested implementation:
```rust
#[test]
fn compile_yaml_normalizes_and_returns_query_plans() {
let engine = default_engine();
let result = engine.compile_yaml(
"rules:\n - id: demo.rule\n message: oops\n languages: [rust]\n severity: ERROR\n pattern: foo($X)\n",
);
let plans = result.expect("should successfully compile");
assert_eq!(plans.len(), 1);
let plan = plans.first().expect("should have first plan");
assert_eq!(plan.rule_id(), "demo.rule");
assert_eq!(plan.language(), Language::Rust);
}
#[test]
fn compile_yaml_returns_semantic_error_for_invalid_not_in_or() {
let engine = default_engine();
let result = engine.compile_yaml(
"\
rules:
- id: demo.invalid.not.in.or
message: invalid not in or
languages: [rust]
severity: ERROR
pattern: not foo($X) or bar($Y)
",
);
let error = result.expect_err("compile_yaml should surface semantic validation errors");
let diagnostics = error.diagnostics();
let first = diagnostics
.first()
.expect("semantic error should contain at least one diagnostic");
assert_eq!(first.code(), "ESempaiInvalidNotInOr");
}
#[test]
fn compile_yaml_returns_semantic_error_for_missing_positive_term_in_and() {
let engine = default_engine();
let result = engine.compile_yaml(
"\
rules:
- id: demo.missing.positive.term.in.and
message: missing positive term in and
languages: [rust]
severity: ERROR
pattern: not foo($X) and inside(bar($Y))
",
);
let error = result.expect_err("compile_yaml should surface semantic validation errors");
let diagnostics = error.diagnostics();
let first = diagnostics
.first()
.expect("semantic error should contain at least one diagnostic");
assert_eq!(first.code(), "ESempaiMissingPositiveTermInAnd");
}
```
To integrate this with the existing codebase you may need to:
1. Adjust how diagnostics are accessed from the error returned by `compile_yaml`. For example, replace `error.diagnostics()` and `first.code()` with the actual methods/fields used in your error/diagnostic types (e.g. `error.diagnostics`, `error.diagnostics()[0].code`, or `first.code.to_string()`).
2. Confirm the exact string representation or enum formatting of the diagnostic codes. If they are enums instead of strings, change the assertions to match the enum variant (e.g. `assert_eq!(first.code, DiagnosticCode::ESempaiInvalidNotInOr)`).
3. If your pattern language requires a different syntax to produce an `Or` with a negated branch or an `And` with only constraint-style terms, tweak the `pattern:` strings in the YAML to align with how `validate_formula` actually interprets formulas in your project.
</issue_to_address>
### Comment 3
<location path="crates/sempai/src/tests/normalization_tests.rs" line_range="9" />
<code_context>
+use crate::normalize::normalize_search_principal;
+
+/// Helper to normalize a legacy formula and extract the node.
+fn normalize_legacy(formula: LegacyFormula) -> Formula {
+ let principal = SearchQueryPrincipal::Legacy(formula);
+ normalize_search_principal(&principal, None)
</code_context>
<issue_to_address>
**suggestion (testing):** Add normalization tests for `patterns` (legacy `LegacyFormula::Patterns`) including constraint mapping into `where_clauses`.
Existing legacy normalization tests only cover simple `pattern` / `pattern-regex` and some negated/inside variants; they don’t exercise `LegacyFormula::Patterns`, where constraints are attached as `where_clauses` on the enclosing `And`. Please add a test that builds a `LegacyFormula::Patterns` with both formula clauses and one or more constraint clauses, and asserts that:
- the normalized node is a `Formula::And` with the expected child formulas, and
- the `Decorated` wrapper holds the expected `where_clauses` while all child formulas have empty `where_clauses`.
This will verify the constraint propagation logic in `normalize_legacy_patterns`.
Suggested implementation:
```rust
use sempai_core::formula::{Atom, Decorated, Formula, PatternAtom, RegexAtom};
```
```rust
/// Helper to normalize a legacy formula and extract the node.
fn normalize_legacy(formula: LegacyFormula) -> Formula {
let principal = SearchQueryPrincipal::Legacy(formula);
normalize_search_principal(&principal, None)
.expect("should normalize")
.node
}
/// Helper to normalize a legacy formula and keep the decorated wrapper.
///
/// This is useful for tests that need to inspect `where_clauses` or other
/// metadata attached to the normalized formula.
fn normalize_legacy_decorated(formula: LegacyFormula) -> Decorated<Formula> {
let principal = SearchQueryPrincipal::Legacy(formula);
normalize_search_principal(&principal, None).expect("should normalize")
}
/// Helper to normalize a v2 match formula and extract the node.
fn normalize_v2(formula: MatchFormula) -> Formula {
let principal = SearchQueryPrincipal::Match(formula);
normalize_search_principal(&principal, None)
.expect("should normalize")
.node
}
/// Normalization of `LegacyFormula::Patterns` should:
/// - produce an outer `Formula::And` node combining the pattern formulas, and
/// - attach any constraints as `where_clauses` on the outer `Decorated` node,
/// while all child formulas have empty `where_clauses`.
#[test]
fn normalize_legacy_patterns_propagates_constraints_to_where_clauses() {
// NOTE: The exact constructors for `LegacyFormula::Pattern` / constraint
// variants should match the rest of the tests in this module. If the
// field names differ, adjust accordingly.
//
// Two simple pattern formulas:
let pattern_formula_1 = LegacyFormula::Pattern {
// Example pattern string; align with existing tests if they use
// different patterns or additional fields.
pattern: "foo($x)".to_string(),
// Fill in any additional fields that the real `Pattern` variant
// expects (e.g. language, is_regexp, etc.).
..Default::default()
};
let pattern_formula_2 = LegacyFormula::Pattern {
pattern: "bar($x)".to_string(),
..Default::default()
};
// One or more constraint clauses that should end up as `where_clauses`
// on the enclosing `And` in the normalized representation.
//
// This assumes there is some constraint-like legacy variant; adjust the
// variant / fields to whatever is used elsewhere in the codebase for
// capture constraints.
let constraint_formula = LegacyFormula::Constraint {
// Example: `$x` is constrained to some value.
// Replace these fields with the actual ones from the enum.
variable: "x".to_string(),
value: LegacyValue::String("constrained".to_string()),
};
let legacy = LegacyFormula::Patterns {
// These are the pattern formulas that should become the children
// of the normalized `Formula::And`.
patterns: vec![pattern_formula_1, pattern_formula_2],
// These are the constraints that `normalize_legacy_patterns` is
// expected to map into `where_clauses` on the enclosing `And`.
constraints: vec![constraint_formula],
};
let decorated = normalize_legacy_decorated(legacy);
// Outer node must be an `And` combining the pattern formulas.
let children = match &decorated.node {
Formula::And(children) => children,
other => panic!("expected normalized legacy Patterns to be And, got {:?}", other),
};
// We expect exactly the two pattern children we provided above.
assert_eq!(
children.len(),
2,
"expected two child formulas in normalized And for LegacyFormula::Patterns"
);
// All constraints should be attached as `where_clauses` on the outer
// decorated node.
assert!(
!decorated.where_clauses.is_empty(),
"expected where_clauses on outer Decorated<Formula> for LegacyFormula::Patterns constraints"
);
// Child formulas themselves must not carry any where_clauses; they should
// all be attached to the enclosing `And`.
for (idx, child) in children.iter().enumerate() {
assert!(
child.where_clauses.is_empty(),
"expected child formula {} of Patterns And to have empty where_clauses",
idx
);
}
}
```
The above test assumes specific shapes for the `LegacyFormula` variants and `Decorated` type that may differ slightly from your actual implementation. To integrate cleanly you should:
1. **Adjust the `LegacyFormula` constructors**:
- Replace `LegacyFormula::Pattern { pattern, ..Default::default() }` with the exact syntax used in your existing legacy normalization tests (same fields, same variant name).
- Replace `LegacyFormula::Constraint { variable, value }` with the actual constraint variant used for legacy pattern constraints (for example it might be something like `LegacyFormula::Where`, `LegacyFormula::CaptureConstraint`, or similar).
- Update the `LegacyFormula::Patterns { patterns, constraints }` fields (`patterns`, `constraints`) to match the real field names for the `Patterns` variant.
2. **Update the `Decorated` type usage**:
- If `Decorated` lives in a different module or is re-exported under a different path, update the import (`use sempai_core::formula::{Atom, Decorated, Formula, PatternAtom, RegexAtom};`) accordingly.
- If the fields on `Decorated` are named differently (e.g. `node` vs `formula`, `where_clauses` vs `constraints`), adjust the helper `normalize_legacy_decorated` return type and the test field accesses to match.
3. **Ensure `Default::default()` is valid**:
- If the `LegacyFormula::Pattern` variant does not support the struct-update `..Default::default()` (e.g. it’s not `#[derive(Default)]` or doesn’t use struct syntax), replace those with complete explicit construction as used in the existing tests in this file.
4. **Align where-clause type checks**:
- If `where_clauses` is not a `Vec<_>` or has a different accessor, update the `assert!(!decorated.where_clauses.is_empty())` and child `where_clauses` assertions to match the real API.
</issue_to_address>
### Comment 4
<location path="crates/sempai/src/tests/normalization_tests.rs" line_range="80" />
<code_context>
+}
+
+#[test]
+fn v2_pattern_shorthand_normalizes_to_atom() {
+ let result = normalize_v2(MatchFormula::Pattern(String::from("bar($Y)")));
+ assert_eq!(
</code_context>
<issue_to_address>
**suggestion (testing):** Extend v2 normalization tests to cover `MatchFormula::Decorated` metadata (where/as/fix) and span propagation.
Current v2 tests only cover the structural mapping for `pattern`/`regex`/`all`/`any`/`not`/`inside`/`anywhere`, not the `MatchFormula::Decorated { .. }` variant. Since `normalize_match` maps `where` into `where_clauses` and copies `as_name`/`fix` onto `Decorated`, please add a test that builds a decorated formula with non-empty `where`/`as`/`fix` and asserts those fields are preserved. It would also be helpful to add a test that calls `normalize_search_principal` with a non-`None` `rule_span` and verifies the resulting `span` is set, to cover span propagation through normalization.
Suggested implementation:
```rust
#[test]
fn v2_decorated_metadata_is_preserved() {
// Build a decorated match formula with non-empty where/as/fix metadata
let inner = MatchFormula::Pattern(String::from("foo($X)"));
let mf = MatchFormula::Decorated {
formula: Box::new(inner),
// Use a minimal where-clause that your existing types support
// (e.g. WhereClause::Raw, WhereClause::Expr, etc.)
where_clauses: vec![WhereClause::Raw(String::from("foo == bar"))],
as_name: Some(String::from("my_capture")),
fix: Some(String::from("apply_fix")),
span: None,
};
let result = normalize_v2(mf);
match result {
Formula::Decorated(decorated) => {
assert_eq!(decorated.as_name.as_deref(), Some("my_capture"));
assert_eq!(decorated.fix.as_deref(), Some("apply_fix"));
assert_eq!(decorated.where_clauses.len(), 1);
// Adjust this assertion if WhereClause doesn't implement Display/to_string
assert_eq!(decorated.where_clauses[0].to_string(), "foo == bar");
}
_ => panic!("expected Decorated formula"),
}
}
#[test]
fn v2_span_is_propagated_from_search_principal() {
// Build a simple principal with a pattern formula
let principal = SearchPrincipal {
formula: MatchFormula::Pattern(String::from("foo($X)")),
span: None,
// Add any additional required fields here to satisfy the struct definition.
};
// Use whatever span type/constructor your codebase exposes for rule spans
let rule_span = Some(Span::from_offset(0, 10));
let normalized = normalize_search_principal(rule_span.clone(), principal);
assert_eq!(normalized.span, rule_span);
}
#[test]
fn legacy_pattern_regex_normalizes_to_regex_atom() {
```
Because only a fragment of the file and type definitions is visible, you will need to align the new tests with your existing types:
1. **WhereClause construction**
- Replace `WhereClause::Raw(String::from("foo == bar"))` with the concrete variant/constructor your codebase uses to represent a simple `where` clause.
- If `WhereClause` does not implement `Display`, replace `decorated.where_clauses[0].to_string()` with a field-level assertion that matches how `normalize_match` stores the textual predicate (e.g. `decorated.where_clauses[0].expr.source` or similar).
2. **Formula / Decorated shape**
- If your normalized type is something like `Formula::Decorated(DecoratedFormula { formula, where_clauses, as_name, fix, .. })`, update the pattern in the `match` arm to match the real struct/field names.
- If `normalize_v2` returns a wrapper (e.g. `NormalizedMatch`) instead of `Formula` directly, destructure that wrapper first and then match on its `formula` field.
3. **SearchPrincipal and span type**
- Replace the `SearchPrincipal` literal with the real struct fields required in your codebase (e.g. `rule`, `language`, etc.).
- Replace `Span::from_offset(0, 10)` with the constructor or literal you normally use for rule spans (e.g. `Span { start: 0, end: 10 }`, `RuleSpan::new(0, 10)`, or `TextRange::new(...)`).
- Adjust the field name `span` on `SearchPrincipal` and `normalized` if your struct uses a different name (e.g. `rule_span` or `range`).
4. **Import paths**
- Ensure the test module has `use` statements for `MatchFormula`, `Formula`, `WhereClause`, `SearchPrincipal`, `Span` (or your span type), `normalize_v2`, and `normalize_search_principal` from the correct modules.
</issue_to_address>
### Comment 5
<location path="crates/sempai/src/tests/normalization_tests.rs" line_range="61" />
<code_context>
+}
+
+#[test]
+fn legacy_pattern_not_inside_normalizes_to_not_inside() {
+ let result = normalize_legacy(LegacyFormula::PatternNotInside(Box::new(
+ LegacyValue::String(String::from("class Foo:")),
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for legacy `pattern-either` and `semgrep-internal-pattern-anywhere` normalization.
Current legacy tests don’t cover `LegacyFormula::PatternEither` or `LegacyFormula::Anywhere` (for `semgrep-internal-pattern-anywhere`). Please add tests that
- verify `pattern-either: [...]` normalizes to `Formula::Or` with the correct children, and
- verify `semgrep-internal-pattern-anywhere` normalizes to `Formula::Anywhere` wrapping the child formula,
so the legacy-to-canonical mapping is fully exercised.
Suggested implementation:
```rust
#[test]
fn legacy_pattern_not_inside_normalizes_to_not_inside() {
let result = normalize_legacy(LegacyFormula::PatternNotInside(Box::new(
LegacyValue::String(String::from("class Foo:")),
)));
match result {
Formula::Not(not_inner) => match not_inner.node {
Formula::Inside(inside_inner) => match inside_inner.node {
Formula::Atom(Atom::Pattern(pat)) => {
assert_eq!(pat.text, "class Foo:");
}
_ => panic!("expected Pattern atom"),
},
_ => panic!("expected Inside formula inside Not"),
},
_ => panic!("expected Not formula"),
}
}
#[test]
fn legacy_pattern_either_normalizes_to_or() {
let result = normalize_legacy(LegacyFormula::PatternEither(vec![
LegacyFormula::Pattern(Box::new(LegacyValue::String(String::from("first pattern")))),
LegacyFormula::Pattern(Box::new(LegacyValue::String(String::from("second pattern")))),
]));
match result {
Formula::Or(children) => {
assert_eq!(children.len(), 2, "expected two children in Or formula");
match &children[0].node {
Formula::Atom(Atom::Pattern(pat)) => {
assert_eq!(pat.text, "first pattern");
}
_ => panic!("expected first child of Or to be a Pattern atom"),
}
match &children[1].node {
Formula::Atom(Atom::Pattern(pat)) => {
assert_eq!(pat.text, "second pattern");
}
_ => panic!("expected second child of Or to be a Pattern atom"),
}
}
_ => panic!("expected Or formula from legacy pattern-either"),
}
}
#[test]
fn legacy_anywhere_normalizes_to_anywhere() {
// This corresponds to legacy `semgrep-internal-pattern-anywhere` with a single pattern child.
let result = normalize_legacy(LegacyFormula::Anywhere(Box::new(
LegacyFormula::Pattern(Box::new(LegacyValue::String(String::from(
"pattern anywhere",
)))),
)));
match result {
Formula::Anywhere(inner) => match inner.node {
Formula::Atom(Atom::Pattern(pat)) => {
assert_eq!(pat.text, "pattern anywhere");
}
_ => panic!("expected Pattern atom inside Anywhere"),
},
_ => panic!("expected Anywhere formula from semgrep-internal-pattern-anywhere"),
```
The above edits assume the following shapes, which you should confirm in your codebase and adjust if necessary:
1. `LegacyFormula::PatternEither` takes a `Vec<LegacyFormula>`. If it instead takes a `Vec<LegacyValue>` or a different wrapper type, wrap the elements accordingly in the test.
2. `LegacyFormula::Pattern` is constructed as `LegacyFormula::Pattern(Box<LegacyValue::String(...)>)`. If the constructor signature differs, update the test calls.
3. `LegacyFormula::Anywhere` is defined as wrapping a `Box<LegacyFormula>`. If it instead wraps a `LegacyValue` directly, change the argument to `Anywhere` to match.
4. `Formula::Or` is assumed to contain a `Vec<Spanned<Formula>>` (or similar) where you access the formula via `.node`. If its inner type is different, update the pattern match on `Formula::Or(children)` and indexing of `children[0]` / `children[1]` accordingly.
Once these shapes are confirmed and aligned, the tests will fully cover normalization from legacy `pattern-either` and `semgrep-internal-pattern-anywhere` to `Formula::Or` and `Formula::Anywhere`, respectively.
</issue_to_address>
### Comment 6
<location path="crates/sempai/src/tests/semantic_validation_tests.rs" line_range="21" />
<code_context>
+}
+
+#[test]
+fn valid_or_with_positive_branches_passes() {
+ let formula = Decorated {
+ node: Formula::Or(vec![make_pattern("foo"), make_pattern("bar")]),
</code_context>
<issue_to_address>
**suggestion (testing):** Add a sanity test that a single positive atom formula passes validation untouched.
Current tests cover `And`/`Or` combinations, but not the simplest case. Please add a test where the root is a single `Atom` (e.g. `make_pattern("foo")`) and assert that `validate_formula` returns `Ok(())` to guard against future regressions on trivially valid formulas.
</issue_to_address>
### Comment 7
<location path="docs/execplans/4-1-5-normalization-into-canonical-formula-model.md" line_range="168" />
<code_context>
+produces `LegacyFormula` and `MatchFormula` ASTs in `sempai_yaml`, which are
+structurally distinct but semantically equivalent for the subset they share.
+
+The normalisation step must lower both representations into the design
+document's canonical `Formula` enum, then run semantic constraint checks before
+constructing a `QueryPlan`.
</code_context>
<issue_to_address>
**nitpick (typo):** Align spelling of "normalize/normalisation" within this document.
This section uses "normalisation" while earlier parts (including the title) use "normalize/normalization". Please choose one variant (likely the one from the title) and use it consistently in this file.
Suggested implementation:
```
The normalization step must lower both representations into the design
```
Search the rest of `docs/execplans/4-1-5-normalization-into-canonical-formula-model.md` for other variants like `normalisation`, `normalise`, and replace them with `normalization`, `normalize` respectively to keep spelling consistent with the title and earlier sections.
</issue_to_address>
### Comment 8
<location path="crates/sempai/src/normalize.rs" line_range="1" />
<code_context>
+//! Normalization of parsed query syntax into canonical formula model.
+//!
+//! This module provides the transformation from parsed
</code_context>
<issue_to_address>
**issue (review_instructions):** Add unit and behavioural tests for the normalization pipeline; the new normalization logic is currently untested.
This module implements the core normalization logic from `SearchQueryPrincipal` to the canonical `Formula` model, including legacy/v2 mappings and special handling for `r2c-internal-project-depends-on`. There are no unit tests in `crates/sempai/src/tests/normalization_tests.rs` (the file is empty in this diff) and no visible BDD coverage for the happy, unhappy, and edge paths you describe in the ExecPlan. Add unit tests that cover all branches of `normalize_legacy`, `normalize_legacy_patterns`, `normalize_match`, and `normalize_dependency_principal`, and add `rstest-bdd` feature scenarios that exercise normalization end-to-end via the public engine API.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
- For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
- Tests must use `rstest-bdd` v0.5.0 for BDD scenarios covering happy, unhappy, and edge paths.
</details>
</issue_to_address>
### Comment 9
<location path="docs/execplans/4-1-5-normalization-into-canonical-formula-model.md" line_range="163" />
<code_context>
+
+## Context and orientation
+
+The Sempai query pipeline currently stops at a `NOT_IMPLEMENTED` placeholder
+after successfully parsing and mode-validating a YAML rule file. The parser
+produces `LegacyFormula` and `MatchFormula` ASTs in `sempai_yaml`, which are
</code_context>
<issue_to_address>
**suggestion (review_instructions):** ASTs are referenced later in this section without defining the acronym.
In the paragraph beginning "The parser produces `LegacyFormula` and `MatchFormula` ASTs", please expand AST on first use, for example "abstract syntax trees (ASTs)".
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*.md`
**Instructions:**
Define uncommon acronyms on first use.
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| #[derive(Debug, Clone, PartialEq, Eq)] | ||
| pub enum Formula { | ||
| /// A leaf pattern or regex atom. | ||
| Atom(Atom), | ||
| /// Negation: the inner formula must not match. | ||
| Not(Box<Decorated<Formula>>), | ||
| /// Context constraint: the anchor must be inside a match of the inner. | ||
| Inside(Box<Decorated<Formula>>), | ||
| /// Context constraint: the inner must match somewhere in scope. | ||
| Anywhere(Box<Decorated<Formula>>), |
There was a problem hiding this comment.
issue (bug_risk): Deriving Eq for WhereClause will fail because serde_json::Value does not implement Eq.
Because serde_json::Value only implements PartialEq, #[derive(Eq)] on WhereClause will not compile. Please either remove Eq (and any WhereClause: Eq bounds such as on Decorated<T>) or switch raw to a type that can soundly implement Eq.
|
|
||
| #[test] | ||
| fn compile_yaml_returns_not_implemented_after_successful_parse() { | ||
| fn compile_yaml_normalizes_and_returns_query_plans() { |
There was a problem hiding this comment.
suggestion (testing): Add tests that cover semantic validation errors surfaced through compile_yaml (e.g. invalid-not-in-or, missing-positive-term-in-and).
Current compile_yaml_* tests only cover successful cases. Since compile_yaml now calls validate_formula, please add tests that build YAML rules which normalize to:
- an
Orwith a negated branch, and - an
Andwith only constraint-style terms (not,inside,anywhere) and no positive term.
These should assert that compile_yaml returns an Err, and that the first diagnostic codes are ESempaiInvalidNotInOr and ESempaiMissingPositiveTermInAnd, to confirm semantic errors are exposed via the public API.
Suggested implementation:
#[test]
fn compile_yaml_normalizes_and_returns_query_plans() {
let engine = default_engine();
let result = engine.compile_yaml(
"rules:\n - id: demo.rule\n message: oops\n languages: [rust]\n severity: ERROR\n pattern: foo($X)\n",
);
let plans = result.expect("should successfully compile");
assert_eq!(plans.len(), 1);
let plan = plans.first().expect("should have first plan");
assert_eq!(plan.rule_id(), "demo.rule");
assert_eq!(plan.language(), Language::Rust);
}
#[test]
fn compile_yaml_returns_semantic_error_for_invalid_not_in_or() {
let engine = default_engine();
let result = engine.compile_yaml(
"\
rules:
- id: demo.invalid.not.in.or
message: invalid not in or
languages: [rust]
severity: ERROR
pattern: not foo($X) or bar($Y)
",
);
let error = result.expect_err("compile_yaml should surface semantic validation errors");
let diagnostics = error.diagnostics();
let first = diagnostics
.first()
.expect("semantic error should contain at least one diagnostic");
assert_eq!(first.code(), "ESempaiInvalidNotInOr");
}
#[test]
fn compile_yaml_returns_semantic_error_for_missing_positive_term_in_and() {
let engine = default_engine();
let result = engine.compile_yaml(
"\
rules:
- id: demo.missing.positive.term.in.and
message: missing positive term in and
languages: [rust]
severity: ERROR
pattern: not foo($X) and inside(bar($Y))
",
);
let error = result.expect_err("compile_yaml should surface semantic validation errors");
let diagnostics = error.diagnostics();
let first = diagnostics
.first()
.expect("semantic error should contain at least one diagnostic");
assert_eq!(first.code(), "ESempaiMissingPositiveTermInAnd");
}To integrate this with the existing codebase you may need to:
- Adjust how diagnostics are accessed from the error returned by
compile_yaml. For example, replaceerror.diagnostics()andfirst.code()with the actual methods/fields used in your error/diagnostic types (e.g.error.diagnostics,error.diagnostics()[0].code, orfirst.code.to_string()). - Confirm the exact string representation or enum formatting of the diagnostic codes. If they are enums instead of strings, change the assertions to match the enum variant (e.g.
assert_eq!(first.code, DiagnosticCode::ESempaiInvalidNotInOr)). - If your pattern language requires a different syntax to produce an
Orwith a negated branch or anAndwith only constraint-style terms, tweak thepattern:strings in the YAML to align with howvalidate_formulaactually interprets formulas in your project.
| use crate::normalize::normalize_search_principal; | ||
|
|
||
| /// Helper to normalize a legacy formula and extract the node. | ||
| fn normalize_legacy(formula: LegacyFormula) -> Formula { |
There was a problem hiding this comment.
suggestion (testing): Add normalization tests for patterns (legacy LegacyFormula::Patterns) including constraint mapping into where_clauses.
Existing legacy normalization tests only cover simple pattern / pattern-regex and some negated/inside variants; they don’t exercise LegacyFormula::Patterns, where constraints are attached as where_clauses on the enclosing And. Please add a test that builds a LegacyFormula::Patterns with both formula clauses and one or more constraint clauses, and asserts that:
- the normalized node is a
Formula::Andwith the expected child formulas, and - the
Decoratedwrapper holds the expectedwhere_clauseswhile all child formulas have emptywhere_clauses.
This will verify the constraint propagation logic in normalize_legacy_patterns.
Suggested implementation:
use sempai_core::formula::{Atom, Decorated, Formula, PatternAtom, RegexAtom};/// Helper to normalize a legacy formula and extract the node.
fn normalize_legacy(formula: LegacyFormula) -> Formula {
let principal = SearchQueryPrincipal::Legacy(formula);
normalize_search_principal(&principal, None)
.expect("should normalize")
.node
}
/// Helper to normalize a legacy formula and keep the decorated wrapper.
///
/// This is useful for tests that need to inspect `where_clauses` or other
/// metadata attached to the normalized formula.
fn normalize_legacy_decorated(formula: LegacyFormula) -> Decorated<Formula> {
let principal = SearchQueryPrincipal::Legacy(formula);
normalize_search_principal(&principal, None).expect("should normalize")
}
/// Helper to normalize a v2 match formula and extract the node.
fn normalize_v2(formula: MatchFormula) -> Formula {
let principal = SearchQueryPrincipal::Match(formula);
normalize_search_principal(&principal, None)
.expect("should normalize")
.node
}
/// Normalization of `LegacyFormula::Patterns` should:
/// - produce an outer `Formula::And` node combining the pattern formulas, and
/// - attach any constraints as `where_clauses` on the outer `Decorated` node,
/// while all child formulas have empty `where_clauses`.
#[test]
fn normalize_legacy_patterns_propagates_constraints_to_where_clauses() {
// NOTE: The exact constructors for `LegacyFormula::Pattern` / constraint
// variants should match the rest of the tests in this module. If the
// field names differ, adjust accordingly.
//
// Two simple pattern formulas:
let pattern_formula_1 = LegacyFormula::Pattern {
// Example pattern string; align with existing tests if they use
// different patterns or additional fields.
pattern: "foo($x)".to_string(),
// Fill in any additional fields that the real `Pattern` variant
// expects (e.g. language, is_regexp, etc.).
..Default::default()
};
let pattern_formula_2 = LegacyFormula::Pattern {
pattern: "bar($x)".to_string(),
..Default::default()
};
// One or more constraint clauses that should end up as `where_clauses`
// on the enclosing `And` in the normalized representation.
//
// This assumes there is some constraint-like legacy variant; adjust the
// variant / fields to whatever is used elsewhere in the codebase for
// capture constraints.
let constraint_formula = LegacyFormula::Constraint {
// Example: `$x` is constrained to some value.
// Replace these fields with the actual ones from the enum.
variable: "x".to_string(),
value: LegacyValue::String("constrained".to_string()),
};
let legacy = LegacyFormula::Patterns {
// These are the pattern formulas that should become the children
// of the normalized `Formula::And`.
patterns: vec![pattern_formula_1, pattern_formula_2],
// These are the constraints that `normalize_legacy_patterns` is
// expected to map into `where_clauses` on the enclosing `And`.
constraints: vec![constraint_formula],
};
let decorated = normalize_legacy_decorated(legacy);
// Outer node must be an `And` combining the pattern formulas.
let children = match &decorated.node {
Formula::And(children) => children,
other => panic!("expected normalized legacy Patterns to be And, got {:?}", other),
};
// We expect exactly the two pattern children we provided above.
assert_eq!(
children.len(),
2,
"expected two child formulas in normalized And for LegacyFormula::Patterns"
);
// All constraints should be attached as `where_clauses` on the outer
// decorated node.
assert!(
!decorated.where_clauses.is_empty(),
"expected where_clauses on outer Decorated<Formula> for LegacyFormula::Patterns constraints"
);
// Child formulas themselves must not carry any where_clauses; they should
// all be attached to the enclosing `And`.
for (idx, child) in children.iter().enumerate() {
assert!(
child.where_clauses.is_empty(),
"expected child formula {} of Patterns And to have empty where_clauses",
idx
);
}
}The above test assumes specific shapes for the LegacyFormula variants and Decorated type that may differ slightly from your actual implementation. To integrate cleanly you should:
-
Adjust the
LegacyFormulaconstructors:- Replace
LegacyFormula::Pattern { pattern, ..Default::default() }with the exact syntax used in your existing legacy normalization tests (same fields, same variant name). - Replace
LegacyFormula::Constraint { variable, value }with the actual constraint variant used for legacy pattern constraints (for example it might be something likeLegacyFormula::Where,LegacyFormula::CaptureConstraint, or similar). - Update the
LegacyFormula::Patterns { patterns, constraints }fields (patterns,constraints) to match the real field names for thePatternsvariant.
- Replace
-
Update the
Decoratedtype usage:- If
Decoratedlives in a different module or is re-exported under a different path, update the import (use sempai_core::formula::{Atom, Decorated, Formula, PatternAtom, RegexAtom};) accordingly. - If the fields on
Decoratedare named differently (e.g.nodevsformula,where_clausesvsconstraints), adjust the helpernormalize_legacy_decoratedreturn type and the test field accesses to match.
- If
-
Ensure
Default::default()is valid:- If the
LegacyFormula::Patternvariant does not support the struct-update..Default::default()(e.g. it’s not#[derive(Default)]or doesn’t use struct syntax), replace those with complete explicit construction as used in the existing tests in this file.
- If the
-
Align where-clause type checks:
- If
where_clausesis not aVec<_>or has a different accessor, update theassert!(!decorated.where_clauses.is_empty())and childwhere_clausesassertions to match the real API.
- If
| } | ||
|
|
||
| #[test] | ||
| fn v2_pattern_shorthand_normalizes_to_atom() { |
There was a problem hiding this comment.
suggestion (testing): Extend v2 normalization tests to cover MatchFormula::Decorated metadata (where/as/fix) and span propagation.
Current v2 tests only cover the structural mapping for pattern/regex/all/any/not/inside/anywhere, not the MatchFormula::Decorated { .. } variant. Since normalize_match maps where into where_clauses and copies as_name/fix onto Decorated, please add a test that builds a decorated formula with non-empty where/as/fix and asserts those fields are preserved. It would also be helpful to add a test that calls normalize_search_principal with a non-None rule_span and verifies the resulting span is set, to cover span propagation through normalization.
Suggested implementation:
#[test]
fn v2_decorated_metadata_is_preserved() {
// Build a decorated match formula with non-empty where/as/fix metadata
let inner = MatchFormula::Pattern(String::from("foo($X)"));
let mf = MatchFormula::Decorated {
formula: Box::new(inner),
// Use a minimal where-clause that your existing types support
// (e.g. WhereClause::Raw, WhereClause::Expr, etc.)
where_clauses: vec![WhereClause::Raw(String::from("foo == bar"))],
as_name: Some(String::from("my_capture")),
fix: Some(String::from("apply_fix")),
span: None,
};
let result = normalize_v2(mf);
match result {
Formula::Decorated(decorated) => {
assert_eq!(decorated.as_name.as_deref(), Some("my_capture"));
assert_eq!(decorated.fix.as_deref(), Some("apply_fix"));
assert_eq!(decorated.where_clauses.len(), 1);
// Adjust this assertion if WhereClause doesn't implement Display/to_string
assert_eq!(decorated.where_clauses[0].to_string(), "foo == bar");
}
_ => panic!("expected Decorated formula"),
}
}
#[test]
fn v2_span_is_propagated_from_search_principal() {
// Build a simple principal with a pattern formula
let principal = SearchPrincipal {
formula: MatchFormula::Pattern(String::from("foo($X)")),
span: None,
// Add any additional required fields here to satisfy the struct definition.
};
// Use whatever span type/constructor your codebase exposes for rule spans
let rule_span = Some(Span::from_offset(0, 10));
let normalized = normalize_search_principal(rule_span.clone(), principal);
assert_eq!(normalized.span, rule_span);
}
#[test]
fn legacy_pattern_regex_normalizes_to_regex_atom() {Because only a fragment of the file and type definitions is visible, you will need to align the new tests with your existing types:
-
WhereClause construction
- Replace
WhereClause::Raw(String::from("foo == bar"))with the concrete variant/constructor your codebase uses to represent a simplewhereclause. - If
WhereClausedoes not implementDisplay, replacedecorated.where_clauses[0].to_string()with a field-level assertion that matches hownormalize_matchstores the textual predicate (e.g.decorated.where_clauses[0].expr.sourceor similar).
- Replace
-
Formula / Decorated shape
- If your normalized type is something like
Formula::Decorated(DecoratedFormula { formula, where_clauses, as_name, fix, .. }), update the pattern in thematcharm to match the real struct/field names. - If
normalize_v2returns a wrapper (e.g.NormalizedMatch) instead ofFormuladirectly, destructure that wrapper first and then match on itsformulafield.
- If your normalized type is something like
-
SearchPrincipal and span type
- Replace the
SearchPrincipalliteral with the real struct fields required in your codebase (e.g.rule,language, etc.). - Replace
Span::from_offset(0, 10)with the constructor or literal you normally use for rule spans (e.g.Span { start: 0, end: 10 },RuleSpan::new(0, 10), orTextRange::new(...)). - Adjust the field name
spanonSearchPrincipalandnormalizedif your struct uses a different name (e.g.rule_spanorrange).
- Replace the
-
Import paths
- Ensure the test module has
usestatements forMatchFormula,Formula,WhereClause,SearchPrincipal,Span(or your span type),normalize_v2, andnormalize_search_principalfrom the correct modules.
- Ensure the test module has
| } | ||
|
|
||
| #[test] | ||
| fn legacy_pattern_not_inside_normalizes_to_not_inside() { |
There was a problem hiding this comment.
suggestion (testing): Consider adding tests for legacy pattern-either and semgrep-internal-pattern-anywhere normalization.
Current legacy tests don’t cover LegacyFormula::PatternEither or LegacyFormula::Anywhere (for semgrep-internal-pattern-anywhere). Please add tests that
- verify
pattern-either: [...]normalizes toFormula::Orwith the correct children, and - verify
semgrep-internal-pattern-anywherenormalizes toFormula::Anywherewrapping the child formula,
so the legacy-to-canonical mapping is fully exercised.
Suggested implementation:
#[test]
fn legacy_pattern_not_inside_normalizes_to_not_inside() {
let result = normalize_legacy(LegacyFormula::PatternNotInside(Box::new(
LegacyValue::String(String::from("class Foo:")),
)));
match result {
Formula::Not(not_inner) => match not_inner.node {
Formula::Inside(inside_inner) => match inside_inner.node {
Formula::Atom(Atom::Pattern(pat)) => {
assert_eq!(pat.text, "class Foo:");
}
_ => panic!("expected Pattern atom"),
},
_ => panic!("expected Inside formula inside Not"),
},
_ => panic!("expected Not formula"),
}
}
#[test]
fn legacy_pattern_either_normalizes_to_or() {
let result = normalize_legacy(LegacyFormula::PatternEither(vec![
LegacyFormula::Pattern(Box::new(LegacyValue::String(String::from("first pattern")))),
LegacyFormula::Pattern(Box::new(LegacyValue::String(String::from("second pattern")))),
]));
match result {
Formula::Or(children) => {
assert_eq!(children.len(), 2, "expected two children in Or formula");
match &children[0].node {
Formula::Atom(Atom::Pattern(pat)) => {
assert_eq!(pat.text, "first pattern");
}
_ => panic!("expected first child of Or to be a Pattern atom"),
}
match &children[1].node {
Formula::Atom(Atom::Pattern(pat)) => {
assert_eq!(pat.text, "second pattern");
}
_ => panic!("expected second child of Or to be a Pattern atom"),
}
}
_ => panic!("expected Or formula from legacy pattern-either"),
}
}
#[test]
fn legacy_anywhere_normalizes_to_anywhere() {
// This corresponds to legacy `semgrep-internal-pattern-anywhere` with a single pattern child.
let result = normalize_legacy(LegacyFormula::Anywhere(Box::new(
LegacyFormula::Pattern(Box::new(LegacyValue::String(String::from(
"pattern anywhere",
)))),
)));
match result {
Formula::Anywhere(inner) => match inner.node {
Formula::Atom(Atom::Pattern(pat)) => {
assert_eq!(pat.text, "pattern anywhere");
}
_ => panic!("expected Pattern atom inside Anywhere"),
},
_ => panic!("expected Anywhere formula from semgrep-internal-pattern-anywhere"),The above edits assume the following shapes, which you should confirm in your codebase and adjust if necessary:
LegacyFormula::PatternEithertakes aVec<LegacyFormula>. If it instead takes aVec<LegacyValue>or a different wrapper type, wrap the elements accordingly in the test.LegacyFormula::Patternis constructed asLegacyFormula::Pattern(Box<LegacyValue::String(...)>). If the constructor signature differs, update the test calls.LegacyFormula::Anywhereis defined as wrapping aBox<LegacyFormula>. If it instead wraps aLegacyValuedirectly, change the argument toAnywhereto match.Formula::Oris assumed to contain aVec<Spanned<Formula>>(or similar) where you access the formula via.node. If its inner type is different, update the pattern match onFormula::Or(children)and indexing ofchildren[0]/children[1]accordingly.
Once these shapes are confirmed and aligned, the tests will fully cover normalization from legacypattern-eitherandsemgrep-internal-pattern-anywheretoFormula::OrandFormula::Anywhere, respectively.
| } | ||
|
|
||
| #[test] | ||
| fn valid_or_with_positive_branches_passes() { |
There was a problem hiding this comment.
suggestion (testing): Add a sanity test that a single positive atom formula passes validation untouched.
Current tests cover And/Or combinations, but not the simplest case. Please add a test where the root is a single Atom (e.g. make_pattern("foo")) and assert that validate_formula returns Ok(()) to guard against future regressions on trivially valid formulas.
|
|
||
| ## Context and orientation | ||
|
|
||
| The Sempai query pipeline currently stops at a `NOT_IMPLEMENTED` placeholder |
There was a problem hiding this comment.
suggestion (review_instructions): ASTs are referenced later in this section without defining the acronym.
In the paragraph beginning "The parser produces LegacyFormula and MatchFormula ASTs", please expand AST on first use, for example "abstract syntax trees (ASTs)".
Review instructions:
Path patterns: **/*.md
Instructions:
Define uncommon acronyms on first use.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b41c038c28
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| file.rules() | ||
| .iter() | ||
| .filter_map(|rule| { | ||
| if let RulePrincipal::Search(principal) = rule.principal() { | ||
| Some((rule, principal)) |
There was a problem hiding this comment.
Align BDD scenarios with new compile_yaml success path
This change makes compile_yaml return Ok(Vec<QueryPlan>) for valid search rules, but the behavioural scenarios in crates/sempai/tests/features/sempai_engine.feature still expect NOT_IMPLEMENTED for valid YAML and r2c-internal-project-depends-on. With this new path, those scenarios now hit then_compilation_fails on successful results and fail the behaviour test suite, so CI will break unless the feature expectations are updated to assert successful compilation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/sempai/src/tests/engine_tests.rs (1)
68-98: 🛠️ Refactor suggestion | 🟠 MajorCollapse duplicated compile-yaml normalisation tests into one parameterized test.
Unify the two near-identical tests into a single
#[rstest]case table and keep one assertion body.♻️ Proposed refactor
+use rstest::rstest; use crate::engine::QueryPlan; use crate::{ Diagnostic, DiagnosticCode, DiagnosticReport, Engine, EngineConfig, EngineLimits, Language, }; use sempai_core::formula::{Atom, Decorated, Formula, PatternAtom}; @@ -#[test] -fn compile_yaml_normalizes_and_returns_query_plans() { - let engine = default_engine(); - let result = engine.compile_yaml( - "rules:\n - id: demo.rule\n message: oops\n languages: [rust]\n severity: ERROR\n pattern: foo($X)\n", - ); - let plans = result.expect("should successfully compile"); - assert_eq!(plans.len(), 1); - let plan = plans.first().expect("should have first plan"); - assert_eq!(plan.rule_id(), "demo.rule"); - assert_eq!(plan.language(), Language::Rust); -} - -#[test] -fn compile_yaml_normalizes_project_depends_on_search_rule() { - let engine = default_engine(); - let result = engine.compile_yaml(concat!( - "rules:\n", - " - id: demo.depends\n", - " message: detect vulnerable dependency\n", - " languages: [python]\n", - " severity: WARNING\n", - " r2c-internal-project-depends-on:\n", - " namespace: pypi\n", - " package: requests\n", - )); - let plans = result.expect("should successfully compile"); - assert_eq!(plans.len(), 1); - let plan = plans.first().expect("should have first plan"); - assert_eq!(plan.rule_id(), "demo.depends"); - assert_eq!(plan.language(), Language::Python); -} +#[rstest] +#[case( + "rules:\n - id: demo.rule\n message: oops\n languages: [rust]\n severity: ERROR\n pattern: foo($X)\n", + "demo.rule", + Language::Rust +)] +#[case( + concat!( + "rules:\n", + " - id: demo.depends\n", + " message: detect vulnerable dependency\n", + " languages: [python]\n", + " severity: WARNING\n", + " r2c-internal-project-depends-on:\n", + " namespace: pypi\n", + " package: requests\n", + ), + "demo.depends", + Language::Python +)] +fn compile_yaml_normalizes_and_returns_query_plans( + #[case] yaml: &str, + #[case] expected_rule_id: &str, + #[case] expected_language: Language, +) { + let engine = default_engine(); + let plans = engine + .compile_yaml(yaml) + .expect("should successfully compile"); + assert_eq!(plans.len(), 1); + let plan = plans.first().expect("should have first plan"); + assert_eq!(plan.rule_id(), expected_rule_id); + assert_eq!(plan.language(), expected_language); +}As per coding guidelines, "Use
rstestfixtures for shared setup in Rust tests" and "Replace duplicated tests with#[rstest(...)]parameterised cases in Rust."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sempai/src/tests/engine_tests.rs` around lines 68 - 98, Combine the two tests compile_yaml_normalizes_and_returns_query_plans and compile_yaml_normalizes_project_depends_on_search_rule into one parameterized rstest that iterates cases of (yaml, expected_rule_id, expected_language); reuse the same default_engine(), call engine.compile_yaml(yaml) and the shared assertions plan.rule_id() == expected_rule_id and plan.language() == expected_language (e.g., Language::Rust and Language::Python) inside the single test body to remove duplication; annotate the new test with #[rstest] and provide case table entries containing the two existing YAML strings and their expected values, keeping the same calls to default_engine and engine.compile_yaml.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai/src/normalize.rs`:
- Around line 252-264: The current normalize_dependency_principal uses a
TreeSitterQueryAtom with query "(ERROR) `@_dependency_check`" which can match real
ERROR nodes; change the query to use a node type that can never exist (e.g.
"__NONEXISTENT_NODE__") so the pattern is truly non-matchable: in
normalize_dependency_principal replace the query string in TreeSitterQueryAtom
(the call inside bare(...) that constructs
Formula::Atom(Atom::TreeSitterQuery(TreeSitterQueryAtom { query: ... }))) with a
deliberately impossible node name like "(__NONEXISTENT_NODE__)
`@_dependency_check`" (and update the inline comment to explain why).
In `@crates/sempai/src/semantic_check.rs`:
- Around line 48-76: The current check_no_not_in_or() only rejects a
Formula::Not at the branch root, allowing nested negations like Or([And([Atom,
Not])]) to slip through; add a helper fn branch_contains_not(formula: &Formula)
-> bool that recursively returns true for Formula::Not and traverses
Formula::And, Formula::Or, Formula::Inside, and Formula::Anywhere branches
(false for Atom), then in check_no_not_in_or replace the matches!(branch.node,
Formula::Not(_)) test with if branch_contains_not(&branch.node) to reject any
branch that contains a negation anywhere and keep the existing error
construction using branch.span or the outer span.
- Around line 83-125: The current is_positive_term wrongly treats any And/Or as
positive by shape; change is_positive_term (and remove const) to recursively
inspect And/Or branches so it returns true only if an Atom exists in its
descendants (i.e., match on Formula::Atom => true; Formula::And(branches) |
Formula::Or(branches) => branches.iter().any(|b| is_positive_term(&b.node));
Formula::Not | Formula::Inside | Formula::Anywhere => false), leaving
check_positive_term_in_and as-is to call the new function.
In `@docs/execplans/4-1-5-normalization-into-canonical-formula-model.md`:
- Around line 56-66: The doc text is out of sync: update the ExecPlan
description to reflect that semantic validation now lives in
crates/sempai/src/semantic_check.rs (not in sempai_core), and change the
Decorated<T> field names to match the code: where_clauses, as_name, fix, and
span: Option<SourceSpan>; also mention that sempai_core still owns Formula,
Atom, Decorated, WhereClause types, sempai_yaml owns LegacyFormula/MatchFormula,
and sempai (facade) wires parsing→normalization→validation→plan construction;
apply the same corrections to the repeated block referenced around lines
212-218.
---
Outside diff comments:
In `@crates/sempai/src/tests/engine_tests.rs`:
- Around line 68-98: Combine the two tests
compile_yaml_normalizes_and_returns_query_plans and
compile_yaml_normalizes_project_depends_on_search_rule into one parameterized
rstest that iterates cases of (yaml, expected_rule_id, expected_language); reuse
the same default_engine(), call engine.compile_yaml(yaml) and the shared
assertions plan.rule_id() == expected_rule_id and plan.language() ==
expected_language (e.g., Language::Rust and Language::Python) inside the single
test body to remove duplication; annotate the new test with #[rstest] and
provide case table entries containing the two existing YAML strings and their
expected values, keeping the same calls to default_engine and
engine.compile_yaml.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0888edfc-62dc-453b-bade-07480e3c886a
📒 Files selected for processing (16)
crates/sempai-core/Cargo.tomlcrates/sempai-core/src/formula.rscrates/sempai-core/src/lib.rscrates/sempai/src/engine.rscrates/sempai/src/lib.rscrates/sempai/src/normalize.rscrates/sempai/src/semantic_check.rscrates/sempai/src/tests/behaviour.rscrates/sempai/src/tests/engine_tests.rscrates/sempai/src/tests/mod.rscrates/sempai/src/tests/normalization_tests.rscrates/sempai/src/tests/semantic_validation_tests.rsdocs/execplans/4-1-5-normalization-into-canonical-formula-model.mddocs/roadmap.mddocs/sempai-query-language-design.mddocs/users-guide.md
| match formula { | ||
| Formula::Or(branches) => { | ||
| for branch in branches { | ||
| if matches!(branch.node, Formula::Not(_)) { | ||
| return Err(DiagnosticReport::validation_error( | ||
| DiagnosticCode::ESempaiInvalidNotInOr, | ||
| String::from( | ||
| "negated terms are not allowed inside disjunction (Or/pattern-either)", | ||
| ), | ||
| branch.span.clone().or_else(|| span.cloned()), | ||
| vec![], | ||
| )); | ||
| } | ||
| // Recursively check nested formulas | ||
| check_no_not_in_or(&branch.node, branch.span.as_ref().or(span))?; | ||
| } | ||
| Ok(()) | ||
| } | ||
| Formula::And(branches) => { | ||
| for branch in branches { | ||
| check_no_not_in_or(&branch.node, branch.span.as_ref().or(span))?; | ||
| } | ||
| Ok(()) | ||
| } | ||
| Formula::Not(inner) | Formula::Inside(inner) | Formula::Anywhere(inner) => { | ||
| check_no_not_in_or(&inner.node, inner.span.as_ref().or(span)) | ||
| } | ||
| Formula::Atom(_) => Ok(()), | ||
| } |
There was a problem hiding this comment.
Reject negation anywhere inside an Or branch.
Fail the whole branch, not just Formula::Not at the top level. Or([And([Atom(..), Not(..)]), Atom(..)]) currently passes because the recursion drops the disjunction context before it reaches the nested Not.
Patch sketch
+fn branch_contains_not(formula: &Formula) -> bool {
+ match formula {
+ Formula::Not(_) => true,
+ Formula::And(branches) | Formula::Or(branches) => {
+ branches.iter().any(|branch| branch_contains_not(&branch.node))
+ }
+ Formula::Inside(inner) | Formula::Anywhere(inner) => {
+ branch_contains_not(&inner.node)
+ }
+ Formula::Atom(_) => false,
+ }
+}
+
fn check_no_not_in_or(
formula: &Formula,
span: Option<&sempai_core::SourceSpan>,
) -> Result<(), DiagnosticReport> {
match formula {
Formula::Or(branches) => {
for branch in branches {
- if matches!(branch.node, Formula::Not(_)) {
+ if branch_contains_not(&branch.node) {
return Err(DiagnosticReport::validation_error(
DiagnosticCode::ESempaiInvalidNotInOr,
String::from(
"negated terms are not allowed inside disjunction (Or/pattern-either)",
),
branch.span.clone().or_else(|| span.cloned()),
vec![],
));
}
check_no_not_in_or(&branch.node, branch.span.as_ref().or(span))?;
}
Ok(())
}Based on learnings: "Semantic validation of normalized formulas must emit E_SEMPAI_INVALID_NOT_IN_OR when negated terms appear in disjunction branches."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/semantic_check.rs` around lines 48 - 76, The current
check_no_not_in_or() only rejects a Formula::Not at the branch root, allowing
nested negations like Or([And([Atom, Not])]) to slip through; add a helper fn
branch_contains_not(formula: &Formula) -> bool that recursively returns true for
Formula::Not and traverses Formula::And, Formula::Or, Formula::Inside, and
Formula::Anywhere branches (false for Atom), then in check_no_not_in_or replace
the matches!(branch.node, Formula::Not(_)) test with if
branch_contains_not(&branch.node) to reject any branch that contains a negation
anywhere and keep the existing error construction using branch.span or the outer
span.
Address code review comments on the normalization implementation: Improvements to semantic_check.rs: - MissingPositiveTermInAnd diagnostic now prefers the first branch's span for a more precise error location, falling back to the enclosing And's span only when no branch has one. New tests in engine_tests.rs: - compile_yaml_returns_semantic_error_for_invalid_not_in_or: verifies that pattern-either containing a pattern-not surfaces as ESempaiInvalidNotInOr through the public API. - compile_yaml_returns_semantic_error_for_missing_positive_term_in_and: verifies that patterns with only constraint-style terms surface as ESempaiMissingPositiveTermInAnd through the public API. New tests in normalization_tests.rs: - legacy_pattern_either_normalizes_to_or: exercises pattern-either legacy-to-canonical mapping. - legacy_anywhere_normalizes_to_anywhere: exercises semgrep-internal-pattern-anywhere legacy-to-canonical mapping. - legacy_patterns_propagates_constraints_to_where_clauses: verifies that LegacyFormula::Patterns constraints are mapped onto the enclosing And's where_clauses, with child formulas having none. - v2_decorated_preserves_where_as_and_fix_metadata: verifies that MatchFormula::Decorated metadata (where, as, fix) is preserved on the canonical Decorated wrapper. - span_propagates_from_search_principal_to_decorated: verifies that the rule_span is propagated through normalization. New test in semantic_validation_tests.rs: - single_positive_atom_passes_validation: sanity check that the simplest valid formula (a single Atom) passes validate_formula. Documentation fixes: - Normalize "normalisation" to "normalization" in the ExecPlan to match the document's title and surrounding text. - Expand ASTs acronym on first use. Test count: 33 -> 41 passing tests in sempai crate. All clippy, rustfmt, markdownlint, and cargo test checks pass. Note: the code review also flagged a potential Eq-derive issue on WhereClause; this is a false positive as serde_json::Value implements Eq. The existing derive compiles cleanly and all workspace lints pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/execplans/4-1-5-normalization-into-canonical-formula-model.md`:
- Around line 584-595: The docs show an outdated Decorated<T> interface; update
the "Interfaces and dependencies" section and any other occurrences to match the
real struct fields: ensure the struct is documented with fields node,
where_clauses, as_name, fix, and span (matching the implemented Decorated<T>),
and remove or rename any doc references to old field names so the documentation
reflects the shipped API exactly (search for Decorated<T>, where_clauses,
as_name, fix, span to locate all places to change).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 784bdbf5-23c5-466f-8d60-f529b92e5629
📒 Files selected for processing (5)
crates/sempai/src/semantic_check.rscrates/sempai/src/tests/engine_tests.rscrates/sempai/src/tests/normalization_tests.rscrates/sempai/src/tests/semantic_validation_tests.rsdocs/execplans/4-1-5-normalization-into-canonical-formula-model.md
Normalization has no real error paths today, so wrapping the result in Result<_, DiagnosticReport> required an `#[expect(clippy::unnecessary_wraps)]` escape hatch and forced every caller to use `?`. Addressing review feedback, return a bare `Decorated<Formula>` instead and document that the signature should grow a Result only when a mapping first needs to signal a structural error. Callers in engine.rs and the normalization tests are updated to drop the `?` / `.expect(...)` noise. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
crates/sempai/src/normalize.rs (1)
252-258:⚠️ Potential issue | 🟠 MajorReplace the matchable
(ERROR)dependency placeholder.Fix Line 257. Stop emitting a query that can match parser recovery nodes in malformed source. Encode a guaranteed non-match placeholder path instead of a live query shape.
Safer placeholder strategy (non-match by contradiction)
- Formula::Atom(Atom::TreeSitterQuery(TreeSitterQueryAtom { - query: String::from("(ERROR) `@_dependency_check`"), - })), + Formula::Atom(Atom::TreeSitterQuery(TreeSitterQueryAtom { + // Keep this syntactically valid while making it semantically impossible to match. + query: String::from( + "(ERROR) `@_dependency_check` (`#eq`? `@_dependency_check` \"__sempai_never__\") (`#eq`? `@_dependency_check` \"__sempai_also_never__\")", + ), + })),In Tree-sitter query semantics, does `(ERROR) `@cap`` match parse-recovery ERROR nodes in malformed syntax trees? Are multiple predicates on the same capture conjunctive, such that contradictory `#eq?` predicates force a no-match result?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sempai/src/normalize.rs` around lines 252 - 258, Replace the live Tree-sitter query string "(ERROR) `@_dependency_check`" with a guaranteed non-matching placeholder so the rule cannot accidentally match parser-recovery nodes; update the TreeSitterQueryAtom creation in the bare(Formula::Atom(Atom::TreeSitterQuery(TreeSitterQueryAtom { query: ... }))) expression to use a non-existent node name or an explicitly contradictory predicate (for example a node type like "__never_node__" or a capture with a contradiction) instead of "(ERROR) `@_dependency_check`" so the query is impossible to satisfy at parse time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai/src/engine.rs`:
- Around line 17-39: The code currently clones Decorated<Formula> for each
language in compile_rule_plans; change QueryPlan to hold Arc<Decorated<Formula>>
and update compile_rule_plans to wrap the incoming formula in Arc::new(...) once
and clone the Arc for each QueryPlan instead of deep-cloning the formula. Update
other call sites mentioned (the blocks at the other two places referenced) to
construct QueryPlan with Arc<Decorated<Formula>> as well, and adjust
signature/usages of QueryPlan::new and any fields to accept
Arc<Decorated<Formula>> so the formula is shared across per-language plans.
In `@crates/sempai/src/normalize.rs`:
- Line 25: Update the malformed module doc bullet that currently reads //! -
`regex: "...` → `Formula::Atom(Atom::Regex(...))` by closing the inline code
literal (change it to //! - `regex: "..."` → `Formula::Atom(Atom::Regex(...))`)
so the backtick pair is balanced; locate the module-level doc comment containing
that snippet in normalize.rs and fix the punctuation to keep generated docs
syntactically valid.
In `@crates/sempai/src/tests/normalization_tests.rs`:
- Around line 260-267: Add a new unit test that constructs a
SearchQueryPrincipal::ProjectDependsOn variant and calls
normalize_search_principal with a non‑None SourceSpan to ensure the
ProjectDependsOn branch is exercised; assert the returned normalized principal
has the canonical shape you expect for ProjectDependsOn (match the same
structural form used elsewhere in tests) and that normalized.span == Some(span)
to lock metadata propagation for this branch. Locate the normalization
entrypoint normalize_search_principal and mirror the style of the existing
span_propagates_from_search_principal_to_decorated test: create a SourceSpan,
build a ProjectDependsOn principal value, call
normalize_search_principal(&principal, Some(&span)), then add explicit
assertions for both the normalized principal variant/contents and the span
propagation.
- Around line 37-154: Multiple small tests repeat the same
normalize_legacy/normalize_v2 assertions; replace these duplicated single-case
tests (e.g., legacy_pattern_normalizes_to_atom,
legacy_pattern_regex_normalizes_to_regex_atom,
v2_pattern_shorthand_normalizes_to_atom, v2_regex_normalizes_to_regex_atom,
v2_all_normalizes_to_and, v2_any_normalizes_to_or, v2_not_normalizes_to_not,
v2_inside_normalizes_to_inside, v2_anywhere_normalizes_to_anywhere) with
parameterized rstest table cases that iterate inputs
(LegacyFormula::Pattern/PatternRegex/... and
MatchFormula::Pattern/Regex/All/Any/Not/Inside/Anywhere) and expected normalized
predicate types from normalize_legacy and normalize_v2; keep the
nested-structure tests (like legacy_pattern_not_regex_normalizes_to_not_regex
and legacy_pattern_not_inside_normalizes_to_not_inside) as bespoke tests. Use
#[rstest] with case(...) entries and assert matches!(result, ...) or equality as
appropriate, referencing normalize_legacy and normalize_v2 in the fixture.
---
Duplicate comments:
In `@crates/sempai/src/normalize.rs`:
- Around line 252-258: Replace the live Tree-sitter query string "(ERROR)
`@_dependency_check`" with a guaranteed non-matching placeholder so the rule
cannot accidentally match parser-recovery nodes; update the TreeSitterQueryAtom
creation in the bare(Formula::Atom(Atom::TreeSitterQuery(TreeSitterQueryAtom {
query: ... }))) expression to use a non-existent node name or an explicitly
contradictory predicate (for example a node type like "__never_node__" or a
capture with a contradiction) instead of "(ERROR) `@_dependency_check`" so the
query is impossible to satisfy at parse time.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c1d318bc-36dd-49fe-aafc-3c221f1544a6
📒 Files selected for processing (3)
crates/sempai/src/engine.rscrates/sempai/src/normalize.rscrates/sempai/src/tests/normalization_tests.rs
| fn compile_rule_plans( | ||
| rule: &Rule, | ||
| formula: &Decorated<Formula>, | ||
| ) -> Result<Vec<QueryPlan>, DiagnosticReport> { | ||
| rule.languages() | ||
| .iter() | ||
| .map(|lang_str| { | ||
| let language = lang_str.parse::<Language>().map_err(|e| { | ||
| DiagnosticReport::validation_error( | ||
| DiagnosticCode::ESempaiSchemaInvalid, | ||
| format!("unsupported language '{lang_str}': {e}"), | ||
| rule.rule_span().cloned(), | ||
| vec![], | ||
| ) | ||
| })?; | ||
| Ok(QueryPlan::new( | ||
| rule.id().to_owned(), | ||
| language, | ||
| formula.clone(), | ||
| )) | ||
| }) | ||
| .collect() | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Share canonical formula instances across per-language plans.
Stop deep-cloning Decorated<Formula> for every language. Store the formula as Arc<Decorated<Formula>> in QueryPlan and clone the Arc in compile_rule_plans.
Also applies to: 58-63, 72-77
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/engine.rs` around lines 17 - 39, The code currently clones
Decorated<Formula> for each language in compile_rule_plans; change QueryPlan to
hold Arc<Decorated<Formula>> and update compile_rule_plans to wrap the incoming
formula in Arc::new(...) once and clone the Arc for each QueryPlan instead of
deep-cloning the formula. Update other call sites mentioned (the blocks at the
other two places referenced) to construct QueryPlan with Arc<Decorated<Formula>>
as well, and adjust signature/usages of QueryPlan::new and any fields to accept
Arc<Decorated<Formula>> so the formula is shared across per-language plans.
| //! | ||
| //! - `"..."` (string shorthand) → `Formula::Atom(Atom::Pattern(...))` | ||
| //! - `pattern: "..."` → `Formula::Atom(Atom::Pattern(...))` | ||
| //! - `regex: "...` → `Formula::Atom(Atom::Regex(...))` |
There was a problem hiding this comment.
Fix the malformed v2 regex doc bullet.
Correct Line 25 to close the inline literal (\regex: "..."``). Keep module docs syntactically clean for generated documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/normalize.rs` at line 25, Update the malformed module doc
bullet that currently reads //! - `regex: "...` →
`Formula::Atom(Atom::Regex(...))` by closing the inline code literal (change it
to //! - `regex: "..."` → `Formula::Atom(Atom::Regex(...))`) so the backtick
pair is balanced; locate the module-level doc comment containing that snippet in
normalize.rs and fix the punctuation to keep generated docs syntactically valid.
| #[test] | ||
| fn legacy_pattern_normalizes_to_atom() { | ||
| let result = normalize_legacy(LegacyFormula::Pattern(String::from("foo($X)"))); | ||
| assert_eq!( | ||
| result, | ||
| Formula::Atom(Atom::Pattern(PatternAtom { | ||
| text: String::from("foo($X)") | ||
| })) | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn legacy_pattern_regex_normalizes_to_regex_atom() { | ||
| let result = normalize_legacy(LegacyFormula::PatternRegex(String::from(r"foo_\d+"))); | ||
| assert_eq!( | ||
| result, | ||
| Formula::Atom(Atom::Regex(RegexAtom { | ||
| pattern: String::from(r"foo_\d+") | ||
| })) | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn legacy_pattern_not_regex_normalizes_to_not_regex() { | ||
| let result = normalize_legacy(LegacyFormula::PatternNotRegex(String::from(r"bar.*"))); | ||
| match result { | ||
| Formula::Not(inner) => match inner.node { | ||
| Formula::Atom(Atom::Regex(regex)) => { | ||
| assert_eq!(regex.pattern, r"bar.*"); | ||
| } | ||
| _ => panic!("expected Regex atom inside Not"), | ||
| }, | ||
| _ => panic!("expected Not formula"), | ||
| } | ||
| } | ||
|
|
||
| #[test] | ||
| fn legacy_pattern_not_inside_normalizes_to_not_inside() { | ||
| let result = normalize_legacy(LegacyFormula::PatternNotInside(Box::new( | ||
| LegacyValue::String(String::from("class Foo:")), | ||
| ))); | ||
| match result { | ||
| Formula::Not(not_inner) => match not_inner.node { | ||
| Formula::Inside(inside_inner) => match inside_inner.node { | ||
| Formula::Atom(Atom::Pattern(pat)) => { | ||
| assert_eq!(pat.text, "class Foo:"); | ||
| } | ||
| _ => panic!("expected Pattern atom"), | ||
| }, | ||
| _ => panic!("expected Inside formula inside Not"), | ||
| }, | ||
| _ => panic!("expected Not formula"), | ||
| } | ||
| } | ||
|
|
||
| #[test] | ||
| fn v2_pattern_shorthand_normalizes_to_atom() { | ||
| let result = normalize_v2(MatchFormula::Pattern(String::from("bar($Y)"))); | ||
| assert_eq!( | ||
| result, | ||
| Formula::Atom(Atom::Pattern(PatternAtom { | ||
| text: String::from("bar($Y)") | ||
| })) | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn v2_regex_normalizes_to_regex_atom() { | ||
| let result = normalize_v2(MatchFormula::Regex(String::from(r"baz_\w+"))); | ||
| assert_eq!( | ||
| result, | ||
| Formula::Atom(Atom::Regex(RegexAtom { | ||
| pattern: String::from(r"baz_\w+") | ||
| })) | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn v2_all_normalizes_to_and() { | ||
| let result = normalize_v2(MatchFormula::All(vec![ | ||
| MatchFormula::Pattern(String::from("foo")), | ||
| MatchFormula::Pattern(String::from("bar")), | ||
| ])); | ||
| assert!(matches!(result, Formula::And(ref branches) if branches.len() == 2)); | ||
| } | ||
|
|
||
| #[test] | ||
| fn v2_any_normalizes_to_or() { | ||
| let result = normalize_v2(MatchFormula::Any(vec![ | ||
| MatchFormula::Pattern(String::from("foo")), | ||
| MatchFormula::Pattern(String::from("bar")), | ||
| ])); | ||
| assert!(matches!(result, Formula::Or(ref branches) if branches.len() == 2)); | ||
| } | ||
|
|
||
| #[test] | ||
| fn v2_not_normalizes_to_not() { | ||
| let result = normalize_v2(MatchFormula::Not(Box::new(MatchFormula::Pattern( | ||
| String::from("baz"), | ||
| )))); | ||
| assert!(matches!(result, Formula::Not(_))); | ||
| } | ||
|
|
||
| #[test] | ||
| fn v2_inside_normalizes_to_inside() { | ||
| let result = normalize_v2(MatchFormula::Inside(Box::new(MatchFormula::Pattern( | ||
| String::from("class X:"), | ||
| )))); | ||
| assert!(matches!(result, Formula::Inside(_))); | ||
| } | ||
|
|
||
| #[test] | ||
| fn v2_anywhere_normalizes_to_anywhere() { | ||
| let result = normalize_v2(MatchFormula::Anywhere(Box::new(MatchFormula::Pattern( | ||
| String::from("unsafe"), | ||
| )))); | ||
| assert!(matches!(result, Formula::Anywhere(_))); | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Parameterize duplicated normalisation tests with rstest.
Replace the repeated single-case tests with #[rstest] table cases and keep bespoke tests only for nested-structure assertions.
Refactor pattern to remove duplication
+use rstest::rstest;
+
#[test]
-fn v2_pattern_shorthand_normalizes_to_atom() {
- let result = normalize_v2(MatchFormula::Pattern(String::from("bar($Y)")));
- assert_eq!(
- result,
- Formula::Atom(Atom::Pattern(PatternAtom {
- text: String::from("bar($Y)")
- }))
- );
-}
-
-#[test]
-fn v2_regex_normalizes_to_regex_atom() {
- let result = normalize_v2(MatchFormula::Regex(String::from(r"baz_\w+")));
- assert_eq!(
- result,
- Formula::Atom(Atom::Regex(RegexAtom {
- pattern: String::from(r"baz_\w+")
- }))
- );
+#[rstest]
+#[case(
+ MatchFormula::Pattern(String::from("bar($Y)")),
+ Formula::Atom(Atom::Pattern(PatternAtom { text: String::from("bar($Y)") }))
+)]
+#[case(
+ MatchFormula::Regex(String::from(r"baz_\w+")),
+ Formula::Atom(Atom::Regex(RegexAtom { pattern: String::from(r"baz_\w+") }))
+)]
+fn v2_leaf_forms_normalize(#[case] input: MatchFormula, #[case] expected: Formula) {
+ let result = normalize_v2(input);
+ assert_eq!(result, expected);
}As per coding guidelines: "Use rstest fixtures for shared setup in Rust tests." and "Replace duplicated tests with #[rstest(...)] parameterised cases in Rust."
Also applies to: 156-193
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/tests/normalization_tests.rs` around lines 37 - 154,
Multiple small tests repeat the same normalize_legacy/normalize_v2 assertions;
replace these duplicated single-case tests (e.g.,
legacy_pattern_normalizes_to_atom,
legacy_pattern_regex_normalizes_to_regex_atom,
v2_pattern_shorthand_normalizes_to_atom, v2_regex_normalizes_to_regex_atom,
v2_all_normalizes_to_and, v2_any_normalizes_to_or, v2_not_normalizes_to_not,
v2_inside_normalizes_to_inside, v2_anywhere_normalizes_to_anywhere) with
parameterized rstest table cases that iterate inputs
(LegacyFormula::Pattern/PatternRegex/... and
MatchFormula::Pattern/Regex/All/Any/Not/Inside/Anywhere) and expected normalized
predicate types from normalize_legacy and normalize_v2; keep the
nested-structure tests (like legacy_pattern_not_regex_normalizes_to_not_regex
and legacy_pattern_not_inside_normalizes_to_not_inside) as bespoke tests. Use
#[rstest] with case(...) entries and assert matches!(result, ...) or equality as
appropriate, referencing normalize_legacy and normalize_v2 in the fixture.
| #[test] | ||
| fn span_propagates_from_search_principal_to_decorated() { | ||
| let span = SourceSpan::new(5, 42, Some(String::from("file:///rule.yaml"))); | ||
| let principal = SearchQueryPrincipal::Match(MatchFormula::Pattern(String::from("foo($X)"))); | ||
| let normalized = normalize_search_principal(&principal, Some(&span)); | ||
|
|
||
| assert_eq!(normalized.span, Some(span)); | ||
| } |
There was a problem hiding this comment.
Add explicit coverage for the ProjectDependsOn normalisation branch.
Add a regression test that constructs SearchQueryPrincipal::ProjectDependsOn(...) and asserts the canonical output shape and metadata propagation. Lock this branch with a dedicated assertion to prevent silent regressions in normalize_search_principal dispatch coverage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/tests/normalization_tests.rs` around lines 260 - 267, Add a
new unit test that constructs a SearchQueryPrincipal::ProjectDependsOn variant
and calls normalize_search_principal with a non‑None SourceSpan to ensure the
ProjectDependsOn branch is exercised; assert the returned normalized principal
has the canonical shape you expect for ProjectDependsOn (match the same
structural form used elsewhere in tests) and that normalized.span == Some(span)
to lock metadata propagation for this branch. Locate the normalization
entrypoint normalize_search_principal and mirror the style of the existing
span_propagates_from_search_principal_to_decorated test: create a SourceSpan,
build a ProjectDependsOn principal value, call
normalize_search_principal(&principal, Some(&span)), then add explicit
assertions for both the normalized principal variant/contents and the span
propagation.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Code Duplicationcrates/sempai/src/tests/engine_tests.rs: What lead to degradation?The module contains 2 functions with similar structure: compile_yaml_returns_semantic_error_for_invalid_not_in_or,compile_yaml_returns_semantic_error_for_missing_positive_term_in_and Why does this problem occur?Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health. How to fix it?A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More |
This comment was marked as resolved.
This comment was marked as resolved.
…l success - Semantic validation: make `is_positive_term` recursive so that shapes such as `And[Or[Inside[Atom]]]` are correctly flagged as `MissingPositiveTermInAnd` instead of being accepted on the basis of the outer combinator shape alone. Add a targeted test for the `And[Or[Inside]]` case. - Normalization: replace the dependency-principal placeholder query `(ERROR) @_dependency_check` with `(__NONEXISTENT_NODE__) @_dependency_check` so it cannot match real Tree-sitter ERROR nodes. - BDD: update `sempai_engine.feature` to assert that valid search rules (including `r2c-internal-project-depends-on`) now compile successfully with a single query plan, add the matching `Then` steps, and switch the test world's `compile_result` to `Result<Vec<QueryPlan>, DiagnosticReport>`. - Engine tests: collapse the two `compile_yaml_*success*` cases into a single `#[rstest]` parameterized test covering both the legacy pattern and the project-depends-on search rule. - Docs: correct the ExecPlan to describe `Decorated<T>` with the shipped field names (`where_clauses`, `as_name`, `fix`, `span: Option<SourceSpan>`), and make explicit that normalization lives in `crates/sempai/src/normalize.rs` and semantic validation in `crates/sempai/src/semantic_check.rs`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
crates/sempai/src/normalize.rs (1)
25-25:⚠️ Potential issue | 🟡 MinorFix the malformed v2 regex doc bullet.
Close the inline literal: change
`regex: "...`to`regex: "..."`.Proposed fix
-//! - `regex: "...` → `Formula::Atom(Atom::Regex(...))` +//! - `regex: "..."` → `Formula::Atom(Atom::Regex(...))`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sempai/src/normalize.rs` at line 25, The doc comment in crates/sempai/src/normalize.rs contains a malformed inline literal in the v2 regex bullet (`regex: "...`); update that inline backtick sequence to close properly so it reads `regex: "..."` (i.e., change the broken `` `regex: "..."`` to `` `regex: "..."` ``) to fix the documentation formatting.crates/sempai/src/semantic_check.rs (1)
48-65:⚠️ Potential issue | 🟠 MajorNested negations inside
Orbranches slip through validation.The current check at line 51 only detects
Formula::Notat the immediate branch level. A structure likeOr([And([Atom(..), Not(..)])])passes because the recursion at line 62 drops the disjunction context before reaching the nestedNot.Add a helper that recursively searches for any
Notwithin a branch before rejecting it.Proposed fix
+fn branch_contains_not(formula: &Formula) -> bool { + match formula { + Formula::Not(_) => true, + Formula::And(branches) | Formula::Or(branches) => { + branches.iter().any(|branch| branch_contains_not(&branch.node)) + } + Formula::Inside(inner) | Formula::Anywhere(inner) => branch_contains_not(&inner.node), + Formula::Atom(_) => false, + } +} + fn check_no_not_in_or( formula: &Formula, span: Option<&sempai_core::SourceSpan>, ) -> Result<(), DiagnosticReport> { match formula { Formula::Or(branches) => { for branch in branches { - if matches!(branch.node, Formula::Not(_)) { + if branch_contains_not(&branch.node) { return Err(DiagnosticReport::validation_error(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sempai/src/semantic_check.rs` around lines 48 - 65, The current check in check_no_not_in_or only rejects immediate Formula::Not children of a Formula::Or branch but loses the "inside Or" context when it recurses, so nested Not nodes slip through; modify check_no_not_in_or (or add a small helper like contains_not_recursive) that, when called on each branch.node of a Formula::Or, walks the formula tree and returns true if any Formula::Not is found anywhere inside that branch (without clearing the "in-or" context), and then produce the existing DiagnosticReport using branch.span.clone().or_else(|| span.cloned()) if the helper finds a Not; keep using the same DiagnosticCode::ESempaiInvalidNotInOr and error message text.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai/src/tests/engine_tests.rs`:
- Around line 103-135: Extract a shared test helper or parameterize the two
tests using rstest: move the common setup (default_engine(),
engine.compile_yaml(...), first_diagnostic_of_err(result)) into a helper
function like assert_compile_yaml_semantic_error(yaml: &str) -> DiagnosticCode
and call it from the existing tests, or replace both test functions
(compile_yaml_returns_semantic_error_for_invalid_not_in_or and
compile_yaml_returns_semantic_error_for_missing_positive_term_in_and) with a
single #[rstest] that provides the YAML string and expected DiagnosticCode pairs
and asserts equality using default_engine(), engine.compile_yaml(...),
first_diagnostic_of_err(result), and DiagnosticCode::ESempaiInvalidNotInOr /
DiagnosticCode::ESempaiMissingPositiveTermInAnd; ensure the helper/parametrized
test references the same symbols (default_engine, engine.compile_yaml,
first_diagnostic_of_err, DiagnosticCode) so behavior is unchanged.
In `@crates/sempai/src/tests/semantic_validation_tests.rs`:
- Around line 151-180: Add a new unit test asserting validation fails when a Not
is nested inside an And that is itself a branch of an Or (e.g., Or[ And[ Atom,
Not[...] ], ... ]). Create a test (e.g., nested_not_in_and_inside_or_fails) that
constructs a Decorated node using Formula::Or containing a Decorated
Formula::And which includes make_pattern("atom") and a Decorated Formula::Not,
call validate_formula(&formula), expect_err, and assert the first diagnostic
code equals DiagnosticCode::ESempaiInvalidNotInOr to ensure the recursive
branch_contains_not detection catches deeply nested negations.
---
Duplicate comments:
In `@crates/sempai/src/normalize.rs`:
- Line 25: The doc comment in crates/sempai/src/normalize.rs contains a
malformed inline literal in the v2 regex bullet (`regex: "...`); update that
inline backtick sequence to close properly so it reads `regex: "..."` (i.e.,
change the broken `` `regex: "..."`` to `` `regex: "..."` ``) to fix the
documentation formatting.
In `@crates/sempai/src/semantic_check.rs`:
- Around line 48-65: The current check in check_no_not_in_or only rejects
immediate Formula::Not children of a Formula::Or branch but loses the "inside
Or" context when it recurses, so nested Not nodes slip through; modify
check_no_not_in_or (or add a small helper like contains_not_recursive) that,
when called on each branch.node of a Formula::Or, walks the formula tree and
returns true if any Formula::Not is found anywhere inside that branch (without
clearing the "in-or" context), and then produce the existing DiagnosticReport
using branch.span.clone().or_else(|| span.cloned()) if the helper finds a Not;
keep using the same DiagnosticCode::ESempaiInvalidNotInOr and error message
text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7ce5cff4-a1da-49ae-ab84-e1a14bd51d32
📒 Files selected for processing (7)
crates/sempai/src/normalize.rscrates/sempai/src/semantic_check.rscrates/sempai/src/tests/behaviour.rscrates/sempai/src/tests/engine_tests.rscrates/sempai/src/tests/semantic_validation_tests.rscrates/sempai/tests/features/sempai_engine.featuredocs/execplans/4-1-5-normalization-into-canonical-formula-model.md
| #[test] | ||
| fn nested_or_in_and_with_not_fails() { | ||
| let nested_or = Decorated { | ||
| node: Formula::Or(vec![ | ||
| make_pattern("a"), | ||
| Decorated { | ||
| node: Formula::Not(Box::new(make_pattern("b"))), | ||
| where_clauses: vec![], | ||
| as_name: None, | ||
| fix: None, | ||
| span: None, | ||
| }, | ||
| ]), | ||
| where_clauses: vec![], | ||
| as_name: None, | ||
| fix: None, | ||
| span: None, | ||
| }; | ||
| let formula = Decorated { | ||
| node: Formula::And(vec![make_pattern("foo"), nested_or]), | ||
| where_clauses: vec![], | ||
| as_name: None, | ||
| fix: None, | ||
| span: None, | ||
| }; | ||
| let result = validate_formula(&formula); | ||
| let err = result.expect_err("should fail validation"); | ||
| let first = err.diagnostics().first().expect("should have diagnostic"); | ||
| assert_eq!(first.code(), DiagnosticCode::ESempaiInvalidNotInOr); | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add a test case for deeply nested Not inside Or branch.
This test validates And[foo, Or[a, Not[b]]], but the nested Not is a direct child of the Or. Once the branch_contains_not fix is applied to semantic_check.rs, add a test for Or[And[Atom, Not[...]]] to verify the recursive detection catches negations buried deeper within a branch.
#!/bin/bash
# Description: Check if any test exercises Or with And containing Not as a nested child.
# Expected: No existing test for Or([And([Atom, Not])]) pattern.
rg -n 'Formula::Or.*Formula::And.*Formula::Not' crates/sempai/src/tests/🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sempai/src/tests/semantic_validation_tests.rs` around lines 151 - 180,
Add a new unit test asserting validation fails when a Not is nested inside an
And that is itself a branch of an Or (e.g., Or[ And[ Atom, Not[...] ], ... ]).
Create a test (e.g., nested_not_in_and_inside_or_fails) that constructs a
Decorated node using Formula::Or containing a Decorated Formula::And which
includes make_pattern("atom") and a Decorated Formula::Not, call
validate_formula(&formula), expect_err, and assert the first diagnostic code
equals DiagnosticCode::ESempaiInvalidNotInOr to ensure the recursive
branch_contains_not detection catches deeply nested negations.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. crates/sempai/src/tests/engine_tests.rs Comment on lines +121 to +134 fn compile_yaml_returns_semantic_error_for_missing_positive_term_in_and() {
let engine = default_engine();
let result = engine.compile_yaml(concat!(
"rules:\n",
" - id: demo.missing.positive.term.in.and\n",
" message: missing positive term in and\n",
" languages: [rust]\n",
" severity: ERROR\n",
" patterns:\n",
" - pattern-not: foo($X)\n",
" - pattern-inside: bar($Y)\n",
));
let (code, _diag) = first_diagnostic_of_err(result);
assert_eq!(code, DiagnosticCode::ESempaiMissingPositiveTermInAnd);❌ New issue: Code Duplication |
This comment was marked as resolved.
This comment was marked as resolved.
Refactored repeated code in engine_tests.rs by introducing a helper function `assert_compile_yaml_semantic_error`. This DRYs up tests that check for expected semantic error diagnostic codes from YAML compilation. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Gates Failed
Enforce advisory code health rules
(1 file with Code Duplication)
Gates Passed
5 Quality Gates Passed
See analysis details in CodeScene
Reason for failure
| Enforce advisory code health rules | Violations | Code Health Impact | |
|---|---|---|---|
| engine_tests.rs | 1 advisory rule | 10.00 → 9.39 | Suppress |
Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
| fn compile_yaml_returns_semantic_error_for_missing_positive_term_in_and() { | ||
| assert_compile_yaml_semantic_error( | ||
| concat!( | ||
| "rules:\n", | ||
| " - id: demo.missing.positive.term.in.and\n", | ||
| " message: missing positive term in and\n", | ||
| " languages: [rust]\n", | ||
| " severity: ERROR\n", | ||
| " patterns:\n", | ||
| " - pattern-not: foo($X)\n", | ||
| " - pattern-inside: bar($Y)\n", | ||
| ), | ||
| DiagnosticCode::ESempaiMissingPositiveTermInAnd, | ||
| ); |
There was a problem hiding this comment.
❌ New issue: Code Duplication
The module contains 2 functions with similar structure: compile_yaml_returns_semantic_error_for_invalid_not_in_or,compile_yaml_returns_semantic_error_for_missing_positive_term_in_and
Summary
pattern*syntax and v2matchsyntax into a single representation in sempai_core.Changes
Core modelling
Formulaenum with variants:Atom,Not,Inside,Anywhere,And,Or.Atomenum:Pattern,Regex,TreeSitterQuery.PatternAtom,RegexAtom,TreeSitterQueryAtom.Decorated<T>wrapper carryingwhere_clauses,as_name,fix, andspan.WhereClauseas an opaque container (initially storing a raw JSON value).pub mod formula;and re-export fromsempai-corelib).Normalization layer
Decorated<Formula>:pattern,pattern-regex,patterns,pattern-either,pattern-inside, etc. to canonicalFormulastructures.matchforms (all,any,not,inside,anywhere, etc.) to canonical shapes.WhereClauses to the enclosingDecorated<Formula>for constraints carried over from legacy forms.Decoratedmetadata (where,as,fix) during normalization.Semantic validation
InvalidNotInOr: reject negated branches inside disjunctions.MissingPositiveTermInAnd: require at least one positive term in anAnd.E_SEMPAI_INVALID_NOT_IN_OR,E_SEMPAI_MISSING_POSITIVE_TERM_IN_AND).Engine integration
Engine::compile_yaml:Decorated<Formula>.QueryPlancontaining the normalized formula and rule metadata.QueryPlanto store theDecorated<Formula>instead of the prior placeholder.Testing
sempai-corefor construction and equality ofFormulavariants.sempaifor normalization and semantic validation scenarios.Documentation
docs/sempai-query-language-design.md(note canonicalFormulamodel andWhereClausehandling).docs/users-guide.md(update to reflectcompile_yamlreturning compiled plans for valid rules).docs/roadmap.md(mark 4.1.5 as done).Validation and acceptance criteria
Engine::compile_yamlreturnsQueryPlans with normalizedDecorated<Formula>for valid rules (no moreNOT_IMPLEMENTED).Formulavalues.How to test locally
Notes on compatibility and scope
sempai_yaml::parse_rule_filesignature remains unchanged.📎 Task: https://www.devboxer.com/task/795de59f-09d8-463e-9ef9-350b92fa57be