Implement canonical Formula normalization; wire into engine (4.1.5)#105
Add a canonical Formula enum and associated Atom and Decorated types to sempai_core. Implement normalization of legacy and v2 Semgrep operators into the canonical Formula model. Enforce semantic constraints that reject invalid formula shapes with diagnostic codes. Update Engine::compile_yaml to produce QueryPlan structs carrying normalized Formulas instead of placeholders. Preserve legacy constraints as opaque Formula::Constraint nodes. Add extensive unit and BDD tests covering normalization and constraint validation. Update documentation and roadmap to reflect the new normalization and validation behavior. This completes roadmap item 4.1.5 with a fully tested normalization pipeline and user-visible error reporting.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
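To make the "QueryPlan carrying normalized Formulas" point concrete, here is a minimal sketch of a plan with an optional formula. It is not the sempai source: the `Formula` enum is cut down to two variants, `language` is a plain `String` rather than the `Language` type shown in the class diagram, and the `new` and `rule_id` helpers are hypothetical. Only the `formula() -> Option<&Formula>` accessor shape is taken from the diagram.

```rust
// Hypothetical, simplified stand-ins; the real types live in sempai_core.
#[derive(Debug, Clone, PartialEq)]
pub enum Formula {
    Pattern(String),
    And(Vec<Formula>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct QueryPlan {
    rule_id: String,
    // Assumption: a String keeps the sketch self-contained; the diagram uses `Language`.
    language: String,
    // `None` models a rule that compiles to a plan without a search formula.
    formula: Option<Formula>,
}

impl QueryPlan {
    /// Hypothetical constructor, for illustration only.
    pub fn new(rule_id: &str, language: &str, formula: Option<Formula>) -> Self {
        Self {
            rule_id: rule_id.to_owned(),
            language: language.to_owned(),
            formula,
        }
    }

    /// Hypothetical accessor for the rule identifier.
    pub fn rule_id(&self) -> &str {
        &self.rule_id
    }

    /// Borrow the normalized formula, if the plan has one.
    pub fn formula(&self) -> Option<&Formula> {
        self.formula.as_ref()
    }
}
```

Returning `Option<&Formula>` lets callers distinguish "no formula" from "empty formula" without cloning the tree.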
Reviewer's Guide
Implements a canonical Formula model in sempai_core and wires a pure normalization + semantic-validation pipeline (from legacy and v2 YAML search principals) through the sempai engine so Engine::compile_yaml now returns real QueryPlans carrying Formulas, backed by unit/BDD tests and documentation updates.
Class diagram for canonical Formula model and engine integration
classDiagram
direction LR
class Atom {
<<enum>>
+Pattern : String
+Regex : String
+TreeSitterQuery : String
}
class Decorated_T_ {
<<generic>>
+node : T
+where_clauses : Vec~serde_json::Value~
+as_name : Option~String~
+fix : Option~String~
}
class Formula {
<<enum>>
+Atom : Atom
+Not : Decorated~Formula~
+Inside : Decorated~Formula~
+Anywhere : Decorated~Formula~
+And : Vec~Decorated~Formula~~
+Or : Vec~Decorated~Formula~~
+Constraint : serde_json::Value
}
class SearchQueryPrincipal {
<<from_sempai_yaml>>
+Legacy : LegacyFormula
+Match : MatchFormula
+ProjectDependsOn
}
class LegacyFormula {
<<from_sempai_yaml>>
+Pattern : String
+PatternRegex : String
+Patterns : Vec~LegacyClause~
+PatternEither : Vec~LegacyFormula~
+PatternNot : LegacyValue
+PatternInside : LegacyValue
+PatternNotInside : LegacyValue
+PatternNotRegex : String
+Anywhere : LegacyValue
}
class LegacyValue {
<<from_sempai_yaml>>
+String : String
+Formula : LegacyFormula
}
class LegacyClause {
<<from_sempai_yaml>>
+Formula : LegacyFormula
+Constraint : serde_json::Value
}
class MatchFormula {
<<from_sempai_yaml>>
+Pattern : String
+PatternObject : String
+Regex : String
+All : Vec~MatchFormula~
+Any : Vec~MatchFormula~
+Not : Box~MatchFormula~
+Inside : Box~MatchFormula~
+Anywhere : Box~MatchFormula~
+Decorated : MatchDecorated
}
class MatchDecorated {
<<from_sempai_yaml>>
+formula : Box~MatchFormula~
+where_clauses : Vec~serde_json::Value~
+as_name : Option~String~
+fix : Option~String~
}
class DiagnosticReport {
<<from_sempai_core>>
+codes : Vec~DiagnosticCode~
}
class QueryPlan {
+rule_id : String
+language : Language
+formula : Option~Formula~
+formula() : Option~&Formula~
}
class Engine {
+compile_yaml(rule_file_path : &str) : Result~Vec~QueryPlan~, DiagnosticReport~
+execute(plans : &Vec~QueryPlan~) : Result~(), DiagnosticReport~
}
class NormaliseModule {
<<sempai_normalise>>
+normalise_search_principal(principal : &SearchQueryPrincipal) : Result~Formula, DiagnosticReport~
+normalise_legacy(formula : &LegacyFormula) : Result~Formula, DiagnosticReport~
+normalise_match(formula : &MatchFormula) : Result~Formula, DiagnosticReport~
+validate_formula_constraints(formula : &Formula) : Result~(), DiagnosticReport~
}
Formula o--> Atom
Formula "1" o--> "many" Decorated_T_
Decorated_T_ "1" o--> "1" Formula : T=Formula
SearchQueryPrincipal --> LegacyFormula
SearchQueryPrincipal --> MatchFormula
LegacyFormula --> LegacyClause
LegacyFormula --> LegacyValue
LegacyClause --> LegacyFormula
MatchFormula --> MatchDecorated
MatchDecorated --> MatchFormula
NormaliseModule ..> SearchQueryPrincipal
NormaliseModule ..> LegacyFormula
NormaliseModule ..> MatchFormula
NormaliseModule ..> Formula
NormaliseModule ..> DiagnosticReport
Engine --> QueryPlan
Engine ..> NormaliseModule
QueryPlan ..> Formula
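The diagram above can be read as the following Rust shapes. This is a sketch under stated assumptions, not the sempai_core source: `String` stands in for `serde_json::Value` so no crates are needed, `Decorated::bare` is a hypothetical helper, the legacy enum is trimmed to five variants with `PatternNot` holding a formula directly instead of a `LegacyValue`, and `normalise_legacy` is infallible here whereas the diagram shows it returning a `Result`.

```rust
// Sketch only: mirrors the class diagram, not the real sempai_core definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Atom {
    Pattern(String),
    Regex(String),
    TreeSitterQuery(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Decorated<T> {
    pub node: T,
    // Assumption: String replaces serde_json::Value in this sketch.
    pub where_clauses: Vec<String>,
    pub as_name: Option<String>,
    pub fix: Option<String>,
}

impl<T> Decorated<T> {
    /// Hypothetical helper: wrap a node with no decorations.
    pub fn bare(node: T) -> Self {
        Self { node, where_clauses: Vec::new(), as_name: None, fix: None }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Formula {
    Atom(Atom),
    Not(Box<Decorated<Formula>>),
    Inside(Box<Decorated<Formula>>),
    Anywhere(Box<Decorated<Formula>>),
    And(Vec<Decorated<Formula>>),
    Or(Vec<Decorated<Formula>>),
    // Opaque legacy constraint payload (String stands in for a JSON value).
    Constraint(String),
}

/// Trimmed-down stand-in for the legacy YAML shape.
#[derive(Debug, Clone, PartialEq)]
pub enum LegacyFormula {
    Pattern(String),
    PatternRegex(String),
    PatternNot(Box<LegacyFormula>),
    Patterns(Vec<LegacyFormula>),
    PatternEither(Vec<LegacyFormula>),
}

/// Lower a legacy formula into the canonical model (infallible in this sketch).
pub fn normalise_legacy(formula: &LegacyFormula) -> Formula {
    // Normalise each child and wrap it as an undecorated node.
    let lower_all = |items: &[LegacyFormula]| -> Vec<Decorated<Formula>> {
        items.iter().map(|i| Decorated::bare(normalise_legacy(i))).collect()
    };
    match formula {
        LegacyFormula::Pattern(s) => Formula::Atom(Atom::Pattern(s.clone())),
        LegacyFormula::PatternRegex(s) => Formula::Atom(Atom::Regex(s.clone())),
        LegacyFormula::PatternNot(inner) => {
            Formula::Not(Box::new(Decorated::bare(normalise_legacy(inner))))
        }
        LegacyFormula::Patterns(items) => Formula::And(lower_all(items)),
        LegacyFormula::PatternEither(items) => Formula::Or(lower_all(items)),
    }
}
```

With these definitions, a legacy `patterns` list holding a `pattern` and a `pattern-not` lowers to an `And` with one atom child and one `Not` child, which matches the shape asserted in the PR's `legacy_patterns_normalises_to_and` test.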
Added a new section 'Practice documentation' to the 4-1-5 normalization into canonical formula model docs. This section lists relevant project guidance documents covering Testing, Design and architecture, Code quality, and Configuration aspects pertinent to this milestone. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
- Introduce a canonical normalized formula model (Formula, Atom, Decorated) in sempai_core
- Add normalization modules in sempai to lower legacy and v2 Semgrep queries into Formula
- Implement semantic constraint validation on formulas (e.g., disallow Not in Or, require positive terms in And)
- Change Engine::compile_yaml to produce QueryPlans with normalized formulas
- Support ProjectDependsOn rules producing plans without formulas
- Add extensive unit and BDD tests for normalization and constraints
- Update docs and user guide to reflect normalization and new diagnostics
- Remove previous not-implemented placeholders from query compilation

This implements roadmap item 4.1.5 and completes the canonical internal formula model and query normalization pipeline.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
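The two constraints named in the commit message (no Not directly under Or, at least one positive term in And) could be checked with a recursive walk like the one below. This is a sketch, not the PR's `validate_formula_constraints`: the `Formula` enum is reduced to four variants without decorations, the `Violation` enum is hypothetical (the only diagnostic code the PR names is `ESempaiInvalidNotInOr`), and "positive term" is assumed to mean any child that is not a `Not`.

```rust
// Simplified stand-in for the canonical model (no decorations, four variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Formula {
    Atom(String),
    Not(Box<Formula>),
    And(Vec<Formula>),
    Or(Vec<Formula>),
}

// Hypothetical error type; the real pipeline reports diagnostic codes instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Violation {
    NotInOr,
    NoPositiveTermInAnd,
}

/// Check the current node first, then recurse into its children.
pub fn validate(formula: &Formula) -> Result<(), Violation> {
    match formula {
        Formula::Atom(_) => Ok(()),
        Formula::Not(inner) => validate(inner),
        Formula::Or(children) => {
            // A negated branch directly under a disjunction is rejected.
            if children.iter().any(|c| matches!(c, Formula::Not(_))) {
                return Err(Violation::NotInOr);
            }
            children.iter().try_for_each(validate)
        }
        Formula::And(children) => {
            // Assumption: a term is positive when it is not a `Not`.
            // An empty conjunction also has no positive term, so it is rejected.
            if children.iter().all(|c| matches!(c, Formula::Not(_))) {
                return Err(Violation::NoPositiveTermInAnd);
            }
            children.iter().try_for_each(validate)
        }
    }
}
```

Because the check recurses, an invalid `Or` nested inside an otherwise valid `And` is still rejected, which is the behaviour the nested test names in the review comments suggest.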
Gates Failed
Enforce advisory code health rules
(3 files with Code Duplication)
Gates Passed
5 Quality Gates Passed
See analysis details in CodeScene
Reason for failure
| Enforce advisory code health rules | Violations | Code Health Impact | |
|---|---|---|---|
| behaviour.rs | 1 advisory rule | 10.00 → 9.39 | Suppress |
| constraint_tests.rs | 1 advisory rule | 9.39 | Suppress |
| normalise_tests.rs | 1 advisory rule | 9.39 | Suppress |
Quality Gate Profile: Pay Down Tech Debt
#[then("compilation fails with code {code}")]
fn then_compilation_fails(world: &mut TestWorld, code: QuotedString) {
    assert_diagnostic_code(
    let report = extract_report(
❌ New issue: Code Duplication
The module contains 4 functions with similar structure: then_compilation_fails, then_execution_fails, then_first_plan_has_formula, then_first_plan_has_no_formula
fn or_with_not_child_is_rejected() {
    let formula = Formula::Or(vec![
        bare(pat("a")),
        bare(Formula::Not(Box::new(bare(pat("b"))))),
    ]);
    let err = validate_formula_constraints(&formula).expect_err("should fail");
    let code = err.diagnostics().first().expect("at least one").code();
    assert_eq!(code, DiagnosticCode::ESempaiInvalidNotInOr);
}
❌ New issue: Code Duplication
The module contains 4 functions with similar structure: and_with_no_positive_terms_is_rejected, nested_and_inside_or_with_no_positive_term_is_rejected, nested_or_with_not_inside_and_is_rejected, or_with_not_child_is_rejected
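One way to address this kind of duplication is to pull the repeated "expect an error, take the first diagnostic, read its code" steps into a shared helper. The sketch below is only an illustration: `Code`, `Report`, and `first_code` are hypothetical stand-ins, not the PR's `DiagnosticCode` or `DiagnosticReport` API.

```rust
use std::fmt::Debug;

// Hypothetical stand-in for the diagnostic codes used in the tests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Code {
    InvalidNotInOr,
    MissingPositiveTerm,
}

// Hypothetical stand-in for a diagnostic report: an ordered list of codes.
#[derive(Debug)]
pub struct Report {
    pub codes: Vec<Code>,
}

/// Unwrap a failed result and return the first diagnostic code it carries.
/// Panics with a descriptive message if the result is `Ok` or the report is empty,
/// which is the behaviour a test helper wants.
pub fn first_code<T: Debug>(result: Result<T, Report>) -> Code {
    let report = result.expect_err("validation should fail");
    *report.codes.first().expect("at least one diagnostic")
}
```

With a helper of this shape, each rejection test reduces to building its formula and making one `assert_eq!` call on `first_code(...)`, so the four test bodies differ only in their input and expected code.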
fn legacy_patterns_normalises_to_and() {
    let principal = SearchQueryPrincipal::Legacy(LegacyFormula::Patterns(vec![
        LegacyClause::Formula(LegacyFormula::Pattern(String::from("a"))),
        LegacyClause::Formula(LegacyFormula::PatternNot(Box::new(LegacyValue::String(
            String::from("b"),
        )))),
    ]));
    let result = normalise_search_principal(&principal).expect("ok");
    let expected = Formula::And(vec![
        bare(pat("a")),
        bare(Formula::Not(Box::new(bare(pat("b"))))),
    ]);
    assert_eq!(result, Some(expected));
}
❌ New issue: Code Duplication
The module contains 4 functions with similar structure: legacy_pattern_either_normalises_to_or, legacy_patterns_normalises_to_and, v2_all_normalises_to_and, v2_any_normalises_to_or
Summary
Changes
Artefacts
Rationale & design decisions
Testing plan
Milestones progress (high level)
Task
📎 Task: https://www.devboxer.com/task/21bc5583-37ef-4f65-88fa-8c6f1f8f662c