Add sempai_yaml: YAML parser front-end, engine wiring, and tests#85
…and serde-saphyr

Create a comprehensive execution plan document outlining the design, constraints, risks, progress stages, and validation requirements for milestone 4.1.3. This milestone focuses on implementing parsing of Semgrep-compatible YAML rule files via a new `sempai_yaml` crate using `saphyr` and `serde-saphyr`. The document specifies the planned crate structure, API contract, diagnostic reporting standards, testing strategy, and documentation updates necessary before implementation begins.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Reviewer's Guide
Introduces a new
Sequence diagram for Engine::compile_yaml using sempai_yaml parser
sequenceDiagram
actor Caller
participant Engine
participant SempaiYaml as sempai_yaml_parse_rule_file
participant SourceMap
participant SerdeSaphyr as serde_saphyr
participant Diagnostics as DiagnosticReport
Caller->>Engine: compile_yaml(yaml)
Engine->>SempaiYaml: parse_rule_file(yaml, None)
SempaiYaml->>SourceMap: SourceMap::parse(yaml, source_uri)
SourceMap-->>SempaiYaml: source_map
SempaiYaml->>SerdeSaphyr: from_str(yaml)
alt YAML_or_schema_error
SerdeSaphyr-->>SempaiYaml: Error
SempaiYaml->>Diagnostics: diagnostic_from_serde(error, source_map)
SempaiYaml-->>Engine: Err(DiagnosticReport)
Engine-->>Caller: Err(DiagnosticReport)
else Parse_ok
SerdeSaphyr-->>SempaiYaml: RawRuleFile
SempaiYaml-->>Engine: Ok(RuleFile)
Engine->>Diagnostics: DiagnosticReport::not_implemented(compile_yaml_query_plan_normalisation)
Engine-->>Caller: Err(DiagnosticReport)
end
Class diagram for new YAML rule model in sempai_yaml
classDiagram
class RuleFile {
+Vec~Rule~ rules
+RuleFile new(rules: Vec~Rule~)
+rules() Vec~Rule~
}
class Rule {
+String id
+RuleMode mode
+Option~String~ message
+Vec~String~ languages
+Option~RuleSeverity~ severity
+Option~String~ min_version
+Option~String~ max_version
+RulePrincipal principal
+id() String
+mode() RuleMode
+message() Option~String~
+languages() Vec~String~
+severity() Option~RuleSeverity~
+min_version() Option~String~
+max_version() Option~String~
+principal() RulePrincipal
}
class RuleMode {
<<enum>>
+Search
+Taint
+Join
+Extract
+Other(String)
+from_optional(raw: Option~str~) RuleMode
}
class RuleSeverity {
<<enum>>
+Error
+Warning
+Info
+Inventory
+Experiment
+Critical
+High
+Medium
+Low
+parse(raw: str) Option~RuleSeverity~
}
class RulePrincipal {
<<enum>>
+Search(SearchQueryPrincipal)
+Extract(ExtractQueryPrincipal)
+Taint(TaintQueryPrincipal)
+Join(Value)
}
class SearchQueryPrincipal {
<<enum>>
+Legacy(LegacyFormula)
+Match(MatchFormula)
}
class ExtractQueryPrincipal {
+String dest_language
+String extract
+LegacyFormula query
+dest_language() String
+extract() String
+query() LegacyFormula
}
class TaintQueryPrincipal {
<<enum>>
+New(Value)
+Legacy(sources: Value, sanitizers: Option~Value~, sinks: Value)
}
class LegacyFormula {
<<enum>>
+Pattern(String)
+PatternRegex(String)
+Patterns(Vec~LegacyClause~)
+PatternEither(Vec~LegacyFormula~)
+PatternNot(Box~LegacyValue~)
+PatternInside(Box~LegacyValue~)
+PatternNotInside(Box~LegacyValue~)
+PatternNotRegex(String)
+Anywhere(Box~LegacyValue~)
}
class LegacyValue {
<<enum>>
+String(String)
+Formula(LegacyFormula)
}
class LegacyClause {
<<enum>>
+Formula(LegacyFormula)
+Constraint(Value)
}
class MatchFormula {
<<enum>>
+Pattern(String)
+PatternObject(String)
+Regex(String)
+All(Vec~MatchFormula~)
+Any(Vec~MatchFormula~)
+Not(Box~MatchFormula~)
+Inside(Box~MatchFormula~)
+Anywhere(Box~MatchFormula~)
+Decorated(formula: Box~MatchFormula~, where_clauses: Vec~Value~, as_name: Option~String~, fix: Option~String~)
+decorated(formula: MatchFormula, where_clauses: Vec~Value~, as_name: Option~String~, fix: Option~String~) MatchFormula
}
RuleFile --> Rule
Rule --> RuleMode
Rule --> RuleSeverity
Rule --> RulePrincipal
RulePrincipal --> SearchQueryPrincipal
RulePrincipal --> ExtractQueryPrincipal
RulePrincipal --> TaintQueryPrincipal
SearchQueryPrincipal --> LegacyFormula
SearchQueryPrincipal --> MatchFormula
ExtractQueryPrincipal --> LegacyFormula
TaintQueryPrincipal --> LegacyFormula
LegacyFormula --> LegacyClause
LegacyFormula --> LegacyValue
LegacyClause --> LegacyFormula
LegacyClause --> LegacyValue
MatchFormula --> MatchFormula
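The two helper signatures in the class diagram, `RuleMode::from_optional` and `RuleSeverity::parse`, could look roughly like the sketch below. The types are stand-ins, not the crate's definitions. Two behaviours are assumptions: a missing `mode` key defaults to `Search`, and severities match as upper-case strings (the test fixtures in this thread use `severity: WARNING`).

```rust
/// Stand-in for the `RuleMode` enum from the class diagram.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RuleMode {
    Search,
    Taint,
    Join,
    Extract,
    Other(String),
}

impl RuleMode {
    /// Assumption: an absent `mode` means `Search`; unknown strings are
    /// preserved verbatim in `Other` rather than rejected here.
    pub fn from_optional(raw: Option<&str>) -> Self {
        match raw {
            None | Some("search") => Self::Search,
            Some("taint") => Self::Taint,
            Some("join") => Self::Join,
            Some("extract") => Self::Extract,
            Some(other) => Self::Other(other.to_owned()),
        }
    }
}

/// Stand-in for the `RuleSeverity` enum from the class diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleSeverity {
    Error,
    Warning,
    Info,
    Inventory,
    Experiment,
    Critical,
    High,
    Medium,
    Low,
}

impl RuleSeverity {
    /// Assumption: only exact upper-case spellings are accepted.
    pub fn parse(raw: &str) -> Option<Self> {
        match raw {
            "ERROR" => Some(Self::Error),
            "WARNING" => Some(Self::Warning),
            "INFO" => Some(Self::Info),
            "INVENTORY" => Some(Self::Inventory),
            "EXPERIMENT" => Some(Self::Experiment),
            "CRITICAL" => Some(Self::Critical),
            "HIGH" => Some(Self::High),
            "MEDIUM" => Some(Self::Medium),
            "LOW" => Some(Self::Low),
            _ => None,
        }
    }
}
```

Returning `Option` from `parse` leaves it to the caller to turn an unknown severity into a schema diagnostic with a span, which matches the diagram's signature.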
Walkthrough
Add a new crate
Changes
Sequence Diagram
sequenceDiagram
participant User as "User"
participant Engine as "sempai::Engine"
participant SourceMap as "SourceMap"
participant Parser as "sempai_yaml::parser"
participant Validator as "Schema/Builders"
participant Diagnostics as "DiagnosticReport"
User->>Engine: compile_yaml(yaml_string)
Engine->>SourceMap: parse(yaml, source_uri)
SourceMap-->>Engine: SourceMap with spans
Engine->>Parser: parse_rule_file(yaml, source_uri)
alt Malformed YAML
Parser->>Diagnostics: produce E_SEMPAI_YAML_PARSE (with span)
Diagnostics-->>Parser: DiagnosticReport
Parser-->>Engine: Err(DiagnosticReport)
Engine-->>User: Err(DiagnosticReport)
else Valid YAML
Parser->>Validator: deserialize RawRuleFile -> validate & convert
alt Schema invalid
Validator->>Diagnostics: produce E_SEMPAI_SCHEMA_INVALID (with span)
Diagnostics-->>Validator: DiagnosticReport
Validator-->>Parser: Err(DiagnosticReport)
Parser-->>Engine: Err(DiagnosticReport)
Engine-->>User: Err(DiagnosticReport)
else Schema valid
Validator-->>Parser: Ok(RuleFile)
Parser-->>Engine: Ok(RuleFile)
Engine->>Diagnostics: produce NOT_IMPLEMENTED ("query-plan normalization")
Diagnostics-->>Engine: DiagnosticReport
Engine-->>User: Err(DiagnosticReport)
end
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
…nostics

- Introduce `sempai_yaml` crate providing schema-aligned rule models and parser.
- Parse legacy and v2 search principals into typed Rust enums.
- Preserve extract, join, and taint rule bodies for later semantic handling.
- Implement structured parser diagnostics with source spans for malformed YAML and schema errors.
- Wire `sempai::Engine::compile_yaml` to delegate YAML parsing to `sempai_yaml` with error reporting.
- Replace placeholder `NOT_IMPLEMENTED` for YAML parsing with real parse-time diagnostics.
- Add behaviour, unit, and integration tests including feature scenarios validating parser success and error handling.
- Update docs, roadmap, and user guide to reflect new YAML parsing capability and diagnostic reporting.

This completes roadmap milestone 4.1.3, enabling immediate user-visible YAML parse validation while awaiting query plan normalization.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:
crates/sempai-yaml/src/model.rs
        as_name: Option<String>,
        fix: Option<String>,
    ) -> Self {
        if where_clauses.is_empty() && as_name.is_none() && fix.is_none() {
❌ New issue: Complex Conditional
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:
crates/sempai/src/tests/engine_tests.rs
#[test]
fn compile_yaml_returns_not_implemented() {
fn compile_yaml_returns_yaml_parse_diagnostic_for_malformed_yaml() {
❌ New issue: Code Duplication
This comment was marked as resolved.
…omics

- Added const fn `has_decoration` in MatchFormula to clarify decoration check
- Simplified conditional in `decorated` method using the new helper
- Refactored engine tests to use helper functions for cleaner error handling
- Improved readability and consistency in tests for diagnostic checks

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
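The `has_decoration` helper described in this commit could take roughly the following shape. This is a sketch rather than the committed code: `MatchFormula` is trimmed to two variants, `where` clauses are plain strings instead of the crate's `Value` type, and it assumes an undecorated formula is returned unchanged.

```rust
/// Trimmed-down stand-in for `MatchFormula` (two variants only).
#[derive(Debug, Clone, PartialEq)]
pub enum MatchFormula {
    Pattern(String),
    Decorated {
        formula: Box<MatchFormula>,
        where_clauses: Vec<String>,
        as_name: Option<String>,
        fix: Option<String>,
    },
}

impl MatchFormula {
    /// Names the three-way check so `decorated` has a single condition.
    const fn has_decoration(
        where_clauses: &[String],
        as_name: &Option<String>,
        fix: &Option<String>,
    ) -> bool {
        !where_clauses.is_empty() || as_name.is_some() || fix.is_some()
    }

    /// Assumption: with no decoration present, the core formula is
    /// returned as-is instead of being wrapped.
    pub fn decorated(
        formula: Self,
        where_clauses: Vec<String>,
        as_name: Option<String>,
        fix: Option<String>,
    ) -> Self {
        if !Self::has_decoration(&where_clauses, &as_name, &fix) {
            return formula;
        }
        Self::Decorated {
            formula: Box::new(formula),
            where_clauses,
            as_name,
            fix,
        }
    }
}
```

The condition itself is unchanged; giving it a name is what removes the "Complex Conditional" finding from the call site.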
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d7b04233b
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai-yaml/src/model.rs`:
- Around line 29-41: The constructor RuleFile::new currently declared as `pub
const fn new(rules: Vec<Rule>) -> Self` is invalid for const-eval because Vec
requires heap allocation; either remove the `const` qualifier so it becomes `pub
fn new(rules: Vec<Rule>) -> Self`, or change the parameter to a const-compatible
type (e.g., `rules: &'static [Rule]`) and update the struct field and any
callers accordingly so they pass a static slice; ensure callers of RuleFile::new
and the stored field type are updated to match the chosen option.
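Before acting on the `RuleFile::new` finding, it may be worth checking whether the `const` qualifier is rejected at all. On stable Rust, a `const fn` may take a `Vec` by value and move it into a struct, because the move allocates nothing and runs no destructor. The sketch below uses stand-in types, and its `rules()` accessor returns a slice, which is an assumption.

```rust
/// Stand-in for the crate's `Rule` type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Rule {
    pub id: String,
}

/// Stand-in for the crate's `RuleFile` type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RuleFile {
    rules: Vec<Rule>,
}

impl RuleFile {
    /// Moving an already-built `Vec` into the struct allocates nothing,
    /// so the compiler accepts `const` here.
    pub const fn new(rules: Vec<Rule>) -> Self {
        Self { rules }
    }

    /// Assumption: the accessor hands out a borrowed slice.
    pub fn rules(&self) -> &[Rule] {
        &self.rules
    }
}

/// `Vec::new` is itself a `const fn`, so an empty rule file can be built
/// in a const context as well as at runtime.
pub const EMPTY: RuleFile = RuleFile::new(Vec::new());
```

If the real constructor has the same shape, removing `const` would be a style choice rather than a compile fix. This matches the "verify each finding against the current code" instruction at the top of the prompt.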
In `@crates/sempai-yaml/src/parser.rs`:
- Around line 1-14: The parser.rs file is over the 400-line limit; extract all
raw serde types (e.g., RawRuleFile, RawRule, RawLegacyClause, RawLegacyFormula,
RawValue, etc.) and their TryFrom implementations into a new submodule (e.g.,
raw.rs or conversion.rs). Create the new file, move the struct/enum definitions
and their TryFrom impl blocks there, make them pub(crate) or pub as needed, add
mod raw; in parser.rs replace the moved definitions with use
crate::parser::raw::{RawRuleFile, RawRule, RawLegacyClause, ...} and update any
references (TryFrom impls or conversions) to point at the new module; ensure
visibility and imports (serde, serde_json, Spanned, SourceMap, etc.) are added
to the new file and re-run tests to confirm compilation.
- Around line 40-70: RawRule is missing #[serde(deny_unknown_fields)] unlike the
other parser structs; either add the attribute to RawRule to enforce consistency
and fail on unexpected YAML keys, or if unknown fields are intentionally allowed
for forward-compatibility, add a clear comment above the struct (e.g., "//
FIXME: allow unknown fields for forward compatibility because ...") explaining
the rationale; locate the RawRule definition and either add
#[serde(deny_unknown_fields)] directly above struct RawRule or insert the FIXME
comment with the explanation.
- Around line 567-596: The try_from implementation for RawMatchFormulaObject
contains an 8-branch if-else chain; extract that branch logic into a new helper
function (e.g., build_core_match_formula) that takes the RawMatchFormulaObject
(or a reference) and returns Result<MatchFormula, Self::Error> (or Result<Self,
Self::Error> matching the enum/variant type used in try_from). Move the
pattern/regex/all/any/not/inside/anywhere checks and conversions (including the
.into_iter().map(TryInto::try_into).collect and Box::new conversions) into
build_core_match_formula, returning the constructed core variant or the
schema_error for the empty case; then simplify try_from to call
build_core_match_formula(value) and return its result, keeping all error
handling intact and preserving the use of schema_error and the same variant
constructors.
In `@crates/sempai-yaml/src/source_map.rs`:
- Around line 40-56: Extract the duplicated
document.data.as_mapping_get("rules") call into a local binding (e.g. let
rules_opt = document.data.as_mapping_get("rules")) and then use that binding
when computing rules_span and rule_spans; for rules_span call
source_span_for_node(yaml, rules, source_uri.as_deref()) only if
rules_opt.is_some(), and for rule_spans match on rules_opt.data to handle
YamlDataOwned::Sequence as currently done, defaulting to empty when None—update
references to the existing symbols rules_span, rule_spans, document,
source_span_for_node, yaml, and source_uri accordingly.
- Around line 112-123: The function char_index_to_byte currently counts
characters then calls source.char_indices().nth(index), which does two O(n)
passes; remove the redundant char_count logic in char_index_to_byte and directly
call source.char_indices().nth(index), mapping Some((offset,_)) to Some(offset)
and mapping None to Some(source.len()) (also keep the special-case index == 0
returning Some(0)). Update only char_index_to_byte to handle out-of-bounds via
the nth result instead of pre-counting.
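The single-pass `char_index_to_byte` proposed above can be sketched as follows. The signature is an assumption inferred from the comment. Clamping every out-of-range index to `source.len()` follows the reviewer's wording and may differ from what the original function did.

```rust
/// Converts a character index into a byte offset within `source`.
///
/// Assumption: any index at or past the end of the string clamps to
/// `source.len()`, as the review comment suggests.
pub fn char_index_to_byte(source: &str, index: usize) -> Option<usize> {
    if index == 0 {
        return Some(0);
    }
    Some(
        source
            .char_indices()
            // One pass: `nth` walks at most `index + 1` characters.
            .nth(index)
            .map_or(source.len(), |(offset, _)| offset),
    )
}
```

For `"h\u{e9}llo"` (six bytes, five characters), index 2 maps to byte 3 because `é` occupies two bytes, and index 5 maps to the end of the string.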
In `@crates/sempai/src/engine.rs`:
- Around line 111-115: The DiagnosticReport message in compile_yaml uses
"normalisation"; update the string passed to DiagnosticReport::not_implemented
inside the compile_yaml method to use the repository's "-ization" spelling (e.g.
"compile_yaml query-plan normalization") so the diagnostic text matches the
project's convention; locate the call to DiagnosticReport::not_implemented in
the compile_yaml function and replace the literal accordingly.
In `@crates/sempai/src/tests/engine_tests.rs`:
- Around line 8-10: Convert the multi-line test helper function default_engine
into a single-line definition: replace the current block-style fn
default_engine() -> Engine { Engine::new(EngineConfig::default()) } with a
single-line function declaration that returns Engine by calling
Engine::new(EngineConfig::default()); keep the function name default_engine and
the calls to Engine::new and EngineConfig::default unchanged.
In `@docs/execplans/4-1-3-yaml-rule-parsing-via-saphyr-and-serde-saphyr.md`:
- Around line 236-246: Replace the inconsistent -isation spellings in the
retrospective notes by changing "deserialisation" to "deserialization" and
"normalisation" to "normalization" so the document follows the repository's
en-GB-oxendict rule of using -ize/-ization; update the two occurrences (the
words "deserialisation" and "normalisation" in the Retrospective notes block) to
their -ization forms and run the markdown lint/format checks referenced in the
file to confirm no other instances remain.
In `@docs/sempai-query-language-design.md`:
- Around line 424-428: In the implementation note sentence referencing
sempai::Engine::compile_yaml and sempai_yaml, replace the "-isation" spelling
with the repository's preferred Oxford "-ization" form by changing
"normalisation" to "normalization" (e.g., in the line "Successful YAML parsing
still returns a deliberate `NOT_IMPLEMENTED` placeholder until 4.1.5 delivers
normalisation into executable query plans." change "normalisation" to
"normalization").
In `@docs/users-guide.md`:
- Around line 1135-1144: The docs text uses British "-isation" in the phrase
"rule normalisation into executable query plans"; update that phrase to the
Oxford spelling "-ization" (i.e., "rule normalization into executable query
plans") where it appears in the block describing compile_yaml and the
NOT_IMPLEMENTED placeholder so the documentation for compile_yaml, compile_dsl,
and execute uses the repository's en-GB-oxendict conventions consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e8153f4a-de56-4339-a187-e31471c72ff1
⛔ Files ignored due to path filters (1)
- Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
- Cargo.toml
- crates/sempai-yaml/Cargo.toml
- crates/sempai-yaml/src/lib.rs
- crates/sempai-yaml/src/model.rs
- crates/sempai-yaml/src/parser.rs
- crates/sempai-yaml/src/source_map.rs
- crates/sempai-yaml/src/tests/behaviour.rs
- crates/sempai-yaml/src/tests/mod.rs
- crates/sempai-yaml/src/tests/parser_tests.rs
- crates/sempai-yaml/tests/features/sempai_yaml.feature
- crates/sempai/Cargo.toml
- crates/sempai/src/engine.rs
- crates/sempai/src/tests/behaviour.rs
- crates/sempai/src/tests/engine_tests.rs
- crates/sempai/tests/features/sempai_engine.feature
- docs/execplans/4-1-3-yaml-rule-parsing-via-saphyr-and-serde-saphyr.md
- docs/roadmap.md
- docs/sempai-query-language-design.md
- docs/users-guide.md
- Added support for `match` top-level query principal alongside legacy patterns.
- Enforced exclusive use of either legacy search keys or `match` in rules.
- Validated `match` formulas to accept exactly one operator, rejecting empty or multiple operators.
- Improved error messages for schema violations involving query principals.
- Added comprehensive tests covering extract, join, taint, custom modes, match variants, and validation errors.
- Updated source map and parsing code for better compatibility and error handling.

This update enables forward compatibility with Semgrep rule schema extensions, helps maintain rule correctness, and improves user diagnostics during YAML rule parsing.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
@coderabbitai Have the following now been resolved? Annul any requirements that violate the en-GB-oxendict spelling (-ize / -yse / -our) conventions (for example a request to replace "normalize" with "normalise" or "artefact" with "artifact"), or where the requirement unnecessarily increases cyclomatic complexity.
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.
crates/sempai-yaml/src/parser.rs
fn try_from(value: RawMatchFormulaObject) -> Result<Self, Self::Error> {
let operator_count = [
value.pattern.is_some(),
value.regex.is_some(),
value.all.is_some(),
value.any.is_some(),
value.not.is_some(),
value.inside.is_some(),
value.anywhere.is_some(),
]
.iter()
.filter(|&&present| present)
.count();
if operator_count == 0 {
return Err(schema_error(
String::from("match formula object is empty"),
None,
"add a supported `match` operator",
));
}
if operator_count > 1 {
return Err(schema_error(
String::from("match formula object defines multiple operators"),
None,
"keep only one operator per match object",
));
}
let core = if let Some(pattern) = value.pattern {
Self::PatternObject(pattern)
} else if let Some(regex) = value.regex {
Self::Regex(regex)
} else if let Some(all) = value.all {
Self::All(
all.into_iter()
.map(TryInto::try_into)
.collect::<Result<Vec<_>, _>>()?,
)
} else if let Some(any) = value.any {
Self::Any(
any.into_iter()
.map(TryInto::try_into)
.collect::<Result<Vec<_>, _>>()?,
)
} else if let Some(not) = value.not {
Self::Not(Box::new((*not).try_into()?))
} else if let Some(inside) = value.inside {
Self::Inside(Box::new((*inside).try_into()?))
} else if let Some(anywhere) = value.anywhere {
Self::Anywhere(Box::new((*anywhere).try_into()?))
} else {
// Safety: operator_count == 1 ensures at least one operator is present
return Err(schema_error(
String::from("internal error: no operator found despite count check"),
None,
"please report this bug",
));
};
Ok(Self::decorated(
core,
value.where_.or(value.where_alias).unwrap_or_default(),
value.as_name,
value.fix,
))
}
❌ New issue: Complex Method
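One way to lower the branch count in a `try_from` like the one quoted above is to fold "count the operators" and "pick the operator" into a single step. The sketch below illustrates the idea on a reduced, made-up model: `Operator`, `RawMatchObject`, and `single_operator` are stand-ins, errors are plain strings rather than `schema_error` diagnostics, and the recursive conversions of the real code are left out.

```rust
/// Reduced stand-in for the core match operators.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Operator {
    Pattern(String),
    Regex(String),
    Not(String),
}

/// Reduced stand-in for `RawMatchFormulaObject`.
#[derive(Debug, Default)]
pub struct RawMatchObject {
    pub pattern: Option<String>,
    pub regex: Option<String>,
    pub not: Option<String>,
}

/// Yields the only operator present, or an error when there are zero or
/// several. No "unreachable" fallback branch is needed.
pub fn single_operator(raw: RawMatchObject) -> Result<Operator, String> {
    // Wrap each optional field in its variant, then drop the `None`s.
    let mut operators = vec![
        raw.pattern.map(Operator::Pattern),
        raw.regex.map(Operator::Regex),
        raw.not.map(Operator::Not),
    ]
    .into_iter()
    .flatten();

    // Looking at the first two present operators is enough to classify.
    match (operators.next(), operators.next()) {
        (None, _) => Err(String::from("match formula object is empty")),
        (Some(operator), None) => Ok(operator),
        (Some(_), Some(_)) => {
            Err(String::from("match formula object defines multiple operators"))
        }
    }
}
```

Because selection and validation share one iterator, the count check and the `if let` chain cannot drift apart, and the "internal error" arm in the quoted code has no counterpart here.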
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.
crates/sempai-yaml/src/tests/parser_tests.rs
fn reject_multiple_match_operators() {
let yaml = concat!(
"rules:\n",
" - id: demo.multi\n",
" message: multiple operators\n",
" languages: [python]\n",
" severity: WARNING\n",
" match:\n",
" pattern: foo($X)\n",
" regex: bar\n",
);
let err = parse_rule_file(yaml, Some("file:///rules.yaml"))
.expect_err("should reject multiple operators");
let diagnostic = err.diagnostics().first().expect("one diagnostic");
assert_eq!(diagnostic.code(), DiagnosticCode::ESempaiSchemaInvalid);
assert!(
diagnostic
.message()
.contains("match formula object defines multiple operators")
);
}
❌ New issue: Code Duplication
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.
crates/sempai-yaml/src/tests/parser_tests.rs
fn parse_match_variants() {
let yaml = concat!(
"rules:\n",
" - id: demo.match.pattern\n",
" message: pattern string\n",
" languages: [python]\n",
" severity: WARNING\n",
" match: \"foo($X)\"\n",
"\n",
" - id: demo.match.regex\n",
" message: regex\n",
" languages: [python]\n",
" severity: WARNING\n",
" match:\n",
" regex: \"bar\"\n",
"\n",
" - id: demo.match.any\n",
" message: any\n",
" languages: [python]\n",
" severity: WARNING\n",
" match:\n",
" any:\n",
" - pattern: foo($X)\n",
" - pattern: bar($Y)\n",
"\n",
" - id: demo.match.not\n",
" message: not\n",
" languages: [python]\n",
" severity: WARNING\n",
" match:\n",
" not:\n",
" pattern: foo($X)\n",
"\n",
" - id: demo.match.inside\n",
" message: inside\n",
" languages: [python]\n",
" severity: WARNING\n",
" match:\n",
" inside:\n",
" pattern: class $C\n",
"\n",
" - id: demo.match.anywhere\n",
" message: anywhere\n",
" languages: [python]\n",
" severity: WARNING\n",
" match:\n",
" anywhere:\n",
" pattern: foo($X)\n",
);
let file = parse_rule_file(yaml, Some("file:///rules.yaml")).expect("valid match variants");
let rules = file.rules();
assert_eq!(rules.len(), 6);
// Pattern string shorthand
assert!(matches!(
rules[0].principal(),
RulePrincipal::Search(SearchQueryPrincipal::Match(MatchFormula::Pattern(p)))
if p == "foo($X)"
));
// Regex
assert!(matches!(
rules[1].principal(),
RulePrincipal::Search(SearchQueryPrincipal::Match(MatchFormula::Regex(r)))
if r == "bar"
));
// Any
assert!(matches!(
rules[2].principal(),
RulePrincipal::Search(SearchQueryPrincipal::Match(MatchFormula::Any(children)))
if children.len() == 2
));
// Not
assert!(matches!(
rules[3].principal(),
RulePrincipal::Search(SearchQueryPrincipal::Match(MatchFormula::Not(_)))
));
// Inside
assert!(matches!(
rules[4].principal(),
RulePrincipal::Search(SearchQueryPrincipal::Match(MatchFormula::Inside(_)))
));
// Anywhere
assert!(matches!(
rules[5].principal(),
RulePrincipal::Search(SearchQueryPrincipal::Match(MatchFormula::Anywhere(_)))
));
}
❌ New issue: Large Method
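A large fixture-driven test such as `parse_match_variants` can be split into small per-variant tests once the shared YAML header is generated by a helper. The sketch below is one possible helper. `match_rule_yaml` is a made-up name, and the two-space indentation is an assumption, since the fixture's original whitespace did not survive in this thread.

```rust
/// Made-up test helper: builds a one-rule YAML document whose `match`
/// body is supplied by the caller, so each test states only what varies.
pub fn match_rule_yaml(id: &str, match_body: &str) -> String {
    let mut yaml = String::from("rules:\n");
    yaml.push_str(&format!("  - id: {}\n", id));
    yaml.push_str("    message: fixture\n");
    yaml.push_str("    languages: [python]\n");
    yaml.push_str("    severity: WARNING\n");
    yaml.push_str("    match:\n");
    // Indent every line of the body beneath the `match:` key.
    for line in match_body.lines() {
        yaml.push_str(&format!("      {}\n", line));
    }
    yaml
}
```

A per-variant test would then call something like `match_rule_yaml("demo.match.not", "not:\n  pattern: foo($X)")`, pass the result to `parse_rule_file`, and assert on one variant only.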
This comment was marked as resolved.
Annulments: None. Comment 6 asked to change "normalisation" → "normalization", which is the correct en-GB-oxendict (-ize) direction — that is not a violation to annul. Comment 5 proposes helpers that reduce complexity, not increase it — no annulment there either.
Comment 5 remains open. The agent prompt for that refactor (centralising required-field validation into |
This comment was marked as resolved.
- Introduced a helper function `first_err_diagnostic` to simplify error diagnostics extraction in tests.
- Replaced verbose error assertions with the helper for clearer, DRY test code.
- Split the previous single complex test into multiple focused tests covering match pattern shorthand, regex, and match formula variants (any, not, inside, anywhere).
- Added tests to validate that invalid cases produce appropriate diagnostics with expected error codes and messages.

This improves test coverage and maintainability for sempai-yaml parser.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
crates/sempai-yaml/src/parser/mod.rs (1)
134-140: ⚠️ Potential issue | 🟠 Major
Stop routing unknown modes through the search builder.
Line 140 skips principal-family validation for `RuleMode::Other(_)`, but Line 196 still routes the rule into `build_search_rule()`. That makes forward-compatible modes impossible: a future non-search mode either fails with a misleading search-principal error or parses a search principal while ignoring other principal families. Reject unknown modes explicitly here or preserve their principal payload instead of assuming search.
Also applies to: 194-198
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sempai-yaml/src/parser/mod.rs` around lines 134 - 140, The match arm for RuleMode::Other(_) must not silently return Ok(()) and later be routed into build_search_rule; either reject unknown modes explicitly or preserve their principal payload instead of treating them as search. Change the RuleMode::Other(_) arm in the match that currently calls unexpected_for_search/…/unexpected_for_taint to return a clear error (e.g., UnknownRuleMode) or to carry the original principal data forward, and also update the code path that calls build_search_rule so it never receives RuleMode::Other(_) (ensure build_search_rule is only invoked for RuleMode::Search); reference RuleMode::Other, unexpected_for_search/extract/join/taint, and build_search_rule when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai-yaml/src/tests/parser_tests/mode_tests.rs`:
- Around line 25-27: Replace the str.to_string() call with String::from(...) in
the assertion to satisfy the str_to_string Clippy lint: update the code that
constructs the LegacyFormula::Pattern (the string literal "source($X)" used
inside LegacyFormula::Pattern in the assert comparing extract.query()) to use
String::from("source($X)") instead of "source($X)".to_string().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0eddd4e7-63c9-4a5b-bd2a-08186d2b00d1
📒 Files selected for processing (5)
- crates/sempai-yaml/src/parser/mod.rs
- crates/sempai-yaml/src/tests/parser_tests/legacy_tests.rs
- crates/sempai-yaml/src/tests/parser_tests/match_tests.rs
- crates/sempai-yaml/src/tests/parser_tests/mod.rs
- crates/sempai-yaml/src/tests/parser_tests/mode_tests.rs
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.
crates/sempai-yaml/src/parser/mod.rs
fn unexpected_for_search(raw: &RawRule) -> Vec<&'static str> {
collect_unexpected(
raw,
&[
(has_extract_fields, "`extract` or `dest-language`"),
(has_join_fields, "`join`"),
(has_taint_fields, "`taint` or legacy taint fields"),
],
)
}
❌ New issue: Code Duplication
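The table-driven call in the snippet above suggests that `collect_unexpected` takes a list of (predicate, label) pairs. A minimal sketch of such a helper follows, under stated assumptions: `RawRule` is reduced to three optional fields, the predicates only test for presence, and the `FieldCheck` alias and `SEARCH_CHECKS` table are illustrative names, not the crate's.

```rust
/// Reduced stand-in for `RawRule`; only the fields the predicates need.
#[derive(Debug, Default)]
pub struct RawRule {
    pub extract: Option<String>,
    pub join: Option<String>,
    pub taint: Option<String>,
}

/// A presence predicate paired with the label reported to the user.
pub type FieldCheck = (fn(&RawRule) -> bool, &'static str);

fn has_extract_fields(raw: &RawRule) -> bool {
    raw.extract.is_some()
}

fn has_join_fields(raw: &RawRule) -> bool {
    raw.join.is_some()
}

fn has_taint_fields(raw: &RawRule) -> bool {
    raw.taint.is_some()
}

/// Checks that apply to a search-mode rule (labels copied from the thread).
pub const SEARCH_CHECKS: &[FieldCheck] = &[
    (has_extract_fields, "`extract` or `dest-language`"),
    (has_join_fields, "`join`"),
    (has_taint_fields, "`taint` or legacy taint fields"),
];

/// Returns the label of every check whose predicate fires for `raw`.
pub fn collect_unexpected(raw: &RawRule, checks: &[FieldCheck]) -> Vec<&'static str> {
    checks
        .iter()
        .filter(|check| (check.0)(raw))
        .map(|check| check.1)
        .collect()
}
```

With the per-mode differences reduced to data, the near-identical `unexpected_for_*` wrappers become one table each, or can be inlined at the call site.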
This comment was marked as resolved.
Changed usage from `"source($X)".to_string()` to `String::from("source($X)")` in test for consistency and clarity.
Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai-yaml/src/tests/parser_tests/mode_tests.rs`:
- Around line 50-53: The tests call first_err_diagnostic(yaml) but discard the
captured span with "_" allowing span-mapping regressions; update the calls
(e.g., the one using first_err_diagnostic and the one around lines 162-167) to
capture the span boolean (e.g., let (code, message, has_span) =
first_err_diagnostic(yaml)) and then assert that has_span is true for
schema-invalid diagnostics where a span is expected, while keeping the existing
assertions for DiagnosticCode::ESempaiSchemaInvalid and message contents intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9c8518a6-6197-45c6-90ac-d1dc267efc92
📒 Files selected for processing (1)
crates/sempai-yaml/src/tests/parser_tests/mode_tests.rs
…idate_principal_family

Removed separate functions for collecting unexpected principal fields per rule mode and instead inlined the calls to `collect_unexpected` directly in the `validate_principal_family` match arms. This simplifies the code by reducing indirection and unused helpers while maintaining the same validation logic.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
♻️ Duplicate comments (1)
crates/sempai-yaml/src/parser/mod.rs (1)
128-130: ⚠️ Potential issue | 🟠 Major
Stop treating `RuleMode::Other(_)` as `search`.
Lines 128-129 deliberately skip principal-family validation for unknown modes, but Lines 185-186 still pass them to `build_search_rule`. That makes the parser accept only unknown modes that happen to look like search rules and misreports everything else as a search-schema failure. Fail fast with an explicit unsupported-mode diagnostic until the model can retain an opaque principal.
🔧 Proposed fix
- // Unknown modes are preserved for forward compatibility and treated as search-like
- RuleMode::Search | RuleMode::Other(_) => {
-     build_search_rule(&raw, rule_span.clone(), source_map)?
- }
+ RuleMode::Search => build_search_rule(&raw, rule_span.clone(), source_map)?,
+ RuleMode::Other(other) => {
+     return Err(schema_error(
+         format!("unsupported rule mode `{other}`"),
+         rule_span.clone(),
+         "use `search`, `extract`, `join`, or `taint`",
+     ));
+ }
Also applies to: 183-187
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sempai-yaml/src/parser/mod.rs` around lines 128 - 130, The parser currently treats RuleMode::Other(_) as if it were a search rule by skipping validation (returning Ok(())) but still later passes unknown modes into build_search_rule, causing misreported search-schema errors; change the handling of RuleMode::Other(_) so it returns an explicit unsupported-mode error/diagnostic (i.e., return Err with a clear UnsupportedMode/unsupported_mode diagnostic) instead of Ok(()), and ensure any code that calls build_search_rule only does so for known search modes (leave build_search_rule calls unaffected but gate them behind the known RuleMode variant check so RuleMode::Other(_) never reaches build_search_rule).
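The gating this comment asks for can be sketched on its own, apart from the real parser. In the sketch below, `Builder` and `select_builder` are made-up names and the error is a plain string rather than a `schema_error` diagnostic. The point it illustrates is that an exhaustive `match` with a dedicated `Other` arm stops unknown modes from reaching the search path.

```rust
/// Stand-in for the crate's `RuleMode`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RuleMode {
    Search,
    Extract,
    Join,
    Taint,
    Other(String),
}

/// Illustrative tag naming which rule builder would run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Builder {
    Search,
    Extract,
    Join,
    Taint,
}

/// Maps each known mode to its builder and fails fast on unknown modes,
/// so `Other` is never treated as search-like.
pub fn select_builder(mode: &RuleMode) -> Result<Builder, String> {
    match mode {
        RuleMode::Search => Ok(Builder::Search),
        RuleMode::Extract => Ok(Builder::Extract),
        RuleMode::Join => Ok(Builder::Join),
        RuleMode::Taint => Ok(Builder::Taint),
        RuleMode::Other(other) => Err(format!("unsupported rule mode `{}`", other)),
    }
}
```

If opaque principals are preserved later, only the `Other` arm has to change, and the compiler will flag any new `RuleMode` variant that the `match` does not yet cover.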
Add assertions in parser mode tests to check that schema errors include a primary span, ensuring error diagnostics are complete with location information. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai-yaml/src/tests/parser_tests/mode_tests.rs`:
- Around line 133-146: The test case named join_with_match in mode_tests.rs is
missing a languages header which lets header validation fail before the intended
cross-mode principal assertion; update the YAML string in the
#[case::join_with_match(...)] fixture to include a languages field (e.g. add a
"languages: []" or appropriate languages list under the rules header) so the
parser reaches the join-vs-match principal error and the test asserts the
expected message "Join mode rule contains unexpected principal fields: `match`
or legacy search keys".
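The ordering problem described in this comment can be sketched as a two-stage validator in which header checks short-circuit before the cross-mode principal checks run. Everything here is a hypothetical simplification: the `RawRule` fields, the header error wording, and the assumption that `languages` and `severity` are required headers for these modes are illustrative and not taken from the crate. Only the two principal-field messages are quoted from the review comments.

```rust
// Hypothetical, simplified model of a raw rule; field names are illustrative.
#[derive(Default)]
struct RawRule {
    mode: &'static str,
    languages: Option<Vec<&'static str>>,
    severity: Option<&'static str>,
    has_match: bool,
    has_join: bool,
}

fn validate(rule: &RawRule) -> Result<(), String> {
    // Stage 1: header validation runs first and returns early.
    // The wording of these messages is an assumption.
    if rule.languages.is_none() {
        return Err("missing required field `languages`".to_string());
    }
    if rule.severity.is_none() {
        return Err("missing required field `severity`".to_string());
    }
    // Stage 2: cross-mode principal checks, reached only when the
    // headers are complete.
    match rule.mode {
        "join" if rule.has_match => Err(
            "Join mode rule contains unexpected principal fields: `match` or legacy search keys"
                .to_string(),
        ),
        "extract" if rule.has_join => {
            Err("Extract mode rule contains unexpected principal fields: `join`".to_string())
        }
        _ => Ok(()),
    }
}
```

Under these assumptions, a `join` fixture that carries `match` but omits `languages` fails in stage 1, so a test asserting the stage 2 message would never see it. Completing the headers in the fixture is what lets the intended diagnostic surface.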
Add the 'languages: [python]' field to a rule in parser_tests/mode_tests.rs to enhance test coverage for language-specific rule parsing. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sempai-yaml/src/tests/parser_tests/mode_tests.rs`:
- Around line 118-132: The test case named extract_with_join is missing the
severity header so header validation can fail first; update the YAML fixture
string in the #[case::extract_with_join(...)] input to include a severity field
(e.g., add "severity: WARNING\n" alongside the other top-level rule fields) so
the test still triggers the intended "Extract mode rule contains unexpected
principal fields: `join`" diagnostic rather than a missing-severity error.
Include the 'severity: WARNING' field in the extract mode test rule to ensure proper parsing and handling of severity attributes in mode parsing tests. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Summary
Changes
Rationale
Plan of Work (high level)
Test Plan
Documentation & Roadmap
Validation & Acceptance Criteria
Next Steps
Links
📎 Task: https://www.devboxer.com/task/67f4f901-cb23-45d8-ae2d-d319e6e63204