
Refactor manifest internals into modular diagnostics, glob, and macros#244

Merged
leynos merged 5 commits into main from
terragon/refactor-manifest-submodules-d0dzyn
Dec 4, 2025


@leynos
Owner

@leynos leynos commented Dec 2, 2025

Summary

  • Refactors manifest internals into modular diagnostics, glob handling, and Jinja macro support (jinja_macros).
  • Internal module paths are reorganised with clearer boundaries; no user-facing behaviour changes.
  • Tests and scaffolding reorganised to align with the new modular structure and improve coverage.

Changes

Diagnostics

  • Extracted manifest diagnostics into a dedicated module under src/manifest/diagnostics with a mod.rs and a yaml.rs submodule.
  • Added ManifestSource and ManifestName types to support diagnostic labeling and source tracking.
  • Introduced a central ManifestError enum and mapping helpers, including map_data_error for structure-related errors.
  • YAML-specific diagnostics and helpers are kept in the new yaml submodule, and map_yaml_error is exposed for upstream use.
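As a rough illustration of the labelling types, `ManifestSource` and `ManifestName` could be thin newtypes like the ones below. This is a std-only sketch under assumptions: the single `String` field, the `From<&str>` conversions, and the `line_of` helper are hypothetical, and the PR's real types feed `miette` diagnostics, which are omitted here.

```rust
use std::fmt;

/// Hypothetical newtype holding the manifest text that diagnostics point into.
#[derive(Debug, Clone)]
pub struct ManifestSource(String);

/// Hypothetical newtype holding the name (e.g. a file path) shown in labels.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestName(String);

impl From<&str> for ManifestSource {
    fn from(text: &str) -> Self {
        Self(text.to_owned())
    }
}

impl From<&str> for ManifestName {
    fn from(name: &str) -> Self {
        Self(name.to_owned())
    }
}

impl fmt::Display for ManifestName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl ManifestSource {
    /// Illustrative helper: the 1-based line containing byte `offset`,
    /// the kind of information a diagnostic label needs.
    pub fn line_of(&self, offset: usize) -> usize {
        let end = offset.min(self.0.len());
        // Count newlines strictly before `offset`; byte-based, so no char-boundary panics.
        self.0.as_bytes()[..end].iter().filter(|&&b| b == b'\n').count() + 1
    }
}
```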

Glob handling

  • Replaced the oversized glob module with a modular structure distributed across:
    • src/manifest/glob/mod.rs: entry point and public API surface
    • src/manifest/glob/errors.rs: error helpers and message construction
    • src/manifest/glob/normalize.rs: separator and escape normalization
    • src/manifest/glob/validate.rs: brace/character-class validation logic
    • src/manifest/glob/walk.rs: filesystem walking helpers
  • Introduced a dedicated GlobPattern type and a GlobEntryResult type alias.
  • Updated error propagation to use the new GlobErrorContext and GlobErrorType types.
  • Added a new tests module (src/manifest/glob/tests.rs) for glob validation and path handling.
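The `glob` crate that `glob_paths` uses has no backslash escape. Instead, its documentation matches metacharacters literally by wrapping them in brackets (for example `[?]`, or `[]]` for `]`). A minimal sketch of an escape-forcing pass built on that convention follows. Only the name `force_literal_escapes` comes from the PR; the body and the set of characters handled are assumptions.

```rust
/// Sketch: rewrite `\x` (x in `* ? [ ]`) into a bracket class such as `[*]`,
/// which the `glob` crate treats as a literal character. Assumed behaviour.
pub fn force_literal_escapes(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        match (c, chars.peek().copied()) {
            ('\\', Some(next @ ('*' | '?' | '[' | ']'))) => {
                // Consume the escaped character and emit it inside a class.
                chars.next();
                out.push('[');
                out.push(next);
                out.push(']');
            }
            // Everything else, including a trailing backslash, passes through.
            _ => out.push(c),
        }
    }
    out
}
```

For instance, `src/\*.rs` becomes `src/[*].rs`, which matches only a file literally named `*.rs` inside `src`.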

Jinja macros

  • Created src/manifest/jinja_macros/mod.rs to host macro-related utilities and re-export integration points.
  • Split caching and macro invocation logic into src/manifest/jinja_macros/cache.rs.
  • Extracted and updated core helpers (e.g., parse_macro_name, register_macro, register_manifest_macros, call_macro_value) to the new modular structure while preserving behaviour.
  • Narrowed the module surface so it reflects each part's responsibility (macro parsing, registration, invocation), e.g. by making cache internals pub(super).
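A hedged sketch of what `parse_macro_name` does: take a macro signature and return the macro's name. It assumes signatures of the form `name(params)` with an identifier-style name, and uses `String` errors to stay std-only; the PR's helper may apply different rules and error types.

```rust
/// Sketch: extract `greet` from a signature such as `greet(name)`.
/// Assumes the name precedes the first `(` and is an ASCII identifier.
pub fn parse_macro_name(signature: &str) -> Result<String, String> {
    let trimmed = signature.trim();
    let (name, _params) = trimmed
        .split_once('(')
        .ok_or_else(|| format!("macro signature '{trimmed}' is missing '('"))?;
    let name = name.trim();

    // First character: letter or underscore; the rest: alphanumeric or underscore.
    let mut chars = name.chars();
    let valid_start = chars
        .next()
        .is_some_and(|c| c.is_ascii_alphabetic() || c == '_');
    let valid_rest = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');

    if valid_start && valid_rest {
        Ok(name.to_owned())
    } else {
        Err(format!("invalid macro name in signature '{trimmed}'"))
    }
}
```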

Tests and test structure

  • Reorganized tests into a clearer layout:
    • src/manifest/tests/mod.rs to expose the manifest test suite
    • src/manifest/tests/macros.rs for macro-related tests
    • src/manifest/tests/workspace.rs for workspace/config related tests
  • Updated test scaffolding to reflect the new module boundaries and helpers.
  • Added tests for brace matching, non-UTF-8 path handling, and macro invocation pathways.
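To illustrate what the non-UTF-8 tests guard against: `glob_paths` returns its matches as `String`s, so the walker has to reject a path whose bytes are not valid UTF-8 rather than convert it lossily. The helper `to_utf8_string` below is hypothetical; the Unix-only check builds such a path from raw bytes using `std::os::unix::ffi::OsStrExt`.

```rust
use std::path::Path;

/// Hypothetical helper: convert a matched path to `String`, refusing
/// non-UTF-8 paths instead of silently substituting U+FFFD.
pub fn to_utf8_string(path: &Path) -> Result<String, String> {
    path.to_str()
        .map(str::to_owned)
        .ok_or_else(|| format!("path is not valid UTF-8: {}", path.display()))
}

/// Unix-only check: 0xFF can never appear in valid UTF-8.
#[cfg(unix)]
pub fn non_utf8_path_is_rejected() -> bool {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;
    let raw = OsStr::from_bytes(b"dir/\xFFfile");
    to_utf8_string(Path::new(raw)).is_err()
}
```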

Rationale

  • The previous monolithic manifest submodules were becoming hard to maintain. This refactor partitions concerns into smaller, testable units, improving readability and future maintainability.
  • Clear module boundaries reduce cognitive load when extending diagnostics, glob patterns, or manifest macros.

Migration / Compatibility

  • This refactor preserves public APIs and behaviour. Internal module paths have changed, so any in-crate code or tests that import those internal paths must be updated to the new module layout.
  • No configuration or runtime changes required for users; tests and internal structure have been reorganised.

Test plan

  • Run cargo test to execute the full test suite.
  • Specifically, verify:
    • Diagnostics: YAML parsing diagnostics and structure errors are reported with correct labels and messages.
    • Glob: Brace validation, normalization, and path expansion behave as before, including error paths.
    • Jinja macros: Macro parsing, registration, and invocation paths work with the new module layout.
    • Workspace tests: Manifest workspace resolution and caching behaviour remain correct.

If any CI step fails due to path or import changes, please indicate the exact module path adjustments needed for compatibility.

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/3f86ec8a-9f88-481b-b335-ffdfd7374b03

Summary by Sourcery

Modularise manifest internals by splitting diagnostics, glob handling, and Jinja macro helpers into focused submodules while keeping public behaviour stable.

Bug Fixes:

  • Improve glob error reporting and handling of invalid patterns, I/O failures, and non-UTF-8 paths during manifest glob expansion.

Enhancements:

  • Introduce a dedicated manifest::diagnostics module with ManifestSource/ManifestName types, a central ManifestError, and helpers for mapping YAML and structural errors into rich diagnostics.
  • Refactor glob handling into a manifest::glob module with separate normalization, validation, error construction, and filesystem traversal helpers, plus a GlobPattern type and GlobEntryResult alias.
  • Split Jinja macro support into a manifest::jinja_macros module with a cache submodule, tightening visibility of macro caching helpers while preserving external behaviour.

Tests:

  • Restructure manifest tests into a dedicated manifest::tests tree with separate macro and workspace suites.
  • Add targeted tests for glob separator normalization, brace validation, non-UTF-8 path handling, and macro registration edge cases, as well as workspace resolution and cache placement.

…acro handling

- Remove old manifest glob module and replace with a new, modular glob handling system including errors, normalize, validate, and walk submodules.
- Add detailed glob pattern validation, normalization, and expansion with error context and UTF-8 safety.
- Introduce manifest diagnostics module to provide actionable errors during manifest parsing, improving YAML and JSON error mapping.
- Refactor manifest diagnostics code by moving YAML-specific diagnostics to a dedicated submodule.
- Rename and reorganize jinja_macros module to separate macro caching and invocation logic.
- Improve manifest macro registration and invocation with caching to reuse compiled templates efficiently.
- Move manifest macro related tests under organized modules for better maintainability.

This change improves manifest parsing diagnostics and glob pattern validation and handling, enhancing reliability and developer experience.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai Bot commented Dec 2, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced error diagnostics for manifest parsing with improved source identification and clearer error context.
  • Improvements

    • Refined manifest validation and pattern handling for improved reliability.
    • Strengthened error handling and reporting in macro processing.
    • Expanded test coverage for manifest workspace resolution and validation.


Walkthrough

This PR adds structured manifest diagnostics, splits the previous glob implementation into focused submodules (normalize, validate, walk, errors), refactors Jinja macro registration into a dedicated module, and adds related tests for macros and workspace resolution.

Changes

  • Diagnostics refactor
    Files: src/manifest/diagnostics/mod.rs, src/manifest/diagnostics/yaml.rs
    Add structured diagnostic types (ManifestSource, ManifestName, ManifestError), implement conversions and Display, expose map_yaml_error, and add map_data_error to convert serde_json errors into boxed diagnostics; move YAML-specific helpers into yaml.rs.
  • Glob module split
    Files: src/manifest/glob.rs (removed), src/manifest/glob/mod.rs, src/manifest/glob/errors.rs, src/manifest/glob/normalize.rs, src/manifest/glob/validate.rs, src/manifest/glob/walk.rs, src/manifest/glob/tests.rs
    Replace monolithic glob.rs with modular implementation: mod.rs exposes glob_paths and GlobPattern; errors.rs defines GlobErrorContext/GlobErrorType and error creation helpers; normalize.rs normalises separators and implements escape forcing; validate.rs validates brace matching; walk.rs handles filesystem traversal and entry processing; add tests for normalisation, validation and expansion.
  • Jinja macro adjustments
    Files: src/manifest/jinja_macros/cache.rs, src/manifest/jinja_macros/mod.rs
    Reduce public surface of macro cache (pub(super) for many items); move macro parsing/registration/invocation into mod.rs (parse_macro_name, register_macro, register_manifest_macros, call_macro_value); switch some error returns to anyhow::Result.
  • Tests and scaffolding
    Files: src/manifest/tests/mod.rs, src/manifest/tests/macros.rs, src/manifest/tests/workspace.rs
    Add test module scaffolding and tests: update macro tests to new paths, add workspace tests validating workspace root resolution, UTF‑8 path constraints, cache scoping and HTTP-driven manifest fetch behaviour.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Inspect src/manifest/glob/normalize.rs for Unix-guarded escape handling and trailing-backslash edge cases.
  • Verify src/manifest/glob/validate.rs brace-depth state machine and correct error context composition.
  • Review src/manifest/glob/walk.rs path conversion, metadata handling and cross-platform normalisation.
  • Validate src/manifest/diagnostics/mod.rs public API, Display implementations and boxed diagnostic composition.
  • Audit src/manifest/jinja_macros/mod.rs for template compilation, environment mutation and anyhow error propagation.

Possibly related issues

Poem

✨ Split the code, mend the seam,
Errors labelled, globs made clean,
Macros cached and tests in tow,
Ship the blooms of work you sow! 🎉

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title directly and clearly summarises the main refactoring work: modularising diagnostics, glob, and macro support within the manifest internals.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Description check: ✅ Passed. The pull request description is directly related to the changeset. It comprehensively describes the refactoring of manifest internals into modular diagnostics, glob handling, and Jinja macro support, matching the file-level summaries and changes.

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • UTF-8: Entity not found: Issue - Could not find referenced Issue.


@sourcery-ai
Contributor

sourcery-ai Bot commented Dec 2, 2025

Reviewer's Guide

Refactors manifest internals into modular diagnostics, glob, and Jinja macro subsystems, and reorganises associated tests, without altering user-facing behaviour.

Class diagram for the modularised glob handling subsystem

classDiagram
    class GlobPattern {
        +raw : String
        +normalized : String
    }

    class GlobModule {
        +glob_paths(pattern) Result~Vec~String~~
    }

    class NormalizeModule {
        +normalize_separators(pattern) String
        +force_literal_escapes(pattern) String
    }

    class ValidateModule {
        +validate_brace_matching(pattern) Result~()~
    }

    class BraceValidator {
        -depth : i32
        -in_class : bool
        -last_open_pos : Option~usize~
        -escaped : bool
        +new() BraceValidator
        +process_character(ch, pos, pattern) Result~()~
        +handle_escape_sequence(ch) bool
        +handle_character_class(ch) void
        +handle_braces(ch, pos, pattern) Result~()~
        +validate_final_state(pattern) Result~()~
    }

    class WalkModule {
        +open_root_dir(pattern) Result~Dir~
        +process_glob_entry(entry, pattern, root) Result~Option~String~~
        +fetch_metadata(root, path) Result~Metadata~
    }

    class GlobErrorContext {
        +pattern : String
        +error_char : char
        +position : usize
        +error_type : GlobErrorType
    }

    class GlobErrorType {
        <<enum>>
        InvalidPattern
        UnmatchedBrace
        IoError
    }

    class GlobErrorsModule {
        +create_glob_error(ctx, detail) Error
        +create_unmatched_brace_error(ctx) Error
    }

    class GlobEntryResult {
        <<type alias>>
        Result~PathBuf, glob__GlobError~
    }

    class Error {
        <<from minijinja>>
    }

    class Dir {
        <<from cap_std>>
    }

    class Utf8Path {
        <<from camino>>
    }

    GlobModule ..> GlobPattern : constructs
    GlobModule ..> NormalizeModule : uses
    GlobModule ..> ValidateModule : uses
    GlobModule ..> WalkModule : uses
    GlobModule ..> GlobErrorsModule : uses

    ValidateModule ..> BraceValidator : uses

    WalkModule ..> GlobPattern : uses
    WalkModule ..> GlobEntryResult : uses
    WalkModule ..> Dir : uses
    WalkModule ..> Utf8Path : uses
    WalkModule ..> GlobErrorsModule : uses

    GlobErrorsModule ..> GlobErrorContext : uses
    GlobErrorContext ..> GlobErrorType : has

    GlobModule ..> Error : returns
    ValidateModule ..> Error : returns
    WalkModule ..> Error : returns

Class diagram for the refactored Jinja macros subsystem

classDiagram
    class JinjaMacrosModule {
        +parse_macro_name(signature) Result~String~
        +register_macro(env, macro_def, index) Result~()~
        +register_manifest_macros(doc, env) Result~()~
        +call_macro_value(state, macro_value, positional, kwargs) Result~Value, Error~
    }

    class MacroCache {
        +template_name : String
        +macro_name : String
        -instance : OnceLock~MacroInstance~
        +new(template_name, macro_name) MacroCache
        +prepare(env) Result~()~
    }

    class MacroInstance {
        -state_guard : MacroStateGuard
        -value : Value
        +new(env, template_name, macro_name) Result~MacroInstance~
        +state() &State
        +value() &Value
    }

    class MacroStateGuard {
        -ptr : NonNull~State~
        +new(state) MacroStateGuard
        +state() &State
        +into_inner(self) State
    }

    class CacheModule {
        +make_macro_fn(cache) impl Fn(State, Rest~Value~, Kwargs) -> Result~Value, Error~
    }

    class MacroDefinition {
        +signature : String
        +body : String
    }

    class Environment {
        <<from minijinja>>
        +add_template_owned(name, source) Result~()~
        +add_function(name, func) void
        +get_template(name) Result~Template~
    }

    class State {
        <<from minijinja>>
    }

    class Value {
        <<from minijinja>>
        +call(state, args) Result~Value, Error~
    }

    class Kwargs {
        <<from minijinja>>
    }

    class RestValue {
        <<alias Rest~Value~~>>
    }

    class Error {
        <<from minijinja>>
    }

    class OnceLockMacroInstance {
        <<alias OnceLock~MacroInstance~~>>
    }

    JinjaMacrosModule ..> MacroDefinition : uses
    JinjaMacrosModule ..> Environment : uses
    JinjaMacrosModule ..> MacroCache : constructs
    JinjaMacrosModule ..> CacheModule : uses
    JinjaMacrosModule ..> State : uses
    JinjaMacrosModule ..> Value : uses
    JinjaMacrosModule ..> Kwargs : uses

    MacroCache o--> MacroInstance : caches
    MacroCache ..> Environment : prepares_with
    MacroCache ..> OnceLockMacroInstance : owns

    MacroInstance ..> Environment : loads_from
    MacroInstance ..> State : owns
    MacroInstance ..> Value : owns
    MacroInstance ..> MacroStateGuard : wraps

    MacroStateGuard ..> State : owns_raw

    CacheModule ..> MacroCache : uses
    CacheModule ..> State : closure_arg
    CacheModule ..> Value : closure_arg
    CacheModule ..> Kwargs : closure_arg
    CacheModule ..> Error : returns

File-Level Changes

Modularise manifest diagnostics into a dedicated diagnostics module with shared types and helpers.
  • Introduce src/manifest/diagnostics/mod.rs as the central diagnostics module that defines ManifestSource, ManifestName, and ManifestError
  • Move YAML-specific error mapping into src/manifest/diagnostics/yaml.rs and re-export map_yaml_error from the diagnostics module
  • Add map_data_error and DataDiagnostic for serde_json structural errors, simplifying callers’ mapping from data errors to miette diagnostics
  Files: src/manifest/diagnostics/mod.rs, src/manifest/diagnostics/yaml.rs
Refactor glob handling into a multi-file module with clear responsibilities and enriched error handling.
  • Create src/manifest/glob/mod.rs as the public glob API exposing GlobPattern, GlobEntryResult, and glob_paths
  • Add normalize.rs to handle separator normalisation and, on Unix, escaping semantics for wildcard and bracket characters
  • Add validate.rs to encapsulate brace and character-class validation, using a BraceValidator helper
  • Add walk.rs to encapsulate filesystem root selection and per-entry processing, including UTF-8 enforcement and directory filtering
  • Introduce errors.rs (content not shown in diff) to construct structured glob-related MiniJinja errors via GlobErrorContext and GlobErrorType
  • Add src/manifest/glob/tests.rs with coverage for normalisation, brace validation, non-UTF-8 paths, and directory filtering
  Files: src/manifest/glob/mod.rs, src/manifest/glob/normalize.rs, src/manifest/glob/validate.rs, src/manifest/glob/walk.rs, src/manifest/glob/errors.rs, src/manifest/glob/tests.rs, src/manifest/glob.rs (removed)
Split Jinja macro utilities into a modular layout and tighten visibility around caching internals.
  • Introduce src/manifest/jinja_macros/mod.rs as the main module exporting macro-related helpers (parse_macro_name, register_macro, register_manifest_macros, call_macro_value) and wiring in the cache submodule
  • Move macro caching and invocation helpers into src/manifest/jinja_macros/cache.rs, changing make_macro_fn and MacroCache to pub(super) and adjusting error types to anyhow::Result where appropriate
  • Simplify MacroCache::prepare to ignore redundant OnceLock::set failures and replace a manual NonNull::new unwrap with NonNull::new_unchecked plus a safety comment
  Files: src/manifest/jinja_macros/mod.rs, src/manifest/jinja_macros/cache.rs, src/manifest/jinja_macros.rs (removed)
Restructure manifest tests into macros-focused and workspace-focused modules with clearer responsibilities.
  • Create src/manifest/tests/mod.rs to group manifest tests and re-export submodules
  • Move macro-specific tests from the old src/manifest/tests.rs into src/manifest/tests/macros.rs and trim them to macro-related concerns
  • Introduce src/manifest/tests/workspace.rs to host workspace resolution, non-UTF-8 workspace root, and cache-directory tests previously co-located with macro tests
  • Refactor imports in the new test modules to match the new module hierarchy (e.g., super::super::jinja_macros, from_path_with_policy, stdlib_config_for_manifest)
  Files: src/manifest/tests/mod.rs, src/manifest/tests/macros.rs, src/manifest/tests/workspace.rs, src/manifest/tests.rs (removed)
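The `MacroCache::prepare` simplification (ignoring redundant `OnceLock::set` failures) can be illustrated with a std-only sketch. `MacroInstance` here is a hypothetical stand-in, since the real one wraps MiniJinja state and a macro `Value`, and the `Environment` argument is omitted.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the real `MacroInstance`; it only records
/// which template/macro pair it was built from.
#[derive(Debug)]
struct MacroInstance {
    qualified_name: String,
}

/// Simplified cache: one lazily built instance per template/macro pair.
struct MacroCache {
    template_name: String,
    macro_name: String,
    instance: OnceLock<MacroInstance>,
}

impl MacroCache {
    fn new(template_name: &str, macro_name: &str) -> Self {
        Self {
            template_name: template_name.to_owned(),
            macro_name: macro_name.to_owned(),
            instance: OnceLock::new(),
        }
    }

    /// Builds the instance if needed. A failed `set` only means another
    /// caller stored an equivalent instance first, so it is ignored.
    fn prepare(&self) -> Result<(), String> {
        if self.instance.get().is_some() {
            return Ok(());
        }
        // Building a real instance is fallible (the diagram shows
        // `MacroInstance::new` returning a `Result`), hence `prepare`'s return type.
        let built = MacroInstance {
            qualified_name: format!("{}::{}", self.template_name, self.macro_name),
        };
        let _ = self.instance.set(built);
        Ok(())
    }

    fn instance(&self) -> Option<&MacroInstance> {
        self.instance.get()
    }
}
```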

Possibly linked issues

  • Remove srgn documentation #153: PR implements the requested manifest refactor by splitting diagnostics, Jinja helpers/macros, and glob handling into dedicated modules.
  • Remove srgn documentation #153: PR performs the requested manifest refactor by splitting diagnostics and glob into submodules and reorganizing tests.


@leynos leynos marked this pull request as ready for review December 3, 2025 09:24
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • In glob::BraceValidator, the BraceValidationState.escape_active flag is initialized from cfg!(unix) but never mutated, so you could simplify this by removing it from the state and using #[cfg(unix)] branches directly in handle_escape_sequence to reduce runtime state and make the escape semantics clearer.
  • The normalized_or_raw helper for GlobPattern panics if normalized is None; since this invariant is critical for glob_paths, consider enforcing it at the type level (e.g., by using a dedicated type for validated patterns or constructing GlobPattern only with a normalized value) instead of relying on a runtime panic.
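The second point could look roughly like this: a sketch that assumes the normalisation pipeline is passed in as a closure (a simplification to keep the example self-contained). Because the fields are private and `normalized` is a plain `String`, there is no `None` case for `normalized_or_raw` to panic on.

```rust
/// Sketch: the "always normalised" invariant encoded in the type.
/// Fields are private and the only constructor computes `normalized`.
#[derive(Debug, Clone)]
pub struct GlobPattern {
    raw: String,
    normalized: String,
}

impl GlobPattern {
    /// `normalize` stands in for the real pipeline (separator
    /// normalisation plus Unix escape forcing); hypothetical signature.
    pub fn new(raw: &str, normalize: impl Fn(&str) -> String) -> Self {
        Self {
            raw: raw.to_owned(),
            normalized: normalize(raw),
        }
    }

    /// Original pattern, for error messages.
    pub fn raw(&self) -> &str {
        &self.raw
    }

    /// No `Option` here, so no panic path.
    pub fn normalized(&self) -> &str {
        &self.normalized
    }
}
```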
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `glob::BraceValidator`, the `BraceValidationState.escape_active` flag is initialized from `cfg!(unix)` but never mutated, so you could simplify this by removing it from the state and using `#[cfg(unix)]` branches directly in `handle_escape_sequence` to reduce runtime state and make the escape semantics clearer.
- The `normalized_or_raw` helper for `GlobPattern` panics if `normalized` is `None`; since this invariant is critical for `glob_paths`, consider enforcing it at the type level (e.g., by using a dedicated type for validated patterns or constructing `GlobPattern` only with a normalized value) instead of relying on a runtime panic.

## Individual Comments

### Comment 1
<location> `src/manifest/glob/normalize.rs:9-12` </location>
<code_context>
-    {
-        let mut out = String::with_capacity(pattern.len());
-        let mut it = pattern.chars().peekable();
-        while let Some(c) = it.next() {
-            if c == '\\' {
-                out.push(process_backslash(&mut it, native));
-            } else if c == '/' || c == '\\' {
-                out.push(native);
-            } else {
</code_context>

<issue_to_address>
**nitpick:** The backslash case in `normalize_separators` is handled twice on Unix, which is redundant and may confuse future maintenance.

In the Unix branch, the loop first handles `c == '\\'` via `process_backslash`, then the `else if c == '/' || c == '\\'` repeats the backslash check. The backslash part of the `else if` is unreachable and can be removed to reduce redundancy and clarify intent.
</issue_to_address>
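With the unreachable arm gone, the Unix loop needs only one backslash branch, as sketched below. The body of `process_backslash` is a hypothetical reconstruction (keep a backslash that guards a glob metacharacter, otherwise treat it as a separator); the real helper may differ.

```rust
use std::iter::Peekable;

/// Hypothetical reconstruction of the real helper's rules.
fn process_backslash<I>(it: &mut Peekable<I>, native: char) -> char
where
    I: Iterator<Item = char>,
{
    match it.peek().copied() {
        // Keep `\` as an escape when it guards a glob metacharacter.
        Some('*' | '?' | '[' | ']' | '{' | '}') => '\\',
        // Otherwise (including a trailing `\`) treat it as a separator.
        _ => native,
    }
}

/// Unix-style separator pass with a single backslash branch.
pub fn normalize_separators(pattern: &str) -> String {
    let native = std::path::MAIN_SEPARATOR;
    let mut out = String::with_capacity(pattern.len());
    let mut it = pattern.chars().peekable();
    while let Some(c) = it.next() {
        match c {
            '\\' => out.push(process_backslash(&mut it, native)),
            '/' => out.push(native),
            _ => out.push(c),
        }
    }
    out
}
```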

### Comment 2
<location> `src/manifest/glob/validate.rs:5` </location>
<code_context>
-use minijinja::{Error, ErrorKind};
-
-/// Represents a character being processed with its context
-#[derive(Debug, Clone, Copy)]
-pub struct CharContext {
-    pub ch: char,
</code_context>

<issue_to_address>
**issue (complexity):** Consider collapsing the validator’s helper structs into a single stateful `BraceValidator` and simplifying its helper methods’ control flow to make the brace-validation scan easier to follow.

You can simplify this validator quite a bit by collapsing the extra types and simplifying the control flow, while keeping all behavior intact.

### 1. Inline `CharContext` and `BraceValidationState` into `BraceValidator`

`CharContext` mostly mirrors fields already on `BraceValidator` and `BraceValidationState`. You can eliminate both `CharContext` and `BraceValidationState` and keep the state directly on `BraceValidator`:

```rust
struct BraceValidator {
    depth: i32,
    in_class: bool,
    last_open_pos: Option<usize>,
    escape_active: bool,
    escaped: bool,
}

impl BraceValidator {
    const fn new() -> Self {
        Self {
            depth: 0,
            in_class: false,
            last_open_pos: None,
            escape_active: cfg!(unix),
            escaped: false,
        }
    }
}
```

Then `process_character` can work on these fields directly:

```rust
fn process_character(
    &mut self,
    ch: char,
    pos: usize,
    pattern: &GlobPattern,
) -> Result<(), Error> {
    if self.handle_escape_sequence(ch) {
        return Ok(());
    }

    self.handle_character_class(ch);
    self.handle_braces(ch, pos, pattern)
}
```

### 2. Simplify escape handling (`Option<Result<..>>` to `bool`)

`handle_escape_sequence` does not need to return `Option<Result<..>>`; a boolean “handled / not handled” is sufficient:

```rust
fn handle_escape_sequence(&mut self, ch: char) -> bool {
    if self.escaped {
        self.escaped = false;
        return true; // fully handled
    }

    if ch == '\\' && self.escape_active {
        self.escaped = true;
        return true; // escape start handled
    }

    false
}
```

This removes the `if let Some(result)` + `return result` pattern, making the flow easier to follow.

### 3. Use internal state in helpers, remove `in_class` duplication

With the state directly on `BraceValidator`, the helpers no longer need a context object or duplicated `in_class`:

```rust
fn handle_character_class(&mut self, ch: char) {
    match ch {
        '[' if !self.in_class => self.in_class = true,
        ']' if self.in_class => self.in_class = false,
        _ => {}
    }
}

fn handle_braces(
    &mut self,
    ch: char,
    pos: usize,
    pattern: &GlobPattern,
) -> Result<(), Error> {
    if self.in_class {
        return Ok(());
    }

    match ch {
        '}' if self.depth == 0 => Err(create_unmatched_brace_error(&GlobErrorContext {
            pattern: pattern.raw.clone(),
            error_char: ch,
            position: pos,
            error_type: GlobErrorType::UnmatchedBrace,
        })),
        '{' => {
            self.depth += 1;
            self.last_open_pos = Some(pos);
            Ok(())
        }
        '}' => {
            self.depth -= 1;
            Ok(())
        }
        _ => Ok(()),
    }
}
```

### 4. Drop the `#[expect(missing_const_for_fn)]` on a simple mutating helper

Once you have simple, mutating helper methods like `handle_character_class`, the explicit `#[expect(missing_const_for_fn)]` adds noise without value. You can safely remove it:

```rust
fn handle_character_class(&mut self, ch: char) {
    // ...
}
```

This keeps the core logic focused and reduces the amount of “meta” around the code.

These changes keep all semantics (brace depth, class tracking, escape behavior, error reporting) but reduce the number of types, the amount of state indirection, and the nesting in the control flow, making the scan easier to reason about.
</issue_to_address>
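Pulling these suggestions together, and folding in the overall comment about `escape_active`, a self-contained version of the validator might look like this. It is a sketch: errors are plain `String`s rather than MiniJinja errors, positions are byte offsets from `char_indices`, and `depth` is a `usize` because the unmatched-`}` check runs before any decrement.

```rust
/// Single-struct validator; `escape_active` is replaced by `cfg!(unix)`.
#[derive(Debug, Default)]
struct BraceValidator {
    depth: usize,
    in_class: bool,
    last_open_pos: Option<usize>,
    escaped: bool,
}

impl BraceValidator {
    /// Returns `true` when `ch` is consumed by an escape sequence.
    fn handle_escape_sequence(&mut self, ch: char) -> bool {
        if self.escaped {
            self.escaped = false;
            return true;
        }
        // Backslash escapes apply on Unix only, decided at compile time.
        if cfg!(unix) && ch == '\\' {
            self.escaped = true;
            return true;
        }
        false
    }

    fn handle_character_class(&mut self, ch: char) {
        match ch {
            '[' if !self.in_class => self.in_class = true,
            ']' if self.in_class => self.in_class = false,
            _ => {}
        }
    }

    fn handle_braces(&mut self, ch: char, pos: usize) -> Result<(), String> {
        if self.in_class {
            return Ok(()); // braces inside `[...]` are literal
        }
        match ch {
            '{' => {
                self.depth += 1;
                self.last_open_pos = Some(pos);
                Ok(())
            }
            '}' if self.depth == 0 => Err(format!("unmatched '}}' at byte offset {pos}")),
            '}' => {
                self.depth -= 1;
                Ok(())
            }
            _ => Ok(()),
        }
    }
}

/// `String`-error variant of `validate_brace_matching` taking `&str`.
pub fn validate_brace_matching(pattern: &str) -> Result<(), String> {
    let mut v = BraceValidator::default();
    for (pos, ch) in pattern.char_indices() {
        if v.handle_escape_sequence(ch) {
            continue;
        }
        v.handle_character_class(ch);
        v.handle_braces(ch, pos)?;
    }
    if v.depth > 0 {
        let pos = v.last_open_pos.unwrap_or(0);
        return Err(format!("unclosed '{{' (last opened at byte offset {pos})"));
    }
    Ok(())
}
```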

### Comment 3
<location> `src/manifest/glob/mod.rs:17` </location>
<code_context>
-}
-
-/// Configuration for brace validation
-#[derive(Debug, Clone)]
-pub struct BraceValidationState {
-    pub depth: i32,
</code_context>

<issue_to_address>
**issue (complexity):** Consider tightening the `GlobPattern` type and using borrows instead of per-call clones to simplify `glob_paths` control flow and state handling.

You can simplify this without changing behavior by tightening the types and borrowing instead of cloning.

### 1. Remove the invalid intermediate `GlobPattern` state

`normalized: Option<String>` is never meaningfully absent after `glob_paths` runs normalization, and the “missing normalization” branch is effectively a logic panic. You can encode the invariant in the type and eliminate that branch:

```rust
#[derive(Debug, Clone)]
pub struct GlobPattern {
    pub raw: String,
    pub normalized: String,
}
```

Then build `GlobPattern` only *after* normalization inside `glob_paths`:

```rust
pub fn glob_paths(pattern: &str) -> std::result::Result<Vec<String>, Error> {
    use glob::{MatchOptions, glob_with};

    let opts = MatchOptions {
        case_sensitive: true,
        require_literal_separator: true,
        require_literal_leading_dot: false,
    };

    // Validation can operate on &str (see next section)
    validate_brace_matching(pattern)?;

    let normalized = normalize_separators(pattern);
    // Shadow with the escape-forced form on Unix only; no `mut` or cfg'd block needed.
    #[cfg(unix)]
    let normalized = force_literal_escapes(&normalized);

    // `normalized` is not used again, so move it instead of cloning.
    let pattern_state = GlobPattern {
        raw: pattern.to_owned(),
        normalized,
    };

    let root = open_root_dir(&pattern_state).map_err(|e| {
        create_glob_error(
            &GlobErrorContext {
                pattern: pattern_state.raw.clone(),
                error_char: char::from(0),
                position: 0,
                error_type: GlobErrorType::IoError,
            },
            Some(e.to_string()),
        )
    })?;

    let entries = glob_with(&pattern_state.normalized, opts).map_err(|e| {
        create_glob_error(
            &GlobErrorContext {
                pattern: pattern_state.raw.clone(),
                error_char: char::from(0),
                position: 0,
                error_type: GlobErrorType::InvalidPattern,
            },
            Some(e.to_string()),
        )
    })?;

    let mut paths = Vec::new();
    for entry in entries {
        if let Some(p) = process_glob_entry(entry, &pattern_state, &root)? {
            paths.push(p);
        }
    }
    Ok(paths)
}
```

This removes:

- The invalid `normalized: None` intermediate state.
- The panic-like `"pattern normalisation missing after validation"` path.
- The need for `as_deref().ok_or_else(...)` on `normalized`.

### 2. Decouple validation from `GlobPattern`

`validate_brace_matching` only needs the raw pattern. You can simplify it to accept `&str` and reduce coupling:

```rust
// before
pub fn validate_brace_matching(pattern: &GlobPattern) -> Result<(), Error> { /* ... */ }

// after
pub fn validate_brace_matching(pattern: &str) -> Result<(), Error> { /* ... */ }
```

Call site (shown above) becomes `validate_brace_matching(pattern)?;`, which avoids creating any struct just for validation.

### 3. Avoid cloning `GlobPattern` per entry

`process_glob_entry(entry, pattern_state.clone(), &root)` clones the whole state every iteration just for error context. You can pass a shared borrow and clone only if/when needed inside error formatting:

```rust
// before
pub fn process_glob_entry(
    entry: GlobEntryResult,
    pattern: GlobPattern,
    root: &Path,
) -> Result<Option<String>, Error> { /* ... */ }

// after
pub fn process_glob_entry(
    entry: GlobEntryResult,
    pattern: &GlobPattern,
    root: &Path,
) -> Result<Option<String>, Error> { /* ... */ }
```

Call site (also updated above):

```rust
for entry in entries {
    if let Some(p) = process_glob_entry(entry, &pattern_state, &root)? {
        paths.push(p);
    }
}
```

Inside `process_glob_entry`, if you need ownership for an error message, clone just the fields you actually display:

```rust
return Err(create_glob_error(
    &GlobErrorContext {
        pattern: pattern.raw.clone(),
        // ...
    },
    Some(msg.to_owned()),
));
```
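The borrow-based loop can be sketched with std types only. This assumes glob results arrive as `Result<PathBuf, String>` and omits the capability-scoped `root` used by the real `process_glob_entry`. The `GlobPattern` shape mirrors the struct above, while `process_entry`, `collect_paths` and the error strings are hypothetical.

```rust
use std::path::PathBuf;

/// Mirrors the PR's `GlobPattern`: both fields are always populated.
#[derive(Debug)]
pub struct GlobPattern {
    pub raw: String,
    pub normalized: String,
}

/// Turns one glob result into a UTF-8 string, skipping directories.
/// `pattern` is borrowed; `raw` is only copied into error messages.
pub fn process_entry(
    entry: Result<PathBuf, String>,
    pattern: &GlobPattern,
) -> Result<Option<String>, String> {
    let path = entry.map_err(|e| format!("glob pattern '{}': {e}", pattern.raw))?;
    if path.is_dir() {
        return Ok(None);
    }
    match path.to_str() {
        Some(s) => Ok(Some(s.to_owned())),
        None => Err(format!(
            "glob pattern '{}' matched a non-UTF-8 path: {}",
            pattern.raw,
            path.display()
        )),
    }
}

/// Collects matches without cloning `pattern` on each iteration.
pub fn collect_paths<I>(entries: I, pattern: &GlobPattern) -> Result<Vec<String>, String>
where
    I: IntoIterator<Item = Result<PathBuf, String>>,
{
    let mut paths = Vec::new();
    for entry in entries {
        if let Some(p) = process_entry(entry, pattern)? {
            paths.push(p);
        }
    }
    Ok(paths)
}
```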

These changes keep all behavior, but:

- Remove an unnecessary `Option` and its invariant-violating state.
- Flatten the orchestration in `glob_paths` (validate → normalize → construct `GlobPattern` → glob).
- Reduce coupling between validation and stateful structures.
- Avoid per-entry `GlobPattern` clones in the hot path.
</issue_to_address>

### Comment 4
<location> `src/manifest/glob/mod.rs:37` </location>
<code_context>
+///
+/// Panics if pattern normalisation fails to record the derived pattern, which
+/// indicates a logic error in the validator.
+pub fn glob_paths(pattern: &str) -> std::result::Result<Vec<String>, Error> {
+    use glob::{MatchOptions, glob_with};
+
</code_context>

<issue_to_address>
**issue (review_instructions):** Add behavioural and unit tests that cover glob_paths and the new glob normalisation/validation behaviour.

Treat this new glob module (brace validation, separator/escape normalisation, capability-checked walking) as a changed feature and add tests accordingly.

Add behavioural tests that exercise end-to-end glob_paths behaviour (including successful matches, unmatched braces, invalid patterns, non-UTF-8 path handling, and directory vs file filtering), and unit tests that directly cover validate_brace_matching, normalize_separators / force_literal_escapes, and process_glob_entry.

Ensure the new src/manifest/glob/tests.rs module actually contains these tests rather than remaining empty, so changes in this refactor are enforced by the test suite.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
Rules:
- For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
- Bug fixes *must* be demonstrated by a test.

</details>
</issue_to_address>


- Removed Option from GlobPattern.normalized and made it a String.
- Changed validate_brace_matching to accept &str instead of &GlobPattern.
- Streamlined escape sequence handling in brace validator.
- Updated glob_paths to construct GlobPattern after normalization.
- Improved code clarity and memory usage by passing references where possible.
- Added tests to cover normalization, filtering directories, and brace validation.
- Fixed normalize_separators to only replace '/' with native separator.
- Made pattern normalization and validation more consistent and robust.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos changed the title Refactor oversized manifest submodules into focused modules Refactor manifest internals into focused diagnostics, glob, and macros Dec 3, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/manifest/jinja_macros/cache.rs (1)

165-173: Panic usage violates production code guidelines.

The coding guidelines state that .expect() and .unwrap() are forbidden outside of tests and errors must be propagated. Although Box::into_raw cannot return null for a valid allocation, wrap this in a Result for consistency with the codebase's error-handling philosophy.

Apply this diff:

-    fn new(state: State<'static, 'static>) -> Self {
+    fn new(state: State<'static, 'static>) -> anyhow::Result<Self> {
         let boxed = Box::new(state);
         let ptr = Box::into_raw(boxed);
-        let ptr_non_null = NonNull::new(ptr).unwrap_or_else(|| {
-            panic!("Box::into_raw cannot return a null pointer");
-        });
-        Self { ptr: ptr_non_null }
+        let ptr_non_null = NonNull::new(ptr)
+            .ok_or_else(|| anyhow::anyhow!("Box::into_raw returned null pointer"))?;
+        Ok(Self { ptr: ptr_non_null })
     }

This requires updating the call site in MacroInstance::new to propagate the error.

Based on coding guidelines: ".expect() and .unwrap() are forbidden outside of tests. Errors must be propagated."

src/manifest/tests/macros.rs (1)

249-260: Use concat!() for long string literals.

The inline YAML string is difficult to read. Per coding guidelines, use concat!() to combine long string literals.

 #[rstest]
 fn register_manifest_macros_supports_multiple(
     mut strict_env: Environment<'static>,
 ) -> AnyResult<()> {
-    let yaml = serde_saphyr::from_str::<ManifestValue>(
-        "macros:\n  - signature: \"greet(name)\"\n    body: |\n      Hello {{ name }}\n  - signature: \"shout(text)\"\n    body: |\n      {{ text | upper }}\n",
-    )?;
+    let yaml = serde_saphyr::from_str::<ManifestValue>(concat!(
+        "macros:\n",
+        "  - signature: \"greet(name)\"\n",
+        "    body: |\n",
+        "      Hello {{ name }}\n",
+        "  - signature: \"shout(text)\"\n",
+        "    body: |\n",
+        "      {{ text | upper }}\n",
+    ))?;
     register_manifest_macros(&yaml, &mut strict_env)?;
     let rendered = render_with(&strict_env, "{{ shout(greet('netsuke')) }}")?;
     ensure!(rendered.trim() == "HELLO NETSUKE");
     Ok(())
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a272c4 and 5a31b93.

📒 Files selected for processing (14)
  • src/manifest/diagnostics/mod.rs (1 hunks)
  • src/manifest/diagnostics/yaml.rs (2 hunks)
  • src/manifest/glob.rs (0 hunks)
  • src/manifest/glob/errors.rs (1 hunks)
  • src/manifest/glob/mod.rs (1 hunks)
  • src/manifest/glob/normalize.rs (1 hunks)
  • src/manifest/glob/tests.rs (1 hunks)
  • src/manifest/glob/validate.rs (1 hunks)
  • src/manifest/glob/walk.rs (1 hunks)
  • src/manifest/jinja_macros/cache.rs (5 hunks)
  • src/manifest/jinja_macros/mod.rs (1 hunks)
  • src/manifest/tests/macros.rs (1 hunks)
  • src/manifest/tests/mod.rs (1 hunks)
  • src/manifest/tests/workspace.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • src/manifest/glob.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during Rust tests in the code itself rather than silencing them.
In Rust, extract meaningfully named helper functions when a function is too long, adhering to separation of concerns and CQRS.
In Rust, group related parameters in meaningfully named structs when a function has too many parameters.
In Rust, consider using Arc to reduce data returned when a function is returning a large error.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
In Rust, prefer immutable data and avoid unnecessary mut bindings.
In Rust, handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
In Rust, place function attributes after doc comments.
In Rust, do not use return in single-line functions.
In Rust, use predicate functions for conditional criteria with more than two branches.
Lints in Rust must not be silenced except as a last resort.
Lint rule suppressions in Rust must be tightly scoped and include a clear reason.
In Rust, prefer expect over allow.
In Rust tests, use rstest fixtures for shared setup.
In Rust tests, replace duplicated tests with #[rstest(...)] parameterized cases.
In Rust, prefer mockall for mocks and stubs.
In Rust, use concat!() to combine long string literals rather than escaping newlines with a backslash.
In Rust, prefer single line versions of functions where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line).
Use NewTypes in Rust to model domain values and eliminate 'integer soup'. Reach for newt-hype when introducing many homogeneous wrappers; add small shims for string-backed wrappers. For path-centric wrappers implement AsRef<Path> along...

Files:

  • src/manifest/jinja_macros/cache.rs
  • src/manifest/glob/mod.rs
  • src/manifest/tests/mod.rs
  • src/manifest/glob/validate.rs
  • src/manifest/glob/tests.rs
  • src/manifest/jinja_macros/mod.rs
  • src/manifest/tests/workspace.rs
  • src/manifest/glob/normalize.rs
  • src/manifest/tests/macros.rs
  • src/manifest/glob/errors.rs
  • src/manifest/diagnostics/yaml.rs
  • src/manifest/glob/walk.rs
  • src/manifest/diagnostics/mod.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/manifest/jinja_macros/cache.rs
  • src/manifest/glob/mod.rs
  • src/manifest/tests/mod.rs
  • src/manifest/glob/validate.rs
  • src/manifest/glob/tests.rs
  • src/manifest/jinja_macros/mod.rs
  • src/manifest/tests/workspace.rs
  • src/manifest/glob/normalize.rs
  • src/manifest/tests/macros.rs
  • src/manifest/glob/errors.rs
  • src/manifest/diagnostics/yaml.rs
  • src/manifest/glob/walk.rs
  • src/manifest/diagnostics/mod.rs
🧬 Code graph analysis (9)
src/manifest/jinja_macros/cache.rs (1)
src/manifest/jinja_macros/mod.rs (1)
  • call_macro_value (148-163)
src/manifest/glob/mod.rs (6)
src/manifest/glob/errors.rs (2)
  • create_glob_error (19-45)
  • create_unmatched_brace_error (47-49)
src/manifest/glob/normalize.rs (2)
  • normalize_separators (3-24)
  • force_literal_escapes (59-78)
src/manifest/glob/validate.rs (2)
  • validate_brace_matching (119-127)
  • new (13-20)
src/manifest/glob/walk.rs (2)
  • open_root_dir (7-15)
  • process_glob_entry (17-61)
src/manifest/diagnostics/mod.rs (6)
  • from (35-37)
  • from (41-43)
  • from (84-86)
  • from (90-92)
  • new (23-25)
  • new (72-74)
src/manifest/jinja_macros/cache.rs (5)
  • entries (42-42)
  • new (86-92)
  • new (126-144)
  • new (166-173)
  • new (234-241)
src/manifest/glob/validate.rs (1)
src/manifest/glob/errors.rs (1)
  • create_unmatched_brace_error (47-49)
src/manifest/glob/tests.rs (4)
src/manifest/glob/normalize.rs (2)
  • force_literal_escapes (59-78)
  • normalize_separators (3-24)
src/manifest/glob/validate.rs (1)
  • validate_brace_matching (119-127)
src/manifest/glob/walk.rs (1)
  • process_glob_entry (17-61)
src/manifest/glob/mod.rs (1)
  • glob_paths (37-92)
src/manifest/jinja_macros/mod.rs (1)
src/manifest/jinja_macros/cache.rs (6)
  • make_macro_fn (15-64)
  • new (86-92)
  • new (126-144)
  • new (166-173)
  • new (234-241)
  • macro_kwargs (33-33)
src/manifest/tests/workspace.rs (1)
src/manifest/mod.rs (3)
  • from_path_with_policy (179-189)
  • stdlib_config_for_manifest (211-239)
  • from_path (158-160)
src/manifest/tests/macros.rs (1)
src/manifest/jinja_macros/mod.rs (4)
  • call_macro_value (148-163)
  • parse_macro_name (37-56)
  • register_macro (74-93)
  • register_manifest_macros (105-121)
src/manifest/glob/walk.rs (1)
src/manifest/glob/errors.rs (1)
  • create_glob_error (19-45)
src/manifest/diagnostics/mod.rs (1)
src/manifest/diagnostics/yaml.rs (1)
  • map_yaml_error (106-130)
🔍 Remote MCP


Summary of Additional Context for PR Review

Based on the repository's comprehensive documentation, I've gathered the following key context that's relevant to this refactoring PR:

Project Context

netsuke is a build system compiler that transforms YAML manifests with Jinja2 templating into Ninja build files. It follows a six-stage pipeline that transforms high-level user configurations into machine-executable build instructions.

Architecture Alignment

This PR directly implements the modular decomposition proposed in the project's architecture documentation. The refactoring aligns with Module Organization principles by:

  1. Separating concerns into focused submodules (diagnostics, glob, jinja_macros)
  2. Enabling focused testing through compartmentalization
  3. Improving maintainability by partitioning related functionality

Key Design Principles Affected

The refactoring preserves these core design principles:

  1. Module Organization: The new structure follows the documented pattern of subsystem organization with clear module boundaries
  2. Error Handling Architecture: Diagnostics module properly implements the three-tier error model (domain errors → context propagation → user-facing diagnostics)
  3. Testing Framework: Tests remain organized with unit tests in modules and BDD tests in tests/features/ and tests/steps/

Validation Checklist for Reviewers

Based on the documented architecture:

  • Module exports: Verify that public APIs remain unchanged (only internal import paths change)
  • Error diagnostics: Ensure ManifestError and diagnostic functions remain accessible
  • Glob functionality: Confirm glob pattern validation and escaping logic is preserved
  • Jinja macro handling: Validate that macro registration and invocation patterns work correctly
  • Test coverage: Confirm BDD features in tests/features/manifest.feature still pass
  • Cross-platform support: Unix-specific glob logic (escape handling) remains intact

Testing Strategy

Per the Testing Framework documentation, this PR should be validated with:

  • cargo test --all (unit + BDD + integration tests)
  • cargo insta review (for snapshot test updates)
  • Specific test modules: tests/features/manifest.feature, tests/features/ninja.feature, workspace tests

CI/CD Impact

The CI/CD Pipeline will validate:

  • make fmt - Code formatting
  • make lint - Clippy linting
  • make test - All test suites
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: build-test (ubuntu-latest, stable)
  • GitHub Check: Sourcery review
🔇 Additional comments (42)
src/manifest/glob/normalize.rs (4)

1-24: LGTM!

The module documentation and normalize_separators function are well-structured. The Unix branch correctly handles backslash escape sequences via process_backslash, whilst the non-Unix branch performs a simple replacement. The past review comment about redundant backslash handling no longer applies: the current implementation correctly separates backslash processing from forward-slash normalisation.


26-56: LGTM!

The predicate functions should_preserve_backslash_for_bracket and should_preserve_backslash_for_wildcard cleanly separate the decision logic for backslash handling. The use of const fn where possible is appropriate.


58-106: LGTM!

The force_literal_escapes function correctly tracks character class state to avoid escaping within [...] constructs. The escape replacement mapping in get_escape_replacement is comprehensive for glob metacharacters.
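For context, the `glob` crate has no backslash escapes; it matches metacharacters literally by wrapping them in brackets (`[*]`, `[?]`, `[[]`, `[]]`), which is also what `glob::Pattern::escape` emits. The sketch below is an assumption about what `force_literal_escapes` does, written in that style. It is not gated on Unix as the PR's code is, and it ignores edge cases such as `[]...]` or a doubled backslash.

```rust
/// Bracketed literal form for glob metacharacters (hypothetical helper).
fn bracket_literal(ch: char) -> Option<&'static str> {
    match ch {
        '*' => Some("[*]"),
        '?' => Some("[?]"),
        '[' => Some("[[]"),
        ']' => Some("[]]"),
        _ => None,
    }
}

/// Rewrites `\*`, `\?`, `\[` and `\]` outside character classes into bracket
/// form; every other character, including anything inside `[...]`, is copied.
pub fn force_literal_escapes(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars().peekable();
    let mut in_class = false;
    while let Some(ch) = chars.next() {
        match ch {
            '\\' if !in_class => {
                let literal = chars.peek().copied().and_then(bracket_literal);
                if let Some(literal) = literal {
                    out.push_str(literal);
                    chars.next(); // consume the escaped metacharacter
                } else {
                    out.push(ch);
                }
            }
            '[' if !in_class => {
                in_class = true;
                out.push(ch);
            }
            ']' if in_class => {
                in_class = false;
                out.push(ch);
            }
            _ => out.push(ch),
        }
    }
    out
}
```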


108-111: LGTM!

Simple, focused predicate function.

src/manifest/tests/mod.rs (1)

1-4: LGTM!

Minimal test module scaffold with appropriate module-level documentation.

src/manifest/jinja_macros/cache.rs (3)

136-148: Unsafe code is well-documented and justified.

The mem::transmute extending the State lifetime is necessary due to MiniJinja's Environment<'static> requirement. The SAFETY comment clearly documents the invariant. The Send and Sync implementations are needed because MiniJinja requires registered functions to be Send + Sync, and the debug assertions provide runtime validation during development.


193-257: Exemplary unsafe documentation.

The CallerAdapter documentation thoroughly covers safety invariants, thread safety constraints, and usage restrictions. The debug assertions provide valuable runtime checks during development.


1-14: LGTM!

Module documentation is present and imports are appropriately scoped. The pub(super) visibility correctly limits exposure to the parent module.

src/manifest/tests/macros.rs (4)

1-11: LGTM!

Module documentation is present. The updated import paths correctly navigate the new module structure.


12-28: LGTM!

Good use of rstest fixtures for shared setup. The MacroRenderCase struct cleanly encapsulates test data.


30-117: LGTM!

Excellent use of parameterised #[rstest] tests covering various macro signature formats, argument handling, and caller blocks. The test cases are comprehensive and well-organised.


119-247: LGTM!

Comprehensive test coverage for macro functionality including kwargs support, reusability, validation errors, and edge cases. Tests correctly return Result and use ensure! for assertions.

src/manifest/glob/tests.rs (4)

1-11: LGTM!

Module documentation is present. Imports are appropriately organised with conditional compilation for Unix-specific functionality.


13-28: LGTM!

Platform-specific assertions correctly validate the normalisation behaviour on Unix and non-Unix systems.


30-48: LGTM!

Brace validation tests correctly verify both success and failure cases with appropriate error kind assertions.


50-75: LGTM!

Good use of tempdir for filesystem isolation. Tests correctly verify directory filtering and brace mismatch error handling.

src/manifest/glob/mod.rs (2)

1-23: LGTM: Module structure and types are well-organised.

The modular decomposition cleanly separates concerns. GlobPattern now uses non-optional normalized field, and validate_brace_matching accepts &str as suggested in prior reviews.


85-91: LGTM: Entry processing uses borrows correctly.

The loop passes &pattern_state rather than cloning per iteration, addressing prior review feedback about unnecessary clones in the hot path.

src/manifest/glob/validate.rs (4)

1-10: LGTM: Simplified validator structure.

The BraceValidator now holds all state fields directly, eliminating the intermediate CharContext and BraceValidationState structs, as suggested in prior reviews. The module doc comment is present and descriptive.


37-60: LGTM: Escape sequence handling is correct.

The platform-conditional escape logic correctly handles Unix backslash escapes whilst remaining a no-op on other platforms. The #[expect] annotation includes a reason as required by coding guidelines.


74-102: LGTM: Brace handling logic is sound.

The match correctly guards against unmatched closing braces at depth zero, and properly tracks nesting depth for balanced braces.


119-127: LGTM: Public validation API is clean.

The validate_brace_matching function provides a simple interface that hides the validator's internal state machine.

src/manifest/glob/errors.rs (3)

1-17: LGTM: Error context types are well-defined.

The GlobErrorContext and GlobErrorType provide structured error information with appropriate visibility restrictions (pub(super)).


35-44: Verify the "glob " prefix heuristic remains stable.

The check detail.starts_with("glob ") prevents double-prefixing but relies on upstream error message format. If the glob crate changes its error message format, this heuristic may behave unexpectedly. Document this assumption or pin the dependency version.


47-49: LGTM: Wrapper simplifies common error case.

src/manifest/tests/workspace.rs (5)

1-12: LGTM: Test module is well-organised with appropriate imports.

Module doc comment present. Uses rstest for test fixtures as required by coding guidelines.


14-41: LGTM: Directory guard correctly synchronises environment access.

The EnvLock ensures environment mutations are properly serialised across tests. Error logging in Drop is the right approach since panicking in drop is problematic.
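A std-only sketch of the guard pattern described above: a process-wide mutex serialises working-directory changes, and `Drop` restores the previous directory, logging rather than panicking on failure. `CWD_LOCK` and `DirGuard::enter` are hypothetical names; the PR's `EnvLock` may be defined differently.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::{Mutex, MutexGuard};

/// Process-wide lock serialising tests that touch the current directory.
static CWD_LOCK: Mutex<()> = Mutex::new(());

/// Changes the working directory and restores it on drop.
pub struct DirGuard {
    previous: PathBuf,
    _lock: MutexGuard<'static, ()>,
}

impl DirGuard {
    pub fn enter(dir: &Path) -> io::Result<Self> {
        // Recover from poisoning so one panicking test does not block the rest.
        let lock = CWD_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        let previous = env::current_dir()?;
        env::set_current_dir(dir)?;
        Ok(Self { previous, _lock: lock })
    }
}

impl Drop for DirGuard {
    fn drop(&mut self) {
        // Runs before the fields drop, so the lock is still held here.
        // Panicking in Drop can abort during unwinding, so log instead.
        if let Err(err) = env::set_current_dir(&self.previous) {
            eprintln!("failed to restore working directory: {err}");
        }
    }
}
```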


43-69: LGTM: Good use of parameterised testing.

The #[rstest] with #[case] variants correctly tests both relative and absolute path resolution scenarios.


71-90: LGTM: Non-UTF-8 rejection test is thorough.

Correctly uses Unix-specific APIs to create an invalid path and verifies the error message contains expected content.
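A minimal sketch of UTF-8 path rejection, assuming errors are plain strings. On Unix, `OsStrExt::from_bytes` lets a test build a file name containing the byte `0xFF`, which never occurs in valid UTF-8. `path_to_utf8` and `non_utf8_path` are hypothetical helpers.

```rust
use std::path::Path;

/// Borrows a path as `&str`, rejecting names that are not valid UTF-8.
pub fn path_to_utf8(path: &Path) -> Result<&str, String> {
    path.to_str()
        .ok_or_else(|| format!("path is not valid UTF-8: {}", path.display()))
}

/// Builds a file name containing the byte 0xFF (Unix only).
#[cfg(unix)]
pub fn non_utf8_path() -> std::path::PathBuf {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;
    Path::new(OsStr::from_bytes(b"manifest-\xFF.yml")).to_path_buf()
}
```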


92-157: LGTM: Comprehensive cache directory test.

The test correctly verifies that caches are created in the manifest's workspace rather than the current working directory. Graceful handling of PermissionDenied for HTTP binding ensures the test works in restricted environments.

src/manifest/glob/walk.rs (3)

1-5: LGTM: Module doc comment and imports are appropriate.

Uses cap_std for capability-based filesystem access, providing security boundaries for glob expansion.


7-15: LGTM: Root directory selection is correct.

Correctly distinguishes between absolute and relative patterns for capability-based directory access.


78-84: LGTM: Simple accessor with appropriate lint suppression.

The #[expect] annotation includes a reason as required by coding guidelines.

src/manifest/jinja_macros/mod.rs (4)

1-6: Module documentation is well-written.

The //! doc comment clearly explains the module's purpose and how manifest macros integrate with the rendering environment.


37-56: Function logic is sound.

The validation steps correctly handle empty signatures, missing parameter lists, and empty identifiers. The use of split_once('(') is idiomatic.
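A sketch of the three checks described above, built on `str::split_once`. The error messages are hypothetical, and the real `parse_macro_name` may use a different error type.

```rust
/// Extracts the macro name from a signature such as `greet(name)`.
pub fn parse_macro_name(signature: &str) -> Result<&str, String> {
    let trimmed = signature.trim();
    if trimmed.is_empty() {
        return Err("macro signature must not be empty".to_owned());
    }
    // Everything before the first `(` is the name; the rest is the parameter list.
    let (name, _params) = trimmed
        .split_once('(')
        .ok_or_else(|| format!("macro signature '{trimmed}' is missing a parameter list"))?;
    let name = name.trim();
    if name.is_empty() {
        return Err(format!("macro signature '{trimmed}' is missing an identifier"));
    }
    Ok(name)
}
```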


105-121: Robust iteration and error propagation.

The early return for missing macros key is sensible, and context is added at each failure point. The use of serde_json::from_value for deserialisation from ManifestValue (which appears to be serde_json::Value) is correct.


148-163: Implementation is correct and idiomatic.

Using map_or_else to branch on Option<Kwargs> avoids unnecessary allocations when no keyword arguments are present. The convention of appending Kwargs to positional arguments matches the minijinja expectation documented in the relevant snippet from cache.rs.

src/manifest/diagnostics/yaml.rs (1)

132-138: Test helper is correctly scoped.

The expected_offset function is appropriately gated behind #[cfg(test)] and provides a clear reference implementation for verifying byte index calculations.

src/manifest/diagnostics/mod.rs (4)

17-32: NewType pattern is well-applied.

ManifestSource correctly wraps a String with appropriate derives and conversion traits. The #[must_use] annotations are present on constructors and accessors.


66-81: ManifestName mirrors ManifestSource appropriately.

The same const fn as_str() concern applies here. Otherwise, the implementation is consistent and correct.


130-137: Internal diagnostic struct is appropriate.

DataDiagnostic is correctly kept private rather than exported, and derives the necessary traits. The #[source] attribute ensures the underlying serde_json::Error is accessible via std::error::Error::source().


27-31: No issues found. The project's MSRV of Rust 1.89.0 is well above Rust 1.87.0, where String::as_str() was stabilised as a const fn. The code will compile correctly on all supported toolchains.
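A hypothetical sketch of the newtype shape described for `ManifestSource`. The exact derives and the `Display` and `AsRef<str>` impls are assumptions. `as_str` is kept non-const here so the sketch does not depend on a recent toolchain.

```rust
use std::fmt;

/// String-backed newtype in the style of `ManifestSource` (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestSource(String);

impl ManifestSource {
    #[must_use]
    pub fn new(source: impl Into<String>) -> Self { Self(source.into()) }

    #[must_use]
    pub fn as_str(&self) -> &str { self.0.as_str() }
}

impl From<String> for ManifestSource {
    fn from(value: String) -> Self { Self(value) }
}

impl From<&str> for ManifestSource {
    fn from(value: &str) -> Self { Self(value.to_owned()) }
}

impl AsRef<str> for ManifestSource {
    fn as_ref(&self) -> &str { &self.0 }
}

impl fmt::Display for ManifestSource {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { f.write_str(&self.0) }
}
```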

@leynos
Owner Author

leynos commented Dec 3, 2025

@coderabbitai have the following now been resolved?

  • In glob::BraceValidator, the BraceValidationState.escape_active flag is initialized from cfg!(unix) but never mutated, so you could simplify this by removing it from the state and using #[cfg(unix)] branches directly in handle_escape_sequence to reduce runtime state and make the escape semantics clearer.
  • The normalized_or_raw helper for GlobPattern panics if normalized is None; since this invariant is critical for glob_paths, consider enforcing it at the type level (e.g., by using a dedicated type for validated patterns or constructing GlobPattern only with a normalized value) instead of relying on a runtime panic.

@coderabbitai
Contributor

coderabbitai Bot commented Dec 3, 2025


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the validate.rs file to check BraceValidator and BraceValidationState
cat src/manifest/glob/validate.rs



🏁 Script executed:

#!/bin/bash
# Read the walk.rs file to check normalized_or_raw implementation
cat src/manifest/glob/walk.rs



🏁 Script executed:

#!/bin/bash
# Read mod.rs to check GlobPattern structure
cat src/manifest/glob/mod.rs



@leynos Yes, both issues have been resolved:

1. BraceValidator escape flag ✓

The escape_active flag no longer exists. The refactored code in src/manifest/glob/validate.rs now uses:

  • A mutable escaped field that is properly updated during processing
  • Direct #[cfg(unix)] branches in handle_escape_sequence to control escape semantics at compile time
  • On Unix: toggles self.escaped when encountering backslashes
  • On non-Unix: no escape processing occurs

This eliminates the runtime state waste and makes the platform-specific escape behaviour explicit.

2. GlobPattern normalized field ✓

The type now enforces the invariant at compile time. In src/manifest/glob/mod.rs:

pub struct GlobPattern {
    pub raw: String,
    pub normalized: String,  // String, not Option<String>
}

The normalized field is now a String rather than Option<String>, so normalized_or_raw in src/manifest/glob/walk.rs simply calls .as_str() with no possibility of panic. The glob_paths function always constructs GlobPattern with both fields populated, guaranteeing the invariant.

- Enhance manifest diagnostics modules with richer comments and clearer error messages.
- Simplify string formatting in YAML and data error mappings.
- Correct path separator handling in glob walking for better cross-platform support.
- Change MacroStateGuard::new to return Result, improving error propagation.
- Fix minor API usage in jinja_macros mod for function registration.
- Update tests to use concat! macro for better readability in multi-line YAML strings.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos changed the title Refactor manifest internals into focused diagnostics, glob, and macros Refactor manifest internals into modular diagnostics, glob, and macros Dec 3, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/manifest/jinja_macros/cache.rs (1)

150-190: Defensive error handling for theoretically unreachable case.

Box::into_raw never returns null (a Box always holds a non-null pointer, even for zero-sized types), so the ok_or_else on lines 169-170 is technically unreachable. This defensive approach is acceptable, though a // SAFETY: comment explaining why the error branch is unreachable would clarify the intent.

         let boxed = Box::new(state);
         let ptr = Box::into_raw(boxed);
+        // SAFETY: Box::into_raw never returns null for non-ZST types, but we
+        // handle the impossible case defensively.
         let ptr_non_null = NonNull::new(ptr)
             .ok_or_else(|| anyhow::anyhow!("Box::into_raw returned null pointer"))?;
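An alternative sketch, assuming the guard only needs a non-null owning pointer. `Box::leak` returns `&mut T` and `NonNull<T>` implements `From<&mut T>`, so the null check and its unreachable error branch disappear entirely, and the constructor would no longer need to return a `Result`. `HeapSlot` is a hypothetical stand-in for the guard type in `cache.rs`, whose real fields are not shown here.

```rust
use std::ptr::NonNull;

/// Owns a heap value through a `NonNull` pointer (illustrative only).
pub struct HeapSlot<T> {
    ptr: NonNull<T>,
}

impl<T> HeapSlot<T> {
    pub fn new(value: T) -> Self {
        // `Box::leak` yields `&mut T` and `NonNull<T>: From<&mut T>`, so the
        // conversion is infallible: there is no null case to handle.
        let ptr = NonNull::from(Box::leak(Box::new(value)));
        Self { ptr }
    }

    pub fn get(&self) -> &T {
        // SAFETY: `ptr` came from `Box::leak` and stays valid until `Drop`.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for HeapSlot<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` originated from a `Box` and is reclaimed exactly once here.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
    }
}
```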
♻️ Duplicate comments (1)
src/manifest/glob/walk.rs (1)

49-49: LGTM: Character literals now used instead of hex codes.

The code now uses '\\' and ['/', '\\'] instead of the previous char::from(0x5c) and char::from(0x2f), addressing the readability concern from earlier feedback.

Also applies to: 65-65

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a31b93 and f2fc48f.

📒 Files selected for processing (7)
  • src/manifest/diagnostics/mod.rs (1 hunks)
  • src/manifest/diagnostics/yaml.rs (2 hunks)
  • src/manifest/glob/mod.rs (1 hunks)
  • src/manifest/glob/walk.rs (1 hunks)
  • src/manifest/jinja_macros/cache.rs (6 hunks)
  • src/manifest/jinja_macros/mod.rs (1 hunks)
  • src/manifest/tests/macros.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during Rust tests in the code itself rather than silencing them.
In Rust, extract meaningfully named helper functions when a function is too long, adhering to separation of concerns and CQRS.
In Rust, group related parameters in meaningfully named structs when a function has too many parameters.
In Rust, consider using Arc to reduce data returned when a function is returning a large error.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
In Rust, prefer immutable data and avoid unnecessary mut bindings.
In Rust, handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
In Rust, place function attributes after doc comments.
In Rust, do not use return in single-line functions.
In Rust, use predicate functions for conditional criteria with more than two branches.
Lints in Rust must not be silenced except as a last resort.
Lint rule suppressions in Rust must be tightly scoped and include a clear reason.
In Rust, prefer expect over allow.
In Rust tests, use rstest fixtures for shared setup.
In Rust tests, replace duplicated tests with #[rstest(...)] parameterized cases.
In Rust, prefer mockall for mocks and stubs.
In Rust, use concat!() to combine long string literals rather than escaping newlines with a backslash.
In Rust, prefer single line versions of functions where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line).
Use NewTypes in Rust to model domain values and eliminate 'integer soup'. Reach for newt-hype when introducing many homogeneous wrappers; add small shims for string-backed wrappers. For path-centric wrappers implement AsRef<Path> along...

Files:

  • src/manifest/glob/mod.rs
  • src/manifest/glob/walk.rs
  • src/manifest/diagnostics/yaml.rs
  • src/manifest/diagnostics/mod.rs
  • src/manifest/tests/macros.rs
  • src/manifest/jinja_macros/mod.rs
  • src/manifest/jinja_macros/cache.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) is always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/manifest/glob/mod.rs
  • src/manifest/glob/walk.rs
  • src/manifest/diagnostics/yaml.rs
  • src/manifest/diagnostics/mod.rs
  • src/manifest/tests/macros.rs
  • src/manifest/jinja_macros/mod.rs
  • src/manifest/jinja_macros/cache.rs
🧬 Code graph analysis (6)
src/manifest/glob/mod.rs (1)
src/manifest/glob/walk.rs (1)
  • open_root_dir (7-15)
src/manifest/glob/walk.rs (1)
src/manifest/glob/errors.rs (1)
  • create_glob_error (19-45)
src/manifest/diagnostics/mod.rs (1)
src/manifest/diagnostics/yaml.rs (1)
  • map_yaml_error (112-136)
src/manifest/tests/macros.rs (1)
src/manifest/jinja_macros/mod.rs (4)
  • call_macro_value (148-163)
  • parse_macro_name (37-56)
  • register_macro (74-93)
  • register_manifest_macros (105-121)
src/manifest/jinja_macros/mod.rs (1)
src/manifest/jinja_macros/cache.rs (6)
  • make_macro_fn (15-64)
  • new (86-92)
  • new (126-144)
  • new (166-172)
  • new (233-240)
  • macro_kwargs (33-33)
src/manifest/jinja_macros/cache.rs (1)
src/manifest/jinja_macros/mod.rs (1)
  • call_macro_value (148-163)
🔍 Remote MCP Deepwiki

Summary of additional repository context relevant to this PR (sources listed):

  • Project architecture and goals: netsuke is a build-system compiler converting YAML+Jinja manifests into Ninja; pipeline stages (parse → expand → render → IR → ninja) and emphasis on deterministic, secure (shell-quote) output.
  • Diagnostics design: manifest diagnostics are intended to produce miette-styled reports with source spans and hints; map_yaml_error maps serde_yml errors into rich diagnostics. The new diagnostics module in this PR must align with this pattern.
  • Glob functionality expectations: glob() must validate brace-matching, normalize separators across platforms, preserve Unix escape semantics, return lexicographically sorted UTF‑8 file lists, and use cap-std/camino for safe filesystem access. The PR splits glob logic into normalize/validate/walk/errors modules; reviewers should verify that these behaviors and capability-based I/O usage are preserved.
  • Jinja/macros expectations: template environment uses minijinja with strict undefined behavior; macros are registered before rendering. The PR reworks jinja_macros (cache.rs and mod.rs) and reduces visibility in cache.rs — verify registrations, error handling (anyhow vs Result) and that macro behavior remains equivalent.
  • Public API & visibility policy: repository favors restricting internal items (pub(super)) while preserving public API surface. The PR both introduces new public types (ManifestSource, ManifestName, ManifestError::Parse, map_data_error) and moves / reduces visibility in jinja_macros/cache.rs — confirm exported items and stdlib registration points remain stable for consumers.
  • Tests & CI expectations: CI runs fmt, clippy, full tests (unit, insta snapshots, cucumber BDD). PR added/updated tests for glob (brace validation, non-UTF-8 paths), jinja macros, and workspace/caching — ensure snapshots and BDD features remain consistent; run cargo test + insta review locally/CI.

Actionable review checklist (based on repo docs + PR summary):

  • Confirm new diagnostics types integrate with existing map_yaml_error behavior and miette diagnostics rendering.
  • Validate glob split preserves previous semantics: brace validation, separator normalization, escape handling (Unix only), capability-based Dir usage, UTF‑8 rejection, lexicographic ordering.
  • Verify that jinja macro registration/invocation behavior is unchanged (macro parsing, registration order, error behavior); check that the visibility changes and the anyhow-vs-Result error-handling choices do not break callers or tests.
  • Check public exports: ensure no unintended breaking changes to public API surface consumed by other crates (stdlib functions, manifest::map_yaml_error re-export).
  • Run full test matrix (cargo test, insta review, BDD) and CI linters (fmt/clippy) as PR's tests and formatting must pass.

Sources used:

  • Project documentation and architecture pages (module organization, diagnostics, glob, template system, testing, CI): Deepwiki read of repository wiki/contents.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: Sourcery review
  • GitHub Check: build-test (ubuntu-latest, stable)
🔇 Additional comments (37)
src/manifest/glob/mod.rs (3)

1-1: Module documentation present; check spelling against en-GB-oxendict.

The module-level doc comment uses "normalising", but en-GB-oxendict favours the -ize form ("normalizing"), so this word likely needs adjusting.


17-21: LGTM: GlobPattern invariant now enforced at the type level.

The normalized field is now a String (not Option<String>), eliminating the invalid intermediate state and the panic path that was previously flagged. This aligns with the earlier review feedback.


32-87: LGTM: glob_paths orchestration is well-structured.

The function validates first, normalises second, then constructs the GlobPattern only after both steps succeed. Error handling uses Result throughout with no panics. The iteration passes &pattern_state by reference, avoiding per-entry clones as recommended in prior feedback.

src/manifest/glob/walk.rs (4)

1-1: LGTM: Module-level documentation present.

The doc comment explains the module's purpose.


7-15: LGTM: Root directory resolution logic is sound.

The function correctly distinguishes absolute from relative paths and opens the appropriate ambient directory. Uses capability-based filesystem access via cap_std as expected.


17-61: LGTM: Entry processing handles all error cases correctly.

The function:

  • Converts paths to UTF-8, returning a structured error on failure
  • Fetches metadata via the capability-restricted Dir
  • Filters out non-files
  • Normalises path separators to forward slashes

Error propagation uses Result throughout with no panics.
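The UTF-8 conversion and separator normalisation steps can be sketched with the standard library alone. `normalised_entry` is hypothetical: the PR uses camino and a structured glob error, whereas a plain `String` error stands in here. Note that on Unix a backslash is a legal filename character, so this rewrite assumes backslashes never appear literally in matched names.

```rust
use std::path::Path;

/// Hypothetical helper mirroring the steps described for `process_glob_entry`:
/// reject non-UTF-8 paths, then rewrite every backslash as a forward slash.
fn normalised_entry(path: &Path) -> Result<String, String> {
    let utf8 = path
        .to_str()
        .ok_or_else(|| format!("non-UTF-8 path: {}", path.display()))?;
    Ok(utf8.replace('\\', "/"))
}
```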


63-74: LGTM: Metadata fetching handles absolute path edge cases.

The function correctly strips leading separators from absolute paths before querying relative to the root Dir, and handles the empty-string edge case by falling back to ".".
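A minimal sketch of the stripping rule, assuming the root `Dir` has already been opened for the absolute prefix; `relative_to_root` is a hypothetical name, and Windows drive prefixes such as `C:\` are deliberately not handled here.

```rust
/// Hypothetical helper: make an absolute path relative to the root `Dir`
/// by stripping leading separators, falling back to "." if nothing remains.
fn relative_to_root(path: &str) -> &str {
    let stripped = path.trim_start_matches(['/', '\\']);
    if stripped.is_empty() { "." } else { stripped }
}
```

The "." fallback matters because querying metadata for an empty relative path fails, whereas "." refers to the root directory itself.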

src/manifest/tests/macros.rs (7)

1-10: LGTM!

Module documentation is present and the import paths correctly reference the refactored module structure.


12-28: LGTM!

Test helper struct and rstest fixture are well-structured and follow coding guidelines for shared test setup.


30-62: LGTM!

Parameterised tests provide good coverage for both successful and error cases of parse_macro_name. The use of rstest with #[case] follows the coding guidelines for reducing test duplication.


64-117: LGTM!

Excellent coverage of macro argument handling scenarios including defaults, positional arguments, keyword arguments, and caller blocks.


119-140: LGTM!

Test correctly exercises the call_macro_value helper with keyword arguments, verifying the kwargs passing convention.


142-158: LGTM!

This test verifies macro caching behaviour by confirming repeated invocations work correctly.


249-266: LGTM!

Use of concat!() for the multiline YAML string follows the coding guidelines. The test correctly verifies that multiple macros can be registered and invoked together.

src/manifest/jinja_macros/mod.rs (5)

1-19: LGTM!

Module documentation clearly explains the purpose and the submodule structure is well-organised.


21-56: LGTM!

Documentation is thorough with examples, and the parsing logic correctly validates macro signatures.


74-93: LGTM!

The implementation correctly handles macro registration. The past review comment about redundant name.clone() at line 91 has been addressed — name is now moved directly into add_function.


105-121: LGTM!

The implementation correctly handles optional macros section and provides good error context.


148-163: LGTM!

The implementation correctly handles the MiniJinja convention for keyword arguments and efficiently builds the argument list.

src/manifest/diagnostics/yaml.rs (7)

1-12: LGTM!

Module documentation has been expanded to explain the purpose and public entry point, addressing the previous review comment.


14-47: LGTM!

The byte offset reconstruction correctly handles UTF-8 multi-byte characters and both Unix and Windows line endings.
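The reconstruction can be sketched as follows, assuming a 1-based line and a 1-based column counted in characters (how serde_yml reports locations is not shown in this review, so treat that as an assumption). `byte_offset` is a hypothetical name, not the function in yaml.rs.

```rust
/// Hypothetical sketch: convert a 1-based `line` and 1-based `column`
/// (counted in chars) into a byte offset into `src`.
///
/// `split_inclusive('\n')` keeps each line's terminator, so both LF and
/// CRLF endings contribute their full byte length to the running offset.
fn byte_offset(src: &str, line: usize, column: usize) -> Option<usize> {
    let mut line_start = 0;
    for (index, text) in src.split_inclusive('\n').enumerate() {
        if index + 1 == line {
            // Ignore the terminator when locating the column; a column past
            // the end of the line is clamped to the end of its content.
            let content = text.trim_end_matches(['\r', '\n']);
            let within_line = content
                .char_indices()
                .nth(column.saturating_sub(1))
                .map_or(content.len(), |(byte, _)| byte);
            return Some(line_start + within_line);
        }
        line_start += text.len();
    }
    None
}
```

Using `char_indices` rather than adding the column directly is what keeps multi-byte characters such as `é` from shifting the span.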


49-67: LGTM!

The span calculation correctly handles edge cases at line breaks, and the #[expect] lint suppression includes a clear reason as required.


69-82: LGTM!

The diagnostic struct is correctly structured with miette attributes for source code, labels, and help text.


84-104: LGTM!

The hint functions correctly detect tab indentation and provide contextual hints for common YAML mistakes.
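A tab-indentation check of this kind can be sketched as below. `tab_indentation_hint` and its message text are hypothetical; the underlying rule, that YAML does not permit tabs for indentation, comes from the YAML specification.

```rust
/// Hypothetical hint helper: returns a hint when the 1-based `line` of
/// `src` is indented with a tab, which YAML does not allow.
fn tab_indentation_hint(src: &str, line: usize) -> Option<&'static str> {
    let text = src.lines().nth(line.checked_sub(1)?)?;
    // Only inspect the leading whitespace run; tabs later in the line are fine.
    let has_tab_indent = text
        .chars()
        .take_while(|c| c.is_whitespace())
        .any(|c| c == '\t');
    has_tab_indent.then_some("YAML indentation must use spaces, not tabs")
}
```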


106-136: LGTM!

The function correctly maps YAML parse errors to rich diagnostics with spans and contextual hints.


138-272: LGTM!

Comprehensive test coverage for byte offset calculations including UTF-8, CRLF, and edge cases.

src/manifest/diagnostics/mod.rs (5)

1-13: LGTM!

Module documentation has been expanded to explain the wrapper types and mapping helpers, addressing the previous review comment.


15-62: LGTM!

ManifestSource newtype is well-implemented with appropriate trait implementations and documentation.
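For readers unfamiliar with the pattern, a string-backed newtype with the "small shims" the guidelines ask for might look like this. It is a sketch only; the actual trait list on `ManifestSource` in diagnostics/mod.rs is not shown in this review and may differ.

```rust
use std::fmt;

/// Hypothetical sketch of a string-backed newtype in the style described
/// for `ManifestSource`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestSource(String);

impl ManifestSource {
    pub fn new(text: impl Into<String>) -> Self { Self(text.into()) }

    pub fn as_str(&self) -> &str { &self.0 }
}

impl AsRef<str> for ManifestSource {
    fn as_ref(&self) -> &str { &self.0 }
}

impl From<String> for ManifestSource {
    fn from(text: String) -> Self { Self(text) }
}

impl fmt::Display for ManifestSource {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { f.write_str(&self.0) }
}
```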


64-111: LGTM!

ManifestName newtype follows the same well-structured pattern as ManifestSource.


113-134: LGTM!

The ManifestError enum is correctly structured with proper diagnostic chaining. The doc comment on the source field is now correctly placed before the attributes.


136-159: LGTM!

The DataDiagnostic struct and map_data_error function are correctly implemented. The duplicate diagnostic code in the message has been removed, addressing the previous review comment.

src/manifest/jinja_macros/cache.rs (6)

1-13: LGTM!

Module documentation and imports are appropriate for this internal caching module.


15-64: LGTM!

The macro function factory correctly handles argument adaptation, thread safety assertions, and auto-escape behaviour. Visibility has been appropriately restricted to pub(super).


66-75: LGTM!

The function correctly validates and wraps caller arguments.


77-107: LGTM!

MacroCache correctly uses OnceLock for thread-safe lazy initialisation of the macro instance.


109-148: LGTM!

MacroInstance correctly handles the unsafe lifetime extension with appropriate SAFETY documentation. The thread safety notice clearly explains the constraints.


192-255: LGTM!

CallerAdapter has excellent safety documentation explaining the invariants, thread safety constraints, and usage restrictions. The implementation correctly preserves the outer template state for caller block invocations.

Comment thread src/manifest/glob/mod.rs
Comment thread src/manifest/glob/walk.rs Outdated
- Remove redundant normalized_or_raw function in glob/walk.rs by directly accessing pattern.normalized.
- Improve code clarity by using the normalized field directly in glob/mod.rs.

Additionally, add a defensive null-pointer check after Box::into_raw in jinja_macros/cache.rs; this is a minor robustness improvement.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/manifest/jinja_macros/cache.rs (1)

94-102: Remove redundant drop call.

The explicit drop(returned_instance) on line 98 is unnecessary; Rust drops the value automatically when it goes out of scope at the end of the if let block.

     pub(super) fn prepare(&self, env: &Environment) -> anyhow::Result<()> {
         if self.instance.get().is_none() {
             let instance = MacroInstance::new(env, &self.template_name, &self.macro_name)?;
-            if let Err(returned_instance) = self.instance.set(instance) {
-                drop(returned_instance);
-            }
+            let _ = self.instance.set(instance);
         }
         Ok(())
     }
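The simplified `prepare` relies on `OnceLock::set` returning the rejected value as `Err` when the cell is already initialised. A self-contained sketch of the same shape, with a hypothetical `LazyCache` holding a `String` in place of `MacroInstance`:

```rust
use std::sync::OnceLock;

/// Hypothetical cache mirroring the `prepare` shape above: build the value
/// lazily with a fallible constructor and store it at most once.
struct LazyCache {
    instance: OnceLock<String>,
}

impl LazyCache {
    fn prepare(&self, build: impl FnOnce() -> Result<String, String>) -> Result<(), String> {
        if self.instance.get().is_none() {
            let instance = build()?;
            // `set` hands the value back as `Err` if another thread won the
            // race; `let _` discards it, dropping it at the end of the statement.
            let _ = self.instance.set(instance);
        }
        Ok(())
    }
}
```

A fallible `get_or_try_init` would express this more directly, but as far as I know it is not yet available on stable Rust, which is presumably why the check-then-set form is used.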
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2fc48f and e8623ba.

📒 Files selected for processing (3)
  • src/manifest/glob/mod.rs (1 hunks)
  • src/manifest/glob/walk.rs (1 hunks)
  • src/manifest/jinja_macros/cache.rs (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)


Files:

  • src/manifest/glob/mod.rs
  • src/manifest/glob/walk.rs
  • src/manifest/jinja_macros/cache.rs

⚙️ CodeRabbit configuration file


Files:

  • src/manifest/glob/mod.rs
  • src/manifest/glob/walk.rs
  • src/manifest/jinja_macros/cache.rs
🧬 Code graph analysis (3)
src/manifest/glob/mod.rs (4)
src/manifest/glob/errors.rs (2)
  • create_glob_error (19-45)
  • create_unmatched_brace_error (47-49)
src/manifest/glob/normalize.rs (2)
  • normalize_separators (3-24)
  • force_literal_escapes (59-78)
src/manifest/glob/validate.rs (2)
  • validate_brace_matching (119-127)
  • new (13-20)
src/manifest/glob/walk.rs (2)
  • open_root_dir (7-15)
  • process_glob_entry (17-61)
src/manifest/glob/walk.rs (1)
src/manifest/glob/errors.rs (1)
  • create_glob_error (19-45)
src/manifest/jinja_macros/cache.rs (1)
src/manifest/jinja_macros/mod.rs (1)
  • call_macro_value (148-163)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: Sourcery review
🔇 Additional comments (7)
src/manifest/jinja_macros/cache.rs (5)

1-3: LGTM!

Module-level documentation is present and imports are appropriate. Good use of anyhow::Context for error propagation.


15-64: LGTM!

The visibility reduction to pub(super) aligns with the refactoring goals. Error handling via ? is correct, and the thread-safety assertion at line 45 appropriately guards against cross-thread misuse in debug builds.


66-75: LGTM!

Clean helper with proper error handling and clear error message.


194-257: LGTM!

The CallerAdapter documentation thoroughly explains the safety requirements and thread-ownership invariants. The debug_assert at line 250 catches misuse during development, and the safety comments justify the unsafe impl Send/Sync.
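The debug-only ownership check can be illustrated in isolation. `ThreadBound` is hypothetical: unlike the real `CallerAdapter`, its fields are already `Send`/`Sync`, so it needs no `unsafe impl`, and the assertion is purely a debugging aid that compiles away in release builds.

```rust
use std::thread::{self, ThreadId};

/// Hypothetical wrapper illustrating a debug-only thread-ownership check:
/// record the creating thread, then assert on every access in debug builds.
struct ThreadBound<T> {
    owner: ThreadId,
    value: T,
}

impl<T> ThreadBound<T> {
    fn new(value: T) -> Self {
        Self { owner: thread::current().id(), value }
    }

    fn get(&self) -> &T {
        debug_assert_eq!(
            thread::current().id(),
            self.owner,
            "ThreadBound accessed from a thread other than its owner"
        );
        &self.value
    }
}
```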


136-140: The safety invariant is properly enforced through register_macro; no changes required.

The unsafe transmute is safe because prepare() and MacroInstance::new() are only called from register_macro() (line 90 in mod.rs), which enforces Environment<'static> in its signature. The Environment reference flows through the call chain unchanged, so the template bytecode genuinely outlives the cached state. The loose signatures on prepare() and new() are acceptable for internal helper functions since their only caller is register_macro.

src/manifest/glob/mod.rs (1)

1-42: Well-structured module orchestration.

The refactored glob module provides clear separation of concerns with proper error handling, conditional compilation for platform-specific behaviour, and appropriate use of Result types. The validation → normalization → expansion flow is easy to follow.

Also applies to: 51-87

src/manifest/glob/walk.rs (1)

1-6: Solid filesystem traversal implementation.

The code correctly handles edge cases (empty stripped paths, non-files, non-UTF-8) with appropriate error propagation. The use of cap_std for capability-restricted access and camino for UTF-8 paths is well-suited to the domain. Path normalisation to forward slashes ensures cross-platform consistency.

Also applies to: 7-15, 22-23, 46-50, 63-74

Comment thread src/manifest/glob/mod.rs Outdated
Comment thread src/manifest/glob/walk.rs
Comment thread src/manifest/glob/walk.rs
Comment thread src/manifest/glob/walk.rs Outdated
Comment thread src/manifest/jinja_macros/cache.rs Outdated
…r handling

- Force literal escapes immediately after normalization on Unix.
- Replace repetitive glob error creation with a helper function.
- Simplify UTF-8 path conversion and metadata fetching error handling.
- Improve MacroStateGuard to return Self directly without error wrapping.

These changes clean up code, reduce duplication, and improve maintainability.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos
Owner Author

leynos commented Dec 4, 2025

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey there, I've reviewed your changes. Here's some feedback:

  • In the new GlobPattern type, consider making raw/normalized private and exposing a constructor so that callers can’t construct inconsistent patterns (e.g. with normalized not derived from raw).
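One way to act on that suggestion is sketched below: the fields are private, so code outside the module can only obtain a `GlobPattern` through `new`, which validates first, normalises second, and constructs only after both succeed. `check_braces` and `normalise` are hypothetical placeholders, far simpler than the real validate.rs and normalize.rs (no escape or character-class handling).

```rust
/// Hypothetical sketch: `GlobPattern` with private fields and a validating
/// constructor, so `normalized` is always derived from `raw`.
#[derive(Debug)]
pub struct GlobPattern {
    raw: String,
    normalized: String,
}

impl GlobPattern {
    pub fn new(raw: &str) -> Result<Self, String> {
        check_braces(raw)?;
        let normalized = normalise(raw);
        Ok(Self { raw: raw.to_owned(), normalized })
    }

    pub fn raw(&self) -> &str { &self.raw }

    pub fn normalized(&self) -> &str { &self.normalized }
}

/// Placeholder for validate.rs: only counts brace depth.
fn check_braces(raw: &str) -> Result<(), String> {
    let mut depth = 0_usize;
    for ch in raw.chars() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth = depth
                    .checked_sub(1)
                    .ok_or_else(|| format!("unmatched '}}' in {raw}"))?;
            }
            _ => {}
        }
    }
    if depth == 0 { Ok(()) } else { Err(format!("unmatched '{{' in {raw}")) }
}

/// Placeholder for normalize.rs: maps backslashes to forward slashes
/// (the real normaliser preserves Unix escapes).
fn normalise(raw: &str) -> String { raw.replace('\\', "/") }
```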
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the new `GlobPattern` type, consider making `raw`/`normalized` private and exposing a constructor so that callers can’t construct inconsistent patterns (e.g. with `normalized` not derived from `raw`).

## Individual Comments

### Comment 1
<location> `src/manifest/glob/tests.rs:31-36` </location>
<code_context>
-    use anyhow::{Context, Result, anyhow, ensure};
-
-    #[test]
-    fn validate_brace_matching_accepts_balanced_braces() {
-        let pattern = GlobPattern {
-            raw: "{foo,bar}".into(),
-            normalized: None,
-        };
-        assert!(validate_brace_matching(&pattern).is_ok());
-    }
-
-    #[test]
-    fn validate_brace_matching_rejects_unmatched_closing() -> Result<()> {
-        let pattern = GlobPattern {
-            raw: "foo}".into(),
</code_context>

<issue_to_address>
**suggestion (testing):** Brace validation tests miss several important edge-cases (nested braces, braces inside classes, and escaped braces).

The validator manages depth, `in_class`, and `escaped` state, but current tests only cover a simple balanced case and a single trailing `}`. Please add coverage for:

- Nested and multiple brace sets: `"{foo,{bar,baz}}"`, `"{a,b}{c,d}"`.
- Braces inside character classes (ignored): `"[abc{]"`, `"[{}]"`.
- Escaped braces on Unix (where `escaped` is tracked): `"\{foo\}"`, ensuring they don't change `depth`.
- An unmatched opening brace at the end, asserting the specific error message/kind.

This will better exercise the validator state machine and guard against regressions.
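A table-driven sketch of those cases, runnable without the crate, checks them against a simplified copy of the state machine from the inlined loop suggested in Comment 3. `brace_error_position` is hypothetical: it returns the offending byte position instead of the crate's glob error, and takes an `unix_escapes` flag in place of `cfg(unix)` so both escape behaviours can be exercised on one platform. The same rows could become `#[rstest]` `#[case]` attributes.

```rust
/// Hypothetical simplified validator: returns the position of the first
/// unmatched '}', or of the most recently seen '{' if braces remain open at
/// the end (matching Comment 3's `last_open_pos`), or `None` when balanced.
fn brace_error_position(pattern: &str, unix_escapes: bool) -> Option<usize> {
    let mut depth = 0_usize;
    let mut in_class = false;
    let mut escaped = false;
    let mut last_open = None;

    for (pos, ch) in pattern.char_indices() {
        if escaped {
            escaped = false;
            continue;
        }
        if unix_escapes && ch == '\\' {
            escaped = true;
            continue;
        }
        match ch {
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            _ => {}
        }
        // Braces inside a character class are literal.
        if in_class {
            continue;
        }
        match ch {
            '{' => {
                depth += 1;
                last_open = Some(pos);
            }
            '}' if depth == 0 => return Some(pos),
            '}' => depth -= 1,
            _ => {}
        }
    }
    if depth == 0 { None } else { last_open }
}

/// (pattern, unix_escapes, expected error position)
const CASES: [(&str, bool, Option<usize>); 10] = [
    ("{foo,{bar,baz}}", true, None),
    ("{a,b}{c,d}", true, None),
    ("[abc{]", true, None),
    ("[{}]", true, None),
    (r"\{foo\}", true, None),
    (r"\{foo\}", false, None),
    (r"\{foo", true, None),
    (r"\{foo", false, Some(1)),
    ("foo}", true, Some(3)),
    ("{foo", true, Some(0)),
];
```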
</issue_to_address>

### Comment 2
<location> `src/manifest/glob/tests.rs:16-19` </location>
<code_context>
+use tempfile::tempdir;
+
+#[test]
+fn normalize_separators_collapses_mixed_slashes() {
+    let normalized = normalize_separators(r"foo\\bar/baz");
+    #[cfg(unix)]
+    assert_eq!(normalized, "foo//bar/baz");
+    #[cfg(not(unix))]
+    assert!(normalized.contains(std::path::MAIN_SEPARATOR));
+}
</code_context>

<issue_to_address>
**suggestion (testing):** On non-Unix platforms the separator normalization test could assert a stronger, deterministic expectation.

In `normalize_separators_collapses_mixed_slashes`, the non-Unix branch only checks `contains(std::path::MAIN_SEPARATOR)`, which could still pass with unexpected mixes of `\` and `/`. Instead, build the full expected path using `std::path::MAIN_SEPARATOR` (e.g. `format!("foo{sep}{sep}bar{sep}baz", sep = std::path::MAIN_SEPARATOR)`) and assert equality so the test is deterministic and fully verifies normalization on non-Unix platforms.

```suggestion
    #[cfg(unix)]
    assert_eq!(normalized, "foo//bar/baz");
    #[cfg(not(unix))]
    {
        let sep = std::path::MAIN_SEPARATOR;
        let expected = format!("foo{sep}{sep}bar{sep}baz");
        assert_eq!(normalized, expected);
    }
```
</issue_to_address>

### Comment 3
<location> `src/manifest/glob/validate.rs:5` </location>
<code_context>
-    }
-}
-
-struct BraceValidator {
-    state: BraceValidationState,
-    escaped: bool,
</code_context>

<issue_to_address>
**issue (complexity):** Consider replacing the small `BraceValidator` state-struct and helper methods with a single inlined loop in `validate_brace_matching` that manages all state directly.

You can simplify this by inlining the state machine into a single loop and dropping the tiny helper methods that only manipulate internal fields. That keeps all behavior (escaping, character classes, brace depth) but makes the control flow easier to follow.

For example, you can replace `BraceValidator` and its methods with a single `validate_brace_matching` function:

```rust
pub(super) fn validate_brace_matching(pattern: &str) -> Result<(), Error> {
    let mut depth = 0_i32;
    let mut in_class = false;
    let mut escaped = false;
    let mut last_open_pos: Option<usize> = None;

    for (pos, ch) in pattern.char_indices() {
        // handle escape
        if escaped {
            escaped = false;
            continue;
        }

        #[cfg(unix)]
        {
            if ch == '\\' {
                escaped = true;
                continue;
            }
        }

        // update character class state
        match ch {
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            _ => {}
        }

        // braces are ignored inside character classes
        if in_class {
            continue;
        }

        // handle braces
        match ch {
            '}' if depth == 0 => {
                return Err(create_unmatched_brace_error(&GlobErrorContext {
                    pattern: pattern.to_owned(),
                    error_char: ch,
                    position: pos,
                    error_type: GlobErrorType::UnmatchedBrace,
                }));
            }
            '{' => {
                depth += 1;
                last_open_pos = Some(pos);
            }
            '}' => {
                depth -= 1;
            }
            _ => {}
        }
    }

    if depth != 0 {
        let pos = last_open_pos.unwrap_or(0);
        Err(create_unmatched_brace_error(&GlobErrorContext {
            pattern: pattern.to_owned(),
            error_char: '{',
            position: pos,
            error_type: GlobErrorType::UnmatchedBrace,
        }))
    } else {
        Ok(())
    }
}
```

This preserves:

- `cfg(unix)` escape handling and the `escaped` toggle.
- Ignoring braces when `in_class == true`.
- Tracking of `depth` and `last_open_pos` for unmatched opening braces.
- Same error construction path.

By keeping all state in one place and using `continue` for escapes/character classes, you avoid the indirection of `process_character`, `handle_escape_sequence`, `handle_character_class`, and `handle_braces`, while keeping the behavior intact.
</issue_to_address>

### Comment 4
<location> `src/manifest/glob/normalize.rs:3` </location>
<code_context>
-    }
-}
-
-pub(crate) fn normalize_separators(pattern: &str) -> String {
-    let native = std::path::MAIN_SEPARATOR;
-    #[cfg(unix)]
</code_context>

<issue_to_address>
**issue (complexity):** Consider inlining the backslash and escape-handling helpers directly into the main loops so all glob-normalisation rules are visible in one place and the call chain is shorter.

You can simplify this without changing behaviour by inlining the single‑use helpers into the main loops. That keeps all rules visible in one place and removes the deep call chain.

### 1. Inline backslash handling in `normalize_separators` (Unix)

Instead of `process_backslash` + `should_preserve_*` + `is_wildcard_continuation_char`, you can keep the logic but express it directly in the loop:

```rust
pub(crate) fn normalize_separators(pattern: &str) -> String {
    let native = std::path::MAIN_SEPARATOR;
    #[cfg(unix)]
    {
        let mut out = String::with_capacity(pattern.len());
        let mut it = pattern.chars().peekable();

        while let Some(c) = it.next() {
            match c {
                '\\' => {
                    match it.peek().copied() {
                        // Escapes we always keep as backslash
                        Some('[' | ']' | '{' | '}') => out.push('\\'),
                        // Wildcards: maybe treat '\' as escape, maybe as separator
                        Some('*' | '?') => {
                            let mut lookahead = it.clone();
                            lookahead.next(); // skip '*' or '?'
                            let preserve = match lookahead.peek() {
                                None => true,
                                Some(&ch) => ch.is_alphanumeric() || ch == '-' || ch == '_',
                            };
                            if preserve {
                                out.push('\\');
                            } else {
                                out.push(native);
                            }
                        }
                        // Any other char: '\' acts as separator
                        Some(_) => out.push(native),
                        // Trailing backslash: keep literal
                        None => out.push('\\'),
                    }
                }
                '/' => out.push(native),
                _ => out.push(c),
            }
        }

        out
    }
    #[cfg(not(unix))]
    {
        pattern.replace('/', &native.to_string())
    }
}
```

This keeps exactly the same decision table but removes `process_backslash`, `should_preserve_backslash_for_bracket`, `should_preserve_backslash_for_wildcard`, and `is_wildcard_continuation_char`, making the separator/escape rule auditable in one place.

If you want to retain the “continuation char” concept, you can keep a *single* small helper:

```rust
#[cfg(unix)]
fn is_wildcard_continuation_char(ch: char) -> bool {
    ch.is_alphanumeric() || ch == '-' || ch == '_'
}
```

and call it inline in the `match lookahead.peek()` arm. That’s still easy to follow.

### 2. Inline escape replacement in `force_literal_escapes`

`process_escape_sequence` and `get_escape_replacement` are pure, single‑use helpers that just wrap a simple `match`. You can fold them into the main loop so the entire transformation is visible:

```rust
#[cfg(unix)]
pub(super) fn force_literal_escapes(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut it = pattern.chars().peekable();
    let mut in_class = false;

    while let Some(c) = it.next() {
        match c {
            '[' if !in_class => {
                in_class = true;
                out.push('[');
            }
            ']' if in_class => {
                in_class = false;
                out.push(']');
            }
            '\\' if !in_class => {
                match it.peek().copied() {
                    Some(next) => match next {
                        '*' => { it.next(); out.push_str("[*]"); }
                        '?' => { it.next(); out.push_str("[?]"); }
                        '[' => { it.next(); out.push_str("[[]"); }
                        ']' => { it.next(); out.push_str("[]]"); }
                        '{' => { it.next(); out.push_str("[{]"); }
                        '}' => { it.next(); out.push_str("[}]"); }
                        // default: keep backslash literal
                        _ => out.push('\\'),
                    },
                    // trailing backslash: keep literal
                    None => out.push('\\'),
                }
            }
            _ => out.push(c),
        }
    }

    out
}
```

This preserves all the existing behaviour (same mapping table and class handling) but removes `process_escape_sequence` and `get_escape_replacement`. The rules are now documented by the `match` arms themselves.

If combining the separator normalisation and escape forcing into a single pass is possible at the call site, you could then pull the combined logic into one function, but even just these two refactors will substantially reduce cognitive load while keeping the feature set intact.
</issue_to_address>
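The sketch below restates the Unix decision table from Comment 4 in standalone form. It is not the crate's code. It relies on two assumptions:

- The hypothetical name `normalize_separators_unix` is used for the function.
- A `native` parameter replaces `std::path::MAIN_SEPARATOR`, and the `#[cfg(unix)]` gate is dropped, so every rule can be exercised on any platform.

```rust
/// Hypothetical standalone version of the Unix branch of `normalize_separators`.
/// The native separator is injected (an assumption of this sketch) instead of
/// being read from `std::path::MAIN_SEPARATOR`.
fn normalize_separators_unix(pattern: &str, native: char) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut it = pattern.chars().peekable();

    while let Some(c) = it.next() {
        match c {
            '\\' => match it.peek().copied() {
                // Escaped brackets and braces always keep their backslash.
                Some('[' | ']' | '{' | '}') => out.push('\\'),
                // Escaped wildcard: keep the backslash only if the wildcard is
                // followed by a "continuation" character or ends the pattern.
                Some('*' | '?') => {
                    let mut lookahead = it.clone();
                    lookahead.next(); // skip the '*' or '?'
                    let preserve = match lookahead.peek() {
                        None => true,
                        Some(&ch) => is_wildcard_continuation_char(ch),
                    };
                    out.push(if preserve { '\\' } else { native });
                }
                // Any other following character: the backslash is a separator.
                Some(_) => out.push(native),
                // A trailing backslash stays literal.
                None => out.push('\\'),
            },
            '/' => out.push(native),
            _ => out.push(c),
        }
    }
    out
}

/// Characters that make a preceding `\*` or `\?` read as an escape.
fn is_wildcard_continuation_char(ch: char) -> bool {
    ch.is_alphanumeric() || ch == '-' || ch == '_'
}
```

Under these assumptions, `normalize_separators_unix(r"dir\*/x", '/')` yields `dir/*/x`. By contrast, `r"a\*b"` keeps its backslash, because `b` continues the wildcard.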

### Comment 5
<location> `src/manifest/diagnostics/mod.rs:150` </location>
<code_context>
-/// so the resulting diagnostic only carries the manifest name and error
-/// message.
-#[must_use]
-pub fn map_data_error(
-    err: serde_json::Error,
-    name: &ManifestName,
</code_context>

<issue_to_address>
**issue (review_instructions):** Add behavioural and unit tests covering `map_data_error` (and its diagnostic formatting) to exercise this changed behaviour.

Update the manifest diagnostics test suite to cover `map_data_error` and the `DataDiagnostic`/`ManifestError` behaviour. Validate the emitted message text, diagnostic code, and wrapping in `ManifestError::Parse` so that the new/changed string formatting (e.g. removal of the previous prefix) is locked in by tests.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural and unit tests are required.

</details>
</issue_to_address>


Comment on lines +31 to +36
```rust
fn validate_brace_matching_accepts_balanced_braces() {
    assert!(validate_brace_matching("{foo,bar}").is_ok());
}

#[test]
fn validate_brace_matching_rejects_unmatched_closing() -> Result<()> {
```
Contributor

suggestion (testing): Brace validation tests miss several important edge cases (nested braces, braces inside character classes, and escaped braces).

The validator manages depth, in_class, and escaped state, but current tests only cover a simple balanced case and a single trailing }. Please add coverage for:

  • Nested and multiple brace sets: "{foo,{bar,baz}}", "{a,b}{c,d}".
  • Braces inside character classes (ignored): "[abc{]", "[{}]".
  • Escaped braces on Unix (where escaped is tracked): "\{foo\}", ensuring they don't change depth.
  • An unmatched opening brace at the end, asserting the specific error message/kind.

This will better exercise the validator state machine and guard against regressions.
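The sketch below shows the edge cases listed above in self-contained form. It makes these assumptions:

- It uses a simplified error type. `UnmatchedBrace` is a hypothetical stand-in for the `minijinja::Error` that `create_unmatched_brace_error` builds.
- The loop mirrors the inlined `validate_brace_matching` suggested elsewhere in this review.
- `cfg!(unix)` replaces the `#[cfg(unix)]` block so that the escape rule compiles on every platform.

```rust
/// Hypothetical stand-in for the crate's `GlobErrorContext`-based error:
/// it records only the offending character and its byte offset.
#[derive(Debug, PartialEq, Eq)]
struct UnmatchedBrace {
    error_char: char,
    position: usize,
}

fn validate_brace_matching(pattern: &str) -> Result<(), UnmatchedBrace> {
    let mut depth = 0_usize;
    let mut in_class = false;
    let mut escaped = false;
    let mut last_open_pos: Option<usize> = None;

    for (pos, ch) in pattern.char_indices() {
        // The character after an escaping backslash is skipped entirely.
        if escaped {
            escaped = false;
            continue;
        }
        // Backslash escapes are only honoured on Unix.
        if cfg!(unix) && ch == '\\' {
            escaped = true;
            continue;
        }
        // Track entry to and exit from `[...]` character classes.
        match ch {
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            _ => {}
        }
        // Braces inside a character class are literal.
        if in_class {
            continue;
        }
        match ch {
            '{' => {
                depth += 1;
                last_open_pos = Some(pos);
            }
            '}' if depth == 0 => {
                return Err(UnmatchedBrace { error_char: '}', position: pos });
            }
            '}' => depth -= 1,
            _ => {}
        }
    }

    if depth != 0 {
        return Err(UnmatchedBrace {
            error_char: '{',
            position: last_open_pos.unwrap_or(0),
        });
    }
    Ok(())
}
```

In this sketch `depth` is a `usize` rather than the `i32` used in the review. This is safe because the `depth == 0` guard rejects a stray `}` before any decrement could underflow.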

Comment on lines +16 to +19
```rust
#[cfg(unix)]
assert_eq!(normalized, "foo//bar/baz");
#[cfg(not(unix))]
assert!(normalized.contains(std::path::MAIN_SEPARATOR));
```
Contributor

suggestion (testing): On non-Unix platforms the separator normalization test could assert a stronger, deterministic expectation.

In normalize_separators_collapses_mixed_slashes, the non-Unix branch only checks contains(std::path::MAIN_SEPARATOR), which could still pass with unexpected mixes of \ and /. Instead, build the full expected path using std::path::MAIN_SEPARATOR (e.g. format!("foo{sep}{sep}bar{sep}baz", sep = std::path::MAIN_SEPARATOR)) and assert equality so the test is deterministic and fully verifies normalization on non-Unix platforms.

Suggested change
```diff
 #[cfg(unix)]
 assert_eq!(normalized, "foo//bar/baz");
 #[cfg(not(unix))]
-assert!(normalized.contains(std::path::MAIN_SEPARATOR));
+{
+    let sep = std::path::MAIN_SEPARATOR;
+    let expected = format!("foo{sep}{sep}bar{sep}baz");
+    assert_eq!(normalized, expected);
+}
```

```rust
use super::{GlobErrorContext, GlobErrorType, create_unmatched_brace_error};
use minijinja::Error;

struct BraceValidator {
```
Contributor

issue (complexity): Consider replacing the small BraceValidator state-struct and helper methods with a single inlined loop in validate_brace_matching that manages all state directly.

You can simplify this by inlining the state machine into a single loop and dropping the tiny helper methods that only manipulate internal fields. That keeps all behavior (escaping, character classes, brace depth) but makes the control flow easier to follow.

For example, you can replace BraceValidator and its methods with a single validate_brace_matching function:

```rust
pub(super) fn validate_brace_matching(pattern: &str) -> Result<(), Error> {
    let mut depth = 0_i32;
    let mut in_class = false;
    let mut escaped = false;
    let mut last_open_pos: Option<usize> = None;

    for (pos, ch) in pattern.char_indices() {
        // handle escape
        if escaped {
            escaped = false;
            continue;
        }

        #[cfg(unix)]
        {
            if ch == '\\' {
                escaped = true;
                continue;
            }
        }

        // update character class state
        match ch {
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            _ => {}
        }

        // braces are ignored inside character classes
        if in_class {
            continue;
        }

        // handle braces
        match ch {
            '}' if depth == 0 => {
                return Err(create_unmatched_brace_error(&GlobErrorContext {
                    pattern: pattern.to_owned(),
                    error_char: ch,
                    position: pos,
                    error_type: GlobErrorType::UnmatchedBrace,
                }));
            }
            '{' => {
                depth += 1;
                last_open_pos = Some(pos);
            }
            '}' => {
                depth -= 1;
            }
            _ => {}
        }
    }

    if depth != 0 {
        let pos = last_open_pos.unwrap_or(0);
        Err(create_unmatched_brace_error(&GlobErrorContext {
            pattern: pattern.to_owned(),
            error_char: '{',
            position: pos,
            error_type: GlobErrorType::UnmatchedBrace,
        }))
    } else {
        Ok(())
    }
}
```

This preserves:

  • cfg(unix) escape handling and the escaped toggle.
  • Ignoring braces when in_class == true.
  • Tracking of depth and last_open_pos for unmatched opening braces.
  • Same error construction path.

By keeping all state in one place and using continue for escapes/character classes, you avoid the indirection of process_character, handle_escape_sequence, handle_character_class, and handle_braces, while keeping the behavior intact.

@@ -0,0 +1,111 @@
```rust
//! Separator and escape normalisation for glob patterns.

pub(crate) fn normalize_separators(pattern: &str) -> String {
```
Contributor

issue (complexity): Consider inlining the backslash and escape-handling helpers directly into the main loops so all glob-normalisation rules are visible in one place and the call chain is shorter.

You can simplify this without changing behaviour by inlining the single‑use helpers into the main loops. That keeps all rules visible in one place and removes the deep call chain.

1. Inline backslash handling in normalize_separators (Unix)

Instead of process_backslash + should_preserve_* + is_wildcard_continuation_char, you can keep the logic but express it directly in the loop:

```rust
pub(crate) fn normalize_separators(pattern: &str) -> String {
    let native = std::path::MAIN_SEPARATOR;
    #[cfg(unix)]
    {
        let mut out = String::with_capacity(pattern.len());
        let mut it = pattern.chars().peekable();

        while let Some(c) = it.next() {
            match c {
                '\\' => {
                    match it.peek().copied() {
                        // Escapes we always keep as backslash
                        Some('[' | ']' | '{' | '}') => out.push('\\'),
                        // Wildcards: maybe treat '\' as escape, maybe as separator
                        Some('*' | '?') => {
                            let mut lookahead = it.clone();
                            lookahead.next(); // skip '*' or '?'
                            let preserve = match lookahead.peek() {
                                None => true,
                                Some(&ch) => ch.is_alphanumeric() || ch == '-' || ch == '_',
                            };
                            if preserve {
                                out.push('\\');
                            } else {
                                out.push(native);
                            }
                        }
                        // Any other char: '\' acts as separator
                        Some(_) => out.push(native),
                        // Trailing backslash: keep literal
                        None => out.push('\\'),
                    }
                }
                '/' => out.push(native),
                _ => out.push(c),
            }
        }

        out
    }
    #[cfg(not(unix))]
    {
        pattern.replace('/', &native.to_string())
    }
}
```

This keeps exactly the same decision table but removes process_backslash, should_preserve_backslash_for_bracket, should_preserve_backslash_for_wildcard, and is_wildcard_continuation_char, making the separator/escape rule auditable in one place.

If you want to retain the “continuation char” concept, you can keep a single small helper:

```rust
#[cfg(unix)]
fn is_wildcard_continuation_char(ch: char) -> bool {
    ch.is_alphanumeric() || ch == '-' || ch == '_'
}
```

and call it inline in the match lookahead.peek() arm. That’s still easy to follow.

2. Inline escape replacement in force_literal_escapes

process_escape_sequence and get_escape_replacement are pure, single‑use helpers that just wrap a simple match. You can fold them into the main loop so the entire transformation is visible:

```rust
#[cfg(unix)]
pub(super) fn force_literal_escapes(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut it = pattern.chars().peekable();
    let mut in_class = false;

    while let Some(c) = it.next() {
        match c {
            '[' if !in_class => {
                in_class = true;
                out.push('[');
            }
            ']' if in_class => {
                in_class = false;
                out.push(']');
            }
            '\\' if !in_class => {
                match it.peek().copied() {
                    Some(next) => match next {
                        '*' => { it.next(); out.push_str("[*]"); }
                        '?' => { it.next(); out.push_str("[?]"); }
                        '[' => { it.next(); out.push_str("[[]"); }
                        ']' => { it.next(); out.push_str("[]]"); }
                        '{' => { it.next(); out.push_str("[{]"); }
                        '}' => { it.next(); out.push_str("[}]"); }
                        // default: keep backslash literal
                        _ => out.push('\\'),
                    },
                    // trailing backslash: keep literal
                    None => out.push('\\'),
                }
            }
            _ => out.push(c),
        }
    }

    out
}
```

This preserves all the existing behaviour (same mapping table and class handling) but removes process_escape_sequence and get_escape_replacement. The rules are now documented by the match arms themselves.

If combining the separator normalisation and escape forcing into a single pass is possible at the call site, you could then pull the combined logic into one function, but even just these two refactors will substantially reduce cognitive load while keeping the feature set intact.
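The sketch below is a runnable, platform-independent version of the inlined `force_literal_escapes` mapping. It drops the `#[cfg(unix)]` gate and the `pub(super)` visibility for illustration. The escape table is the one from the suggestion. The only restructuring is that the `match` yields an optional replacement, so `it.next()` appears only once. This restructuring is an assumption of this sketch, not the review's code.

```rust
/// Standalone sketch: rewrites `\*`, `\?`, `\[`, `\]`, `\{` and `\}` outside
/// character classes into single-character classes that match literally.
fn force_literal_escapes(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut it = pattern.chars().peekable();
    let mut in_class = false;

    while let Some(c) = it.next() {
        match c {
            '[' if !in_class => {
                in_class = true;
                out.push('[');
            }
            ']' if in_class => {
                in_class = false;
                out.push(']');
            }
            '\\' if !in_class => {
                // Same mapping table as the review's suggestion.
                let replacement = match it.peek().copied() {
                    Some('*') => Some("[*]"),
                    Some('?') => Some("[?]"),
                    Some('[') => Some("[[]"),
                    Some(']') => Some("[]]"),
                    Some('{') => Some("[{]"),
                    Some('}') => Some("[}]"),
                    // Any other character, or end of input: keep the backslash.
                    _ => None,
                };
                match replacement {
                    Some(class) => {
                        it.next(); // consume the escaped metacharacter
                        out.push_str(class);
                    }
                    None => out.push('\\'),
                }
            }
            // Everything else, including backslashes inside a class, is copied.
            _ => out.push(c),
        }
    }
    out
}
```

With this sketch, `force_literal_escapes(r"a\*b")` produces `a[*]b`. A backslash inside an existing class, as in `[\*]`, is left untouched.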

leynos merged commit a3cc12e into main on Dec 4, 2025
15 checks passed
leynos deleted the terragon/refactor-manifest-submodules-d0dzyn branch on December 4, 2025 02:14

Labels

None yet

Projects

None yet

1 participant