
Add template library path filters#165

Closed
leynos wants to merge 8 commits into main from codex/implement-path-and-file-filters

Conversation

@leynos
Owner

@leynos leynos commented Sep 23, 2025

Summary

  • implement path and file filters in the Minijinja stdlib, including hashing and expanduser support
  • add rstest coverage for the new filters and document the design updates
  • wire in sha1/md5 dependencies for hashing

Testing

  • make lint
  • make test

https://chatgpt.com/codex/tasks/task_e_68d2546001688322ab61db963c05f88d

Summary by Sourcery

Enhance the Minijinja standard library by refactoring filesystem helpers, registering both file-type tests and comprehensive path/file filters (including hashing, expanduser, and realpath), and bolstering documentation and test coverage

New Features:

  • Add path filters (basename, dirname, with_suffix, relative_to, realpath, expanduser, size, contents, linecount, hash, digest) to the Minijinja stdlib
  • Register platform-aware file-type tests (dir, file, symlink, pipe, block_device, char_device, device) in a dedicated setup function
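
As a rough illustration of the `basename` and `dirname` filter semantics listed above, here is a std-only sketch (the PR itself works with camino's `Utf8Path`, so these names and signatures are stand-ins, not the actual implementation):

```rust
use std::path::Path;

// Hypothetical std-only sketch of the `basename` filter: the final path
// component, falling back to the input when there is none.
fn basename(path: &str) -> String {
    Path::new(path)
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| path.to_string())
}

// Hypothetical sketch of the `dirname` filter: everything before the final
// component, or an empty string for a bare file name.
fn dirname(path: &str) -> String {
    Path::new(path)
        .parent()
        .map(|p| p.to_string_lossy().into_owned())
        .unwrap_or_default()
}
```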

Enhancements:

  • Refactor filesystem helpers to use cap-std and camino for metadata and canonicalization
  • Introduce unified error handling and helper routines for path operations
  • Support multiple hashing algorithms (sha256 default, sha512, sha1, md5) for content hashing and digest
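
The hashing pipeline described above ends with hex encoding and, for `digest`, truncation. A minimal sketch of those two steps, assuming helper names `encode_hex` and `trim_digest` (the real filters feed these the raw bytes produced by the selected hasher, sha256 by default):

```rust
// Hex-encode a digest's raw bytes, lowercase, two characters per byte.
fn encode_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

// Truncate a hex digest to at most `len` characters, as the `digest(n)`
// filter does; clamping avoids panicking on short digests.
fn trim_digest(hex: &str, len: usize) -> &str {
    &hex[..len.min(hex.len())]
}
```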

Build:

  • Add sha1 and md5 dependencies to Cargo.toml

Documentation:

  • Update netsuke-design with implementation notes for new filters
  • Mark path and file filters as completed in the roadmap

Tests:

  • Add rstest-based integration tests covering all new path and file filters

@sourcery-ai
Contributor

sourcery-ai Bot commented Sep 23, 2025

Reviewer's Guide

This PR refactors stdlib.rs to cleanly separate file tests from path filters, adds a comprehensive suite of path filters (basename, dirname, with_suffix, relative_to, realpath, expanduser, size, contents, linecount, hash and digest) implemented with cap-std and custom helper functions, integrates sha1 and md5 hashing support alongside sha2, updates documentation and roadmap entries, and introduces end-to-end rstest coverage for all new filters.

File-Level Changes

Change Details Files
Introduce and register new path/file filters
  • Added register_path_filters with filters: basename, dirname, with_suffix, relative_to, realpath, expanduser, size, contents, linecount, hash, digest
  • Implemented each filter using helper functions that handle UTF-8, canonicalization, and error conversion
src/stdlib.rs
Refactor stdlib.rs registration logic
  • Split register() into register_file_tests and register_path_filters
  • Moved file tests (dir, file, symlink, etc.) into register_file_tests and restructured is_file_type logic
src/stdlib.rs
Add hashing support with multiple algorithms
  • Imported sha1 and md5 crates alongside sha2, and wired dependencies in Cargo.toml
  • Implemented compute_hash and compute_digest to support sha256, sha512, sha1, md5
src/stdlib.rs
Cargo.toml
Update documentation for new features
  • Revised module doc comment in stdlib.rs to reflect new filters
  • Added implementation notes in docs/netsuke-design.md and marked filters done in docs/roadmap.md
src/stdlib.rs
docs/netsuke-design.md
docs/roadmap.md
Add comprehensive rstest coverage for filters
  • Created tests/std_filter_tests.rs with fixtures and test cases for each filter
  • Removed legacy file-type predicate tests from stdlib.rs
tests/std_filter_tests.rs
src/stdlib.rs

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Contributor

coderabbitai Bot commented Sep 23, 2025

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

  • New Features
    • Added platform-aware file tests (e.g., directory, file, symlink).
    • Introduced path and file filters: basename, dirname, with_suffix, relative_to, realpath, expanduser, size, contents, linecount, hash, digest.
    • Hashing now supports sha256 (default), sha512, sha1, and md5.
  • Documentation
    • Added implementation notes for filters, including encoding support, hashing options, and path handling.
    • Updated roadmap to mark path and file filters as complete.
  • Tests
    • Added comprehensive tests covering new filters and file/path behaviours.
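
The UTF-8-only `contents`/`linecount` pairing mentioned above can be sketched in miniature; the real filters read through a cap-std directory handle, so these free functions are illustrative only:

```rust
// `contents` is UTF-8-only: non-UTF-8 bytes are an error, not lossily decoded.
fn contents(bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
    std::str::from_utf8(bytes).map(str::to_string)
}

// `linecount` counts lines; a trailing newline does not add an empty line.
fn linecount(text: &str) -> usize {
    text.lines().count()
}
```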

Walkthrough

Add sha1 and md-5 dependencies. Implement platform-aware file tests and a suite of path/content filters (basename, dirname, with_suffix, relative_to, realpath, expanduser, size, contents, linecount, hash, digest) with helpers and error mapping. Update docs and add tests.

Changes

Cohort / File(s) Summary
Dependencies
Cargo.toml
Add sha1 = "0.10" and md-5 = "0.10" to [dependencies].
Documentation
docs/netsuke-design.md, docs/roadmap.md
Add implementation notes for filters (FS sandboxing via cap-std, parent canonicalisation for realpath, UTF‑8-only contents/linecount, supported hash algorithms, with_suffix behaviour). Mark path/file filters as completed in roadmap.
Stdlib Implementation
src/stdlib.rs
Replace prior file-type predicate approach with modular registration: register_file_tests (dir/file/symlink; Unix-only: fifo/block/char/device) and register_path_filters (basename, dirname, with_suffix, relative_to, realpath, expanduser, size, contents, linecount, hash, digest). Add UTF‑8 path helpers, canonicalisation, read/metadata helpers, hashing (sha256/sha512/sha1/md5), hex encoding, digest trimming, non‑Unix fallbacks and unified IO→template error mapping.
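The `with_suffix` behaviour exercised by the tests (strip `count` trailing extensions, then append the new suffix) can be sketched over plain strings; this is an assumed reconstruction of the semantics, not the PR's camino-based code:

```rust
// Hypothetical sketch of `with_suffix(suffix, count)`: remove up to `count`
// trailing extensions from the file name, then append the new suffix.
fn with_suffix(name: &str, suffix: &str, count: usize) -> String {
    let mut stem = name;
    for _ in 0..count {
        match stem.rfind('.') {
            // `i > 0` keeps dotfiles like ".bashrc" intact.
            Some(i) if i > 0 => stem = &stem[..i],
            _ => break,
        }
    }
    format!("{stem}{suffix}")
}
```

This matches the integration tests: one extension stripped turns `file.tar.gz` into `file.tar.log`; two stripped turn it into `file.zip`.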
Tests
tests/std_filter_tests.rs
Add integration tests exercising basename, dirname, with_suffix, realpath, contents, linecount, hash, digest, size, expanduser using a temporary filesystem fixture, symlinks, and HOME overriding for expanduser.
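
The HOME-overriding tests above suggest the shape of `~` expansion. A sketch with the home directory injected as a parameter (the real filter reads HOME/USERPROFILE from the environment; `home` here is a stand-in to keep the example deterministic):

```rust
// Hypothetical sketch of the `expanduser` filter: expand a leading `~` or
// `~/` against the supplied home directory; other paths pass through.
fn expanduser(path: &str, home: &str) -> String {
    if path == "~" {
        home.to_string()
    } else if let Some(rest) = path.strip_prefix("~/") {
        format!("{home}/{rest}")
    } else {
        path.to_string()
    }
}
```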

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Template
  participant Environment
  participant Filters
  participant FS as Filesystem
  participant Hash as Hashers

  User->>Template: Render template with path context
  Template->>Environment: Resolve and invoke filters
  Environment->>Filters: Apply filter (e.g., realpath / hash)
  alt File test / metadata / transform
    Filters->>FS: Open / stat / read
    FS-->>Filters: Data / metadata
  end
  opt Hash / Digest
    Filters->>Hash: Compute (sha256 / sha512 / sha1 / md5)
    Hash-->>Filters: Digest bytes
  end
  Filters-->>Environment: Return result (string/number/bool)
  Environment-->>Template: Substitute output
  Template-->>User: Rendered text

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

Paths aligned, symlinks bowed,
Bytes and lines now clearly ploughed.
Names reshaped and homes revealed,
Hashes whispered, truths unsealed.
Filters hum — the tree compiled.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed Confirm that the title "Add template library path filters" succinctly summarises the primary change: adding a suite of path and file filters to the templating standard library along with tests, documentation updates, and the sha1/md5 dependency additions reflected in src/stdlib.rs, tests/std_filter_tests.rs, docs, and Cargo.toml.
Description Check ✅ Passed Confirm that the PR description accurately describes the implemented features (path and file filters, hashing and expanduser support), the added tests and documentation updates, and the inclusion of sha1/md5 dependencies, and that it aligns with the provided file diffs and objectives.

Comment @coderabbitai help to get the list of available commands and usage tips.

codescene-delta-analysis[bot]

This comment was marked as outdated.

Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • Consider extracting the set of path helper functions (e.g. canonicalize_any, expanduser, with_suffix, etc.) into a dedicated submodule to keep stdlib.rs concise and easier to navigate.
  • The file-test registration currently adds block_device unconditionally, but is_block_device is only defined under #[cfg(unix)]; add proper stubs or cfg gating so it compiles and returns false on non-Unix platforms.
  • The expanduser filter only checks HOME/USERPROFILE and doesn't handle Windows drive letters or UNC paths; either document that limitation or add more robust support for Windows path expansion.
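
One way to satisfy the cfg-gating point above is a Unix implementation plus a stub that always returns false elsewhere, so `block_device` can stay registered on every platform. Names here are illustrative, not the PR's code:

```rust
use std::path::Path;

// Unix: consult the file type via FileTypeExt; any I/O error reads as false.
#[cfg(unix)]
fn is_block_device(path: &Path) -> bool {
    use std::os::unix::fs::FileTypeExt;
    std::fs::symlink_metadata(path)
        .map(|m| m.file_type().is_block_device())
        .unwrap_or(false)
}

// Non-Unix: block devices are not modelled, so the test is always false.
#[cfg(not(unix))]
fn is_block_device(_path: &Path) -> bool {
    false
}
```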
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider extracting the set of path helper functions (e.g. canonicalize_any, expanduser, with_suffix, etc.) into a dedicated submodule to keep stdlib.rs concise and easier to navigate.
- The file-test registration currently adds `block_device` unconditionally but `is_block_device` is only defined under `#[cfg(unix)]`; add proper stubs or `cfg` gating so it compiles and returns false on non-Unix platforms.
- The `expanduser` filter only checks HOME/USERPROFILE and doesn't handle Windows drive letters or UNC paths; either document that limitation or add more robust support for Windows path expansion.

## Individual Comments

### Comment 1
<location> `src/stdlib.rs:386-393` </location>
<code_context>
+    out
+}
+
+fn io_to_error(path: &Utf8Path, action: &str, err: io::Error) -> Error {
+    let message = if err.kind() == io::ErrorKind::NotFound {
+        format!("{action} failed for {path}: not found")
+    } else {
+        format!("{action} failed for {path}: {err}")
+    };
+    Error::new(ErrorKind::InvalidOperation, message).with_source(err)
+}

</code_context>

<issue_to_address>
**suggestion:** io_to_error always maps errors to InvalidOperation, which may obscure error semantics.

Mapping all IO errors to InvalidOperation reduces error granularity. Please map io::ErrorKind variants to corresponding ErrorKind values for clearer error handling.

```suggestion
fn io_to_error(path: &Utf8Path, action: &str, err: io::Error) -> Error {
    use std::io::ErrorKind as IoErrorKind;
    let error_kind = match err.kind() {
        IoErrorKind::NotFound => ErrorKind::NotFound,
        IoErrorKind::PermissionDenied => ErrorKind::PermissionDenied,
        IoErrorKind::AlreadyExists => ErrorKind::AlreadyExists,
        IoErrorKind::InvalidInput => ErrorKind::InvalidArgument,
        IoErrorKind::InvalidData => ErrorKind::InvalidArgument,
        IoErrorKind::TimedOut => ErrorKind::Timeout,
        IoErrorKind::Interrupted => ErrorKind::Interrupted,
        IoErrorKind::WouldBlock => ErrorKind::WouldBlock,
        IoErrorKind::BrokenPipe => ErrorKind::BrokenPipe,
        IoErrorKind::ConnectionRefused => ErrorKind::ConnectionRefused,
        IoErrorKind::ConnectionReset => ErrorKind::ConnectionReset,
        IoErrorKind::ConnectionAborted => ErrorKind::ConnectionAborted,
        IoErrorKind::NotConnected => ErrorKind::NotConnected,
        IoErrorKind::AddrInUse => ErrorKind::AddrInUse,
        IoErrorKind::AddrNotAvailable => ErrorKind::AddrNotAvailable,
        IoErrorKind::UnexpectedEof => ErrorKind::UnexpectedEof,
        _ => ErrorKind::InvalidOperation,
    };
    let message = if err.kind() == IoErrorKind::NotFound {
        format!("{action} failed for {path}: not found")
    } else {
        format!("{action} failed for {path}: {err}")
    };
    Error::new(error_kind, message).with_source(err)
}
```
</issue_to_address>

### Comment 2
<location> `tests/std_filter_tests.rs:107-116` </location>
<code_context>
+}
+
+#[rstest]
+fn hash_and_digest_filters(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
+    let (_temp, root) = filter_workspace;
+    let mut env = Environment::new();
+    stdlib::register(&mut env);
+    let file = root.join("file");
+    let hash = render(&mut env, "hash", "{{ path | hash }}", &file);
+    assert_eq!(
+        hash,
+        "3a6eb0790f39ac87c94f3856b2dd2c5d110e6811602261a9a923d3bb23adc8b7"
+    );
+    let digest = render(&mut env, "digest", "{{ path | digest(8) }}", &file);
+    assert_eq!(digest, "3a6eb079");
+}
+
</code_context>

<issue_to_address>
**suggestion (testing):** Missing tests for alternative hash algorithms in hash and digest filters.

Please add tests for md5, sha1, and sha512 algorithms, and verify that unsupported algorithms produce appropriate errors.

Suggested implementation:

```rust
#[rstest]
fn hash_and_digest_filters(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let file = root.join("file");

    // Default algorithm (sha256)
    let hash = render(&mut env, "hash", "{{ path | hash }}", &file);
    assert_eq!(
        hash,
        "3a6eb0790f39ac87c94f3856b2dd2c5d110e6811602261a9a923d3bb23adc8b7"
    );
    let digest = render(&mut env, "digest", "{{ path | digest(8) }}", &file);
    assert_eq!(digest, "3a6eb079");

    // md5
    let hash_md5 = render(&mut env, "hash_md5", "{{ path | hash('md5') }}", &file);
    assert_eq!(hash_md5, "8d777f385d3dfec8815d20f7496026dc");
    let digest_md5 = render(&mut env, "digest_md5", "{{ path | digest(8, 'md5') }}", &file);
    assert_eq!(digest_md5, "8d777f38");

    // sha1
    let hash_sha1 = render(&mut env, "hash_sha1", "{{ path | hash('sha1') }}", &file);
    assert_eq!(hash_sha1, "f0a3e2c8e6a2a1b6c8a3c6a6e2e3e6a2a1b6c8a3");
    let digest_sha1 = render(&mut env, "digest_sha1", "{{ path | digest(8, 'sha1') }}", &file);
    assert_eq!(digest_sha1, &hash_sha1[..8]);

    // sha512
    let hash_sha512 = render(&mut env, "hash_sha512", "{{ path | hash('sha512') }}", &file);
    assert_eq!(
        hash_sha512,
        "b5c6e2e3e6a2a1b6c8a3c6a6e2e3e6a2a1b6c8a3b5c6e2e3e6a2a1b6c8a3c6a6e2e3e6a2a1b6c8a3b5c6e2e3e6a2a1b6c8a3c6a6e2e3e6a2a1b6c8a3b5c6e2e3e6a2a1b6c8a3c6a6e2e3e6a2a1b6c8a3"
    );
    let digest_sha512 = render(&mut env, "digest_sha512", "{{ path | digest(8, 'sha512') }}", &file);
    assert_eq!(digest_sha512, &hash_sha512[..8]);

    // Unsupported algorithm
    let result = std::panic::catch_unwind(|| {
        render(&mut env, "hash_unsupported", "{{ path | hash('unsupported') }}", &file)
    });
    assert!(result.is_err(), "Unsupported hash algorithm should produce an error");

    let result = std::panic::catch_unwind(|| {
        render(&mut env, "digest_unsupported", "{{ path | digest(8, 'unsupported') }}", &file)
    });
    assert!(result.is_err(), "Unsupported digest algorithm should produce an error");
}

```

- The expected hash values for sha1 and sha512 are placeholders. You should replace them with the actual hash outputs for the file contents ("data") using those algorithms.
- If your `render` function or filter implementation returns errors differently (e.g., via Result), adjust the error assertions accordingly.
</issue_to_address>

### Comment 3
<location> `tests/std_filter_tests.rs:131-145` </location>
<code_context>
+}
+
+#[rstest]
+fn expanduser_filter(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
+    let (_temp, root) = filter_workspace;
+    let mut env = Environment::new();
+    stdlib::register(&mut env);
+    let _lock = EnvLock::acquire();
+    let _guard = EnvVarGuard::set("HOME", root.as_str());
+    let home = render(
+        &mut env,
+        "expanduser",
+        "{{ path | expanduser }}",
+        &Utf8PathBuf::from("~/workspace"),
+    );
+    assert_eq!(home, root.join("workspace").as_str());
+}
</code_context>

<issue_to_address>
**suggestion (testing):** No test for expanduser error conditions (e.g., missing HOME, unsupported ~user expansion).

Please add tests for cases where HOME is unset and for unsupported ~user expansions to verify proper error handling.

```suggestion
#[rstest]
fn expanduser_filter(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let _lock = EnvLock::acquire();
    let _guard = EnvVarGuard::set("HOME", root.as_str());
    let home = render(
        &mut env,
        "expanduser",
        "{{ path | expanduser }}",
        &Utf8PathBuf::from("~/workspace"),
    );
    assert_eq!(home, root.join("workspace").as_str());
}

#[rstest]
fn expanduser_filter_missing_home(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, _root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let _lock = EnvLock::acquire();
    // Unset HOME
    let _guard = EnvVarGuard::unset("HOME");
    let result = std::panic::catch_unwind(|| {
        render(
            &mut env,
            "expanduser_missing_home",
            "{{ path | expanduser }}",
            &Utf8PathBuf::from("~/workspace"),
        )
    });
    assert!(result.is_err(), "expanduser should error when HOME is unset");
}

#[rstest]
fn expanduser_filter_unsupported_user(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, _root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let _lock = EnvLock::acquire();
    let _guard = EnvVarGuard::set("HOME", "/home/test");
    let result = std::panic::catch_unwind(|| {
        render(
            &mut env,
            "expanduser_unsupported_user",
            "{{ path | expanduser }}",
            &Utf8PathBuf::from("~otheruser/workspace"),
        )
    });
    assert!(result.is_err(), "expanduser should error for unsupported ~user expansion");
}
```
</issue_to_address>

### Comment 4
<location> `tests/std_filter_tests.rs:51-78` </location>
<code_context>
+}
+
+#[rstest]
+fn with_suffix_filter(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
+    let (_temp, root) = filter_workspace;
+    let mut env = Environment::new();
+    stdlib::register(&mut env);
+    let file = root.join("file.tar.gz");
+    Dir::open_ambient_dir(&root, ambient_authority())
+        .expect("dir")
+        .write("file.tar.gz", b"data")
+        .expect("write");
+    let first = render(
+        &mut env,
+        "suffix",
+        "{{ path | with_suffix('.log') }}",
+        &file,
+    );
+    assert_eq!(first, root.join("file.tar.log").as_str());
+    let second = render(
+        &mut env,
+        "suffix_alt",
+        "{{ path | with_suffix('.zip', 2) }}",
+        &file,
+    );
+    assert_eq!(second, root.join("file.zip").as_str());
+}
+
</code_context>

<issue_to_address>
**suggestion (testing):** No test for with_suffix filter with empty separator or excessive count.

Please add tests to cover cases where the separator is empty and where the count exceeds the number of separators in the filename, ensuring correct error handling and expected behavior.

```suggestion
}

#[rstest]
fn with_suffix_filter(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let file = root.join("file.tar.gz");
    Dir::open_ambient_dir(&root, ambient_authority())
        .expect("dir")
        .write("file.tar.gz", b"data")
        .expect("write");
    let first = render(
        &mut env,
        "suffix",
        "{{ path | with_suffix('.log') }}",
        &file,
    );
    assert_eq!(first, root.join("file.tar.log").as_str());
    let second = render(
        &mut env,
        "suffix_alt",
        "{{ path | with_suffix('.zip', 2) }}",
        &file,
    );
    assert_eq!(second, root.join("file.zip").as_str());
}

#[rstest]
fn with_suffix_filter_empty_separator(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let file = root.join("file.tar.gz");
    Dir::open_ambient_dir(&root, ambient_authority())
        .expect("dir")
        .write("file.tar.gz", b"data")
        .expect("write");
    // An empty suffix strips the final extension without appending anything
    let output = render(
        &mut env,
        "suffix_empty",
        "{{ path | with_suffix('', 1) }}",
        &file,
    );
    assert_eq!(output, root.join("file.tar").as_str());
}

#[rstest]
fn with_suffix_filter_excessive_count(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let file = root.join("file.tar.gz");
    Dir::open_ambient_dir(&root, ambient_authority())
        .expect("dir")
        .write("file.tar.gz", b"data")
        .expect("write");
    // Count exceeds the number of extensions, so everything after the first
    // dot should be replaced
    let output = render(
        &mut env,
        "suffix_excessive",
        "{{ path | with_suffix('.bak', 10) }}",
        &file,
    );
    assert_eq!(output, root.join("file.bak").as_str());
}
```
</issue_to_address>

### Comment 5
<location> `tests/std_filter_tests.rs:90-104` </location>
<code_context>
+}
+
+#[rstest]
+fn contents_and_linecount_filters(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
+    let (_temp, root) = filter_workspace;
+    let mut env = Environment::new();
+    stdlib::register(&mut env);
+    let file = root.join("file");
+    let text = render(&mut env, "contents", "{{ path | contents }}", &file);
+    assert_eq!(text, "data");
+    let lines = render(
+        &mut env,
+        "linecount",
+        "{{ path | linecount }}",
+        &root.join("lines.txt"),
+    );
+    assert_eq!(lines.parse::<usize>().expect("usize"), 3);
+}
+
</code_context>

<issue_to_address>
**suggestion (testing):** No test for contents filter with unsupported encoding.

Add a test to ensure the filter raises an error when a non-UTF-8 encoding is requested.

```suggestion
fn contents_and_linecount_filters(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let file = root.join("file");
    let text = render(&mut env, "contents", "{{ path | contents }}", &file);
    assert_eq!(text, "data");
    let lines = render(
        &mut env,
        "linecount",
        "{{ path | linecount }}",
        &root.join("lines.txt"),
    );
    assert_eq!(lines.parse::<usize>().expect("usize"), 3);
}

#[rstest]
fn contents_filter_unsupported_encoding(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let file = root.join("non_utf8_file");
    // Write invalid UTF-8 bytes
    {
        use std::fs::File;
        use std::io::Write;
        let mut f = File::create(&file).expect("create file");
        f.write_all(&[0xff, 0xfe, 0xfd]).expect("write non-utf8");
    }
    let result = std::panic::catch_unwind(|| {
        render(&mut env, "contents", "{{ path | contents }}", &file)
    });
    assert!(result.is_err(), "contents filter should error on non-UTF-8 file");
}
```
</issue_to_address>

### Comment 6
<location> `tests/std_filter_tests.rs:79-87` </location>
<code_context>
+}
+
+#[rstest]
+fn realpath_filter(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
+    let (_temp, root) = filter_workspace;
+    let mut env = Environment::new();
+    stdlib::register(&mut env);
+    let link = root.join("link");
+    let output = render(&mut env, "realpath", "{{ path | realpath }}", &link);
+    assert_eq!(output, root.join("file").as_str());
+}
+
</code_context>

<issue_to_address>
**suggestion (testing):** No test for realpath filter on non-existent or root paths.

Please add tests for the `realpath` filter with non-existent, empty, and root directory paths to verify correct handling of these edge cases.

```suggestion
#[rstest]
fn realpath_filter(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);

    // Test symlink resolution
    let link = root.join("link");
    let output = render(&mut env, "realpath", "{{ path | realpath }}", &link);
    assert_eq!(output, root.join("file").as_str());

    // Test non-existent path
    let non_existent = root.join("does_not_exist");
    let output = render(&mut env, "realpath_nonexistent", "{{ path | realpath }}", &non_existent);
    // Should return the input path as string (or handle error gracefully)
    assert_eq!(output, non_existent.as_str());

    // Test empty path
    let empty_path = Utf8PathBuf::from("");
    let output = render(&mut env, "realpath_empty", "{{ path | realpath }}", &empty_path);
    // Should return empty string or handle gracefully
    assert_eq!(output, "");

    // Test root directory
    #[cfg(unix)]
    {
        let root_dir = Utf8PathBuf::from("/");
        let output = render(&mut env, "realpath_root", "{{ path | realpath }}", &root_dir);
        assert_eq!(output, "/");
    }
    #[cfg(windows)]
    {
        // On Windows, root could be "C:\"
        use std::env;
        let root_dir = Utf8PathBuf::from(
            env::var("SystemDrive").unwrap_or_else(|_| "C:".to_string()) + "\\"
        );
        let output = render(&mut env, "realpath_root", "{{ path | realpath }}", &root_dir);
        assert_eq!(output, root_dir.as_str());
    }
}
```
</issue_to_address>

### Comment 7
<location> `tests/std_filter_tests.rs:121-129` </location>
<code_context>
+}
+
+#[rstest]
+fn size_filter(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
+    let (_temp, root) = filter_workspace;
+    let mut env = Environment::new();
+    stdlib::register(&mut env);
+    let file = root.join("file");
+    let size = render(&mut env, "size", "{{ path | size }}", &file);
+    assert_eq!(size.parse::<u64>().expect("u64"), 4);
+}
+
</code_context>

<issue_to_address>
**suggestion (testing):** No test for size filter on non-existent file.

Add a test to check the behavior of the `size` filter when used on a non-existent file, ensuring it handles the case correctly.

```suggestion
#[rstest]
fn size_filter(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let file = root.join("file");
    let size = render(&mut env, "size", "{{ path | size }}", &file);
    assert_eq!(size.parse::<u64>().expect("u64"), 4);
}

#[rstest]
fn size_filter_nonexistent_file(filter_workspace: (tempfile::TempDir, Utf8PathBuf)) {
    let (_temp, root) = filter_workspace;
    let mut env = Environment::new();
    stdlib::register(&mut env);
    let missing_file = root.join("does_not_exist.txt");
    let size = render(&mut env, "size", "{{ path | size }}", &missing_file);
    // Adjust the expected value below to match the filter's behavior for missing files.
    // If it returns "0" for missing files:
    assert_eq!(size.parse::<u64>().expect("u64"), 0);
    // If it errors or returns a specific string, update the assertion accordingly.
}
```
</issue_to_address>

### Comment 8
<location> `src/stdlib.rs:183` </location>
<code_context>
+    path.file_name().unwrap_or(path.as_str()).to_string()
+}
+
+fn parent_dir(path: &Utf8Path) -> Result<(Dir, String, Utf8PathBuf), io::Error> {
+    let (dir_path, name) = dir_and_basename(path);
+    let dir = Dir::open_ambient_dir(&dir_path, ambient_authority())?;
</code_context>

<issue_to_address>
**issue (complexity):** Consider consolidating helper functions and repetitive filter registrations using unified functions and macros to reduce boilerplate.

```markdown
A few opportunities to collapse boilerplate without losing functionality:

1. **Unify `parent_dir` / `open_parent_dir` / `dir_and_basename`**  
   Instead of three tiny helpers, just have one that
   1) normalises empty/“.” paths,
   2) splits into `(Dir, basename, parent_path)`,
   3) maps all I/O errors to `minijinja::Error`.  

   ```rust
   fn open_parent(path: &Utf8Path) -> Result<(Dir, String, Utf8PathBuf), Error> {
       // treat "" as "."
       let path = if path.as_str().is_empty() { Utf8Path::new(".") } else { path };
       let parent = path.parent().unwrap_or(Utf8Path::new("."));
       let name = path
           .file_name()
           .map(str::to_string)
           .unwrap_or_else(|| ".".to_string());
       let dir = Dir::open_ambient_dir(parent, ambient_authority())
           .map_err(|e| io_to_error(path, "open directory", e))?;
       Ok((dir, name, parent.to_path_buf()))
   }
   ```
   Then in `is_file_type`, `file_size`, `read_text`, etc., just call
   ```rust
   let (dir, name, _) = open_parent(path)?;
   ```

2. **Merge `compute_hash` + `compute_digest` + `encode_hex`**  
   Use the `digest` trait to avoid repeating each algorithm branch:
   ```rust
   use digest::{DynDigest, Digest};

   fn compute_digest(
       path: &Utf8Path,
       alg: &str,
       len: Option<usize>,
   ) -> Result<String, Error> {
       let bytes = read_bytes(path)?;
       let mut hasher: Box<dyn DynDigest> = match alg.to_ascii_lowercase().as_str() {
           "sha256" => Box::new(Sha256::new()),
           "sha512" => Box::new(Sha512::new()),
           "sha1"   => Box::new(Sha1::new()),
           "md5"    => Box::new(Md5::new()),
           other    => {
               return Err(Error::new(
                   ErrorKind::InvalidOperation,
                   format!("unsupported algorithm '{}'", other),
               ))
           }
       };
       hasher.update(&bytes);
       let full = hasher.finalize();
       let hex = hex::encode(full);
       let end = len.unwrap_or(hex.len()).min(hex.len());
       Ok(hex[..end].to_string())
   }
   ```
   Remove the old `compute_hash`, `encode_hex`, and shrink your template filters to one:
   ```rust
   env.add_filter("digest", |raw: String, len: Option<usize>, alg: Option<String>| {
       compute_digest(Utf8Path::new(&raw), alg.as_deref().unwrap_or("sha256"), len)
   });
   ```

3. **Macro-ify repetitive filters**  
   For one–arg path filters:
   ```rust
   macro_rules! path_filter {
       ($env:expr, $name:expr, $func:path) => {
           $env.add_filter($name, |s: String| -> Result<String,Error> {
               Ok($func(Utf8Path::new(&s)))
           });
       };
   }

   fn register_path_filters(env: &mut Environment<'_>) {
       path_filter!(env, "basename", basename);
       path_filter!(env, "dirname",  dirname);
       //
   }
   ```
   This cuts your eight‐line `env.add_filter("basename", …)` down to one.

With these changes you collapse ~20 small helpers and eliminate boilerplate in filter/hasher registration, while preserving the full functionality.
</issue_to_address>

### Comment 9
<location> `src/stdlib.rs:74` </location>
<code_context>
-            )
-            .with_source(err));
-        }
+fn register_path_filters(env: &mut Environment<'_>) {
+    env.add_filter("basename", |raw: String| -> Result<String, Error> {
+        Ok(basename(Utf8Path::new(&raw)))
</code_context>

<issue_to_address>
**issue (review_instructions):** Add behavioural and unit tests for all new path and file filters.

The new filters (basename, dirname, with_suffix, relative_to, realpath, expanduser, size, contents, linecount, hash, digest) require both behavioural and unit tests to demonstrate correct functionality and edge case handling. Ensure comprehensive test coverage for each filter.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

### Comment 10
<location> `src/stdlib.rs:74` </location>
<code_context>
-            )
-            .with_source(err));
-        }
+fn register_path_filters(env: &mut Environment<'_>) {
+    env.add_filter("basename", |raw: String| -> Result<String, Error> {
+        Ok(basename(Utf8Path::new(&raw)))
</code_context>

<issue_to_address>
**suggestion (review_instructions):** Refactor repeated error handling and directory opening logic to reduce duplication.

Multiple helper functions (e.g., open_parent_dir, parent_dir, io_to_error) repeat similar error handling and directory opening logic. Refactor these patterns to reduce duplication and improve maintainability, while keeping the code readable.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
Keep code DRY, but readable. Use refactoring approaches best suited for the language in question. Context managers and generators for Python; RAII and macros for Rust.

</details>
</issue_to_address>

### Comment 11
<location> `docs/netsuke-design.md:873` </location>
<code_context>
+Implementation notes:
+
+- Filters use `cap-std` directories for all filesystem work, avoiding ambient
+  authority.
+- `realpath` canonicalises the parent directory before joining the resolved
+  entry so results are absolute and symlink-free.
</code_context>

<issue_to_address>
**issue (review_instructions):** Bullet point exceeds 80 columns; please wrap to 80 columns as per guidelines.

The bullet point about 'cap-std' directories is longer than 80 columns. Please wrap it to ensure readability and compliance with the style guide.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Paragraphs and bullets must be wrapped to 80 columns

</details>
</issue_to_address>

### Comment 12
<location> `docs/netsuke-design.md:873` </location>
<code_context>
+Implementation notes:
+
+- Filters use `cap-std` directories for all filesystem work, avoiding ambient
+  authority.
+- `realpath` canonicalises the parent directory before joining the resolved
+  entry so results are absolute and symlink-free.
</code_context>

<issue_to_address>
**issue (review_instructions):** 'canonicalises' uses en-GB spelling; use en-GB-oxendict 'canonicalizes'.

The word 'canonicalises' should be spelled 'canonicalizes' to follow en-GB-oxendict spelling conventions.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar.

</details>
</issue_to_address>

### Comment 13
<location> `docs/netsuke-design.md:875` </location>
<code_context>
+- Filters use `cap-std` directories for all filesystem work, avoiding ambient
+  authority.
+- `realpath` canonicalises the parent directory before joining the resolved
+  entry so results are absolute and symlink-free.
+- `contents` and `linecount` currently support UTF-8 input; other encodings are
+  rejected with an explicit error.
</code_context>

<issue_to_address>
**issue (review_instructions):** Bullet point exceeds 80 columns; please wrap to 80 columns as per guidelines.

The bullet point about 'realpath' is longer than 80 columns. Please wrap it to ensure readability and compliance with the style guide.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Paragraphs and bullets must be wrapped to 80 columns

</details>
</issue_to_address>

### Comment 14
<location> `docs/netsuke-design.md:880` </location>
<code_context>
+  rejected with an explicit error.
+- `hash` and `digest` accept `sha256` (default), `sha512`, `sha1`, and `md5`.
+- `with_suffix` removes dotted suffix segments (default `n = 1`) before
+  appending the provided suffix.
+
 #### Generic collection filters
</code_context>

<issue_to_address>
**issue (review_instructions):** Bullet point exceeds 80 columns; please wrap to 80 columns as per guidelines.

The bullet point about 'with_suffix' is longer than 80 columns. Please wrap it to ensure readability and compliance with the style guide.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*.md`

**Instructions:**
Paragraphs and bullets must be wrapped to 80 columns

</details>
</issue_to_address>
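The suffix-stripping behaviour described in that bullet can be sketched in plain Rust. This is a hypothetical illustration of the documented semantics (strip up to `n` dotted segments, then append), not the crate's camino-based implementation:

```rust
// Hypothetical sketch of the documented `with_suffix` semantics: strip up to
// `n` dotted suffix segments from the final path component, then append
// `suffix`. Plain strings stand in for the crate's camino `Utf8Path` values.
fn with_suffix(path: &str, suffix: &str, n: usize) -> String {
    let (dir, name) = match path.rfind('/') {
        Some(i) => (&path[..=i], &path[i + 1..]),
        None => ("", path),
    };
    let mut stem = name;
    for _ in 0..n {
        match stem.rfind('.') {
            // `i > 0` keeps hidden files such as `.gitignore` intact.
            Some(i) if i > 0 => stem = &stem[..i],
            _ => break,
        }
    }
    format!("{dir}{stem}{suffix}")
}

fn main() {
    assert_eq!(with_suffix("foo/bar.tar.gz", ".txt", 1), "foo/bar.tar.txt");
    assert_eq!(with_suffix("foo/bar.tar.gz", ".txt", 2), "foo/bar.txt");
}
```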



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/stdlib.rs (1)

1-410: Split this module; file exceeds 400‑line limit

Enforce the repo constraint and improve cohesion by extracting path filters and file tests into submodules (e.g., stdlib::fs_tests, stdlib::path_filters).

Create src/stdlib/mod.rs with //! docs and re‑exports; move helpers into src/stdlib/path.rs and src/stdlib/tests.rs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99206a6 and 8eab293.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • Cargo.toml (1 hunks)
  • docs/netsuke-design.md (1 hunks)
  • docs/roadmap.md (1 hunks)
  • src/stdlib.rs (2 hunks)
  • tests/std_filter_tests.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Use the docs/ markdown files as the reference knowledge base and update them proactively when decisions or requirements change

docs/**/*.md: In Markdown docs that contain Rust examples, mark fenced code blocks with the rust language specifier (```rust)
Use assert!, assert_eq!, or assert_ne! in Markdown doctests to verify outcomes
Avoid using .unwrap() or .expect() in Markdown doctest examples; handle errors idiomatically
For fallible Markdown doctests using ?, use an explicit fn main() -> Result<...> and hide boilerplate with #
In Markdown doctests, you may end code with exact (()) (no spaces) to enable implicit Result-returning main
Use hidden lines (prefix #) in Markdown doctests to hide setup and boilerplate
In Markdown doctests with side effects, use no_run to avoid executing while still compiling
Use should_panic in Markdown doctests for panic-intended examples
Use compile_fail in Markdown doctests to illustrate invalid code; beware brittleness
Only use ignore in Markdown doctests for pseudocode/non-Rust/temporary skips; avoid otherwise
Use edition20xx in Markdown doctests when examples require a specific Rust edition

Files:

  • docs/netsuke-design.md
  • docs/roadmap.md
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Documentation must use en-GB-oxendict spelling and grammar (LICENSE name excluded)
Validate Markdown with make markdownlint and run make fmt to format Markdown (including fixing table markup)
Validate Mermaid diagrams in Markdown by running make nixie
Wrap Markdown paragraphs and bullet points at 80 columns
Do not wrap tables and headings in Markdown
Wrap code blocks in Markdown at 120 columns
Use dashes (-) for list bullets in Markdown
Use GitHub-flavoured Markdown footnotes ([^1])

Files:

  • docs/netsuke-design.md
  • docs/roadmap.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/netsuke-design.md
  • docs/roadmap.md
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Each Rust module file must begin with a module-level //! comment explaining its purpose
Document public APIs using Rustdoc comments (///) so cargo doc can generate docs
Place function attributes after doc comments
Prefer immutable data; avoid unnecessary mut bindings
Handle errors with Result instead of panicking where feasible
Avoid unsafe code unless absolutely necessary and clearly document any usage
Do not use return in single-line functions
Prefer single-line function bodies where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) })
Prefer .expect() over .unwrap()
Use concat!() to combine long string literals rather than escaping newlines with a backslash
Use predicate functions for conditional criteria with more than two branches
Where a function is too long, extract meaningfully named helper functions (separation of concerns, CQRS)
Where a function has too many parameters, group related parameters into meaningfully named structs
If a function is unused with specific features selected, use conditional compilation with #[cfg] or #[cfg_attr]
Where a function returns a large error, consider using Arc to reduce returned data size
Use semantic error enums deriving std::error::Error via thiserror for inspectable conditions
Clippy warnings must be disallowed; fix issues in code rather than silencing
Do not silence lints except as a last resort; any suppression must be tightly scoped and include a clear reason
Prefer #[expect] over #[allow] when suppressing lints
Use cap-std for all filesystem operations (capability-based, sandboxed)
Use camino for path handling; avoid std::path::PathBuf directly
Avoid std::fs directly; wrap with cap-std
Comments must use en-GB-oxendict spelling and grammar (except external API names)
No single Rust source file may exceed 400 lines; break up large switches/dispatch tables by feature
Name booleans with is/has/should prefixes; use clear, descriptive names for variables and functions
Function documentation must include...

Files:

  • tests/std_filter_tests.rs
  • src/stdlib.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.

  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such

    • For testing of functionality depending upon environment variables, dependency injection and the mockable crate are the preferred option.
    • If mockable cannot be used, env mutations in...

Files:

  • tests/std_filter_tests.rs
  • src/stdlib.rs
tests/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.rs: Large blocks of test data should be moved to external data files
Use rstest fixtures for shared setup
Replace duplicated tests with #[rstest(...)] parameterised cases
Prefer mockall for mocks and stubs
Mock non-deterministic dependencies (Env, Clock) via dependency injection using mockable crate
Test documentation should omit examples that merely reiterate test logic

Files:

  • tests/std_filter_tests.rs
**/Cargo.toml

📄 CodeRabbit inference engine (AGENTS.md)

**/Cargo.toml: Use explicit caret SemVer requirements for all dependencies
Disallow wildcard (*) and open-ended (>=) requirements; use ~ only with a documented reason

Files:

  • Cargo.toml
🧬 Code graph analysis (1)
tests/std_filter_tests.rs (2)
src/stdlib.rs (1)
  • register (36-39)
test_support/src/env_lock.rs (1)
  • acquire (17-21)
🔍 Remote MCP Ref

Relevant additional context for PR #165 (concise)

  • Cargo.toml: adds sha1 = "0.10" and md-5 = "0.10" as new direct dependencies (for hashing). (Cargo.toml)

  • src/stdlib.rs (implementation):

    • Adds register_file_tests and register_path_filters and many helpers (basename, dirname, normalise_parent, parent_dir/open_parent_dir, canonicalize_any, with_suffix, relative_to, expanduser, file_size, read_text/read_bytes, linecount, compute_hash, compute_digest, encode_hex, io_to_error). Public register(env: &mut Environment<'_>) signature unchanged. (src/stdlib.rs)
    • Path filters added: basename, dirname, with_suffix, relative_to, realpath, expanduser, size, contents, linecount, hash, digest. Hash/digest support: sha256 (default), sha512, sha1, md5; digest supports trimming. realpath canonicalizes the parent directory before joining; contents and linecount accept UTF‑8 only and error on other encodings. Non‑Unix fallbacks return false for unsupported device tests. (src/stdlib.rs)
  • tests/std_filter_tests.rs:

    • New tests create a temp workspace with a file, a symlink, and a multi-line file; tests cover basename, dirname, with_suffix, realpath, contents, linecount, hash, digest, size, expanduser. expanduser test sets HOME via EnvVarGuard; tests assert exact outputs (including expected hash/digest strings). Tests assume symlink support — potential platform/CI implications. (tests/std_filter_tests.rs)
  • docs/netsuke-design.md: adds “Implementation notes” describing cap-std usage for filesystem work, realpath parent canonicalization, UTF‑8-only contents/linecount, supported hash algorithms, and with_suffix behavior. (docs/netsuke-design.md)

  • docs/roadmap.md: marks the path/file filters item done in Phase 1 and Phase 3. (docs/roadmap.md)
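As a rough illustration of the basename/dirname split named in the filter list above, a plain-string sketch follows; the actual filters operate on camino `Utf8Path` values, so treat this as an assumption-laden stand-in rather than the crate's code:

```rust
// Plain-string sketch of the basename/dirname behaviour; illustrative only,
// as the real filters work on camino `Utf8Path` values.
fn basename(path: &str) -> &str {
    path.rsplit('/').next().unwrap_or(path)
}

fn dirname(path: &str) -> &str {
    match path.rfind('/') {
        Some(0) => "/",          // parent of a root-level entry is "/"
        Some(i) => &path[..i],   // everything before the last separator
        None => ".",             // bare names live in the current directory
    }
}

fn main() {
    assert_eq!(basename("foo/bar.txt"), "bar.txt");
    assert_eq!(dirname("foo/bar.txt"), "foo");
    assert_eq!(dirname("bar.txt"), ".");
}
```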

Suggested review foci (based on above):

  • Verify crate names/licensing for sha1 and md-5 (crate name vs md5) and CI/cargo audit implications.
  • Confirm tests’ symlink usage works on project CI platforms (Windows/permissions) or are guarded appropriately.
  • Confirm UTF‑8‑only behavior is intended and documented for consumers of contents/linecount.
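The symlink concern above could be probed with a platform-gated check along these lines. This sketch uses std only for brevity, even though the project mandates cap-std for filesystem work; the directory name is a hypothetical scratch path:

```rust
// Sketch: gate symlink-dependent assertions to Unix, where creating symlinks
// needs no elevated privileges. std-only for brevity; the project's own code
// would route filesystem access through cap-std.
#[cfg(unix)]
fn check_symlink_roundtrip() -> std::io::Result<()> {
    use std::os::unix::fs::symlink;
    let dir = std::env::temp_dir().join("symlink_probe_pr165");
    std::fs::create_dir_all(&dir)?;
    let target = dir.join("target.txt");
    std::fs::write(&target, b"data")?;
    let link = dir.join("link.txt");
    let _ = std::fs::remove_file(&link); // ignore error if the link is absent
    symlink(&target, &link)?;
    assert_eq!(std::fs::read(&link)?, b"data");
    std::fs::remove_dir_all(&dir)
}

#[cfg(not(unix))]
fn check_symlink_roundtrip() -> std::io::Result<()> {
    Ok(()) // skipped where symlink support is not guaranteed
}

fn main() -> std::io::Result<()> {
    check_symlink_roundtrip()
}
```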

Tools/sources used:

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Sourcery review
  • GitHub Check: build-test (ubuntu-latest, stable)
🔇 Additional comments (1)
docs/roadmap.md (1)

162-164: Wrap updated lines at 80 columns and run markdownlint

Keep docs within the repo’s Markdown style (80‑col wrap, en‑GB, lint‑clean).

Update the bullet to wrap at 80 columns and run: make markdownlint && make fmt.

⛔ Skipped due to learnings
Learnt from: CR
PR: leynos/netsuke#0
File: docs/roadmap.md:0-0
Timestamp: 2025-09-14T11:43:07.316Z
Learning: Implement path and file filters (basename, dirname, with_suffix, realpath, contents, hash, etc.)


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

codescene-delta-analysis[bot]

This comment was marked as outdated.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/stdlib.rs (1)

1-414: Split module to keep file under 400 lines

The file is 414 lines; break into submodules (e.g., file_tests.rs, path_filters.rs, hashing.rs) to meet the guideline.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8eab293 and 7b31b2a.

📒 Files selected for processing (1)
  • src/stdlib.rs (2 hunks)
🧬 Code graph analysis (1)
src/stdlib.rs (2)
src/ir/cycle.rs (1)
  • path (141-143)
src/hasher.rs (2)
  • hash (56-64)
  • write (38-41)
🔍 Remote MCP Ref

Summary of additional facts relevant to review

  • The sha1 and md-5 crates exist on crates.io and have 0.10 releases (the PR adds sha1 = "0.10" and md-5 = "0.10"). Verify these exact crate names/versions and their licenses before merging (some ecosystem packages use "md5" vs "md-5" naming; confirm the correct crate is intended).

  • Tests create and rely on filesystem symlinks (fixture creates a file and a symlink to it). Symlink creation and permissions can fail or behave differently on CI runners (notably Windows and some restricted containers). Consider guarding or conditionalizing symlink-dependent tests on platforms/CI where symlinks are supported. (See tests/std_filter_tests.rs in PR summary).

  • Contents and linecount filters explicitly require UTF‑8 and error on other encodings (documented in netsuke-design.md). If users expect binary or non‑UTF8 files to be handled, this is a behavioral choice to call out in changelog/upgrade notes.

  • The realpath implementation canonicalizes the parent directory before joining the entry (documented). This can affect results for symlinked directories — confirm intended semantics with existing callers.

Tools/sources used

  • crates.io package pages checked for sha1 and md-5 via Ref_ref_read_url.
  • PR test and docs summaries inspected via search tool.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (7)
src/stdlib.rs (7)

29-35: Replace unwrap() with expect() in doctest

Align docs with the guideline to prefer .expect() and provide actionable messages.

-/// env.add_template("t", "{{ path | basename }}").unwrap();
+/// env.add_template("t", "{{ path | basename }}").expect("add template");
-/// let tmpl = env.get_template("t").unwrap();
+/// let tmpl = env.get_template("t").expect("get template");
 /// let rendered = tmpl
 ///     .render(context!(path => "foo/bar.txt"))
-///     .unwrap();
+///     .expect("render");
AI agent prompt:
Refactor the doctest in src/stdlib.rs (lines 29–35) to replace every unwrap() call with expect() containing clear messages ("add template", "get template", "render").

12-15: Gate MD5/SHA‑1 behind an opt‑in feature and fail closed when disabled

Avoid enabling weak digests by default. Compile them only when feature "legacy-digests" is enabled and return an explicit error otherwise.

-use md5::{Digest as Md5Digest, Md5};
+#[cfg(feature = "legacy-digests")]
+use md5::{Digest as Md5Digest, Md5};
@@
-use sha1::{Digest as Sha1Digest, Sha1};
+#[cfg(feature = "legacy-digests")]
+use sha1::{Digest as Sha1Digest, Sha1};
         "sha1" => {
-            let mut hasher = Sha1::new();
-            Sha1Digest::update(&mut hasher, &bytes);
-            let digest = Sha1Digest::finalize(hasher);
-            Ok(encode_hex(digest.as_slice()))
+            #[cfg(feature = "legacy-digests")]
+            {
+                let mut hasher = Sha1::new();
+                Sha1Digest::update(&mut hasher, &bytes);
+                let digest = Sha1Digest::finalize(hasher);
+                Ok(encode_hex(digest.as_slice()))
+            }
+            #[cfg(not(feature = "legacy-digests"))]
+            {
+                Err(Error::new(
+                    ErrorKind::InvalidOperation,
+                    "unsupported hash algorithm 'sha1' (enable feature 'legacy-digests')",
+                ))
+            }
         }
         "md5" => {
-            let mut hasher = Md5::new();
-            Md5Digest::update(&mut hasher, &bytes);
-            let digest = Md5Digest::finalize(hasher);
-            Ok(encode_hex(digest.as_slice()))
+            #[cfg(feature = "legacy-digests")]
+            {
+                let mut hasher = Md5::new();
+                Md5Digest::update(&mut hasher, &bytes);
+                let digest = Md5Digest::finalize(hasher);
+                Ok(encode_hex(digest.as_slice()))
+            }
+            #[cfg(not(feature = "legacy-digests"))]
+            {
+                Err(Error::new(
+                    ErrorKind::InvalidOperation,
+                    "unsupported hash algorithm 'md5' (enable feature 'legacy-digests')",
+                ))
+            }
         }
AI agent prompt:
Guard md5 and sha1 imports and match arms in src/stdlib.rs with #[cfg(feature = "legacy-digests")]. When the feature is disabled, return a clear Error stating the algorithm is unsupported and how to enable it. Ensure the crate compiles without md5/sha1 present.

Also applies to: 355-366


340-366: Stream file hashing to avoid reading entire files into memory

Hash in chunks via cap‑std Dir‑opened handle to keep memory bounded.

+use std::io::{BufReader, Read};
@@
 fn compute_hash(path: &Utf8Path, alg: &str) -> Result<String, Error> {
-    let bytes = read_bytes(path)?;
-    match alg.to_ascii_lowercase().as_str() {
+    let (dir, name, _) = open_parent_dir(path)?;
+    let file = dir.open(Utf8Path::new(&name)).map_err(|err| io_to_error(path, "open", err))?;
+    let mut reader = BufReader::new(file);
+    let mut buf = vec![0u8; 8 * 1024 * 1024];
+    match alg.to_ascii_lowercase().as_str() {
         "sha256" => {
             let mut hasher = Sha256::new();
-            Sha2Digest::update(&mut hasher, &bytes);
+            loop {
+                let n = reader.read(&mut buf).map_err(|err| io_to_error(path, "read", err))?;
+                if n == 0 { break; }
+                Sha2Digest::update(&mut hasher, &buf[..n]);
+            }
             let digest = Sha2Digest::finalize(hasher);
             Ok(encode_hex(digest.as_slice()))
         }
         "sha512" => {
             let mut hasher = Sha512::new();
-            Sha2Digest::update(&mut hasher, &bytes);
+            loop {
+                let n = reader.read(&mut buf).map_err(|err| io_to_error(path, "read", err))?;
+                if n == 0 { break; }
+                Sha2Digest::update(&mut hasher, &buf[..n]);
+            }
             let digest = Sha2Digest::finalize(hasher);
             Ok(encode_hex(digest.as_slice()))
         }
         "sha1" => {
-            let mut hasher = Sha1::new();
-            Sha1Digest::update(&mut hasher, &bytes);
-            let digest = Sha1Digest::finalize(hasher);
-            Ok(encode_hex(digest.as_slice()))
+            #[cfg(feature = "legacy-digests")]
+            {
+                let mut hasher = Sha1::new();
+                loop {
+                    let n = reader.read(&mut buf).map_err(|err| io_to_error(path, "read", err))?;
+                    if n == 0 { break; }
+                    Sha1Digest::update(&mut hasher, &buf[..n]);
+                }
+                let digest = Sha1Digest::finalize(hasher);
+                Ok(encode_hex(digest.as_slice()))
+            }
+            #[cfg(not(feature = "legacy-digests"))]
+            {
+                Err(Error::new(ErrorKind::InvalidOperation, "unsupported hash algorithm 'sha1' (enable feature 'legacy-digests')"))
+            }
         }
         "md5" => {
-            let mut hasher = Md5::new();
-            Md5Digest::update(&mut hasher, &bytes);
-            let digest = Md5Digest::finalize(hasher);
-            Ok(encode_hex(digest.as_slice()))
+            #[cfg(feature = "legacy-digests")]
+            {
+                let mut hasher = Md5::new();
+                loop {
+                    let n = reader.read(&mut buf).map_err(|err| io_to_error(path, "read", err))?;
+                    if n == 0 { break; }
+                    Md5Digest::update(&mut hasher, &buf[..n]);
+                }
+                let digest = Md5Digest::finalize(hasher);
+                Ok(encode_hex(digest.as_slice()))
+            }
+            #[cfg(not(feature = "legacy-digests"))]
+            {
+                Err(Error::new(ErrorKind::InvalidOperation, "unsupported hash algorithm 'md5' (enable feature 'legacy-digests')"))
+            }
         }

Also applies to: 16-17
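
The four identical read loops in the suggestion above could also be factored into one helper that feeds any byte-consuming sink. Below is a stdlib-only sketch of that chunked-read pattern; `hash_chunked` and `fnv1a` are hypothetical names, and the inline FNV-1a merely stands in for the sha2/sha1/md5 hashers so the sketch runs without crates:

```rust
use std::io::{BufReader, Cursor, Read};

/// Feed a reader to a byte-consuming sink in fixed-size chunks so memory
/// stays bounded regardless of input size.
fn hash_chunked<R: Read>(reader: R, mut update: impl FnMut(&[u8])) -> std::io::Result<()> {
    let mut reader = BufReader::new(reader);
    let mut buf = vec![0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        update(&buf[..n]);
    }
    Ok(())
}

/// FNV-1a stands in for Sha256::update/finalize; the chunking, not the
/// hash, is the point of the sketch.
fn fnv1a(reader: impl Read) -> std::io::Result<u64> {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    hash_chunked(reader, |chunk| {
        for &byte in chunk {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x100_0000_01b3); // FNV-1a 64-bit prime
        }
    })?;
    Ok(hash)
}

fn main() -> std::io::Result<()> {
    println!("{:016x}", fnv1a(Cursor::new(b"hello world".to_vec()))?);
    Ok(())
}
```

Each match arm then reduces to constructing its hasher and passing `|chunk| Digest::update(&mut hasher, chunk)` to the helper.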


296-301: Clarify expanduser() error when HOME/USERPROFILE are unset

Report both env vars for accurate diagnostics on Windows.

-                Err(Error::new(
-                    ErrorKind::InvalidOperation,
-                    "cannot expand ~: HOME is not set",
-                ))
+                Err(Error::new(
+                    ErrorKind::InvalidOperation,
+                    "cannot expand ~: neither HOME nor USERPROFILE is set",
+                ))

390-397: Preserve IO error detail in io_to_error()

MiniJinja's ErrorKind exposes no IO-specific variants (there is no ErrorKind::NotFound or ErrorKind::PermissionDenied), so std::io::ErrorKind cannot be mapped one-to-one onto it. Instead, surface the IO kind in the message and keep the source chain so callers can still downcast to the original io::Error.

 fn io_to_error(path: &Utf8Path, action: &str, err: io::Error) -> Error {
-    let message = if err.kind() == io::ErrorKind::NotFound {
-        format!("{action} failed for {path}: not found")
-    } else {
-        format!("{action} failed for {path}: {err}")
-    };
+    use std::io::ErrorKind as IoErrorKind;
+    let reason = match err.kind() {
+        IoErrorKind::NotFound => "not found".to_string(),
+        IoErrorKind::PermissionDenied => "permission denied".to_string(),
+        kind => format!("{err} ({kind:?})"),
+    };
+    let message = format!("{action} failed for {path}: {reason}");
     Error::new(ErrorKind::InvalidOperation, message).with_source(err)
 }

74-133: Collapse filter registration boilerplate with a small macro

Reduce repetition and improve maintainability per guidelines.

AI agent prompt:
Introduce a macro like path_filter!(env, "basename", |p| basename(p)) to register 1-arg path filters uniformly, and a 2-arg variant for filters like relative_to/expanduser. Apply it in register_path_filters() to replace the repeated env.add_filter(...) blocks without changing behaviour.
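
A stdlib-only sketch of that macro. `Env` is a hypothetical stand-in for minijinja's Environment (a plain map of boxed closures), and `basename`/`dirname` are simplified string versions of the real filters:

```rust
use std::collections::HashMap;

/// Stand-in for minijinja's Environment: filter name -> boxed filter.
struct Env {
    filters: HashMap<&'static str, Box<dyn Fn(&str) -> String>>,
}

impl Env {
    fn new() -> Self {
        Self { filters: HashMap::new() }
    }
    fn add_filter(&mut self, name: &'static str, f: impl Fn(&str) -> String + 'static) {
        self.filters.insert(name, Box::new(f));
    }
    fn apply(&self, name: &str, input: &str) -> Option<String> {
        self.filters.get(name).map(|f| f(input))
    }
}

/// The one-liner the review proposes for 1-arg path filters.
macro_rules! path_filter {
    ($env:expr, $name:literal, $func:expr) => {
        $env.add_filter($name, move |p: &str| $func(p));
    };
}

fn basename(p: &str) -> String {
    p.rsplit(|c: char| c == '/' || c == '\\').next().unwrap_or(p).to_string()
}

fn dirname(p: &str) -> String {
    match p.rfind(|c: char| c == '/' || c == '\\') {
        Some(0) => p[..1].to_string(),
        Some(i) => p[..i].to_string(),
        None => ".".to_string(),
    }
}

fn main() {
    let mut env = Env::new();
    path_filter!(env, "basename", basename);
    path_filter!(env, "dirname", dirname);
    println!("{:?}", env.apply("basename", "src/stdlib.rs"));
}
```

A 2-arg variant would take an extra `$arg:ident` pattern and register a closure of two parameters for filters like relative_to and with_suffix.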

193-196: Confirm is_root() handles Windows drive roots

is_root(...) as written returns true for Windows drive roots (e.g. "C:\" / "C:/") because for root paths both parent() and file_name() are None and the path string is non-empty. Add a Windows‑specific branch only if you need to treat "C:" (no trailing slash) or UNC roots specially; otherwise no change required.
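
The behaviour described can be checked directly: on Windows, std::path (and camino, which mirrors it) report neither parent() nor file_name() for drive roots, so the check below returns true for "C:\\" there, while on Unix the same string is an ordinary relative component. `is_root` here mirrors the described logic, not the PR's exact code:

```rust
use std::path::Path;

/// A root path has no parent and no file name, but is non-empty.
fn is_root(path: &str) -> bool {
    let p = Path::new(path);
    !path.is_empty() && p.parent().is_none() && p.file_name().is_none()
}

fn main() {
    for candidate in ["/", "C:\\", "src/lib.rs", ""] {
        println!("{candidate:?} -> {}", is_root(candidate));
    }
}
```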

@leynos
Owner Author

leynos commented Sep 24, 2025

@coderabbitai pause

@coderabbitai
Contributor

coderabbitai Bot commented Sep 24, 2025

✅ Actions performed

Reviews paused.

codescene-delta-analysis[bot]

This comment was marked as outdated.


Repository owner deleted a comment from coderabbitai Bot Sep 25, 2025
@leynos
Owner Author

leynos commented Sep 25, 2025

Superseded by #166

@leynos leynos closed this Sep 25, 2025
@leynos
Owner Author

leynos commented Sep 25, 2025

@sourcery-ai resolve

@leynos
Owner Author

leynos commented Sep 25, 2025

@coderabbitai resolve

@coderabbitai
Contributor

coderabbitai Bot commented Sep 25, 2025

✅ Actions performed

Comments resolved and changes approved.
