Skip to content

Implement cross-platform which filter and helper for MiniJinja#237

Merged
leynos merged 23 commits intomainfrom
terragon/implement-which-filter-u5itpo
Nov 24, 2025
Merged

Implement cross-platform which filter and helper for MiniJinja#237
leynos merged 23 commits intomainfrom
terragon/implement-which-filter-u5itpo

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Nov 18, 2025

Summary

  • Implement cross-platform which filter and helper function for MiniJinja, with full test coverage. The filter supports options such as all, canonical, fresh, and cwd_mode, and safely resolves executables across POSIX and Windows, including Windows-specific PATH handling via PATHEXT. Includes an LRU-cached resolver to optimise repeated lookups and diagnostic error codes for easier troubleshooting. Tests cover path-discovery, multi-entry PATH lookups, canonicalisation, cache invalidation, and missing executables.

Changes

  • New: cross-platform which filter and helper
    • File: src/stdlib/which.rs
    • Provides filter which and function which with caching, cache invalidation, and options:
      • all (default false) to return all matches in PATH order
      • canonical to deduplicate via canonical paths
      • fresh to bypass cache for a lookup
      • cwd_mode to control whether current directory is searched (auto | always | never)
    • Handles Windows PATHEXT, path parsing, and UTF-8 PATH entries
    • Returns matches as path strings, with Windows paths normalized to / in output where appropriate
    • Emits actionable error codes like netsuke::jinja::which::not_found and netsuke::jinja::which::args for diagnostics
  • Integration: register which in stdlib
    • File: src/stdlib/mod.rs
    • Registered via which::register(env) during stdlib setup
  • Tests
    • Unit/integration tests for the which filter included in new module and test suite
    • Features and steps updated to exercise cross-platform behavior (scenarios for first match, all matches, cwd_mode, and not_found)
    • Test scaffolding extended to support executable creation, PATH setup, and platform-specific behavior
  • Documentation & roadmap
    • Docs updated to reflect the cross-platform which behavior and caching design
    • Roadmap entries adjusted to reflect completed work for the which filter and related tests
  • Dependencies
    • Cargo.toml: added dependency on lru (for the cache)
    • Cargo.lock updated accordingly

Test Plan

  • Unit tests for which resolver behaviour (first match, all matches, canonicalisation, fresh lookups, cache invalidation)
  • Integration tests validating cross-platform PATH discovery with Windows PATHEXT support
  • Tests ensuring not_found and diagnostic error codes are emitted for missing executables
  • Feature tests covering:
    • which resolves the first PATH entry
    • which lists every PATH match in order
    • which honours cwd_mode on POSIX and Windows
    • not_found path reports
  • End-to-end in stdlib filter tests and cucumber-based feature tests updated accordingly

Documentation and Design Notes

  • Cross-platform compatibility: path discovery, executable checks, and proxying through PATH are implemented to behave consistently across Unix-like systems and Windows.
  • Diagnostics: error codes such as netsuke::jinja::which::not_found and netsuke::jinja::which::args provide actionable feedback in templates.
  • Caching: an LRU cache with a 64-entry capacity is used to speed repeated resolutions; fresh=true bypasses the cache while keeping previous entries available for subsequent renders.
  • Canonicalisation: when requested, results are canonicalised to unique absolute paths to avoid duplicates caused by symlinks or relative path variations.

Why this change

  • Enables deterministic, cross-platform resolution of executables from MiniJinja templates with predictable caching and clear diagnostics.
  • Complements existing stdlib filters with robust path handling and Windows-specific behavior (PATHEXT) where necessary.

Breaking Changes

  • No API-breaking changes to existing filters aside from the new which filter and function addition, registered under stdlib.
  • New dependencies added (lru) for internal caching; should be transparent to users.

If you want me to adjust the PR description with more or fewer details, or emphasize specific usage examples in the docs, I can update accordingly.

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/d68b1ca7-62e2-4219-b655-2b114f28a0a6

Summary by Sourcery

Add a cross-platform which filter/function to the MiniJinja stdlib with cached executable resolution and integrate it into a refactored stdlib registration/configuration layer.

New Features:

  • Introduce a cross-platform which MiniJinja filter and function that resolve executables from PATH with options for returning all matches, canonicalisation, cache bypass, and configurable current-directory search.

Enhancements:

  • Refactor stdlib configuration and registration into dedicated config and register modules, making setup fallible and more robust while exposing typed defaults and workspace configuration.
  • Add an LRU-backed cache and environment fingerprinting for which lookups to optimise repeated resolutions without sacrificing determinism.
  • Improve stdlib test scaffolding and Cucumber runner configuration, including serialising scenario execution and shared helpers for PATH and workspace management.
  • Tighten stdlib file-test registration to be platform-aware and clarify error handling around non-UTF-8 paths and workspace roots.

Build:

  • Add lru and walkdir as dependencies to support cached resolution and workspace traversal for the which helper.

Documentation:

  • Update design docs, user guide, ADRs, and roadmap to document the new which behaviour, caching model, workspace fallback, and the testing strategy, and to mark the executable discovery work as complete.

Tests:

  • Add focused unit and integration tests for the which resolver and workspace fallback, including PATH/PATHEXT handling, canonicalisation, caching semantics, and error reporting across POSIX and Windows.
  • Extend stdlib behavioural tests and step definitions to cover which usage in templates, PATH setup, and executable fixtures, and introduce shared stdlib assertion helpers for reuse across tests.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Nov 18, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added cross-platform which filter and function for executable discovery with support for PATH resolution, caching, and workspace fallback.
    • Introduced configurable executable search modes: return first match, list all matches, or prefer workspace tools.
    • Added performance optimisation via LRU caching for repeated executable lookups.
  • Documentation

    • Updated user guides with executable discovery documentation and usage examples.
    • Enhanced architectural documentation with resolver cache design and workspace fallback behaviour.
  • Dependencies

    • Added LRU cache and directory traversal support libraries.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Add a cross‑platform MiniJinja "which" filter/function with a fingerprinted LRU cache and environment snapshotting; implement filesystem lookup with PATHEXT/permission handling and workspace fallback; introduce StdlibConfig and register wiring; add test helpers, integration tests and Cucumber scenarios; update docs and add dependencies lru and walkdir.

Changes

Cohort / File(s) Summary
Dependencies
Cargo.toml
Add lru = "0.16" and walkdir = "2.5".
Stdlib surface reorganisation
src/stdlib/mod.rs
Split stdlib into config, register, which modules and re‑export config and register items.
Stdlib config module
src/stdlib/config.rs
Add StdlibConfig, NetworkConfig, default constants, validation, builder methods, Default impl and conversion into component configs.
Stdlib register wiring
src/stdlib/register.rs
Add register and register_with_config, value_from_bytes; wire filters/tests (including which) into MiniJinja and return StdlibState with propagated errors.
Which module entry & utils
src/stdlib/which/mod.rs
Register which filter/function, re‑export WhichResolver/WhichOptions, add format_path_for_output and register helper.
Which caching
src/stdlib/which/cache.rs
Add thread‑safe capacity‑bounded LRU cache (capacity 64) in WhichResolver; implement resolve, try_cache, store, cache key fingerprinting.
Which environment snapshot
src/stdlib/which/env.rs
Add EnvSnapshot capturing cwd, PATH and PATHEXT; parse PATH entries (UTF‑8 validated); compute candidate dirs and candidate paths.
Which options parsing
src/stdlib/which/options.rs
Add CwdMode and WhichOptions with from_kwargs, defaults and cache_key_view to exclude fresh.
Which error helpers
src/stdlib/which/error.rs
Add not_found_error, direct_not_found, args_error, path preview and platform hints.
Which lookup implementation
src/stdlib/which/lookup/mod.rs
Implement lookup, resolve_direct, PATHEXT expansion, executability checks, canonicalisation, error propagation and workspace fallback hooks.
Which workspace search
src/stdlib/which/lookup/workspace.rs
Add search_workspace using walkdir, platform‑specific matching (Windows case‑insensitive/PATHEXT), skip rules and match collection.
Lookup tests
src/stdlib/which/lookup/tests.rs
Add tests for workspace discovery, PATHEXT resolution, skip rules and permission edge cases.
Manifest & registration error handling
src/manifest/mod.rs, tests/std_filter_tests/support.rs, tests/steps/stdlib_steps/rendering.rs, tests/std_filter_tests/network_functions.rs
Propagate errors from StdlibConfig::new, with_workspace_root_path and register_with_config by using ?.
Test support helpers
test_support/src/lib.rs, test_support/src/stdlib_assert.rs
Add stdlib_assert module and stdlib_output_or_error() helper with unit tests.
Cucumber runner change
tests/cucumber.rs
Use CliWorld::cucumber().max_concurrent_scenarios(1).run(...) for sequential execution.
Cucumber features & docs
tests/features/stdlib.feature, docs/*, docs/netsuke-design.md, docs/users-guide.md, docs/rstest-bdd-users-guide.md, docs/roadmap.md, docs/adr-*
Add which scenarios; expand design doc with fingerprinted LRU cache and diagrams; update user docs and roadmap; minor formatting/renames.
Cucumber steps — types, workspace and assertions
tests/steps/stdlib_steps/types.rs, tests/steps/stdlib_steps/workspace.rs, tests/steps/stdlib_steps/assertions.rs
Add PathEntries type; add steps stdlib_executable_exists, stdlib_path_entries; implement OS‑aware executable creation, PATHEXT handling and workspace assertion helper assert_stdlib_output_is_workspace_executable.
Filter tests & fixtures
tests/std_filter_tests/mod.rs, tests/std_filter_tests/which_filter_tests.rs
Add which_filter_tests module and extensive cross‑platform tests/fixtures for first vs all matches, canonicalisation, cwd_mode and workspace fallback.

Sequence Diagram(s)

sequenceDiagram
    participant Template as Template
    participant Jinja as MiniJinja which
    participant Resolver as WhichResolver
    participant Cache as LRU Cache
    participant Env as EnvSnapshot
    participant Lookup as lookup()

    Template->>Jinja: which(command, kwargs)
    Jinja->>Resolver: resolve(command, options)
    Resolver->>Resolver: Build CacheKey(command, env_fp, cwd, options.cache_key_view)
    Resolver->>Cache: try_cache(key)
    alt Cache hit
        Cache-->>Resolver: matches
    else Cache miss
        Resolver->>Env: capture(cwd_override)
        Env-->>Resolver: EnvSnapshot
        Resolver->>Lookup: lookup(command, env, options)
        Lookup-->>Resolver: matches / Error
        Resolver->>Cache: store(key, matches) if success
    end
    Resolver-->>Jinja: matches / Error
    Jinja->>Jinja: format_path_for_output + render (single/list)
    Jinja-->>Template: rendered output
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • Focus review on cache concurrency and eviction correctness in src/stdlib/which/cache.rs.
  • Validate PATH/PATHEXT parsing, UTF‑8 checks and cwd override logic in src/stdlib/which/env.rs.
  • Audit lookup canonicalisation, executability checks, and workspace traversal/skip rules in src/stdlib/which/lookup/*.rs.
  • Verify error‑propagation changes in register/config/manifest and corresponding test adjustments.
  • Check cross‑platform test fixtures, executable creation and permission handling in tests/std_filter_tests/which_filter_tests.rs and tests/steps/*.

Possibly related PRs

Poem

Add the which that wanders down PATH's lane,
Fingerprint the world, keep answers in the brain,
Walkdir will wander when PATH yields no trail,
LRU will remember the oft‑travelled trail,
Ship tests and docs — let discovery prevail. 🎉

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarises the main change: implementing a cross-platform which filter and helper for MiniJinja, which is the primary objective of the PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the which filter implementation, options, caching strategy, test coverage, and documentation updates.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch terragon/implement-which-filter-u5itpo

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76d428e and e02cbac.

📒 Files selected for processing (1)
  • src/stdlib/which/lookup/workspace.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/stdlib/which/lookup/workspace.rs
🧬 Code graph analysis (1)
src/stdlib/which/lookup/workspace.rs (2)
src/stdlib/which/lookup/mod.rs (1)
  • is_executable (195-198)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
🔍 Remote MCP Ref

Relevant facts for reviewing the which cache and workspace traversal

  • lru 0.16 LruCache behavior

    • LruCache is not thread-safe; typical pattern is Arc<Mutex<LruCache<..>>> (so the PR’s Arc<Mutex<...>> is expected). get() promotes an entry's recency (mutates LRU order) and put() inserts/replaces and evicts least-recently-used entries when capacity is exceeded. Review implications: reads mutate the cache (lock on get), so consider lock contention on hot paths and whether promotion-on-read semantics match intended cache-key lifecycle (e.g., fresh semantics, cache_key_view).,
  • walkdir 2.5 WalkDir behavior

    • WalkDir yields Iterator<Item = Result<DirEntry, Error>>; unreadable entries produce Err and must be handled (caller should filter_map or explicitly skip). WalkDir supports filter_entry to skip descending into directories (useful to avoid heavy dirs like .git/target). Review implications: workspace search must handle Err entries without panicking, should prune traversal with filter_entry (or equivalent) to limit work, and should allow early exit when collect_all is false to avoid full traversal.,

Concrete review checks to run (quick checklist)

  • Confirm cache concurrency semantics:
    • Cache is Arc<Mutex<LruCache<...>>> and capacity 64 — ensure callers hold short-lived locks and avoid performing filesystem lookups while holding the cache lock.
    • Verify resolve() does not perform IO under the cache lock (i.e., check try_cache/store call sites and that lookup occurs outside the lock). (Why: get() mutates LRU and will contend under frequent calls.)
  • Verify fresh/cache_key handling:
    • Ensure WhichOptions::cache_key_view disables fresh for cache keys (so fresh lookups bypass but do not invalidate existing entries), and that env fingerprint includes only stable inputs.
  • Verify workspace traversal robustness:
    • Ensure WalkDir Err items are handled (not unwrapped) and filter_entry or explicit skipping is used for heavy directories (target, .git, etc.).
    • Ensure traversal can stop early when collect_all is false to avoid scanning full workspace.
  • Verify UTF-8 handling/canonicalisation error paths:
    • WalkDir yields OsStr paths; code must convert to UTF-8 safely and surface a clear error rather than panic.

Tools/sources used

  • Ref_ref_search_documentation (docs.rs searches for lru and walkdir)
  • Ref_ref_read_url (docs.rs pages for lru 0.16 and walkdir 2.5)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: release / build-linux (aarch64-unknown-linux-gnu, linux-arm64, arm64) / Build linux artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: Sourcery review
  • GitHub Check: build-test (ubuntu-latest, stable)
🔇 Additional comments (6)
src/stdlib/which/lookup/workspace.rs (6)

1-14: Module structure and imports are exemplary.

The module doc comment is concise and describes the purpose clearly. Platform-specific imports are properly gated, and the organization is clean and minimal.


16-60: Excellent implementation of workspace traversal with robust error handling.

The function follows the recommended iterator pipeline pattern, avoiding nested conditionals. The filter_map properly handles unreadable entries per walkdir best practices (logs and skips without failing the search), and filter_entry prunes heavy directories to keep scans efficient. The cfg'd parameter types align with the module's established pattern.


62-97: Collection logic with early exit optimization is correct.

The early termination when !collect_all avoids unnecessary traversal of the full workspace, as recommended in the external tools guidance. Error propagation and platform-specific calling conventions are properly handled.


99-109: Directory filtering is efficient and well-documented.

The skip list (.git, target) reduces traversal cost effectively whilst the logic remains simple and maintainable.


111-142: UTF-8 conversion error handling is exemplary.

The function properly handles the conversion failure path (lines 132-140) with a descriptive error that includes the command and lossy path representation, exactly as recommended in the external tools context. The sequential checks (file type → match → UTF-8 → executable) are efficient and clear.


144-213: Windows matching infrastructure is correct and well-documented.

The case-insensitive matching logic with PATHEXT expansion is properly implemented. The prepare_workspace_match function (lines 194-213) correctly derives candidate basenames, and workspace_entry_matches (lines 166-175) performs the appropriate lookups. The use of HashSet for basenames provides O(1) lookup performance during traversal.

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • UTF-8: Entity not found: Issue - Could not find referenced Issue.

Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Nov 18, 2025

Reviewer's Guide

Adds a cross-platform MiniJinja stdlib which filter/function with a cached resolver, workspace-root-aware PATH fallback, and improved diagnostics, wiring it into the stdlib registration/configuration and covering it with tests and docs updates.

Class diagram for the new which resolver and stdlib configuration

classDiagram
    class StdlibConfig {
        +workspace_root: Arc_Dir
        +workspace_root_path: Option_Utf8PathBuf
        +fetch_cache_relative: Utf8PathBuf
        +network_policy: NetworkPolicy
        +fetch_max_response_bytes: u64
        +command_max_output_bytes: u64
        +command_max_stream_bytes: u64
        +new(workspace_root: Dir) Result_StdlibConfig
        +with_workspace_root_path(path: Utf8Path) Result_StdlibConfig
        +with_fetch_cache_relative(relative_path: Utf8Path) Result_StdlibConfig
        +with_network_policy(policy: NetworkPolicy) StdlibConfig
        +with_fetch_max_response_bytes(max_bytes: u64) Result_StdlibConfig
        +with_command_max_output_bytes(max_bytes: u64) Result_StdlibConfig
        +with_command_max_stream_bytes(max_bytes: u64) Result_StdlibConfig
        +fetch_cache_relative() Utf8Path
        +into_components() (NetworkConfig, CommandConfig)
        +workspace_root_path() Option_Utf8Path
        +validate_cache_relative(relative: Utf8Path) Result_unit
    }

    class NetworkConfig {
        +cache_root: Arc_Dir
        +cache_relative: Utf8PathBuf
        +policy: NetworkPolicy
        +max_response_bytes: u64
    }

    class CommandConfig {
        +max_capture_bytes: u64
        +max_stream_bytes: u64
        +new(max_capture: u64, max_stream: u64, root: Arc_Dir, workspace_root_path: Option_Arc_Utf8PathBuf)
    }

    class StdlibState {
        +impure: Arc_AtomicBool
        +impure_flag() Arc_AtomicBool
    }

    class RegisterModule {
        +register(env: Environment) Result_StdlibState
        +register_with_config(env: Environment, config: StdlibConfig) Result_StdlibState
        +value_from_bytes(bytes: Vec_u8) Value
    }

    class WhichModule {
        +register(env: Environment, cwd_override: Option_Arc_Utf8PathBuf) Result_unit
        +resolve_with(resolver: WhichResolver, command: Value, kwargs: Kwargs) Result_Value
        +render_value(matches: Vec_Utf8PathBuf, options: WhichOptions) Value
        +format_path_for_output(path: Utf8Path) String
    }

    class WhichResolver {
        -cache: Arc_Mutex_LruCache_CacheKey_CacheEntry
        -cwd_override: Option_Arc_Utf8PathBuf
        +new(cwd_override: Option_Arc_Utf8PathBuf) Result_WhichResolver
        +resolve(command: &str, options: WhichOptions) Result_Vec_Utf8PathBuf
        -try_cache(key: CacheKey) Option_Vec_Utf8PathBuf
        -store(key: CacheKey, matches: Vec_Utf8PathBuf)
        -lock_cache() MutexGuard_LruCache_CacheKey_CacheEntry
    }

    class EnvSnapshot {
        +cwd: Utf8PathBuf
        +raw_path: Option_OsString
        +raw_pathext: Option_OsString
        +entries: Vec_PathEntry
        +pathext: Vec_String
        +capture(cwd_override: Option_Utf8Path) Result_EnvSnapshot
        +resolved_dirs(mode: CwdMode) Vec_Utf8PathBuf
        +pathext() Vec_String
    }

    class WhichOptions {
        +all: bool
        +canonical: bool
        +fresh: bool
        +cwd_mode: CwdMode
        +from_kwargs(kwargs: Kwargs) Result_WhichOptions
        +cache_key_view() WhichOptions
    }

    class CwdMode {
        <<enumeration>>
        Auto
        Always
        Never
        +parse(value: &str) Option_CwdMode
    }

    class LookupModule {
        +lookup(command: &str, env: EnvSnapshot, options: WhichOptions) Result_Vec_Utf8PathBuf
        +resolve_direct(command: &str, env: EnvSnapshot, options: WhichOptions) Result_Vec_Utf8PathBuf
        +push_matches(matches: Vec_Utf8PathBuf, candidates: Vec_Utf8PathBuf, collect_all: bool) bool
        +is_direct_path(command: &str) bool
        +is_executable(path: Utf8Path) bool
        +canonicalise(paths: Vec_Utf8PathBuf) Result_Vec_Utf8PathBuf
    }

    class WorkspaceSearchModule {
        +search_workspace(cwd: Utf8Path, command: &str, collect_all: bool, env_or_unit) Result_Vec_Utf8PathBuf
    }

    class CacheKey {
        +command: String
        +env_fingerprint: u64
        +cwd: Utf8PathBuf
        +options: WhichOptions
        +new(command: &str, env: EnvSnapshot, options: WhichOptions) CacheKey
    }

    class CacheEntry {
        +matches: Vec_Utf8PathBuf
    }

    class ErrorModule {
        +not_found_error(command: &str, dirs: Vec_Utf8PathBuf, mode: CwdMode) Error
        +direct_not_found(command: &str, path: Utf8Path) Error
        +args_error(message: Display) Error
    }

    StdlibConfig --> NetworkConfig : builds
    StdlibConfig --> CommandConfig : builds
    RegisterModule --> StdlibConfig : uses
    RegisterModule --> StdlibState : returns
    RegisterModule --> WhichModule : calls_register
    RegisterModule --> NetworkConfig : passes_to_network
    RegisterModule --> CommandConfig : passes_to_command

    WhichModule --> WhichResolver : owns
    WhichModule --> WhichOptions : parses
    WhichModule --> ErrorModule : args_error

    WhichResolver --> EnvSnapshot : capture
    WhichResolver --> LookupModule : lookup
    WhichResolver o--> CacheKey : key_for_cache
    WhichResolver o--> CacheEntry : stores

    EnvSnapshot --> CwdMode : used_in_resolved_dirs

    WhichOptions --> CwdMode : contains

    LookupModule --> WorkspaceSearchModule : workspace_fallback
    LookupModule --> ErrorModule : not_found

    CacheKey --> WhichOptions : cache_key_view

    ErrorModule --> CwdMode : hint_for_mode
Loading

File-Level Changes

Change Details Files
Introduce cross-platform MiniJinja which filter/function with cached resolution and workspace-aware PATH fallback.
  • Implement src/stdlib/which module with env snapshot, PATH/PATHEXT parsing, cwd-aware directory resolution, and workspace fallback search using walkdir.
  • Add LRU-backed WhichResolver cache keyed by command, environment fingerprint, cwd, and options (excluding fresh), returning normalized path strings and supporting all, canonical, and cwd_mode semantics.
  • Provide structured which errors (netsuke::jinja::which::not_found / ::args) with PATH previews and platform-specific hints, plus canonicalisation and deduplication of results when requested.
src/stdlib/which/mod.rs
src/stdlib/which/cache.rs
src/stdlib/which/env.rs
src/stdlib/which/error.rs
src/stdlib/which/options.rs
src/stdlib/which/lookup/mod.rs
src/stdlib/which/lookup/workspace.rs
src/stdlib/which/lookup/tests.rs
Refactor stdlib configuration/registration to support workspace-root overrides and which registration, and adjust manifest integration.
  • Extract stdlib configuration into src/stdlib/config.rs with fallible StdlibConfig::new, absolute-path validation for workspace_root_path, component splitting into NetworkConfig and CommandConfig, and associated tests.
  • Move registration logic into src/stdlib/register.rs, registering file tests, path/collection filters, network/command/time helpers, and the new which filter/function with an optional workspace-root-based cwd override, returning anyhow::Result<StdlibState>.
  • Update src/stdlib/mod.rs, manifest loading, and stdlib test helpers to use the new config/registration modules and error-returning APIs.
src/stdlib/mod.rs
src/stdlib/config.rs
src/stdlib/register.rs
src/manifest/mod.rs
tests/std_filter_tests/support.rs
tests/std_filter_tests/network_functions.rs
tests/steps/stdlib_steps/rendering.rs
Expand stdlib and filter tests to cover which behaviour across PATH, cwd_mode, workspace fallbacks, and error reporting.
  • Add dedicated which_filter_tests exercising first/ALL match behaviour, canonical deduplication, workspace-root fallback when PATH is empty, cwd_mode handling, and non-impurity, plus PATHEXT behaviour on Windows.
  • Extend Cucumber stdlib steps to create executable fixtures, manage PATH/PATHEXT via colon-delimited step arguments, assert executable paths in outputs, and ensure platform-specific path handling.
  • Wire those scenarios into tests/features/stdlib.feature and ensure cucumber tests run sequentially via CliWorld::cucumber().max_concurrent_scenarios(1) for determinism.
tests/std_filter_tests/mod.rs
tests/std_filter_tests/which_filter_tests.rs
tests/steps/stdlib_steps/workspace.rs
tests/steps/stdlib_steps/types.rs
tests/steps/stdlib_steps/assertions.rs
tests/features/stdlib.feature
tests/cucumber.rs
test_support/src/stdlib_assert.rs
Update documentation and roadmap to describe which, its design, and testing/harness changes.
  • Expand design doc with detailed which resolver semantics, cache keying and fingerprinting, workspace fallback sequence diagrams, and cucumber execution flow diagrams.
  • Document the new which filter/function, options, diagnostics, and impurity behaviour in the user guide, and mark the executable discovery roadmap items as completed.
  • Tidy up ADRs and rstest-bdd docs formatting/word-wrapping and clarify risk/positive-outcome sections related to test harness migration.
docs/netsuke-design.md
docs/users-guide.md
docs/roadmap.md
docs/adr-001-replace-serde-yml-with-serde-saphyr.md
docs/adr-002-replace-cucumber-with-rstest-bdd.md
docs/rstest-bdd-users-guide.md
docs/rust-doctest-dry-guide.md
Adjust test support and dependencies to back the new behaviour.
  • Add lru and walkdir runtime dependencies to support which caching and workspace scanning, updating Cargo.lock accordingly.
  • Expose new test_support::stdlib_assert helpers to share stdlib output vs error assertions between unit and cucumber tests.
  • Slightly refactor stdlib test harness setup/teardown (e.g., env locks, workspace helpers) to work with the new config/registration API.
Cargo.toml
Cargo.lock
test_support/src/lib.rs

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

codescene-delta-analysis[bot]

This comment was marked as outdated.

…ve executables

- Implemented a deterministic executable discovery filter and function 'which' in MiniJinja
- Supports options: all matches, canonical paths, fresh cache bypass, and cwd_mode
- Added an LRU cache with self-healing for stale entries
- Emits actionable diagnostics with PATH previews for troubleshooting
- Fully covers Windows and POSIX path behavior with extensive unit and integration tests
- Updated documentation, roadmap, and user guides with 'which' usage and semantics
- Integrated into stdlib module and registered in MiniJinja environment

This feature allows templates to resolve executables from PATH safely and efficiently, improving script flexibility and debugging.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos force-pushed the terragon/implement-which-filter-u5itpo branch from 291212e to ef74422 Compare November 18, 2025 19:03
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Nov 18, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

src/stdlib/which.rs

Comment on file

//! Cross-platform `which` filter and helper function for `MiniJinja`.

❌ New issue: Low Cohesion
This module has at least 11 different responsibilities amongst its 35 functions, threshold = 3

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Nov 18, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

tests/std_filter_tests/which_filter_tests.rs

Comment on lines +95 to +111

fn which_filter_returns_first_match() -> Result<()> {
    let (_temp, root) = support::filter_workspace()?;
    let first = root.join("bin_first");
    let second = root.join("bin_second");
    std::fs::create_dir_all(first.as_std_path())?;
    std::fs::create_dir_all(second.as_std_path())?;
    write_tool(&first, "helper")?;
    write_tool(&second, "helper")?;
    let _path = PathEnv::new(&[first.clone(), second.clone()])?;
    let (mut env, mut state) = fallible::stdlib_env_with_state()?;
    state.reset_impure();
    let output = render(&mut env, "{{ 'helper' | which }}")?;
    assert_eq!(output, first.join(Utf8Path::new(&tool_name("helper"))).as_str());
    assert!(!state.is_impure());
    drop(temp);
    Ok(())
}

❌ New issue: Code Duplication
The module contains 2 functions with similar structure: which_filter_all_returns_all_matches,which_filter_returns_first_match

codescene-delta-analysis[bot]

This comment was marked as outdated.

@coderabbitai

This comment was marked as resolved.

@coderabbitai

This comment was marked as resolved.

Removed the monolithic src/stdlib/which.rs file and reorganized the codebase into several focused modules:
- cache.rs: Cache management for resolved executable paths
- env.rs: Environment snapshot and PATH parsing
- error.rs: Error generation utilities
- lookup.rs: Lookup resolver logic
- mod.rs: Module facade and filter/function registration
- options.rs: Options parsing and management

This restructuring improves code maintainability and clarity without changing external behavior. Added corresponding tests to verify functionality remains consistent.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Nov 18, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

tests/std_filter_tests/which_filter_tests.rs

Comment on file

use anyhow::{Context, Result, anyhow};

❌ New issue: String Heavy Function Arguments
In this module, 53.8% of all arguments to its 14 functions are strings. The threshold for string arguments is 39.0%

Use NewTypes to model domain values and eliminate "string soup". Reach for
newt-hype when introducing many homogeneous wrappers that share behaviour;
add small shims such as From<&str> and AsRef<str> for string-backed
wrappers. For path-centric wrappers implement AsRef<Path> alongside
into_inner() and to_path_buf(), avoid attempting
impl From<Wrapper> for PathBuf because of the orphan rule. Prefer explicit
tuple structs whenever bespoke validation or tailored trait surfaces are
required, customizing Deref, AsRef, and TryFrom per type. Use
the-newtype when defining traits and needing blanket implementations that
apply across wrappers satisfying Newtype + AsRef/AsMut<Inner>, or when
establishing a coherent internal convention that keeps trait forwarding
consistent without per-type boilerplate. Combine approaches: lean on
newt-hype for the common case, tuple structs for outliers, and
the-newtype to unify behaviour when owning the trait definitions.

@coderabbitai

This comment was marked as resolved.

…tests

- Migrate behavioral tests from cucumber to rstest-bdd
- Introduce new fixtures and modules for scenario handling
- Refactor CliWorld into smaller state holders with mutex guards
- Improve guarantees on resource cleanup via RAII-style fixtures
- Update documentation and user guide to reflect new BDD framework
- Modify which filter tests to use new fixture helpers

This change consolidates behavior tests to run via the standard cargo test harness, simplifying CI and enabling reuse of fixtures and lint configs. Maintains cucumber in parallel during migration to ensure feature parity.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

…plate newtypes

- Introduce ToolName, DirName, and Template structs in which filter tests for stronger typing and clearer intent
- Update all related test functions to use these newtypes instead of raw strings
- Adjust helper functions accordingly to accept newtypes
- Minor code style cleanup in which/env.rs and which/lookup.rs imports

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos changed the title Add cross-platform which filter with tests Implement cross-platform which filter for MiniJinja with tests Nov 20, 2025
Replaced the manual Default trait implementation for the CwdMode enum with the #[derive(Default)] attribute, specifying Auto as the default variant. This simplifies the code and leverages Rust's built-in derive functionality.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos changed the title Implement cross-platform which filter for MiniJinja with tests Implement cross-platform which filter and helper for MiniJinja Nov 22, 2025
Changed multiple 'And' steps to 'Given' in the stdlib.feature test scenarios to improve clarity and adherence to Gherkin syntax for preconditions.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
…238)

* feat(stdlib/which): add workspace fallback to 'which' command lookup

Enhance the 'which' command resolution logic to perform a recursive search of the
workspace directory if the PATH environment variable is empty. This ensures that
commands can be located within the current workspace when the standard PATH lookup fails.

- Added optional CWD override support in WhichResolver and EnvSnapshot to influence
  environment capturing.
- Integrated 'walkdir' crate for recursive directory traversal.
- Implemented search_workspace function to find executables recursively in workspace.
- Modified lookup behavior to try workspace search upon PATH miss when appropriate.
- Updated stdlib registration to pass current workspace root as CWD override.
- Improved error handling and tests accordingly.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

* fix(tests): remove run_and_exit calls in cucumber tests

Changed .run_and_exit() to .run() in cucumber test execution to prevent premature process exit, enabling all tests to run as intended.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

* refactor(stdlib_assert): extract assertion helpers for stdlib output

- Added a new module `stdlib_assert` in `test_support/src` that provides helper
functions for asserting stdlib rendering outputs.
- Replaced ad-hoc stdlib output/error checking in `stdlib_steps` with calls to
this new helper, improving code reuse and clarity.
- Included unit tests for the new assertion helpers to ensure correct behavior.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

* feat(which): add workspace fallback and caching to which resolver

- Implement fallback search in workspace directory when PATH is empty
- Skip heavy directories like .git and target during workspace search
- Add small LRU cache for which resolver keyed by command, cwd, PATH, etc.
- Improve error handling and debug logs for unreadable workspace entries
- Enhance which filter tests to cover workspace fallback and skipping heavy dirs
- Add detailed documentation diagrams for which resolver flow and configuration wiring
- Introduce comprehensive tests for workspace search functionality

This enhancement improves command resolution robustness in empty PATH environments by searching the workspace, optimizing with caching, and clarifying behavior through documentation.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

---------

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos marked this pull request as ready for review November 24, 2025 00:53
sourcery-ai[bot]

This comment was marked as resolved.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/stdlib/which/lookup.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (8)
src/stdlib/register.rs (2)

1-8: Module docs now enumerate the wired helpers.

The documentation mentions file tests, path helpers, collections, time functions, network fetch, command wrappers, and the which filter/function, addressing the previous concern about listing all registered helpers.


110-136: Device-oriented tests correctly excluded on non-Unix platforms.

The use of cfg-gated FILE_TESTS arrays ensures that device-related predicates are only registered on Unix, resolving the concern about misleading API semantics on other platforms. This approach is cleaner than gating the registration loop.

src/stdlib/mod.rs (1)

1-8: Module docs comprehensively list the stdlib surface.

The documentation now explicitly mentions the which helper, time helpers, and all re-exported configuration and registration entrypoints, addressing the previous concern about keeping docs in sync with the expanded surface.

src/stdlib/config.rs (5)

32-55: Constructor docs clarify invariant checking.

The documentation now explicitly states that validation failure "indicates a programming error in the baked-in constant rather than a runtime condition," addressing the previous concern about the panic contract.


64-69: Builder correctly returns error rather than panicking.

The use of ensure! to validate that the path is absolute addresses the previous concern about preferring explicit errors over assertions.


71-125: Builder methods use consistent validation style.

The unqualified ensure! and bail! macros are used consistently, and with_fetch_cache_relative accepts impl AsRef<Utf8Path>, addressing the previous concerns about API coherence.


195-216: Default implementation documents panic conditions.

The documentation explicitly enumerates the three panic scenarios and directs users to StdlibConfig::new for fallible construction, addressing the previous concern about documenting the panic contract.


231-332: Tests use rstest fixtures to reduce duplication.

The test module employs #[fixture] for shared setup (workspace, base_config) and #[rstest] with #[case] for parameterised validation tests, addressing the previous concern about test repetition.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aed5a18 and 02897e8.

📒 Files selected for processing (8)
  • src/manifest/mod.rs (2 hunks)
  • src/stdlib/config.rs (1 hunks)
  • src/stdlib/mod.rs (1 hunks)
  • src/stdlib/register.rs (1 hunks)
  • src/stdlib/which/lookup/mod.rs (1 hunks)
  • tests/std_filter_tests/network_functions.rs (1 hunks)
  • tests/std_filter_tests/which_filter_tests.rs (1 hunks)
  • tests/steps/stdlib_steps/rendering.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • tests/steps/stdlib_steps/rendering.rs
  • src/stdlib/register.rs
  • tests/std_filter_tests/network_functions.rs
  • tests/std_filter_tests/which_filter_tests.rs
  • src/stdlib/config.rs
  • src/stdlib/which/lookup/mod.rs
  • src/manifest/mod.rs
  • src/stdlib/mod.rs
🧬 Code graph analysis (8)
tests/steps/stdlib_steps/rendering.rs (2)
src/stdlib/config.rs (2)
  • new (41-55)
  • workspace (240-248)
src/stdlib/register.rs (1)
  • register_with_config (81-99)
src/stdlib/register.rs (2)
src/stdlib/config.rs (2)
  • new (41-55)
  • default (204-215)
src/stdlib/path/fs_utils.rs (1)
  • file_type_matches (36-49)
tests/std_filter_tests/network_functions.rs (1)
src/stdlib/config.rs (2)
  • new (41-55)
  • workspace (240-248)
tests/std_filter_tests/which_filter_tests.rs (1)
test_support/src/env_lock.rs (1)
  • acquire (17-22)
src/stdlib/config.rs (2)
src/stdlib/network/mod.rs (1)
  • policy (360-360)
src/cli_policy.rs (1)
  • network_policy (47-65)
src/stdlib/which/lookup/mod.rs (2)
src/stdlib/which/error.rs (2)
  • direct_not_found (23-30)
  • not_found_error (10-21)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
src/manifest/mod.rs (2)
src/stdlib/register.rs (1)
  • register_with_config (81-99)
src/stdlib/config.rs (1)
  • new (41-55)
src/stdlib/mod.rs (2)
src/stdlib/register.rs (3)
  • register (43-55)
  • register_with_config (81-99)
  • value_from_bytes (103-108)
src/stdlib/which/mod.rs (1)
  • register (27-51)
🔍 Remote MCP Ref

Summary of additional facts relevant to reviewing this PR

  • lru 0.16

    • LruCache is an in-memory LRU map (not thread-safe). Common APIs: LruCache::with_capacity, get, put, len, clear. Wrapping in Arc<Mutex<...>> is the expected approach for concurrent access (matches the PR's Arc<Mutex<LruCache<...>>> usage).
  • walkdir 2.5

    • WalkDir provides recursive directory traversal with combinators such as filter_entry, max_depth, follow_links, and an iterator that yields Result<DirEntry, Error>. Typical robust usage handles unreadable entries by filtering or ignoring iterator errors (e.g., into_iter().filter_map(|r| r.ok()) or inspecting DirEntry Results). filter_entry is the correct tool to skip heavy directories (e.g., "target") without descending. These behaviors are relevant to the PR's workspace fallback/search implementation.

Tools used

  • Ref_ref_search_documentation (search for lru and walkdir docs)
  • Ref_ref_read_url (lru docs page)
  • Ref_ref_read_url (walkdir docs page)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: release / build-macos (aarch64-apple-darwin, macos-14, macos-arm64, macos-aarch64) / Build macos artefacts
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-linux (aarch64-unknown-linux-gnu, linux-arm64, arm64) / Build linux artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: Sourcery review
  • GitHub Check: build-test (ubuntu-latest, stable)
🔇 Additional comments (18)
tests/std_filter_tests/which_filter_tests.rs (6)

13-63: Excellent use of NewType wrappers.

The ToolName, DirName, and Template newtypes provide strong typing and eliminate ambiguous string arguments in test signatures, exactly as outlined in the PR objectives. The implementations are clean and idiomatic.


65-91: LGTM – Proper environment isolation.

PathEnv correctly wraps environment mutations with EnvLock and VarGuard, adhering to the mandatory pattern for test-time env access. The RAII approach ensures cleanup even on panic.


93-147: LGTM – Helper functions are well-structured.

The helper functions are appropriately scoped, use proper error propagation with context, and handle platform-specific behaviour cleanly with conditional compilation.


149-185: LGTM – Fixture eliminates setup duplication.

WhichTestFixture successfully encapsulates the common test setup pattern (workspace creation, tool installation, PATH configuration), addressing PR objective #2. The structure is clear and maintainable.


306-325: Verify whether env::set_current_dir requires unsafe in Rust 2024.

The coding guidelines mandate that environment mutations (env::set_var, env::remove_var) are always unsafe in Rust 2024 edition and must be marked as such. Whilst set_current_dir mutates global process state, confirm whether it also requires unsafe marking or whether it's exempt from this requirement.

If set_current_dir is unsafe in Rust 2024, wrap lines 312 and 321 in unsafe blocks with a comment justifying the call, or refactor the test to use dependency injection with the mockable crate (preferred per guidelines).

Run the following script to check the Rust 2024 edition documentation for env::set_current_dir:

#!/bin/bash
# Description: Check if env::set_current_dir is unsafe in Rust 2024 edition

# Check the Rust standard library documentation for the function signature
rg -A 10 'pub fn set_current_dir' --type rust

# Also check for any unsafe markers in env module
rg -C 5 'unsafe.*set_current_dir' --type rust

254-303: LGTM – Edge cases well covered.

The tests for cwd_mode, missing commands, workspace fallback, and heavy-directory skipping provide thorough coverage of the which filter's behaviour and error handling.

tests/std_filter_tests/network_functions.rs (1)

25-35: LGTM: Error propagation correctly implemented.

The fallible config construction pattern is properly applied with ? operators for both StdlibConfig::new and with_workspace_root_path.

src/manifest/mod.rs (2)

109-112: LGTM: Correct direct Result propagation.

The removal of Ok(...) wrapping is correct—register_with_config returns Result<StdlibState>, which matches the binding's expected type.


236-238: LGTM: Builder chain correctly propagates errors.

The fallible builder methods are properly chained with ?, while with_network_policy (which returns Self) continues the chain without error propagation.

tests/steps/stdlib_steps/rendering.rs (1)

12-55: LGTM: Correct error handling throughout the config builder chain.

Both the fallible config construction (lines 21, 40) and the subsequent builder methods correctly propagate errors where needed.

src/stdlib/register.rs (3)

43-55: LGTM: Robust error handling with clear diagnostics.

The function correctly propagates errors from ambient directory access, cwd resolution, and UTF-8 validation, providing clear error messages at each step.


81-99: LGTM: Systematic stdlib component registration.

The registration orchestrates all stdlib modules correctly, with proper error propagation from the fallible which::register call.


138-166: LGTM: File type predicates correctly platform-gated.

The Unix-specific predicates (FIFO, block device, char device) are properly gated with #[cfg(unix)], ensuring they're only compiled on appropriate platforms.

src/stdlib/mod.rs (1)

10-27: LGTM: Clean module organization with appropriate re-exports.

The new module structure (config, register, which) is well-factored, and the re-exports maintain a discoverable public API surface.

src/stdlib/config.rs (3)

1-30: LGTM: Clear module structure with well-documented constants.

The public constants and StdlibConfig fields provide a clean configuration surface for stdlib helpers.


127-193: LGTM: Clean accessor and validation logic.

The validate_cache_relative method correctly rejects empty, absolute, and escaping paths, and into_components properly wires configuration into the network and command modules.


218-229: LGTM: NetworkConfig appropriately structures network module configuration.

The fields clearly capture the necessary parameters for network cache initialisation.

src/stdlib/which/lookup/mod.rs (1)

1-402: Excellent work addressing all past review comments.

The file length is now compliant (tests moved to a separate file), all pub(super) functions have comprehensive documentation, and the search_workspace function has been refactored with focused helper functions (should_visit_entry, process_workspace_entry, workspace_entry_matches), bringing cognitive complexity down to approximately 5—well under the guideline maximum of 9.

Comment thread src/stdlib/which/lookup/mod.rs Outdated
Comment thread tests/std_filter_tests/which_filter_tests.rs
Comment thread tests/std_filter_tests/which_filter_tests.rs
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Nov 24, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

src/stdlib/which/lookup/mod.rs

Comment on lines +225 to +269

fn search_workspace(
    cwd: &Utf8Path,
    command: &str,
    collect_all: bool,
    env: &EnvSnapshot,
) -> Result<Vec<Utf8PathBuf>, Error> {
    let mut matches = Vec::new();
    #[cfg(not(windows))]
    let _ = env;
    #[cfg(windows)]
    let match_ctx = prepare_workspace_match(command, env);
    #[cfg(not(windows))]
    let match_ctx = ();
    let walker = WalkDir::new(cwd)
        .follow_links(false)
        .sort_by_file_name()
        .into_iter()
        .filter_entry(should_visit_entry);

    for walk_entry in walker {
        let entry = match walk_entry {
            Ok(value) => value,
            Err(err) => {
                tracing::debug!(
                    %command,
                    error = %err,
                    "skipping unreadable workspace entry during which fallback"
                );
                continue;
            }
        };
        #[cfg(windows)]
        let maybe_match = process_workspace_entry(entry, command, &match_ctx)?;
        #[cfg(not(windows))]
        let maybe_match = process_workspace_entry(entry, command, match_ctx)?;

        if let Some(path) = maybe_match {
            matches.push(path);
            if !collect_all {
                break;
            }
        }
    }
    Ok(matches)
}

❌ New issue: Bumpy Road Ahead
search_workspace has 2 blocks with nested conditional logic. Any nesting of 2 or deeper is considered. Threshold is 2 blocks per function

@coderabbitai

This comment was marked as resolved.

… params

Changed the parameters of the workspace_entry_matches function to conditionally compile the 'command' argument for non-Windows targets and prefix it with underscore on Windows, similarly for 'ctx' argument. This improves platform-specific parameter usage and compilation cleanliness.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/stdlib/which/lookup/mod.rs (1)

229-233: Adopt cfg'd parameter types for consistency.

The function silences the unused env parameter with let _ = env; on line 233, whilst process_workspace_entry (lines 284-285) and workspace_entry_matches (lines 312-315) use cfg-gated parameter types. Apply the same pattern here for consistency.

Apply this diff to align with the established pattern:

 fn search_workspace(
     cwd: &Utf8Path,
     command: &str,
     collect_all: bool,
-    env: &EnvSnapshot,
+    #[cfg(windows)] env: &EnvSnapshot,
+    #[cfg(not(windows))] _env: (),
 ) -> Result<Vec<Utf8PathBuf>, Error> {
     let mut matches = Vec::new();
-    #[cfg(not(windows))]
-    let _ = env;
     #[cfg(windows)]
     let match_ctx = prepare_workspace_match(command, env);

Then update the call site in handle_miss (line 210):

-        let discovered = search_workspace(&env.cwd, command, options.all, env)?;
+        #[cfg(windows)]
+        let discovered = search_workspace(&env.cwd, command, options.all, env)?;
+        #[cfg(not(windows))]
+        let discovered = search_workspace(&env.cwd, command, options.all, ())?;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02897e8 and 799798f.

📒 Files selected for processing (1)
  • src/stdlib/which/lookup/mod.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/stdlib/which/lookup/mod.rs
🧬 Code graph analysis (1)
src/stdlib/which/lookup/mod.rs (2)
src/stdlib/which/error.rs (2)
  • direct_not_found (23-30)
  • not_found_error (10-21)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
🔍 Remote MCP Ref

Summary of additional facts relevant to reviewing this PR

  • LRU cache semantics and threading: lru 0.16 provides LruCache with constructors like LruCache::new or with_capacity and methods get/put/len/clear. The crate is not thread-safe; wrapping LruCache in Arc<Mutex<...>> (as done in WhichResolver) is the standard way to enable concurrent access. Source: Ref::Ref_ref_read_url (lru docs).

  • WalkDir traversal patterns and error handling: walkdir 2.5 provides WalkDir with filter_entry, max_depth, follow_links, and yields Iterator<Item = Result<DirEntry, Error>>. Robust traversal commonly handles unreadable entries by filtering iterator errors (e.g., filter_map(|r| r.ok())) or using filter_entry to skip directories like "target". This supports the PR's workspace search implementation and suggests validating handling of iterator errors and use of filter_entry to avoid descending heavy directories. Source: Ref::Ref_ref_read_url (walkdir docs).

Tools used:

  • Ref::Ref_ref_search_documentation (searched lru and walkdir docs)
  • Ref::Ref_ref_read_url (lru docs)
  • Ref::Ref_ref_read_url (walkdir docs)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: release / build-linux (aarch64-unknown-linux-gnu, linux-arm64, arm64) / Build linux artefacts
  • GitHub Check: release / build-macos (aarch64-apple-darwin, macos-14, macos-arm64, macos-aarch64) / Build macos artefacts
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: build-test (ubuntu-latest, stable)
  • GitHub Check: Sourcery review

Comment thread src/stdlib/which/lookup/mod.rs
Comment thread src/stdlib/which/lookup/mod.rs Outdated
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Nov 24, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

Bumpy Road Ahead

src/stdlib/which/lookup/mod.rs: search_workspace

What lead to degradation?

search_workspace has 2 blocks with nested conditional logic. Any nesting of 2 or deeper is considered. Threshold is 2 blocks per function

Why does this problem occur?

A Bumpy Road is a function that contains multiple chunks of nested conditional logic inside the same function. The deeper the nesting and the more bumps, the lower the code health.
A bumpy code road represents a lack of encapsulation which becomes an obstacle to comprehension. In imperative languages there’s also an increased risk for feature entanglement, which leads to complex state management. CodeScene considers the following rules for the code health impact: 1) The deeper the nested conditional logic of each bump, the higher the tax on our working memory. 2) The more bumps inside a function, the more expensive it is to refactor as each bump represents a missing abstraction. 3) The larger each bump – that is, the more lines of code it spans – the harder it is to build up a mental model of the function. The nesting depth for what is considered a bump is levels of conditionals.

How to fix it?

Bumpy Road implementations indicate a lack of encapsulation. Check out the detailed description of the Bumpy Road code health issue.
A Bumpy Road often suggests that the function/method does too many things. The first refactoring step is to identify the different possible responsibilities of the function. Consider extracting those responsibilities into smaller, cohesive, and well-named functions. The EXTRACT FUNCTION refactoring is the primary response.

@coderabbitai

This comment was marked as resolved.

Extract the workspace fallback search logic from mod.rs into a dedicated workspace.rs module. This change organizes code better by separating concerns, improves code readability and maintainability without altering functionality.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
… error handling

- Extracted workspace match collection into a new function `collect_workspace_matches` for clarity.
- Enhanced error handling during directory entry iteration with improved tracing.
- Removed platform-dependent conditional compilation from iterator construction, centralizing logic.
- Maintained functionality to collect all matches or stop after first match.
- Cleaned up code structure to simplify maintenance and readability.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 799798f and f1a343e.

📒 Files selected for processing (2)
  • src/stdlib/which/lookup/mod.rs (1 hunks)
  • src/stdlib/which/lookup/workspace.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/stdlib/which/lookup/workspace.rs
  • src/stdlib/which/lookup/mod.rs
🧬 Code graph analysis (2)
src/stdlib/which/lookup/workspace.rs (2)
src/stdlib/which/lookup/mod.rs (1)
  • is_executable (195-198)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
src/stdlib/which/lookup/mod.rs (3)
src/stdlib/which/error.rs (2)
  • direct_not_found (23-30)
  • not_found_error (10-21)
src/stdlib/which/lookup/workspace.rs (1)
  • search_workspace (14-61)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
🪛 GitHub Actions: CI
src/stdlib/which/lookup/workspace.rs

[error] 41-62: cargo fmt --all -- --check detected formatting diffs. Run 'cargo fmt' (or 'cargo fmt --all') to format the code.

🪛 GitHub Check: build-test (ubuntu-latest, 1.89.0)
src/stdlib/which/lookup/workspace.rs

[warning] 41-41:
Diff in /home/runner/work/netsuke/netsuke/src/stdlib/which/lookup/workspace.rs

🪛 GitHub Check: build-test (ubuntu-latest, stable)
src/stdlib/which/lookup/workspace.rs

[warning] 41-41:
Diff in /home/runner/work/netsuke/netsuke/src/stdlib/which/lookup/workspace.rs

🔍 Remote MCP Ref

Relevant concrete facts to aid review

  • lru crate (0.16)

    • LruCache is not Sync/Send by itself; typical concurrent usage wraps it in Arc<Mutex<LruCache<..>>> — matches the PR’s Arc<Mutex<...>> choice for WhichResolver.
    • Primary API: constructors like LruCache::new / LruCache::with_capacity, and operations get (promotes to recent), put/put_or_get (insert/replace), len, pop_lru, clear. Capacity is fixed; inserting past capacity evicts oldest entries.
    • Implication: locking granularity (Mutex per cache) and eviction semantics are important review points (contention, eviction policy, cloning of keys/values on access).
    • Source: lru docs.
  • walkdir crate (2.5)

    • WalkDir yields iterator of Result<DirEntry, Error>; unreadable entries produce Err items that the caller must handle (e.g., filter_map(|r| r.ok()) to skip).
    • WalkDir supports filter_entry to skip descending into directories (useful to exclude WORKSPACE_SKIP_DIRS like .git and target) and max_depth/follow_links options to control traversal cost.
    • Implication: workspace search should explicitly handle iterator Err values and prefer filter_entry to avoid expensive descent; current module uses WalkDir — verify it handles DirEntry errors and uses filter_entry to skip heavy dirs.
    • Source: walkdir docs.

Tools used

  • Documentation search for lru and walkdir
  • Read lru API page
  • Read walkdir API page
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: release / build-macos (aarch64-apple-darwin, macos-14, macos-arm64, macos-aarch64) / Build macos artefacts
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-linux (aarch64-unknown-linux-gnu, linux-arm64, arm64) / Build linux artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: Sourcery review
🔇 Additional comments (1)
src/stdlib/which/lookup/mod.rs (1)

1-261: Approve the coordinating module implementation.

This file correctly wires together the workspace fallback, PATH resolution, and platform-specific executable detection logic. Past review concerns (missing documentation, excessive cognitive complexity, file length) have been addressed by extracting workspace logic to a separate module and adding comprehensive doc comments. The cfg-gated platform handling is sound, error propagation is correct, and the public API is well-documented.

Comment thread src/stdlib/which/lookup/workspace.rs
Comment thread src/stdlib/which/lookup/workspace.rs Outdated
Comment thread src/stdlib/which/lookup/workspace.rs
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f1a343e and 108c0a7.

📒 Files selected for processing (1)
  • src/stdlib/which/lookup/workspace.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/stdlib/which/lookup/workspace.rs
🧬 Code graph analysis (1)
src/stdlib/which/lookup/workspace.rs (2)
src/stdlib/which/lookup/mod.rs (1)
  • is_executable (195-198)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
🔍 Remote MCP Ref

Relevant facts to aid review

  • lru 0.16 behavior and concurrency implications:

    • LruCache is not thread-safe by itself; common pattern is Arc<Mutex<LruCache<..>>> (matches PR's choice). API includes LruCache::with_capacity, get (which promotes an entry), put (insert/replace), pop_lru (evict oldest). Eviction occurs when inserting past capacity (oldest entry removed). Review points: lock contention on Mutex, whether get/put cloning semantics are acceptable, and eviction interactions with cache_key composition.
  • walkdir 2.5 traversal semantics relevant to workspace search:

    • WalkDir yields an iterator of Result<DirEntry, Error>; unreadable entries appear as Err and must be handled (e.g., filter_map(|r| r.ok())). WalkDir supports filter_entry to avoid descending into skipped directories (useful to exclude heavy dirs like .git or target). Review points: ensure the PR handles Err items from the iterator, uses filter_entry to avoid expensive traversal, and correctly skips/filters entries to bound cost.

Sources consulted

  • lru crate docs via Ref_ref_read_url.
  • walkdir crate docs via Ref_ref_read_url.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review

Comment thread src/stdlib/which/lookup/workspace.rs
Comment thread src/stdlib/which/lookup/workspace.rs
Comment thread src/stdlib/which/lookup/workspace.rs
Comment thread src/stdlib/which/lookup/workspace.rs
Comment thread src/stdlib/which/lookup/workspace.rs
Comment thread src/stdlib/which/lookup/workspace.rs
…ific matching

- Implement workspace traversal to find executables respecting VCS and build directories.
- Introduce platform-specific matching contexts to handle case sensitivity on Windows vs Unix.
- Add skip logic for heavy directories like .git and target to keep scans fast.
- Enhance tests to cover workspace filtering, PATH resolution, and diagnostic output.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
coderabbitai[bot]

This comment was marked as resolved.

- Introduce conditional compilation to enable workspace search on non-Windows platforms by passing unit () instead of EnvSnapshot.
- Refactor workspace search to support exact case-sensitive matching on Unix-like systems and case-insensitive PATHEXT-aware matching on Windows.
- Enhance tests to run workspace search on both Windows and non-Windows environments.
- Document and encapsulate platform-specific filename matching logic into separate functions to unify and clarify behavior.
- This enables improved command lookup support across platforms by searching the workspace with appropriate matching logic.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
coderabbitai[bot]

This comment was marked as resolved.

Reordered the imports of env and EnvSnapshot under the Windows cfg to maintain consistent style and grouping.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
- Documented the `search_workspace` function detailing parameters and behavior.
- Added doc comments to `collect_workspace_matches` explaining its parameters and return values.
- Improved clarity on platform-specific matching and error handling in comments.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos merged commit 342d645 into main Nov 24, 2025
15 checks passed
@leynos leynos deleted the terragon/implement-which-filter-u5itpo branch November 24, 2025 19:21
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Nov 24, 2025

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The which resolver cache uses a hard-coded CACHE_CAPACITY of 64; consider plumbing this through StdlibConfig so callers with very large or very small PATHs can tune cache behaviour explicitly.
  • The workspace fallback in which::lookup::workspace::search_workspace only skips .git and target; you may want to extend or make this skip list configurable to avoid expensive scans in common heavy directories like node_modules, .idea, or .vscode.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `which` resolver cache uses a hard-coded `CACHE_CAPACITY` of 64; consider plumbing this through `StdlibConfig` so callers with very large or very small PATHs can tune cache behaviour explicitly.
- The workspace fallback in `which::lookup::workspace::search_workspace` only skips `.git` and `target`; you may want to extend or make this skip list configurable to avoid expensive scans in common heavy directories like `node_modules`, `.idea`, or `.vscode`.

## Individual Comments

### Comment 1
<location> `src/stdlib/register.rs:121` </location>
<code_context>
-    ft.is_fifo()
-}
-
-#[cfg(not(unix))]
-fn is_fifo(_ft: fs::FileType) -> bool {
-    false
</code_context>

<issue_to_address>
**issue (bug_risk):** Dropping non-Unix file tests changes behavior compared to previous always-false implementations.

On non-Unix, these tests (pipe, block_device, char_device, device) used to exist but always returned false; now they aren’t registered at all, so templates that reference them will fail at render time instead of evaluating to false. If that behavior change isn’t intended, please keep these tests defined on non-Unix and continue returning false to preserve compatibility.
</issue_to_address>

### Comment 2
<location> `src/stdlib/which/lookup/workspace.rs:41-46` </location>
<code_context>
+    #[cfg(not(windows))]
+    let match_ctx = ();
+
+    let entries = WalkDir::new(cwd)
+        .follow_links(false)
+        .sort_by_file_name()
+        .into_iter()
+        .filter_entry(should_visit_entry)
+        .filter_map(|walk_entry| {
+            walk_entry
+                .map_err(|err| {
</code_context>

<issue_to_address>
**suggestion (performance):** Unbounded workspace traversal when PATH is empty could be expensive on large trees.

The fallback in `search_workspace` currently walks the entire workspace from `cwd`, only skipping a few directories via `should_visit_entry`. On very large workspaces, this can be quite expensive when PATH is empty or misconfigured. Consider limiting traversal depth, expanding the skip list (e.g., `node_modules`, `dist`), or making this fallback opt-in to avoid unexpected latency.

Suggested implementation:

```rust
    // Limit traversal depth to avoid expensive walks on very large workspaces when PATH is empty
    // or misconfigured. This keeps the fallback from degenerating into a full-tree scan.
    let entries = WalkDir::new(cwd)
        .follow_links(false)
        .max_depth(6)
        .sort_by_file_name()
        .into_iter()
        .filter_entry(should_visit_entry)

```

To fully address the review comment, you should also:
1. Update `should_visit_entry` (in the same file or its module) to skip additional high-churn/large directories such as `node_modules`, `dist`, `build`, `.git`, `target`, etc.
2. Consider making the workspace-fallback search opt-in or configurable (e.g., via an environment variable or config flag), and gate this entire fallback branch (the `WalkDir` traversal) behind that flag to avoid surprising latency for users who do not expect a full workspace scan.
3. Optionally, log when the depth limit is hit or when the fallback is used (e.g., `tracing::debug!(%command, "using workspace which fallback with depth limit");`) so users can diagnose unexpected latency or missed matches.
</issue_to_address>

### Comment 3
<location> `tests/std_filter_tests/which_filter_tests.rs:191` </location>
<code_context>
+}
+
+#[rstest]
+fn which_filter_returns_first_match() -> Result<()> {
+    let mut fixture = WhichTestFixture::with_tool_in_dirs(
+        &ToolName::from("helper"),
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests to exercise the which resolver cache and `fresh` option semantics.

These tests cover basic lookup but not the caching semantics described in the PR (LRU cache keyed by command, PATH, PATHEXT, cwd, options, plus `fresh` bypass). Please add tests that:
- call `which` twice for the same command, then remove the executable and verify the second call still succeeds (proving the cached result is used),
- then call `which` with `fresh = true` after removal and verify a `not_found` error (proving the cache is bypassed).

This will lock in the intended cache/`fresh` behavior and guard against regressions in the resolver logic.

Suggested implementation:

```rust
#[rstest]
fn which_filter_uses_cached_result_when_executable_removed() -> Result<()> {
    let mut fixture = WhichTestFixture::with_tool_in_dirs(
        &ToolName::from("helper"),
        &[DirName::from("bin_first"), DirName::from("bin_second")],
    )?;
    fixture.state.reset_impure();

    // First lookup populates the cache.
    let first = fixture.render(&Template::from("{{ 'helper' | which }}"))?;
    assert_eq!(first, fixture.paths[0].as_str());

    // Remove the executable from disk.
    std::fs::remove_file(&fixture.paths[0])?;

    fixture.state.reset_impure();

    // Second lookup should still succeed because the result is served from cache.
    let second = fixture.render(&Template::from("{{ 'helper' | which }}"))?;
    assert_eq!(second, fixture.paths[0].as_str());

    Ok(())
}

#[rstest]
fn which_filter_fresh_bypasses_cache_after_executable_removed() -> Result<()> {
    let mut fixture = WhichTestFixture::with_tool_in_dirs(
        &ToolName::from("helper"),
        &[DirName::from("bin_first"), DirName::from("bin_second")],
    )?;
    fixture.state.reset_impure();

    // Populate cache.
    let first = fixture.render(&Template::from("{{ 'helper' | which }}"))?;
    assert_eq!(first, fixture.paths[0].as_str());

    // Remove the executable from disk.
    std::fs::remove_file(&fixture.paths[0])?;

    fixture.state.reset_impure();

    // A fresh lookup should bypass the cache and fail because the executable is gone.
    let result = fixture.render(&Template::from("{{ 'helper' | which(fresh=true) }}"));
    assert!(result.is_err(), "expected fresh which lookup to fail after removal");

    Ok(())
}

#[rstest]

```

To fully align these tests with your implementation, you may need to:
1. Adjust the `fresh` option syntax in the template if your filter expects a different argument form (e.g., `{{ 'helper' | which(fresh=true) }}` vs `{{ 'helper' | which(fresh=True) }}` or positional args).
2. If the error type or message for a `not_found` condition is more specific than a generic `Err`, refine the assertion:
   - For example, downcast the error to your custom error type or assert that `err.to_string()` contains your canonical `not_found` wording.
3. If `fixture.paths[0]` is not the exact on-disk path to the executable, replace `std::fs::remove_file(&fixture.paths[0])?` with the correct path field or helper on `WhichTestFixture` that targets the actual binary location.
4. If your cache logic is supposed to mark the state as pure/impure in a specific way on cached vs fresh lookups, you can extend these tests with additional `assert!(fixture.state.is_impure())` / `assert!(!fixture.state.is_impure())` checks that match your semantics.
</issue_to_address>

### Comment 4
<location> `tests/std_filter_tests/which_filter_tests.rs:258` </location>
<code_context>
+}
+
+#[rstest]
+fn which_function_honours_cwd_mode() -> Result<()> {
+    let (_temp, root) = support::filter_workspace()?;
+    let tool = write_tool(&root, &ToolName::from("local"))?;
</code_context>

<issue_to_address>
**suggestion (testing):** Add negative tests for invalid `cwd_mode` values to cover argument validation and diagnostics.

Currently only the `cwd_mode="always"` success path is tested. Please also add tests that:
- call `{{ which('tool', cwd_mode='invalid') }}` and assert the error contains the `netsuke::jinja::which::args` code and a clear message about allowed values;
- optionally verify the intended case-sensitivity behaviour (e.g. whether `"Auto"` / `"ALWAYS"` are accepted or rejected).

This will fully cover option validation and diagnostics from templates.

Suggested implementation:

```rust
#[rstest]
fn which_function_honours_cwd_mode() -> Result<()> {
    let (_temp, root) = support::filter_workspace()?;
    let tool = write_tool(&root, &ToolName::from("local"))?;
    let _path = PathEnv::new(&[])?;
    let (mut env, _state) = fallible::stdlib_env_with_state()?;
    let template = Template::from("{{ which('local', cwd_mode='always') }}");
    let output = render(&mut env, &template)?;
    assert_eq!(output, tool.as_str());
    Ok(())
}

#[rstest]
fn which_function_rejects_invalid_cwd_mode() -> Result<()> {
    let (_temp, _root) = support::filter_workspace()?;
    let _path = PathEnv::new(&[])?;
    let (mut env, _state) = fallible::stdlib_env_with_state()?;
    let template = Template::from("{{ which('local', cwd_mode='invalid') }}");

    let err = render(&mut env, &template).expect_err("expected invalid cwd_mode to fail");

    let message = err.to_string();
    assert!(
        message.contains("netsuke::jinja::which::args"),
        "expected error code in message, got: {message}"
    );
    assert!(
        message.contains("cwd_mode"),
        "expected parameter name in message, got: {message}"
    );
    assert!(
        message.contains("allowed"),
        "expected explanation of allowed values in message, got: {message}"
    );

    Ok(())
}

#[rstest]
fn which_function_rejects_mismatched_casing_for_cwd_mode() -> Result<()> {
    let (_temp, _root) = support::filter_workspace()?;
    let _path = PathEnv::new(&[])?;
    let (mut env, _state) = fallible::stdlib_env_with_state()?;
    let template = Template::from("{{ which('local', cwd_mode='Always') }}");

    let err = render(&mut env, &template).expect_err("expected incorrectly cased cwd_mode to fail");

    let message = err.to_string();
    assert!(
        message.contains("netsuke::jinja::which::args"),
        "expected error code in message, got: {message}"
    );
    assert!(
        message.contains("cwd_mode"),
        "expected parameter name in message, got: {message}"
    );

    Ok(())
}

#[rstest]
fn which_filter_reports_missing_command() -> Result<()> {

```

These tests assume:
- `render` returns an error type that implements `Display` and whose string representation includes the diagnostic code `netsuke::jinja::which::args`.
- The error message for invalid `cwd_mode` mentions the word "allowed" when describing valid values, and includes the parameter name `cwd_mode`.

If the actual diagnostics use different wording (for example, listing valid values like `"always", "auto", "never"` without the word "allowed"), please adjust the `assert!(message.contains(...))` checks to match the real error text or codes used in your project. You may also want to:
- Tighten the assertions to check for specific allowed values (e.g. `message.contains(r#""always", "auto", "never""#)`).
- Change the casing test to `cwd_mode='ALWAYS'` or another variant if your intended case-sensitivity behaviour differs (accepting or rejecting particular casings).
</issue_to_address>

### Comment 5
<location> `tests/std_filter_tests/which_filter_tests.rs:270` </location>
<code_context>
+}
+
+#[rstest]
+fn which_filter_reports_missing_command() -> Result<()> {
+    let (_temp, _root) = support::filter_workspace()?;
+    let _path = PathEnv::new(&[])?;
</code_context>

<issue_to_address>
**suggestion (testing):** Also assert the shape/type of the `which` output in `all=true` mode, not just the string join.

Right now `all=true` is only verified via `{{ 'helper' | which(all=true) | join('|') }}`, which checks contents/order but not that the filter actually returns a list. Please add a test that renders something like `{{ which('helper', all=true) | tojson }}` (or inspects the `Value` in Rust) and asserts that the result is a list/JSON array, so the tests enforce the documented contract for `all=true`.

Suggested implementation:

```rust
    let template = Template::from("{{ 'helper' | which(all=true) | join('|') }}");
    let output = render(&mut env, &template)?;
    assert_eq!(output, expected);
    Ok(())
}

#[rstest]
fn which_filter_all_returns_list() -> Result<()> {
    let (_temp, _root) = support::filter_workspace()?;
    let _path = PathEnv::new(&[])?;
    let (mut env, _state) = fallible::stdlib_env_with_state()?;

    // Render which(..., all=true) directly and encode as JSON so we can assert the shape.
    let template = Template::from("{{ which('helper', all=true) | tojson }}");
    let output = render(&mut env, &template)?;

    // Ensure the result is a JSON array, enforcing that all=true returns a list.
    let value: serde_json::Value = serde_json::from_str(&output)?;
    assert!(
        value.is_array(),
        "expected which(..., all=true) to return a JSON array, got {value:?}"
    );

    Ok(())
}

```

To compile successfully, ensure that `serde_json::Value` and `serde_json::from_str` are in scope for this test module. If they are not already imported at the top of `tests/std_filter_tests/which_filter_tests.rs`, add:

- `use serde_json;`

or, more specifically:

- `use serde_json::Value;`
- `use serde_json::from_str;`

and then adjust the code above to use those imports (e.g. `let value: Value = from_str(&output)?;`) if you prefer more concise names, while keeping the same test logic.
</issue_to_address>

### Comment 6
<location> `src/stdlib/which/lookup/tests.rs:1` </location>
<code_context>
+//! Tests for the which lookup helpers, covering PATH search, workspace
+//! fallback, canonicalisation, and platform-specific PATHEXT behaviour.
+
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding targeted tests for PATH/PATHEXT parsing edge cases and direct-path lookup behaviour.

Current lookup tests already cover workspace fallback and (on Windows) PATHEXT handling via `resolve_direct_appends_pathext`, but a few specific cases are still untested:
- PATH parsing:
  - a PATH entry with invalid UTF-8 should trigger the `netsuke::jinja::which::args` error from `EnvSnapshot::capture`/PATH parsing;
  - relative PATH entries that resolve against `cwd`, to confirm they’re normalised correctly in `resolved_dirs`.
- PATHEXT parsing on Windows:
  - behaviour when `PATHEXT` is empty or has entries without a leading dot (e.g. `"COM;EXE"`), including deduplication and the default fallback when the parsed list is empty.
- Direct-path behaviour on POSIX:
  - a direct path that exists but is not executable should raise `direct_not_found`.

A few focused unit tests for these would significantly strengthen coverage of the low-level resolution rules beyond what higher-level filter tests exercise implicitly.

Suggested implementation:

```rust
use super::*;
use anyhow::{Context, Result, anyhow, ensure};
use rstest::{fixture, rstest};
use std::fs;
use std::env;
#[cfg(unix)]
use std::os::unix::fs::PermissionsExt;
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;
use tempfile::TempDir;

struct TempWorkspace {
    root: Utf8PathBuf,
    _tempdir: TempDir,
}

#[cfg(unix)]
#[test]
fn path_with_invalid_utf8_triggers_args_error() -> Result<()> {
    // Construct a PATH with an invalid UTF-8 entry and ensure EnvSnapshot
    // surfaces the netsuke::jinja::which::args error from PATH parsing.
    let tempdir = tempfile::tempdir()?;
    let cwd = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
        .expect("tempdir path should be valid UTF-8");

    // "/bin:\xFF" – invalid UTF-8 in the second entry.
    let invalid_path = std::ffi::OsStr::from_bytes(b"/bin:\xFF");
    env::set_var("PATH", invalid_path);

    let err = EnvSnapshot::capture(&cwd).expect_err("invalid PATH should fail EnvSnapshot::capture");
    let msg = format!("{err:#}");

    assert!(
        msg.contains("netsuke::jinja::which::args"),
        "expected netsuke::jinja::which::args error, got: {msg}"
    );

    Ok(())
}

#[test]
fn relative_path_entries_resolve_against_cwd() -> Result<()> {
    // Relative PATH entries should be resolved against cwd and normalised
    // into EnvSnapshot.resolved_dirs (or equivalent field).
    let tempdir = tempfile::tempdir()?;
    let root = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
        .expect("tempdir path should be valid UTF-8");

    let rel_bin = root.join("bin");
    let rel_tools = root.join("tools/..").join("tools"); // intentionally non-canonical
    fs::create_dir_all(&rel_bin)?;
    fs::create_dir_all(&rel_tools)?;

    // PATH with a mix of absolute and relative entries.
    // The relative entries should be resolved from cwd (`root`).
    let path_value = format!("{}:bin:tools", root);
    env::set_var("PATH", &path_value);

    let snapshot = EnvSnapshot::capture(&root)
        .context("EnvSnapshot::capture should succeed for relative PATH entries")?;

    // Adjust these field names depending on how resolved PATH dirs are exposed.
    let resolved_dirs = &snapshot.resolved_dirs;

    assert!(
        resolved_dirs.contains(&rel_bin.canonicalize_utf8()?),
        "resolved_dirs should contain canonicalised bin dir: {resolved_dirs:?}"
    );
    assert!(
        resolved_dirs.contains(&rel_tools.canonicalize_utf8()?),
        "resolved_dirs should contain canonicalised tools dir: {resolved_dirs:?}"
    );

    Ok(())
}

#[cfg(windows)]
#[test]
fn pathext_empty_uses_default_fallback() -> Result<()> {
    // When PATHEXT parses to an empty list, the implementation should fall
    // back to the default PATHEXT set.
    let tempdir = tempfile::tempdir()?;
    let cwd = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
        .expect("tempdir path should be valid UTF-8");

    // Empty PATHEXT means "no entries provided".
    env::set_var("PATHEXT", "");
    // Use a minimal PATH just to satisfy capture.
    env::set_var("PATH", &cwd);

    let snapshot = EnvSnapshot::capture(&cwd)
        .context("EnvSnapshot::capture should succeed for empty PATHEXT")?;

    // Adjust field / expected default set to match the production implementation.
    let pathexts = &snapshot.pathexts;

    // The default Windows PATHEXT set usually includes at least .COM and .EXE.
    assert!(
        pathexts.iter().any(|ext| ext.eq_ignore_ascii_case(".COM")),
        "default PATHEXT should include .COM, got: {pathexts:?}"
    );
    assert!(
        pathexts.iter().any(|ext| ext.eq_ignore_ascii_case(".EXE")),
        "default PATHEXT should include .EXE, got: {pathexts:?}"
    );

    Ok(())
}

#[cfg(windows)]
#[test]
fn pathext_without_leading_dots_is_normalised_and_deduplicated() -> Result<()> {
    // PATHEXT entries without a leading dot (e.g. 'COM;EXE;exe') should be
    // normalised to '.COM', '.EXE' and deduplicated.
    let tempdir = tempfile::tempdir()?;
    let cwd = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
        .expect("tempdir path should be valid UTF-8");

    env::set_var("PATHEXT", "COM;EXE;EXE; .BAT ;bat");
    env::set_var("PATH", &cwd);

    let snapshot = EnvSnapshot::capture(&cwd)
        .context("EnvSnapshot::capture should succeed for PATHEXT without leading dots")?;

    let mut pathexts = snapshot.pathexts.clone();
    pathexts.sort_unstable_by(|a, b| a.to_lowercase().cmp(&b.to_lowercase()));

    // The exact ordering may differ; we assert on set membership and dedup.
    fn contains_ci(list: &[String], needle: &str) -> bool {
        list.iter().any(|s| s.eq_ignore_ascii_case(needle))
    }

    assert!(contains_ci(&pathexts, ".COM"), "PATHEXT should contain normalised .COM: {pathexts:?}");
    assert!(contains_ci(&pathexts, ".EXE"), "PATHEXT should contain normalised .EXE: {pathexts:?}");
    assert!(contains_ci(&pathexts, ".BAT"), "PATHEXT should contain normalised .BAT: {pathexts:?}");

    // Ensure deduplication: at most one of each extension (case-insensitive).
    let mut lowercased: Vec<String> = pathexts.iter().map(|s| s.to_lowercase()).collect();
    lowercased.sort_unstable();
    lowercased.dedup();
    assert_eq!(
        lowercased.len(),
        pathexts.len(),
        "PATHEXT list should be deduplicated (case-insensitive): {pathexts:?}"
    );

    Ok(())
}

#[cfg(unix)]
#[test]
fn direct_path_not_executable_raises_direct_not_found() -> Result<()> {
    // A direct path (containing a path separator) that exists but is not
    // executable should raise the direct_not_found error rather than being
    // treated as a PATH-style lookup.
    let tempdir = tempfile::tempdir()?;
    let root = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
        .expect("tempdir path should be valid UTF-8");

    let script_path = root.join("script.sh");
    fs::write(&script_path, "#!/bin/sh\necho test\n")?;

    // Ensure the file is not executable: 0o644 (rw-r--r--).
    let mut perms = fs::metadata(&script_path)?.permissions();
    perms.set_mode(0o644);
    fs::set_permissions(&script_path, perms)?;

    env::set_var("PATH", &root); // PATH shouldn't matter for direct path

    let snapshot = EnvSnapshot::capture(&root)
        .context("EnvSnapshot::capture should succeed for direct path test")?;

    // Adjust this call to the actual direct-lookup helper in the module.
    let err = resolve_direct(&snapshot, &script_path)
        .expect_err("non-executable direct path should not resolve successfully");

    let msg = format!("{err:#}");
    assert!(
        msg.contains("direct_not_found"),
        "expected direct_not_found error for non-executable direct path, got: {msg}"
    );

    Ok(())
}

```

The new tests assume several names and public surfaces that may differ slightly from the actual implementation. To wire them up correctly, please:

1. EnvSnapshot API and fields:
   - Ensure `EnvSnapshot::capture(&cwd: &Utf8PathBuf)` (or similar) exists; if the signature differs (e.g. takes `&Utf8Path` or an explicit env snapshot), adjust the calls accordingly.
   - Expose or adjust:
     - `resolved_dirs`: a `Vec<Utf8PathBuf>` (or equivalent) of PATH directories already resolved against `cwd`. If the field has a different name or type, update the references in `relative_path_entries_resolve_against_cwd`.
     - `pathexts`: a `Vec<String>` (or similar) of parsed PATHEXT entries. If this is computed lazily or has a different field name, change the test to access the correct field or accessor method.

2. Error reporting:
   - The invalid-UTF8 PATH test (`path_with_invalid_utf8_triggers_args_error`) currently asserts on the formatted error containing `netsuke::jinja::which::args`. If you expose a typed error (e.g. `WhichError::Args` / `WhichErrorKind::Args`), prefer to match on the concrete type/kind instead and adjust the assertion.
   - For the direct path case, if there is a specific error type/variant for this (e.g. `WhichError::DirectNotFound` or a `WhichErrorKind::DirectNotFound`), update the assertion in `direct_path_not_executable_raises_direct_not_found` to match on that instead of the string `"direct_not_found"`.

3. Windows PATHEXT handling:
   - The tests assume that:
     - parsing an empty PATHEXT results in a default fallback list containing at least `.COM` and `.EXE`, exposed via `snapshot.pathexts`;
     - PATHEXT entries are normalised to include a leading dot, are case-insensitive, and are deduplicated.
   - If your default PATHEXT set or normalisation rules differ, update the expected values (or loosen the assertions) to reflect the actual semantics you intend to guarantee.

4. Direct-path resolution helper:
   - The test `direct_path_not_executable_raises_direct_not_found` calls a helper `resolve_direct(&snapshot, &script_path)`. Replace this with the actual direct-path lookup function exposed by the lookup module (for example, `lookup_direct`, `lookup_one`, `which`, etc.), making sure you pass all required parameters (workspace, environment snapshot, arguments) as appropriate.
   - If direct-path handling is only triggered when the user passes a string containing a path separator, and the function expects a command string rather than a path type, convert `script_path` back to the expected string type (e.g. `script_path.as_str()` or `script_path.to_string()`).

5. Imports and feature flags:
   - Confirm that `Utf8PathBuf` and `canonicalize_utf8` are already imported from `camino`; if they are not, add `use camino::{Utf8PathBuf, Utf8Path};` and adjust the canonicalisation calls to use the existing utilities in the codebase.
   - If you already have helper functions/fixtures in this test module to create temporary workspaces and environment snapshots, consider refactoring the new tests to reuse those helpers instead of constructing tempdirs and calling `EnvSnapshot::capture` directly, to keep the style consistent with the rest of the file.
</issue_to_address>

### Comment 7
<location> `docs/netsuke-design.md:964-965` </location>
<code_context>
+The resolver keeps a small LRU cache keyed by the command, a fingerprint of
+`PATH`/`PATHEXT`, the working directory, and the cache-relevant options (`all`,
+`canonical`, `cwd_mode`). Entries are validated once at insertion; cache reads
+no longer re-probe executability, keeping the hot path lean. Because `fresh`
+only controls bypass behaviour it is stripped from the cache key so fresh
+lookups still repopulate the cache for subsequent calls. The fingerprint means
+environment changes invalidate keys without cloning large strings, and the
</code_context>

<issue_to_address>
**nitpick (typo):** Consider adding a comma after "behaviour" for smoother readability.

The clause after "behaviour" makes the sentence dense; adding a comma there would improve readability: "Because `fresh` only controls bypass behaviour, it is stripped..."

```suggestion
no longer re-probe executability, keeping the hot path lean. Because `fresh`
only controls bypass behaviour, it is stripped from the cache key so fresh
```
</issue_to_address>

### Comment 8
<location> `src/stdlib/which/lookup/workspace.rs:29` </location>
<code_context>
+/// Skips unreadable entries, ignores heavy/VCS directories via
+/// `should_visit_entry`, and returns `Ok(Vec<Utf8PathBuf>)` containing the
+/// discovered executables or an `Error` if UTF-8 conversion fails.
+pub(super) fn search_workspace(
+    cwd: &Utf8Path,
+    command: &str,
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying the workspace search implementation by moving platform-specific logic into separate modules and inlining the traversal loop into a single search function per platform.

You can simplify this without losing any behaviour by pushing the `#[cfg]`s to function/module boundaries and collapsing the two helper functions into a single traversal loop.

1) Remove unit-type parameters and internal `#[cfg]` branches

Right now you carry `env: ()` / `match_ctx: ()` / `ctx: ()` and have internal `#[cfg]` to select which call to make. You can keep the public API shape and the Windows behaviour while making each platform’s implementation straightforward.

For example:

```rust
// workspace/mod.rs
#[cfg(windows)]
mod workspace_windows;
#[cfg(not(windows))]
mod workspace_posix;

#[cfg(windows)]
pub(super) use workspace_windows::search_workspace;
#[cfg(not(windows))]
pub(super) use workspace_posix::search_workspace;
```

Then in `workspace_windows.rs`:

```rust
pub(super) fn search_workspace(
    cwd: &Utf8Path,
    command: &str,
    collect_all: bool,
    env: &EnvSnapshot,
) -> Result<Vec<Utf8PathBuf>, Error> {
    let match_ctx = prepare_workspace_match(command, env);

    let mut matches = Vec::new();
    for entry in WalkDir::new(cwd)
        .follow_links(false)
        .sort_by_file_name()
        .into_iter()
        .filter_entry(should_visit_entry)
    {
        let entry = match entry {
            Ok(e) => e,
            Err(err) => {
                tracing::debug!(%command, error = %err,
                    "skipping unreadable workspace entry during which fallback");
                continue;
            }
        };

        if let Some(path) = process_workspace_entry(entry, command, &match_ctx)? {
            matches.push(path);
            if !collect_all {
                break;
            }
        }
    }
    Ok(matches)
}
```

And in `workspace_posix.rs`:

```rust
pub(super) fn search_workspace(
    cwd: &Utf8Path,
    command: &str,
    collect_all: bool,
    _env: (),
) -> Result<Vec<Utf8PathBuf>, Error> {
    let mut matches = Vec::new();
    for entry in WalkDir::new(cwd)
        .follow_links(false)
        .sort_by_file_name()
        .into_iter()
        .filter_entry(should_visit_entry)
    {
        let entry = match entry {
            Ok(e) => e,
            Err(err) => {
                tracing::debug!(%command, error = %err,
                    "skipping unreadable workspace entry during which fallback");
                continue;
            }
        };

        if let Some(path) = process_workspace_entry(entry, command)? {
            matches.push(path);
            if !collect_all {
                break;
            }
        }
    }
    Ok(matches)
}
```

This keeps the external `search_workspace` API identical, but eliminates:

- `match_ctx: ()` and `ctx: ()` parameters,
- internal `#[cfg(windows)]` / `#[cfg(not(windows))]` call-site branches.

2) Merge `collect_workspace_matches` into `search_workspace`

`collect_workspace_matches` is effectively a thin loop around `process_workspace_entry`. Inlining it (as above) removes an indirection layer and makes traversal, early-break logic, and error handling visible in one place.

You can still keep a small `process_workspace_entry` per platform:

Windows:

```rust
fn process_workspace_entry(
    entry: walkdir::DirEntry,
    command: &str,
    ctx: &WorkspaceMatchContext,
) -> Result<Option<Utf8PathBuf>, Error> {
    if !entry.file_type().is_file() {
        return Ok(None);
    }
    if !workspace_entry_matches(&entry, ctx) {
        return Ok(None);
    }
    let path = entry.into_path();
    let utf8 = Utf8PathBuf::from_path_buf(path).map_err(|path_buf| {
        let lossy_path = path_buf.to_string_lossy();
        Error::new(
            ErrorKind::InvalidOperation,
            format!(
                "workspace path contains non-UTF-8 components while resolving command \
                 '{command}': {lossy_path}"
            ),
        )
    })?;
    Ok(is_executable(&utf8).then_some(utf8))
}
```

POSIX:

```rust
fn process_workspace_entry(
    entry: walkdir::DirEntry,
    command: &str,
) -> Result<Option<Utf8PathBuf>, Error> {
    if !entry.file_type().is_file() {
        return Ok(None);
    }
    if !workspace_entry_matches(&entry, command) {
        return Ok(None);
    }
    // same UTF-8 and is_executable logic as Windows
}
```

This preserves:

- filtering of unreadable entries,
- directory skipping via `should_visit_entry`,
- platform-specific filename matching (including PATHEXT),
- UTF-8 error reporting and `is_executable` checks,
- `collect_all` early-exit behaviour,

while significantly reducing the conditional-compilation plumbing and abstraction layers in the main control flow.
</issue_to_address>

### Comment 9
<location> `src/stdlib/which/lookup/mod.rs:82` </location>
<code_context>
+///
+/// Returns `netsuke::jinja::which::not_found` when no matching executable is
+/// discovered.
+pub(super) fn resolve_direct(
+    command: &str,
+    env: &EnvSnapshot,
</code_context>

<issue_to_address>
**issue (complexity):** Consider pushing the platform-specific cfg logic into small helper functions so the main lookup and resolve_direct flows remain clean and platform-agnostic.

You can reduce the cfg-driven structural complexity without changing behaviour by pushing more of the platform branching to the edges.

Two concrete, small changes:

1) Split `resolve_direct` into cfg-specific implementations

Right now `resolve_direct` does its own path normalization and then cfg-dispatches to two helpers. You can instead:

- Extract the shared “normalize to absolute-ish path” step into a tiny helper.
- Provide cfg-gated `resolve_direct` implementations that each call that helper and then do platform-specific logic.

That removes the cfg noise from the main call site and makes `resolve_direct` clearer.

Example:

```rust
fn normalize_direct_path(command: &str, env: &EnvSnapshot) -> Utf8PathBuf {
    let raw = Utf8Path::new(command);
    if raw.is_absolute() {
        raw.to_path_buf()
    } else {
        env.cwd.join(raw)
    }
}

#[cfg(windows)]
pub(super) fn resolve_direct(
    command: &str,
    env: &EnvSnapshot,
    options: &WhichOptions,
) -> Result<Vec<Utf8PathBuf>, Error> {
    let resolved = normalize_direct_path(command, env);
    let candidates = direct_candidates(&resolved, env);
    let mut matches = Vec::new();
    let _ = push_matches(&mut matches, candidates, options.all);
    if matches.is_empty() {
        return Err(direct_not_found(command, &resolved));
    }
    if options.canonical {
        canonicalise(matches)
    } else {
        Ok(matches)
    }
}

#[cfg(not(windows))]
pub(super) fn resolve_direct(
    command: &str,
    env: &EnvSnapshot,
    options: &WhichOptions,
) -> Result<Vec<Utf8PathBuf>, Error> {
    let resolved = normalize_direct_path(command, env);
    if !is_executable(&resolved) {
        return Err(direct_not_found(command, &resolved));
    }
    if options.canonical {
        canonicalise(vec![resolved.clone()])
    } else {
        Ok(vec![resolved])
    }
}
```

This keeps behaviour identical but removes the inner cfg branching and the extra `resolve_direct_*` layer.

2) Remove cfg from `lookup`’s main loop via a tiny helper

You can hide the `#[cfg]` in candidate generation behind a small function, so the search loop is platform-agnostic and easier to scan.

Example:

```rust
fn candidates_for_dir(env: &EnvSnapshot, dir: &Utf8Path, command: &str) -> Vec<Utf8PathBuf> {
    #[cfg(windows)]
    {
        let suffixes = env.pathext();
        env::candidate_paths(dir, command, suffixes)
    }
    #[cfg(not(windows))]
    {
        vec![dir.join(command)]
    }
}

pub(super) fn lookup(
    command: &str,
    env: &EnvSnapshot,
    options: &WhichOptions,
) -> Result<Vec<Utf8PathBuf>, Error> {
    if is_direct_path(command) {
        return resolve_direct(command, env, options);
    }

    let dirs = env.resolved_dirs(options.cwd_mode);
    let mut matches = Vec::new();

    for dir in &dirs {
        let candidates = candidates_for_dir(env, dir, command);
        if push_matches(&mut matches, candidates, options.all) {
            break;
        }
    }

    // ... rest unchanged ...
}
```

This keeps the same semantics (including `PATHEXT` use on Windows) but makes the core control flow free of cfg conditionals.

If you like this pattern, you can apply it similarly to `handle_miss` by adding a thin, cfg-gated wrapper around `search_workspace` so that `handle_miss` doesn’t need cfg in the middle of its logic.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/stdlib/which/lookup/workspace.rs
Comment thread tests/std_filter_tests/which_filter_tests.rs
Comment thread tests/std_filter_tests/which_filter_tests.rs
Comment thread tests/std_filter_tests/which_filter_tests.rs
Comment thread src/stdlib/which/lookup/tests.rs
Comment thread docs/netsuke-design.md
Comment thread src/stdlib/which/lookup/workspace.rs
Comment thread src/stdlib/which/lookup/mod.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant