Implement cross-platform which filter and helper for MiniJinja#237
Implement cross-platform which filter and helper for MiniJinja#237
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Summary by CodeRabbitRelease Notes
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdd a cross‑platform MiniJinja "which" filter/function with a fingerprinted LRU cache and environment snapshotting; implement filesystem lookup with PATHEXT/permission handling and workspace fallback; introduce StdlibConfig and register wiring; add test helpers, integration tests and Cucumber scenarios; update docs and add dependencies Changes
Sequence Diagram(s)sequenceDiagram
participant Template as Template
participant Jinja as MiniJinja which
participant Resolver as WhichResolver
participant Cache as LRU Cache
participant Env as EnvSnapshot
participant Lookup as lookup()
Template->>Jinja: which(command, kwargs)
Jinja->>Resolver: resolve(command, options)
Resolver->>Resolver: Build CacheKey(command, env_fp, cwd, options.cache_key_view)
Resolver->>Cache: try_cache(key)
alt Cache hit
Cache-->>Resolver: matches
else Cache miss
Resolver->>Env: capture(cwd_override)
Env-->>Resolver: EnvSnapshot
Resolver->>Lookup: lookup(command, env, options)
Lookup-->>Resolver: matches / Error
Resolver->>Cache: store(key, matches) if success
end
Resolver-->>Jinja: matches / Error
Jinja->>Jinja: format_path_for_output + render (single/list)
Jinja-->>Template: rendered output
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.rs⚙️ CodeRabbit configuration file
Files:
🧬 Code graph analysis (1)src/stdlib/which/lookup/workspace.rs (2)
🔍 Remote MCP RefRelevant facts for reviewing the which cache and workspace traversal
Concrete review checks to run (quick checklist)
Tools/sources used
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
🔇 Additional comments (6)
Warning Review ran into problems🔥 ProblemsErrors were encountered while retrieving linked issues. Errors (1)
Comment |
Reviewer's GuideAdds a cross-platform MiniJinja stdlib Class diagram for the new which resolver and stdlib configurationclassDiagram
class StdlibConfig {
+workspace_root: Arc_Dir
+workspace_root_path: Option_Utf8PathBuf
+fetch_cache_relative: Utf8PathBuf
+network_policy: NetworkPolicy
+fetch_max_response_bytes: u64
+command_max_output_bytes: u64
+command_max_stream_bytes: u64
+new(workspace_root: Dir) Result_StdlibConfig
+with_workspace_root_path(path: Utf8Path) Result_StdlibConfig
+with_fetch_cache_relative(relative_path: Utf8Path) Result_StdlibConfig
+with_network_policy(policy: NetworkPolicy) StdlibConfig
+with_fetch_max_response_bytes(max_bytes: u64) Result_StdlibConfig
+with_command_max_output_bytes(max_bytes: u64) Result_StdlibConfig
+with_command_max_stream_bytes(max_bytes: u64) Result_StdlibConfig
+fetch_cache_relative() Utf8Path
+into_components() (NetworkConfig, CommandConfig)
+workspace_root_path() Option_Utf8Path
+validate_cache_relative(relative: Utf8Path) Result_unit
}
class NetworkConfig {
+cache_root: Arc_Dir
+cache_relative: Utf8PathBuf
+policy: NetworkPolicy
+max_response_bytes: u64
}
class CommandConfig {
+max_capture_bytes: u64
+max_stream_bytes: u64
+new(max_capture: u64, max_stream: u64, root: Arc_Dir, workspace_root_path: Option_Arc_Utf8PathBuf)
}
class StdlibState {
+impure: Arc_AtomicBool
+impure_flag() Arc_AtomicBool
}
class RegisterModule {
+register(env: Environment) Result_StdlibState
+register_with_config(env: Environment, config: StdlibConfig) Result_StdlibState
+value_from_bytes(bytes: Vec_u8) Value
}
class WhichModule {
+register(env: Environment, cwd_override: Option_Arc_Utf8PathBuf) Result_unit
+resolve_with(resolver: WhichResolver, command: Value, kwargs: Kwargs) Result_Value
+render_value(matches: Vec_Utf8PathBuf, options: WhichOptions) Value
+format_path_for_output(path: Utf8Path) String
}
class WhichResolver {
-cache: Arc_Mutex_LruCache_CacheKey_CacheEntry
-cwd_override: Option_Arc_Utf8PathBuf
+new(cwd_override: Option_Arc_Utf8PathBuf) Result_WhichResolver
+resolve(command: &str, options: WhichOptions) Result_Vec_Utf8PathBuf
-try_cache(key: CacheKey) Option_Vec_Utf8PathBuf
-store(key: CacheKey, matches: Vec_Utf8PathBuf)
-lock_cache() MutexGuard_LruCache_CacheKey_CacheEntry
}
class EnvSnapshot {
+cwd: Utf8PathBuf
+raw_path: Option_OsString
+raw_pathext: Option_OsString
+entries: Vec_PathEntry
+pathext: Vec_String
+capture(cwd_override: Option_Utf8Path) Result_EnvSnapshot
+resolved_dirs(mode: CwdMode) Vec_Utf8PathBuf
+pathext() Vec_String
}
class WhichOptions {
+all: bool
+canonical: bool
+fresh: bool
+cwd_mode: CwdMode
+from_kwargs(kwargs: Kwargs) Result_WhichOptions
+cache_key_view() WhichOptions
}
class CwdMode {
<<enumeration>>
Auto
Always
Never
+parse(value: &str) Option_CwdMode
}
class LookupModule {
+lookup(command: &str, env: EnvSnapshot, options: WhichOptions) Result_Vec_Utf8PathBuf
+resolve_direct(command: &str, env: EnvSnapshot, options: WhichOptions) Result_Vec_Utf8PathBuf
+push_matches(matches: Vec_Utf8PathBuf, candidates: Vec_Utf8PathBuf, collect_all: bool) bool
+is_direct_path(command: &str) bool
+is_executable(path: Utf8Path) bool
+canonicalise(paths: Vec_Utf8PathBuf) Result_Vec_Utf8PathBuf
}
class WorkspaceSearchModule {
+search_workspace(cwd: Utf8Path, command: &str, collect_all: bool, env_or_unit) Result_Vec_Utf8PathBuf
}
class CacheKey {
+command: String
+env_fingerprint: u64
+cwd: Utf8PathBuf
+options: WhichOptions
+new(command: &str, env: EnvSnapshot, options: WhichOptions) CacheKey
}
class CacheEntry {
+matches: Vec_Utf8PathBuf
}
class ErrorModule {
+not_found_error(command: &str, dirs: Vec_Utf8PathBuf, mode: CwdMode) Error
+direct_not_found(command: &str, path: Utf8Path) Error
+args_error(message: Display) Error
}
StdlibConfig --> NetworkConfig : builds
StdlibConfig --> CommandConfig : builds
RegisterModule --> StdlibConfig : uses
RegisterModule --> StdlibState : returns
RegisterModule --> WhichModule : calls_register
RegisterModule --> NetworkConfig : passes_to_network
RegisterModule --> CommandConfig : passes_to_command
WhichModule --> WhichResolver : owns
WhichModule --> WhichOptions : parses
WhichModule --> ErrorModule : args_error
WhichResolver --> EnvSnapshot : capture
WhichResolver --> LookupModule : lookup
WhichResolver o--> CacheKey : key_for_cache
WhichResolver o--> CacheEntry : stores
EnvSnapshot --> CwdMode : used_in_resolved_dirs
WhichOptions --> CwdMode : contains
LookupModule --> WorkspaceSearchModule : workspace_fallback
LookupModule --> ErrorModule : not_found
CacheKey --> WhichOptions : cache_key_view
ErrorModule --> CwdMode : hint_for_mode
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
…ve executables - Implemented a deterministic executable discovery filter and function 'which' in MiniJinja - Supports options: all matches, canonical paths, fresh cache bypass, and cwd_mode - Added an LRU cache with self-healing for stale entries - Emits actionable diagnostics with PATH previews for troubleshooting - Fully covers Windows and POSIX path behavior with extensive unit and integration tests - Updated documentation, roadmap, and user guides with 'which' usage and semantics - Integrated into stdlib module and registered in MiniJinja environment This feature allows templates to resolve executables from PATH safely and efficiently, improving script flexibility and debugging. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
291212e to
ef74422
Compare
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: Comment on file //! Cross-platform `which` filter and helper function for `MiniJinja`.
❌ New issue: Low Cohesion |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: tests/std_filter_tests/which_filter_tests.rs Comment on lines +95 to +111 fn which_filter_returns_first_match() -> Result<()> {
let (_temp, root) = support::filter_workspace()?;
let first = root.join("bin_first");
let second = root.join("bin_second");
std::fs::create_dir_all(first.as_std_path())?;
std::fs::create_dir_all(second.as_std_path())?;
write_tool(&first, "helper")?;
write_tool(&second, "helper")?;
let _path = PathEnv::new(&[first.clone(), second.clone()])?;
let (mut env, mut state) = fallible::stdlib_env_with_state()?;
state.reset_impure();
let output = render(&mut env, "{{ 'helper' | which }}")?;
assert_eq!(output, first.join(Utf8Path::new(&tool_name("helper"))).as_str());
assert!(!state.is_impure());
drop(temp);
Ok(())
}❌ New issue: Code Duplication |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Removed the monolithic src/stdlib/which.rs file and reorganized the codebase into several focused modules: - cache.rs: Cache management for resolved executable paths - env.rs: Environment snapshot and PATH parsing - error.rs: Error generation utilities - lookup.rs: Lookup resolver logic - mod.rs: Module facade and filter/function registration - options.rs: Options parsing and management This restructuring improves code maintainability and clarity without changing external behavior. Added corresponding tests to verify functionality remains consistent. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: tests/std_filter_tests/which_filter_tests.rs Comment on file use anyhow::{Context, Result, anyhow};❌ New issue: String Heavy Function Arguments Use NewTypes to model domain values and eliminate "string soup". Reach for |
This comment was marked as resolved.
This comment was marked as resolved.
…tests - Migrate behavioral tests from cucumber to rstest-bdd - Introduce new fixtures and modules for scenario handling - Refactor CliWorld into smaller state holders with mutex guards - Improve guarantees on resource cleanup via RAII-style fixtures - Update documentation and user guide to reflect new BDD framework - Modify which filter tests to use new fixture helpers This change consolidates behavior tests to run via the standard cargo test harness, simplifying CI and enabling reuse of fixtures and lint configs. Maintains cucumber in parallel during migration to ensure feature parity. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
…plate newtypes - Introduce ToolName, DirName, and Template structs in which filter tests for stronger typing and clearer intent - Update all related test functions to use these newtypes instead of raw strings - Adjust helper functions accordingly to accept newtypes - Minor code style cleanup in which/env.rs and which/lookup.rs imports Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Replaced the manual Default trait implementation for the CwdMode enum with the #[derive(Default)] attribute, specifying Auto as the default variant. This simplifies the code and leverages Rust's built-in derive functionality. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Changed multiple 'And' steps to 'Given' in the stdlib.feature test scenarios to improve clarity and adherence to Gherkin syntax for preconditions. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
…238) * feat(stdlib/which): add workspace fallback to 'which' command lookup Enhance the 'which' command resolution logic to perform a recursive search of the workspace directory if the PATH environment variable is empty. This ensures that commands can be located within the current workspace when the standard PATH lookup fails. - Added optional CWD override support in WhichResolver and EnvSnapshot to influence environment capturing. - Integrated 'walkdir' crate for recursive directory traversal. - Implemented search_workspace function to find executables recursively in workspace. - Modified lookup behavior to try workspace search upon PATH miss when appropriate. - Updated stdlib registration to pass current workspace root as CWD override. - Improved error handling and tests accordingly. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * fix(tests): remove run_and_exit calls in cucumber tests Changed .run_and_exit() to .run() in cucumber test execution to prevent premature process exit, enabling all tests to run as intended. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * refactor(stdlib_assert): extract assertion helpers for stdlib output - Added a new module `stdlib_assert` in `test_support/src` that provides helper functions for asserting stdlib rendering outputs. - Replaced ad-hoc stdlib output/error checking in `stdlib_steps` with calls to this new helper, improving code reuse and clarity. - Included unit tests for the new assertion helpers to ensure correct behavior. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * feat(which): add workspace fallback and caching to which resolver - Implement fallback search in workspace directory when PATH is empty - Skip heavy directories like .git and target during workspace search - Add small LRU cache for which resolver keyed by command, cwd, PATH, etc. - Improve error handling and debug logs for unreadable workspace entries - Enhance which filter tests to cover workspace fallback and skipping heavy dirs - Add detailed documentation diagrams for which resolver flow and configuration wiring - Introduce comprehensive tests for workspace search functionality This enhancement improves command resolution robustness in empty PATH environments by searching the workspace, optimizing with caching, and clarifying behavior through documentation. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> --------- Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (8)
src/stdlib/register.rs (2)
1-8: Module docs now enumerate the wired helpers.The documentation mentions file tests, path helpers, collections, time functions, network fetch, command wrappers, and the
whichfilter/function, addressing the previous concern about listing all registered helpers.
110-136: Device-oriented tests correctly excluded on non-Unix platforms.The use of
cfg-gatedFILE_TESTSarrays ensures that device-related predicates are only registered on Unix, resolving the concern about misleading API semantics on other platforms. This approach is cleaner than gating the registration loop.src/stdlib/mod.rs (1)
1-8: Module docs comprehensively list the stdlib surface.The documentation now explicitly mentions the
whichhelper, time helpers, and all re-exported configuration and registration entrypoints, addressing the previous concern about keeping docs in sync with the expanded surface.src/stdlib/config.rs (5)
32-55: Constructor docs clarify invariant checking.The documentation now explicitly states that validation failure "indicates a programming error in the baked-in constant rather than a runtime condition," addressing the previous concern about the panic contract.
64-69: Builder correctly returns error rather than panicking.The use of
ensure!to validate that the path is absolute addresses the previous concern about preferring explicit errors over assertions.
71-125: Builder methods use consistent validation style.The unqualified
ensure!andbail!macros are used consistently, andwith_fetch_cache_relativeacceptsimpl AsRef<Utf8Path>, addressing the previous concerns about API coherence.
195-216: Default implementation documents panic conditions.The documentation explicitly enumerates the three panic scenarios and directs users to
StdlibConfig::newfor fallible construction, addressing the previous concern about documenting the panic contract.
231-332: Tests use rstest fixtures to reduce duplication.The test module employs
#[fixture]for shared setup (workspace,base_config) and#[rstest]with#[case]for parameterised validation tests, addressing the previous concern about test repetition.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
src/manifest/mod.rs(2 hunks)src/stdlib/config.rs(1 hunks)src/stdlib/mod.rs(1 hunks)src/stdlib/register.rs(1 hunks)src/stdlib/which/lookup/mod.rs(1 hunks)tests/std_filter_tests/network_functions.rs(1 hunks)tests/std_filter_tests/which_filter_tests.rs(1 hunks)tests/steps/stdlib_steps/rendering.rs(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
tests/steps/stdlib_steps/rendering.rssrc/stdlib/register.rstests/std_filter_tests/network_functions.rstests/std_filter_tests/which_filter_tests.rssrc/stdlib/config.rssrc/stdlib/which/lookup/mod.rssrc/manifest/mod.rssrc/stdlib/mod.rs
🧬 Code graph analysis (8)
tests/steps/stdlib_steps/rendering.rs (2)
src/stdlib/config.rs (2)
new(41-55)workspace(240-248)src/stdlib/register.rs (1)
register_with_config(81-99)
src/stdlib/register.rs (2)
src/stdlib/config.rs (2)
new(41-55)default(204-215)src/stdlib/path/fs_utils.rs (1)
file_type_matches(36-49)
tests/std_filter_tests/network_functions.rs (1)
src/stdlib/config.rs (2)
new(41-55)workspace(240-248)
tests/std_filter_tests/which_filter_tests.rs (1)
test_support/src/env_lock.rs (1)
acquire(17-22)
src/stdlib/config.rs (2)
src/stdlib/network/mod.rs (1)
policy(360-360)src/cli_policy.rs (1)
network_policy(47-65)
src/stdlib/which/lookup/mod.rs (2)
src/stdlib/which/error.rs (2)
direct_not_found(23-30)not_found_error(10-21)src/stdlib/which/env.rs (1)
candidate_paths(148-165)
src/manifest/mod.rs (2)
src/stdlib/register.rs (1)
register_with_config(81-99)src/stdlib/config.rs (1)
new(41-55)
src/stdlib/mod.rs (2)
src/stdlib/register.rs (3)
register(43-55)register_with_config(81-99)value_from_bytes(103-108)src/stdlib/which/mod.rs (1)
register(27-51)
🔍 Remote MCP Ref
Summary of additional facts relevant to reviewing this PR
-
lru 0.16
- LruCache is an in-memory LRU map (not thread-safe). Common APIs: LruCache::with_capacity, get, put, len, clear. Wrapping in Arc<Mutex<...>> is the expected approach for concurrent access (matches the PR's Arc<Mutex<LruCache<...>>> usage).
-
walkdir 2.5
- WalkDir provides recursive directory traversal with combinators such as filter_entry, max_depth, follow_links, and an iterator that yields Result<DirEntry, Error>. Typical robust usage handles unreadable entries by filtering or ignoring iterator errors (e.g., into_iter().filter_map(|r| r.ok()) or inspecting DirEntry Results). filter_entry is the correct tool to skip heavy directories (e.g., "target") without descending. These behaviors are relevant to the PR's workspace fallback/search implementation.
Tools used
- Ref_ref_search_documentation (search for lru and walkdir docs)
- Ref_ref_read_url (lru docs page)
- Ref_ref_read_url (walkdir docs page)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: release / build-macos (aarch64-apple-darwin, macos-14, macos-arm64, macos-aarch64) / Build macos artefacts
- GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
- GitHub Check: release / build-linux (aarch64-unknown-linux-gnu, linux-arm64, arm64) / Build linux artefacts
- GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
- GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
- GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
- GitHub Check: Sourcery review
- GitHub Check: build-test (ubuntu-latest, stable)
🔇 Additional comments (18)
tests/std_filter_tests/which_filter_tests.rs (6)
13-63: Excellent use of NewType wrappers.The ToolName, DirName, and Template newtypes provide strong typing and eliminate ambiguous string arguments in test signatures, exactly as outlined in the PR objectives. The implementations are clean and idiomatic.
65-91: LGTM – Proper environment isolation.PathEnv correctly wraps environment mutations with EnvLock and VarGuard, adhering to the mandatory pattern for test-time env access. The RAII approach ensures cleanup even on panic.
93-147: LGTM – Helper functions are well-structured.The helper functions are appropriately scoped, use proper error propagation with context, and handle platform-specific behaviour cleanly with conditional compilation.
149-185: LGTM – Fixture eliminates setup duplication.WhichTestFixture successfully encapsulates the common test setup pattern (workspace creation, tool installation, PATH configuration), addressing PR objective #2. The structure is clear and maintainable.
306-325: Verify whetherenv::set_current_dirrequires unsafe in Rust 2024.The coding guidelines mandate that environment mutations (
env::set_var,env::remove_var) are always unsafe in Rust 2024 edition and must be marked as such. Whilstset_current_dirmutates global process state, confirm whether it also requires unsafe marking or whether it's exempt from this requirement.If
set_current_diris unsafe in Rust 2024, wrap lines 312 and 321 inunsafeblocks with a comment justifying the call, or refactor the test to use dependency injection with themockablecrate (preferred per guidelines).Run the following script to check the Rust 2024 edition documentation for
env::set_current_dir:#!/bin/bash # Description: Check if env::set_current_dir is unsafe in Rust 2024 edition # Check the Rust standard library documentation for the function signature rg -A 10 'pub fn set_current_dir' --type rust # Also check for any unsafe markers in env module rg -C 5 'unsafe.*set_current_dir' --type rust
254-303: LGTM – Edge cases well covered.The tests for cwd_mode, missing commands, workspace fallback, and heavy-directory skipping provide thorough coverage of the which filter's behaviour and error handling.
tests/std_filter_tests/network_functions.rs (1)
25-35: LGTM: Error propagation correctly implemented.The fallible config construction pattern is properly applied with
?operators for bothStdlibConfig::newandwith_workspace_root_path.src/manifest/mod.rs (2)
109-112: LGTM: Correct direct Result propagation.The removal of
Ok(...)wrapping is correct—register_with_configreturnsResult<StdlibState>, which matches the binding's expected type.
236-238: LGTM: Builder chain correctly propagates errors.The fallible builder methods are properly chained with
?, whilewith_network_policy(which returnsSelf) continues the chain without error propagation.tests/steps/stdlib_steps/rendering.rs (1)
12-55: LGTM: Correct error handling throughout the config builder chain.Both the fallible config construction (lines 21, 40) and the subsequent builder methods correctly propagate errors where needed.
src/stdlib/register.rs (3)
43-55: LGTM: Robust error handling with clear diagnostics.The function correctly propagates errors from ambient directory access, cwd resolution, and UTF-8 validation, providing clear error messages at each step.
81-99: LGTM: Systematic stdlib component registration.The registration orchestrates all stdlib modules correctly, with proper error propagation from the fallible
which::registercall.
138-166: LGTM: File type predicates correctly platform-gated.The Unix-specific predicates (FIFO, block device, char device) are properly gated with
#[cfg(unix)], ensuring they're only compiled on appropriate platforms.src/stdlib/mod.rs (1)
10-27: LGTM: Clean module organization with appropriate re-exports.The new module structure (config, register, which) is well-factored, and the re-exports maintain a discoverable public API surface.
src/stdlib/config.rs (3)
1-30: LGTM: Clear module structure with well-documented constants.The public constants and
StdlibConfigfields provide a clean configuration surface for stdlib helpers.
127-193: LGTM: Clean accessor and validation logic.The
validate_cache_relativemethod correctly rejects empty, absolute, and escaping paths, andinto_componentsproperly wires configuration into the network and command modules.
218-229: LGTM: NetworkConfig appropriately structures network module configuration.The fields clearly capture the necessary parameters for network cache initialisation.
src/stdlib/which/lookup/mod.rs (1)
1-402: Excellent work addressing all past review comments.The file length is now compliant (tests moved to a separate file), all
pub(super)functions have comprehensive documentation, and thesearch_workspacefunction has been refactored with focused helper functions (should_visit_entry,process_workspace_entry,workspace_entry_matches), bringing cognitive complexity down to approximately 5—well under the guideline maximum of 9.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: src/stdlib/which/lookup/mod.rs Comment on lines +225 to +269 fn search_workspace(
cwd: &Utf8Path,
command: &str,
collect_all: bool,
env: &EnvSnapshot,
) -> Result<Vec<Utf8PathBuf>, Error> {
let mut matches = Vec::new();
#[cfg(not(windows))]
let _ = env;
#[cfg(windows)]
let match_ctx = prepare_workspace_match(command, env);
#[cfg(not(windows))]
let match_ctx = ();
let walker = WalkDir::new(cwd)
.follow_links(false)
.sort_by_file_name()
.into_iter()
.filter_entry(should_visit_entry);
for walk_entry in walker {
let entry = match walk_entry {
Ok(value) => value,
Err(err) => {
tracing::debug!(
%command,
error = %err,
"skipping unreadable workspace entry during which fallback"
);
continue;
}
};
#[cfg(windows)]
let maybe_match = process_workspace_entry(entry, command, &match_ctx)?;
#[cfg(not(windows))]
let maybe_match = process_workspace_entry(entry, command, match_ctx)?;
if let Some(path) = maybe_match {
matches.push(path);
if !collect_all {
break;
}
}
}
Ok(matches)
}❌ New issue: Bumpy Road Ahead |
This comment was marked as resolved.
This comment was marked as resolved.
… params Changed the parameters of the workspace_entry_matches function to conditionally compile the 'command' argument for non-Windows targets and prefix it with underscore on Windows, similarly for 'ctx' argument. This improves platform-specific parameter usage and compilation cleanliness. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/stdlib/which/lookup/mod.rs (1)
229-233: Adopt cfg'd parameter types for consistency.The function silences the unused
envparameter withlet _ = env;on line 233, whilstprocess_workspace_entry(lines 284-285) andworkspace_entry_matches(lines 312-315) use cfg-gated parameter types. Apply the same pattern here for consistency.Apply this diff to align with the established pattern:
fn search_workspace( cwd: &Utf8Path, command: &str, collect_all: bool, - env: &EnvSnapshot, + #[cfg(windows)] env: &EnvSnapshot, + #[cfg(not(windows))] _env: (), ) -> Result<Vec<Utf8PathBuf>, Error> { let mut matches = Vec::new(); - #[cfg(not(windows))] - let _ = env; #[cfg(windows)] let match_ctx = prepare_workspace_match(command, env);Then update the call site in
handle_miss(line 210):- let discovered = search_workspace(&env.cwd, command, options.all, env)?; + #[cfg(windows)] + let discovered = search_workspace(&env.cwd, command, options.all, env)?; + #[cfg(not(windows))] + let discovered = search_workspace(&env.cwd, command, options.all, ())?;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
src/stdlib/which/lookup/mod.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/stdlib/which/lookup/mod.rs
🧬 Code graph analysis (1)
src/stdlib/which/lookup/mod.rs (2)
src/stdlib/which/error.rs (2)
direct_not_found(23-30)not_found_error(10-21)src/stdlib/which/env.rs (1)
candidate_paths(148-165)
🔍 Remote MCP Ref
Summary of additional facts relevant to reviewing this PR
-
LRU cache semantics and threading: lru 0.16 provides LruCache with constructors like LruCache::new or with_capacity and methods get/put/len/clear. The crate is not thread-safe; wrapping LruCache in Arc<Mutex<...>> (as done in WhichResolver) is the standard way to enable concurrent access. Source: Ref::Ref_ref_read_url (lru docs).
-
WalkDir traversal patterns and error handling: walkdir 2.5 provides WalkDir with filter_entry, max_depth, follow_links, and yields Iterator<Item = Result<DirEntry, Error>>. Robust traversal commonly handles unreadable entries by filtering iterator errors (e.g., filter_map(|r| r.ok())) or using filter_entry to skip directories like "target". This supports the PR's workspace search implementation and suggests validating handling of iterator errors and use of filter_entry to avoid descending heavy directories. Source: Ref::Ref_ref_read_url (walkdir docs).
Tools used:
- Ref::Ref_ref_search_documentation (searched lru and walkdir docs)
- Ref::Ref_ref_read_url (lru docs)
- Ref::Ref_ref_read_url (walkdir docs)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: release / build-linux (aarch64-unknown-linux-gnu, linux-arm64, arm64) / Build linux artefacts
- GitHub Check: release / build-macos (aarch64-apple-darwin, macos-14, macos-arm64, macos-aarch64) / Build macos artefacts
- GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
- GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
- GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
- GitHub Check: build-test (ubuntu-latest, stable)
- GitHub Check: Sourcery review
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix: Bumpy Road Aheadsrc/stdlib/which/lookup/mod.rs: search_workspace What lead to degradation?search_workspace has 2 blocks with nested conditional logic. Any nesting of 2 or deeper is considered. Threshold is 2 blocks per function Why does this problem occur?A Bumpy Road is a function that contains multiple chunks of nested conditional logic inside the same function. The deeper the nesting and the more bumps, the lower the code health. How to fix it?Bumpy Road implementations indicate a lack of encapsulation. Check out the detailed description of the Bumpy Road code health issue. |
This comment was marked as resolved.
This comment was marked as resolved.
Extract the workspace fallback search logic from mod.rs into a dedicated workspace.rs module. This change organizes code better by separating concerns, improves code readability and maintainability without altering functionality. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
… error handling - Extracted workspace match collection into a new function `collect_workspace_matches` for clarity. - Enhanced error handling during directory entry iteration with improved tracing. - Removed platform-dependent conditional compilation from iterator construction, centralizing logic. - Maintained functionality to collect all matches or stop after first match. - Cleaned up code structure to simplify maintenance and readability. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
src/stdlib/which/lookup/mod.rs(1 hunks)src/stdlib/which/lookup/workspace.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/stdlib/which/lookup/workspace.rssrc/stdlib/which/lookup/mod.rs
🧬 Code graph analysis (2)
src/stdlib/which/lookup/workspace.rs (2)
src/stdlib/which/lookup/mod.rs (1)
is_executable(195-198)src/stdlib/which/env.rs (1)
candidate_paths(148-165)
src/stdlib/which/lookup/mod.rs (3)
src/stdlib/which/error.rs (2)
direct_not_found(23-30)not_found_error(10-21)src/stdlib/which/lookup/workspace.rs (1)
search_workspace(14-61)src/stdlib/which/env.rs (1)
candidate_paths(148-165)
🪛 GitHub Actions: CI
src/stdlib/which/lookup/workspace.rs
[error] 41-62: cargo fmt --all -- --check detected formatting diffs. Run 'cargo fmt' (or 'cargo fmt --all') to format the code.
🪛 GitHub Check: build-test (ubuntu-latest, 1.89.0)
src/stdlib/which/lookup/workspace.rs
[warning] 41-41:
Diff in /home/runner/work/netsuke/netsuke/src/stdlib/which/lookup/workspace.rs
🪛 GitHub Check: build-test (ubuntu-latest, stable)
src/stdlib/which/lookup/workspace.rs
[warning] 41-41:
Diff in /home/runner/work/netsuke/netsuke/src/stdlib/which/lookup/workspace.rs
🔍 Remote MCP Ref
Relevant concrete facts to aid review
-
lru crate (0.16)
- LruCache is not Sync/Send by itself; typical concurrent usage wraps it in Arc<Mutex<LruCache<..>>> — matches the PR’s Arc<Mutex<...>> choice for WhichResolver.
- Primary API: constructors like LruCache::new / LruCache::with_capacity, and operations get (promotes to recent), put/put_or_get (insert/replace), len, pop_lru, clear. Capacity is fixed; inserting past capacity evicts oldest entries.
- Implication: locking granularity (Mutex per cache) and eviction semantics are important review points (contention, eviction policy, cloning of keys/values on access).
- Source: lru docs.
-
walkdir crate (2.5)
- WalkDir yields iterator of Result<DirEntry, Error>; unreadable entries produce Err items that the caller must handle (e.g., filter_map(|r| r.ok()) to skip).
- WalkDir supports filter_entry to skip descending into directories (useful to exclude WORKSPACE_SKIP_DIRS like .git and target) and max_depth/follow_links options to control traversal cost.
- Implication: workspace search should explicitly handle iterator Err values and prefer filter_entry to avoid expensive descent; current module uses WalkDir — verify it handles DirEntry errors and uses filter_entry to skip heavy dirs.
- Source: walkdir docs.
Tools used
- Documentation search for lru and walkdir
- Read lru API page
- Read walkdir API page
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: release / build-macos (aarch64-apple-darwin, macos-14, macos-arm64, macos-aarch64) / Build macos artefacts
- GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
- GitHub Check: release / build-linux (aarch64-unknown-linux-gnu, linux-arm64, arm64) / Build linux artefacts
- GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
- GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
- GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
- GitHub Check: Sourcery review
🔇 Additional comments (1)
src/stdlib/which/lookup/mod.rs (1)
1-261: Approve the coordinating module implementation.This file correctly wires together the workspace fallback, PATH resolution, and platform-specific executable detection logic. Past review concerns (missing documentation, excessive cognitive complexity, file length) have been addressed by extracting workspace logic to a separate module and adding comprehensive doc comments. The cfg-gated platform handling is sound, error propagation is correct, and the public API is well-documented.
There was a problem hiding this comment.
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
src/stdlib/which/lookup/workspace.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/stdlib/which/lookup/workspace.rs
🧬 Code graph analysis (1)
src/stdlib/which/lookup/workspace.rs (2)
src/stdlib/which/lookup/mod.rs (1)
is_executable(195-198)src/stdlib/which/env.rs (1)
candidate_paths(148-165)
🔍 Remote MCP Ref
Relevant facts to aid review
-
lru 0.16 behavior and concurrency implications:
- LruCache is not thread-safe by itself; common pattern is Arc<Mutex<LruCache<..>>> (matches PR's choice). API includes LruCache::with_capacity, get (which promotes an entry), put (insert/replace), pop_lru (evict oldest). Eviction occurs when inserting past capacity (oldest entry removed). Review points: lock contention on Mutex, whether get/put cloning semantics are acceptable, and eviction interactions with cache_key composition.
-
walkdir 2.5 traversal semantics relevant to workspace search:
- WalkDir yields an iterator of Result<DirEntry, Error>; unreadable entries appear as Err and must be handled (e.g., filter_map(|r| r.ok())). WalkDir supports filter_entry to avoid descending into skipped directories (useful to exclude heavy dirs like .git or target). Review points: ensure the PR handles Err items from the iterator, uses filter_entry to avoid expensive traversal, and correctly skips/filters entries to bound cost.
Sources consulted
- lru crate docs via Ref_ref_read_url.
- walkdir crate docs via Ref_ref_read_url.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
…ific matching - Implement workspace traversal to find executables respecting VCS and build directories. - Introduce platform-specific matching contexts to handle case sensitivity on Windows vs Unix. - Add skip logic for heavy directories like .git and target to keep scans fast. - Enhance tests to cover workspace filtering, PATH resolution, and diagnostic output. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
- Introduce conditional compilation to enable workspace search on non-Windows platforms by passing unit () instead of EnvSnapshot. - Refactor workspace search to support exact case-sensitive matching on Unix-like systems and case-insensitive PATHEXT-aware matching on Windows. - Enhance tests to run workspace search on both Windows and non-Windows environments. - Document and encapsulate platform-specific filename matching logic into separate functions to unify and clarify behavior. - This enables improved command lookup support across platforms by searching the workspace with appropriate matching logic. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Reordered the imports of env and EnvSnapshot under the Windows cfg to maintain consistent style and grouping. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
- Documented the `search_workspace` function detailing parameters and behavior. - Added doc comments to `collect_workspace_matches` explaining its parameters and return values. - Improved clarity on platform-specific matching and error handling in comments. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- The
whichresolver cache uses a hard-codedCACHE_CAPACITYof 64; consider plumbing this throughStdlibConfigso callers with very large or very small PATHs can tune cache behaviour explicitly. - The workspace fallback in
which::lookup::workspace::search_workspaceonly skips.gitandtarget; you may want to extend or make this skip list configurable to avoid expensive scans in common heavy directories likenode_modules,.idea, or.vscode.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `which` resolver cache uses a hard-coded `CACHE_CAPACITY` of 64; consider plumbing this through `StdlibConfig` so callers with very large or very small PATHs can tune cache behaviour explicitly.
- The workspace fallback in `which::lookup::workspace::search_workspace` only skips `.git` and `target`; you may want to extend or make this skip list configurable to avoid expensive scans in common heavy directories like `node_modules`, `.idea`, or `.vscode`.
## Individual Comments
### Comment 1
<location> `src/stdlib/register.rs:121` </location>
<code_context>
- ft.is_fifo()
-}
-
-#[cfg(not(unix))]
-fn is_fifo(_ft: fs::FileType) -> bool {
- false
</code_context>
<issue_to_address>
**issue (bug_risk):** Dropping non-Unix file tests changes behavior compared to previous always-false implementations.
On non-Unix, these tests (pipe, block_device, char_device, device) used to exist but always returned false; now they aren’t registered at all, so templates that reference them will fail at render time instead of evaluating to false. If that behavior change isn’t intended, please keep these tests defined on non-Unix and continue returning false to preserve compatibility.
</issue_to_address>
### Comment 2
<location> `src/stdlib/which/lookup/workspace.rs:41-46` </location>
<code_context>
+ #[cfg(not(windows))]
+ let match_ctx = ();
+
+ let entries = WalkDir::new(cwd)
+ .follow_links(false)
+ .sort_by_file_name()
+ .into_iter()
+ .filter_entry(should_visit_entry)
+ .filter_map(|walk_entry| {
+ walk_entry
+ .map_err(|err| {
</code_context>
<issue_to_address>
**suggestion (performance):** Unbounded workspace traversal when PATH is empty could be expensive on large trees.
The fallback in `search_workspace` currently walks the entire workspace from `cwd`, only skipping a few directories via `should_visit_entry`. On very large workspaces, this can be quite expensive when PATH is empty or misconfigured. Consider limiting traversal depth, expanding the skip list (e.g., `node_modules`, `dist`), or making this fallback opt-in to avoid unexpected latency.
Suggested implementation:
```rust
// Limit traversal depth to avoid expensive walks on very large workspaces when PATH is empty
// or misconfigured. This keeps the fallback from degenerating into a full-tree scan.
let entries = WalkDir::new(cwd)
.follow_links(false)
.max_depth(6)
.sort_by_file_name()
.into_iter()
.filter_entry(should_visit_entry)
```
To fully address the review comment, you should also:
1. Update `should_visit_entry` (in the same file or its module) to skip additional high-churn/large directories such as `node_modules`, `dist`, `build`, `.git`, `target`, etc.
2. Consider making the workspace-fallback search opt-in or configurable (e.g., via an environment variable or config flag), and gate this entire fallback branch (the `WalkDir` traversal) behind that flag to avoid surprising latency for users who do not expect a full workspace scan.
3. Optionally, log when the depth limit is hit or when the fallback is used (e.g., `tracing::debug!(%command, "using workspace which fallback with depth limit");`) so users can diagnose unexpected latency or missed matches.
</issue_to_address>
### Comment 3
<location> `tests/std_filter_tests/which_filter_tests.rs:191` </location>
<code_context>
+}
+
+#[rstest]
+fn which_filter_returns_first_match() -> Result<()> {
+ let mut fixture = WhichTestFixture::with_tool_in_dirs(
+ &ToolName::from("helper"),
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests to exercise the which resolver cache and `fresh` option semantics.
These tests cover basic lookup but not the caching semantics described in the PR (LRU cache keyed by command, PATH, PATHEXT, cwd, options, plus `fresh` bypass). Please add tests that:
- call `which` twice for the same command, then remove the executable and verify the second call still succeeds (proving the cached result is used),
- then call `which` with `fresh = true` after removal and verify a `not_found` error (proving the cache is bypassed).
This will lock in the intended cache/`fresh` behavior and guard against regressions in the resolver logic.
Suggested implementation:
```rust
#[rstest]
fn which_filter_uses_cached_result_when_executable_removed() -> Result<()> {
let mut fixture = WhichTestFixture::with_tool_in_dirs(
&ToolName::from("helper"),
&[DirName::from("bin_first"), DirName::from("bin_second")],
)?;
fixture.state.reset_impure();
// First lookup populates the cache.
let first = fixture.render(&Template::from("{{ 'helper' | which }}"))?;
assert_eq!(first, fixture.paths[0].as_str());
// Remove the executable from disk.
std::fs::remove_file(&fixture.paths[0])?;
fixture.state.reset_impure();
// Second lookup should still succeed because the result is served from cache.
let second = fixture.render(&Template::from("{{ 'helper' | which }}"))?;
assert_eq!(second, fixture.paths[0].as_str());
Ok(())
}
#[rstest]
fn which_filter_fresh_bypasses_cache_after_executable_removed() -> Result<()> {
let mut fixture = WhichTestFixture::with_tool_in_dirs(
&ToolName::from("helper"),
&[DirName::from("bin_first"), DirName::from("bin_second")],
)?;
fixture.state.reset_impure();
// Populate cache.
let first = fixture.render(&Template::from("{{ 'helper' | which }}"))?;
assert_eq!(first, fixture.paths[0].as_str());
// Remove the executable from disk.
std::fs::remove_file(&fixture.paths[0])?;
fixture.state.reset_impure();
// A fresh lookup should bypass the cache and fail because the executable is gone.
let result = fixture.render(&Template::from("{{ 'helper' | which(fresh=true) }}"));
assert!(result.is_err(), "expected fresh which lookup to fail after removal");
Ok(())
}
#[rstest]
```
To fully align these tests with your implementation, you may need to:
1. Adjust the `fresh` option syntax in the template if your filter expects a different argument form (e.g., `{{ 'helper' | which(fresh=true) }}` vs `{{ 'helper' | which(fresh=True) }}` or positional args).
2. If the error type or message for a `not_found` condition is more specific than a generic `Err`, refine the assertion:
- For example, downcast the error to your custom error type or assert that `err.to_string()` contains your canonical `not_found` wording.
3. If `fixture.paths[0]` is not the exact on-disk path to the executable, replace `std::fs::remove_file(&fixture.paths[0])?` with the correct path field or helper on `WhichTestFixture` that targets the actual binary location.
4. If your cache logic is supposed to mark the state as pure/impure in a specific way on cached vs fresh lookups, you can extend these tests with additional `assert!(fixture.state.is_impure())` / `assert!(!fixture.state.is_impure())` checks that match your semantics.
</issue_to_address>
### Comment 4
<location> `tests/std_filter_tests/which_filter_tests.rs:258` </location>
<code_context>
+}
+
+#[rstest]
+fn which_function_honours_cwd_mode() -> Result<()> {
+ let (_temp, root) = support::filter_workspace()?;
+ let tool = write_tool(&root, &ToolName::from("local"))?;
</code_context>
<issue_to_address>
**suggestion (testing):** Add negative tests for invalid `cwd_mode` values to cover argument validation and diagnostics.
Currently only the `cwd_mode="always"` success path is tested. Please also add tests that:
- call `{{ which('tool', cwd_mode='invalid') }}` and assert the error contains the `netsuke::jinja::which::args` code and a clear message about allowed values;
- optionally verify the intended case-sensitivity behaviour (e.g. whether `"Auto"` / `"ALWAYS"` are accepted or rejected).
This will fully cover option validation and diagnostics from templates.
Suggested implementation:
```rust
#[rstest]
fn which_function_honours_cwd_mode() -> Result<()> {
let (_temp, root) = support::filter_workspace()?;
let tool = write_tool(&root, &ToolName::from("local"))?;
let _path = PathEnv::new(&[])?;
let (mut env, _state) = fallible::stdlib_env_with_state()?;
let template = Template::from("{{ which('local', cwd_mode='always') }}");
let output = render(&mut env, &template)?;
assert_eq!(output, tool.as_str());
Ok(())
}
#[rstest]
fn which_function_rejects_invalid_cwd_mode() -> Result<()> {
let (_temp, _root) = support::filter_workspace()?;
let _path = PathEnv::new(&[])?;
let (mut env, _state) = fallible::stdlib_env_with_state()?;
let template = Template::from("{{ which('local', cwd_mode='invalid') }}");
let err = render(&mut env, &template).expect_err("expected invalid cwd_mode to fail");
let message = err.to_string();
assert!(
message.contains("netsuke::jinja::which::args"),
"expected error code in message, got: {message}"
);
assert!(
message.contains("cwd_mode"),
"expected parameter name in message, got: {message}"
);
assert!(
message.contains("allowed"),
"expected explanation of allowed values in message, got: {message}"
);
Ok(())
}
#[rstest]
fn which_function_rejects_mismatched_casing_for_cwd_mode() -> Result<()> {
let (_temp, _root) = support::filter_workspace()?;
let _path = PathEnv::new(&[])?;
let (mut env, _state) = fallible::stdlib_env_with_state()?;
let template = Template::from("{{ which('local', cwd_mode='Always') }}");
let err = render(&mut env, &template).expect_err("expected incorrectly cased cwd_mode to fail");
let message = err.to_string();
assert!(
message.contains("netsuke::jinja::which::args"),
"expected error code in message, got: {message}"
);
assert!(
message.contains("cwd_mode"),
"expected parameter name in message, got: {message}"
);
Ok(())
}
#[rstest]
fn which_filter_reports_missing_command() -> Result<()> {
```
These tests assume:
- `render` returns an error type that implements `Display` and whose string representation includes the diagnostic code `netsuke::jinja::which::args`.
- The error message for invalid `cwd_mode` mentions the word "allowed" when describing valid values, and includes the parameter name `cwd_mode`.
If the actual diagnostics use different wording (for example, listing valid values like `"always", "auto", "never"` without the word "allowed"), please adjust the `assert!(message.contains(...))` checks to match the real error text or codes used in your project. You may also want to:
- Tighten the assertions to check for specific allowed values (e.g. `message.contains(r#""always", "auto", "never""#)`).
- Change the casing test to `cwd_mode='ALWAYS'` or another variant if your intended case-sensitivity behaviour differs (accepting or rejecting particular casings).
</issue_to_address>
### Comment 5
<location> `tests/std_filter_tests/which_filter_tests.rs:270` </location>
<code_context>
+}
+
+#[rstest]
+fn which_filter_reports_missing_command() -> Result<()> {
+ let (_temp, _root) = support::filter_workspace()?;
+ let _path = PathEnv::new(&[])?;
</code_context>
<issue_to_address>
**suggestion (testing):** Also assert the shape/type of the `which` output in `all=true` mode, not just the string join.
Right now `all=true` is only verified via `{{ 'helper' | which(all=true) | join('|') }}`, which checks contents/order but not that the filter actually returns a list. Please add a test that renders something like `{{ which('helper', all=true) | tojson }}` (or inspects the `Value` in Rust) and asserts that the result is a list/JSON array, so the tests enforce the documented contract for `all=true`.
Suggested implementation:
```rust
let template = Template::from("{{ 'helper' | which(all=true) | join('|') }}");
let output = render(&mut env, &template)?;
assert_eq!(output, expected);
Ok(())
}
#[rstest]
fn which_filter_all_returns_list() -> Result<()> {
let (_temp, _root) = support::filter_workspace()?;
let _path = PathEnv::new(&[])?;
let (mut env, _state) = fallible::stdlib_env_with_state()?;
// Render which(..., all=true) directly and encode as JSON so we can assert the shape.
let template = Template::from("{{ which('helper', all=true) | tojson }}");
let output = render(&mut env, &template)?;
// Ensure the result is a JSON array, enforcing that all=true returns a list.
let value: serde_json::Value = serde_json::from_str(&output)?;
assert!(
value.is_array(),
"expected which(..., all=true) to return a JSON array, got {value:?}"
);
Ok(())
}
```
To compile successfully, ensure that `serde_json::Value` and `serde_json::from_str` are in scope for this test module. If they are not already imported at the top of `tests/std_filter_tests/which_filter_tests.rs`, add:
- `use serde_json;`
or, more specifically:
- `use serde_json::Value;`
- `use serde_json::from_str;`
and then adjust the code above to use those imports (e.g. `let value: Value = from_str(&output)?;`) if you prefer more concise names, while keeping the same test logic.
</issue_to_address>
### Comment 6
<location> `src/stdlib/which/lookup/tests.rs:1` </location>
<code_context>
+//! Tests for the which lookup helpers, covering PATH search, workspace
+//! fallback, canonicalisation, and platform-specific PATHEXT behaviour.
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding targeted tests for PATH/PATHEXT parsing edge cases and direct-path lookup behaviour.
Current lookup tests already cover workspace fallback and (on Windows) PATHEXT handling via `resolve_direct_appends_pathext`, but a few specific cases are still untested:
- PATH parsing:
- a PATH entry with invalid UTF-8 should trigger the `netsuke::jinja::which::args` error from `EnvSnapshot::capture`/PATH parsing;
- relative PATH entries that resolve against `cwd`, to confirm they’re normalised correctly in `resolved_dirs`.
- PATHEXT parsing on Windows:
- behaviour when `PATHEXT` is empty or has entries without a leading dot (e.g. `"COM;EXE"`), including deduplication and the default fallback when the parsed list is empty.
- Direct-path behaviour on POSIX:
- a direct path that exists but is not executable should raise `direct_not_found`.
A few focused unit tests for these would significantly strengthen coverage of the low-level resolution rules beyond what higher-level filter tests exercise implicitly.
Suggested implementation:
```rust
use super::*;
use anyhow::{Context, Result, anyhow, ensure};
use rstest::{fixture, rstest};
use std::fs;
use std::env;
#[cfg(unix)]
use std::os::unix::fs::PermissionsExt;
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;
use tempfile::TempDir;
struct TempWorkspace {
root: Utf8PathBuf,
_tempdir: TempDir,
}
#[cfg(unix)]
#[test]
fn path_with_invalid_utf8_triggers_args_error() -> Result<()> {
// Construct a PATH with an invalid UTF-8 entry and ensure EnvSnapshot
// surfaces the netsuke::jinja::which::args error from PATH parsing.
let tempdir = tempfile::tempdir()?;
let cwd = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
.expect("tempdir path should be valid UTF-8");
// "/bin:\xFF" – invalid UTF-8 in the second entry.
let invalid_path = std::ffi::OsStr::from_bytes(b"/bin:\xFF");
env::set_var("PATH", invalid_path);
let err = EnvSnapshot::capture(&cwd).expect_err("invalid PATH should fail EnvSnapshot::capture");
let msg = format!("{err:#}");
assert!(
msg.contains("netsuke::jinja::which::args"),
"expected netsuke::jinja::which::args error, got: {msg}"
);
Ok(())
}
#[test]
fn relative_path_entries_resolve_against_cwd() -> Result<()> {
// Relative PATH entries should be resolved against cwd and normalised
// into EnvSnapshot.resolved_dirs (or equivalent field).
let tempdir = tempfile::tempdir()?;
let root = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
.expect("tempdir path should be valid UTF-8");
let rel_bin = root.join("bin");
let rel_tools = root.join("tools/..").join("tools"); // intentionally non-canonical
fs::create_dir_all(&rel_bin)?;
fs::create_dir_all(&rel_tools)?;
// PATH with a mix of absolute and relative entries.
// The relative entries should be resolved from cwd (`root`).
let path_value = format!("{}:bin:tools", root);
env::set_var("PATH", &path_value);
let snapshot = EnvSnapshot::capture(&root)
.context("EnvSnapshot::capture should succeed for relative PATH entries")?;
// Adjust these field names depending on how resolved PATH dirs are exposed.
let resolved_dirs = &snapshot.resolved_dirs;
assert!(
resolved_dirs.contains(&rel_bin.canonicalize_utf8()?),
"resolved_dirs should contain canonicalised bin dir: {resolved_dirs:?}"
);
assert!(
resolved_dirs.contains(&rel_tools.canonicalize_utf8()?),
"resolved_dirs should contain canonicalised tools dir: {resolved_dirs:?}"
);
Ok(())
}
#[cfg(windows)]
#[test]
fn pathext_empty_uses_default_fallback() -> Result<()> {
// When PATHEXT parses to an empty list, the implementation should fall
// back to the default PATHEXT set.
let tempdir = tempfile::tempdir()?;
let cwd = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
.expect("tempdir path should be valid UTF-8");
// Empty PATHEXT means "no entries provided".
env::set_var("PATHEXT", "");
// Use a minimal PATH just to satisfy capture.
env::set_var("PATH", &cwd);
let snapshot = EnvSnapshot::capture(&cwd)
.context("EnvSnapshot::capture should succeed for empty PATHEXT")?;
// Adjust field / expected default set to match the production implementation.
let pathexts = &snapshot.pathexts;
// The default Windows PATHEXT set usually includes at least .COM and .EXE.
assert!(
pathexts.iter().any(|ext| ext.eq_ignore_ascii_case(".COM")),
"default PATHEXT should include .COM, got: {pathexts:?}"
);
assert!(
pathexts.iter().any(|ext| ext.eq_ignore_ascii_case(".EXE")),
"default PATHEXT should include .EXE, got: {pathexts:?}"
);
Ok(())
}
#[cfg(windows)]
#[test]
fn pathext_without_leading_dots_is_normalised_and_deduplicated() -> Result<()> {
// PATHEXT entries without a leading dot (e.g. 'COM;EXE;exe') should be
// normalised to '.COM', '.EXE' and deduplicated.
let tempdir = tempfile::tempdir()?;
let cwd = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
.expect("tempdir path should be valid UTF-8");
env::set_var("PATHEXT", "COM;EXE;EXE; .BAT ;bat");
env::set_var("PATH", &cwd);
let snapshot = EnvSnapshot::capture(&cwd)
.context("EnvSnapshot::capture should succeed for PATHEXT without leading dots")?;
let mut pathexts = snapshot.pathexts.clone();
pathexts.sort_unstable_by(|a, b| a.to_lowercase().cmp(&b.to_lowercase()));
// The exact ordering may differ; we assert on set membership and dedup.
fn contains_ci(list: &[String], needle: &str) -> bool {
list.iter().any(|s| s.eq_ignore_ascii_case(needle))
}
assert!(contains_ci(&pathexts, ".COM"), "PATHEXT should contain normalised .COM: {pathexts:?}");
assert!(contains_ci(&pathexts, ".EXE"), "PATHEXT should contain normalised .EXE: {pathexts:?}");
assert!(contains_ci(&pathexts, ".BAT"), "PATHEXT should contain normalised .BAT: {pathexts:?}");
// Ensure deduplication: at most one of each extension (case-insensitive).
let mut lowercased: Vec<String> = pathexts.iter().map(|s| s.to_lowercase()).collect();
lowercased.sort_unstable();
lowercased.dedup();
assert_eq!(
lowercased.len(),
pathexts.len(),
"PATHEXT list should be deduplicated (case-insensitive): {pathexts:?}"
);
Ok(())
}
#[cfg(unix)]
#[test]
fn direct_path_not_executable_raises_direct_not_found() -> Result<()> {
// A direct path (containing a path separator) that exists but is not
// executable should raise the direct_not_found error rather than being
// treated as a PATH-style lookup.
let tempdir = tempfile::tempdir()?;
let root = Utf8PathBuf::from_path_buf(tempdir.path().to_path_buf())
.expect("tempdir path should be valid UTF-8");
let script_path = root.join("script.sh");
fs::write(&script_path, "#!/bin/sh\necho test\n")?;
// Ensure the file is not executable: 0o644 (rw-r--r--).
let mut perms = fs::metadata(&script_path)?.permissions();
perms.set_mode(0o644);
fs::set_permissions(&script_path, perms)?;
env::set_var("PATH", &root); // PATH shouldn't matter for direct path
let snapshot = EnvSnapshot::capture(&root)
.context("EnvSnapshot::capture should succeed for direct path test")?;
// Adjust this call to the actual direct-lookup helper in the module.
let err = resolve_direct(&snapshot, &script_path)
.expect_err("non-executable direct path should not resolve successfully");
let msg = format!("{err:#}");
assert!(
msg.contains("direct_not_found"),
"expected direct_not_found error for non-executable direct path, got: {msg}"
);
Ok(())
}
```
The new tests assume several names and public surfaces that may differ slightly from the actual implementation. To wire them up correctly, please:
1. EnvSnapshot API and fields:
- Ensure `EnvSnapshot::capture(&cwd: &Utf8PathBuf)` (or similar) exists; if the signature differs (e.g. takes `&Utf8Path` or an explicit env snapshot), adjust the calls accordingly.
- Expose or adjust:
- `resolved_dirs`: a `Vec<Utf8PathBuf>` (or equivalent) of PATH directories already resolved against `cwd`. If the field has a different name or type, update the references in `relative_path_entries_resolve_against_cwd`.
- `pathexts`: a `Vec<String>` (or similar) of parsed PATHEXT entries. If this is computed lazily or has a different field name, change the test to access the correct field or accessor method.
2. Error reporting:
- The invalid-UTF8 PATH test (`path_with_invalid_utf8_triggers_args_error`) currently asserts on the formatted error containing `netsuke::jinja::which::args`. If you expose a typed error (e.g. `WhichError::Args` / `WhichErrorKind::Args`), prefer to match on the concrete type/kind instead and adjust the assertion.
- For the direct path case, if there is a specific error type/variant for this (e.g. `WhichError::DirectNotFound` or a `WhichErrorKind::DirectNotFound`), update the assertion in `direct_path_not_executable_raises_direct_not_found` to match on that instead of the string `"direct_not_found"`.
3. Windows PATHEXT handling:
- The tests assume that:
- parsing an empty PATHEXT results in a default fallback list containing at least `.COM` and `.EXE`, exposed via `snapshot.pathexts`;
- PATHEXT entries are normalised to include a leading dot, are case-insensitive, and are deduplicated.
- If your default PATHEXT set or normalisation rules differ, update the expected values (or loosen the assertions) to reflect the actual semantics you intend to guarantee.
4. Direct-path resolution helper:
- The test `direct_path_not_executable_raises_direct_not_found` calls a helper `resolve_direct(&snapshot, &script_path)`. Replace this with the actual direct-path lookup function exposed by the lookup module (for example, `lookup_direct`, `lookup_one`, `which`, etc.), making sure you pass all required parameters (workspace, environment snapshot, arguments) as appropriate.
- If direct-path handling is only triggered when the user passes a string containing a path separator, and the function expects a command string rather than a path type, convert `script_path` back to the expected string type (e.g. `script_path.as_str()` or `script_path.to_string()`).
5. Imports and feature flags:
- Confirm that `Utf8PathBuf` and `canonicalize_utf8` are already imported from `camino`; if they are not, add `use camino::{Utf8PathBuf, Utf8Path};` and adjust the canonicalisation calls to use the existing utilities in the codebase.
- If you already have helper functions/fixtures in this test module to create temporary workspaces and environment snapshots, consider refactoring the new tests to reuse those helpers instead of constructing tempdirs and calling `EnvSnapshot::capture` directly, to keep the style consistent with the rest of the file.
</issue_to_address>
### Comment 7
<location> `docs/netsuke-design.md:964-965` </location>
<code_context>
+The resolver keeps a small LRU cache keyed by the command, a fingerprint of
+`PATH`/`PATHEXT`, the working directory, and the cache-relevant options (`all`,
+`canonical`, `cwd_mode`). Entries are validated once at insertion; cache reads
+no longer re-probe executability, keeping the hot path lean. Because `fresh`
+only controls bypass behaviour it is stripped from the cache key so fresh
+lookups still repopulate the cache for subsequent calls. The fingerprint means
+environment changes invalidate keys without cloning large strings, and the
</code_context>
<issue_to_address>
**nitpick (typo):** Consider adding a comma after "behaviour" for smoother readability.
The clause after "behaviour" makes the sentence dense; adding a comma there would improve readability: "Because `fresh` only controls bypass behaviour, it is stripped..."
```suggestion
no longer re-probe executability, keeping the hot path lean. Because `fresh`
only controls bypass behaviour, it is stripped from the cache key so fresh
```
</issue_to_address>
### Comment 8
<location> `src/stdlib/which/lookup/workspace.rs:29` </location>
<code_context>
+/// Skips unreadable entries, ignores heavy/VCS directories via
+/// `should_visit_entry`, and returns `Ok(Vec<Utf8PathBuf>)` containing the
+/// discovered executables or an `Error` if UTF-8 conversion fails.
+pub(super) fn search_workspace(
+ cwd: &Utf8Path,
+ command: &str,
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the workspace search implementation by moving platform-specific logic into separate modules and inlining the traversal loop into a single search function per platform.
You can simplify this without losing any behaviour by pushing the `#[cfg]`s to function/module boundaries and collapsing the two helper functions into a single traversal loop.
1) Remove unit-type parameters and internal `#[cfg]` branches
Right now you carry `env: ()` / `match_ctx: ()` / `ctx: ()` and have internal `#[cfg]` to select which call to make. You can keep the public API shape and the Windows behaviour while making each platform’s implementation straightforward.
For example:
```rust
// workspace/mod.rs
#[cfg(windows)]
mod workspace_windows;
#[cfg(not(windows))]
mod workspace_posix;
#[cfg(windows)]
pub(super) use workspace_windows::search_workspace;
#[cfg(not(windows))]
pub(super) use workspace_posix::search_workspace;
```
Then in `workspace_windows.rs`:
```rust
pub(super) fn search_workspace(
cwd: &Utf8Path,
command: &str,
collect_all: bool,
env: &EnvSnapshot,
) -> Result<Vec<Utf8PathBuf>, Error> {
let match_ctx = prepare_workspace_match(command, env);
let mut matches = Vec::new();
for entry in WalkDir::new(cwd)
.follow_links(false)
.sort_by_file_name()
.into_iter()
.filter_entry(should_visit_entry)
{
let entry = match entry {
Ok(e) => e,
Err(err) => {
tracing::debug!(%command, error = %err,
"skipping unreadable workspace entry during which fallback");
continue;
}
};
if let Some(path) = process_workspace_entry(entry, command, &match_ctx)? {
matches.push(path);
if !collect_all {
break;
}
}
}
Ok(matches)
}
```
And in `workspace_posix.rs`:
```rust
pub(super) fn search_workspace(
cwd: &Utf8Path,
command: &str,
collect_all: bool,
_env: (),
) -> Result<Vec<Utf8PathBuf>, Error> {
let mut matches = Vec::new();
for entry in WalkDir::new(cwd)
.follow_links(false)
.sort_by_file_name()
.into_iter()
.filter_entry(should_visit_entry)
{
let entry = match entry {
Ok(e) => e,
Err(err) => {
tracing::debug!(%command, error = %err,
"skipping unreadable workspace entry during which fallback");
continue;
}
};
if let Some(path) = process_workspace_entry(entry, command)? {
matches.push(path);
if !collect_all {
break;
}
}
}
Ok(matches)
}
```
This keeps the external `search_workspace` API identical, but eliminates:
- `match_ctx: ()` and `ctx: ()` parameters,
- internal `#[cfg(windows)]` / `#[cfg(not(windows))]` call-site branches.
2) Merge `collect_workspace_matches` into `search_workspace`
`collect_workspace_matches` is effectively a thin loop around `process_workspace_entry`. Inlining it (as above) removes an indirection layer and makes traversal, early-break logic, and error handling visible in one place.
You can still keep a small `process_workspace_entry` per platform:
Windows:
```rust
fn process_workspace_entry(
entry: walkdir::DirEntry,
command: &str,
ctx: &WorkspaceMatchContext,
) -> Result<Option<Utf8PathBuf>, Error> {
if !entry.file_type().is_file() {
return Ok(None);
}
if !workspace_entry_matches(&entry, ctx) {
return Ok(None);
}
let path = entry.into_path();
let utf8 = Utf8PathBuf::from_path_buf(path).map_err(|path_buf| {
let lossy_path = path_buf.to_string_lossy();
Error::new(
ErrorKind::InvalidOperation,
format!(
"workspace path contains non-UTF-8 components while resolving command \
'{command}': {lossy_path}"
),
)
})?;
Ok(is_executable(&utf8).then_some(utf8))
}
```
POSIX:
```rust
fn process_workspace_entry(
entry: walkdir::DirEntry,
command: &str,
) -> Result<Option<Utf8PathBuf>, Error> {
if !entry.file_type().is_file() {
return Ok(None);
}
if !workspace_entry_matches(&entry, command) {
return Ok(None);
}
// same UTF-8 and is_executable logic as Windows
}
```
This preserves:
- filtering of unreadable entries,
- directory skipping via `should_visit_entry`,
- platform-specific filename matching (including PATHEXT),
- UTF-8 error reporting and `is_executable` checks,
- `collect_all` early-exit behaviour,
while significantly reducing the conditional-compilation plumbing and abstraction layers in the main control flow.
</issue_to_address>
### Comment 9
<location> `src/stdlib/which/lookup/mod.rs:82` </location>
<code_context>
+///
+/// Returns `netsuke::jinja::which::not_found` when no matching executable is
+/// discovered.
+pub(super) fn resolve_direct(
+ command: &str,
+ env: &EnvSnapshot,
</code_context>
<issue_to_address>
**issue (complexity):** Consider pushing the platform-specific cfg logic into small helper functions so the main lookup and resolve_direct flows remain clean and platform-agnostic.
You can reduce the cfg-driven structural complexity without changing behaviour by pushing more of the platform branching to the edges.
Two concrete, small changes:
1) Split `resolve_direct` into cfg-specific implementations
Right now `resolve_direct` does its own path normalization and then cfg-dispatches to two helpers. You can instead:
- Extract the shared “normalize to absolute-ish path” step into a tiny helper.
- Provide cfg-gated `resolve_direct` implementations that each call that helper and then do platform-specific logic.
That removes the cfg noise from the main call site and makes `resolve_direct` clearer.
Example:
```rust
fn normalize_direct_path(command: &str, env: &EnvSnapshot) -> Utf8PathBuf {
let raw = Utf8Path::new(command);
if raw.is_absolute() {
raw.to_path_buf()
} else {
env.cwd.join(raw)
}
}
#[cfg(windows)]
pub(super) fn resolve_direct(
command: &str,
env: &EnvSnapshot,
options: &WhichOptions,
) -> Result<Vec<Utf8PathBuf>, Error> {
let resolved = normalize_direct_path(command, env);
let candidates = direct_candidates(&resolved, env);
let mut matches = Vec::new();
let _ = push_matches(&mut matches, candidates, options.all);
if matches.is_empty() {
return Err(direct_not_found(command, &resolved));
}
if options.canonical {
canonicalise(matches)
} else {
Ok(matches)
}
}
#[cfg(not(windows))]
pub(super) fn resolve_direct(
command: &str,
env: &EnvSnapshot,
options: &WhichOptions,
) -> Result<Vec<Utf8PathBuf>, Error> {
let resolved = normalize_direct_path(command, env);
if !is_executable(&resolved) {
return Err(direct_not_found(command, &resolved));
}
if options.canonical {
canonicalise(vec![resolved.clone()])
} else {
Ok(vec![resolved])
}
}
```
This keeps behaviour identical but removes the inner cfg branching and the extra `resolve_direct_*` layer.
2) Remove cfg from `lookup`’s main loop via a tiny helper
You can hide the `#[cfg]` in candidate generation behind a small function, so the search loop is platform-agnostic and easier to scan.
Example:
```rust
fn candidates_for_dir(env: &EnvSnapshot, dir: &Utf8Path, command: &str) -> Vec<Utf8PathBuf> {
#[cfg(windows)]
{
let suffixes = env.pathext();
env::candidate_paths(dir, command, suffixes)
}
#[cfg(not(windows))]
{
vec![dir.join(command)]
}
}
pub(super) fn lookup(
command: &str,
env: &EnvSnapshot,
options: &WhichOptions,
) -> Result<Vec<Utf8PathBuf>, Error> {
if is_direct_path(command) {
return resolve_direct(command, env, options);
}
let dirs = env.resolved_dirs(options.cwd_mode);
let mut matches = Vec::new();
for dir in &dirs {
let candidates = candidates_for_dir(env, dir, command);
if push_matches(&mut matches, candidates, options.all) {
break;
}
}
// ... rest unchanged ...
}
```
This keeps the same semantics (including `PATHEXT` use on Windows) but makes the core control flow free of cfg conditionals.
If you like this pattern, you can apply it similarly to `handle_miss` by adding a thin, cfg-gated wrapper around `search_workspace` so that `handle_miss` doesn’t need cfg in the middle of its logic.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Summary
all,canonical,fresh, andcwd_mode, and safely resolves executables across POSIX and Windows, including Windows-specific PATH handling via PATHEXT. Includes an LRU-cached resolver to optimise repeated lookups and diagnostic error codes for easier troubleshooting. Tests cover path-discovery, multi-entry PATH lookups, canonicalisation, cache invalidation, and missing executables.Changes
whichand functionwhichwith caching, cache invalidation, and options:all(default false) to return all matches in PATH ordercanonicalto deduplicate via canonical pathsfreshto bypass cache for a lookupcwd_modeto control whether current directory is searched (auto | always | never)/in output where appropriatenetsuke::jinja::which::not_foundandnetsuke::jinja::which::argsfor diagnosticswhich::register(env)during stdlib setupTest Plan
cwd_modeon POSIX and WindowsDocumentation and Design Notes
netsuke::jinja::which::not_foundandnetsuke::jinja::which::argsprovide actionable feedback in templates.fresh=truebypasses the cache while keeping previous entries available for subsequent renders.Why this change
Breaking Changes
whichfilter and function addition, registered under stdlib.If you want me to adjust the PR description with more or fewer details, or emphasize specific usage examples in the docs, I can update accordingly.
🌿 Generated by Terry
ℹ️ Tag @terragon-labs to ask questions and address PR feedback
📎 Task: https://www.terragonlabs.com/task/d68b1ca7-62e2-4219-b655-2b114f28a0a6
Summary by Sourcery
Add a cross-platform
whichfilter/function to the MiniJinja stdlib with cached executable resolution and integrate it into a refactored stdlib registration/configuration layer.New Features:
whichMiniJinja filter and function that resolve executables from PATH with options for returning all matches, canonicalisation, cache bypass, and configurable current-directory search.Enhancements:
configandregistermodules, making setup fallible and more robust while exposing typed defaults and workspace configuration.whichlookups to optimise repeated resolutions without sacrificing determinism.Build:
lruandwalkdiras dependencies to support cached resolution and workspace traversal for thewhichhelper.Documentation:
whichbehaviour, caching model, workspace fallback, and the testing strategy, and to mark the executable discovery work as complete.Tests:
whichresolver and workspace fallback, including PATH/PATHEXT handling, canonicalisation, caching semantics, and error reporting across POSIX and Windows.whichusage in templates, PATH setup, and executable fixtures, and introduce shared stdlib assertion helpers for reuse across tests.