Problem
We currently have 3 bash e2e test scripts in e2e/bash/ that test CLI functionality by shelling out to the nemoclaw binary:
| Script | What it tests |
|---|---|
| test_sandbox_custom_image.sh | Custom Dockerfile build + sandbox creation via --from |
| test_sandbox_sync.sh | Bidirectional file sync (directories, single files, large files with checksum verification) |
| test_port_forward.sh | TCP port forwarding through a sandbox via sandbox forward start |
These bash tests are brittle, hard to maintain, and inconsistent with the rest of the test suite, which is written in Rust and Python. They rely on hand-rolled helpers (strip_ansi(), poll loops, trap-based cleanup) whose jobs are better handled by Rust's type system, assert! macros, and RAII cleanup.
Proposed Solution
Replace all 3 bash e2e test scripts with Rust integration tests that invoke the nemoclaw CLI binary as a subprocess (using std::process::Command or assert_cmd). The new tests should live in crates/navigator-cli/tests/ alongside the existing Rust integration tests (provider_commands_integration.rs, mtls_integration.rs).
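As a rough illustration of the subprocess approach, the sketch below uses only std::process::Command. The names run_cli, CliOutput, and this Rust strip_ansi are hypothetical helpers invented for the example, not existing code in the repository, and the ANSI stripping only covers CSI sequences (ESC [ ... final byte).

```rust
use std::path::Path;
use std::process::Command;

/// Captured result of one CLI invocation (hypothetical helper type).
pub struct CliOutput {
    pub success: bool,
    pub stdout: String,
    pub stderr: String,
}

/// Runs `bin` with `args`, waits for it to exit, and captures its output
/// with ANSI color codes removed.
pub fn run_cli(bin: &Path, args: &[&str]) -> std::io::Result<CliOutput> {
    let out = Command::new(bin).args(args).output()?;
    Ok(CliOutput {
        success: out.status.success(),
        stdout: strip_ansi(&String::from_utf8_lossy(&out.stdout)),
        stderr: strip_ansi(&String::from_utf8_lossy(&out.stderr)),
    })
}

/// Removes ANSI CSI escape sequences (ESC '[' ... final byte) from `input`.
pub fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter bytes up to and including the final byte.
            for seq in chars.by_ref() {
                if ('@'..='~').contains(&seq) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Inside an integration test of the navigator-cli crate, the binary path would presumably be passed as Path::new(env!("CARGO_BIN_EXE_nemoclaw")), which Cargo sets when compiling integration tests for a package with that binary target.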
Key design decisions
- Invoke the actual binary (cargo build artifact or env!("CARGO_BIN_EXE_nemoclaw")) rather than calling library functions directly; these are true e2e tests that should exercise the full CLI entrypoint.
- Use assert_cmd (or std::process::Command + helpers) for ergonomic subprocess assertions.
- Use tempfile (already a dev-dependency) for temp directories and cleanup via RAII/Drop.
- Port all test scenarios faithfully; each bash test has specific edge cases that must be preserved:
  - Custom image: Dockerfile build, marker file verification in sandbox output
  - Sync: nested directories, single-file mode, large file (~512 KiB) with SHA-256 checksum + size verification, multi-chunk ordering
  - Port forward: background process management, TCP echo server, retry logic for tunnel readiness
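For the port-forward scenario, the background process management and readiness retry could look roughly like the sketch below. ChildGuard and wait_for_tcp are hypothetical names chosen for this example, and the 200 ms connect timeout and 100 ms poll interval are arbitrary assumptions rather than values taken from the bash script.

```rust
use std::net::{SocketAddr, TcpStream};
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Kills and reaps a background child process when dropped, replacing
/// trap-based cleanup with RAII (hypothetical helper).
pub struct ChildGuard(Child);

impl ChildGuard {
    /// Spawns `cmd` and wraps the child so it is cleaned up on drop.
    pub fn spawn(cmd: &mut Command) -> std::io::Result<Self> {
        cmd.spawn().map(ChildGuard)
    }

    /// OS process id of the wrapped child.
    pub fn id(&self) -> u32 {
        self.0.id()
    }
}

impl Drop for ChildGuard {
    fn drop(&mut self) {
        // Errors are ignored: the process may already have exited.
        let _ = self.0.kill();
        let _ = self.0.wait();
    }
}

/// Polls `addr` until a TCP connection succeeds or `timeout` elapses.
/// Returns true if the endpoint became reachable in time.
pub fn wait_for_tcp(addr: SocketAddr, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(Duration::from_millis(100));
    }
}
```

Because the guard runs kill and wait in Drop, the forwarded process is cleaned up even when an assertion panics partway through a test, which is the behavior the trap handlers provide in the bash version.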
Tasks
- Add assert_cmd as a dev-dependency for navigator-cli
- crates/navigator-cli/tests/e2e_custom_image.rs: port test_sandbox_custom_image.sh
- crates/navigator-cli/tests/e2e_sync.rs: port test_sandbox_sync.sh (all 5 steps)
- crates/navigator-cli/tests/e2e_port_forward.rs: port test_port_forward.sh
- Extract shared test helpers (crates/navigator-cli/tests/common/mod.rs)
- tasks/test.toml: replace bash task definitions (test:e2e:custom-image, test:e2e:sync, test:e2e:port-forward) with ones that run the new Rust tests (e.g., via cargo test --test e2e_*)
- Delete the e2e/bash/ directory and all 3 bash scripts
- Verify all tests pass (mise run e2e)
Acceptance Criteria
- All 3 bash e2e test scripts are deleted
- Equivalent Rust integration tests exist that invoke the nemoclaw binary as a subprocess
- All test scenarios from the bash scripts are faithfully ported (no coverage regression)
- Shared test helpers are extracted to avoid duplication
- tasks/test.toml is updated so mise run tasks point to the new Rust tests
- All new tests pass in CI