Enforce workspace-scoped tempfile creation with UTF-8 paths#235
Conversation
…mp files Reworked command temporary file management to use cap-std crate instead of tempfile. Implemented a custom unique tempfile naming strategy with atomic counter. Improved tempfile lifecycle handling and removal on drop when not persisted. Standardized sanitation of tempfile labels for safer filenames. Added robust error handling for tempfile creation attempts. Smoothed tempfile usage in pipe reading and empty tempfile creation. Changed workspace root path validation message for clarity. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Reviewer's GuideThis PR fully replaces arbitrary NamedTempFile usage with workspace-scoped temp files by leveraging cap_std directories and files, introduces deterministic unique naming via an atomic counter and sanitized labels, implements streaming and cleanup logic in CommandTempFile, and removes non-UTF-8 path handling throughout the temp file API. Entity relationship diagram for workspace-scoped temp fileserDiagram
WORKSPACE_ROOT ||--o{ TEMPFILE : contains
TEMPFILE {
relative_path Utf8PathBuf
absolute_path Utf8PathBuf
persisted bool
}
WORKSPACE_ROOT {
path Utf8PathBuf
}
Class diagram for updated CommandTempFile and related temp file managementclassDiagram
class CommandTempDir {
+create(label: &str) CommandTempFile
workspace_root: Arc<Dir>
workspace_root_path: Option<Utf8PathBuf>
relative: Utf8PathBuf
}
class CommandTempFile {
+new(file: CapFile, workspace_root: Arc<Dir>, relative_path: Utf8PathBuf, absolute_path: Utf8PathBuf) CommandTempFile
+into_path() Utf8PathBuf
+file_mut() &mut CapFile
+write(buf: &[u8]) io::Result<usize>
+flush() io::Result<()>
file: Option<CapFile>
workspace_root: Arc<Dir>
relative_path: Utf8PathBuf
absolute_path: Utf8PathBuf
persisted: bool
}
CommandTempDir --> CommandTempFile
CommandTempFile ..> CapFile
CommandTempFile ..> Dir
CommandTempFile ..> Utf8PathBuf
CommandTempDir ..> Dir
CommandTempDir ..> Utf8PathBuf
class StdlibConfig {
+with_workspace_root_path(path: impl Into<Utf8PathBuf>) &mut Self
workspace_root_path: Option<Utf8PathBuf>
}
StdlibConfig ..> Utf8PathBuf
class PipeOutcome {
Tempfile(Utf8PathBuf)
}
PipeOutcome ..> Utf8PathBuf
Flow diagram for workspace-scoped tempfile creation and cleanupflowchart TD
A["create(label)"] --> B["Sanitize label"]
B --> C["Generate unique name with atomic counter"]
C --> D["Create temp file in workspace root/relative path"]
D --> E["Return CommandTempFile"]
E --> F["Write/stream data into file"]
F --> G["Call into_path() to persist"]
G --> H["Return UTF-8 absolute path"]
E --> I["Drop CommandTempFile (if not persisted)"]
I --> J["Remove file from workspace directory"]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- next_tempfile_component currently panics if write_fmt fails—consider returning an error instead of panicking to avoid unexpected crashes.
- The "unable to allocate unique command tempfile name" error could include the label or attempt count to make debugging name collisions easier.
- Using a global atomic counter alone may collide with leftover files after a restart—consider adding a randomized or time‐based component to the filename to further reduce collision risk.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- next_tempfile_component currently panics if write_fmt fails—consider returning an error instead of panicking to avoid unexpected crashes.
- The "unable to allocate unique command tempfile name" error could include the label or attempt count to make debugging name collisions easier.
- Using a global atomic counter alone may collide with leftover files after a restart—consider adding a randomized or time‐based component to the filename to further reduce collision risk.
## Individual Comments
### Comment 1
<location> `src/stdlib/command.rs:93` </location>
<code_context>
+const MAX_COMMAND_TEMPFILE_ATTEMPTS: usize = 32;
+
+static COMMAND_TEMPFILE_COUNTER: AtomicU64 = AtomicU64::new(0);
+
#[derive(Clone)]
</code_context>
<issue_to_address>
**issue (bug_risk):** Atomic counter may not guarantee uniqueness across multiple processes.
Process-local atomic counters can result in filename collisions when multiple processes operate in the same workspace. To mitigate this, add the process ID or a random value to the filename.
</issue_to_address>
### Comment 2
<location> `src/stdlib/command.rs:227` </location>
<code_context>
+ }
+ }
+
+ fn into_path(mut self) -> Utf8PathBuf {
+ self.persisted = true;
+ drop(self.file.take());
</code_context>
<issue_to_address>
**issue (bug_risk):** Dropping the file before returning the path may not guarantee data is flushed.
Ensure the file is explicitly flushed before returning the path, or enforce flushing within this method to guarantee all data is written.
</issue_to_address>
### Comment 3
<location> `src/stdlib/command.rs:137` </location>
<code_context>
}
fn create(&self, label: &str) -> io::Result<CommandTempFile> {
- self.workspace_root.create_dir_all(&self.relative)?;
- let mut builder = Builder::new();
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the manual tempfile management logic with tempfile::Builder to simplify code and leverage automatic cleanup.
```markdown
You can collapse all of the manual open‐loop, counter, Drop impl, etc. into a few lines using `tempfile::Builder`. That preserves automatic cleanup on Drop and still lets you control directory, prefix and suffix:
```rust
use tempfile::{Builder, NamedTempFile};
impl CommandTempDir {
fn create(&self, label: &str) -> io::Result<CommandTempFile> {
// sanitize_label can stay if you want to filter out non-ascii
let prefix = sanitize_label(label);
let root = self
.workspace_root_path
.as_ref()
.ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no workspace root"))?;
// tempfile::Builder handles unique names, creation and removal on Drop
let file = Builder::new()
.prefix(&prefix)
.suffix(".tmp")
.tempfile_in(root.as_std_path())?;
Ok(CommandTempFile { file })
}
}
struct CommandTempFile {
file: NamedTempFile,
}
impl CommandTempFile {
/// stream data into this file
pub fn write_all(&mut self, buf: &[u8]) -> io::Result<()> {
self.file.write_all(buf)
}
/// finish writing, persist and return the UTF-8 path
pub fn into_path(mut self) -> io::Result<Utf8PathBuf> {
let tmp_path = self.file
.into_temp_path() // relinquish auto-cleanup
.keep()? // persist into final path
.into_path_buf();
Utf8PathBuf::from_path_buf(tmp_path)
.map_err(|_| io::Error::new(io::ErrorKind::Other, "non-UTF8 tempfile path"))
}
}
```
Then in your pipe‐reading functions just:
```rust
let mut tf = config.create_tempfile(label)?;
reader.read_to_end(&mut tf.file)?;
let path = tf.into_path()?;
Ok(PipeOutcome::Tempfile(path))
```
This removes:
- the manual `AtomicU64` loop,
- all of the `cap_std::OpenOptions` juggling,
- custom `Drop` cleanup,
- extra `persisted` flag,
- and reuses a well‐tested crate.
```
</issue_to_address>
### Comment 4
<location> `src/stdlib/command.rs:176` </location>
<code_context>
+ }
+}
+
+fn next_tempfile_component(label: &str) -> Utf8PathBuf {
+ let counter = COMMAND_TEMPFILE_COUNTER.fetch_add(1, Ordering::Relaxed);
+ let mut name = sanitize_label(label);
</code_context>
<issue_to_address>
**issue (review_instructions):** Add behavioural and unit tests for the new tempfile naming logic and sanitization.
The new functions `next_tempfile_component` and `sanitize_label` introduce logic for generating and sanitizing tempfile names. You must provide both behavioural and unit tests to verify correct name generation, uniqueness, and label sanitization, including edge cases.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>
### Comment 5
<location> `src/stdlib/command.rs:150` </location>
<code_context>
+ options.read(true);
+ options.write(true);
+ options.create_new(true);
+ for _ in 0..MAX_COMMAND_TEMPFILE_ATTEMPTS {
+ let file_component = next_tempfile_component(label);
+ match temp_dir.open_with(&file_component, &options) {
</code_context>
<issue_to_address>
**issue (review_instructions):** Add tests to verify unique tempfile creation and error handling when attempts are exhausted.
The loop for creating tempfiles with a maximum number of attempts is new logic. You must add tests to ensure that unique files are created and that the error path (when all attempts fail) is exercised and behaves as expected.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>
### Comment 6
<location> `src/stdlib/command.rs:251` </location>
<code_context>
+ }
+}
+
+impl Drop for CommandTempFile {
+ fn drop(&mut self) {
+ if self.persisted {
</code_context>
<issue_to_address>
**issue (review_instructions):** Add tests to verify correct cleanup of tempfiles on drop and when persisted.
The new `Drop` implementation for `CommandTempFile` must be covered by tests to ensure that tempfiles are properly cleaned up when not persisted, and not removed when persisted. Test both normal and error paths (e.g., removal failure).
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>
### Comment 7
<location> `src/stdlib/command.rs:241` </location>
<code_context>
+ }
+}
+
+impl Write for CommandTempFile {
+ fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
+ self.file_mut().write(buf)
</code_context>
<issue_to_address>
**issue (review_instructions):** Add tests for the new `Write` implementation for `CommandTempFile`.
The implementation of the `Write` trait for `CommandTempFile` is new and must be covered by unit tests to verify correct writing and flushing behavior, including error handling.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>
### Comment 8
<location> `src/stdlib/command.rs:227` </location>
<code_context>
+ }
+ }
+
+ fn into_path(mut self) -> Utf8PathBuf {
+ self.persisted = true;
+ drop(self.file.take());
</code_context>
<issue_to_address>
**issue (review_instructions):** Add tests for `into_path` to verify correct path return and state changes.
The new `into_path` method should be tested to ensure it returns the correct path, updates internal state as expected, and interacts correctly with the Drop logic.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>
### Comment 9
<location> `src/stdlib/command.rs:1175` </location>
<code_context>
+ Ok(PipeOutcome::Tempfile(path))
}
fn create_empty_tempfile(
</code_context>
<issue_to_address>
**issue (review_instructions):** Add tests for the refactored `create_empty_tempfile` logic.
The logic for creating empty tempfiles has changed. You must add tests to verify correct file creation, path handling, and error cases.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Replaced the custom logic for creating temporary files within the command module with the tempfile crate's Builder API. This simplifies tempfile creation, removes counters and manual uniqueness handling, and improves error handling. Dropped the custom CommandTempFile's complex management for a simpler wrapper around NamedTempFile, ensuring proper cleanup and persistence semantics. Added tests for label sanitization and tempfile lifecycle behavior. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
|
@coderabbitai Have the following now been resolved?
|
Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 553 🏁 Script executed: Length of output: 465 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 6833 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 1460 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 1200 🏁 Script executed: Length of output: 14 🏁 Script executed: Length of output: 8156 🏁 Script executed: Length of output: 62 🏁 Script executed: Length of output: 9735 🏁 Script executed: Length of output: 1907 🏁 Script executed: Length of output: 1292 🏁 Script executed: Length of output: 888 @leynos, yes, all three concerns have been fully resolved! 🎉 The cap-std implementation in commit d6fdea2 did have all three issues you mentioned:
However, commit bd9d75f completely replaced that approach with the
The |
7b2e32d
into
codex/enforce-command-output-size-limits
* Enforce bounded command output and stream large results * Refactor read_pipe and workspace-scoped tempfile plumbing (#232) * refactor(stdlib): split read_pipe into separate capture and tempfile functions Refactored the read_pipe function in src/stdlib/command.rs by splitting its logic based on OutputMode into two distinct functions, read_pipe_capture and read_pipe_tempfile. This improves code clarity and separation of concerns while maintaining existing behavior. Additionally, refactored template rendering test helpers in tests/steps/stdlib_steps/rendering.rs to consolidate context construction into a single helper function render_with_single_context, reducing code duplication in tests. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * feat(stdlib/command): enforce byte limits and use workspace-rooted tempfiles for command output - Introduce distinct configurable byte limits for stdout capture and streaming in CommandConfig. - Implement workspace-rooted temporary file creation for streaming command outputs, replacing system temp directory usage. - Modify child process pipe readers to enforce configured limits and produce bytes or workspace-based tempfiles accordingly. - Propagate workspace root path into stdlib configuration, enabling temp file placement inside project directories. - Add tests validating capturing and streaming output within limits and error handling on limit exceedance. - Enhance documentation with a Mermaid class diagram illustrating pipe reading abstractions. - Update stdlib configuration defaults to include workspace root path for consistent temp file handling. - Adjust tests and feature files to verify new output limit enforcement and streaming behavior. This improves resource control and workspace hygiene for command output handling in Netsuke standard library. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * refactor(stdlib/command): use references for CommandContext to avoid cloning Changed functions to accept `&CommandContext` instead of consuming `CommandContext` by value to improve efficiency and avoid unnecessary cloning. Made `CommandContext::new` a `const fn`. Updated related pipe reader functions to accept config by reference rather than Arc cloning. Fixed a slice indexing panic by validating bounds before write. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> --------- Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * Refactor rendering tests with macro and absolute workspace roots (#234) * refactor(stdlib,manifest): improve workspace root handling and render helpers - Resolve workspace root as absolute path in manifest loading, supporting relative paths by joining with current working directory. - Enforce absolute workspace root path in StdlibConfig with panic on invalid input. - Return error if workspace root not configured when creating command temp files. - Replace unsafe slice indexing with safe slice usage in command file writes. - Introduce macro in stdlib_steps for DRY rendering with world string fields, simplifying template rendering functions. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * feat(manifest): add workspace root path handling and related tests - Handle resolution of relative and absolute workspace root paths with UTF-8 validation - Expose workspace_root_path() getter in StdlibConfig - Add tests for resolving workspace roots and rejecting non-UTF-8 paths - Add test ensuring command tempdir requires workspace root path - Add manifest subcommand test for relative manifest paths These changes improve manifest workspace handling by enforcing UTF-8 constraints and providing better API visibility and test coverage. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * refactor(manifest): extract absolute workspace root resolution to helper Refactored workspace root resolution logic in stdlib_config_for_manifest into a new function resolve_absolute_workspace_root for clarity and reuse. Updated related tests to combine relative and absolute workspace root checks into a single parameterized test for better coverage and reduced duplication. Additionally, introduced assert_output_limit_error helper in stdlib/command tests to consolidate output limit error assertions in multiple test cases. This improves code maintainability and test clarity. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> --------- Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * Enforce workspace-scoped tempfile creation with UTF-8 paths (#235) * refactor(command): replace tempfile crate with cap-std for command temp files Reworked command temporary file management to use cap-std crate instead of tempfile. Implemented a custom unique tempfile naming strategy with atomic counter. Improved tempfile lifecycle handling and removal on drop when not persisted. Standardized sanitation of tempfile labels for safer filenames. Added robust error handling for tempfile creation attempts. Smoothed tempfile usage in pipe reading and empty tempfile creation. Changed workspace root path validation message for clarity. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * refactor(command): replace manual tempfile creation with tempfile crate Replaced the custom logic for creating temporary files within the command module with the tempfile crate's Builder API. This simplifies tempfile creation, removes counters and manual uniqueness handling, and improves error handling. Dropped the custom CommandTempFile's complex management for a simpler wrapper around NamedTempFile, ensuring proper cleanup and persistence semantics. Added tests for label sanitization and tempfile lifecycle behavior. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> * style(stdlib/command): simplify flush call by chaining method calls Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> --------- Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com> --------- Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Summary
Changes
Core changes
API and runtime behavior
Error handling
Cleanup
Why
Tests plan
Notes
🌿 Generated by Terry
ℹ️ Tag @terragon-labs to ask questions and address PR feedback
📎 Task: https://www.terragonlabs.com/task/e545d8c3-7fc7-40ae-88a1-e684d4d7991d
Summary by Sourcery
Scope all fallback temporary file creation within the configured workspace root, enforcing unique UTF-8-safe file names and deterministic cleanup.
New Features:
Enhancements:
Chores: