Merged
60 changes: 60 additions & 0 deletions docs/netsuke-design.md
@@ -1045,6 +1045,66 @@ Implementation details:
shared `StdlibState` that flips an `impure` flag whenever these helpers
execute so callers can detect templates that interacted with the outside
world.
- `shell` and `grep` enforce a configurable stdout capture limit (default
1 MiB) via `StdlibConfig::with_command_max_output_bytes`. Exceeding the limit
raises an error that quotes the configured budget so manifests can adjust.
Templates can request streaming by passing `{'mode': 'tempfile'}` as the
second filter argument. Streaming writes stdout to a temporary file guarded
by `StdlibConfig::with_command_max_stream_bytes`, which defaults to 64 MiB to
prevent runaway disk usage while still tolerating deliberate large outputs.
- The command helpers manage pipe budgets using a `PipeSpec`/`PipeLimit`
tracker. Each pipe spawns a dedicated reader thread that records how many
bytes were drained and aborts once the configured limit is exceeded,
surfacing an `OutputLimit` diagnostic that names the stream and mode. When
streaming is requested the reader persists data to a temporary file, keeping
the limit in place so exceptionally large outputs are rejected before the
filesystem fills up. The `StdlibConfig::into_components` helper consumes the
builder and hands owned network/command configurations to the registration
routines, avoiding needless cloning of the capability handles.

```mermaid
classDiagram
class read_pipe {
+read_pipe<R>(reader: R, spec: PipeSpec): Result<PipeOutcome, CommandFailure>
}
class read_pipe_capture {
+read_pipe_capture<R>(reader: R, limit: PipeLimit): Result<PipeOutcome, CommandFailure>
}
class read_pipe_tempfile {
+read_pipe_tempfile<R>(reader: R, limit: PipeLimit): Result<PipeOutcome, CommandFailure>
}
read_pipe --> read_pipe_capture : calls
read_pipe --> read_pipe_tempfile : calls
class PipeSpec {
+into_limit(): PipeLimit
+mode(): OutputMode
}
class PipeOutcome {
<<enum>>
Bytes(Vec<u8>)
Tempfile(Utf8PathBuf)
}
class CommandFailure {
<<enum>>
Io
StreamPathNotUtf8
}
class PipeLimit {
+record(read: usize): Result<(), CommandFailure>
}
class OutputMode {
<<enum>>
Capture
Tempfile
}
read_pipe ..> PipeSpec : uses
read_pipe_capture ..> PipeLimit : uses
read_pipe_tempfile ..> PipeLimit : uses
read_pipe_capture ..> PipeOutcome : returns
read_pipe_tempfile ..> PipeOutcome : returns
read_pipe_capture ..> CommandFailure : error
read_pipe_tempfile ..> CommandFailure : error
```
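
The incremental budget check described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the Netsuke implementation: `ByteBudget`, `LimitError`, and `read_capped` are hypothetical names standing in for `PipeLimit`, `CommandFailure`, and `read_pipe_capture`, and the sketch drains the reader on the calling thread rather than on a dedicated reader thread.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for the `OutputLimit` diagnostic; the real
/// `CommandFailure` type is not reproduced here.
#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    /// The stream produced more bytes than the configured budget allows.
    Exceeded { limit: u64, observed: u64 },
    /// The underlying reader failed.
    Io(io::ErrorKind),
}

/// Illustrative byte-budget tracker, loosely modelled on `PipeLimit`.
pub struct ByteBudget {
    limit: u64,
    consumed: u64,
}

impl ByteBudget {
    pub fn new(limit: u64) -> Self {
        Self { limit, consumed: 0 }
    }

    /// Record `read` freshly drained bytes and fail once the budget is exceeded.
    pub fn record(&mut self, read: usize) -> Result<(), LimitError> {
        self.consumed = self.consumed.saturating_add(read as u64);
        if self.consumed > self.limit {
            return Err(LimitError::Exceeded {
                limit: self.limit,
                observed: self.consumed,
            });
        }
        Ok(())
    }
}

/// Drain `reader` chunk by chunk, checking the budget before keeping each chunk,
/// so an oversized stream is rejected without buffering all of it.
pub fn read_capped<R: Read>(mut reader: R, limit: u64) -> Result<Vec<u8>, LimitError> {
    let mut budget = ByteBudget::new(limit);
    let mut collected = Vec::new();
    let mut chunk = [0_u8; 8192];
    loop {
        let read = match reader.read(&mut chunk) {
            Ok(0) => return Ok(collected), // end of stream
            Ok(n) => n,
            Err(err) if err.kind() == io::ErrorKind::Interrupted => continue,
            Err(err) => return Err(LimitError::Io(err.kind())),
        };
        budget.record(read)?;
        collected.extend_from_slice(&chunk[..read]);
    }
}
```

Because the budget is checked per chunk, a command that writes without bound fails after at most one chunk beyond the limit. A tempfile variant would follow the same loop but write each accepted chunk to a file instead of a `Vec<u8>`.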

Custom external commands can be registered as additional filters. Those should
be marked `pure` if safe for caching or `impure` otherwise.
17 changes: 13 additions & 4 deletions docs/security-network-command-audit.md
@@ -63,12 +63,21 @@ introduces, and concrete remediation tasks that would harden the helpers.
- Provide an allowlist-based command runner (e.g. declarative mapping of
helper names to binaries) so manifests can reference vetted utilities
without shell access.
- [ ] **Helpers buffer stdout/stderr without limits.** Both filters capture the
entire command output into memory before returning it. A command that writes
an unbounded stream will lead to memory exhaustion or at least prolonged
blocking. *Remediation tasks:*
- [x] **Helpers buffer stdout/stderr without limits.** *(Status: done.)* Both
filters capture the entire command output into memory before returning it. A
command that writes an unbounded stream will lead to memory exhaustion or at
least prolonged blocking. *Remediation tasks:*
- Enforce maximum output sizes with clear errors when exceeded.
- Stream results to temporary files when callers opt in to large outputs.
- **Remediation:** `StdlibConfig` now exposes
`with_command_max_output_bytes` and `with_command_max_stream_bytes` so
integrators can tailor command limits. The `shell` and `grep` filters
honour those limits, reporting the configured byte budget when commands
exceed it. Templates can request streaming by passing
`{'mode': 'tempfile'}` as an options argument, which spools stdout to a
temporary file guarded by the streaming limit. Pipe readers enforce their
budgets incrementally so long-running commands fail fast once the
configured allowance is exceeded.

## Next steps

22 changes: 15 additions & 7 deletions docs/users-guide.md
@@ -249,8 +249,8 @@ dynamic capabilities to your manifest.
- Expressions: `{{ 1 + 1 }}`, `{{ sources | map('basename') }}`

- Control Structures (within specific keys like `foreach`, `when`, or inside
`macros`): `{% if enable %}…{% endif %}`, `{% for item in list %}…{%
endfor %}`
`macros`): `{% if enable %}…{% endif %}`, `{% for item in list %}…{% endfor
%}`

**Important:** Structural Jinja (`{% %}`) is generally **not** allowed directly
within the YAML structure outside of `macros`. Logic should primarily be within
@@ -337,8 +337,8 @@ templates.
Enforces a configurable maximum response size (default 8 MiB); requests abort
with an error quoting the configured threshold when the limit is exceeded.
Cached downloads stream directly to disk and remove partial files on error.
Configure the limit with `StdlibConfig::with_fetch_max_response_bytes`.
Marks template as impure.
Configure the limit with `StdlibConfig::with_fetch_max_response_bytes`. Marks
template as impure.

- `now(offset=None)`: Returns the current time as a timezone-aware object
(defaults to UTC). `offset` can be '+HH:MM' or 'Z'. Exposes `.iso8601`,
@@ -408,12 +408,20 @@ Apply filters using the pipe `|` operator: `{{ value | filter_name(args...) }}`

- `shell(command_string)`: Pipes the input value (string or bytes) as stdin
to `command_string` executed via the system shell (`sh -c` or `cmd /C`).
Returns stdout. **Marks the template as impure.** Example: `{{ user_list |
shell('grep admin') }}`
Returns stdout. **Marks the template as impure.** Example:
`{{ user_list | shell('grep admin') }}`. The captured stdout is limited to 1
MiB by default; configure a different budget with
`StdlibConfig::with_command_max_output_bytes`. Exceeding the limit raises an
`InvalidOperation` error that quotes the configured threshold. Templates can
pass an options mapping such as `{'mode': 'tempfile'}` to stream stdout into
a temporary file instead. The file path is returned to the template and
remains bounded by `StdlibConfig::with_command_max_stream_bytes` (default 64
MiB).

- `grep(pattern, flags=None)`: Filters input lines matching `pattern`.
`flags` can be a string (e.g., `'-i'`) or list of strings. Implemented via
`shell`. Marks template as impure.
`shell`. Marks template as impure. The same output and streaming limits apply
when `grep` emits large result sets.

**Impurity:** Filters like `shell` and functions like `fetch` interact with the
outside world. Netsuke tracks this "impurity". Impure templates might affect
39 changes: 31 additions & 8 deletions src/manifest/mod.rs
@@ -23,11 +23,11 @@ use crate::{
stdlib::{NetworkPolicy, StdlibConfig},
};
use anyhow::{Context, Result, anyhow};
use camino::Utf8Path;
use camino::{Utf8Path, Utf8PathBuf};
use cap_std::{ambient_authority, fs_utf8::Dir};
use minijinja::{Environment, Error, ErrorKind, UndefinedBehavior, value::Value};
use serde::de::Error as _;
use std::{fs, path::Path};
use std::{env, fs, path::Path};

mod diagnostics;
mod expand;
@@ -191,6 +191,23 @@ pub fn from_path_with_policy(
#[cfg(test)]
mod tests;

/// Resolve a potentially relative manifest parent path to an absolute UTF-8 workspace root.
fn resolve_absolute_workspace_root(utf8_parent: &Utf8Path) -> Result<Utf8PathBuf> {
    let workspace_base = if utf8_parent.is_absolute() {
        utf8_parent.to_path_buf().into_std_path_buf()
    } else {
        env::current_dir()
            .context("resolve current directory for manifest workspace root")?
            .join(utf8_parent.as_std_path())
    };
    Utf8PathBuf::from_path_buf(workspace_base).map_err(|invalid| {
        anyhow!(
            "workspace root '{}' contains non-UTF-8 components",
            invalid.display()
        )
    })
}

fn stdlib_config_for_manifest(path: &Path, policy: NetworkPolicy) -> Result<StdlibConfig> {
let parent = match path.parent() {
Some(parent) if !parent.as_os_str().is_empty() => parent,
@@ -207,10 +224,16 @@ fn stdlib_config_for_manifest(path: &Path, policy: NetworkPolicy) -> Result<Stdl
path.display()
)
})?;
    let dir = Dir::open_ambient_dir(utf8_parent, ambient_authority()).with_context(|| {
        format!(
            "failed to open workspace directory '{utf8_parent}' for manifest '{manifest_label}'"
        )
    })?;
    Ok(StdlibConfig::new(dir).with_network_policy(policy))
    let workspace_root = resolve_absolute_workspace_root(utf8_parent)?;
    let dir = Dir::open_ambient_dir(workspace_root.as_path(), ambient_authority()).with_context(
        || {
            format!(
                "failed to open workspace directory '{workspace_root}' \
                 for manifest '{manifest_label}'"
            )
        },
    )?;
    Ok(StdlibConfig::new(dir)
        .with_workspace_root_path(workspace_root)
        .with_network_policy(policy))
}
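
The resolution step added in this hunk (keep an absolute parent as-is, otherwise join it onto the current directory, then insist on UTF-8) can be sketched with the standard library alone. This is a rough illustration under stated assumptions rather than the code in this PR: `absolutize` and `resolve_utf8_root` are hypothetical names, and a plain `String` stands in for `camino::Utf8PathBuf`.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Join `candidate` onto `base` unless it is already absolute.
/// (Hypothetical helper; the PR inlines this branch.)
pub fn absolutize(base: &Path, candidate: &Path) -> PathBuf {
    if candidate.is_absolute() {
        candidate.to_path_buf()
    } else {
        base.join(candidate)
    }
}

/// Resolve `candidate` against the current directory and require valid UTF-8.
/// A `String` is used here in place of `Utf8PathBuf` to stay dependency-free.
pub fn resolve_utf8_root(candidate: &Path) -> io::Result<String> {
    let base = env::current_dir()?;
    absolutize(&base, candidate)
        .into_os_string()
        .into_string()
        .map_err(|invalid| {
            io::Error::new(
                io::ErrorKind::InvalidData,
                format!(
                    "workspace root '{}' contains non-UTF-8 components",
                    invalid.to_string_lossy()
                ),
            )
        })
}
```

Note that neither this sketch nor the joined path it produces is canonicalised: symlinks and `..` components are left untouched, which keeps the function free of filesystem access beyond reading the current directory.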
50 changes: 50 additions & 0 deletions src/manifest/tests.rs
@@ -11,6 +11,7 @@ use minijinja::value::{Kwargs, Value};
use minijinja::{Environment, UndefinedBehavior};
use rstest::{fixture, rstest};
use std::fs;
use std::path::Path;
use tempfile::tempdir;
use test_support::{EnvVarGuard, env_lock::EnvLock, hash, http};
use url::Url;
@@ -294,6 +295,55 @@ fn register_manifest_macros_supports_multiple(
Ok(())
}

#[rstest]
#[case(true)]
#[case(false)]
fn stdlib_config_for_manifest_resolves_workspace_root(#[case] use_relative: bool) -> AnyResult<()> {
    let temp = tempdir().context("create temp workspace")?;
    let _guard = if use_relative {
        Some(CurrentDirGuard::change_to(temp.path())?)
    } else {
        None
    };
    let manifest_path = if use_relative {
        Path::new("Netsukefile").to_path_buf()
    } else {
        temp.path().join("Netsukefile")
    };
    let config = stdlib_config_for_manifest(&manifest_path, NetworkPolicy::default())?;
    let recorded = config
        .workspace_root_path()
        .context("workspace root path should be recorded")?;
    let expected = camino::Utf8Path::from_path(temp.path())
        .context("temp workspace path should be valid UTF-8")?;
    ensure!(
        recorded == expected,
        "expected workspace root {expected}, got {recorded}"
    );
    Ok(())
}

#[cfg(unix)]
#[rstest]
fn stdlib_config_for_manifest_rejects_non_utf_workspace_root() -> AnyResult<()> {
    use std::ffi::OsString;
    use std::os::unix::ffi::OsStringExt;

    let temp = tempdir().context("create temp workspace")?;
    let invalid_component = OsString::from_vec(vec![0xFF]); // invalid standalone byte
    let manifest_dir = temp.path().join(&invalid_component);
    fs::create_dir_all(&manifest_dir)
        .context("create manifest directory with invalid UTF-8 component")?;
    let manifest_path = manifest_dir.join("manifest.yml");
    let err = stdlib_config_for_manifest(&manifest_path, NetworkPolicy::default())
        .expect_err("config should fail when workspace root contains non-UTF-8 components");
    ensure!(
        err.to_string().contains("contains non-UTF-8 components"),
        "error should mention non-UTF-8 components but was {err}"
    );
    Ok(())
}

#[rstest]
fn from_path_uses_manifest_directory_for_caches() -> AnyResult<()> {
let temp = tempdir()?;