Skip to content

Add worldforge runs prune retention policy#297

Open
adrienlacombe wants to merge 2 commits into
AbdelStark:mainfrom
adrienlacombe:runs-prune-retention
Open

Add worldforge runs prune retention policy#297
adrienlacombe wants to merge 2 commits into
AbdelStark:mainfrom
adrienlacombe:runs-prune-retention

Conversation

@adrienlacombe
Copy link
Copy Markdown
Contributor

Summary

  • Adds worldforge.runs_prune and worldforge runs prune. Implements WF-FEAT3-003 (WF-FEAT3-003: Add run artifact retention policy with worldforge runs prune #282).
  • New surface: RunsRetentionPolicy, PruneCandidate, PruneReport, plan_prune, apply_prune, parse_runs_retention, RUNS_PRUNE_SCHEMA_VERSION (all re-exported on the top-level package).
  • Default behavior is dry-run; --apply actually deletes. --max-age-days (default 30), --keep-latest (default 10), repeatable --family <kind> filter, --retention-profile <path> reads runs_retention from a non-secret config profile (CLI flags override).
  • Safety rails: 24-hour fresh-run safety window blocks deletion unless --max-age-days=0 is passed explicitly; refuses to operate on paths outside <workspace>/runs/; corrupted directories (missing manifest) only become delete candidates when also outside the --keep-latest window.

Acceptance criteria (from #282)

  • CLI works with documented flags and refuses unsafe paths
  • Config-profile schema includes runs_retention parsed via parse_runs_retention
  • Dry-run is the default; --apply is required to delete
  • Tests cover happy path, family filter, refusal, config-profile integration (18 tests)
  • Docs explain when to use which flag (docs/src/run-index.md retention section now contrasts runs cleanup vs runs prune)

Validation

  • uv run pytest tests/test_runs_prune.py tests/test_run_index.py tests/test_provider_config.py — 53 passed
  • uv run pytest — 1262 passed, 85 skipped
  • uv lock --check, uv run ruff check, uv run ruff format --check, uv run mkdocs build --strict — clean
  • bash scripts/test_package.sh — 1261 passed
  • Sample CLI smoke:
    • worldforge runs prune --workspace-dir .worldforge --format markdown → dry-run report, no deletion
    • With --apply and --max-age-days=0 --keep-latest=0 on a tmp workspace, the older run directory is removed and the newer kept-latest survives

Out of scope (per #282)

  • Pruning evidence bundles or release evidence (separate retention concerns)
  • Cloud / external storage retention
  • Background daemons

Companion to

🤖 Generated with Claude Code

Copy link
Copy Markdown
Collaborator

@omarespejel omarespejel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mirrored this into my fork for extra CodeRabbit/Qodo review. I think this is close, but I would fix a few retention-path issues before merging because this command deletes local artifacts.

  1. Retention profile precedence misses --flag=value forms.

The CLI checks exact strings in sys.argv, so valid argparse forms like --max-age-days=7 or --keep-latest=2 are treated as absent and can be overwritten by profile values. Since the docs say CLI flags override the profile, this can change deletion behavior.

I would use argparse defaults like None or SUPPRESS, or a helper that detects both --flag value and --flag=value, plus regression tests for both forms.

  1. The prune path safety check is too loose.

_ensure_inside_runs() only checks whether "runs" appears somewhere in the resolved path. A symlink or crafted report path under another runs/ directory could pass even if it is outside the intended workspace. Since apply_prune() eventually calls shutil.rmtree(), I think this should verify the resolved deletion target is under the exact resolved <workspace>/runs root before deleting.

Something like target.resolve().is_relative_to((workspace / "runs").resolve()) is the important property to enforce, with a symlink/crafted-report regression test.

  1. keep_latest + --family semantics are ambiguous and may surprise users.

Right now keep_latest is computed before applying families. If --family benchmark is used, a newer non-benchmark run can consume the keep slot and the newest benchmark run can become deletable. I think keep_latest should be computed inside the filtered family set, or the docs should explicitly say it is global across all run kinds.

Smaller follow-up: invalid retention profile JSON shapes should fail as WorldForgeError, not leak AttributeError/ValueError, and shutil.rmtree() failures should be wrapped so CLI users get a stable error instead of a traceback.

adrienlacombe pushed a commit to adrienlacombe/worldforge that referenced this pull request May 18, 2026
Address review feedback on AbdelStark#297:

- Replace the lenient substring check in `_ensure_inside_runs` with a
  resolved-path `is_relative_to(runs_root)` comparison so crafted report
  paths and symlinks pointing outside `<workspace>/runs/` are rejected
  before `shutil.rmtree`.
- Wrap `shutil.rmtree` failures in `WorldForgeError` so CLI callers see
  the stable error envelope rather than a traceback.
- Scope `keep_latest` to the family filter: only entries that match the
  current `--family` selection (or all entries when no filter is set)
  populate the keep window, so a newer non-matching run can no longer
  consume the slot reserved for the filtered family.
- Switch the `--max-age-days` and `--keep-latest` argparse defaults to
  `None` and decide profile precedence from the parsed value rather than
  from a substring search on `sys.argv`. Profile values now correctly
  yield to either the `--flag value` or `--flag=value` form.
- Validate retention profile shapes in `parse_runs_retention`: non-int
  `max_age_days` / `keep_latest` (including `bool`) and non-string
  family entries raise `WorldForgeError` instead of bubbling up
  `AttributeError` / `ValueError`.
- Document the new keep_latest-within-family rule, the
  `--flag=value` precedence guarantee, and the resolved-path delete
  check in `docs/src/run-index.md` and the changelog.
@adrienlacombe
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @omarespejel. All three blocking concerns plus the two smaller follow-ups landed in 826707f:

  1. Tight path check: _ensure_inside_runs now compares resolved paths with target.is_relative_to(runs_root). Two new regressions cover (a) a crafted path like <tmp>/fake_runs/runs/victim and (b) a symlink under <workspace>/runs/ pointing outside the tree — both are refused, and the symlink target is left untouched on disk.
  2. --flag=value precedence: switched the --max-age-days and --keep-latest argparse defaults to None and decide precedence on the parsed value rather than via a substring lookup on sys.argv. Added a regression that mixes --retention-profile <p> with --max-age-days=999 and asserts the CLI value wins.
  3. keep_latest + --family semantics: keep_latest is now computed inside the filtered family set so a newer non-matching run cannot consume the keep slot. Documented the new rule under docs/src/run-index.md and added a regression that proves the newest benchmark is kept while neighbouring evals are skipped by family.

Follow-ups also addressed:

  • parse_runs_retention now raises WorldForgeError on non-int / non-string field shapes (including bool) instead of bubbling AttributeError or ValueError. Four new tests cover that.
  • shutil.rmtree failures during --apply are wrapped as WorldForgeError for a stable CLI envelope. Regression uses monkeypatch to inject an OSError and asserts the wrapped message.

claude added 2 commits May 18, 2026 08:00
The new worldforge.runs_prune module ships RunsRetentionPolicy,
PruneCandidate, PruneReport, plan_prune, apply_prune, and
parse_runs_retention. `worldforge runs prune` walks
`<workspace>/runs/` read-only by default and reports which run
workspaces fall outside the requested retention; --apply actually
removes them.

Policy fields: --max-age-days (default 30), --keep-latest (default
10), and repeatable --family <kind> (filter to manifest kinds like
eval, benchmark, flow). A 24-hour safety window blocks deletion of
fresh runs unless --max-age-days=0 is passed explicitly. The command
refuses to operate on paths outside <workspace>/runs/.

A non-secret config profile can carry a runs_retention block (read via
--retention-profile <path>); explicit CLI flags still override profile
values.

Run-index docs gained a retention/cleanup section comparing `runs
cleanup` (blanket keep-N trim) against `runs prune` (typed policy with
dry-run default). Artifact schemas table now lists
RUNS_PRUNE_SCHEMA_VERSION.

Closes AbdelStark#282 (WF-FEAT3-003).
Address review feedback on AbdelStark#297:

- Replace the lenient substring check in `_ensure_inside_runs` with a
  resolved-path `is_relative_to(runs_root)` comparison so crafted report
  paths and symlinks pointing outside `<workspace>/runs/` are rejected
  before `shutil.rmtree`.
- Wrap `shutil.rmtree` failures in `WorldForgeError` so CLI callers see
  the stable error envelope rather than a traceback.
- Scope `keep_latest` to the family filter: only entries that match the
  current `--family` selection (or all entries when no filter is set)
  populate the keep window, so a newer non-matching run can no longer
  consume the slot reserved for the filtered family.
- Switch the `--max-age-days` and `--keep-latest` argparse defaults to
  `None` and decide profile precedence from the parsed value rather than
  from a substring search on `sys.argv`. Profile values now correctly
  yield to either the `--flag value` or `--flag=value` form.
- Validate retention profile shapes in `parse_runs_retention`: non-int
  `max_age_days` / `keep_latest` (including `bool`) and non-string
  family entries raise `WorldForgeError` instead of bubbling up
  `AttributeError` / `ValueError`.
- Document the new keep_latest-within-family rule, the
  `--flag=value` precedence guarantee, and the resolved-path delete
  check in `docs/src/run-index.md` and the changelog.
@adrienlacombe adrienlacombe force-pushed the runs-prune-retention branch from 826707f to 6d6d8d6 Compare May 18, 2026 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants