Skip to content

test(dry_run): remove dead 'logs allowed tools count' assertion#8

Merged
wtthornton merged 1 commit into
mainfrom
fix-dead-bats-test
May 3, 2026
Merged

test(dry_run): remove dead 'logs allowed tools count' assertion#8
wtthornton merged 1 commit into
mainfrom
fix-dead-bats-test

Conversation

@wtthornton
Copy link
Copy Markdown
Owner

Summary

Unblocks the bash unit test suite, which was failing with # bats warning: Executed 1455 instead of expected 1456 tests since the Test Suite workflow was enabled this session.

The dead test in tests/unit/test_log_rotation_dryrun.bats asserted dry_run_simulate emits a "N tools" log from CLAUDE_ALLOWED_TOOLS — but ADR-0006 deleted that code path. The test silently became "not run" because sourcing ralph_loop.sh in the bats env now triggers an early exit in the post-ADR-0006 startup preflight, so bats counted the @test in 1..N but never produced an ok N line.

Test plan

  • bats tests/unit/test_log_rotation_dryrun.bats → 1..35, 35 pass, no warning
  • CI Test Suite green

🤖 Generated with Claude Code

This test asserted that dry_run_simulate emits "N tools" from
CLAUDE_ALLOWED_TOOLS, but ADR-0006 deleted the legacy `-p` mode and the
--allowedTools allowlist months ago — the log line never fires anymore.

Why it surfaced now: when bats sources ralph_loop.sh in the test
environment, the post-ADR-0006 startup preflight calls `exit`, which
kills the bats subprocess silently. Bats had counted the @test in
the file's 1..N plan but never produced an `ok N` line, so the suite
emitted "Executed 35 instead of expected 36 tests" and CI exited 1.
The assertion only became visible once `gh workflow enable "Test Suite"`
turned on the previously inactive workflow.

Local: 35/35 pass on the file, npm run test:unit no longer warns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wtthornton wtthornton merged commit 33056f7 into main May 3, 2026
4 of 5 checks passed
@wtthornton wtthornton deleted the fix-dead-bats-test branch May 3, 2026 18:01
wtthornton added a commit that referenced this pull request May 4, 2026
…ocked

The eval test "FILE PROTECTION: blocks edit to .ralphrc" asserted the
hook returns exit 2 when no .ralphrc exists in the test fixture dir.
But the hook's actual contract (HOOKS-5 in tests/unit/test_hooks.bats)
is: ALLOW creating a new .ralphrc when absent, BLOCK editing once it
exists. The test was asserting the wrong half of the contract — it
went red the moment the eval suite started running end-to-end (post
PR #8 / #9 fixes that unmasked the eval step).

Split into two tests that match the real invariant:
  1. Edit on EXISTING .ralphrc → blocked (touch then test)
  2. Create on ABSENT .ralphrc → allowed (HOOKS-5 already covers this
     for the hook script directly; this is the eval-level mirror)

Local: 69/69 deterministic evals passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wtthornton added a commit that referenced this pull request May 4, 2026
* test+ci: enforce hook+workflow invariants instead of brittle counts

Five tests were exposed once the bats count mismatch was fixed (PR #8).
Three of the five were brittle assertions that locked out legitimate
plugin ecosystems; two were a real workflow gap. Long-term right fix
is to encode the actual invariants and close the gap, not silence
the tests.

Workflow gap fixed:
- .github/workflows/codeql-analysis.yml now pins `defaults.run.shell:
  bash` per the TAP-667 standard. Previously the only hand-authored
  workflow without this. Two tests (#3, #6) were really one root cause.

Test invariants tightened (instead of "exactly N" counts):
- HOOKS-2: `bash <path>` commands must reference EITHER .ralph/hooks/
  OR .claude/hooks/. The original test rejected .claude/hooks/ entries
  and broke as soon as tapps-mcp registered hooks there.
- "all hook commands start with bash": now also accepts the bare
  .claude/hooks/<name>.sh form that tapps-mcp emits when registering
  Linear MCP governance hooks. Catches garbage paths / tool names
  (Write/Edit) without policing plugin command-emission style.
- "PreToolUse has exactly two entries": rewritten to verify Ralph's
  two safety hooks (Bash → validate-command.sh, Edit|Write →
  protect-ralph-files.sh) are present AND wired to the right scripts.
  Plugin-injected entries are allowed; what's protected is removal
  or rewiring of Ralph's own defenses.

Local: 1455/1455 unit tests passing, no warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(integration): replace dead ALLOWED_TOOLS asserts with negative invariant

Two integration tests asserted that setup.sh-generated .ralphrc shipped
with `ALLOWED_TOOLS="..."` containing Bash(npm *)/Bash(pytest), but
ADR-0006 deleted the legacy `-p` mode and the ALLOWED_TOOLS allowlist
along with it — tool surface now lives in .claude/agents/ralph.md
(`tools:` allowlist + `disallowedTools:` blocklist).

Replaced both with a single negative assertion: if a future change
re-introduces `ALLOWED_TOOLS=` to .ralphrc, this test fires so we
don't silently split the tool-surface contract across two files
again. The positive invariant (tool surface defined in agent file)
is already covered by HOOKS-6 in tests/unit/test_hooks.bats.

Local: full integration suite (203 tests) passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(evals): split .ralphrc protection into create-allowed vs edit-blocked

The eval test "FILE PROTECTION: blocks edit to .ralphrc" asserted the
hook returns exit 2 when no .ralphrc exists in the test fixture dir.
But the hook's actual contract (HOOKS-5 in tests/unit/test_hooks.bats)
is: ALLOW creating a new .ralphrc when absent, BLOCK editing once it
exists. The test was asserting the wrong half of the contract — it
went red the moment the eval suite started running end-to-end (post
PR #8 / #9 fixes that unmasked the eval step).

Split into two tests that match the real invariant:
  1. Edit on EXISTING .ralphrc → blocked (touch then test)
  2. Create on ABSENT .ralphrc → allowed (HOOKS-5 already covers this
     for the hook script directly; this is the eval-level mirror)

Local: 69/69 deterministic evals passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant