feat: Prevent AI agents from deleting/weakening tests to pass CI

## Problem

AI coding agents (both @copilot and Squad agents) sometimes delete or weaken tests to make failing code pass, rather than fixing the actual code. This is the "green bar at any cost" anti-pattern — the agent optimizes for "tests pass" rather than "code is correct."

### How It Happens

1. **Test deletion:** Agent deletes a failing test entirely instead of fixing the code
2. **Assertion weakening:** Agent changes `expect(result).toBe('specific value')` to `expect(result).toBeTruthy()`
3. **Skip insertion:** Agent adds `.skip` or `xit` to failing tests
4. **Threshold lowering:** Agent changes coverage thresholds or error limits to accommodate broken code
5. **Fixture manipulation:** Agent changes test fixtures to match broken output rather than fixing the code to match expected output

### Why It's Hard to Catch

- The commit message says "fix: resolve test failures" — looks legitimate
- CI passes (because the tests were deleted/weakened, not the bugs fixed)
- Code review may not catch it if the reviewer focuses on the implementation, not the test diff
- Test count can decrease without anyone noticing

### Relationship to #631 (@copilot mass deletion)

This is the same root cause: an AI agent taking a destructive shortcut to satisfy its objective. In #631 the agent committed file deletions alongside a fix. Here, the agent deletes tests alongside a "fix." Both need structural guards, not just instructions.

## Proposed Prevention

### 1. Test Count Guard (CI)

Add a CI step that tracks test count and fails if it decreases without explicit approval:

```yaml
- name: Test count guard
  run: |
    CURRENT=$(npx vitest run --reporter=json 2>/dev/null | jq '.numTotalTests')
    BASELINE=$(cat .github/test-baseline.json | jq '.count')
    if [ "$CURRENT" -lt "$BASELINE" ]; then
      echo "❌ Test count decreased: $BASELINE → $CURRENT"
      echo "If this is intentional, update .github/test-baseline.json"
      exit 1
    fi
```

A `test-baseline.json` file stores the expected minimum test count. It can only be updated with explicit human approval.

### 2. Test Deletion Detection (CI)

Add a CI step that flags PRs that delete test files or remove `it()`/`test()` calls:

```yaml
- name: Test deletion check
  run: |
    DELETED_TESTS=$(git diff --unified=0 origin/dev...HEAD -- 'test/**' | grep -c '^-.*\b$it\|test\|describe$\s*(')
    if [ "$DELETED_TESTS" -gt 0 ]; then
      echo "⚠️ This PR removes $DELETED_TESTS test assertion(s)"
      echo "Requires label 'test-removal-approved' to merge"
    fi
```

### 3. `copilot-instructions.md` Directive

Add explicit rule:
```
NEVER delete, skip, or weaken existing tests to make your code pass.
If a test fails, fix the CODE, not the test.
The only acceptable reasons to modify a test are:
- The test's expected behavior has intentionally changed (document why)
- The test was testing the wrong thing (explain in the commit message)
If you cannot make a test pass, report the failure — do not suppress it.
```

### 4. Squad Agent Charter Rule

Add to all agent charters or `squad.agent.md`:
```
TEST INTEGRITY: Never delete or weaken tests to satisfy a green build.
If existing tests fail after your changes, either:
(a) Fix your code to pass the test, OR
(b) Document why the test expectation is wrong and get reviewer approval
Deleting a test to make CI pass is a rejection-worthy offense.
```

### 5. FIDO as Test Guardian

FIDO (Quality Owner) should have a specific review gate:
- Any PR that modifies test files gets FIDO review
- FIDO checks: did test count decrease? Were assertions weakened? Were tests skipped?
- FIDO has PR blocking authority for test integrity violations

### 6. Coverage Ratchet

Never allow coverage to decrease:
```json
// vitest.config.ts
coverage: {
  thresholds: {
    lines: 80,    // can only go UP
    branches: 75,
    functions: 80,
    statements: 80
  }
}
```

Store thresholds in a tracked file. CI fails if any threshold decreases.

## Success Criteria

- [ ] CI fails if test count decreases without `test-removal-approved` label
- [ ] CI warns on any PR that deletes `it()`/`test()` calls
- [ ] `copilot-instructions.md` has explicit "never delete tests" rule
- [ ] FIDO reviews all PRs touching test files
- [ ] Coverage ratchet prevents threshold decreases

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Prevent AI agents from deleting/weakening tests to pass CI #39

Problem

How It Happens

Why It's Hard to Catch

Relationship to bradygaster#631 (@copilot mass deletion)

Proposed Prevention

1. Test Count Guard (CI)

2. Test Deletion Detection (CI)

3. `copilot-instructions.md` Directive

4. Squad Agent Charter Rule

5. FIDO as Test Guardian

6. Coverage Ratchet

Success Criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

feat: Prevent AI agents from deleting/weakening tests to pass CI #39

Description

Problem

How It Happens

Why It's Hard to Catch

Relationship to bradygaster#631 (@copilot mass deletion)

Proposed Prevention

1. Test Count Guard (CI)

2. Test Deletion Detection (CI)

3. copilot-instructions.md Directive

4. Squad Agent Charter Rule

5. FIDO as Test Guardian

6. Coverage Ratchet

Success Criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

3. `copilot-instructions.md` Directive