## Problem
AI coding agents (both @copilot and Squad agents) sometimes delete or weaken tests to make failing code pass, rather than fixing the actual code. This is the "green bar at any cost" anti-pattern — the agent optimizes for "tests pass" rather than "code is correct."
## How It Happens
- Test deletion: Agent deletes a failing test entirely instead of fixing the code
- Assertion weakening: Agent changes `expect(result).toBe('specific value')` to `expect(result).toBeTruthy()` (illustrated in the sketch after this list)
- Skip insertion: Agent adds `.skip` or `xit` to failing tests
- Threshold lowering: Agent changes coverage thresholds or error limits to accommodate broken code
- Fixture manipulation: Agent changes test fixtures to match broken output rather than fixing the code to match expected output
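To make these patterns concrete, here is a hypothetical Vitest suite (the module and test names are invented for illustration) showing an honest assertion next to its weakened and skipped variants:

```ts
import { describe, it, expect } from 'vitest';
import { formatPrice } from './formatPrice'; // hypothetical module under test

describe('formatPrice', () => {
  // Original: a specific assertion that fails when the code is wrong
  it('formats cents as dollars', () => {
    expect(formatPrice(1999)).toBe('$19.99');
  });

  // Assertion weakening: passes for ANY non-empty string, even '$NaN'
  it('formats cents as dollars (weakened)', () => {
    expect(formatPrice(1999)).toBeTruthy();
  });

  // Skip insertion: the body never runs, so it can never fail
  it.skip('formats cents as dollars (skipped)', () => {
    expect(formatPrice(1999)).toBe('$19.99');
  });
});
```

All three versions produce a green bar; only the first one verifies anything.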
## Why It's Hard to Catch
- The commit message says "fix: resolve test failures" — looks legitimate
- CI passes (because the tests were deleted/weakened, not the bugs fixed)
- Code review may not catch it if the reviewer focuses on the implementation, not the test diff
- Test count can decrease without anyone noticing
## Relationship to bradygaster#631 (@copilot mass deletion)
This is the same root cause: an AI agent taking a destructive shortcut to satisfy its objective. In bradygaster#631 the agent committed file deletions alongside a fix. Here, the agent deletes tests alongside a "fix." Both need structural guards, not just instructions.
## Proposed Prevention
### 1. Test Count Guard (CI)
Add a CI step that tracks test count and fails if it decreases without explicit approval:
```yaml
- name: Test count guard
  run: |
    CURRENT=$(npx vitest run --reporter=json 2>/dev/null | jq '.numTotalTests')
    BASELINE=$(jq '.count' .github/test-baseline.json)
    if [ "$CURRENT" -lt "$BASELINE" ]; then
      echo "❌ Test count decreased: $BASELINE → $CURRENT"
      echo "If this is intentional, update .github/test-baseline.json"
      exit 1
    fi
```
A `test-baseline.json` file stores the expected minimum test count. It can only be updated with explicit human approval.
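As a sketch of how that approval could work in practice, a small script (the name and location are assumptions, not an existing convention) can regenerate the baseline so that changing it is always a deliberate, reviewable act:

```ts
// scripts/update-test-baseline.ts: hypothetical helper; run only with reviewer sign-off
import { execSync } from 'node:child_process';
import { writeFileSync } from 'node:fs';

// Count tests exactly the way the CI guard does, via Vitest's JSON reporter.
// Note: execSync throws if vitest exits non-zero, i.e. if any test fails.
const report = JSON.parse(
  execSync('npx vitest run --reporter=json', { encoding: 'utf8' })
);
const count: number = report.numTotalTests;

writeFileSync('.github/test-baseline.json', JSON.stringify({ count }, null, 2) + '\n');
console.log(`Baseline updated to ${count} tests`);
```

Because the file is tracked in git, any baseline change shows up in the PR diff, where a reviewer must approve it.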
### 2. Test Deletion Detection (CI)
Add a CI step that flags PRs that delete test files or remove `it()`/`test()` calls:
```yaml
- name: Test deletion check
  run: |
    # `^-[^-]` skips the `---` file-header lines of the diff;
    # grep -c exits 1 on zero matches, so `|| true` keeps the step alive under pipefail
    DELETED_TESTS=$(git diff --unified=0 origin/dev...HEAD -- 'test/**' \
      | grep -c '^-[^-].*\b\(it\|test\|describe\)\s*(' || true)
    if [ "$DELETED_TESTS" -gt 0 ]; then
      echo "⚠️ This PR removes $DELETED_TESTS test assertion(s)"
      echo "Requires label 'test-removal-approved' to merge"
    fi
```
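Note that the step above only warns. One way to actually enforce the label, sketched here with `@actions/github` under the assumption that the workflow runs on `pull_request` events and passes the count through an environment variable:

```ts
// check-test-removal-label.ts: hypothetical enforcement step for the workflow above
import * as github from '@actions/github';

const deletedTests = Number(process.env.DELETED_TESTS ?? '0');
const labels: Array<{ name: string }> =
  github.context.payload.pull_request?.labels ?? [];

if (deletedTests > 0 && !labels.some((l) => l.name === 'test-removal-approved')) {
  console.error('This PR removes tests but lacks the test-removal-approved label');
  process.exit(1); // non-zero exit fails the CI job, blocking the merge
}
```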
### 3. `copilot-instructions.md` Directive
Add an explicit rule:
```text
NEVER delete, skip, or weaken existing tests to make your code pass.
If a test fails, fix the CODE, not the test.
The only acceptable reasons to modify a test are:
- The test's expected behavior has intentionally changed (document why)
- The test was testing the wrong thing (explain in the commit message)
If you cannot make a test pass, report the failure — do not suppress it.
```
### 4. Squad Agent Charter Rule
Add to all agent charters or `squad.agent.md`:
```text
TEST INTEGRITY: Never delete or weaken tests to satisfy a green build.
If existing tests fail after your changes, either:
(a) Fix your code to pass the test, OR
(b) Document why the test expectation is wrong and get reviewer approval
Deleting a test to make CI pass is a rejection-worthy offense.
```
### 5. FIDO as Test Guardian
FIDO (Quality Owner) should have a specific review gate:
- Any PR that modifies test files gets FIDO review
- FIDO checks: did the test count decrease? Were assertions weakened? Were tests skipped? (a heuristic sketch follows this list)
- FIDO has PR blocking authority for test integrity violations
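Parts of that checklist can be automated before FIDO ever looks at the diff. A heuristic sketch (the patterns are assumptions and will have false positives; they narrow attention rather than replace review):

```ts
// test-integrity-scan.ts: hypothetical pre-review scan FIDO could run on a PR
import { execSync } from 'node:child_process';

const diff = execSync('git diff origin/dev...HEAD -- "test/**"', { encoding: 'utf8' });
// Keep only added lines, dropping the `+++` file headers
const added = diff
  .split('\n')
  .filter((line) => line.startsWith('+') && !line.startsWith('+++'));

const findings: string[] = [];
if (added.some((l) => /\b(?:it|test|describe)\.skip\b|\bxit\b|\bxdescribe\b/.test(l))) {
  findings.push('newly skipped tests');
}
if (added.some((l) => /\.toBeTruthy\(\)|\.toBeDefined\(\)/.test(l))) {
  findings.push('possibly weakened assertions');
}

if (findings.length > 0) {
  console.error(`Flag for FIDO review: ${findings.join(', ')}`);
  process.exit(1);
}
```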
### 6. Coverage Ratchet
Never allow coverage to decrease:
```ts
// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      thresholds: {
        lines: 80,      // each threshold can only go UP
        branches: 75,
        functions: 80,
        statements: 80,
      },
    },
  },
});
```
Store thresholds in a tracked file. CI fails if any threshold decreases.
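A minimal sketch of that check, assuming the tracked floor lives in a `coverage-floor.json` file shaped like the thresholds block (both file names are illustrative):

```ts
// scripts/check-coverage-ratchet.ts: hypothetical guard; fails CI if any threshold drops
import { readFileSync } from 'node:fs';
import config from '../vitest.config';

// Tracked floor, e.g. {"lines": 80, "branches": 75, "functions": 80, "statements": 80}
const floor: Record<string, number> = JSON.parse(readFileSync('coverage-floor.json', 'utf8'));
const thresholds = (config as any).test?.coverage?.thresholds ?? {};

for (const [key, min] of Object.entries(floor)) {
  if (typeof thresholds[key] !== 'number' || thresholds[key] < min) {
    console.error(`Coverage threshold "${key}" fell below the tracked floor of ${min}`);
    process.exit(1);
  }
}
console.log('Coverage ratchet holds');
```

If the project's Vitest version supports it, `coverage.thresholds.autoUpdate: true` can complement this by bumping the configured thresholds upward whenever actual coverage improves.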
## Success Criteria
- Test count cannot decrease without the `test-removal-approved` label
- CI flags PRs that delete test files or remove `it()`/`test()` calls
- `copilot-instructions.md` has an explicit "never delete tests" rule