Process gap — silent test coverage erosion.
The problem
scripts/verify_test_fidelity.py reports every TS it("...") that has no matching Python test_*():
```
MISSING: [Options Load] should call onOptionsLoad handler for a matching action ID
MISSING: [Options Load] should prefer specific handlers before catch-all handlers
...
TOTAL: 548/588 matched (93%), 40 missing, 3 absorbers
```
But it exits 0 regardless of how many are missing. CI prints "Run with --fix to generate stubs for missing tests" and moves on. New upstream tests land unported, undetected, for releases at a time.
Why it's bad
This is how we ended up with:
Every one of those shows up in fidelity output. Every one got ignored because the script exits 0.
Fix
Option A — strict: exit non-zero when `missing > 0`. Simple, forces discipline.
Option B — graduated: exit non-zero when `missing > K` (where K is the current count, i.e. treat today's gap as the baseline and ratchet down). Allows incremental adoption without a big-bang PR.
Option C — per-file threshold: allow configured gaps per file in a `fidelity-baseline.json` (like pyrefly's). New misses must be explicitly acknowledged.
I'd recommend Option A for post-4.26 PRs + a separate catch-up PR that either ports the 40 missing tests or explicitly marks them as "intentionally skipped" with a reason.
Acceptance
Process gap — silent test coverage erosion.
The problem
scripts/verify_test_fidelity.pyreports every TSit("...")that has no matching Pythontest_*():```
MISSING: [Options Load] should call onOptionsLoad handler for a matching action ID
MISSING: [Options Load] should prefer specific handlers before catch-all handlers
...
TOTAL: 548/588 matched (93%), 40 missing, 3 absorbers
```
But it exits 0 regardless of how many are missing. CI prints "Run with --fix to generate stubs for missing tests" and moves on. New upstream tests land unported, undetected, for releases at a time.
Why it's bad
This is how we ended up with:
onOptionsLoadhandler for dynamic select dropdown population #50)rehydrate_attachmentto Adapter protocol +_rehydrate_message#52)Every one of those shows up in fidelity output. Every one got ignored because the script exits 0.
Fix
Option A — strict: exit non-zero when `missing > 0`. Simple, forces discipline.
Option B — graduated: exit non-zero when `missing > K` (where K is the current count, i.e. treat today's gap as the baseline and ratchet down). Allows incremental adoption without a big-bang PR.
Option C — per-file threshold: allow configured gaps per file in a `fidelity-baseline.json` (like pyrefly's). New misses must be explicitly acknowledged.
I'd recommend Option A for post-4.26 PRs + a separate catch-up PR that either ports the 40 missing tests or explicitly marks them as "intentionally skipped" with a reason.
Acceptance