fix(scan): stop attributing host-wide findings to per-target scans by jonathansantilli · Pull Request #53 · jonathansantilli/codegate

jonathansantilli · 2026-04-22T12:42:01Z

Summary

Targeted scan <target> reports were surfacing two classes of finding whose file_path lives outside the scan target. When the Mobb Agent Security dashboard groups scans by target, those host-wide findings make every skill look suspicious even though only one is. This PR fixes both leaks and adds regression tests.

Bug 1 — L2 `rule-file-hidden-unicode` (and siblings) scan other skills

Root cause. User-scope wildcard patterns in the KB (e.g. ~/.agents/skills/*/SKILL.md) walk the entire home directory. When the scan target is itself inside $HOME (for example a single skill at ~/.codex/skills/<name>), those wildcards match sibling skills that belong to completely different scans. The hidden-unicode (and every other rule-file) finding for a sibling was attributed to the current scan.

Fix. In src/scan.ts, gate user-scope candidate acceptance with a new shouldKeepUserScopeCandidate helper:

When the scan target is inside the home directory, only keep user-scope matches that are also inside the scan target.
When the scan target is outside the home directory (regular project root scan), keep the existing host-wide behavior — user-scope configs like ~/.cursor/mcp.json remain legitimate context.

Bug 2 — L3 `layer3-network_error` for every host-configured MCP endpoint

Root cause. layer3OutcomesToFindings emitted a LOW layer3-network_error (category PARSE_ERROR) finding whose file_path was the remote URL — e.g. https://mcp.linear.app/mcp — whenever a successful deep-scan outcome carried no metadata.findings[] or metadata.tools[]. The default executeDeepResource in src/cli.ts deliberately makes no outbound calls, so this "schema mismatch" was the norm, not an anomaly. Every scan with deep enabled therefore leaked a host-level finding onto the per-target report.

Fix. Apply the same treatment registry resources already had: silently skip any "ok" outcome whose metadata yields no findings/tools. Genuine fetch-level failures (timeout, auth_failure, command_error, skipped_without_consent) still produce findings.

Tests

New:

tests/layer2/cross-scan-attribution.test.ts — sibling skill inside the same home is no longer attributed to the scanned skill; parent-scope scan still surfaces both skills; project scans outside the home keep user-scope attribution.
tests/layer3/network-error-suppression.test.ts — HTTP/SSE schema mismatches produce no findings while timeout / auth_failure / command_error / skipped_without_consent still do, and actionable tools[] findings are preserved.

Updated (old behavior was encoded as an expectation, same treatment as PR #52):

tests/layer3/layer3-integration.test.ts — the case that asserted the old schema-mismatch finding now asserts its absence.

Totals: 712 → 719 tests (+7), 152 → 154 files (+2).

Test plan

npm run typecheck
npm run lint
npx prettier --check .
npm test (154 files, 719 tests, all passing)
New tests demonstrate the fix (tests/layer2/cross-scan-attribution.test.ts, tests/layer3/network-error-suppression.test.ts)
No changes to CHANGELOG.md or package.json — release tooling handles those

A targeted `scan <target>` currently surfaces two classes of finding whose `file_path` is outside the scan target. When the Mobb Agent Security dashboard groups scans by target, those host-wide findings make every skill look suspicious even though only one is. This fixes both leaks. Layer 2 — hidden-unicode (and every other rule-file detector): when the scan target is itself inside the user's home directory (for example a single skill at `~/.codex/skills/<name>`), user-scope wildcard patterns like `~/.agents/skills/*/SKILL.md` were walking the whole home tree and picking up sibling skills. Add a `shouldKeepUserScopeCandidate` guard that drops user-scope matches outside the scan target in exactly this case; project-root scans outside the home directory keep the existing host-wide context behavior unchanged. Layer 3 — `layer3OutcomesToFindings`: a successful outcome with no `metadata.findings[]` / `metadata.tools[]` was emitting a LOW `layer3-network_error` PARSE_ERROR finding whose `file_path` was the remote MCP URL (e.g. `https://mcp.linear.app/mcp`). The default resource executor deliberately makes no outbound calls, so this "schema mismatch" was the norm, not an anomaly, and the resulting finding polluted every per-target scan report with host-level noise. Treat all resource kinds the same way registry resources were already handled: silently skip when no actionable metadata is present. Tests added: - `tests/layer2/cross-scan-attribution.test.ts` — sibling skill inside the same home is no longer attributed to the scanned skill, but is reported when the scan target is their parent; project scans outside the home still see user-scope files. - `tests/layer3/network-error-suppression.test.ts` — HTTP/SSE schema mismatches produce no findings while timeout / auth_failure / command_error / skipped_without_consent still do. `tests/layer3/layer3-integration.test.ts` is updated to match: the case that asserted the old schema-mismatch finding now asserts its absence.

## [0.14.2](v0.14.1...v0.14.2) (2026-04-22) ### Bug Fixes * **scan:** stop attributing host-wide findings to per-target scans ([#53](#53)) ([77f9627](77f9627))

PR #53 closed the cross-scan leak for skill-dir scans but not for single-file scans of configs like ~/.claude/settings.json. Symptom: $ codegate-ai scan ~/.claude/settings.json → finding with file_path=~/.agents/skills/api-design-guide/.../SKILL.md Root cause: the CLI stages single-file targets into a temp dir outside $HOME. The staged dir is not inside homeDir, so shouldKeepUserScopeCandidate short-circuits to `return true` and every sibling user-scope match (e.g. a hidden-unicode hit in a completely unrelated skill) gets attributed to the config scan. Fix: - cli.ts: when resolvedTarget.explicitCandidates is non-empty (the target was a staged local file), force scan_user_scope=false for that scan. Explicit opt-in via --include-user-scope still overrides. This matches user expectation: "scan this file" ≠ "scan my whole home." - scan.ts: shouldKeepUserScopeCandidate now also handles engine-level file targets correctly (if the target is a file inside homeDir, only the target file itself is a valid user-scope candidate). This is defence in depth for library callers that bypass the CLI. Tests: - Existing 3 cases in tests/layer2/cross-scan-attribution.test.ts still pass. - New: engine-level file-target scan drops sibling user-scope candidates. Verified 154 test files / 720 tests pass. Lint + prettier clean.

jonathansantilli merged commit 77f9627 into main Apr 22, 2026
16 checks passed

jonathansantilli deleted the fix/cross-scan-finding-attribution branch April 22, 2026 12:45

github-actions Bot pushed a commit that referenced this pull request Apr 22, 2026

chore(release): 0.14.2 [skip ci]

98cd395

## [0.14.2](v0.14.1...v0.14.2) (2026-04-22) ### Bug Fixes * **scan:** stop attributing host-wide findings to per-target scans ([#53](#53)) ([77f9627](77f9627))

jonathansantilli mentioned this pull request Apr 22, 2026

fix(scan): stop cross-scan leak on single-file scans of configs #54

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(scan): stop attributing host-wide findings to per-target scans#53

fix(scan): stop attributing host-wide findings to per-target scans#53
jonathansantilli merged 1 commit intomainfrom
fix/cross-scan-finding-attribution

jonathansantilli commented Apr 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jonathansantilli commented Apr 22, 2026

Summary

Bug 1 — L2 rule-file-hidden-unicode (and siblings) scan other skills

Bug 2 — L3 layer3-network_error for every host-configured MCP endpoint

Tests

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bug 1 — L2 `rule-file-hidden-unicode` (and siblings) scan other skills

Bug 2 — L3 `layer3-network_error` for every host-configured MCP endpoint