Skip to content

fix(scan): stop attributing host-wide findings to per-target scans#53

Merged
jonathansantilli merged 1 commit intomainfrom
fix/cross-scan-finding-attribution
Apr 22, 2026
Merged

fix(scan): stop attributing host-wide findings to per-target scans#53
jonathansantilli merged 1 commit intomainfrom
fix/cross-scan-finding-attribution

Conversation

@jonathansantilli
Copy link
Copy Markdown
Owner

Summary

Targeted scan <target> reports were surfacing two classes of finding whose file_path lives outside the scan target. When the Mobb Agent Security dashboard groups scans by target, those host-wide findings make every skill look suspicious even though only one is. This PR fixes both leaks and adds regression tests.

Bug 1 — L2 rule-file-hidden-unicode (and siblings) scan other skills

Root cause. User-scope wildcard patterns in the KB (e.g. ~/.agents/skills/*/SKILL.md) walk the entire home directory. When the scan target is itself inside $HOME (for example a single skill at ~/.codex/skills/<name>), those wildcards match sibling skills that belong to completely different scans. The hidden-unicode (and every other rule-file) finding for a sibling was attributed to the current scan.

Fix. In src/scan.ts, gate user-scope candidate acceptance with a new shouldKeepUserScopeCandidate helper:

  • When the scan target is inside the home directory, only keep user-scope matches that are also inside the scan target.
  • When the scan target is outside the home directory (regular project root scan), keep the existing host-wide behavior — user-scope configs like ~/.cursor/mcp.json remain legitimate context.

Bug 2 — L3 layer3-network_error for every host-configured MCP endpoint

Root cause. layer3OutcomesToFindings emitted a LOW layer3-network_error (category PARSE_ERROR) finding whose file_path was the remote URL — e.g. https://mcp.linear.app/mcp — whenever a successful deep-scan outcome carried no metadata.findings[] or metadata.tools[]. The default executeDeepResource in src/cli.ts deliberately makes no outbound calls, so this "schema mismatch" was the norm, not an anomaly. Every scan with deep enabled therefore leaked a host-level finding onto the per-target report.

Fix. Apply the same treatment registry resources already had: silently skip any "ok" outcome whose metadata yields no findings/tools. Genuine fetch-level failures (timeout, auth_failure, command_error, skipped_without_consent) still produce findings.

Tests

New:

  • tests/layer2/cross-scan-attribution.test.ts — sibling skill inside the same home is no longer attributed to the scanned skill; parent-scope scan still surfaces both skills; project scans outside the home keep user-scope attribution.
  • tests/layer3/network-error-suppression.test.ts — HTTP/SSE schema mismatches produce no findings while timeout / auth_failure / command_error / skipped_without_consent still do, and actionable tools[] findings are preserved.

Updated (old behavior was encoded as an expectation, same treatment as PR #52):

  • tests/layer3/layer3-integration.test.ts — the case that asserted the old schema-mismatch finding now asserts its absence.

Totals: 712 → 719 tests (+7), 152 → 154 files (+2).

Test plan

  • npm run typecheck
  • npm run lint
  • npx prettier --check .
  • npm test (154 files, 719 tests, all passing)
  • New tests demonstrate the fix (tests/layer2/cross-scan-attribution.test.ts, tests/layer3/network-error-suppression.test.ts)
  • No changes to CHANGELOG.md or package.json — release tooling handles those

A targeted `scan <target>` currently surfaces two classes of finding whose
`file_path` is outside the scan target. When the Mobb Agent Security
dashboard groups scans by target, those host-wide findings make every
skill look suspicious even though only one is.

This fixes both leaks.

Layer 2 — hidden-unicode (and every other rule-file detector): when the
scan target is itself inside the user's home directory (for example a
single skill at `~/.codex/skills/<name>`), user-scope wildcard patterns
like `~/.agents/skills/*/SKILL.md` were walking the whole home tree and
picking up sibling skills. Add a `shouldKeepUserScopeCandidate` guard
that drops user-scope matches outside the scan target in exactly this
case; project-root scans outside the home directory keep the existing
host-wide context behavior unchanged.

Layer 3 — `layer3OutcomesToFindings`: a successful outcome with no
`metadata.findings[]` / `metadata.tools[]` was emitting a LOW
`layer3-network_error` PARSE_ERROR finding whose `file_path` was the
remote MCP URL (e.g. `https://mcp.linear.app/mcp`). The default resource
executor deliberately makes no outbound calls, so this "schema mismatch"
was the norm, not an anomaly, and the resulting finding polluted every
per-target scan report with host-level noise. Treat all resource kinds
the same way registry resources were already handled: silently skip
when no actionable metadata is present.

Tests added:
- `tests/layer2/cross-scan-attribution.test.ts` — sibling skill inside
  the same home is no longer attributed to the scanned skill, but is
  reported when the scan target is their parent; project scans outside
  the home still see user-scope files.
- `tests/layer3/network-error-suppression.test.ts` — HTTP/SSE schema
  mismatches produce no findings while timeout / auth_failure /
  command_error / skipped_without_consent still do.

`tests/layer3/layer3-integration.test.ts` is updated to match: the case
that asserted the old schema-mismatch finding now asserts its absence.
@jonathansantilli jonathansantilli merged commit 77f9627 into main Apr 22, 2026
16 checks passed
@jonathansantilli jonathansantilli deleted the fix/cross-scan-finding-attribution branch April 22, 2026 12:45
github-actions Bot pushed a commit that referenced this pull request Apr 22, 2026
## [0.14.2](v0.14.1...v0.14.2) (2026-04-22)

### Bug Fixes

* **scan:** stop attributing host-wide findings to per-target scans ([#53](#53)) ([77f9627](77f9627))
jonathansantilli added a commit that referenced this pull request Apr 22, 2026
PR #53 closed the cross-scan leak for skill-dir scans but not for
single-file scans of configs like ~/.claude/settings.json. Symptom:

  $ codegate-ai scan ~/.claude/settings.json
  → finding with file_path=~/.agents/skills/api-design-guide/.../SKILL.md

Root cause: the CLI stages single-file targets into a temp dir outside
$HOME. The staged dir is not inside homeDir, so
shouldKeepUserScopeCandidate short-circuits to `return true` and every
sibling user-scope match (e.g. a hidden-unicode hit in a completely
unrelated skill) gets attributed to the config scan.

Fix:
- cli.ts: when resolvedTarget.explicitCandidates is non-empty (the
  target was a staged local file), force scan_user_scope=false for that
  scan. Explicit opt-in via --include-user-scope still overrides. This
  matches user expectation: "scan this file" ≠ "scan my whole home."
- scan.ts: shouldKeepUserScopeCandidate now also handles engine-level
  file targets correctly (if the target is a file inside homeDir, only
  the target file itself is a valid user-scope candidate). This is
  defence in depth for library callers that bypass the CLI.

Tests:
- Existing 3 cases in tests/layer2/cross-scan-attribution.test.ts still
  pass.
- New: engine-level file-target scan drops sibling user-scope candidates.

Verified 154 test files / 720 tests pass. Lint + prettier clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant