fix(scan): stop cross-scan leak on single-file scans of configs #54

Merged
jonathansantilli merged 1 commit into main from fix/cross-scan-file-target-attribution
Apr 22, 2026

@jonathansantilli
Owner

Summary

Closes the remaining cross-scan leak left open by #53. Scanning a single file under $HOME (e.g. ~/.claude/settings.json) still surfaced findings from unrelated siblings like ~/.agents/skills/*/SKILL.md.

Reproducer (0.14.2)

$ npx codegate-ai@0.14.2 scan ~/.claude/settings.json --format json
scan_target: /Users/me/.claude/settings.json
findings:
  - HIGH rule-file-hidden-unicode
    file_path: ~/.agents/skills/api-design-guide/domains/rest/SKILL.md

The finding comes from an unrelated skill file and doesn't belong to that scan.

Root cause

The CLI stages single-file targets via stageLocalFile, which copies them into a temp dir (outside $HOME). The scan engine then runs against the staged temp dir. In shouldKeepUserScopeCandidate:

  • scanTarget = /tmp/codegate-scan-target-xxx
  • homeDir = /Users/me
  • isPathInside(homeDir, scanTarget) = false
  • → the function returns true for every candidate
  • → the user-scope walk of $HOME leaks every hidden-unicode match back into the scan
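A minimal TypeScript sketch of that short-circuit, for illustration only. The signature of shouldKeepUserScopeCandidate, the body of isPathInside, and the in-home branch (assumed here to keep only candidates under the scan target, standing in for #53's rule) are assumptions, not the code in src/scan.ts.

```typescript
import * as path from "node:path";

// Hypothetical helper: true when `child` equals `parent` or lies beneath it.
function isPathInside(parent: string, child: string): boolean {
  const rel = path.posix.relative(parent, child);
  return (
    rel === "" ||
    (rel !== ".." && !rel.startsWith("../") && !path.posix.isAbsolute(rel))
  );
}

// Assumed shape of the filter before #54 (illustrative, not src/scan.ts).
function shouldKeepUserScopeCandidate(
  candidate: string,
  scanTarget: string,
  homeDir: string,
): boolean {
  if (!isPathInside(homeDir, scanTarget)) {
    // A staged temp dir is never inside homeDir, so every candidate is kept.
    return true;
  }
  // Stand-in for #53's rule: keep only candidates under the scan target.
  return isPathInside(scanTarget, candidate);
}

const homeDir = "/Users/me";
const stagedTarget = "/tmp/codegate-scan-target-xxx";
const sibling =
  "/Users/me/.agents/skills/api-design-guide/domains/rest/SKILL.md";

// Staged single-file scan: the unrelated SKILL.md is kept (the leak).
console.log(shouldKeepUserScopeCandidate(sibling, stagedTarget, homeDir));
// Directory scan inside $HOME: the same candidate is dropped.
console.log(shouldKeepUserScopeCandidate(sibling, "/Users/me/.claude", homeDir));
```

The first call returns true purely because the staged path sits outside homeDir; the candidate itself is never inspected.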

#53 scoped its fix to scan targets inside homeDir. Staged file targets aren't inside homeDir, so the fix never applied to them.

Fix

CLI layer (primary): when resolvedTarget.explicitCandidates is non-empty (i.e. the raw target was a local file, now staged), force scan_user_scope = false for that scan. Explicit opt-in via --include-user-scope still overrides. This matches user expectation: "scan this file" ≠ "scan my whole home."
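A sketch of that CLI-layer gate under hypothetical shapes: ResolvedScanTarget is reduced to the one field the gate reads, the effectiveUserScope helper is invented for illustration, and the includeUserScopeFlag parameter stands in for --include-user-scope.

```typescript
// Hypothetical, reduced shapes: the real types carry more fields.
interface ResolvedScanTarget {
  scanTarget: string;
  explicitCandidates: string[];
}

interface ScanConfig {
  scan_user_scope: boolean;
}

// Sketch of the PR #54 gate. `includeUserScopeFlag` stands in for
// --include-user-scope; `effectiveUserScope` is an invented name.
function effectiveUserScope(
  resolved: ResolvedScanTarget,
  baseConfig: ScanConfig,
  includeUserScopeFlag: boolean,
): boolean {
  if (includeUserScopeFlag) return true; // explicit opt-in always wins
  if (resolved.explicitCandidates.length > 0) return false; // staged local file
  return baseConfig.scan_user_scope; // otherwise keep the configured default
}

const stagedJson: ResolvedScanTarget = {
  scanTarget: "/tmp/codegate-scan-target-xxx",
  explicitCandidates: ["/tmp/codegate-scan-target-xxx/settings.json"],
};
const base: ScanConfig = { scan_user_scope: true };

console.log(effectiveUserScope(stagedJson, base, false)); // false: walk disabled
console.log(effectiveUserScope(stagedJson, base, true)); // true: opt-in overrides
```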

Engine layer (defence in depth): shouldKeepUserScopeCandidate now also handles engine-level file targets. If the target is a file inside homeDir, only the file itself is a valid user-scope candidate. Library callers bypassing the CLI get the same guarantee.
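The engine-layer check might look roughly like this. It is a sketch, not src/scan.ts: the targetIsFile parameter and the directory fallback (a plain containment check standing in for #53's rule) are assumptions.

```typescript
import * as path from "node:path";

// Hypothetical helper: true when `child` equals `parent` or lies beneath it.
function isPathInside(parent: string, child: string): boolean {
  const rel = path.posix.relative(parent, child);
  return (
    rel === "" ||
    (rel !== ".." && !rel.startsWith("../") && !path.posix.isAbsolute(rel))
  );
}

// Sketch of the engine-layer guard. `targetIsFile` is an assumed parameter.
function shouldKeepUserScopeCandidate(
  candidate: string,
  scanTarget: string,
  targetIsFile: boolean,
  homeDir: string,
): boolean {
  if (!isPathInside(homeDir, scanTarget)) {
    // Unchanged: targets outside $HOME are left to the CLI-layer gate.
    return true;
  }
  if (targetIsFile) {
    // File target inside $HOME: only the file itself is a valid candidate.
    return path.posix.resolve(candidate) === path.posix.resolve(scanTarget);
  }
  // Directory target: stand-in for the #53 containment rule (assumed).
  return isPathInside(scanTarget, candidate);
}

const homeDir = "/Users/me";
const settings = "/Users/me/.claude/settings.json";
const sibling =
  "/Users/me/.agents/skills/api-design-guide/domains/rest/SKILL.md";

console.log(shouldKeepUserScopeCandidate(settings, settings, true, homeDir)); // true
console.log(shouldKeepUserScopeCandidate(sibling, settings, true, homeDir)); // false
```

Targets outside homeDir still return true in this sketch; per the PR, staged files are covered by the CLI gate rather than by the engine.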

Tests

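tests/layer2/cross-scan-attribution.test.ts:

  • Existing 3 cases still pass.
  • New: engine-level file-target scan drops sibling user-scope candidates.

Checks

  • 154 test files / 720 tests pass. Lint + prettier clean.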

Key files

  • src/cli.ts — CLI-layer fix
  • src/scan.ts — engine-layer fix (shouldKeepUserScopeCandidate)
  • tests/layer2/cross-scan-attribution.test.ts — new engine-level file-target test

PR #53 closed the cross-scan leak for skill-dir scans but not for
single-file scans of configs like ~/.claude/settings.json. Symptom:

  $ codegate-ai scan ~/.claude/settings.json
  → finding with file_path=~/.agents/skills/api-design-guide/.../SKILL.md

Root cause: the CLI stages single-file targets into a temp dir outside
$HOME. The staged dir is not inside homeDir, so
shouldKeepUserScopeCandidate short-circuits to `return true` and every
sibling user-scope match (e.g. a hidden-unicode hit in a completely
unrelated skill) gets attributed to the config scan.

Fix:
- cli.ts: when resolvedTarget.explicitCandidates is non-empty (the
  target was a staged local file), force scan_user_scope=false for that
  scan. Explicit opt-in via --include-user-scope still overrides. This
  matches user expectation: "scan this file" ≠ "scan my whole home."
- scan.ts: shouldKeepUserScopeCandidate now also handles engine-level
  file targets correctly (if the target is a file inside homeDir, only
  the target file itself is a valid user-scope candidate). This is
  defence in depth for library callers that bypass the CLI.

Tests:
- Existing 3 cases in tests/layer2/cross-scan-attribution.test.ts still
  pass.
- New: engine-level file-target scan drops sibling user-scope candidates.

Verified 154 test files / 720 tests pass. Lint + prettier clean.
jonathansantilli merged commit 6799651 into main Apr 22, 2026
16 checks passed
jonathansantilli deleted the fix/cross-scan-file-target-attribution branch April 22, 2026 17:03
github-actions[bot] pushed a commit that referenced this pull request Apr 22, 2026
## [0.14.3](v0.14.2...v0.14.3) (2026-04-22)

### Bug Fixes

* **scan:** disable user-scope walk when CLI scans a single file ([#54](#54)) ([6799651](6799651))
jonathansantilli added a commit that referenced this pull request Apr 22, 2026
#55)

PR #54 disabled the user-scope walk when the CLI scanned a single local
file, by gating on `explicitCandidates.length > 0`. That gate fails for
files whose extension `inferTextLikeFormat` does not recognise (e.g.
`.idea/workspace.xml`, unusually named `.env` files, binary-ish configs):
`collectExplicitCandidates` returns `[]` for them, so the guard never
fires and sibling user-scope findings (e.g. a hidden-unicode hit in
`~/.agents/skills/foo/SKILL.md`) leak into the scan of the unrelated
file.
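To see why the gate misses these files, here is a toy model. Both inferTextLikeFormat and collectExplicitCandidates below are hypothetical stand-ins with a deliberately small extension table; the real functions recognise a different set of formats.

```typescript
import * as path from "node:path";

// Hypothetical stand-in with a deliberately small extension table.
function inferTextLikeFormat(filePath: string): string | null {
  const known: Record<string, string> = {
    ".json": "json",
    ".md": "markdown",
    ".toml": "toml",
    ".yaml": "yaml",
  };
  return known[path.posix.extname(filePath).toLowerCase()] ?? null;
}

// Hypothetical stand-in: keeps only files whose format was recognised.
function collectExplicitCandidates(files: string[]): string[] {
  return files.filter((file) => inferTextLikeFormat(file) !== null);
}

// The PR #54 gate, reduced to the check it performed.
function pr54DisablesUserScope(explicitCandidates: string[]): boolean {
  return explicitCandidates.length > 0;
}

const stagedDir = "/tmp/codegate-scan-target-xxx";
const xmlCandidates = collectExplicitCandidates([`${stagedDir}/workspace.xml`]);
const jsonCandidates = collectExplicitCandidates([`${stagedDir}/settings.json`]);

// .xml is unrecognised: no candidates, so the user-scope walk still runs.
console.log(xmlCandidates.length, pr54DisablesUserScope(xmlCandidates));
// .json is recognised: one candidate, so the walk is disabled.
console.log(jsonCandidates.length, pr54DisablesUserScope(jsonCandidates));
```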

Reproducer (0.14.3):

  $ npx codegate-ai scan ~/workspace/.idea/workspace.xml --format json
  scan_target: .../.idea/workspace.xml
  findings:
    - HIGH rule-file-hidden-unicode
      file_path: ~/.agents/skills/api-design-guide/domains/rest/SKILL.md

## Fix

Add a `stagedFromLocalFile: true` flag to `ResolvedScanTarget`, set
from `stageLocalFile`. The CLI gate now uses this flag directly:

  scan_user_scope =
    --include-user-scope ? true
    : stagedFromLocalFile ? false
    : baseConfig.scan_user_scope

Because `stageLocalFile` sets the flag for every staged file, the signal
doesn't depend on whether the file's extension was recognised. It covers
every file type with no per-format maintenance.
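The gate above, sketched in TypeScript under the same caveats: only `stagedFromLocalFile`, `explicitCandidates`, and `scan_user_scope` come from the PR; the interface shape, making the flag optional, and the `effectiveUserScope` helper are illustrative assumptions.

```typescript
// Hypothetical, reduced shape. stagedFromLocalFile and explicitCandidates
// are named in the PR; making the flag optional is an assumption.
interface ResolvedScanTarget {
  scanTarget: string;
  explicitCandidates: string[];
  stagedFromLocalFile?: boolean; // set by stageLocalFile
}

interface ScanConfig {
  scan_user_scope: boolean;
}

// Sketch of the PR #55 gate; `includeUserScopeFlag` stands in for
// --include-user-scope and `effectiveUserScope` is an invented name.
function effectiveUserScope(
  resolved: ResolvedScanTarget,
  baseConfig: ScanConfig,
  includeUserScopeFlag: boolean,
): boolean {
  if (includeUserScopeFlag) return true; // explicit opt-in always wins
  if (resolved.stagedFromLocalFile) return false; // any staged file, any format
  return baseConfig.scan_user_scope;
}

const base: ScanConfig = { scan_user_scope: true };
const stagedXml: ResolvedScanTarget = {
  scanTarget: "/tmp/codegate-scan-target-xxx",
  explicitCandidates: [], // extension not recognised
  stagedFromLocalFile: true,
};
const repoDir: ResolvedScanTarget = {
  scanTarget: "/Users/me/workspace",
  explicitCandidates: [],
};

console.log(effectiveUserScope(stagedXml, base, false)); // false
console.log(effectiveUserScope(repoDir, base, false)); // true (config default)
```

Because the decision no longer reads `explicitCandidates`, a staged `.xml` file with an empty candidate list is treated the same as a staged `.json` file.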

## Tests

`tests/scan-target.test.ts`:
- Staged `.xml` file gets `stagedFromLocalFile=true` AND empty
  `explicitCandidates` (the PR #54 gate would have failed here).
- Staged `.json` file also gets `stagedFromLocalFile=true` and
  populated `explicitCandidates` — no regression on the happy path.

All 155 files / 722 tests pass. Lint + prettier + typecheck clean.
github-actions[bot] pushed a commit that referenced this pull request Apr 22, 2026
## [0.14.4](v0.14.3...v0.14.4) (2026-04-22)

### Bug Fixes

* **scan:** close cross-scan leak for single-file targets of any format ([#55](#55)) ([46e2148](46e2148)), closes [#54](#54) [#54](#54)