
fix(layer3): clean remote-resource URLs + configurable timeout & byte-size guards #52

Merged
jonathansantilli merged 2 commits into main from fix/layer3-url-and-timeout on Apr 22, 2026

@jonathansantilli (Owner)

Summary

Three fixes for Layer 3 (deep scan) remote-resource handling.

Bug 1 — Malformed resource IDs (http:https://..., http:http://...)

Resource IDs for HTTP/SSE MCP endpoints were built as ${kind}:${url}, so
the kind (http / sse) collided with the URL's own scheme and produced
values like http:https://mcp.linear.app/mcp. These leaked into
rule_id and file_path on PARSE_ERROR findings.

  • New helper src/layer3-dynamic/url-validation.ts exposing
    normalizeRemoteUrl and buildResourceId.
  • normalizeRemoteUrl validates scheme (http/https only), rejects
    missing-host / empty inputs, normalises trailing slashes.
  • buildResourceId uses the URL itself as the ID for http/sse kinds
    (no more http: prefix collision); keeps <kind>:<locator> for
    npm/pypi/git so isRegistryMetadataResource still works.
  • Wired into collectDeepScanResourcesFromParsed in src/scan.ts.
  • Updated affected expectations across discovery/consent/integration tests
    and added tests/layer3/url-validation.test.ts.
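A minimal sketch of how normalizeRemoteUrl and buildResourceId could behave, using the WHATWG URL parser. The function names come from this PR; the signatures, the ResourceKind type, and the exact trailing-slash rule are assumptions, not the code in src/layer3-dynamic/url-validation.ts.

```typescript
// Hypothetical sketch: function names follow the PR text, but signatures,
// the ResourceKind type and the trailing-slash rule are assumptions.
type ResourceKind = "http" | "sse" | "npm" | "pypi" | "git";

const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// Returns a canonical http(s) URL string, or null when the input is rejected.
function normalizeRemoteUrl(input: string): string | null {
  const trimmed = input.trim();
  if (trimmed === "") return null;

  let url: URL;
  try {
    url = new URL(trimmed); // WHATWG parser; throws on relative or malformed input
  } catch {
    return null;
  }

  // Rejects file:, ftp:, ssh:, javascript:, and anything else non-http(s).
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) return null;
  if (url.hostname === "") return null;

  // Assumed rule: drop trailing slashes from non-root paths ("/mcp/" -> "/mcp").
  if (url.pathname.length > 1) {
    url.pathname = url.pathname.replace(/\/+$/, "");
  }
  return url.toString();
}

function buildResourceId(kind: ResourceKind, locator: string): string | null {
  if (kind === "http" || kind === "sse") {
    // The URL already carries its own scheme, so it serves as the ID directly;
    // prefixing the kind is what produced "http:https://...".
    return normalizeRemoteUrl(locator);
  }
  // Registry kinds keep <kind>:<locator> so prefix checks still work.
  return `${kind}:${locator}`;
}
```

With this sketch, buildResourceId("http", "https://mcp.linear.app/mcp") yields "https://mcp.linear.app/mcp" instead of "http:https://mcp.linear.app/mcp", while buildResourceId("npm", "left-pad") still yields "npm:left-pad".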

Bug 2 — Configurable L3 remote-fetch timeout

  • New config field layer3_remote_fetch_timeout_ms (default 5000ms).
  • Env var override: CODEGATE_LAYER3_REMOTE_FETCH_TIMEOUT_MS (wins over
    project / global config).
  • Timeout is enforced via AbortController in fetchResourceMetadata.
  • No CLI flag — surfaced only via config + env var, per request.
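The AbortController-based timeout can be pictured as below: a fresh controller per attempt, aborted by a timer. This is a sketch only; withTimeout is a hypothetical helper, not part of fetchResourceMetadata's API.

```typescript
// Hypothetical helper: one AbortController per attempt, aborted after timeoutMs.
// `run` must forward the signal (e.g. fetch(url, { signal })) for this to work.
async function withTimeout<T>(
  timeoutMs: number,
  run: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    // Clear the timer on success or failure so it cannot fire later.
    clearTimeout(timer);
  }
}
```

Usage would look like `await withTimeout(5000, (signal) => fetch(url, { signal }))`. Because the controller is created inside the helper, every retry gets its own full budget, which is what a per-attempt timeout implies.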

Bug 3 — Byte-size guard on L3 remote fetches

  • New config field layer3_remote_fetch_max_bytes (default 1 MiB,
    i.e. 1_048_576).
  • Env var override: CODEGATE_LAYER3_REMOTE_FETCH_MAX_BYTES.
  • Enforced in two places:
    1. If the server declares Content-Length > maxBytes, abort before
      reading the body.
    2. While streaming the body, track running byte count and abort as
      soon as the cap is exceeded (covers missing / lying
      Content-Length).
  • Size-limit breaches surface as network_error with a
    response_too_large: ... message and are not retried (retrying
    wouldn't help — the payload is deterministic).
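A sketch of the two-stage size guard, assuming the body is consumed as an async iterable of byte chunks (in Node.js, a web ReadableStream such as a fetch Response body is one). readCapped and isRetryable are hypothetical names; only the response_too_large: message prefix comes from this PR.

```typescript
// Hypothetical sketch of the Content-Length check plus streaming byte count.
async function readCapped(
  declaredLength: string | null,
  chunks: AsyncIterable<Uint8Array>,
  maxBytes: number,
): Promise<Uint8Array> {
  // Stage 1: a declared Content-Length is only trusted to reject early.
  if (declaredLength !== null) {
    const declared = Number(declaredLength);
    if (Number.isFinite(declared) && declared > maxBytes) {
      throw new Error(`response_too_large: declared ${declared} > ${maxBytes}`);
    }
  }
  // Stage 2: count actual bytes, since the header may be missing or wrong.
  // Throwing inside for-await calls the iterator's return(), which cancels
  // a ReadableStream instead of draining the rest of the body.
  const parts: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of chunks) {
    total += chunk.byteLength;
    if (total > maxBytes) {
      throw new Error(`response_too_large: read ${total} > ${maxBytes}`);
    }
    parts.push(chunk);
  }
  // Concatenate the accepted chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.byteLength;
  }
  return out;
}

// Size-limit breaches are deterministic, so retrying them is pointless.
function isRetryable(err: unknown): boolean {
  return !(err instanceof Error && err.message.startsWith("response_too_large:"));
}
```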

Small accompanying cleanup

  • The module that defines fetchResourceMetadata now exports
    DEFAULT_FETCH_TIMEOUT_MS / DEFAULT_FETCH_MAX_BYTES and a
    resourceFetcherOptionsFromConfig helper so callers can pass the new
    limits through with one line.
  • URL normalisation also rejects file://, ftp://, ssh://,
    javascript:, and URLs with no host; see
    tests/layer3/url-validation.test.ts for covered cases.
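One way resourceFetcherOptionsFromConfig could map config onto fetcher options, as a minimal sketch. The two config field names and the default values come from this PR; the Layer3FetchConfig type and the timeoutMs option name are assumptions (the PR only confirms an optional maxBytes field on ResourceFetcherOptions).

```typescript
// Defaults as stated in the PR.
const DEFAULT_FETCH_TIMEOUT_MS = 5000;
const DEFAULT_FETCH_MAX_BYTES = 1_048_576; // 1 MiB

// Hypothetical subset of the config shape.
interface Layer3FetchConfig {
  layer3_remote_fetch_timeout_ms?: number;
  layer3_remote_fetch_max_bytes?: number;
}

// Hypothetical option names; only maxBytes is confirmed by the PR.
interface ResourceFetcherOptions {
  timeoutMs?: number;
  maxBytes?: number;
}

function resourceFetcherOptionsFromConfig(config: Layer3FetchConfig): ResourceFetcherOptions {
  // Missing fields fall back to the built-in defaults.
  return {
    timeoutMs: config.layer3_remote_fetch_timeout_ms ?? DEFAULT_FETCH_TIMEOUT_MS,
    maxBytes: config.layer3_remote_fetch_max_bytes ?? DEFAULT_FETCH_MAX_BYTES,
  };
}
```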

Test plan

  • npm run typecheck — clean
  • npm run lint — clean
  • npm test — 712 passing (10 new, 0 failing)
  • npx prettier --check on all changed files — clean

Relevant new test files:

  • tests/layer3/url-validation.test.ts — scheme / host / normalisation
    cases for normalizeRemoteUrl and buildResourceId, explicitly
    asserting IDs no longer start with http:http:// or http:https://.
  • tests/layer3/resource-fetcher-limits.test.ts — Content-Length reject,
    streaming abort, happy path, abort signal plumbed.
  • tests/config/layer3-remote-fetch-limits.test.ts — defaults, config
    precedence, env var override, invalid env var handling.

Risk / compatibility

  • Resource ID shape changed for http/sse kinds: was
    http:https://example.com/x, now https://example.com/x. Any
    external consumers that pattern-matched IDs starting with http: or
    sse: will need to recognise raw URLs instead. Discovery /
    consent-flow / CLI test fixtures have been updated in this PR.
    npm/pypi/git ID shapes are unchanged.
  • These limits are new (previously there was no cap): 5000ms timeout
    and 1 MiB max response. If an existing custom executeDeepResource
    implementation fetches genuinely large registry payloads these may
    now fail; raise layer3_remote_fetch_max_bytes (or, for slow hosts,
    layer3_remote_fetch_timeout_ms) in config or via the corresponding
    env var to restore previous behaviour.
  • ResourceFetcherOptions gains an optional maxBytes field — purely
    additive, no existing caller signatures change.
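For external consumers, a hypothetical parseResourceId shows one way to accept the new ID shapes: raw http(s) URLs for remote endpoints and <kind>:<locator> for registry resources. Old-style http:https://... IDs fall through to "unknown". Under the new shape, an ID alone no longer records whether the endpoint was http or sse.

```typescript
// Hypothetical consumer-side helper for the post-change ID shapes.
type ParsedResourceId =
  | { kind: "remote"; url: string }
  | { kind: "npm" | "pypi" | "git"; locator: string }
  | { kind: "unknown"; raw: string };

const REGISTRY_KINDS = ["npm", "pypi", "git"] as const;

function parseResourceId(id: string): ParsedResourceId {
  // http/sse resources: the ID is the URL itself.
  if (/^https?:\/\//i.test(id)) return { kind: "remote", url: id };
  // Registry resources keep the <kind>:<locator> prefix.
  for (const kind of REGISTRY_KINDS) {
    if (id.startsWith(`${kind}:`)) return { kind, locator: id.slice(kind.length + 1) };
  }
  return { kind: "unknown", raw: id };
}
```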

Layer 3 resource discovery composed IDs as `${kind}:${url}`, which for
http/sse kinds collided with the URL's own scheme and produced malformed
values like `http:https://mcp.linear.app/mcp`. These IDs leaked into
`rule_id` and `file_path` on Layer 3 findings.

Introduce `url-validation.ts` with `normalizeRemoteUrl` (reject
non-http/https schemes, missing hosts, canonicalise trailing slashes) and
`buildResourceId` (use the URL itself for http/sse kinds, preserve the
`<kind>:<locator>` shape for npm/pypi/git). Wire both into the scan
resource collector so IDs are clean and URLs validated before becoming
findings.
…fetches

A slow or malicious remote host could previously hang a scan or flood it
with gigabytes of data through a Layer 3 resource probe. Harden the
remote fetcher with two new limits:

* `layer3_remote_fetch_timeout_ms` (default 5000) — per-attempt timeout
  enforced via AbortController. Overridable by
  `CODEGATE_LAYER3_REMOTE_FETCH_TIMEOUT_MS`.
* `layer3_remote_fetch_max_bytes` (default 1_048_576) — rejects declared
  Content-Length above the cap immediately, and aborts the streaming
  read once bytes exceed it (so servers that lie about / omit the header
  cannot bypass the guard). Overridable by
  `CODEGATE_LAYER3_REMOTE_FETCH_MAX_BYTES`.

Both limits are exposed only via config file and env var — no CLI flag.
The fetcher exports `resourceFetcherOptionsFromConfig` as a small helper
for callers plumbing the limits from `CodeGateConfig`.
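The precedence described earlier (env var over project config over global config, then the built-in default) can be sketched as a small resolver. resolveLimit is hypothetical, and treating an empty, non-numeric, or non-positive env value as unset is an assumption: the PR's tests cover "invalid env var handling" without spelling out the behaviour.

```typescript
// Hypothetical resolver: env var > project config > global config > default.
function resolveLimit(
  envValue: string | undefined,
  projectValue: number | undefined,
  globalValue: number | undefined,
  fallback: number,
): number {
  if (envValue !== undefined && envValue.trim() !== "") {
    const parsed = Number(envValue);
    if (Number.isInteger(parsed) && parsed > 0) return parsed;
    // Assumed behaviour: an invalid env value is ignored, not fatal.
  }
  return projectValue ?? globalValue ?? fallback;
}
```

A caller might pass `process.env.CODEGATE_LAYER3_REMOTE_FETCH_TIMEOUT_MS`, the project and global values of `layer3_remote_fetch_timeout_ms`, and `5000` as the fallback.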
jonathansantilli merged commit 750c422 into main on Apr 22, 2026
16 checks passed
jonathansantilli deleted the fix/layer3-url-and-timeout branch on April 22, 2026 at 11:25
github-actions (bot) pushed a commit that referenced this pull request on Apr 22, 2026
## [0.14.1](v0.14.0...v0.14.1) (2026-04-22)

### Bug Fixes

* **layer3:** clean remote-resource URLs + configurable timeout & byte-size guards ([#52](#52)) ([750c422](750c422))