fix(layer3): clean remote-resource URLs + configurable timeout & byte-size guards #52
Merged
jonathansantilli merged 2 commits into main on Apr 22, 2026
Layer 3 resource discovery composed IDs as `${kind}:${url}`, which for
http/sse kinds collided with the URL's own scheme and produced malformed
values like `http:https://mcp.linear.app/mcp`. These IDs leaked into
`rule_id` and `file_path` on Layer 3 findings.
Introduce `url-validation.ts` with `normalizeRemoteUrl` (reject non
http/https schemes, missing hosts, canonicalise trailing slashes) and
`buildResourceId` (use the URL itself for http/sse kinds, preserve the
`<kind>:<locator>` shape for npm/pypi/git). Wire both into the scan
resource collector so IDs are clean and URLs validated before becoming
findings.
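
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from `url-validation.ts`: the function names come from the commit message, but the signatures, the `ResourceKind` type, the `null` return for rejected input, and the choice to drop fragments are all assumptions.

```typescript
// Hypothetical sketch. Only the two function names are taken from the commit
// message; signatures and return conventions are assumptions.
type ResourceKind = "http" | "sse" | "npm" | "pypi" | "git";

// Returns a canonical http(s) URL, or null when the input is rejected.
function normalizeRemoteUrl(input: string): string | null {
  const trimmed = input.trim();
  if (trimmed === "") return null; // empty input

  let parsed: URL;
  try {
    parsed = new URL(trimmed);
  } catch (_err) {
    return null; // not an absolute URL (this also covers a missing host)
  }

  // Only http and https are accepted; file:, ftp:, ssh:, javascript: are not.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  if (parsed.hostname === "") return null;

  // Assumed canonical form: strip trailing slashes from the path and drop
  // any fragment, keeping host, port and query string.
  const path = parsed.pathname.replace(/\/+$/, "");
  return `${parsed.protocol}//${parsed.host}${path}${parsed.search}`;
}

// http/sse kinds use the URL itself as the ID; registry kinds keep the
// `<kind>:<locator>` shape.
function buildResourceId(kind: ResourceKind, locator: string): string {
  if (kind === "http" || kind === "sse") {
    const url = normalizeRemoteUrl(locator);
    if (url === null) {
      throw new Error(`invalid remote URL: ${locator}`);
    }
    return url;
  }
  return `${kind}:${locator}`;
}
```

With this sketch, `buildResourceId("http", "https://mcp.linear.app/mcp")` yields the bare URL rather than a value prefixed with `http:`, while `buildResourceId("npm", "left-pad")` keeps its `npm:` prefix.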
…fetches

A slow or malicious remote host could previously hang a scan or flood it
with gigabytes of data through a Layer 3 resource probe. Harden the
remote fetcher with two new limits:

* `layer3_remote_fetch_timeout_ms` (default 5000): per-attempt timeout
  enforced via AbortController. Overridable by
  `CODEGATE_LAYER3_REMOTE_FETCH_TIMEOUT_MS`.
* `layer3_remote_fetch_max_bytes` (default 1_048_576): rejects declared
  Content-Length above the cap immediately, and aborts the streaming read
  once bytes exceed it (so servers that lie about or omit the header
  cannot bypass the guard). Overridable by
  `CODEGATE_LAYER3_REMOTE_FETCH_MAX_BYTES`.

Both limits are exposed only via config file and env var; there is no
CLI flag. The fetcher exports `resourceFetcherOptionsFromConfig` as a
small helper for callers plumbing the limits from `CodeGateConfig`.
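
A rough sketch of how the two guards can combine in one fetch attempt is shown below. It is not the project's fetcher: `fetchWithLimits`, `FetchLimits`, and the injectable `fetchImpl` parameter are hypothetical names, and the text after `response_too_large:` in the error messages is invented for illustration.

```typescript
// Hypothetical sketch of a single fetch attempt with a timeout and a byte cap.
interface FetchLimits {
  timeoutMs: number; // plays the role of layer3_remote_fetch_timeout_ms
  maxBytes: number; // plays the role of layer3_remote_fetch_max_bytes
}

async function fetchWithLimits(
  url: string,
  limits: FetchLimits,
  fetchImpl: typeof fetch = fetch, // injectable so the guards can be tested offline
): Promise<Uint8Array> {
  const controller = new AbortController();
  // The timer aborts the whole attempt (headers and body) when it fires.
  const timer = setTimeout(() => controller.abort(), limits.timeoutMs);
  try {
    const response = await fetchImpl(url, { signal: controller.signal });

    // Guard 1: a declared Content-Length above the cap is rejected before
    // any body bytes are read.
    const header = response.headers.get("content-length");
    const declared = header === null ? NaN : Number(header);
    if (Number.isFinite(declared) && declared > limits.maxBytes) {
      controller.abort();
      throw new Error(`response_too_large: declared ${declared} bytes`);
    }

    // Guard 2: count streamed bytes, so a missing or understated header
    // cannot bypass the cap.
    if (response.body === null) return new Uint8Array(0);
    const reader = response.body.getReader();
    const chunks: Uint8Array[] = [];
    let received = 0;
    for (;;) {
      const { done, value } = await reader.read();
      if (done || value === undefined) break;
      received += value.byteLength;
      if (received > limits.maxBytes) {
        await reader.cancel();
        throw new Error(`response_too_large: exceeded ${limits.maxBytes} bytes`);
      }
      chunks.push(value);
    }

    // Concatenate the chunks into one buffer.
    const out = new Uint8Array(received);
    let offset = 0;
    for (const chunk of chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return out;
  } finally {
    clearTimeout(timer);
  }
}
```

The header check is only a fast path. The streaming counter is what actually enforces the limit, which is why it runs even when a `Content-Length` within the cap was declared.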
github-actions Bot pushed a commit that referenced this pull request on Apr 22, 2026
## [0.14.1](v0.14.0...v0.14.1) (2026-04-22)

### Bug Fixes

* **layer3:** clean remote-resource URLs + configurable timeout & byte-size guards ([#52](#52)) ([750c422](750c422))
Summary
Three fixes for Layer 3 (deep scan) remote-resource handling.
Bug 1: Malformed resource IDs (`http:https://...`, `http:http://...`)

Resource IDs for HTTP/SSE MCP endpoints were built as `${kind}:${url}`, so the kind (`http`/`sse`) collided with the URL's own scheme and produced values like `http:https://mcp.linear.app/mcp`. These leaked into `rule_id` and `file_path` on `PARSE_ERROR` findings.

- New module `src/layer3-dynamic/url-validation.ts` exposing `normalizeRemoteUrl` and `buildResourceId`.
- `normalizeRemoteUrl` validates the scheme (http/https only), rejects missing-host and empty inputs, and normalises trailing slashes.
- `buildResourceId` uses the URL itself as the ID for http/sse kinds (no more `http:` prefix collision) and keeps `<kind>:<locator>` for npm/pypi/git so `isRegistryMetadataResource` still works.
- Both are wired into `collectDeepScanResourcesFromParsed` in `src/scan.ts`.
- Existing test fixtures were updated, and `tests/layer3/url-validation.test.ts` was added.

Bug 2: Configurable L3 remote-fetch timeout

- New config key `layer3_remote_fetch_timeout_ms` (default 5000 ms).
- Env var override `CODEGATE_LAYER3_REMOTE_FETCH_TIMEOUT_MS` (wins over project / global config).
- Enforced via `AbortController` in `fetchResourceMetadata`.

Bug 3: Byte-size guard on L3 remote fetches

- New config key `layer3_remote_fetch_max_bytes` (default 1 MiB, i.e. `1_048_576`).
- Env var override `CODEGATE_LAYER3_REMOTE_FETCH_MAX_BYTES`.
- If the response declares `Content-Length > maxBytes`, abort before reading the body.
- Otherwise, stream the body and abort as soon as the cap is exceeded (covers missing / lying `Content-Length`).
- Oversized responses are reported as `network_error` with a `response_too_large: ...` message and are not retried (retrying wouldn't help, since the payload is deterministic).
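
The precedence between env var, config, and default could be sketched as below. This is an assumption-laden illustration rather than the project's resolver: `resolveLimit` and `resolveLayer3Limits` are hypothetical names, and treating an invalid env value as "ignore and fall through" is a guess, since the PR only says invalid env vars are handled.

```typescript
// Hypothetical sketch of limit resolution: env var > config file > default.
const DEFAULT_FETCH_TIMEOUT_MS = 5000;
const DEFAULT_FETCH_MAX_BYTES = 1_048_576; // 1 MiB

interface Layer3Limits {
  layer3_remote_fetch_timeout_ms: number;
  layer3_remote_fetch_max_bytes: number;
}

function resolveLimit(
  envValue: string | undefined,
  configValue: number | undefined,
  fallback: number,
): number {
  if (envValue !== undefined) {
    const parsed = Number(envValue);
    // Accept only positive integers from the environment.
    if (Number.isInteger(parsed) && parsed > 0) return parsed;
    // Assumption: an invalid env value is ignored rather than treated as fatal.
  }
  return configValue ?? fallback;
}

// `env` is passed in explicitly (e.g. process.env) to keep the function pure.
function resolveLayer3Limits(
  env: Record<string, string | undefined>,
  config: Partial<Layer3Limits>,
): Layer3Limits {
  return {
    layer3_remote_fetch_timeout_ms: resolveLimit(
      env["CODEGATE_LAYER3_REMOTE_FETCH_TIMEOUT_MS"],
      config.layer3_remote_fetch_timeout_ms,
      DEFAULT_FETCH_TIMEOUT_MS,
    ),
    layer3_remote_fetch_max_bytes: resolveLimit(
      env["CODEGATE_LAYER3_REMOTE_FETCH_MAX_BYTES"],
      config.layer3_remote_fetch_max_bytes,
      DEFAULT_FETCH_MAX_BYTES,
    ),
  };
}
```

For example, with `CODEGATE_LAYER3_REMOTE_FETCH_TIMEOUT_MS=250` set and a config value of 9000, this sketch resolves the timeout to 250 and leaves the byte cap at its default.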
Small accompanying cleanup
- `fetchResourceMetadata` now exposes `DEFAULT_FETCH_TIMEOUT_MS` / `DEFAULT_FETCH_MAX_BYTES` and a `resourceFetcherOptionsFromConfig` helper so callers can pass the new limits through with one line.
- URL validation rejects `file://`, `ftp://`, `ssh://`, `javascript:`, and schemes with no host; see `tests/layer3/url-validation.test.ts` for covered cases.

Test plan

- `npm run typecheck`: clean
- `npm run lint`: clean
- `npm test`: 712 passing (10 new, 0 failing)
- `npx prettier --check` on all changed files: clean

Relevant new test files:

- `tests/layer3/url-validation.test.ts`: scheme / host / normalisation cases for `normalizeRemoteUrl` and `buildResourceId`, explicitly asserting IDs no longer start with `http:http://` or `http:https://`.
- `tests/layer3/resource-fetcher-limits.test.ts`: Content-Length reject, streaming abort, happy path, abort signal plumbed.
- `tests/config/layer3-remote-fetch-limits.test.ts`: defaults, config precedence, env var override, invalid env var handling.
Risk / compatibility
- The resource ID shape changed for http/sse kinds: previously `http:https://example.com/x`, now `https://example.com/x`. Any external consumers that pattern-matched IDs starting with `http:` or `sse:` will need to recognise raw URLs instead. Discovery / consent-flow / CLI test fixtures have been updated in this PR.
- npm/pypi/git ID shapes are unchanged.
- Remote fetches now default to a 5000 ms timeout and a 1 MiB max response. If an existing custom `executeDeepResource` implementation fetches genuinely large registry payloads these may now fail; raise `layer3_remote_fetch_max_bytes` in config or via env var to restore previous behaviour.
- `ResourceFetcherOptions` gains an optional `maxBytes` field; this is purely additive, and no existing caller signatures change.
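
For external consumers that stored or matched the old IDs, a small compatibility shim along these lines may be enough. It is a sketch only: `toCleanResourceId` is a hypothetical helper that is not part of this PR, and it assumes the legacy shape was always `http:` or `sse:` followed directly by a full http(s) URL.

```typescript
// Hypothetical consumer-side helper, not part of this PR.
// Matches the legacy shape: "http:" or "sse:" glued in front of an http(s) URL.
const LEGACY_REMOTE_ID = /^(?:http|sse):(https?:\/\/.+)$/;

// Maps a legacy ID to the new raw-URL shape; anything else passes through.
function toCleanResourceId(id: string): string {
  const match = LEGACY_REMOTE_ID.exec(id);
  return match?.[1] ?? id;
}
```

Here `toCleanResourceId("http:https://example.com/x")` gives `https://example.com/x`, while `npm:left-pad` and IDs that are already raw URLs are returned unchanged.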