security(wave2): SSRF protection, path sanitization, and keyed hashing#239
security(wave2): SSRF protection, path sanitization, and keyed hashing#239KooshaPari wants to merge 1 commit intomainfrom
Conversation
- Add SSRF protection in api_tools.go: validateResolvedHostIPs blocks private/loopback IPs - Add path sanitization in kiro/token.go: cleanTokenPath prevents path traversal - Replace sha256 with HMAC for sensitive ID hashing in conductor.go, types.go, user_id_cache.go - Reject URLs with user info in validateAPICallURL and copilotQuotaURLFromTokenURL - Redact logged request/response bodies with SHA256 hash for auditability - Sanitize websocket session IDs and endpoints before logging Addresses Code Scanning alerts: - go/request-forgery - go/clear-text-logging - go/weak-sensitive-data-hashing - go/path-injection Tests: - pkg/llmproxy/api/middleware: pass - pkg/llmproxy/registry: pass - sdk/cliproxy/auth: pass - internal/runtime/executor: pass Pre-existing issues (not introduced by this PR): - executor packages have undefined normalizeGeminiCLIModel build failure - kiro auth has duplicate roundTripperFunc declaration in test files - path traversal test expects 400 but gets 500 (blocked correctly, wrong status code)
| continue | ||
| } | ||
| current = filepath.Join(current, component) | ||
| info, errStat := os.Lstat(current) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, to fix “uncontrolled data used in path expression” you must ensure that any user-derived component used in paths is either (a) reduced to a single safe file name (no separators, no ..) or (b) validated to reside within an expected base directory after normalization, and avoid following symlinks where not desired. This codebase already performs directory-level validation (ResolveSafeFilePath, normalizePathWithinBase, denySymlinkPath), and ExtractIDCIdentifier already removes obvious unsafe characters from the identifier. The missing piece, from the analyzer’s perspective, is guaranteeing that the generated file name cannot contain path separators or traversal segments, regardless of how Email or StartURL are manipulated.
The best targeted fix is to (1) tighten GenerateTokenFileName so that it produces a conservative, strictly safe file name: explicitly strip or replace all characters that might introduce path separators or traversal (including .., additional punctuation, or non-alphanumerics) and, optionally, perform a final check that the result contains no /, \, or ..; and (2) add a small explicit validation of the fileName before it is combined with authDir in saveTokenToFile. This preserves existing behavior (files still named based on authMethod, email, or IDC identifier), but removes any theoretical avenue for an unsafe file name to be generated from tainted input and ensures that the path components used in denySymlinkPath are always benign file/directory names.
Concretely:
- In
pkg/llmproxy/auth/kiro/aws.go:- Extend
GenerateTokenFileNameso that:- The sanitized email or IDC identifier is further constrained to a safe character set (e.g.,
[a-zA-Z0-9_-]), replacing any other character with_. - After constructing the final file name (
kiro-...json), verify that it contains no/,\, or..; if it would, fall back to a conservative default likekiro-<authMethod>.json.
- The sanitized email or IDC identifier is further constrained to a safe character set (e.g.,
- This does not change visible functionality in normal cases (valid emails and start URLs already only generate alphanumerics, dashes, and dots), but guarantees safe file names for any weird or unexpected input.
- Extend
- In
pkg/llmproxy/auth/kiro/oauth_web.go:- After
fileName := GenerateTokenFileName(tokenData), add a minimal defensive check ensuring the resultingfileNameis a single component (no separators and no..). If invalid, log an error and either abort saving or fall back to a generic file name (e.g.,kiro-<authMethod>.json). This extra guard both protectsauthDirand makes the data flow obviously safe to the analyzer.
- After
No changes are required to token.go or misc/path_security.go for correctness, as those already provide strong normalization and containment checking.
| @@ -383,6 +383,13 @@ | ||
| // Generate filename using the unified function | ||
| fileName := GenerateTokenFileName(tokenData) | ||
|
|
||
| // Ensure the generated file name is a single path component | ||
| if strings.ContainsAny(fileName, `/\`) || strings.Contains(fileName, "..") || strings.TrimSpace(fileName) == "" { | ||
| log.Errorf("OAuth Web: generated unsafe token file name %q, falling back to default", fileName) | ||
| fallback := "kiro-unknown.json" | ||
| fileName = fallback | ||
| } | ||
|
|
||
| authFilePath := filepath.Join(authDir, fileName) | ||
|
|
||
| // Convert to storage format and save |
| func logCodexWebsocketDisconnected(sessionID, authID, wsURL, reason string, err error) { | ||
| if err != nil { | ||
| log.Infof("codex websockets: upstream disconnected session=%s auth=%s url=%s reason=%s err=%v", strings.TrimSpace(sessionID), sanitizeCodexWebsocketLogField(authID), sanitizeCodexWebsocketLogURL(wsURL), strings.TrimSpace(reason), err) | ||
| log.Infof("codex websockets: upstream disconnected session=%s auth=%s endpoint=%s reason=%s err=%v", sanitizeCodexSessionID(sessionID), sanitizeCodexWebsocketLogField(authID), sanitizeCodexWebsocketLogEndpoint(wsURL), strings.TrimSpace(reason), err) |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
General approach: never log potentially sensitive, tainted data directly. In this case, that means avoiding logging the full error value from thinking.ApplyThinking in websocket disconnect logs, or at least ensuring the logged form of that error does not expose sensitive fields (like model IDs derived from API-key model aliases). Since we’re constrained to the provided snippets, the safest and least invasive fix is to sanitize the error before passing it to logCodexWebsocketDisconnected, so logs show only non-sensitive information.
Best targeted fix:
- In
pkg/llmproxy/executor/codex_websockets_executor.go, inside theExecutemethod’sdeferthat logs websocket disconnects, introduce a localsafeErrthat either:- preserves non-sensitive error information, or
- replaces the original error with a generic error wrapper that does not embed sensitive data.
- Then pass
safeErrintologCodexWebsocketDisconnectedinstead of the rawerr. - This change is localized to logging and does not alter functional behavior of the executor or error propagation; callers still see the original
erras the function’s return value. - We can construct a generic
fmt.Errorf("execution failed: %T", err)(or even justerrors.New("execution failed")) to retain at least the error type in logs while not printing the message that might contain sensitive content.
Concretely:
- Edit the
deferblock at lines 291–296 (within theExecutefunction) inpkg/llmproxy/executor/codex_websockets_executor.go. - Replace the direct call
logCodexWebsocketDisconnected(..., reason, err)with logic that computessafeErrand uses that instead.
No changes are needed in the thinking or registry packages for this specific logging sink.
| @@ -293,7 +293,12 @@ | ||
| if err != nil { | ||
| reason = "error" | ||
| } | ||
| logCodexWebsocketDisconnected(executionSessionID, authID, wsURL, reason, err) | ||
| var safeErr error | ||
| if err != nil { | ||
| // Avoid logging potentially sensitive details from upstream errors. | ||
| safeErr = fmt.Errorf("execution failed: %T", err) | ||
| } | ||
| logCodexWebsocketDisconnected(executionSessionID, authID, wsURL, reason, safeErr) | ||
| if errClose := conn.Close(); errClose != nil { | ||
| log.Errorf("codex websockets executor: close websocket error: %v", errClose) | ||
| } |
| if trimmed == "" { | ||
| return "" | ||
| } | ||
| sum := sha256.Sum256([]byte(trimmed)) |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
General approach: For password hashing, you would replace SHA-256 with a password hashing algorithm (Argon2, scrypt, bcrypt, PBKDF2). Here, however, the code is not performing password hashing; it’s deriving an opaque session label for logs. To both (a) avoid misuse of SHA-256 in a context tainted as “password-like” and (b) avoid unnecessary cryptographic operations, we can replace the SHA-256 call with a fast, non-cryptographic hash (for example hash/fnv) and truncate its output as before. This preserves functionality (stable, short pseudonymous ID) while removing any appearance of password hashing.
Concretely, in pkg/llmproxy/executor/codex_websockets_executor.go:
- Remove the import of
crypto/sha256(no longer needed in this file). - Add an import of
hash/fnv. - In
sanitizeCodexSessionID, replace:with code that:sum := sha256.Sum256([]byte(trimmed)) return fmt.Sprintf("sess_%x", sum[:6])
- Creates an FNV-1a 64-bit hasher.
- Writes the trimmed string bytes into the hasher.
- Gets the 64-bit hash and formats it as hex, truncating to the same number of hex characters as before (12 hex characters for 6 bytes).
No changes are required in sdk/cliproxy/auth/conductor.go, because it only participates in the taint flow; the hashing actually occurs in sanitizeCodexSessionID.
| @@ -5,8 +5,8 @@ | ||
| import ( | ||
| "bytes" | ||
| "context" | ||
| "crypto/sha256" | ||
| "fmt" | ||
| "hash/fnv" | ||
| "io" | ||
| "net" | ||
| "net/http" | ||
| @@ -1343,8 +1342,11 @@ | ||
| if trimmed == "" { | ||
| return "" | ||
| } | ||
| sum := sha256.Sum256([]byte(trimmed)) | ||
| return fmt.Sprintf("sess_%x", sum[:6]) | ||
| h := fnv.New64a() | ||
| _, _ = h.Write([]byte(trimmed)) | ||
| // Keep the identifier short and opaque, similar to the previous 6-byte SHA-256 prefix. | ||
| hashVal := h.Sum64() | ||
| return fmt.Sprintf("sess_%012x", hashVal&0xfffffffffff) | ||
| } | ||
|
|
||
| // CodexAutoExecutor routes Codex requests to the websocket transport only when: |
| if trimmed == "" { | ||
| return "" | ||
| } | ||
| sum := sha256.Sum256([]byte(trimmed)) |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data High
Copilot Autofix
AI about 2 months ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| func logCodexWebsocketDisconnected(sessionID string, authID string, wsURL string, reason string, err error) { | ||
| if err != nil { | ||
| log.Infof("codex websockets: upstream disconnected session=%s auth=%s url=%s reason=%s err=%v", strings.TrimSpace(sessionID), sanitizeCodexWebsocketLogField(authID), sanitizeCodexWebsocketLogURL(wsURL), strings.TrimSpace(reason), err) | ||
| log.Infof("codex websockets: upstream disconnected session=%s auth=%s endpoint=%s reason=%s err=%v", sanitizeCodexSessionID(sessionID), sanitizeCodexWebsocketLogField(authID), sanitizeCodexWebsocketLogEndpoint(wsURL), strings.TrimSpace(reason), err) |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
General fix strategy: avoid logging raw error messages that may contain sensitive or tenant-specific data, especially at info level. Instead, either (a) do not log the error at all in that context, or (b) log only non-sensitive metadata (e.g., that an error occurred) and, if needed, a sanitized version of the error that strips or masks any potentially sensitive substrings (e.g., API keys).
Best concrete fix here: adjust logCodexWebsocketDisconnected in pkg/llmproxy/runtime/executor/codex_websockets_executor.go so that it no longer interpolates the raw err value directly into the info-level log line. We can log the disconnection with session ID, auth, endpoint, and reason as before, and add a minimal “err=failed” indicator or a separate debug-level log, but we should not include the full err string. Because this helper is only used in this file and we have no global sanitization for arbitrary errors, the simplest safe change with minimal behavior impact is:
- Keep the function signature the same so callers don’t change.
- In the
if err != nilbranch, remove%vanderrfrom the log format and arguments. - Optionally, include a boolean
error=truestyle marker in the message if retaining error presence is important, but avoid outputting the content.
This fix touches only pkg/llmproxy/runtime/executor/codex_websockets_executor.go lines 1302–1307; no new imports or helpers are necessary.
| @@ -1301,7 +1301,7 @@ | ||
|
|
||
| func logCodexWebsocketDisconnected(sessionID string, authID string, wsURL string, reason string, err error) { | ||
| if err != nil { | ||
| log.Infof("codex websockets: upstream disconnected session=%s auth=%s endpoint=%s reason=%s err=%v", sanitizeCodexSessionID(sessionID), sanitizeCodexWebsocketLogField(authID), sanitizeCodexWebsocketLogEndpoint(wsURL), strings.TrimSpace(reason), err) | ||
| log.Infof("codex websockets: upstream disconnected session=%s auth=%s endpoint=%s reason=%s err=true", sanitizeCodexSessionID(sessionID), sanitizeCodexWebsocketLogField(authID), sanitizeCodexWebsocketLogEndpoint(wsURL), strings.TrimSpace(reason)) | ||
| return | ||
| } | ||
| log.Infof("codex websockets: upstream disconnected session=%s auth=%s endpoint=%s reason=%s", sanitizeCodexSessionID(sessionID), sanitizeCodexWebsocketLogField(authID), sanitizeCodexWebsocketLogEndpoint(wsURL), strings.TrimSpace(reason)) |
| if trimmed == "" { | ||
| return "" | ||
| } | ||
| sum := sha256.Sum256([]byte(trimmed)) |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data High
Copilot Autofix
AI about 2 months ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
|
Superseded by rebased PR #240 |
feat(registry): add GPT-5.3 Codex to GitHub Copilot provider
Summary
validateResolvedHostIPsto block private/loopback IPs in API callscleanTokenPathto prevent path traversal in kiro token loadingvalidateAPICallURLandcopilotQuotaURLFromTokenURLSecurity Alerts Addressed
go/request-forgery- SSRF via resolved IP validationgo/clear-text-logging- Sensitive data in logsgo/weak-sensitive-data-hashing- Predictable hashinggo/path-injection- Path traversal in token filesTest Results
Pre-existing Issues (Not Introduced)
undefined: normalizeGeminiCLIModelbuild failureroundTripperFuncdeclaration in test filesRelated