feat(api-proxy): fail-fast API key validation at startup#2199
Conversation
Smoke Test Results ✅
Test Status: PASS
🔥 Smoke Test: Copilot BYOK (Offline) Mode
Running in BYOK offline mode — Overall: PASS
🔬 Smoke Test Results
PR: feat(api-proxy): fail-fast API key validation at startup Overall: PARTIAL PASS — GitHub connectivity confirmed ✅; pre-computed step outputs were not injected into the workflow template.
Smoke test results (run 24903914545)
Warning: The following domain was blocked by the firewall during workflow execution:

To allow these domains, add them to the network configuration:

```yaml
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
```

See Network Configuration for more information.
Chroot Version Comparison Results
Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot.
Pull request overview
This PR adds startup-time “probe” requests in the api-proxy sidecar to validate configured LLM provider API keys early, so invalid/expired credentials surface immediately in logs instead of later as opaque 401s during agent execution.
Changes:
- Introduces `validateKey()` (probe helper) and `validateApiKeys()` (provider-specific orchestrator) to perform lightweight upstream checks with timeouts and structured logging.
- Adds startup sequencing so validation runs after all proxy servers are listening (healthcheck port is up) via `Promise.all(listenPromises)`.
- Adds unit tests covering `validateKey` and `validateApiKeys` outcomes and skip conditions.
Show a summary per file
| File | Description |
|---|---|
| containers/api-proxy/server.js | Adds probe helper/orchestrator plus startup sequencing to trigger key validation after servers begin listening. |
| containers/api-proxy/server.test.js | Adds unit tests for probe behavior, logging, timeouts, and provider-specific skip/validate paths. |
Copilot's findings
Comments suppressed due to low confidence (1)
containers/api-proxy/server.js:546
- This branch handles both classic `ghp_*` PATs and `COPILOT_API_KEY` BYOK mode, but the log message only mentions the `COPILOT_API_KEY` auth mode. That will mislead users running with a classic PAT. Update the message to cover both reasons (or differentiate them) and adjust the related unit test expectation accordingly.
} else {
// Classic ghp_* PAT or COPILOT_API_KEY BYOK — validation not supported for this auth mode
logRequest('warn', 'key_validation_skipped', {
provider: 'copilot',
message: 'Validation skipped — COPILOT_API_KEY auth mode does not support probe endpoint',
});
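One possible shape for differentiating the two skip reasons — a sketch only; `copilotSkipReason` is a hypothetical helper, not code from the PR:

```javascript
// Hypothetical helper: pick a skip message that matches the actual auth
// mode, instead of always blaming COPILOT_API_KEY.
function copilotSkipReason(authToken) {
  if (typeof authToken === 'string' && authToken.startsWith('ghp_')) {
    return 'Validation skipped — classic ghp_* PAT does not support probe endpoint';
  }
  return 'Validation skipped — COPILOT_API_KEY auth mode does not support probe endpoint';
}

console.log(copilotSkipReason('ghp_example')); // mentions the classic PAT
```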
- Files reviewed: 2/2 changed files
- Comments generated: 5
});
const skippedLog = lines.find(l => l.event === 'key_validation_skipped' && l.provider === 'copilot');
expect(skippedLog).toBeDefined();
expect(skippedLog.message).toContain('COPILOT_API_KEY auth mode');
This test asserts the Copilot “skipped” message contains COPILOT_API_KEY auth mode, but in this scenario the auth token is a classic ghp_* PAT. Once the production log message is fixed to distinguish classic PAT vs BYOK, update this expectation to match the correct skip reason.
Suggested change:
- expect(skippedLog.message).toContain('COPILOT_API_KEY auth mode');
+ expect(skippedLog.message).toContain('classic ghp_ PAT');
 * @param {number} [opts.timeoutMs] - Per-request timeout in ms (default: DEFAULT_VALIDATION_TIMEOUT_MS)
 * @returns {Promise<{result: 'success'|'failed'|'timeout'|'error', status?: number, duration_ms: number, error?: string}>}
 */
function validateKey(provider, target, path, method, body, headers, successStatuses, failStatuses, opts = {}) {
  const timeoutMs = opts.timeoutMs || DEFAULT_VALIDATION_TIMEOUT_MS;
validateKey uses opts.timeoutMs || DEFAULT_VALIDATION_TIMEOUT_MS, which treats 0 (and other falsy values) as “unset”. Since callers can intentionally pass timeoutMs: 0 (or might pass a computed value that can be 0), this will silently fall back to the default timeout. Use nullish coalescing (opts.timeoutMs ?? DEFAULT_VALIDATION_TIMEOUT_MS) and consider validating that the timeout is a positive finite number.
Suggested change:
 * @param {number} [opts.timeoutMs] - Per-request timeout in ms; must be a finite non-negative number (default: DEFAULT_VALIDATION_TIMEOUT_MS)
 * @returns {Promise<{result: 'success'|'failed'|'timeout'|'error', status?: number, duration_ms: number, error?: string}>}
 */
function validateKey(provider, target, path, method, body, headers, successStatuses, failStatuses, opts = {}) {
  const requestedTimeoutMs = opts.timeoutMs ?? DEFAULT_VALIDATION_TIMEOUT_MS;
  const timeoutMs = Number.isFinite(requestedTimeoutMs) && requestedTimeoutMs >= 0
    ? requestedTimeoutMs
    : DEFAULT_VALIDATION_TIMEOUT_MS;
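For reference, the behavioral difference the comment calls out:

```javascript
const DEFAULT_VALIDATION_TIMEOUT_MS = 10000;

// `||` falls back on any falsy value, so an intentional 0 is lost:
console.log(0 || DEFAULT_VALIDATION_TIMEOUT_MS); // 10000
// `??` falls back only on null/undefined, so 0 survives:
console.log(0 ?? DEFAULT_VALIDATION_TIMEOUT_MS); // 0
console.log(undefined ?? DEFAULT_VALIDATION_TIMEOUT_MS); // 10000
```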
const openaiKey = overrides.openaiKey !== undefined ? overrides.openaiKey : OPENAI_API_KEY;
const openaiTarget = overrides.openaiTarget !== undefined ? overrides.openaiTarget : OPENAI_API_TARGET;
const openaiBasePath = overrides.openaiBasePath !== undefined ? overrides.openaiBasePath : OPENAI_API_BASE_PATH;
const anthropicKey = overrides.anthropicKey !== undefined ? overrides.anthropicKey : ANTHROPIC_API_KEY;
const anthropicTarget = overrides.anthropicTarget !== undefined ? overrides.anthropicTarget : ANTHROPIC_API_TARGET;
const anthropicBasePath = overrides.anthropicBasePath !== undefined ? overrides.anthropicBasePath : ANTHROPIC_API_BASE_PATH;
const copilotGithubToken = overrides.copilotGithubToken !== undefined ? overrides.copilotGithubToken : COPILOT_GITHUB_TOKEN;
const copilotAuthToken = overrides.copilotAuthToken !== undefined ? overrides.copilotAuthToken : COPILOT_AUTH_TOKEN;
const copilotTarget = overrides.copilotTarget !== undefined ? overrides.copilotTarget : COPILOT_API_TARGET;
const copilotTargetOverridden = overrides.copilotTargetOverridden !== undefined ? overrides.copilotTargetOverridden : !!process.env.COPILOT_API_TARGET;
const copilotIntegrationId = overrides.copilotIntegrationId !== undefined ? overrides.copilotIntegrationId : COPILOT_INTEGRATION_ID;
const geminiKey = overrides.geminiKey !== undefined ? overrides.geminiKey : GEMINI_API_KEY;
const geminiTarget = overrides.geminiTarget !== undefined ? overrides.geminiTarget : GEMINI_API_TARGET;
const geminiBasePath = overrides.geminiBasePath !== undefined ? overrides.geminiBasePath : GEMINI_API_BASE_PATH;
const probeOpts = overrides.timeoutMs !== undefined ? { timeoutMs: overrides.timeoutMs } : {};
The override resolution pattern overrides.foo !== undefined ? overrides.foo : DEFAULT prevents tests (and any future callers) from explicitly overriding a value to undefined to simulate “not set”. Use a presence check like Object.prototype.hasOwnProperty.call(overrides, 'openaiKey') (or 'openaiKey' in overrides) so undefined can be an intentional override and avoid env-dependent test behavior.
Suggested change:
const openaiKey = Object.prototype.hasOwnProperty.call(overrides, 'openaiKey') ? overrides.openaiKey : OPENAI_API_KEY;
const openaiTarget = Object.prototype.hasOwnProperty.call(overrides, 'openaiTarget') ? overrides.openaiTarget : OPENAI_API_TARGET;
const openaiBasePath = Object.prototype.hasOwnProperty.call(overrides, 'openaiBasePath') ? overrides.openaiBasePath : OPENAI_API_BASE_PATH;
const anthropicKey = Object.prototype.hasOwnProperty.call(overrides, 'anthropicKey') ? overrides.anthropicKey : ANTHROPIC_API_KEY;
const anthropicTarget = Object.prototype.hasOwnProperty.call(overrides, 'anthropicTarget') ? overrides.anthropicTarget : ANTHROPIC_API_TARGET;
const anthropicBasePath = Object.prototype.hasOwnProperty.call(overrides, 'anthropicBasePath') ? overrides.anthropicBasePath : ANTHROPIC_API_BASE_PATH;
const copilotGithubToken = Object.prototype.hasOwnProperty.call(overrides, 'copilotGithubToken') ? overrides.copilotGithubToken : COPILOT_GITHUB_TOKEN;
const copilotAuthToken = Object.prototype.hasOwnProperty.call(overrides, 'copilotAuthToken') ? overrides.copilotAuthToken : COPILOT_AUTH_TOKEN;
const copilotTarget = Object.prototype.hasOwnProperty.call(overrides, 'copilotTarget') ? overrides.copilotTarget : COPILOT_API_TARGET;
const copilotTargetOverridden = Object.prototype.hasOwnProperty.call(overrides, 'copilotTargetOverridden') ? overrides.copilotTargetOverridden : !!process.env.COPILOT_API_TARGET;
const copilotIntegrationId = Object.prototype.hasOwnProperty.call(overrides, 'copilotIntegrationId') ? overrides.copilotIntegrationId : COPILOT_INTEGRATION_ID;
const geminiKey = Object.prototype.hasOwnProperty.call(overrides, 'geminiKey') ? overrides.geminiKey : GEMINI_API_KEY;
const geminiTarget = Object.prototype.hasOwnProperty.call(overrides, 'geminiTarget') ? overrides.geminiTarget : GEMINI_API_TARGET;
const geminiBasePath = Object.prototype.hasOwnProperty.call(overrides, 'geminiBasePath') ? overrides.geminiBasePath : GEMINI_API_BASE_PATH;
const probeOpts = Object.prototype.hasOwnProperty.call(overrides, 'timeoutMs') ? { timeoutMs: overrides.timeoutMs } : {};
const geminiTarget = overrides.geminiTarget !== undefined ? overrides.geminiTarget : GEMINI_API_TARGET;
const geminiBasePath = overrides.geminiBasePath !== undefined ? overrides.geminiBasePath : GEMINI_API_BASE_PATH;
const probeOpts = overrides.timeoutMs !== undefined ? { timeoutMs: overrides.timeoutMs } : {};
const timeoutSecs = Math.round((probeOpts.timeoutMs || DEFAULT_VALIDATION_TIMEOUT_MS) / 1000);
timeoutSecs is computed with Math.round(timeoutMs / 1000), which can log 0s for sub-second timeouts (including the small timeouts used in tests) and generally understates elapsed time. Prefer Math.ceil(...) or include the exact millisecond timeout in the log message.
Suggested change:
- const timeoutSecs = Math.round((probeOpts.timeoutMs || DEFAULT_VALIDATION_TIMEOUT_MS) / 1000);
+ const timeoutSecs = Math.ceil((probeOpts.timeoutMs || DEFAULT_VALIDATION_TIMEOUT_MS) / 1000);
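The rounding difference in question:

```javascript
// Math.round understates sub-second timeouts as "0s"; Math.ceil never does.
const timeoutMs = 400; // e.g. a small timeout used in tests
console.log(Math.round(timeoutMs / 1000)); // 0
console.log(Math.ceil(timeoutMs / 1000));  // 1
```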
logRequest('warn', 'key_validation_skipped', {
  provider: 'openai',
  message: `Validation skipped — custom API target (${openaiTarget})`,
The “skipped” message for OpenAI is emitted both for custom targets and for a non-empty base path, but the message always says “custom API target (...)”. This is misleading when the skip reason is a configured base path. Consider including the base-path condition in the message (or splitting the reasons) so logs are actionable.
This issue also appears on line 541 of the same file.
Suggested change:
const skipReasons = [];
if (openaiTarget !== 'api.openai.com') {
  skipReasons.push(`custom API target (${openaiTarget})`);
}
if (openaiBasePath) {
  skipReasons.push(`configured base path (${openaiBasePath})`);
}
logRequest('warn', 'key_validation_skipped', {
  provider: 'openai',
  message: `Validation skipped — ${skipReasons.join('; ')}`,
Smoke Test Results: GitHub Actions Services Connectivity
All three checks failed.
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
Port and adapt 18 tests from PR #2199 (copilot-swe-agent) covering the validateApiKeys orchestrator for all four providers:
- OpenAI: valid (200), auth_rejected (401), skipped (custom target), no-op (no key)
- Anthropic: valid (400 = key accepted), auth_rejected (401, 403), skipped (custom target)
- Copilot: valid (200 with ghu_ token), auth_rejected (401), skipped (custom target, BYOK mode)
- Gemini: valid (200), auth_rejected (403), skipped (custom target)
- Cross-cutting: network_error (timeout), no-op (no keys at all)

To make validateApiKeys testable without module-level state:
- Added overrides parameter for injecting keys/targets in tests
- Exported keyValidationResults and resetKeyValidationState()
- Used 'in' operator for override resolution (supports explicit undefined)

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot-swe-agent[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closing in favor of #2200, which includes the stronger design (AWF_VALIDATE_KEYS mode control, health endpoint integration, richer error classification) plus the orchestration tests from this PR — adapted and merged with co-author credit. Thanks for the tests!
* feat(api-proxy): add startup API key validation

  Add a validateApiKeys() function that probes each configured provider's API at startup to detect expired or invalid credentials before the agent starts making requests. This directly addresses issue #2185 where an expired COPILOT_GITHUB_TOKEN caused cryptic 401 errors deep in agent logs with no clear guidance on the fix.

  Key design:
  - Validates Copilot (GET /models), OpenAI (GET /v1/models), Anthropic (POST /v1/messages with anthropic-version header), and Gemini (GET /v1beta/models) tokens
  - Runs after all listeners are ready via a startup latch
  - Results exposed in /health endpoint (key_validation field)
  - Non-blocking by default (AWF_VALIDATE_KEYS=warn) — logs clear error messages but doesn't prevent startup
  - AWF_VALIDATE_KEYS=strict exits with code 1 on auth rejection
  - AWF_VALIDATE_KEYS=off disables validation entirely
  - Skips validation for custom API targets (non-default endpoints)
  - Skips COPILOT_API_KEY-only setups (no probe endpoint available)
  - Classifies errors as auth_rejected vs network_error vs inconclusive
  - Routes probe requests through Squid proxy (respects domain allowlist)
  - 10s timeout per probe, all probes run in parallel

  Closes #2185

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(api-proxy): add validateApiKeys orchestration tests

  Port and adapt 18 tests from PR #2199 (copilot-swe-agent) covering the validateApiKeys orchestrator for all four providers:
  - OpenAI: valid (200), auth_rejected (401), skipped (custom target), no-op (no key)
  - Anthropic: valid (400 = key accepted), auth_rejected (401, 403), skipped (custom target)
  - Copilot: valid (200 with ghu_ token), auth_rejected (401), skipped (custom target, BYOK mode)
  - Gemini: valid (200), auth_rejected (403), skipped (custom target)
  - Cross-cutting: network_error (timeout), no-op (no keys at all)

  To make validateApiKeys testable without module-level state:
  - Added overrides parameter for injecting keys/targets in tests
  - Exported keyValidationResults and resetKeyValidationState()
  - Used 'in' operator for override resolution (supports explicit undefined)

  Co-authored-by: copilot-swe-agent[bot] <198982749+copilot-swe-agent[bot]@users.noreply.github.com>
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(api-proxy): address review feedback on key validation

  - httpProbe: add settle-once guard with resolveOnce/rejectOnce to prevent hanging if response stream errors or socket closes early; also handle res 'error' and 'close' events
  - Startup latch: clarify comment that only validation-participating listeners are counted (no-key Gemini 503 handler excluded)
  - Test: replace hard-coded port 19999 with dynamic port allocation to prevent flakiness when something listens on that port

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+copilot-swe-agent[bot]@users.noreply.github.com>
Invalid/expired API keys surface as opaque 401s deep in agent execution. This adds lightweight probe requests at startup — after all servers are listening but before the agent processes any traffic — so credential failures are visible immediately with clear, actionable log messages.
Changes
`validateKey()` — shared probe helper
- Routes `https.request` through `proxyAgent` (Squid), same routing as production traffic
- Per-probe timeout (`DEFAULT_VALIDATION_TIMEOUT_MS = 10000`), never throws
- Resolves to `{result: 'success'|'failed'|'timeout'|'error', status, duration_ms}`

`validateApiKeys()` — startup orchestrator
- OpenAI: `GET /v1/models`
- Anthropic: `POST /v1/messages`
- Copilot: `GET /models`; skipped for classic `ghp_*` PAT and `COPILOT_API_KEY` BYOK
- Gemini: `GET /v1beta/models`

Startup sequencing
Each `server.listen()` is wrapped in a Promise; `validateApiKeys()` fires as a fire-and-forget task only after `Promise.all(listenPromises)` resolves — guaranteeing the Docker healthcheck port (10000) is up before probes begin.

Log format

```json
{"level":"info", "event":"key_validation_success", "provider":"openai", "duration_ms":342}
{"level":"error", "event":"key_validation_failed", "provider":"anthropic", "status":401}
{"level":"warn", "event":"key_validation_skipped", "provider":"copilot", "message":"Validation skipped — COPILOT_API_KEY auth mode does not support probe endpoint"}
{"level":"warn", "event":"key_validation_timeout", "provider":"gemini", "message":"Key validation timed out after 10s — network may not be ready"}
```

Tests
25 new unit tests covering `validateKey` (success, failure, timeout, network error, body forwarding, request options) and `validateApiKeys` (all providers × all outcomes, custom target skip, auth mode skip). Both functions are exported for testability; `validateApiKeys` accepts an `overrides` object so tests can inject keys, targets, and `timeoutMs` without module reloading.