fix: stop recurring Keychain password prompts on server recovery#462
fix: stop recurring Keychain password prompts on server recovery#462
Conversation
TryRecoverPersistentServerAsync was clearing _resolvedGitHubToken = null on every recovery cycle (watchdog timeout, wake/sleep, auth polling). This caused CheckAuthStatusAsync to re-enter the lazy Keychain resolution path, spawning `security find-generic-password` which triggers a macOS password dialog each time. Re-reading the Keychain is pointless after token expiration — the stored token is static and only changes when `copilot login` is run. The same expired token comes back, triggering another dialog on the next cycle. Fix: preserve the cached token through recovery. Only ReauthenticateAsync (explicit user action) and ReconnectAsync (settings change) clear it. When the forwarded token expires, the auth banner appears and the user can click Re-authenticate for a fresh Keychain read. Updates INV-A3 in auth-token-safety skill to reflect the new invariant. Tests: 3064/3064 pass ✅ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Multi-Model Code Review — PR #462 R1Models: Claude Opus 4.6 · Claude Sonnet 4.6 · GPT-5.3-Codex SummaryThe PR removes ✅ All 3 Models Agree: No Issues FoundRegressions — None ✅
This is more stable than the old behavior (clear → lazy resolve → read same expired token → loop with Keychain dialogs). Security — None ✅ Race Conditions — None ✅ Bugs — None ✅
📋 Test Coverage
Verdict: ✅ ApproveClean, well-scoped fix with thorough inline documentation. Aligns with auth-token-safety skill invariants (INV-A2, INV-A3). No blocking issues. Consider adding a test asserting token preservation across recovery as a follow-up. |
Multi-Model Code Review: PR #462PR: fix: stop recurring Keychain password prompts on server recovery Consensus FindingsNo blocking issues found. Both models independently verified:
Test Coverage Note (2/2 models flagged — 🟢 MINOR)
SummaryThe fix is correct and well-scoped. The one-line removal of Recommended action: ✅ Approve |
## Problem After PR #462 merged, users still get macOS "allow copilot-cli" password dialogs every ~1-2 hours. Process monitoring confirmed the parent PID is the PolyPilot .NET process itself (`dotnet exec`), not external copilot binaries. ## Root Cause PR #462 prevented `TryRecoverPersistentServerAsync` from clearing `_resolvedGitHubToken`. But the token was **never set in the first place** — on startup, `ResolveGitHubTokenFromEnv()` returns null (no env var), and when `CheckAuthStatusAsync` finds the server authenticated, it returns `true` without setting the field. So `_resolvedGitHubToken` stays `null`. Later, any transient auth failure (connection blip, client recreation, server token rotation) calls `CheckAuthStatusAsync` again. This time auth fails, the guard `_resolvedGitHubToken == null` is satisfied, and the lazy path fires: 1. `ResolveGitHubTokenForServer()` → `TryReadCopilotKeychainToken()` 2. Tries 3 service names: `copilot-cli`, `github-copilot`, `GitHub Copilot` 3. Each spawns `/usr/bin/security find-generic-password -s <name> -w` (3s timeout) 4. **3 macOS password dialogs** (PolyPilot not in the Keychain ACL for this entry) ## Evidence Process monitor output confirmed PolyPilot as the source: ``` security PID=66918, Parent PID=66646 66646 66630 /usr/local/share/dotnet/dotnet exec --runtimeconfig ... security PID=67068, Parent PID=66646 (3s later) security PID=67146, Parent PID=66646 (3s later) ``` ## Fix Set `_resolvedGitHubToken ??= string.Empty` when `CheckAuthStatusAsync` finds the server authenticated. The empty string sentinel means "server can self-authenticate, no Keychain read needed." The lazy path guard `_resolvedGitHubToken == null` is no longer satisfied. `ServerManager.StartServerAsync` uses `!string.IsNullOrEmpty(githubToken)` so the empty sentinel is never forwarded as a `COPILOT_GITHUB_TOKEN` env var. ## Behavior Matrix | Scenario | Before | After | |----------|--------|-------| | Startup, server self-auths | `_resolvedGitHubToken` stays null | Set to `""` (sentinel) | | Later transient auth failure | Lazy path fires → 3 password dialogs | Lazy path skipped → auth banner shown | | Server genuinely cant auth | Lazy path fires once (correct) | Same — `""` not set because auth failed | | ReauthenticateAsync (user) | Fresh Keychain read | Same — clears to null first | | ReconnectAsync (settings) | Clears to null | Same | ## Tests 3064/3064 pass ✅ ## Related - Fixes remaining issue after #462 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Problem PR #446 added Keychain-reading code (TryReadCopilotKeychainToken, ResolveGitHubTokenForServer, RunProcessWithTimeout) to help users whose headless server could not self-authenticate. This caused a 4-PR regression chain (#446 → #456 → #462 → #463): 1. Each /usr/bin/security call triggers a macOS password dialog (PolyPilot is not in the copilot-cli Keychain ACL) 2. TryReadCopilotKeychainToken tried 3 service names sequentially, each spawning a separate dialog (3× password prompts per call) 3. Clicking Allow/Deny rewrote the Keychain ACL, breaking the server own native keytar access 4. The server fell back to its own /usr/bin/security calls, creating a spiral of recurring password prompts every 1-2 hours ## Root Cause Analysis The headless copilot server authenticates on its own at startup via its native credential store. This has worked reliably across dozens of worktree switches (different binary paths each time). The original PR #446 user issue ("Session was not created with authentication info") was a server auth loss that should have been solved by restarting the server — which TryRecoverPersistentServerAsync already does — not by reading the Keychain from the UI process. ## Changes - Remove ResolveGitHubTokenForServer (Keychain + gh CLI reads) - Remove TryReadCopilotKeychainToken (3-service-name loop) - Remove RunProcessWithTimeout (only used by above) - Remove _tokenResolutionLock SemaphoreSlim (guarded lazy path) - Remove lazy Keychain resolution path in CheckAuthStatusAsync - Remove sentinel logic (_resolvedGitHubToken ??= string.Empty) - Simplify ReauthenticateAsync to just restart the server - Keep ResolveGitHubTokenFromEnv (env vars are safe, no prompts) - Keep auth banner + Re-authenticate button (correct UX) - Rewrite auth-token-safety skill doc with new invariant - Remove 7 tests for deleted methods, keep 3 env var tests ## For users who cannot self-authenticate The auth banner says: "run copilot login in a terminal, then click Re-authenticate." Re-authenticate restarts the server, which picks up the fresh credentials. This was the original PR #446 design before Keychain code was added during review rounds. Tests: 3057/3057 pass ✅ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Problem PR #446 added Keychain-reading code (TryReadCopilotKeychainToken, ResolveGitHubTokenForServer, RunProcessWithTimeout) to help users whose headless server could not self-authenticate. This caused a 4-PR regression chain (#446 → #456 → #462 → #463): 1. Each /usr/bin/security call triggers a macOS password dialog (PolyPilot is not in the copilot-cli Keychain ACL) 2. TryReadCopilotKeychainToken tried 3 service names sequentially, each spawning a separate dialog (3× password prompts per call) 3. Clicking Allow/Deny rewrote the Keychain ACL, breaking the server own native keytar access 4. The server fell back to its own /usr/bin/security calls, creating a spiral of recurring password prompts every 1-2 hours ## Root Cause Analysis The headless copilot server authenticates on its own at startup via its native credential store. This has worked reliably across dozens of worktree switches (different binary paths each time). The original PR #446 user issue ("Session was not created with authentication info") was a server auth loss that should have been solved by restarting the server — which TryRecoverPersistentServerAsync already does — not by reading the Keychain from the UI process. ## Changes - Remove ResolveGitHubTokenForServer (Keychain + gh CLI reads) - Remove TryReadCopilotKeychainToken (3-service-name loop) - Remove RunProcessWithTimeout (only used by above) - Remove _tokenResolutionLock SemaphoreSlim (guarded lazy path) - Remove lazy Keychain resolution path in CheckAuthStatusAsync - Remove sentinel logic (_resolvedGitHubToken ??= string.Empty) - Simplify ReauthenticateAsync to just restart the server - Keep ResolveGitHubTokenFromEnv (env vars are safe, no prompts) - Keep auth banner + Re-authenticate button (correct UX) - Rewrite auth-token-safety skill doc with new invariant - Remove 7 tests for deleted methods, keep 3 env var tests ## For users who cannot self-authenticate The auth banner says: "run copilot login in a terminal, then click Re-authenticate." Re-authenticate restarts the server, which picks up the fresh credentials. This was the original PR #446 design before Keychain code was added during review rounds. Tests: 3057/3057 pass ✅ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Problem PR #446 added Keychain-reading code to help users whose headless server couldn't self-authenticate. This caused a **4-PR regression chain** (#446 → #456 → #462 → #463) of recurring macOS password dialogs: 1. `TryReadCopilotKeychainToken` spawned `/usr/bin/security find-generic-password` for 3 service names sequentially — **3 password dialogs per call** 2. Clicking Allow/Deny on those dialogs **rewrote the Keychain ACL**, breaking the server's own native keytar access 3. The server fell back to its own `/usr/bin/security` calls → **more dialogs every 1-2 hours** 4. PRs #456, #462, #463 each fixed one trigger path but the core approach was wrong ## Root Cause The headless copilot server **authenticates on its own** at startup via its native credential store. This has worked reliably across dozens of worktree switches (different binary paths each time) without any Keychain intervention from PolyPilot. The original PR #446 user issue ("Session was not created with authentication info") was a server auth loss that should have been solved by **restarting the server** — which `TryRecoverPersistentServerAsync` already does — not by reading the Keychain from the UI process. ## Fix Remove all Keychain-reading code entirely (**-612 lines, +42 lines**): | Removed | Why | |---------|-----| | `ResolveGitHubTokenForServer()` | Keychain + gh CLI reads — triggers password dialogs | | `TryReadCopilotKeychainToken()` | 3-service-name loop — 3× password dialogs per call | | `RunProcessWithTimeout()` | Only used by above | | `_tokenResolutionLock` SemaphoreSlim | Guarded the lazy Keychain path | | Lazy resolution path in `CheckAuthStatusAsync` | The whole Keychain auto-resolution mechanism | | Sentinel logic (`_resolvedGitHubToken ??= ""`) | No longer needed without the lazy path | | Kept | Why | |------|-----| | `ResolveGitHubTokenFromEnv()` | Env vars are safe, no prompts | | `CheckAuthStatusAsync` + auth banner | Correct — shows "run copilot login" guidance | | `TryRecoverPersistentServerAsync` | Correct — restarts server which re-authenticates on its own | | Re-authenticate button | Correct — restarts server to pick up fresh `copilot login` credentials | ## For users who cannot self-authenticate The auth banner says: _"run `copilot login` in a terminal, then click Re-authenticate."_ This was the **original PR #446 design** before Keychain code was added during review rounds. ## Timeline of the regression | PR | What it did | What went wrong | |----|-------------|-----------------| | #446 | Added Keychain reads to forward token to server | Triggered password dialogs; corrupted Keychain ACL | | #456 | Made Keychain reads lazy (only on first auth failure) | Still fired on every server recovery cycle | | #462 | Stopped recovery from clearing the token cache | Token was never set in the first place (no env var) | | #463 | Added sentinel on auth success | Server's own internal Keychain reads still fired | | **#465** | **Removed all Keychain code** | **The right fix — server self-authenticates** | ## Tests 3057/3057 pass ✅ (7 tests for deleted methods removed, 3 env var tests kept) ## Why server self-authentication works The copilot headless server binary bundled inside PolyPilot.app and the Homebrew `copilot` binary are both signed with the **same GitHub Developer ID certificate**. macOS Keychain ACLs use code-signing identity (not binary path) to control access, so: - `copilot login` (Homebrew binary) writes the token to Keychain with an ACL scoped to the GitHub Developer ID - `copilot --headless` (bundled binary) reads the token via keytar — **same Developer ID = no password prompt** - PolyPilot's `/usr/bin/security` calls used a **different signer** (Apple's built-in tool), which triggered the ACL mismatch and the password dialogs - Clicking Allow/Deny on those dialogs **rewrote the ACL**, breaking the server's native keytar access — creating the regression spiral This was verified via `codesign -dvvv` on both binaries — they share the same Identifier, Authority chain, and team certificate. The Keychain entry's partition list (visible in `securityd` logs) confirms it uses team-id-based access control, not path-based. **Conclusion:** The server has always been able to self-authenticate. The only thing that broke it was PolyPilot calling `/usr/bin/security`, which used a different code-signing identity and corrupted the ACL. Removing those calls is the correct fix. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eWeen#462) ## Problem Every couple of hours, the user gets prompted to "allow copilot-cli" and must enter their macOS login password ~5 times. This is a regression from PR PureWeen#456 — that fix addressed the 10-second polling loop but missed a second cache-clearing path. ## Root Cause `TryRecoverPersistentServerAsync()` clears `_resolvedGitHubToken = null` on every recovery cycle (line 1339). Recovery is triggered by: - Watchdog timeouts (token expiry ~1-8h) - Wake/sleep health check failures - Auth polling success → server restart After clearing, the next `CheckAuthStatusAsync()` call sees `_resolvedGitHubToken == null` → enters the lazy Keychain resolution path → spawns `security find-generic-password -s copilot-cli -w` → **macOS password dialog**. **Re-reading the Keychain is useless** — the stored token is a static snapshot from `copilot login`. If it expired, re-reading returns the same expired token. Only running `copilot login` again would write a new one. ## Fix Remove the `_resolvedGitHubToken = null` from `TryRecoverPersistentServerAsync`. The cached token is still forwarded to the new server via `tokenToForward`. Only two paths now clear the cache: - `ReconnectAsync` — explicit user action (settings change) - `ReauthenticateAsync` — explicit user action (Re-authenticate button) When the token expires, the auth banner appears and the user clicks Re-authenticate → fresh Keychain read (correct — user-initiated). ## Changes | File | Change | |------|--------| | `CopilotService.cs` | Remove `_resolvedGitHubToken = null` from recovery path | | `auth-token-safety/SKILL.md` | Update INV-A3 to reflect new invariant | ## Tests 3064/3064 pass ✅ ## Related - Fixes regression from PureWeen#456 (which fixed regression from PureWeen#446) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eWeen#463) ## Problem After PR PureWeen#462 merged, users still get macOS "allow copilot-cli" password dialogs every ~1-2 hours. Process monitoring confirmed the parent PID is the PolyPilot .NET process itself (`dotnet exec`), not external copilot binaries. ## Root Cause PR PureWeen#462 prevented `TryRecoverPersistentServerAsync` from clearing `_resolvedGitHubToken`. But the token was **never set in the first place** — on startup, `ResolveGitHubTokenFromEnv()` returns null (no env var), and when `CheckAuthStatusAsync` finds the server authenticated, it returns `true` without setting the field. So `_resolvedGitHubToken` stays `null`. Later, any transient auth failure (connection blip, client recreation, server token rotation) calls `CheckAuthStatusAsync` again. This time auth fails, the guard `_resolvedGitHubToken == null` is satisfied, and the lazy path fires: 1. `ResolveGitHubTokenForServer()` → `TryReadCopilotKeychainToken()` 2. Tries 3 service names: `copilot-cli`, `github-copilot`, `GitHub Copilot` 3. Each spawns `/usr/bin/security find-generic-password -s <name> -w` (3s timeout) 4. **3 macOS password dialogs** (PolyPilot not in the Keychain ACL for this entry) ## Evidence Process monitor output confirmed PolyPilot as the source: ``` security PID=66918, Parent PID=66646 66646 66630 /usr/local/share/dotnet/dotnet exec --runtimeconfig ... security PID=67068, Parent PID=66646 (3s later) security PID=67146, Parent PID=66646 (3s later) ``` ## Fix Set `_resolvedGitHubToken ??= string.Empty` when `CheckAuthStatusAsync` finds the server authenticated. The empty string sentinel means "server can self-authenticate, no Keychain read needed." The lazy path guard `_resolvedGitHubToken == null` is no longer satisfied. `ServerManager.StartServerAsync` uses `!string.IsNullOrEmpty(githubToken)` so the empty sentinel is never forwarded as a `COPILOT_GITHUB_TOKEN` env var. ## Behavior Matrix | Scenario | Before | After | |----------|--------|-------| | Startup, server self-auths | `_resolvedGitHubToken` stays null | Set to `""` (sentinel) | | Later transient auth failure | Lazy path fires → 3 password dialogs | Lazy path skipped → auth banner shown | | Server genuinely cant auth | Lazy path fires once (correct) | Same — `""` not set because auth failed | | ReauthenticateAsync (user) | Fresh Keychain read | Same — clears to null first | | ReconnectAsync (settings) | Clears to null | Same | ## Tests 3064/3064 pass ✅ ## Related - Fixes remaining issue after PureWeen#462 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eWeen#465) ## Problem PR PureWeen#446 added Keychain-reading code to help users whose headless server couldn't self-authenticate. This caused a **4-PR regression chain** (PureWeen#446 → PureWeen#456 → PureWeen#462 → PureWeen#463) of recurring macOS password dialogs: 1. `TryReadCopilotKeychainToken` spawned `/usr/bin/security find-generic-password` for 3 service names sequentially — **3 password dialogs per call** 2. Clicking Allow/Deny on those dialogs **rewrote the Keychain ACL**, breaking the server's own native keytar access 3. The server fell back to its own `/usr/bin/security` calls → **more dialogs every 1-2 hours** 4. PRs PureWeen#456, PureWeen#462, PureWeen#463 each fixed one trigger path but the core approach was wrong ## Root Cause The headless copilot server **authenticates on its own** at startup via its native credential store. This has worked reliably across dozens of worktree switches (different binary paths each time) without any Keychain intervention from PolyPilot. The original PR PureWeen#446 user issue ("Session was not created with authentication info") was a server auth loss that should have been solved by **restarting the server** — which `TryRecoverPersistentServerAsync` already does — not by reading the Keychain from the UI process. ## Fix Remove all Keychain-reading code entirely (**-612 lines, +42 lines**): | Removed | Why | |---------|-----| | `ResolveGitHubTokenForServer()` | Keychain + gh CLI reads — triggers password dialogs | | `TryReadCopilotKeychainToken()` | 3-service-name loop — 3× password dialogs per call | | `RunProcessWithTimeout()` | Only used by above | | `_tokenResolutionLock` SemaphoreSlim | Guarded the lazy Keychain path | | Lazy resolution path in `CheckAuthStatusAsync` | The whole Keychain auto-resolution mechanism | | Sentinel logic (`_resolvedGitHubToken ??= ""`) | No longer needed without the lazy path | | Kept | Why | |------|-----| | `ResolveGitHubTokenFromEnv()` | Env vars are safe, no prompts | | `CheckAuthStatusAsync` + auth banner | Correct — shows "run copilot login" guidance | | `TryRecoverPersistentServerAsync` | Correct — restarts server which re-authenticates on its own | | Re-authenticate button | Correct — restarts server to pick up fresh `copilot login` credentials | ## For users who cannot self-authenticate The auth banner says: _"run `copilot login` in a terminal, then click Re-authenticate."_ This was the **original PR PureWeen#446 design** before Keychain code was added during review rounds. ## Timeline of the regression | PR | What it did | What went wrong | |----|-------------|-----------------| | PureWeen#446 | Added Keychain reads to forward token to server | Triggered password dialogs; corrupted Keychain ACL | | PureWeen#456 | Made Keychain reads lazy (only on first auth failure) | Still fired on every server recovery cycle | | PureWeen#462 | Stopped recovery from clearing the token cache | Token was never set in the first place (no env var) | | PureWeen#463 | Added sentinel on auth success | Server's own internal Keychain reads still fired | | **PureWeen#465** | **Removed all Keychain code** | **The right fix — server self-authenticates** | ## Tests 3057/3057 pass ✅ (7 tests for deleted methods removed, 3 env var tests kept) ## Why server self-authentication works The copilot headless server binary bundled inside PolyPilot.app and the Homebrew `copilot` binary are both signed with the **same GitHub Developer ID certificate**. macOS Keychain ACLs use code-signing identity (not binary path) to control access, so: - `copilot login` (Homebrew binary) writes the token to Keychain with an ACL scoped to the GitHub Developer ID - `copilot --headless` (bundled binary) reads the token via keytar — **same Developer ID = no password prompt** - PolyPilot's `/usr/bin/security` calls used a **different signer** (Apple's built-in tool), which triggered the ACL mismatch and the password dialogs - Clicking Allow/Deny on those dialogs **rewrote the ACL**, breaking the server's native keytar access — creating the regression spiral This was verified via `codesign -dvvv` on both binaries — they share the same Identifier, Authority chain, and team certificate. The Keychain entry's partition list (visible in `securityd` logs) confirms it uses team-id-based access control, not path-based. **Conclusion:** The server has always been able to self-authenticate. The only thing that broke it was PolyPilot calling `/usr/bin/security`, which used a different code-signing identity and corrupted the ACL. Removing those calls is the correct fix. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
Every couple of hours, the user gets prompted to "allow copilot-cli" and must enter their macOS login password ~5 times. This is a regression from PR #456 — that fix addressed the 10-second polling loop but missed a second cache-clearing path.
Root Cause
TryRecoverPersistentServerAsync()clears_resolvedGitHubToken = nullon every recovery cycle (line 1339). Recovery is triggered by:After clearing, the next
CheckAuthStatusAsync()call sees_resolvedGitHubToken == null→ enters the lazy Keychain resolution path → spawnssecurity find-generic-password -s copilot-cli -w→ macOS password dialog.Re-reading the Keychain is useless — the stored token is a static snapshot from
copilot login. If it expired, re-reading returns the same expired token. Only runningcopilot loginagain would write a new one.Fix
Remove the
_resolvedGitHubToken = nullfromTryRecoverPersistentServerAsync. The cached token is still forwarded to the new server viatokenToForward. Only two paths now clear the cache:ReconnectAsync— explicit user action (settings change)ReauthenticateAsync— explicit user action (Re-authenticate button)When the token expires, the auth banner appears and the user clicks Re-authenticate → fresh Keychain read (correct — user-initiated).
Changes
CopilotService.cs_resolvedGitHubToken = nullfrom recovery pathauth-token-safety/SKILL.mdTests
3064/3064 pass ✅
Related