fix: remove all macOS Keychain reads — server self-authenticates#465
fix: remove all macOS Keychain reads — server self-authenticates#465
Conversation
Investigation SummaryHow we confirmed PolyPilot was the source
Why the 4-PR fix chain didn't workEach PR fixed one trigger path but missed the core issue:
Why removing Keychain code is safe
For users with corrupted ACLsIf a user was affected by the old Keychain code, they can restore with: security delete-generic-password -s "copilot-cli" ~/Library/Keychains/login.keychain-db
copilot login |
## Problem PR #446 added Keychain-reading code (TryReadCopilotKeychainToken, ResolveGitHubTokenForServer, RunProcessWithTimeout) to help users whose headless server could not self-authenticate. This caused a 4-PR regression chain (#446 → #456 → #462 → #463): 1. Each /usr/bin/security call triggers a macOS password dialog (PolyPilot is not in the copilot-cli Keychain ACL) 2. TryReadCopilotKeychainToken tried 3 service names sequentially, each spawning a separate dialog (3× password prompts per call) 3. Clicking Allow/Deny rewrote the Keychain ACL, breaking the server own native keytar access 4. The server fell back to its own /usr/bin/security calls, creating a spiral of recurring password prompts every 1-2 hours ## Root Cause Analysis The headless copilot server authenticates on its own at startup via its native credential store. This has worked reliably across dozens of worktree switches (different binary paths each time). The original PR #446 user issue ("Session was not created with authentication info") was a server auth loss that should have been solved by restarting the server — which TryRecoverPersistentServerAsync already does — not by reading the Keychain from the UI process. ## Changes - Remove ResolveGitHubTokenForServer (Keychain + gh CLI reads) - Remove TryReadCopilotKeychainToken (3-service-name loop) - Remove RunProcessWithTimeout (only used by above) - Remove _tokenResolutionLock SemaphoreSlim (guarded lazy path) - Remove lazy Keychain resolution path in CheckAuthStatusAsync - Remove sentinel logic (_resolvedGitHubToken ??= string.Empty) - Simplify ReauthenticateAsync to just restart the server - Keep ResolveGitHubTokenFromEnv (env vars are safe, no prompts) - Keep auth banner + Re-authenticate button (correct UX) - Rewrite auth-token-safety skill doc with new invariant - Remove 7 tests for deleted methods, keep 3 env var tests ## For users who cannot self-authenticate The auth banner says: "run copilot login in a terminal, then click Re-authenticate." Re-authenticate restarts the server, which picks up the fresh credentials. This was the original PR #446 design before Keychain code was added during review rounds. Tests: 3057/3057 pass ✅ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0f47e66 to
731a702
Compare
🔍 Multi-Model Code Review — PR #465PR: fix: remove all macOS Keychain reads — server self-authenticates Consensus Findings (2+ of 3 models agree)🟡 MODERATE — Stale XML doc and comments reference deleted code/invariantsFlagged by: Opus, Sonnet | Not flagged by: Codex Two stale comment blocks remain in 1. XML doc on The entire lazy-resolution block this describes was removed. INV-A1/INV-A2 no longer exist in the rewritten SKILL.md. Suggested replacement: "Checks the CLI server's auth status and shows a banner if not authenticated. Returns true if authenticated." 2. Comment in auth polling loop (lines ~923–926):
Why it matters: Future engineers reading these comments will be confused about invariants and methods that don't exist, and may waste time searching for them or try to re-add Keychain code based on the doc's description. 🟢 MINOR — Test coverage gap for new behavior pathsFlagged by: Sonnet, Codex | Not flagged by: Opus The 7 removed tests correctly correspond to the 7 deleted methods. However, no replacement tests were added to validate the new behavior:
Risk: Low — the new code is simpler than what it replaced. But a regression guard would help prevent the 4-PR regression chain from repeating. Items NOT reaching consensus (1 of 3 models only)
Verification Summary
Recommended Action
Overall this is a well-scoped, clean removal that eliminates a genuine 4-PR regression chain. The architectural decision (server self-authenticates) is sound and well-documented in the PR description and rewritten SKILL.md. |
Addresses review feedback: two comment blocks still referenced ResolveGitHubTokenForServer() and INV-A1/INV-A2 invariants that no longer exist after the Keychain removal. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Multi-Model Code Review — PR #465 (R2)PR: fix: remove all macOS Keychain reads — server self-authenticates R1 Findings Status
New R2 FindingsNo new consensus findings. One model (Codex) flagged a potential recovery gap for expired env-var tokens in Verification Summary
Recommended Action✅ Approve — All R1 findings are resolved. No new issues found by consensus. The PR is a clean, well-scoped removal of ~620 lines of dangerous Keychain code that caused a 4-PR regression chain. Architecture is sound: the server self-authenticates via its native credential store. Documentation accurately reflects the new design. |
…eWeen#465) ## Problem PR PureWeen#446 added Keychain-reading code to help users whose headless server couldn't self-authenticate. This caused a **4-PR regression chain** (PureWeen#446 → PureWeen#456 → PureWeen#462 → PureWeen#463) of recurring macOS password dialogs: 1. `TryReadCopilotKeychainToken` spawned `/usr/bin/security find-generic-password` for 3 service names sequentially — **3 password dialogs per call** 2. Clicking Allow/Deny on those dialogs **rewrote the Keychain ACL**, breaking the server's own native keytar access 3. The server fell back to its own `/usr/bin/security` calls → **more dialogs every 1-2 hours** 4. PRs PureWeen#456, PureWeen#462, PureWeen#463 each fixed one trigger path but the core approach was wrong ## Root Cause The headless copilot server **authenticates on its own** at startup via its native credential store. This has worked reliably across dozens of worktree switches (different binary paths each time) without any Keychain intervention from PolyPilot. The original PR PureWeen#446 user issue ("Session was not created with authentication info") was a server auth loss that should have been solved by **restarting the server** — which `TryRecoverPersistentServerAsync` already does — not by reading the Keychain from the UI process. ## Fix Remove all Keychain-reading code entirely (**-612 lines, +42 lines**): | Removed | Why | |---------|-----| | `ResolveGitHubTokenForServer()` | Keychain + gh CLI reads — triggers password dialogs | | `TryReadCopilotKeychainToken()` | 3-service-name loop — 3× password dialogs per call | | `RunProcessWithTimeout()` | Only used by above | | `_tokenResolutionLock` SemaphoreSlim | Guarded the lazy Keychain path | | Lazy resolution path in `CheckAuthStatusAsync` | The whole Keychain auto-resolution mechanism | | Sentinel logic (`_resolvedGitHubToken ??= ""`) | No longer needed without the lazy path | | Kept | Why | |------|-----| | `ResolveGitHubTokenFromEnv()` | Env vars are safe, no prompts | | `CheckAuthStatusAsync` + auth banner | Correct — shows "run copilot login" guidance | | `TryRecoverPersistentServerAsync` | Correct — restarts server which re-authenticates on its own | | Re-authenticate button | Correct — restarts server to pick up fresh `copilot login` credentials | ## For users who cannot self-authenticate The auth banner says: _"run `copilot login` in a terminal, then click Re-authenticate."_ This was the **original PR PureWeen#446 design** before Keychain code was added during review rounds. ## Timeline of the regression | PR | What it did | What went wrong | |----|-------------|-----------------| | PureWeen#446 | Added Keychain reads to forward token to server | Triggered password dialogs; corrupted Keychain ACL | | PureWeen#456 | Made Keychain reads lazy (only on first auth failure) | Still fired on every server recovery cycle | | PureWeen#462 | Stopped recovery from clearing the token cache | Token was never set in the first place (no env var) | | PureWeen#463 | Added sentinel on auth success | Server's own internal Keychain reads still fired | | **PureWeen#465** | **Removed all Keychain code** | **The right fix — server self-authenticates** | ## Tests 3057/3057 pass ✅ (7 tests for deleted methods removed, 3 env var tests kept) ## Why server self-authentication works The copilot headless server binary bundled inside PolyPilot.app and the Homebrew `copilot` binary are both signed with the **same GitHub Developer ID certificate**. macOS Keychain ACLs use code-signing identity (not binary path) to control access, so: - `copilot login` (Homebrew binary) writes the token to Keychain with an ACL scoped to the GitHub Developer ID - `copilot --headless` (bundled binary) reads the token via keytar — **same Developer ID = no password prompt** - PolyPilot's `/usr/bin/security` calls used a **different signer** (Apple's built-in tool), which triggered the ACL mismatch and the password dialogs - Clicking Allow/Deny on those dialogs **rewrote the ACL**, breaking the server's native keytar access — creating the regression spiral This was verified via `codesign -dvvv` on both binaries — they share the same Identifier, Authority chain, and team certificate. The Keychain entry's partition list (visible in `securityd` logs) confirms it uses team-id-based access control, not path-based. **Conclusion:** The server has always been able to self-authenticate. The only thing that broke it was PolyPilot calling `/usr/bin/security`, which used a different code-signing identity and corrupted the ACL. Removing those calls is the correct fix. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
PR #446 added Keychain-reading code to help users whose headless server couldn't self-authenticate. This caused a 4-PR regression chain (#446 → #456 → #462 → #463) of recurring macOS password dialogs:
TryReadCopilotKeychainTokenspawned/usr/bin/security find-generic-passwordfor 3 service names sequentially — 3 password dialogs per call/usr/bin/securitycalls → more dialogs every 1-2 hoursRoot Cause
The headless copilot server authenticates on its own at startup via its native credential store. This has worked reliably across dozens of worktree switches (different binary paths each time) without any Keychain intervention from PolyPilot.
The original PR #446 user issue ("Session was not created with authentication info") was a server auth loss that should have been solved by restarting the server — which
TryRecoverPersistentServerAsyncalready does — not by reading the Keychain from the UI process.Fix
Remove all Keychain-reading code entirely (-612 lines, +42 lines):
ResolveGitHubTokenForServer()TryReadCopilotKeychainToken()RunProcessWithTimeout()_tokenResolutionLockSemaphoreSlimCheckAuthStatusAsync_resolvedGitHubToken ??= "")ResolveGitHubTokenFromEnv()CheckAuthStatusAsync+ auth bannerTryRecoverPersistentServerAsynccopilot logincredentialsFor users who cannot self-authenticate
The auth banner says: "run
copilot loginin a terminal, then click Re-authenticate." This was the original PR #446 design before Keychain code was added during review rounds.Timeline of the regression
Tests
3057/3057 pass ✅ (7 tests for deleted methods removed, 3 env var tests kept)
Why server self-authentication works
The copilot headless server binary bundled inside PolyPilot.app and the Homebrew
copilotbinary are both signed with the same GitHub Developer ID certificate. macOS Keychain ACLs use code-signing identity (not binary path) to control access, so:copilot login(Homebrew binary) writes the token to Keychain with an ACL scoped to the GitHub Developer IDcopilot --headless(bundled binary) reads the token via keytar — same Developer ID = no password prompt/usr/bin/securitycalls used a different signer (Apple's built-in tool), which triggered the ACL mismatch and the password dialogsThis was verified via
codesign -dvvvon both binaries — they share the same Identifier, Authority chain, and team certificate. The Keychain entry's partition list (visible insecuritydlogs) confirms it uses team-id-based access control, not path-based.Conclusion: The server has always been able to self-authenticate. The only thing that broke it was PolyPilot calling
/usr/bin/security, which used a different code-signing identity and corrupted the ACL. Removing those calls is the correct fix.