fix: remove all macOS Keychain reads — server self-authenticates #465

Merged
PureWeen merged 2 commits into main from fix/remove-keychain-reads
Apr 1, 2026

Owner

PureWeen commented Apr 1, 2026

Problem

PR #446 added Keychain-reading code to help users whose headless server couldn't self-authenticate. This caused a 4-PR regression chain (#446 → #456 → #462 → #463) of recurring macOS password dialogs:

  1. TryReadCopilotKeychainToken spawned /usr/bin/security find-generic-password for 3 service names sequentially — 3 password dialogs per call
  2. Clicking Allow/Deny on those dialogs rewrote the Keychain ACL, breaking the server's own native keytar access
  3. The server fell back to its own /usr/bin/security calls → more dialogs every 1-2 hours
  4. PRs #456 ("fix: lazy Keychain reads to stop hourly password prompts"), #462 ("fix: stop recurring Keychain password prompts on server recovery"), and #463 ("fix: set sentinel on auth success to prevent lazy Keychain reads") each fixed one trigger path, but the core approach was wrong

Root Cause

The headless copilot server authenticates on its own at startup via its native credential store. This has worked reliably across dozens of worktree switches (different binary paths each time) without any Keychain intervention from PolyPilot.

The original PR #446 user issue ("Session was not created with authentication info") was a server auth loss that should have been solved by restarting the server — which TryRecoverPersistentServerAsync already does — not by reading the Keychain from the UI process.

Fix

Remove all Keychain-reading code entirely (-612 lines, +42 lines):

| Removed | Why |
|---------|-----|
| `ResolveGitHubTokenForServer()` | Keychain + gh CLI reads; triggers password dialogs |
| `TryReadCopilotKeychainToken()` | 3-service-name loop; 3× password dialogs per call |
| `RunProcessWithTimeout()` | Only used by above |
| `_tokenResolutionLock` SemaphoreSlim | Guarded the lazy Keychain path |
| Lazy resolution path in `CheckAuthStatusAsync` | The whole Keychain auto-resolution mechanism |
| Sentinel logic (`_resolvedGitHubToken ??= ""`) | No longer needed without the lazy path |

| Kept | Why |
|------|-----|
| `ResolveGitHubTokenFromEnv()` | Env vars are safe, no prompts |
| `CheckAuthStatusAsync` + auth banner | Correct: shows "run copilot login" guidance |
| `TryRecoverPersistentServerAsync` | Correct: restarts server, which re-authenticates on its own |
| Re-authenticate button | Correct: restarts server to pick up fresh `copilot login` credentials |
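
To illustrate why the env-var path is safe to keep, here is a minimal shell sketch of an environment-only token lookup. It is not PolyPilot's `ResolveGitHubTokenFromEnv()` (that code is C#); the variable names and their order are assumptions chosen for illustration. The point is that the lookup only reads the process environment, so it never touches the Keychain and cannot raise a dialog.

```shell
# Hypothetical sketch: print the first non-empty token found in the environment.
# The variable names and their precedence are assumptions, not PolyPilot's actual list.
resolve_token_from_env() {
    for name in COPILOT_GITHUB_TOKEN GH_TOKEN GITHUB_TOKEN; do
        # Indirect variable lookup that works in POSIX sh as well as bash.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    # Nothing set: the caller relies on the server authenticating on its own.
    return 1
}
```

A non-zero exit status here is not an error. It simply means no token is forwarded and the server uses its own credential store.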

For users who cannot self-authenticate

The auth banner says: "run copilot login in a terminal, then click Re-authenticate." This was the original PR #446 design before Keychain code was added during review rounds.

Timeline of the regression

| PR | What it did | What went wrong |
|----|-------------|-----------------|
| #446 | Added Keychain reads to forward token to server | Triggered password dialogs; corrupted Keychain ACL |
| #456 | Made Keychain reads lazy (only on first auth failure) | Still fired on every server recovery cycle |
| #462 | Stopped recovery from clearing the token cache | Token was never set in the first place (no env var) |
| #463 | Added sentinel on auth success | Server's own internal Keychain reads still fired |
| #465 | Removed all Keychain code | The right fix: server self-authenticates |

Tests

3057/3057 pass ✅ (7 tests for deleted methods removed, 3 env var tests kept)

Why server self-authentication works

The copilot headless server binary bundled inside PolyPilot.app and the Homebrew copilot binary are both signed with the same GitHub Developer ID certificate. macOS Keychain ACLs use code-signing identity (not binary path) to control access, so:

  • copilot login (Homebrew binary) writes the token to Keychain with an ACL scoped to the GitHub Developer ID
  • copilot --headless (bundled binary) reads the token via keytar — same Developer ID = no password prompt
  • PolyPilot's /usr/bin/security calls used a different signer (Apple's built-in tool), which triggered the ACL mismatch and the password dialogs
  • Clicking Allow/Deny on those dialogs rewrote the ACL, breaking the server's native keytar access — creating the regression spiral

This was verified via codesign -dvvv on both binaries — they share the same Identifier, Authority chain, and team certificate. The Keychain entry's partition list (visible in securityd logs) confirms it uses team-id-based access control, not path-based.
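
The signing check described above can be repeated with a small helper. This is a sketch, not the exact commands used during the investigation: the function names are hypothetical, and it assumes `codesign -dvvv` prints a `TeamIdentifier=` line to stderr, which it does for Developer ID-signed binaries.

```shell
# Print the Team ID that codesign reports for a binary.
# codesign -dvvv writes its details to stderr, hence the 2>&1.
team_id_of() {
    codesign -dvvv "$1" 2>&1 | sed -n 's/^TeamIdentifier=//p'
}

# Succeed only when both binaries carry the same real Team ID.
# Ad-hoc signed binaries report "not set", which must not count as a match.
same_signer() {
    a=$(team_id_of "$1")
    b=$(team_id_of "$2")
    [ -n "$a" ] && [ "$a" != "not set" ] && [ "$a" = "$b" ]
}
```

Call `same_signer` with the path of the bundled headless binary and the path of the Homebrew `copilot` binary. A zero exit status means both report the same Team ID, which is the property the Keychain partition list relies on.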

Conclusion: The server has always been able to self-authenticate. The only thing that broke it was PolyPilot calling /usr/bin/security, which used a different code-signing identity and corrupted the ACL. Removing those calls is the correct fix.

Owner Author

PureWeen commented Apr 1, 2026

Investigation Summary

How we confirmed PolyPilot was the source

  • Ran a process monitor (pgrep loop catching /usr/bin/security spawns and their parent PIDs)
  • Every keychain prompt traced back to dotnet exec (the PolyPilot .NET process) as the parent
  • Killed all other copilot processes (Agency, copilot --yolo, GitHub Desktop) — dialogs continued
  • Even after restarting the headless server, dialogs continued from the server binary's own internal Keychain reads — which were broken because PolyPilot's earlier /usr/bin/security calls had corrupted the ACL
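
For anyone who wants to reproduce the trace, a monitor along these lines is enough. This is a reconstruction under assumptions, not the script that was actually run: the function names and the polling interval are made up, and it relies only on `pgrep -x` and `ps -o ppid=` / `ps -o command=`.

```shell
# One polling pass: for every running process named exactly "security",
# print a timestamp, its PID, its parent PID, and the parent's command line.
report_security_spawns() {
    for pid in $(pgrep -x security); do
        ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
        # The process may already have exited between pgrep and ps.
        [ -n "$ppid" ] || continue
        parent=$(ps -o command= -p "$ppid")
        printf '%s pid=%s ppid=%s parent=%s\n' \
            "$(date '+%H:%M:%S')" "$pid" "$ppid" "$parent"
    done
}

# Poll a few times per second until interrupted with Ctrl-C.
# Very short-lived spawns can still slip between two passes.
watch_security_spawns() {
    while :; do
        report_security_spawns
        sleep 0.2
    done
}
```

Running `watch_security_spawns` in a terminal while waiting for a dialog shows which parent process launched `/usr/bin/security` at that moment.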

Why the 4-PR fix chain didn't work

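Each PR fixed one trigger path but missed the core issue; the "Timeline of the regression" table in the PR description lists what each one changed and what still went wrong.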

Why removing Keychain code is safe

  • Verified via codesign -dvvv: the bundled copilot and Homebrew copilot share the same GitHub Developer ID certificate
  • macOS Keychain ACLs use code-signing identity, not binary path
  • The server authenticates via keytar (native Security.framework) → same Developer ID → no prompt
  • The ONLY thing that caused prompts was /usr/bin/security (Apple's tool, different signer)

For users with corrupted ACLs

If a user was affected by the old Keychain code, they can delete the damaged Keychain entry and sign in again so it is recreated:

security delete-generic-password -s "copilot-cli" ~/Library/Keychains/login.keychain-db
copilot login

## Problem

PR #446 added Keychain-reading code (TryReadCopilotKeychainToken,
ResolveGitHubTokenForServer, RunProcessWithTimeout) to help users
whose headless server could not self-authenticate. This caused a
4-PR regression chain (#446 → #456 → #462 → #463):

1. Each /usr/bin/security call triggers a macOS password dialog
   (PolyPilot is not in the copilot-cli Keychain ACL)
2. TryReadCopilotKeychainToken tried 3 service names sequentially,
   each spawning a separate dialog (3× password prompts per call)
3. Clicking Allow/Deny rewrote the Keychain ACL, breaking the
   server's own native keytar access
4. The server fell back to its own /usr/bin/security calls, creating
   a spiral of recurring password prompts every 1-2 hours

## Root Cause Analysis

The headless copilot server authenticates on its own at startup via
its native credential store. This has worked reliably across dozens
of worktree switches (different binary paths each time). The original
PR #446 user issue ("Session was not created with authentication
info") was a server auth loss that should have been solved by
restarting the server — which TryRecoverPersistentServerAsync already
does — not by reading the Keychain from the UI process.

## Changes

- Remove ResolveGitHubTokenForServer (Keychain + gh CLI reads)
- Remove TryReadCopilotKeychainToken (3-service-name loop)
- Remove RunProcessWithTimeout (only used by above)
- Remove _tokenResolutionLock SemaphoreSlim (guarded lazy path)
- Remove lazy Keychain resolution path in CheckAuthStatusAsync
- Remove sentinel logic (_resolvedGitHubToken ??= string.Empty)
- Simplify ReauthenticateAsync to just restart the server
- Keep ResolveGitHubTokenFromEnv (env vars are safe, no prompts)
- Keep auth banner + Re-authenticate button (correct UX)
- Rewrite auth-token-safety skill doc with new invariant
- Remove 7 tests for deleted methods, keep 3 env var tests

## For users who cannot self-authenticate

The auth banner says: "run copilot login in a terminal, then click
Re-authenticate." Re-authenticate restarts the server, which picks
up the fresh credentials. This was the original PR #446 design before
Keychain code was added during review rounds.

Tests: 3057/3057 pass ✅

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen force-pushed the fix/remove-keychain-reads branch from 0f47e66 to 731a702 on April 1, 2026 17:13
Owner Author

PureWeen commented Apr 1, 2026

🔍 Multi-Model Code Review — PR #465

PR: fix: remove all macOS Keychain reads — server self-authenticates
Review type: R1 (first review)
Models: Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.3-Codex
CI Status: ⚠️ No CI checks reported on this branch


Consensus Findings (2+ of 3 models agree)

🟡 MODERATE — Stale XML doc and comments reference deleted code/invariants

Flagged by: Opus, Sonnet | Not flagged by: Codex

Two stale comment blocks remain in CopilotService.Utilities.cs on the PR branch:

1. XML doc on CheckAuthStatusAsync (lines ~854–857):

/// On first auth failure (when no token has been resolved yet), performs a lazy
/// full token resolution (including Keychain) and auto-restarts the server.
/// This avoids preemptive Keychain reads at startup while still fixing auth
/// for users whose headless server can't access the Keychain.
/// See .claude/skills/auth-token-safety/SKILL.md (INV-A1, INV-A2).

The entire lazy-resolution block this describes was removed. INV-A1/INV-A2 no longer exist in the rewritten SKILL.md. Suggested replacement: "Checks the CLI server's auth status and shows a banner if not authenticated. Returns true if authenticated."

2. Comment in auth polling loop (lines ~923–926):

// Use cached token only — do NOT call ResolveGitHubTokenForServer()
// here. The polling loop runs every 10s; re-reading Keychain would
// trigger a macOS password dialog on every cycle.
// See .claude/skills/auth-token-safety/SKILL.md (INV-A2).

ResolveGitHubTokenForServer() no longer exists. Suggested simplification: "Use cached env-var token only — the server self-authenticates."

Why it matters: Future engineers reading these comments will be confused about invariants and methods that don't exist, and may waste time searching for them or try to re-add Keychain code based on the doc's description.


🟢 MINOR — Test coverage gap for new behavior paths

Flagged by: Sonnet, Codex | Not flagged by: Opus

The 7 removed tests correctly correspond to the 3 deleted methods. However, no replacement tests were added to validate the new behavior:

  • No test that ReauthenticateAsync does not attempt token injection before recovery (prevents Keychain code from being re-added in a future "fix")
  • No test for the auth-failure → banner → manual copilot login UX path without lazy resolution

Risk: Low — the new code is simpler than what it replaced. But a regression guard would help prevent the 4-PR regression chain from repeating.
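
One inexpensive form of regression guard is a source scan that fails when any removed symbol or the `security` CLI reappears. The sketch below is an assumption about how such a guard could look, not something this PR adds: the function name is hypothetical and it only inspects `.cs` files.

```shell
# Print every line in a .cs file under the given directory (default: current
# directory) that mentions a removed symbol or the security CLI.
# Exit status 0 means "references found", 1 means "clean".
find_keychain_refs() {
    grep -rnE --include='*.cs' \
        'ResolveGitHubTokenForServer|TryReadCopilotKeychainToken|RunProcessWithTimeout|_tokenResolutionLock|/usr/bin/security|find-generic-password' \
        "${1:-.}"
}
```

In a CI step this could be used as `if find_keychain_refs .; then echo "Keychain code reintroduced" >&2; exit 1; fi`. A scan like this complements, rather than replaces, a unit test on `ReauthenticateAsync`.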


Items NOT reaching consensus (1 of 3 models only)

| Finding | Flagged by | Why excluded |
|---------|------------|--------------|
| Stale env-var token in ReauthenticateAsync | Codex only | Opus/Sonnet verified this is correct: server self-authenticates; env-var forwarding is a best-effort fallback |
| _resolvedGitHubToken field is vestigial | Opus only | Sonnet confirmed field is correctly retained for env-var passthrough |
| Narrower recovery UX | Codex only | Intentional design; the whole point of the PR |

Verification Summary

| Check | Result |
|-------|--------|
| Dangling references to deleted methods | ✅ None found (all 3 models verified via grep) |
| Compile safety | ✅ No orphaned call sites |
| Security regressions | ✅ None; PR removes secret-reading code, doesn't add any |
| Race conditions | ✅ Removed _tokenResolutionLock was only needed for deleted lazy path |
| Data loss risk | ✅ None |
| Test removals match deleted code | ✅ 7 tests for 3 deleted methods |

Recommended Action

⚠️ Request changes — Fix the two stale comment blocks in CopilotService.Utilities.cs (lines ~854–857 and ~923–926) before merging. These reference deleted methods and non-existent invariants. The test coverage gap is a nice-to-have but not a merge blocker.

Overall this is a well-scoped, clean removal that eliminates a genuine 4-PR regression chain. The architectural decision (server self-authenticates) is sound and well-documented in the PR description and rewritten SKILL.md.

Addresses review feedback: two comment blocks still referenced
ResolveGitHubTokenForServer() and INV-A1/INV-A2 invariants that
no longer exist after the Keychain removal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Owner Author

PureWeen commented Apr 1, 2026

🔍 Multi-Model Code Review — PR #465 (R2)

PR: fix: remove all macOS Keychain reads — server self-authenticates
Review type: R2 (re-review after author fixes)
Models: Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.3-Codex
CI Status: ⚠️ No CI checks reported on this branch
New commits since R1: 74663a68 — fix: remove stale comments referencing deleted Keychain code


R1 Findings Status

| # | R1 Severity | Finding | R2 Status | Verified by |
|---|-------------|---------|-----------|-------------|
| 1 | 🟡 MODERATE | Stale XML doc on CheckAuthStatusAsync (lines ~854–857): described deleted Keychain lazy-resolution + dead INV-A refs | Fixed in commit 74663a68 | All 3 models |
| 2 | 🟡 MODERATE | Stale comment in auth polling loop (lines ~923–926): referenced deleted ResolveGitHubTokenForServer() + dead INV-A2 | Fixed in commit 74663a68 | All 3 models |
| 3 | 🟢 MINOR | _resolvedGitHubToken field + forwarding plumbing is vestigial | Reassessed: not vestigial. Field actively caches env-var tokens and forwards to StartServerAsync at ~9 call sites. Documented in SKILL.md "What's Allowed." | All 3 models |

New R2 Findings

No new consensus findings. One model (Codex) flagged a potential recovery gap for expired env-var tokens in ReauthenticateAsync, but Opus and Sonnet both verified this is correct by design — the server self-authenticates via keytar, and env-var forwarding is purely a best-effort fallback. The auth banner correctly tells the user to run copilot login. Not a bug.


Verification Summary

| Check | Result | Models |
|-------|--------|--------|
| Dangling references to deleted symbols | ✅ Zero matches across all .cs, .razor, .md files | All 3 |
| _tokenResolutionLock.Dispose() removed alongside field | ✅ Clean | Opus, Sonnet |
| Sentinel removal (??= string.Empty) safe | ✅ Guarded code also removed | All 3 |
| ReauthenticateAsync correctness without token resolution | ✅ Server restart sufficient | Opus, Sonnet |
| Commit 2 regression risk | ✅ None; comment/doc cleanup only | All 3 |
| Test removals match deleted code 1:1 | ✅ 7 tests for 3 deleted methods | All 3 |
| Security regressions | ✅ None; PR removes secret-reading code | All 3 |

Recommended Action

✅ Approve — All R1 findings are resolved. No new issues found by consensus. The PR is a clean, well-scoped removal of ~620 lines of dangerous Keychain code that caused a 4-PR regression chain. Architecture is sound: the server self-authenticates via its native credential store. Documentation accurately reflects the new design.

PureWeen merged commit fcbae6a into main on Apr 1, 2026
PureWeen deleted the fix/remove-keychain-reads branch on April 1, 2026 20:07
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026