Skip to content

fix: set sentinel on auth success to prevent lazy Keychain reads#463

Merged
PureWeen merged 1 commit intomainfrom
fix/keychain-sentinel-on-auth-success
Apr 1, 2026
Merged

fix: set sentinel on auth success to prevent lazy Keychain reads#463
PureWeen merged 1 commit intomainfrom
fix/keychain-sentinel-on-auth-success

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

@PureWeen PureWeen commented Apr 1, 2026

Problem

After PR #462 merged, users still get macOS "allow copilot-cli" password dialogs every ~1-2 hours. Process monitoring confirmed the parent PID is the PolyPilot .NET process itself (dotnet exec), not external copilot binaries.

Root Cause

PR #462 prevented TryRecoverPersistentServerAsync from clearing _resolvedGitHubToken. But the token was never set in the first place — on startup, ResolveGitHubTokenFromEnv() returns null (no env var), and when CheckAuthStatusAsync finds the server authenticated, it returns true without setting the field. So _resolvedGitHubToken stays null.

Later, any transient auth failure (connection blip, client recreation, server token rotation) calls CheckAuthStatusAsync again. This time auth fails, the guard _resolvedGitHubToken == null is satisfied, and the lazy path fires:

  1. ResolveGitHubTokenForServer()TryReadCopilotKeychainToken()
  2. Tries 3 service names: copilot-cli, github-copilot, GitHub Copilot
  3. Each spawns /usr/bin/security find-generic-password -s <name> -w (3s timeout)
  4. 3 macOS password dialogs (PolyPilot not in the Keychain ACL for this entry)

Evidence

Process monitor output confirmed PolyPilot as the source:

security PID=66918, Parent PID=66646
  66646 66630 /usr/local/share/dotnet/dotnet exec --runtimeconfig ...
security PID=67068, Parent PID=66646  (3s later)
security PID=67146, Parent PID=66646  (3s later)

Fix

Set _resolvedGitHubToken ??= string.Empty when CheckAuthStatusAsync finds the server authenticated. The empty string sentinel means "server can self-authenticate, no Keychain read needed." The lazy path guard _resolvedGitHubToken == null is no longer satisfied.

ServerManager.StartServerAsync uses !string.IsNullOrEmpty(githubToken) so the empty sentinel is never forwarded as a COPILOT_GITHUB_TOKEN env var.

Behavior Matrix

Scenario Before After
Startup, server self-auths _resolvedGitHubToken stays null Set to "" (sentinel)
Later transient auth failure Lazy path fires → 3 password dialogs Lazy path skipped → auth banner shown
Server genuinely cant auth Lazy path fires once (correct) Same — "" not set because auth failed
ReauthenticateAsync (user) Fresh Keychain read Same — clears to null first
ReconnectAsync (settings) Clears to null Same

Tests

3064/3064 pass ✅

Related

After startup, _resolvedGitHubToken stays null when no env var is set
and the server self-authenticates. The lazy Keychain path in
CheckAuthStatusAsync is guarded by (_resolvedGitHubToken == null),
so any later transient auth failure triggers ResolveGitHubTokenForServer
→ TryReadCopilotKeychainToken → 3 service names × 3s timeout each =
3 macOS "allow copilot-cli" password dialogs.

Fix: set _resolvedGitHubToken to string.Empty when auth succeeds.
This sentinel means "server can self-authenticate, no Keychain needed."
ServerManager.StartServerAsync uses !string.IsNullOrEmpty() so the
empty sentinel is never forwarded as an env var.

Confirmed via process monitoring: parent PID of /usr/bin/security
calls is the PolyPilot .NET process (dotnet exec), not external
copilot binaries.

Tests: 3064/3064 pass ✅ (2 flaky timing tests pass on retry)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Owner Author

PureWeen commented Apr 1, 2026

Multi-Model Code Review: PR #463 (R1)

PR: fix: set sentinel on auth success to prevent lazy Keychain reads
Models: claude-opus-4.6 ✅, claude-sonnet-4.6 ✅, gpt-5.3-codex ✅
CI: ⚠️ No checks reported on the branch
Prior reviews: None (0 comments, 0 reviews)


Core Fix Verification — ✅ Correct (3/3 models agree)

All three models independently verified:

  • ??= correctly preserves existing real tokens (no-op when non-null) ✅
  • "" sentinel is never forwarded as COPILOT_GITHUB_TOKENServerManager.cs:97 uses !string.IsNullOrEmpty(githubToken)
  • ReconnectAsync (clears to null) and ReauthenticateAsync (overwrites with fresh token) are unaffected ✅
  • All 7+ StartServerAsync call sites that pass _resolvedGitHubToken are safe with ""
  • Token expiration recovery path is correct: auth banner shown → user Re-authenticates → fresh Keychain read ✅

Consensus Findings

🟢 MINOR — Non-atomic ??= race window (3/3 models flagged)

PolyPilot/Services/CopilotService.Utilities.cs:872

_resolvedGitHubToken ??= string.Empty compiles to a non-atomic read-check-write. Theoretical interleave:

  1. Thread A (CheckAuthStatusAsync) reads _resolvedGitHubToken == null
  2. Thread B (ReauthenticateAsync) sets _resolvedGitHubToken = "realToken"
  3. Thread A writes string.Empty → clobbers the real token

All 3 models agree this is not a blocking issue because:

  • Window is a few CPU cycles between read and write
  • ReauthenticateAsync calls StopAuthPolling() first, so a poll must already be in-flight to race
  • All other writes to _resolvedGitHubToken in the codebase are equally unprotected — this is a pre-existing pattern, not introduced by this PR
  • Even if it fires, the server already demonstrated self-authentication (Thread A confirmed IsAuthenticated)

🟢 MINOR — No test coverage for sentinel behavior (3/3 models flagged)

PolyPilot.Tests/ServerRecoveryTests.cs

No test asserts that CheckAuthStatusAsync sets _resolvedGitHubToken to "" on auth success, or that ??= preserves existing tokens. The regression this PR fixes could silently reappear. StubServerManager.LastGitHubToken already exists and could pin this. Non-blocking for a targeted bugfix PR, but recommended as follow-up.


Summary

Check Result
Regressions ("" at all StartServerAsync sites) ✅ Safe
??= semantics (preserves real tokens) ✅ Correct
ReauthenticateAsync / ReconnectAsync interaction ✅ Correct
Security ✅ No risk
Thread safety 🟢 Theoretical race, pre-existing pattern
Data loss / sessions ✅ None
Test coverage 🟢 No direct tests — acceptable for bugfix
SKILL.md documentation ✅ Accurate

Recommended action: ✅ Approve

@PureWeen PureWeen merged commit 94eb18b into main Apr 1, 2026
@PureWeen PureWeen deleted the fix/keychain-sentinel-on-auth-success branch April 1, 2026 14:42
PureWeen added a commit that referenced this pull request Apr 1, 2026
## Problem

PR #446 added Keychain-reading code (TryReadCopilotKeychainToken,
ResolveGitHubTokenForServer, RunProcessWithTimeout) to help users
whose headless server could not self-authenticate. This caused a
4-PR regression chain (#446#456#462#463):

1. Each /usr/bin/security call triggers a macOS password dialog
   (PolyPilot is not in the copilot-cli Keychain ACL)
2. TryReadCopilotKeychainToken tried 3 service names sequentially,
   each spawning a separate dialog (3× password prompts per call)
3. Clicking Allow/Deny rewrote the Keychain ACL, breaking the
   server own native keytar access
4. The server fell back to its own /usr/bin/security calls, creating
   a spiral of recurring password prompts every 1-2 hours

## Root Cause Analysis

The headless copilot server authenticates on its own at startup via
its native credential store. This has worked reliably across dozens
of worktree switches (different binary paths each time). The original
PR #446 user issue ("Session was not created with authentication
info") was a server auth loss that should have been solved by
restarting the server — which TryRecoverPersistentServerAsync already
does — not by reading the Keychain from the UI process.

## Changes

- Remove ResolveGitHubTokenForServer (Keychain + gh CLI reads)
- Remove TryReadCopilotKeychainToken (3-service-name loop)
- Remove RunProcessWithTimeout (only used by above)
- Remove _tokenResolutionLock SemaphoreSlim (guarded lazy path)
- Remove lazy Keychain resolution path in CheckAuthStatusAsync
- Remove sentinel logic (_resolvedGitHubToken ??= string.Empty)
- Simplify ReauthenticateAsync to just restart the server
- Keep ResolveGitHubTokenFromEnv (env vars are safe, no prompts)
- Keep auth banner + Re-authenticate button (correct UX)
- Rewrite auth-token-safety skill doc with new invariant
- Remove 7 tests for deleted methods, keep 3 env var tests

## For users who cannot self-authenticate

The auth banner says: "run copilot login in a terminal, then click
Re-authenticate." Re-authenticate restarts the server, which picks
up the fresh credentials. This was the original PR #446 design before
Keychain code was added during review rounds.

Tests: 3057/3057 pass ✅

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen added a commit that referenced this pull request Apr 1, 2026
## Problem

PR #446 added Keychain-reading code (TryReadCopilotKeychainToken,
ResolveGitHubTokenForServer, RunProcessWithTimeout) to help users
whose headless server could not self-authenticate. This caused a
4-PR regression chain (#446#456#462#463):

1. Each /usr/bin/security call triggers a macOS password dialog
   (PolyPilot is not in the copilot-cli Keychain ACL)
2. TryReadCopilotKeychainToken tried 3 service names sequentially,
   each spawning a separate dialog (3× password prompts per call)
3. Clicking Allow/Deny rewrote the Keychain ACL, breaking the
   server own native keytar access
4. The server fell back to its own /usr/bin/security calls, creating
   a spiral of recurring password prompts every 1-2 hours

## Root Cause Analysis

The headless copilot server authenticates on its own at startup via
its native credential store. This has worked reliably across dozens
of worktree switches (different binary paths each time). The original
PR #446 user issue ("Session was not created with authentication
info") was a server auth loss that should have been solved by
restarting the server — which TryRecoverPersistentServerAsync already
does — not by reading the Keychain from the UI process.

## Changes

- Remove ResolveGitHubTokenForServer (Keychain + gh CLI reads)
- Remove TryReadCopilotKeychainToken (3-service-name loop)
- Remove RunProcessWithTimeout (only used by above)
- Remove _tokenResolutionLock SemaphoreSlim (guarded lazy path)
- Remove lazy Keychain resolution path in CheckAuthStatusAsync
- Remove sentinel logic (_resolvedGitHubToken ??= string.Empty)
- Simplify ReauthenticateAsync to just restart the server
- Keep ResolveGitHubTokenFromEnv (env vars are safe, no prompts)
- Keep auth banner + Re-authenticate button (correct UX)
- Rewrite auth-token-safety skill doc with new invariant
- Remove 7 tests for deleted methods, keep 3 env var tests

## For users who cannot self-authenticate

The auth banner says: "run copilot login in a terminal, then click
Re-authenticate." Re-authenticate restarts the server, which picks
up the fresh credentials. This was the original PR #446 design before
Keychain code was added during review rounds.

Tests: 3057/3057 pass ✅

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen added a commit that referenced this pull request Apr 1, 2026
## Problem

PR #446 added Keychain-reading code to help users whose headless server
couldn't self-authenticate. This caused a **4-PR regression chain**
(#446#456#462#463) of recurring macOS password dialogs:

1. `TryReadCopilotKeychainToken` spawned `/usr/bin/security
find-generic-password` for 3 service names sequentially — **3 password
dialogs per call**
2. Clicking Allow/Deny on those dialogs **rewrote the Keychain ACL**,
breaking the server's own native keytar access
3. The server fell back to its own `/usr/bin/security` calls → **more
dialogs every 1-2 hours**
4. PRs #456, #462, #463 each fixed one trigger path but the core
approach was wrong

## Root Cause

The headless copilot server **authenticates on its own** at startup via
its native credential store. This has worked reliably across dozens of
worktree switches (different binary paths each time) without any
Keychain intervention from PolyPilot.

The original PR #446 user issue ("Session was not created with
authentication info") was a server auth loss that should have been
solved by **restarting the server** — which
`TryRecoverPersistentServerAsync` already does — not by reading the
Keychain from the UI process.

## Fix

Remove all Keychain-reading code entirely (**-612 lines, +42 lines**):

| Removed | Why |
|---------|-----|
| `ResolveGitHubTokenForServer()` | Keychain + gh CLI reads — triggers
password dialogs |
| `TryReadCopilotKeychainToken()` | 3-service-name loop — 3× password
dialogs per call |
| `RunProcessWithTimeout()` | Only used by above |
| `_tokenResolutionLock` SemaphoreSlim | Guarded the lazy Keychain path
|
| Lazy resolution path in `CheckAuthStatusAsync` | The whole Keychain
auto-resolution mechanism |
| Sentinel logic (`_resolvedGitHubToken ??= ""`) | No longer needed
without the lazy path |

| Kept | Why |
|------|-----|
| `ResolveGitHubTokenFromEnv()` | Env vars are safe, no prompts |
| `CheckAuthStatusAsync` + auth banner | Correct — shows "run copilot
login" guidance |
| `TryRecoverPersistentServerAsync` | Correct — restarts server which
re-authenticates on its own |
| Re-authenticate button | Correct — restarts server to pick up fresh
`copilot login` credentials |

## For users who cannot self-authenticate

The auth banner says: _"run `copilot login` in a terminal, then click
Re-authenticate."_ This was the **original PR #446 design** before
Keychain code was added during review rounds.

## Timeline of the regression

| PR | What it did | What went wrong |
|----|-------------|-----------------|
| #446 | Added Keychain reads to forward token to server | Triggered
password dialogs; corrupted Keychain ACL |
| #456 | Made Keychain reads lazy (only on first auth failure) | Still
fired on every server recovery cycle |
| #462 | Stopped recovery from clearing the token cache | Token was
never set in the first place (no env var) |
| #463 | Added sentinel on auth success | Server's own internal Keychain
reads still fired |
| **#465** | **Removed all Keychain code** | **The right fix — server
self-authenticates** |

## Tests
3057/3057 pass ✅ (7 tests for deleted methods removed, 3 env var tests
kept)

## Why server self-authentication works

The copilot headless server binary bundled inside PolyPilot.app and the
Homebrew `copilot` binary are both signed with the **same GitHub
Developer ID certificate**. macOS Keychain ACLs use code-signing
identity (not binary path) to control access, so:

- `copilot login` (Homebrew binary) writes the token to Keychain with an
ACL scoped to the GitHub Developer ID
- `copilot --headless` (bundled binary) reads the token via keytar —
**same Developer ID = no password prompt**
- PolyPilot's `/usr/bin/security` calls used a **different signer**
(Apple's built-in tool), which triggered the ACL mismatch and the
password dialogs
- Clicking Allow/Deny on those dialogs **rewrote the ACL**, breaking the
server's native keytar access — creating the regression spiral

This was verified via `codesign -dvvv` on both binaries — they share the
same Identifier, Authority chain, and team certificate. The Keychain
entry's partition list (visible in `securityd` logs) confirms it uses
team-id-based access control, not path-based.

**Conclusion:** The server has always been able to self-authenticate.
The only thing that broke it was PolyPilot calling `/usr/bin/security`,
which used a different code-signing identity and corrupted the ACL.
Removing those calls is the correct fix.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…eWeen#463)

## Problem

After PR PureWeen#462 merged, users still get macOS "allow copilot-cli" password
dialogs every ~1-2 hours. Process monitoring confirmed the parent PID is
the PolyPilot .NET process itself (`dotnet exec`), not external copilot
binaries.

## Root Cause

PR PureWeen#462 prevented `TryRecoverPersistentServerAsync` from clearing
`_resolvedGitHubToken`. But the token was **never set in the first
place** — on startup, `ResolveGitHubTokenFromEnv()` returns null (no env
var), and when `CheckAuthStatusAsync` finds the server authenticated, it
returns `true` without setting the field. So `_resolvedGitHubToken`
stays `null`.

Later, any transient auth failure (connection blip, client recreation,
server token rotation) calls `CheckAuthStatusAsync` again. This time
auth fails, the guard `_resolvedGitHubToken == null` is satisfied, and
the lazy path fires:
1. `ResolveGitHubTokenForServer()` → `TryReadCopilotKeychainToken()`
2. Tries 3 service names: `copilot-cli`, `github-copilot`, `GitHub
Copilot`
3. Each spawns `/usr/bin/security find-generic-password -s <name> -w`
(3s timeout)
4. **3 macOS password dialogs** (PolyPilot not in the Keychain ACL for
this entry)

## Evidence

Process monitor output confirmed PolyPilot as the source:
```
security PID=66918, Parent PID=66646
  66646 66630 /usr/local/share/dotnet/dotnet exec --runtimeconfig ...
security PID=67068, Parent PID=66646  (3s later)
security PID=67146, Parent PID=66646  (3s later)
```

## Fix

Set `_resolvedGitHubToken ??= string.Empty` when `CheckAuthStatusAsync`
finds the server authenticated. The empty string sentinel means "server
can self-authenticate, no Keychain read needed." The lazy path guard
`_resolvedGitHubToken == null` is no longer satisfied.

`ServerManager.StartServerAsync` uses
`!string.IsNullOrEmpty(githubToken)` so the empty sentinel is never
forwarded as a `COPILOT_GITHUB_TOKEN` env var.

## Behavior Matrix

| Scenario | Before | After |
|----------|--------|-------|
| Startup, server self-auths | `_resolvedGitHubToken` stays null | Set
to `""` (sentinel) |
| Later transient auth failure | Lazy path fires → 3 password dialogs |
Lazy path skipped → auth banner shown |
| Server genuinely cant auth | Lazy path fires once (correct) | Same —
`""` not set because auth failed |
| ReauthenticateAsync (user) | Fresh Keychain read | Same — clears to
null first |
| ReconnectAsync (settings) | Clears to null | Same |

## Tests
3064/3064 pass ✅

## Related
- Fixes remaining issue after PureWeen#462

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…eWeen#465)

## Problem

PR PureWeen#446 added Keychain-reading code to help users whose headless server
couldn't self-authenticate. This caused a **4-PR regression chain**
(PureWeen#446PureWeen#456PureWeen#462PureWeen#463) of recurring macOS password dialogs:

1. `TryReadCopilotKeychainToken` spawned `/usr/bin/security
find-generic-password` for 3 service names sequentially — **3 password
dialogs per call**
2. Clicking Allow/Deny on those dialogs **rewrote the Keychain ACL**,
breaking the server's own native keytar access
3. The server fell back to its own `/usr/bin/security` calls → **more
dialogs every 1-2 hours**
4. PRs PureWeen#456, PureWeen#462, PureWeen#463 each fixed one trigger path but the core
approach was wrong

## Root Cause

The headless copilot server **authenticates on its own** at startup via
its native credential store. This has worked reliably across dozens of
worktree switches (different binary paths each time) without any
Keychain intervention from PolyPilot.

The original PR PureWeen#446 user issue ("Session was not created with
authentication info") was a server auth loss that should have been
solved by **restarting the server** — which
`TryRecoverPersistentServerAsync` already does — not by reading the
Keychain from the UI process.

## Fix

Remove all Keychain-reading code entirely (**-612 lines, +42 lines**):

| Removed | Why |
|---------|-----|
| `ResolveGitHubTokenForServer()` | Keychain + gh CLI reads — triggers
password dialogs |
| `TryReadCopilotKeychainToken()` | 3-service-name loop — 3× password
dialogs per call |
| `RunProcessWithTimeout()` | Only used by above |
| `_tokenResolutionLock` SemaphoreSlim | Guarded the lazy Keychain path
|
| Lazy resolution path in `CheckAuthStatusAsync` | The whole Keychain
auto-resolution mechanism |
| Sentinel logic (`_resolvedGitHubToken ??= ""`) | No longer needed
without the lazy path |

| Kept | Why |
|------|-----|
| `ResolveGitHubTokenFromEnv()` | Env vars are safe, no prompts |
| `CheckAuthStatusAsync` + auth banner | Correct — shows "run copilot
login" guidance |
| `TryRecoverPersistentServerAsync` | Correct — restarts server which
re-authenticates on its own |
| Re-authenticate button | Correct — restarts server to pick up fresh
`copilot login` credentials |

## For users who cannot self-authenticate

The auth banner says: _"run `copilot login` in a terminal, then click
Re-authenticate."_ This was the **original PR PureWeen#446 design** before
Keychain code was added during review rounds.

## Timeline of the regression

| PR | What it did | What went wrong |
|----|-------------|-----------------|
| PureWeen#446 | Added Keychain reads to forward token to server | Triggered
password dialogs; corrupted Keychain ACL |
| PureWeen#456 | Made Keychain reads lazy (only on first auth failure) | Still
fired on every server recovery cycle |
| PureWeen#462 | Stopped recovery from clearing the token cache | Token was
never set in the first place (no env var) |
| PureWeen#463 | Added sentinel on auth success | Server's own internal Keychain
reads still fired |
| **PureWeen#465** | **Removed all Keychain code** | **The right fix — server
self-authenticates** |

## Tests
3057/3057 pass ✅ (7 tests for deleted methods removed, 3 env var tests
kept)

## Why server self-authentication works

The copilot headless server binary bundled inside PolyPilot.app and the
Homebrew `copilot` binary are both signed with the **same GitHub
Developer ID certificate**. macOS Keychain ACLs use code-signing
identity (not binary path) to control access, so:

- `copilot login` (Homebrew binary) writes the token to Keychain with an
ACL scoped to the GitHub Developer ID
- `copilot --headless` (bundled binary) reads the token via keytar —
**same Developer ID = no password prompt**
- PolyPilot's `/usr/bin/security` calls used a **different signer**
(Apple's built-in tool), which triggered the ACL mismatch and the
password dialogs
- Clicking Allow/Deny on those dialogs **rewrote the ACL**, breaking the
server's native keytar access — creating the regression spiral

This was verified via `codesign -dvvv` on both binaries — they share the
same Identifier, Authority chain, and team certificate. The Keychain
entry's partition list (visible in `securityd` logs) confirms it uses
team-id-based access control, not path-based.

**Conclusion:** The server has always been able to self-authenticate.
The only thing that broke it was PolyPilot calling `/usr/bin/security`,
which used a different code-signing identity and corrupted the ACL.
Removing those calls is the correct fix.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant