feat(identity): squad identity doctor + explain commands #22

Merged
sabbour merged 1 commit into dev from squad/identity-doctor-explain
Apr 21, 2026

@sabbour sabbour commented Apr 21, 2026

Summary

Implements H-10 (squad identity doctor) and H-11 (squad identity explain) from the Identity Hardening Roadmap.


squad identity doctor (H-10)

Runs a 9-step live health check for each configured identity role:

squad identity doctor --role lead

Checking identity for role: lead
  ✓ config.json exists and parses  tier: per-role
  ✓ apps/lead.json exists  appId 12345, installationId 99999
  ✓ keys/lead.pem exists  .squad/identity/keys/lead.pem
  ✓ keys/lead.pem mode 0o600  mode 600
  ✓ keys/lead.pem is valid RSA PEM  RSA private key parsed successfully
  ✓ .gitignore covers .squad/identity/keys/  .gitignore covers .squad/identity/keys/
  ✓ JWT signed successfully  iss=12345, exp in 540s
  ✗ Installation token fetched  GitHub API error 401...
  – Token has required scopes  no token — skip

  1 check(s) failed for role: lead

Flags: --role <slug> · --no-network (offline mode) · --json (CI output)
Exit code: 0 all pass, 1 any fail.
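
The pass/fail/skip semantics in the sample output above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CheckResult`, `Check`, `runChecks`, and `exitCodeFor` are hypothetical names, and treating a skipped check as a non-failure is inferred from the sample run (one failure reported alongside one skip).

```typescript
// Hypothetical sketch of a sequential health-check runner; not the actual doctor implementation.
type CheckStatus = "pass" | "fail" | "skip";

interface CheckResult {
  name: string;
  status: CheckStatus;
  detail: string;
}

// A check receives the results so far, so it can skip itself when a prerequisite failed.
type Check = (previous: readonly CheckResult[]) => CheckResult;

function runChecks(checks: readonly Check[]): CheckResult[] {
  const results: CheckResult[] = [];
  for (const check of checks) {
    results.push(check(results));
  }
  return results;
}

// Exit code 0 when nothing failed (skips are not failures), 1 otherwise.
function exitCodeFor(results: readonly CheckResult[]): 0 | 1 {
  return results.some((r) => r.status === "fail") ? 1 : 0;
}

const results = runChecks([
  () => ({ name: "JWT signed successfully", status: "pass", detail: "iss=12345" }),
  () => ({ name: "Installation token fetched", status: "fail", detail: "GitHub API error 401" }),
  (prev) =>
    prev.some((r) => r.name === "Installation token fetched" && r.status === "fail")
      ? { name: "Token has required scopes", status: "skip", detail: "no token" }
      : { name: "Token has required scopes", status: "pass", detail: "ok" },
]);

process.exitCode = exitCodeFor(results);
```

Passing earlier results into each check keeps the dependency between the token fetch and the scope check explicit without any shared mutable state.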


squad identity explain <role> (H-11)

Traces the full token resolution path without side effects:

squad identity explain lead

Resolving token for role: lead

  Step 1  Input role key
           lead  (canonical slug — no alias)

  Step 2  Env var override
           SQUAD_LEAD_APP_ID           not set
           SQUAD_LEAD_PRIVATE_KEY      not set
           SQUAD_LEAD_INSTALLATION_ID  not set
           → env credentials: absent

  Step 3  Filesystem lookup
           .squad/identity/config.json              ✓ found
           .squad/identity/apps/lead.json           ✓ found  (appId 12345, installationId 99999)
           .squad/identity/keys/lead.pem            ✓ found
           → filesystem credentials: present

  Step 4  Token cache
           cache key: '/repo:lead'
           → cache miss (no entry)

  Step 5  GitHub API call
           → dry-run: POST /app/installations/99999/access_tokens
             (use --live to actually fetch the token)

  Resolution path: filesystem → API fetch

Flags: --live (actual fetch) · --json
Exit code: always 0 (diagnostic command).
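
Step 2 of the trace (env var presence, reported without revealing values) might look something like the sketch below. The names `describeEnvCredentials` and `EnvReport` are hypothetical, and two behaviours are assumptions rather than facts from this PR: that the variable prefix is the upper-cased role slug, and that env credentials count as present only when all three variables are set.

```typescript
// Hypothetical sketch: report env-var presence for a role without ever echoing the values.
const SUFFIXES = ["APP_ID", "PRIVATE_KEY", "INSTALLATION_ID"] as const;

type Presence = "(set)" | "not set";

interface EnvReport {
  vars: Record<string, Presence>;
  credentials: "present" | "absent";
}

function describeEnvCredentials(
  role: string,
  env: Record<string, string | undefined>,
): EnvReport {
  // Assumption: the prefix is the upper-cased role slug, as in the SQUAD_LEAD_* sample.
  const prefix = `SQUAD_${role.toUpperCase()}_`;
  const vars: Record<string, Presence> = {};
  for (const suffix of SUFFIXES) {
    const name = prefix + suffix;
    const value = env[name];
    // Only presence is recorded; the value itself never enters the report.
    vars[name] = value !== undefined && value !== "" ? "(set)" : "not set";
  }
  // Assumption: credentials are "present" only when all three variables are set.
  const allSet = Object.values(vars).every((v) => v === "(set)");
  return { vars, credentials: allSet ? "present" : "absent" };
}

const report = describeEnvCredentials("lead", { SQUAD_LEAD_APP_ID: "12345" });
console.log(JSON.stringify(report, null, 2));
```

Because the report only ever stores `(set)` or `not set`, both the human-readable and `--json` renderings can be produced from it without a separate masking pass.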


SDK additions

  • peekTokenCache(squadDir, roleKey) — inspect cache state without triggering a fetch
  • getInstallationPermissions(token) — fetch permissions for a token (used by doctor scope check)

Both exported from @bradygaster/squad-sdk and @bradygaster/squad-sdk/identity.
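
To illustrate what "inspect cache state without triggering a fetch" means, here is a rough sketch of a read-only peek over an in-memory cache. It is not the SDK's `peekTokenCache`: the `peek` and `cacheKey` helpers, the `CachedToken` shape, and the three-state result are all hypothetical, and the key format is only guessed from the `'/repo:lead'` sample in the explain output.

```typescript
// Hypothetical sketch of a side-effect-free cache peek; not the SDK's peekTokenCache.
interface CachedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

type PeekResult =
  | { state: "miss" }
  | { state: "expired"; expiresAt: number }
  | { state: "hit"; expiresAt: number };

// Assumption: the key joins a directory and a role with ":" as in the '/repo:lead' sample.
function cacheKey(dir: string, roleKey: string): string {
  return `${dir}:${roleKey}`;
}

// Reads the cache only: no fetch, no eviction, and the token itself is never returned.
function peek(
  cache: ReadonlyMap<string, CachedToken>,
  key: string,
  now: number = Date.now(),
): PeekResult {
  const entry = cache.get(key);
  if (entry === undefined) return { state: "miss" };
  if (entry.expiresAt <= now) return { state: "expired", expiresAt: entry.expiresAt };
  return { state: "hit", expiresAt: entry.expiresAt };
}

const cache = new Map<string, CachedToken>([
  [cacheKey("/repo", "lead"), { token: "example-token", expiresAt: 2_000 }],
]);

console.log(peek(cache, cacheKey("/repo", "lead"), 1_000));
console.log(peek(cache, cacheKey("/repo", "other"), 1_000));
```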


Tests

  • test/identity/doctor.test.ts — 10 tests: all-pass, missing config, missing app reg, corrupt PEM, wrong permissions (skipped on Windows), --role filter, --json shape, exit codes, --no-network
  • test/identity/explain.test.ts — 12 tests: filesystem creds, env creds, alias resolution, --live, --json shape, masked values, mock mode, exit code always 0

All 164 identity tests pass (142 pre-existing + 22 new).


Changeset

.changeset/identity-doctor-explain.md (@bradygaster/squad-cli: minor)


Working as EECOM (Core Dev)

Add `squad identity doctor` (H-10) and `squad identity explain` (H-11)
subcommands to the identity CLI.

- `squad identity doctor [--role <slug>] [--no-network] [--json]`:
  9-step live health check (config, app reg, PEM presence, mode 0o600,
  PEM crypto validation, .gitignore coverage, JWT signing, installation
  token fetch, expected scopes). Exits 1 on any failure.

- `squad identity explain <role> [--live] [--json]`:
  Resolution trace showing input/alias, env var presence (masked),
  filesystem file inventory, cache state, and expected source. Always
  exits 0. Use --live for end-to-end fetch confirmation.

SDK additions:
- `peekTokenCache(squadDir, roleKey)`: inspect cache state without fetch
- `getInstallationPermissions(token)`: fetch permissions for scope check

Tests: 22 new tests (10 doctor, 12 explain). All 164 identity tests pass.
Changeset: @bradygaster/squad-cli minor

Closes #H-10 #H-11

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sabbour commented Apr 21, 2026

✅ Flight Review: Approved

PR #22: squad identity doctor + explain commands (H-10, H-11)

All 6 hard-blocker checks pass:

  • ✅ Changeset: @bradygaster/squad-cli: minor (correct)
  • ✅ Protected files: resolve-token.mjs untouched
  • ✅ Exit codes: doctor exits 1 on failure, explain always exits 0
  • ✅ Token leakage: none — env vars masked as (set), live token as (present)
  • ✅ --json output: pure JSON, tests verify JSON.parse() succeeds
  • ✅ D-01: doctor hard-fails via process.exit(1)

3 non-blocking nits (improvement opportunities, not merge gates):

  1. getInstallationPermissions makes a redundant first fetch to /installation/repositories — only GET /installation is needed
  2. Shared AbortController across 2 sequential fetches means worst-case ~20s timeout instead of 10s
  3. WSL + /mnt/c/ mounted paths may report inaccurate permissions (known WSL edge case)
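
One way nit 2 could be addressed is to give each request its own controller and timer, so one call's timeout budget never carries over into the next. The sketch below is illustrative only: `createTimeoutSignal` and `fetchJson` are hypothetical helpers, not functions from this PR, and the 10s default simply mirrors the figure quoted in the nit.

```typescript
// Hypothetical sketch: one AbortController and one timer per request.
interface TimeoutHandle {
  signal: AbortSignal;
  cancel: () => void;
}

function createTimeoutSignal(timeoutMs: number): TimeoutHandle {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return { signal: controller.signal, cancel: () => clearTimeout(timer) };
}

// Each call creates a fresh signal, so sequential calls cannot share a timeout budget.
async function fetchJson(url: string, timeoutMs = 10_000): Promise<unknown> {
  const { signal, cancel } = createTimeoutSignal(timeoutMs);
  try {
    const response = await fetch(url, { signal });
    return await response.json();
  } finally {
    // Always clear the timer so it cannot fire later or keep the process alive.
    cancel();
  }
}
```

Clearing the timer in `finally` matters for a CLI: a pending timeout would otherwise delay process exit after the command has finished.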

Positive call-outs: Changeset correct on first try (learned from PR #21), excellent test coverage (22 tests), token masking thorough, additive-only diff (1345+/2-), clean JSON/human output separation.

Full review: docs/reviews/pr-22-doctor-explain-review-2026-04-21.md

Verdict: Merge to dev when ready.

— Flight (Lead), 2026-04-21

@sabbour sabbour merged commit 0a71711 into dev Apr 21, 2026
sabbour pushed a commit that referenced this pull request Apr 21, 2026
Review artifacts for PR #23 (identity retry resilience + PR #22 nit cleanup).
Verdict: Approve. All 10 hard checks pass, all 3 PR #22 nits verified fixed.
One non-blocking nit flagged (dead import in test file).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sabbour added a commit that referenced this pull request Apr 21, 2026
* feat(identity): retry with backoff + PR22 nit cleanup (H-03)

- Add RetryPolicy interface with maxRetries/initialDelayMs/maxDelayMs/onRetry/random
- Add GitHubApiError class (carries status + retryAfterMs for Retry-After support)
- Add RetryExhaustedError marker class for caller diagnosis
- resolveTokenWithDiagnostics/resolveToken accept optional retryPolicy — opt-in,
  backward-compatible. Each retry gets its own 10s AbortController budget.
- TokenResolveError gains retriesExhausted: boolean field
- Export GitHubApiError, RetryExhaustedError, RetryPolicy from SDK public API

N-1: getInstallationPermissions — single GET /installation call (removed redundant
     /installation/repositories preflight)
N-2: getInstallationPermissions — dedicated AbortController per fetch
N-3: doctor mode-0o600 check — detect drvfs quirk (mode=0o777 on NTFS-mounted WSL
     paths) and skip assertion with ⚠ skipped (drvfs) detail

Tests: 12 new retry cases + 1 drvfs doctor case (177 total, was 164)
Docs: docs/identity/retry-policy.md
Skill: .copilot/skills/injectable-random/SKILL.md

Closes #H-03

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: Flight review of PR #23 — H-03 retry resilience ✅ Approve

Review artifacts for PR #23 (identity retry resilience + PR #22 nit cleanup).
Verdict: Approve. All 10 hard checks pass, all 3 PR #22 nits verified fixed.
One non-blocking nit flagged (dead import in test file).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Leela Lead Bot <bot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
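
For readers unfamiliar with the injectable-random pattern mentioned in the commit above, here is a minimal sketch of a backoff delay calculation. The field names follow the `RetryPolicy` fields listed in the commit message, but the formula (capped exponential growth with full jitter) and the `backoffDelayMs` helper are assumptions for illustration, not the behaviour documented in docs/identity/retry-policy.md.

```typescript
// Hypothetical sketch of capped exponential backoff with an injectable random source.
interface RetryPolicySketch {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
  random?: () => number; // returns a value in [0, 1); injectable for deterministic tests
}

// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
function backoffDelayMs(attempt: number, policy: RetryPolicySketch): number {
  const random = policy.random ?? Math.random;
  // Assumption: the ceiling doubles per attempt and is capped at maxDelayMs.
  const ceiling = Math.min(policy.maxDelayMs, policy.initialDelayMs * 2 ** attempt);
  // Assumption: "full jitter", i.e. a uniformly random delay between 0 and the ceiling.
  return Math.floor(random() * ceiling);
}

const policy: RetryPolicySketch = {
  maxRetries: 3,
  initialDelayMs: 100,
  maxDelayMs: 5_000,
  random: () => 0.5, // fixed value so the delays below are reproducible
};

for (let attempt = 0; attempt < policy.maxRetries; attempt++) {
  console.log(`retry ${attempt + 1}: wait ${backoffDelayMs(attempt, policy)}ms`);
}
```

Injecting `random` lets tests assert exact delays instead of ranges, which is the main reason to accept it as a policy field rather than calling `Math.random` directly.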