
Phase 2 Week 3a — mTLS + executor JWT + HMAC heartbeat#12

Merged: l17728 merged 14 commits into main from feat/phase-2-w3a-mtls-jwt-hmac on May 14, 2026
l17728 commented May 14, 2026

Summary

W3a of docs/v2.0/08-mvp-roadmap.md §2.6 Day 1-3 — replaces executor-side bearer auth with SVID-style mTLS + Ed25519 JWT + HMAC heartbeat (SEC-01 + SEC-04):

  • mTLS substrate. Controller bootstraps a self-signed CA + server cert + Ed25519 JWT signing key under ${DLW_CA_DIR} (file-persisted, chmod 600). New POST /executors/register (enrollment-token auth) signs an executor CSR; POST /executors/{eid}/renew refreshes the JWT (+ cert when a CSR is supplied). W1 /join deleted.
  • JWT + HMAC. Three chained FastAPI deps — require_executor_mtls → require_executor_jwt → require_hmac_heartbeat. require_executor_epoch refactored to chain under the JWT dep and assert the path id matches the mTLS identity (confused-deputy guard). In-process nonce store bounds replay to a ±5min window.
  • Executor side. New cert.py + auth_lifecycle.py; client.py is AuthState-driven (mTLS + JWT + HMAC per request); runner spawns a 3rd background loop for cert/JWT renewal. load_or_register re-registers on restart (idempotent epoch bump); a poll 401 triggers re-register.
  • uvicorn TLS. Real --ssl-* termination; --ssl-cert-reqs 1 (CERT_OPTIONAL — /register has no client cert) with app-layer enforcement. A uvicorn_tls_patch injects the peer cert into the ASGI scope (httptools backend doesn't by default).
  • UI auth unchanged. /api/v1/tasks/* keeps require_bearer; a new check_no_bearer_on_executor_routes lint locks executor routes onto mTLS+JWT.
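
A minimal, stdlib-only sketch of how an HMAC-signed heartbeat and an in-process nonce store could fit together. It is not the code in this PR: the message layout (`timestamp.nonce.body`), the SHA-256 digest, and the names `NonceStore`, `sign_heartbeat`, and `verify_heartbeat` are assumptions for illustration. Only the ±5min window and the timestamp/nonce/signature triple come from the description above.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

REPLAY_WINDOW_SECONDS = 300  # the ±5min window from the summary


class NonceStore:
    """In-process replay guard (hypothetical shape, not the PR's class)."""

    def __init__(self, window: int = REPLAY_WINDOW_SECONDS) -> None:
        self._window = window
        self._seen: Dict[str, float] = {}

    def check_and_record(self, nonce: str, now: float) -> bool:
        # A request accepted at time t may carry a timestamp up to t + window,
        # and that timestamp stays acceptable until t + 2 * window, so nonces
        # are kept for twice the window before being pruned.
        cutoff = now - 2 * self._window
        self._seen = {n: t for n, t in self._seen.items() if t >= cutoff}
        if nonce in self._seen:
            return False  # replay
        self._seen[nonce] = now
        return True


def sign_heartbeat(seed: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    # Assumed message layout: "<timestamp>.<nonce>." followed by the raw body.
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(seed, message, hashlib.sha256).hexdigest()


def verify_heartbeat(
    seed: bytes,
    timestamp: int,
    nonce: str,
    body: bytes,
    signature: str,
    store: NonceStore,
    now: Optional[float] = None,
) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > REPLAY_WINDOW_SECONDS:
        return False  # outside the freshness window
    expected = sign_heartbeat(seed, timestamp, nonce, body)
    if not hmac.compare_digest(expected, signature):
        return False  # bad signature; the nonce is not consumed
    return store.check_and_record(nonce, now)
```

Checking the signature before recording the nonce means an unauthenticated caller cannot burn nonces on behalf of a real executor.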

Spec: docs/superpowers/specs/2026-05-14-phase-2-w3a-mtls-jwt-hmac-design.md.
Plan: docs/superpowers/plans/2026-05-14-phase-2-w3a-mtls-jwt-hmac.md.

W3b (HF reverse-proxy) and W3c (active/standby) are companion specs.

Test plan

  • Backend pytest: 215 passed, 1 deselected. Zero regressions. ~34 new tests (CA/JWT/HMAC modules, 3 FastAPI deps, /register + /renew endpoints, executor cert + auth_lifecycle + client) + ~15 migrated W1 setups.
  • One real-TLS e2e: uvicorn subprocess with --ssl-*, register → HMAC-signed heartbeat over real mTLS.
  • test_executor_e2e un-skipped + migrated (full HF→S3 pipeline through the W3a-auth runner).
  • alembic upgrade clean from W2b2 head; downgrade clean.
  • tools/lint_invariants.py + tools/lint_no_direct_status_write.py return 0; new check_no_bearer_on_executor_routes enforces mTLS-only executor routes.
  • cryptography>=43,<44 + pyjwt[crypto]>=2.9,<3.0 added to pyproject.toml + uv.lock.
  • OpenAPI: /register + /renew + heartbeat HMAC headers documented; /join removed.

Out of scope (deferred — see spec §1.2)

HF reverse-proxy (W3b); active/standby + chaos drill (W3c); OIDC / multi-tenant / UI auth (Phase 3); Vault/KMS for keys (Phase 3); CRL / cert-manager (Phase 3+); envelope encryption of hmac_seed (Phase 3); PG/Redis nonce store (Phase 3).

🤖 Generated with Claude Code

l17728 and others added 14 commits May 14, 2026 14:13
…t design

W3a scope: replace executor-side bearer auth with SVID-style mTLS +
Ed25519 JWT + HMAC heartbeat (SEC-01 + SEC-04). Self-signed CA
file-persisted under DLW_CA_DIR; POST /register (CSR signing) replaces
W1 /join; POST /{eid}/renew for cert+JWT lifecycle; in-process nonce
store for anti-replay. uvicorn-level TLS termination. UI bearer auth
retained (Phase 3 OIDC replaces it).

W3b (HF reverse-proxy) and W3c (active/standby) are companion specs.

Adds 2 runtime deps (cryptography explicit pin + pyjwt[crypto]).
One alembic adds executors.hmac_seed_encrypted. require_executor_epoch
refactored to close a confused-deputy gap (path id must match the
mTLS-authenticated identity). ~27 new pytest cases incl. one real-TLS
e2e; ~12-15 W1 test setups migrated off /join.

Branches off main at ba89a91 (PR #11 merge).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
11 tasks across 4 milestones (M1 auth substrate, M2 controller deps +
endpoints, M3 endpoint migration + real-TLS e2e, M4 executor side +
lint + PR). TDD per task with complete code; subagent-driven
implementer-only mode. ~27 new pytest cases + one real-TLS e2e;
~13-15 W1 test setups migrated off /join; 2 new runtime deps
(cryptography + pyjwt); one alembic adds executors.hmac_seed_encrypted.

Branches off main at ba89a91 (PR #11 merge).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed (W3a M1)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eputy guard (W3a M2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n (W3a M2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…+JWT (W3a M3)

subtasks.py /report now uses require_executor_jwt + confused-deputy
guard (reporting executor must own the subtask). conftest gains
register_test_executor / executor_request_headers / signed_heartbeat_headers
helpers. test_executors + test_subtasks + test_happy_path +
test_executor_service migrated off /join to /register. test_executor_e2e
skipped (W9 un-skips after the executor-side rewrite).

208 passed, 1 skipped, 1 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
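
The confused-deputy guard mentioned in the commit above can be sketched roughly as follows. This is a simplified, framework-free illustration under assumptions: `ExecutorIdentity`, `ForbiddenError`, and both function names are hypothetical, and the real checks live inside FastAPI dependencies.

```python
from dataclasses import dataclass
from typing import Optional


class ForbiddenError(Exception):
    """Raised when an authenticated executor acts on a resource it does not own."""


@dataclass(frozen=True)
class ExecutorIdentity:
    # Identity established by the mTLS + JWT layers (hypothetical shape).
    executor_id: str
    epoch: int


def assert_path_matches_identity(path_executor_id: str, identity: ExecutorIdentity) -> None:
    # The id in the URL is caller-controlled, so it must equal the
    # authenticated identity rather than be trusted on its own.
    if path_executor_id != identity.executor_id:
        raise ForbiddenError("path executor id does not match authenticated identity")


def assert_owns_subtask(subtask_executor_id: Optional[str], identity: ExecutorIdentity) -> None:
    # A /report call is only valid from the executor the subtask is assigned to.
    if subtask_executor_id != identity.executor_id:
        raise ForbiddenError("reporting executor does not own this subtask")
```

Without these checks, a valid credential for executor A would be enough to heartbeat or report as executor B simply by changing the path.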
Fix _extract_peer_cert to work with uvicorn's httptools backend: uvicorn
does not expose the asyncio transport in the ASGI scope dict.  Add
uvicorn_tls_patch.install_transport_scope_patch() which monkey-patches
HttpToolsProtocol.on_headers_complete to inject scope["transport"] before
each request is dispatched.  Use ssl_object.getpeercert(binary_form=True)
instead of transport.get_extra_info("peercert") (the latter returns a dict,
not DER bytes).  Change ssl-cert-reqs to CERT_OPTIONAL (1) so /register and
/health/live work without a client cert while /heartbeat enforces mTLS at
the app layer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
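
A small sketch of the extraction path this commit describes, assuming the patch has already placed the asyncio transport at scope["transport"]. The function names here are illustrative, not the PR's `_extract_peer_cert`; the `ssl_object` extra-info key and `getpeercert(binary_form=True)` are standard-library behavior.

```python
import ssl
from typing import Any, Dict, Optional

# --ssl-cert-reqs takes the integer value of an ssl.VerifyMode member;
# CERT_OPTIONAL asks for a client cert without failing the handshake if absent.
SSL_CERT_REQS = int(ssl.CERT_OPTIONAL)


def extract_peer_cert_der(scope: Dict[str, Any]) -> Optional[bytes]:
    """Return the client certificate as DER bytes, or None if none was presented."""
    transport = scope.get("transport")  # injected by the monkey-patch, per the commit
    if transport is None:
        return None
    ssl_object = transport.get_extra_info("ssl_object")
    if ssl_object is None:
        return None  # plain-HTTP connection
    # binary_form=True yields DER bytes; the default form is a parsed dict.
    return ssl_object.getpeercert(binary_form=True)


def require_peer_cert_der(scope: Dict[str, Any]) -> bytes:
    """App-layer enforcement for routes that must be mTLS-authenticated."""
    der = extract_peer_cert_der(scope)
    if not der:
        raise PermissionError("client certificate required")
    return der
```

Routes such as /register would call neither helper, which is why the handshake itself cannot demand a certificate.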
…a M4)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ControllerClient is AuthState-driven: per-request httpx client with
verify=<ca> + cert=(<client-cert>,<client-key>); JWT + epoch headers on
every call; heartbeat additionally HMAC-signs the body. update_auth()
swaps the AuthState (renew loop). Runner gains load_or_register
bootstrap + a 3rd background loop (_auth_renew_loop); the W1
EPOCH_MISMATCH re-join path generalized to re-register on any poll 401.
cli.py builds the client without auth (runner fills it).

config.py adds enrollment_token / executor_cert_dir / executor_ca_bundle.
test_client + test_runner rewritten off the deleted /join API;
test_executor_e2e un-skipped + migrated (registers via /register, drives
the runner with an injected AuthState + X-Client-Cert-PEM header seam).

215 passed, 1 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
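
To illustrate the AuthState-driven design in the commit above, here is a reduced sketch that only builds the per-request keyword arguments and leaves the HTTP call out. The field names, the `Authorization: Bearer` carrier for the JWT, and the `X-Executor-Epoch` header are assumptions; only `verify=<ca>`, `cert=(<client-cert>,<client-key>)`, and `update_auth()` are taken from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AuthState:
    # Immutable snapshot of everything one request needs (hypothetical fields).
    executor_id: str
    epoch: int
    jwt: str
    cert_path: str
    key_path: str
    ca_bundle: str


class ControllerClientSketch:
    def __init__(self) -> None:
        # Built without auth; the runner fills it in after load_or_register.
        self._auth: Optional[AuthState] = None

    def update_auth(self, auth: AuthState) -> None:
        # Called by the renew loop; swapping one frozen object keeps each
        # request internally consistent (cert, JWT, and epoch from one snapshot).
        self._auth = auth

    def request_kwargs(self) -> Dict[str, Any]:
        if self._auth is None:
            raise RuntimeError("client has no AuthState yet; register first")
        auth = self._auth
        return {
            "verify": auth.ca_bundle,
            "cert": (auth.cert_path, auth.key_path),
            "headers": {
                "Authorization": f"Bearer {auth.jwt}",  # assumed header
                "X-Executor-Epoch": str(auth.epoch),  # assumed header name
            },
        }
```

Building a fresh client per request from these kwargs means a renewed certificate takes effect on the next call without restarting the runner.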
…ok (W3a M4)

- Add check_no_bearer_on_executor_routes() to lint_invariants.py; wired in main()
- OpenAPI: rename client_csr→client_csr_pem in ExecutorRegisterRequest; expand
  ExecutorRegisterResponse with hmac_seed_hex/cert_renew_in_seconds/jwt_renew_in_seconds;
  enrich /executors/{executorId}/renew with request body + typed response;
  add X-HMAC-Timestamp/Nonce/Signature headers to heartbeat operation
- Append mTLS+JWT+HMAC operator runbook section to docs/operator/executor-runbook.md
The controller bootstrap creates ${DLW_CA_DIR} (default ./.ca) and the
executor creates ${DLW_EXECUTOR_EXECUTOR_CERT_DIR} (default
./.executor-certs) at runtime — both are local key material, never
committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gitleaks' default generic-api-key rule flags `key: ed25519.Ed25519PrivateKey`
dataclass field annotations in dlw.auth.* as secrets (a `key:` token
followed by a high-entropy identifier). These are Python type annotations;
the real CA/JWT keys are generated at runtime and persisted to chmod-600
files under ${DLW_CA_DIR}, never committed. The allowlist is scoped to the
ed25519 type-name regexes so genuine leaks are still caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
l17728 merged commit 1611d61 into main May 14, 2026
12 checks passed
l17728 deleted the feat/phase-2-w3a-mtls-jwt-hmac branch May 14, 2026 09:18