diff --git a/.ai/active/SPRINT_PACKET.md b/.ai/active/SPRINT_PACKET.md index 78bacec..a5572b2 100644 --- a/.ai/active/SPRINT_PACKET.md +++ b/.ai/active/SPRINT_PACKET.md @@ -2,23 +2,23 @@ ## Sprint Title -Sprint 5P: Gmail Credential Hardening +Sprint 5Q: Gmail Refresh Token Lifecycle V0 ## Sprint Type -repair +feature ## Sprint Reason -Sprint 5O successfully opened the first live read-only Gmail seam, but it left a known security debt: Gmail access tokens are still persisted in plaintext on `gmail_accounts`. Before any broader Gmail auth lifecycle, search, sync, Calendar, or UI work, that gap needs to be closed. +Sprint 5P fixed the immediate plaintext-token risk, but the Gmail seam still depends on a single stored access token with no refresh lifecycle. Before broader Gmail auth, search, sync, Calendar, or UI work, the next safe step is to support refresh-token-backed credential renewal on the existing protected credential seam. ## Sprint Intent -Harden the narrow Gmail connector seam by replacing plaintext credential storage with an explicit protected credential mechanism and by removing credential exposure from normal account surfaces, without widening into Gmail search, sync, write actions, Calendar, or UI. +Extend the hardened Gmail connector seam so it can persist and use a narrow refresh-token credential shape to renew Gmail access tokens on demand for the existing single-message ingestion path, without opening search, sync, write actions, Calendar, or UI scope. ## Git Instructions -- Branch Name: `codex/sprint-5p-gmail-credential-hardening` +- Branch Name: `codex/sprint-5q-gmail-refresh-token-lifecycle` - Base Branch: `main` - PR Strategy: one sprint branch, one PR, no stacked PRs unless Control Tower explicitly opens a follow-up sprint - Merge Policy: squash merge only after reviewer `PASS` and explicit Control Tower merge approval @@ -26,29 +26,30 @@ Harden the narrow Gmail connector seam by replacing plaintext credential storage ## Why This Sprint - Sprint 5O shipped the first narrow read-only Gmail account and single-message ingestion seam. -- The accepted review explicitly calls out plaintext Gmail credential storage as acceptable only for that narrow prototype slice and recommends hardening before broader connector rollout. -- Connector security is the immediate risk, not more Gmail breadth. -- The narrowest safe next step is credential hardening only, keeping the current single-message Gmail ingestion seam otherwise intact. +- Sprint 5P hardened credential storage by removing plaintext tokens from the normal `gmail_accounts` table surface. +- The accepted review for Sprint 5P explicitly called out refresh-token lifecycle as the next narrow Gmail auth milestone if broader connector use is needed. +- The next safe step is not Gmail search or sync; it is reliable credential renewal on the already-shipped single-message seam. ## In Scope -- Add schema and migration support only as needed to remove plaintext credential storage from `gmail_accounts`, for example: - - protected credential blob or reference fields - - optional token metadata fields if needed for the narrow hardened seam +- Extend schema and migration support only as needed to support a narrow Gmail refresh-token credential shape inside the protected credential seam, for example: + - refresh-token fields inside the protected credential blob + - optional token-expiry metadata if needed for deterministic renewal decisions - Define typed contract changes for: - - Gmail account connect requests if a hardened credential payload shape is required - - Gmail account responses with secrets removed - - any narrow Gmail ingestion error shape changes needed for hardened credential lookup failures -- Implement a narrow Gmail credential seam that: - - persists Gmail credentials through one explicit protected storage mechanism - - does not return secrets through account list or detail responses - - lets the existing single-message Gmail ingestion path resolve credentials through the hardened mechanism - - preserves deterministic account reads and per-user isolation + - Gmail account connect requests if a refresh-token-capable credential payload is required + - Gmail account responses if narrow non-secret token metadata must be surfaced + - Gmail ingestion error shapes if renewal failures need explicit typed responses +- Implement a narrow Gmail credential-renewal seam that: + - persists refresh-token-capable Gmail credentials through the existing protected credential mechanism + - renews access tokens through one explicit Gmail token-refresh path only + - updates the protected credential record deterministically after successful renewal + - lets the existing single-message Gmail ingestion path obtain a usable access token through the renewal path when required + - preserves secret-free account reads and per-user isolation - Add unit and integration tests for: - - protected credential persistence + - refresh-token credential persistence + - successful access-token renewal before single-message ingestion + - deterministic failure when refresh credentials are missing or invalid - absence of secret material in Gmail account responses - - successful single-message Gmail ingestion using the hardened credential path - - deterministic failure when required credentials are missing or invalid - per-user isolation - stable response shape @@ -59,66 +60,67 @@ Harden the narrow Gmail connector seam by replacing plaintext credential storage - No attachment ingestion. - No write-capable Gmail actions. - No Calendar connector scope. -- No OAuth UX or callback UI. +- No OAuth UI or callback handling. +- No external secret-manager integration yet. - No compile contract changes. - No runner-style orchestration. - No UI work. ## Required Deliverables -- Migration removing plaintext credential storage from the live Gmail seam in favor of a protected credential mechanism. -- Stable Gmail account contracts that no longer expose secret material. -- Updated single-message Gmail ingestion path that resolves credentials through the hardened mechanism. -- Unit and integration coverage for credential protection, ingestion continuity, failure handling, and isolation. +- Narrow protected-credential support for Gmail refresh-token lifecycle. +- Stable Gmail account contracts that remain secret-free on reads. +- Updated single-message Gmail ingestion path that can renew access tokens through the protected credential seam when needed. +- Unit and integration coverage for renewal success, renewal failure, response stability, and isolation. - Updated `BUILD_REPORT.md` with exact verification results and explicit deferred scope. ## Acceptance Criteria -- Gmail account records no longer persist plaintext access tokens in the normal application table surface. -- Gmail account list and detail responses do not expose secret material. -- The existing single-message Gmail ingestion path continues to work through the hardened credential mechanism. -- Missing or invalid credentials fail deterministically and do not corrupt task workspace or artifact state. +- The protected Gmail credential seam can persist a refresh-token-capable credential shape without reintroducing plaintext secrets to the normal `gmail_accounts` table surface. +- Gmail account list and detail responses remain secret-free. +- The existing single-message Gmail ingestion path can renew and use a fresh access token through the protected credential seam when needed. +- Missing or invalid refresh credentials fail deterministically and do not corrupt Gmail account, task workspace, or artifact state. - `./.venv/bin/python -m pytest tests/unit` passes. - `./.venv/bin/python -m pytest tests/integration` passes. -- No Gmail search, sync, attachments, write actions, Calendar, compile-contract, runner, or UI scope enters the sprint. +- No Gmail search, sync, attachments, write actions, Calendar, external secret-manager, compile-contract, runner, or UI scope enters the sprint. ## Implementation Constraints -- Keep the repair narrow and boring. -- Do not broaden the Gmail feature surface while fixing credential storage. -- Prefer one explicit protected credential mechanism over ad hoc masking or partial hiding. -- Preserve the current Gmail account and single-message ingestion seams as much as possible outside the security fix. -- If credential hardening needs one minimal rule added to `RULES.md`, keep it scoped to connector-secret handling only. +- Keep the auth-lifecycle extension narrow and boring. +- Reuse the existing `gmail_accounts` plus `gmail_account_credentials` seam rather than creating a second connector store. +- Preserve secret-free Gmail account reads. +- Support one explicit renewal path only; do not introduce account-wide sync, search, or OAuth UI in the same sprint. +- Preserve the existing single-message Gmail ingestion seam outside the credential-renewal addition. ## Suggested Work Breakdown -1. Add the schema/migration changes required for protected Gmail credential storage. -2. Update Gmail account contracts so secrets are accepted only on write and never returned on read. -3. Route single-message Gmail ingestion through the hardened credential lookup path. -4. Add deterministic failure handling for missing or invalid protected credentials. +1. Extend the protected Gmail credential shape and any required migration metadata. +2. Update Gmail connect contracts for refresh-token-capable writes while keeping reads secret-free. +3. Implement deterministic token-renewal logic in the Gmail service seam. +4. Route single-message Gmail ingestion through the renewal-capable credential lookup path. 5. Add unit and integration tests. 6. Update `BUILD_REPORT.md` with executed verification. ## Build Report Requirements `BUILD_REPORT.md` must include: -- the exact Gmail credential contract and schema changes introduced -- the protected credential storage mechanism used +- the exact Gmail refresh-token credential changes introduced +- the token-renewal rule and renewal trigger used - exact commands run - unit and integration test results -- one example Gmail account response proving secret removal -- one example Gmail ingestion response through the hardened path +- one example Gmail account response proving secret-free reads remain intact +- one example Gmail ingestion response through the renewal-capable path - what remains intentionally deferred to later milestones ## Review Focus `REVIEW_REPORT.md` should verify: -- the sprint stayed limited to Gmail credential hardening -- plaintext credential persistence is removed from the normal `gmail_accounts` surface -- Gmail account reads no longer expose secrets -- the existing single-message Gmail ingestion path still works through the hardened seam -- no hidden Gmail search, sync, attachments, write actions, Calendar, compile-contract, runner, or UI scope entered the sprint +- the sprint stayed limited to Gmail refresh-token lifecycle support +- secret-free account reads remain intact +- single-message Gmail ingestion still works through the protected credential seam +- renewal success and failure behavior are deterministic and test-backed +- no hidden Gmail search, sync, attachments, write actions, Calendar, external secret-manager, compile-contract, runner, or UI scope entered the sprint ## Exit Condition -This sprint is complete when the repo no longer stores plaintext Gmail credentials in the normal application table surface, the existing read-only single-message Gmail ingestion seam still works through the hardened credential path, and the full path is verified with Postgres-backed tests while broader Gmail and Calendar connector work remains deferred. +This sprint is complete when the repo can renew Gmail access tokens through the protected credential seam for the existing single-message ingestion path, keep Gmail account reads secret-free, and verify the full path with Postgres-backed tests while broader Gmail and Calendar behavior remains deferred. diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 6a76ea1..3e7ad09 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -2,17 +2,17 @@ ## Current Implemented Slice -AliceBot now implements the accepted repo slice through Sprint 5P. The shipped backend includes: +AliceBot now implements the accepted repo slice through Sprint 5Q. The shipped backend includes: - foundation continuity storage over `users`, `threads`, `sessions`, and append-only `events` - deterministic tracing and context compilation over durable continuity, memory, entity, and entity-edge records - governed memory admission, explicit-preference extraction, memory review labels, review queue reads, evaluation summary reads, explicit embedding config and memory-embedding storage, direct semantic retrieval, and deterministic hybrid compile-path memory merge - deterministic prompt assembly and one no-tools response path that persists assistant replies as immutable continuity events - user-scoped consents, policies, policy evaluation, tool registry, allowlist evaluation, tool routing, approval request persistence, approval resolution, approved-only proxy execution through the in-process `proxy.echo` handler, durable execution review, and execution-budget lifecycle plus enforcement -- a narrow read-only Gmail connector seam with user-scoped `gmail_accounts` metadata persistence, separate user-scoped `gmail_account_credentials` protected credential storage, deterministic account reads without secret exposure, and one explicit selected-message ingestion path that materializes one Gmail message as a rooted `.eml` task artifact and then reuses the existing RFC822 artifact ingestion pipeline +- a narrow read-only Gmail connector seam with user-scoped `gmail_accounts` metadata persistence, separate user-scoped `gmail_account_credentials` protected credential storage, deterministic account reads without secret exposure, refresh-token-capable protected credential renewal for expired access tokens, and one explicit selected-message ingestion path that materializes one Gmail message as a rooted `.eml` task artifact and then reuses the existing RFC822 artifact ingestion pipeline - durable `tasks`, `task_steps`, `task_workspaces`, `task_artifacts`, `task_artifact_chunks`, and `task_artifact_chunk_embeddings`, deterministic task-step sequencing, explicit task-step transitions, explicit manual continuation with lineage through `parent_step_id`, `source_approval_id`, and `source_execution_id`, explicit `tool_executions.task_step_id` linkage for execution synchronization, deterministic rooted local task-workspace provisioning, explicit rooted local artifact registration, deterministic local plain-text, markdown, narrow PDF text, narrow DOCX text, and narrow RFC822 email text ingestion into durable chunk rows, deterministic lexical artifact-chunk retrieval over durable chunk rows, explicit user-scoped artifact-chunk embedding persistence tied to existing embedding configs, explicit task-scoped or artifact-scoped semantic artifact-chunk retrieval over those durable embeddings, and compile-path artifact retrieval that can include lexical results, semantic results, or one deterministic hybrid lexical-plus-semantic merged artifact section with per-chunk source provenance -The current multi-step boundary is narrow and explicit. Manual continuation is implemented and review-passed. Approval resolution and proxy execution now both use explicit task-step linkage rather than first-step inference. Task workspaces are now implemented only as deterministic rooted local boundaries, and task artifacts are now implemented only as explicit rooted local-file registrations, narrow deterministic artifact ingestion under those workspaces, lexical retrieval over persisted chunk rows, explicit artifact-chunk embedding storage tied to existing embedding configs, direct semantic retrieval over those durable artifact-chunk embeddings for one visible task or one visible artifact at a time, and compile-path artifact retrieval that deterministically merges lexical and semantic candidates into one artifact section when both are requested for the same scope. The live richer-document boundary is still intentionally narrow: plain text and markdown ingest directly, PDF support is limited to narrow local text extraction, DOCX support is limited to narrow local text extraction from `word/document.xml`, RFC822 email support is limited to top-level selected headers plus extractable plain-text body content while excluding nested `message/rfc822` content, and the live connector boundary is limited to one read-only Gmail account seam plus one explicit selected-message ingestion path into the rooted RFC822 artifact pipeline, with credentials resolved through a dedicated protected credential seam instead of the normal account metadata table surface. OCR, image extraction, layout reconstruction, Gmail search, mailbox sync, attachments, Calendar connectors, reranking beyond the current lexical-first hybrid merge, and new side-effect surfaces are still planned later and must not be described as live behavior. +The current multi-step boundary is narrow and explicit. Manual continuation is implemented and review-passed. Approval resolution and proxy execution now both use explicit task-step linkage rather than first-step inference. Task workspaces are now implemented only as deterministic rooted local boundaries, and task artifacts are now implemented only as explicit rooted local-file registrations, narrow deterministic artifact ingestion under those workspaces, lexical retrieval over persisted chunk rows, explicit artifact-chunk embedding storage tied to existing embedding configs, direct semantic retrieval over those durable artifact-chunk embeddings for one visible task or one visible artifact at a time, and compile-path artifact retrieval that deterministically merges lexical and semantic candidates into one artifact section when both are requested for the same scope. The live richer-document boundary is still intentionally narrow: plain text and markdown ingest directly, PDF support is limited to narrow local text extraction, DOCX support is limited to narrow local text extraction from `word/document.xml`, RFC822 email support is limited to top-level selected headers plus extractable plain-text body content while excluding nested `message/rfc822` content, and the live connector boundary is limited to one read-only Gmail account seam plus one explicit selected-message ingestion path into the rooted RFC822 artifact pipeline, with credentials resolved through a dedicated protected credential seam, renewed through one explicit refresh path when an expired refresh-capable credential is present, and never exposed on the normal account metadata table surface. OCR, image extraction, layout reconstruction, Gmail search, mailbox sync, attachments, Calendar connectors, reranking beyond the current lexical-first hybrid merge, and new side-effect surfaces are still planned later and must not be described as live behavior. ## Implemented Now @@ -60,11 +60,11 @@ The current multi-step boundary is narrow and explicit. Manual continuation is i ### Repo Boundaries In This Slice -- `apps/api`: implemented API, store, contracts, service logic, and migrations for continuity, tracing, memory, embeddings, entities, policies, tools, approvals, proxy execution, execution budgets, a narrow read-only Gmail connector seam, tasks, task steps, task workspaces, task artifacts, artifact-chunk embeddings, deterministic lexical artifact chunk retrieval, deterministic semantic artifact chunk retrieval over durable embeddings, compile-path semantic artifact retrieval, and deterministic hybrid lexical-plus-semantic artifact merge in compile. +- `apps/api`: implemented API, store, contracts, service logic, and migrations for continuity, tracing, memory, embeddings, entities, policies, tools, approvals, proxy execution, execution budgets, a narrow read-only Gmail connector seam with protected refresh-token lifecycle support, tasks, task steps, task workspaces, task artifacts, artifact-chunk embeddings, deterministic lexical artifact chunk retrieval, deterministic semantic artifact chunk retrieval over durable embeddings, compile-path semantic artifact retrieval, and deterministic hybrid lexical-plus-semantic artifact merge in compile. - `apps/web`: minimal shell only; no shipped workflow UI. - `workers`: scaffold only; no background jobs or runner logic are implemented. - `infra`: local development bootstrap assets only. -- `tests`: unit and Postgres-backed integration coverage for the shipped seams above, including Sprint 4O task-step lineage/manual continuation, Sprint 4S step-linked execution synchronization, Sprint 5A task-workspace provisioning, Sprint 5C task-artifact registration, Sprint 5D local artifact ingestion plus chunk reads, Sprint 5E lexical artifact-chunk retrieval, Sprint 5F compile-path artifact chunk integration, Sprint 5G artifact-chunk embedding persistence and reads, Sprint 5H direct semantic artifact-chunk retrieval, Sprint 5I compile-path semantic artifact retrieval, Sprint 5J deterministic hybrid lexical-plus-semantic artifact merge in compile, Sprint 5L narrow PDF artifact ingestion, Sprint 5M narrow DOCX artifact ingestion, Sprint 5N narrow RFC822 email artifact ingestion, Sprint 5O read-only Gmail account plus single-message ingestion coverage, and Sprint 5P Gmail credential hardening coverage. +- `tests`: unit and Postgres-backed integration coverage for the shipped seams above, including Sprint 4O task-step lineage/manual continuation, Sprint 4S step-linked execution synchronization, Sprint 5A task-workspace provisioning, Sprint 5C task-artifact registration, Sprint 5D local artifact ingestion plus chunk reads, Sprint 5E lexical artifact-chunk retrieval, Sprint 5F compile-path artifact chunk integration, Sprint 5G artifact-chunk embedding persistence and reads, Sprint 5H direct semantic artifact-chunk retrieval, Sprint 5I compile-path semantic artifact retrieval, Sprint 5J deterministic hybrid lexical-plus-semantic artifact merge in compile, Sprint 5L narrow PDF artifact ingestion, Sprint 5M narrow DOCX artifact ingestion, Sprint 5N narrow RFC822 email artifact ingestion, Sprint 5O read-only Gmail account plus single-message ingestion coverage, Sprint 5P Gmail credential hardening coverage, and Sprint 5Q Gmail refresh-token lifecycle coverage. ## Core Flows Implemented Now @@ -96,10 +96,10 @@ The current multi-step boundary is narrow and explicit. Manual continuation is i 1. Accept a user-scoped `POST /v0/gmail-accounts` request for one read-only Gmail account metadata record. 2. Persist exactly the narrow connector metadata required for later reads on `gmail_accounts`: `provider_account_id`, `email_address`, optional `display_name`, and the fixed Gmail read-only scope. -3. Persist the Gmail access token only in the dedicated `gmail_account_credentials` protected credential seam bound to the same visible user/account ownership scope. +3. Persist Gmail secrets only in the dedicated `gmail_account_credentials` protected credential seam bound to the same visible user/account ownership scope, using either a narrow access-token-only shape or a refresh-token-capable credential shape with expiry metadata. 4. Expose deterministic user-scoped Gmail account list and detail reads without secret material. 5. Accept a user-scoped `POST /v0/gmail-accounts/{gmail_account_id}/messages/{provider_message_id}/ingest` request for one visible Gmail account and one visible task workspace. -6. Resolve the Gmail access token through the protected credential seam before any Gmail fetch, file write, or artifact registration. +6. Resolve the Gmail access token through the protected credential seam before any Gmail fetch, file write, or artifact registration, and renew it first through one explicit refresh path when the visible protected credential is refresh-capable and expired. 7. Derive one deterministic workspace-relative artifact path as `gmail//.eml`. 8. Reject duplicate `(task_workspace_id, relative_path)` collisions before any Gmail fetch or file write. 9. Fetch exactly one selected Gmail message through the read-only Gmail API path `users/me/messages/{provider_message_id}?format=raw`. diff --git a/BUILD_REPORT.md b/BUILD_REPORT.md index da20535..cc09658 100644 --- a/BUILD_REPORT.md +++ b/BUILD_REPORT.md @@ -2,122 +2,131 @@ ## sprint objective -Implement Sprint 5P: harden the existing narrow Gmail connector seam so Gmail access tokens are no longer stored on the normal `gmail_accounts` table surface, Gmail account reads never expose secrets, and the existing single-message Gmail ingestion path continues to work through an explicit protected credential lookup. +Implement Sprint 5Q: extend the protected Gmail credential seam so the existing single-message ingestion path can renew an expired access token from refresh-token-backed credentials, while keeping Gmail account reads secret-free and leaving broader Gmail and Calendar scope deferred. ## completed work -- Added migration `20260316_0027_gmail_account_credentials.py` to move Gmail tokens out of `gmail_accounts`. -- Introduced a protected Gmail credential table with: - - row-level security - - `gmail_account_id` ownership binding - - a checked credential blob shape - - backfill from existing `gmail_accounts.access_token` -- Removed plaintext `access_token` storage from the normal `gmail_accounts` table surface by dropping the column in the new migration. -- Kept the Gmail connect write contract narrow: - - connect still accepts `access_token` on write - - account list/detail responses still return the same stable metadata shape without secret material -- Updated the Gmail service seam to: - - persist account metadata and protected credentials separately - - resolve the access token through the protected credential lookup during ingestion - - fail deterministically when protected credentials are missing or malformed - - fail before Gmail fetches, artifact registration, or filesystem writes when credentials are unusable +- Added refresh-token-capable Gmail connect writes with an all-or-none refresh bundle: + - `refresh_token` + - `client_id` + - `client_secret` + - `access_token_expires_at` +- Extended the protected Gmail credential blob to support two narrow shapes: + - `gmail_oauth_access_token_v1` + - `gmail_oauth_refresh_token_v2` +- Added one explicit Gmail token-refresh path using `https://oauth2.googleapis.com/token`. +- Updated Gmail credential resolution so ingestion: + - uses the stored access token directly when the credential is still usable + - renews first when a refresh-capable credential is expired + - persists the refreshed access token and new expiry back into `gmail_account_credentials` +- Added a narrow protected-credential update store method for deterministic renewal writes. +- Added migration `20260316_0028_gmail_refresh_token_lifecycle.py` to: + - allow the new refreshable credential blob shape + - grant runtime `UPDATE` on `gmail_account_credentials` + - downgrade refreshable blobs back to the v1 access-token-only shape +- Kept Gmail account list/detail/connect responses secret-free on reads. - Added unit and integration coverage for: - - protected credential persistence - - secret removal from Gmail account responses - - hardened single-message ingestion success - - deterministic missing/invalid credential failures + - refresh-token credential persistence + - successful renewal before single-message ingestion + - deterministic invalid refresh-credential failure + - secret-free account responses - per-user isolation - - stable response shape - -## exact Gmail credential contract and schema changes introduced - -- Gmail account connect request remains: - - `user_id: UUID` - - `provider_account_id: str` - - `email_address: str` - - `display_name: str | null` - - `scope: "https://www.googleapis.com/auth/gmail.readonly"` - - `access_token: str` -- Gmail account read responses remain secret-free: - - `id` - - `provider` - - `auth_kind` - - `provider_account_id` - - `email_address` - - `display_name` - - `scope` - - `created_at` - - `updated_at` -- `gmail_accounts` schema change: - - dropped plaintext column `access_token` -- New `gmail_account_credentials` schema: - - `gmail_account_id uuid primary key references gmail_accounts(id) on delete cascade` - - `user_id uuid not null` - - `auth_kind text not null check = 'oauth_access_token'` - - `credential_blob jsonb not null` - - `created_at timestamptz not null` - - `updated_at timestamptz not null` - - composite ownership FK to `gmail_accounts (id, user_id)` - - RLS owner policy -- Protected credential blob shape: - - `{"credential_kind": "gmail_oauth_access_token_v1", "access_token": ""}` - -## protected credential storage mechanism used - -Gmail credentials are now stored in a dedicated `gmail_account_credentials` table guarded by row-level security and ownership checks, with the Gmail account record carrying only non-secret metadata. The ingestion path resolves the token through that separate protected table instead of reading it from `gmail_accounts`. + - migration upgrade/downgrade compatibility +- Added direct unit coverage for the raw `refresh_gmail_access_token()` helper, including success, `400/401` rejection mapping, and malformed or transport refresh failures. +- Updated `ARCHITECTURE.md` so the implemented slice and Gmail connector flow reflect Sprint 5Q refresh-token lifecycle behavior. + +## exact Gmail refresh-token credential changes introduced + +- `GmailAccountConnectInput` now accepts optional `refresh_token`, `client_id`, `client_secret`, and `access_token_expires_at`. +- `ConnectGmailAccountRequest` enforces that those four refresh fields are all present together or all absent. +- Protected credential v1 remains: + +```json +{ + "credential_kind": "gmail_oauth_access_token_v1", + "access_token": "" +} +``` + +- New protected credential v2 is: + +```json +{ + "credential_kind": "gmail_oauth_refresh_token_v2", + "access_token": "", + "refresh_token": "", + "client_id": "", + "client_secret": "", + "access_token_expires_at": "" +} +``` + +- `gmail_account_credentials` now allows both v1 and v2 blobs and runtime `UPDATE` so renewal can rewrite the protected record in place. + +## token-renewal rule and renewal trigger used + +- Renewal trigger: if the protected credential kind is `gmail_oauth_refresh_token_v2` and `access_token_expires_at <= now(UTC)` during Gmail message ingestion. +- Renewal path: POST once to Google’s OAuth token endpoint with `client_id`, `client_secret`, `refresh_token`, and `grant_type=refresh_token`. +- Success behavior: store the new `access_token` plus a recalculated `access_token_expires_at`, then continue the existing single-message Gmail fetch with that refreshed token. +- Non-renewal behavior: if the credential is v1 or the v2 token has not yet expired, use the stored access token directly. +- Deterministic failure behavior: + - malformed local protected credentials -> `409` + - Google refresh rejection (`400`/`401`) -> `409` + - non-deterministic refresh transport/response failures -> `502` +- Failure guardrail: these failures occur before artifact registration, chunk ingestion, or filesystem writes for the Gmail message. ## incomplete work -- None inside Sprint 5P scope. +- None inside Sprint 5Q scope. ## files changed -- `apps/api/alembic/versions/20260316_0027_gmail_account_credentials.py` -- `ARCHITECTURE.md` -- `RULES.md` - `apps/api/src/alicebot_api/contracts.py` - `apps/api/src/alicebot_api/gmail.py` - `apps/api/src/alicebot_api/main.py` - `apps/api/src/alicebot_api/store.py` -- `tests/integration/test_gmail_accounts_api.py` -- `tests/integration/test_migrations.py` -- `tests/unit/test_20260316_0027_gmail_account_credentials.py` +- `apps/api/alembic/versions/20260316_0028_gmail_refresh_token_lifecycle.py` +- `ARCHITECTURE.md` - `tests/unit/test_gmail.py` - `tests/unit/test_gmail_main.py` +- `tests/unit/test_gmail_refresh.py` +- `tests/unit/test_20260316_0028_gmail_refresh_token_lifecycle.py` +- `tests/integration/test_gmail_accounts_api.py` +- `tests/integration/test_migrations.py` - `BUILD_REPORT.md` ## exact commands run -- `./.venv/bin/python -m pytest tests/unit/test_gmail.py` -- `./.venv/bin/python -m pytest tests/unit/test_gmail_main.py tests/unit/test_20260316_0026_gmail_accounts.py tests/unit/test_20260316_0027_gmail_account_credentials.py` -- `./.venv/bin/python -m pytest tests/integration/test_gmail_accounts_api.py` -- `./.venv/bin/python -m pytest tests/integration/test_migrations.py` +- `./.venv/bin/python -m pytest tests/unit/test_gmail.py tests/unit/test_gmail_main.py tests/unit/test_20260316_0028_gmail_refresh_token_lifecycle.py` +- `./.venv/bin/python -m pytest tests/integration/test_gmail_accounts_api.py tests/integration/test_migrations.py` +- `./.venv/bin/python -m pytest tests/unit/test_gmail_refresh.py tests/unit/test_gmail.py tests/unit/test_gmail_main.py` - `./.venv/bin/python -m pytest tests/unit` - `./.venv/bin/python -m pytest tests/integration` ## tests run -- `./.venv/bin/python -m pytest tests/unit/test_gmail.py` - - Result: `12 passed in 0.11s` -- `./.venv/bin/python -m pytest tests/unit/test_gmail_main.py tests/unit/test_20260316_0026_gmail_accounts.py tests/unit/test_20260316_0027_gmail_account_credentials.py` - - Result: `11 passed in 0.55s` -- `./.venv/bin/python -m pytest tests/integration/test_gmail_accounts_api.py` - - Result: `6 passed in 2.10s` -- `./.venv/bin/python -m pytest tests/integration/test_migrations.py` - - Result: `3 passed in 1.35s` +- `./.venv/bin/python -m pytest tests/unit/test_gmail.py tests/unit/test_gmail_main.py tests/unit/test_20260316_0028_gmail_refresh_token_lifecycle.py` + - Result: `27 passed in 0.52s` +- `./.venv/bin/python -m pytest tests/integration/test_gmail_accounts_api.py tests/integration/test_migrations.py` + - Result: `12 passed in 4.35s` +- `./.venv/bin/python -m pytest tests/unit/test_gmail_refresh.py tests/unit/test_gmail.py tests/unit/test_gmail_main.py` + - Result: `31 passed in 0.29s` - `./.venv/bin/python -m pytest tests/unit` - - Result: `417 passed in 0.67s` + - Result: `434 passed in 0.67s` - `./.venv/bin/python -m pytest tests/integration` - - Result: `134 passed in 41.74s` + - Result: `137 passed in 41.99s` ## unit and integration test results - Required unit suite passed. - Required integration suite passed. -- Gmail-focused unit and integration coverage passed independently before the full-suite runs. -- Migration round-trip coverage now explicitly verifies Gmail credential backfill on upgrade and token restoration on downgrade for revision `20260316_0027`. +- Gmail-focused unit and integration coverage passed before the full packet-level runs. +- Direct unit coverage now pins the raw Gmail token-refresh helper’s success and error mapping behavior. +- Migration coverage now verifies both: + - the original Gmail credential hardening round trip + - the new refresh-token lifecycle migration round trip back to a downgrade-safe v1 blob -## one example Gmail account response proving secret removal +## one example Gmail account response proving secret-free reads remain intact ```json { @@ -125,7 +134,7 @@ Gmail credentials are now stored in a dedicated `gmail_account_credentials` tabl "id": "", "provider": "gmail", "auth_kind": "oauth_access_token", - "provider_account_id": "acct-owner-001", + "provider_account_id": "acct-owner-refresh-001", "email_address": "owner@gmail.example", "display_name": "Owner", "scope": "https://www.googleapis.com/auth/gmail.readonly", @@ -135,7 +144,7 @@ Gmail credentials are now stored in a dedicated `gmail_account_credentials` tabl } ``` -## one example Gmail ingestion response through the hardened path +## one example Gmail ingestion response through the renewal-capable path ```json { @@ -143,7 +152,7 @@ Gmail credentials are now stored in a dedicated `gmail_account_credentials` tabl "id": "", "provider": "gmail", "auth_kind": "oauth_access_token", - "provider_account_id": "acct-owner-001", + "provider_account_id": "acct-owner-refresh-001", "email_address": "owner@gmail.example", "display_name": "Owner", "scope": "https://www.googleapis.com/auth/gmail.readonly", @@ -152,7 +161,7 @@ Gmail credentials are now stored in a dedicated `gmail_account_credentials` tabl }, "message": { "provider_message_id": "msg-001", - "artifact_relative_path": "gmail/acct-owner-001/msg-001.eml", + "artifact_relative_path": "gmail/acct-owner-refresh-001/msg-001.eml", "media_type": "message/rfc822" }, "artifact": { @@ -161,14 +170,14 @@ Gmail credentials are now stored in a dedicated `gmail_account_credentials` tabl "task_workspace_id": "", "status": "registered", "ingestion_status": "ingested", - "relative_path": "gmail/acct-owner-001/msg-001.eml", + "relative_path": "gmail/acct-owner-refresh-001/msg-001.eml", "media_type_hint": "message/rfc822", "created_at": "", "updated_at": "" }, "summary": { "total_count": 1, - "total_characters": 19, + "total_characters": 18, "media_type": "message/rfc822", "chunking_rule": "normalized_utf8_text_fixed_window_1000_chars_v1", "order": ["sequence_no_asc", "id_asc"] @@ -178,7 +187,7 @@ Gmail credentials are now stored in a dedicated `gmail_account_credentials` tabl ## blockers/issues -- No remaining code blockers. +- No remaining implementation blockers. - Integration verification required local Postgres access outside the default sandbox. ## what remains intentionally deferred to later milestones @@ -189,11 +198,12 @@ Gmail credentials are now stored in a dedicated `gmail_account_credentials` tabl - write-capable Gmail actions - Calendar connector scope - OAuth UI or callback handling -- refresh-token lifecycle work +- external secret-manager integration - compile-contract changes -- runner orchestration +- runner-style orchestration - UI work +- refresh-token rotation beyond the single explicit renewal path ## recommended next step -Add the next narrow Gmail auth milestone only if needed: refresh-token or external secret-manager support, without broadening into search, sync, Calendar, or UI in the same change. +Keep the next Gmail milestone equally narrow: either add refresh-token rotation handling or move the protected Gmail secret seam into an external secret manager, but do not combine that work with search, sync, Calendar, or UI expansion. diff --git a/REVIEW_REPORT.md b/REVIEW_REPORT.md index 79d764a..0e2c22c 100644 --- a/REVIEW_REPORT.md +++ b/REVIEW_REPORT.md @@ -6,20 +6,21 @@ PASS ## criteria met -- Sprint 5P remains limited to Gmail credential hardening. I found no Gmail search, sync, attachments, write actions, Calendar, compile-contract, runner, or UI scope. -- Plaintext Gmail access tokens are removed from the normal `gmail_accounts` table surface. Revision `20260316_0027` backfills tokens into `gmail_account_credentials` and drops `gmail_accounts.access_token`. -- Gmail account list and detail reads remain secret-free. The Gmail account response shape is stable and the integration suite asserts `access_token` is absent from connect, list, and detail payloads. -- The existing single-message Gmail ingestion seam still works through the hardened path. Ingestion now resolves the token through the protected credential seam before Gmail fetches and then reuses the existing RFC822 artifact pipeline. -- Missing or invalid protected credentials fail deterministically before Gmail fetches, workspace writes, or artifact registration, with unit and integration coverage for the no-side-effects path. -- Per-user isolation remains intact through user-scoped connections, RLS on `gmail_account_credentials`, and the ownership FK binding credentials to the visible Gmail account row. -- `BUILD_REPORT.md` now matches the implemented files and current verification state. -- `ARCHITECTURE.md` now reflects Sprint 5P, the `gmail_account_credentials` table, and the protected credential lookup in the narrow Gmail flow. -- `RULES.md` now includes the narrow connector-secret handling rule. -- Migration coverage now includes a Gmail-specific round-trip test proving `20260316_0027` backfills tokens on upgrade and restores `gmail_accounts.access_token` on downgrade. +- The sprint remains within the Sprint 5Q boundary. I found no Gmail search, sync, attachments, write actions, Calendar, compile-contract, runner, or UI scope in the implementation or follow-up changes. +- The protected Gmail credential seam supports the narrow refresh-token-capable credential shape required by the packet, without reintroducing secrets to `gmail_accounts`. +- Gmail account connect, list, and detail reads remain secret-free. Coverage asserts that `access_token`, `refresh_token`, and `client_secret` do not appear in account response payloads. +- The existing single-message Gmail ingestion path renews an expired access token through one explicit refresh path and then continues through the existing RFC822 artifact ingestion seam. +- Missing or invalid refresh credentials fail deterministically before fetch, artifact registration, chunk ingestion, or workspace writes. +- Per-user isolation remains intact through the existing user-scoped store access, row-level security, and the `(gmail_account_id, user_id)` ownership binding on protected credentials. +- Migration coverage includes the refresh-token lifecycle round trip and downgrade compatibility. +- The prior review follow-ups are now addressed: + - `ARCHITECTURE.md` reflects Sprint 5Q and the renewal-before-ingestion Gmail seam. + - `tests/unit/test_gmail_refresh.py` adds direct coverage for the raw refresh helper success path, `400/401` rejection mapping, and malformed or transport failures. +- `BUILD_REPORT.md` now matches the implemented Sprint 5Q behavior and includes the follow-up test coverage/doc updates. - Verified locally: - - `./.venv/bin/python -m pytest tests/unit` -> `417 passed in 0.55s` - - `./.venv/bin/python -m pytest tests/integration/test_migrations.py` -> `3 passed in 1.46s` - - `./.venv/bin/python -m pytest tests/integration` -> `134 passed in 39.86s` + - `./.venv/bin/python -m pytest tests/unit/test_gmail_refresh.py tests/unit/test_gmail.py tests/unit/test_gmail_main.py` -> `31 passed in 0.28s` + - `./.venv/bin/python -m pytest tests/unit` -> `434 passed in 0.64s` + - `./.venv/bin/python -m pytest tests/integration` -> `137 passed in 40.83s` from the prior sprint review run, with no subsequent code-path changes beyond docs and unit-test additions ## criteria missed @@ -27,12 +28,11 @@ PASS ## quality issues -- No blocking implementation, regression, or scope issues remain for Sprint 5P. +- No blocking implementation, test, documentation, regression, or scope issues remain for Sprint 5Q. ## regression risks -- Residual risk is limited to intentionally deferred work already called out by the sprint packet: refresh-token lifecycle, external secret-manager support, Gmail search, sync, attachments, write actions, Calendar, compile-contract changes, runner orchestration, and UI. -- The protected credential mechanism is still database-local storage rather than encryption or an external secret manager. That matches Sprint 5P and should not be described more broadly than that. +- Residual risk is limited to intentionally deferred scope already called out by the sprint packet: Gmail search, sync, attachments, write actions, Calendar, external secret-manager integration, compile-contract changes, runner orchestration, UI work, and refresh-token rotation beyond the single explicit renewal path. ## docs issues @@ -40,7 +40,7 @@ PASS ## should anything be added to RULES.md? -- No further rule change is required for this sprint. +- No. ## should anything update ARCHITECTURE.md? @@ -48,4 +48,4 @@ PASS ## recommended next action -- Accept Sprint 5P as complete and merge after the normal approval flow. Any next Gmail work should stay narrow and focus on refresh-token or secret-manager evolution without broadening into search, sync, Calendar, or UI in the same change. +- Accept Sprint 5Q as complete and proceed with the normal merge/approval flow. Keep the next Gmail milestone equally narrow and avoid combining secret-lifecycle work with search, sync, Calendar, or UI expansion. diff --git a/apps/api/alembic/versions/20260316_0028_gmail_refresh_token_lifecycle.py b/apps/api/alembic/versions/20260316_0028_gmail_refresh_token_lifecycle.py new file mode 100644 index 0000000..7d30c37 --- /dev/null +++ b/apps/api/alembic/versions/20260316_0028_gmail_refresh_token_lifecycle.py @@ -0,0 +1,89 @@ +"""Allow Gmail protected credentials to store refresh-token lifecycle data.""" + +from __future__ import annotations + +from alembic import op + + +revision = "20260316_0028" +down_revision = "20260316_0027" +branch_labels = None +depends_on = None + +GMAIL_PROTECTED_CREDENTIAL_KIND = "gmail_oauth_access_token_v1" +GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND = "gmail_oauth_refresh_token_v2" + +_UPGRADE_STATEMENTS = ( + "ALTER TABLE gmail_account_credentials DROP CONSTRAINT gmail_account_credentials_blob_shape_check", + f""" + ALTER TABLE gmail_account_credentials + ADD CONSTRAINT gmail_account_credentials_blob_shape_check + CHECK ( + jsonb_typeof(credential_blob) = 'object' + AND credential_blob ? 'credential_kind' + AND credential_blob ? 'access_token' + AND jsonb_typeof(credential_blob -> 'access_token') = 'string' + AND length(credential_blob ->> 'access_token') > 0 + AND ( + ( + credential_blob ->> 'credential_kind' = '{GMAIL_PROTECTED_CREDENTIAL_KIND}' + ) + OR + ( + credential_blob ->> 'credential_kind' = '{GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND}' + AND credential_blob ? 'refresh_token' + AND credential_blob ? 'client_id' + AND credential_blob ? 'client_secret' + AND credential_blob ? 'access_token_expires_at' + AND jsonb_typeof(credential_blob -> 'refresh_token') = 'string' + AND jsonb_typeof(credential_blob -> 'client_id') = 'string' + AND jsonb_typeof(credential_blob -> 'client_secret') = 'string' + AND jsonb_typeof(credential_blob -> 'access_token_expires_at') = 'string' + AND length(credential_blob ->> 'refresh_token') > 0 + AND length(credential_blob ->> 'client_id') > 0 + AND length(credential_blob ->> 'client_secret') > 0 + AND length(credential_blob ->> 'access_token_expires_at') > 0 + ) + ) + ) + """, + "GRANT UPDATE ON gmail_account_credentials TO alicebot_app", +) + +_DOWNGRADE_STATEMENTS = ( + """ + UPDATE gmail_account_credentials + SET credential_blob = jsonb_build_object( + 'credential_kind', 'gmail_oauth_access_token_v1', + 'access_token', credential_blob ->> 'access_token' + ) + WHERE credential_blob ->> 'credential_kind' = 'gmail_oauth_refresh_token_v2' + """, + "REVOKE UPDATE ON gmail_account_credentials FROM alicebot_app", + "ALTER TABLE gmail_account_credentials DROP CONSTRAINT gmail_account_credentials_blob_shape_check", + f""" + ALTER TABLE gmail_account_credentials + ADD CONSTRAINT gmail_account_credentials_blob_shape_check + CHECK ( + jsonb_typeof(credential_blob) = 'object' + AND credential_blob ? 'credential_kind' + AND credential_blob ? 'access_token' + AND credential_blob ->> 'credential_kind' = '{GMAIL_PROTECTED_CREDENTIAL_KIND}' + AND jsonb_typeof(credential_blob -> 'access_token') = 'string' + AND length(credential_blob ->> 'access_token') > 0 + ) + """, +) + + +def _execute_statements(statements: tuple[str, ...]) -> None: + for statement in statements: + op.execute(statement) + + +def upgrade() -> None: + _execute_statements(_UPGRADE_STATEMENTS) + + +def downgrade() -> None: + _execute_statements(_DOWNGRADE_STATEMENTS) diff --git a/apps/api/src/alicebot_api/contracts.py b/apps/api/src/alicebot_api/contracts.py index 035273e..4453118 100644 --- a/apps/api/src/alicebot_api/contracts.py +++ b/apps/api/src/alicebot_api/contracts.py @@ -179,6 +179,7 @@ GMAIL_AUTH_KIND_OAUTH_ACCESS_TOKEN = "oauth_access_token" GMAIL_READONLY_SCOPE = "https://www.googleapis.com/auth/gmail.readonly" GMAIL_PROTECTED_CREDENTIAL_KIND = "gmail_oauth_access_token_v1" +GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND = "gmail_oauth_refresh_token_v2" TASK_STEP_SEQUENCE_VERSION_V0 = "task_step_sequence_v0" TRACE_KIND_TASK_STEP_SEQUENCE = "task.step.sequence" TASK_STEP_CONTINUATION_VERSION_V0 = "task_step_continuation_v0" @@ -1781,6 +1782,10 @@ class GmailAccountConnectInput: display_name: str | None scope: str access_token: str + refresh_token: str | None = None + client_id: str | None = None + client_secret: str | None = None + access_token_expires_at: datetime | None = None @dataclass(frozen=True, slots=True) diff --git a/apps/api/src/alicebot_api/gmail.py b/apps/api/src/alicebot_api/gmail.py index 9d908d1..5d60a33 100644 --- a/apps/api/src/alicebot_api/gmail.py +++ b/apps/api/src/alicebot_api/gmail.py @@ -3,9 +3,11 @@ import base64 import json import re +from dataclasses import dataclass +from datetime import UTC, datetime, timedelta from pathlib import Path from urllib.error import HTTPError, URLError -from urllib.parse import quote +from urllib.parse import quote, urlencode from urllib.request import Request, urlopen from uuid import UUID @@ -25,6 +27,7 @@ GMAIL_AUTH_KIND_OAUTH_ACCESS_TOKEN, GMAIL_PROTECTED_CREDENTIAL_KIND, GMAIL_PROVIDER, + GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND, GMAIL_READONLY_SCOPE, GmailAccountConnectInput, GmailAccountConnectResponse, @@ -40,6 +43,8 @@ from alicebot_api.workspaces import TaskWorkspaceNotFoundError GMAIL_MESSAGE_FETCH_TIMEOUT_SECONDS = 30 +GMAIL_TOKEN_REFRESH_TIMEOUT_SECONDS = 30 +GMAIL_TOKEN_REFRESH_URL = "https://oauth2.googleapis.com/token" GMAIL_MESSAGE_ARTIFACT_ROOT = "gmail" _PATH_SEGMENT_PATTERN = re.compile(r"[^A-Za-z0-9._-]+") @@ -72,6 +77,24 @@ class GmailCredentialInvalidError(RuntimeError): """Raised when Gmail protected credentials are malformed for a visible account.""" +class GmailCredentialRefreshError(RuntimeError): + """Raised when Gmail access-token renewal fails for non-deterministic reasons.""" + + +class GmailCredentialValidationError(ValueError): + """Raised when Gmail connect input contains an invalid credential combination.""" + + +@dataclass(frozen=True, slots=True) +class ParsedGmailCredential: + access_token: str + credential_kind: str + refresh_token: str | None = None + client_id: str | None = None + client_secret: str | None = None + access_token_expires_at: datetime | None = None + + def serialize_gmail_account_row(row: GmailAccountRow) -> GmailAccountRecord: return { "id": str(row["id"]), @@ -86,13 +109,184 @@ def serialize_gmail_account_row(row: GmailAccountRow) -> GmailAccountRecord: } -def build_gmail_protected_credential_blob(*, access_token: str) -> dict[str, str]: +def _coerce_nonempty_string(value: object) -> str | None: + if not isinstance(value, str): + return None + normalized = value.strip() + if normalized == "": + return None + return normalized + + +def _normalize_datetime(value: datetime) -> datetime: + if value.tzinfo is None: + return value.replace(tzinfo=UTC) + return value.astimezone(UTC) + + +def build_gmail_protected_credential_blob( + *, + access_token: str, + refresh_token: str | None = None, + client_id: str | None = None, + client_secret: str | None = None, + access_token_expires_at: datetime | None = None, +) -> dict[str, str]: + normalized_access_token = _coerce_nonempty_string(access_token) + if normalized_access_token is None: + raise GmailCredentialValidationError("gmail access token must be non-empty") + + refresh_bundle = ( + refresh_token, + client_id, + client_secret, + access_token_expires_at, + ) + if all(value is None for value in refresh_bundle): + return { + "credential_kind": GMAIL_PROTECTED_CREDENTIAL_KIND, + "access_token": normalized_access_token, + } + + normalized_refresh_token = _coerce_nonempty_string(refresh_token) + normalized_client_id = _coerce_nonempty_string(client_id) + normalized_client_secret = _coerce_nonempty_string(client_secret) + if ( + normalized_refresh_token is None + or normalized_client_id is None + or normalized_client_secret is None + or access_token_expires_at is None + ): + raise GmailCredentialValidationError( + "gmail refresh credentials must include refresh_token, client_id, client_secret, " + "and access_token_expires_at" + ) + + normalized_expires_at = _normalize_datetime(access_token_expires_at) return { - "credential_kind": GMAIL_PROTECTED_CREDENTIAL_KIND, - "access_token": access_token, + "credential_kind": GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND, + "access_token": normalized_access_token, + "refresh_token": normalized_refresh_token, + "client_id": normalized_client_id, + "client_secret": normalized_client_secret, + "access_token_expires_at": normalized_expires_at.isoformat(), } +def _parse_gmail_credential( + *, + gmail_account_id: UUID, + credential_blob: object, +) -> ParsedGmailCredential: + if not isinstance(credential_blob, dict): + raise GmailCredentialInvalidError( + f"gmail account {gmail_account_id} has invalid protected credentials" + ) + + credential_kind = credential_blob.get("credential_kind") + access_token = _coerce_nonempty_string(credential_blob.get("access_token")) + if access_token is None: + raise GmailCredentialInvalidError( + f"gmail account {gmail_account_id} has invalid protected credentials" + ) + + if credential_kind == GMAIL_PROTECTED_CREDENTIAL_KIND: + return ParsedGmailCredential( + access_token=access_token, + credential_kind=credential_kind, + ) + + if credential_kind != GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND: + raise GmailCredentialInvalidError( + f"gmail account {gmail_account_id} has invalid protected credentials" + ) + + refresh_token = _coerce_nonempty_string(credential_blob.get("refresh_token")) + client_id = _coerce_nonempty_string(credential_blob.get("client_id")) + client_secret = _coerce_nonempty_string(credential_blob.get("client_secret")) + access_token_expires_at_raw = _coerce_nonempty_string( + credential_blob.get("access_token_expires_at") + ) + if ( + refresh_token is None + or client_id is None + or client_secret is None + or access_token_expires_at_raw is None + ): + raise GmailCredentialInvalidError( + f"gmail account {gmail_account_id} has invalid protected credentials" + ) + + try: + access_token_expires_at = _normalize_datetime( + datetime.fromisoformat(access_token_expires_at_raw) + ) + except ValueError as exc: + raise GmailCredentialInvalidError( + f"gmail account {gmail_account_id} has invalid protected credentials" + ) from exc + + return ParsedGmailCredential( + access_token=access_token, + credential_kind=credential_kind, + refresh_token=refresh_token, + client_id=client_id, + client_secret=client_secret, + access_token_expires_at=access_token_expires_at, + ) + + +def refresh_gmail_access_token( + *, + gmail_account_id: UUID, + refresh_token: str, + client_id: str, + client_secret: str, +) -> tuple[str, datetime]: + request = Request( + GMAIL_TOKEN_REFRESH_URL, + data=urlencode( + { + "client_id": client_id, + "client_secret": client_secret, + "refresh_token": refresh_token, + "grant_type": "refresh_token", + } + ).encode("utf-8"), + headers={ + "Content-Type": "application/x-www-form-urlencoded", + "Accept": "application/json", + }, + method="POST", + ) + + try: + with urlopen(request, timeout=GMAIL_TOKEN_REFRESH_TIMEOUT_SECONDS) as response: + payload = json.loads(response.read().decode("utf-8")) + except HTTPError as exc: + if exc.code in {400, 401}: + raise GmailCredentialInvalidError( + f"gmail account {gmail_account_id} refresh credentials were rejected" + ) from exc + raise GmailCredentialRefreshError( + f"gmail account {gmail_account_id} access token could not be renewed" + ) from exc + except (OSError, URLError, UnicodeDecodeError, json.JSONDecodeError) as exc: + raise GmailCredentialRefreshError( + f"gmail account {gmail_account_id} access token could not be renewed" + ) from exc + + refreshed_access_token = _coerce_nonempty_string(payload.get("access_token")) + expires_in = payload.get("expires_in") + if refreshed_access_token is None or not isinstance(expires_in, (int, float)) or expires_in <= 0: + raise GmailCredentialRefreshError( + f"gmail account {gmail_account_id} access token could not be renewed" + ) + + refreshed_expires_at = datetime.now(UTC) + timedelta(seconds=float(expires_in)) + return refreshed_access_token, refreshed_expires_at + + def resolve_gmail_access_token( store: ContinuityStore, *, @@ -109,24 +303,35 @@ def resolve_gmail_access_token( f"gmail account {gmail_account_id} has invalid protected credentials" ) - credential_blob = credential["credential_blob"] - if not isinstance(credential_blob, dict): - raise GmailCredentialInvalidError( - f"gmail account {gmail_account_id} has invalid protected credentials" - ) - - credential_kind = credential_blob.get("credential_kind") - access_token = credential_blob.get("access_token") + parsed_credential = _parse_gmail_credential( + gmail_account_id=gmail_account_id, + credential_blob=credential["credential_blob"], + ) if ( - credential_kind != GMAIL_PROTECTED_CREDENTIAL_KIND - or not isinstance(access_token, str) - or access_token == "" + parsed_credential.credential_kind != GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND + or parsed_credential.access_token_expires_at is None + or parsed_credential.access_token_expires_at > datetime.now(UTC) ): - raise GmailCredentialInvalidError( - f"gmail account {gmail_account_id} has invalid protected credentials" - ) + return parsed_credential.access_token - return access_token + refreshed_access_token, refreshed_expires_at = refresh_gmail_access_token( + gmail_account_id=gmail_account_id, + refresh_token=parsed_credential.refresh_token, + client_id=parsed_credential.client_id, + client_secret=parsed_credential.client_secret, + ) + store.update_gmail_account_credential( + gmail_account_id=gmail_account_id, + auth_kind=credential["auth_kind"], + credential_blob=build_gmail_protected_credential_blob( + access_token=refreshed_access_token, + refresh_token=parsed_credential.refresh_token, + client_id=parsed_credential.client_id, + client_secret=parsed_credential.client_secret, + access_token_expires_at=refreshed_expires_at, + ), + ) + return refreshed_access_token def create_gmail_account_record( @@ -155,6 +360,10 @@ def create_gmail_account_record( auth_kind=GMAIL_AUTH_KIND_OAUTH_ACCESS_TOKEN, credential_blob=build_gmail_protected_credential_blob( access_token=request.access_token, + refresh_token=request.refresh_token, + client_id=request.client_id, + client_secret=request.client_secret, + access_token_expires_at=request.access_token_expires_at, ), ) except psycopg.errors.UniqueViolation as exc: diff --git a/apps/api/src/alicebot_api/main.py b/apps/api/src/alicebot_api/main.py index 9fbe271..7382b06 100644 --- a/apps/api/src/alicebot_api/main.py +++ b/apps/api/src/alicebot_api/main.py @@ -5,7 +5,7 @@ from uuid import UUID from fastapi import FastAPI, Query from fastapi.encoders import jsonable_encoder -from pydantic import BaseModel, ConfigDict, Field +from pydantic import BaseModel, ConfigDict, Field, model_validator from fastapi.responses import JSONResponse from urllib.parse import urlsplit, urlunsplit @@ -142,6 +142,8 @@ GmailAccountAlreadyExistsError, GmailCredentialInvalidError, GmailCredentialNotFoundError, + GmailCredentialRefreshError, + GmailCredentialValidationError, GmailAccountNotFoundError, GmailMessageFetchError, GmailMessageNotFoundError, @@ -541,6 +543,27 @@ class ConnectGmailAccountRequest(BaseModel): display_name: str | None = Field(default=None, min_length=1, max_length=200) scope: Literal["https://www.googleapis.com/auth/gmail.readonly"] = GMAIL_READONLY_SCOPE access_token: str = Field(min_length=1, max_length=8000) + refresh_token: str | None = Field(default=None, min_length=1, max_length=8000) + client_id: str | None = Field(default=None, min_length=1, max_length=2000) + client_secret: str | None = Field(default=None, min_length=1, max_length=8000) + access_token_expires_at: datetime | None = None + + @model_validator(mode="after") + def validate_refresh_bundle(self) -> ConnectGmailAccountRequest: + refresh_bundle = ( + self.refresh_token, + self.client_id, + self.client_secret, + self.access_token_expires_at, + ) + if all(value is None for value in refresh_bundle): + return self + if any(value is None for value in refresh_bundle): + raise ValueError( + "gmail refresh credentials must include refresh_token, client_id, " + "client_secret, and access_token_expires_at" + ) + return self class IngestGmailMessageRequest(BaseModel): @@ -1301,8 +1324,14 @@ def connect_gmail_account(request: ConnectGmailAccountRequest) -> JSONResponse: display_name=request.display_name, scope=request.scope, access_token=request.access_token, + refresh_token=request.refresh_token, + client_id=request.client_id, + client_secret=request.client_secret, + access_token_expires_at=request.access_token_expires_at, ), ) + except GmailCredentialValidationError as exc: + return JSONResponse(status_code=400, content={"detail": str(exc)}) except GmailAccountAlreadyExistsError as exc: return JSONResponse(status_code=409, content={"detail": str(exc)}) @@ -1379,7 +1408,7 @@ def ingest_gmail_message( return JSONResponse(status_code=409, content={"detail": str(exc)}) except TaskArtifactValidationError as exc: return JSONResponse(status_code=400, content={"detail": str(exc)}) - except GmailMessageFetchError as exc: + except (GmailMessageFetchError, GmailCredentialRefreshError) as exc: return JSONResponse(status_code=502, content={"detail": str(exc)}) except TaskArtifactAlreadyExistsError as exc: return JSONResponse(status_code=409, content={"detail": str(exc)}) diff --git a/apps/api/src/alicebot_api/store.py b/apps/api/src/alicebot_api/store.py index 4822a44..61dcd53 100644 --- a/apps/api/src/alicebot_api/store.py +++ b/apps/api/src/alicebot_api/store.py @@ -1587,6 +1587,22 @@ class LabelCountRow(TypedDict): WHERE gmail_account_id = %s """ +UPDATE_GMAIL_ACCOUNT_CREDENTIAL_SQL = """ + UPDATE gmail_account_credentials + SET + auth_kind = %s, + credential_blob = %s, + updated_at = clock_timestamp() + WHERE gmail_account_id = %s + RETURNING + gmail_account_id, + user_id, + auth_kind, + credential_blob, + created_at, + updated_at + """ + LIST_GMAIL_ACCOUNTS_SQL = """ SELECT id, @@ -3155,6 +3171,19 @@ def get_gmail_account_credential_optional( ) -> ProtectedGmailCredentialRow | None: return self._fetch_optional_one(GET_GMAIL_ACCOUNT_CREDENTIAL_SQL, (gmail_account_id,)) + def update_gmail_account_credential( + self, + *, + gmail_account_id: UUID, + auth_kind: str, + credential_blob: JsonObject, + ) -> ProtectedGmailCredentialRow: + return self._fetch_one( + "update_gmail_account_credential", + UPDATE_GMAIL_ACCOUNT_CREDENTIAL_SQL, + (auth_kind, Jsonb(credential_blob), gmail_account_id), + ) + def get_gmail_account_by_provider_account_id_optional( self, provider_account_id: str, diff --git a/tests/integration/test_gmail_accounts_api.py b/tests/integration/test_gmail_accounts_api.py index 334cabd..8385474 100644 --- a/tests/integration/test_gmail_accounts_api.py +++ b/tests/integration/test_gmail_accounts_api.py @@ -1,6 +1,7 @@ from __future__ import annotations import json +from datetime import datetime from pathlib import Path from typing import Any from urllib.parse import urlencode @@ -151,18 +152,28 @@ def seed_task(database_url: str, *, email: str) -> dict[str, UUID]: } -def _connect_gmail_account(*, user_id: UUID, provider_account_id: str, email_address: str) -> tuple[int, dict[str, Any]]: +def _connect_gmail_account( + *, + user_id: UUID, + provider_account_id: str, + email_address: str, + credential_overrides: dict[str, Any] | None = None, +) -> tuple[int, dict[str, Any]]: + payload = { + "user_id": str(user_id), + "provider_account_id": provider_account_id, + "email_address": email_address, + "display_name": email_address.split("@", 1)[0].title(), + "scope": "https://www.googleapis.com/auth/gmail.readonly", + "access_token": f"token-for-{provider_account_id}", + } + if credential_overrides is not None: + payload.update(credential_overrides) + return invoke_request( "POST", "/v0/gmail-accounts", - payload={ - "user_id": str(user_id), - "provider_account_id": provider_account_id, - "email_address": email_address, - "display_name": email_address.split("@", 1)[0].title(), - "scope": "https://www.googleapis.com/auth/gmail.readonly", - "access_token": f"token-for-{provider_account_id}", - }, + payload=payload, ) @@ -244,6 +255,8 @@ def test_gmail_account_endpoints_connect_list_detail_and_isolate( assert '"access_token":' not in json.dumps(create_payload) assert '"access_token":' not in json.dumps(list_payload) assert '"access_token":' not in json.dumps(detail_payload) + assert '"refresh_token":' not in json.dumps(create_payload) + assert '"client_secret":' not in json.dumps(create_payload) with psycopg.connect(migrated_database_urls["admin"]) as conn: with conn.cursor() as cur: @@ -363,6 +376,104 @@ def test_gmail_message_ingestion_endpoint_persists_artifact_and_chunks( assert chunk_rows[0]["text"].startswith("From: Alice ") +def test_gmail_message_ingestion_endpoint_renews_expired_access_token( + migrated_database_urls, + monkeypatch, + tmp_path, +) -> None: + owner = seed_task(migrated_database_urls["app"], email="owner@example.com") + workspace_root = tmp_path / "task-workspaces" + monkeypatch.setattr( + main_module, + "get_settings", + lambda: Settings( + database_url=migrated_database_urls["app"], + task_workspace_root=str(workspace_root), + ), + ) + raw_bytes = _build_rfc822_email_bytes(subject="Inbox Update", plain_body="renewed token path") + fetch_tokens: list[str] = [] + + monkeypatch.setattr( + gmail_module, + "refresh_gmail_access_token", + lambda **_kwargs: ("token-refreshed", datetime.fromisoformat("2030-01-01T00:05:00+00:00")), + ) + + def fake_fetch_gmail_message_raw_bytes(*, access_token: str, **_kwargs) -> bytes: + fetch_tokens.append(access_token) + return raw_bytes + + monkeypatch.setattr( + gmail_module, + "fetch_gmail_message_raw_bytes", + fake_fetch_gmail_message_raw_bytes, + ) + + account_status, account_payload = _connect_gmail_account( + user_id=owner["user_id"], + provider_account_id="acct-owner-refresh-001", + email_address="owner@gmail.example", + credential_overrides={ + "refresh_token": "refresh-owner-001", + "client_id": "client-owner-001", + "client_secret": "secret-owner-001", + "access_token_expires_at": "2020-01-01T00:00:00+00:00", + }, + ) + workspace_status, workspace_payload = invoke_request( + "POST", + f"/v0/tasks/{owner['task_id']}/workspace", + payload={"user_id": str(owner["user_id"])}, + ) + ingest_status, ingest_payload = invoke_request( + "POST", + f"/v0/gmail-accounts/{account_payload['account']['id']}/messages/msg-001/ingest", + payload={ + "user_id": str(owner["user_id"]), + "task_workspace_id": workspace_payload["workspace"]["id"], + }, + ) + + assert account_status == 201 + assert workspace_status == 201 + assert ingest_status == 200 + assert fetch_tokens == ["token-refreshed"] + assert ingest_payload["message"] == { + "provider_message_id": "msg-001", + "artifact_relative_path": "gmail/acct-owner-refresh-001/msg-001.eml", + "media_type": "message/rfc822", + } + assert '"refresh_token":' not in json.dumps(ingest_payload) + assert '"client_secret":' not in json.dumps(ingest_payload) + + with psycopg.connect(migrated_database_urls["admin"]) as conn: + with conn.cursor() as cur: + cur.execute( + """ + SELECT + credential_blob ->> 'credential_kind', + credential_blob ->> 'access_token', + credential_blob ->> 'refresh_token', + credential_blob ->> 'client_id', + credential_blob ->> 'client_secret', + credential_blob ->> 'access_token_expires_at' + FROM gmail_account_credentials + WHERE gmail_account_id = %s + """, + (UUID(account_payload["account"]["id"]),), + ) + credential_row = cur.fetchone() + + assert credential_row is not None + assert credential_row[0] == "gmail_oauth_refresh_token_v2" + assert credential_row[1] == "token-refreshed" + assert credential_row[2] == "refresh-owner-001" + assert credential_row[3] == "client-owner-001" + assert credential_row[4] == "secret-owner-001" + assert credential_row[5] == "2030-01-01T00:05:00+00:00" + + def test_gmail_message_ingestion_endpoint_rejects_cross_user_workspace_access( migrated_database_urls, monkeypatch, @@ -478,6 +589,78 @@ def fail_fetch(**_kwargs): assert store.list_task_artifacts_for_task(owner["task_id"]) == [] +def test_gmail_message_ingestion_endpoint_rejects_invalid_refresh_credentials_without_side_effects( + migrated_database_urls, + monkeypatch, + tmp_path, +) -> None: + owner = seed_task(migrated_database_urls["app"], email="owner@example.com") + workspace_root = tmp_path / "task-workspaces" + monkeypatch.setattr( + main_module, + "get_settings", + lambda: Settings( + database_url=migrated_database_urls["app"], + task_workspace_root=str(workspace_root), + ), + ) + + _, account_payload = _connect_gmail_account( + user_id=owner["user_id"], + provider_account_id="acct-owner-refresh-001", + email_address="owner@gmail.example", + credential_overrides={ + "refresh_token": "refresh-owner-001", + "client_id": "client-owner-001", + "client_secret": "secret-owner-001", + "access_token_expires_at": "2020-01-01T00:00:00+00:00", + }, + ) + _, workspace_payload = invoke_request( + "POST", + f"/v0/tasks/{owner['task_id']}/workspace", + payload={"user_id": str(owner["user_id"])}, + ) + + def fail_refresh(**_kwargs): + raise gmail_module.GmailCredentialInvalidError( + f"gmail account {account_payload['account']['id']} refresh credentials were rejected" + ) + + monkeypatch.setattr(gmail_module, "refresh_gmail_access_token", fail_refresh) + monkeypatch.setattr( + gmail_module, + "fetch_gmail_message_raw_bytes", + lambda **_kwargs: (_ for _ in ()).throw( + AssertionError("fetch_gmail_message_raw_bytes should not be called") + ), + ) + + ingest_status, ingest_payload = invoke_request( + "POST", + f"/v0/gmail-accounts/{account_payload['account']['id']}/messages/msg-001/ingest", + payload={ + "user_id": str(owner["user_id"]), + "task_workspace_id": workspace_payload["workspace"]["id"], + }, + ) + + assert ingest_status == 409 + assert ingest_payload == { + "detail": ( + f"gmail account {account_payload['account']['id']} refresh credentials were rejected" + ) + } + artifact_file = ( + Path(workspace_payload["workspace"]["local_path"]) / "gmail" / "acct-owner-refresh-001" / "msg-001.eml" + ) + assert not artifact_file.exists() + + with user_connection(migrated_database_urls["app"], owner["user_id"]) as conn: + store = ContinuityStore(conn) + assert store.list_task_artifacts_for_task(owner["task_id"]) == [] + + def test_gmail_message_ingestion_endpoint_rejects_sanitized_path_collisions_without_overwrite( migrated_database_urls, monkeypatch, diff --git a/tests/integration/test_migrations.py b/tests/integration/test_migrations.py index 7e4817d..aba3133 100644 --- a/tests/integration/test_migrations.py +++ b/tests/integration/test_migrations.py @@ -347,6 +347,126 @@ def test_gmail_account_credentials_migration_round_trip_preserves_tokens(databas assert cur.fetchone() == (None,) +def test_gmail_refresh_token_lifecycle_migration_round_trip_preserves_downgrade_compatibility( + database_urls, +): + config = make_alembic_config(database_urls["admin"]) + user_id = "00000000-0000-0000-0000-000000000201" + gmail_account_id = "00000000-0000-0000-0000-000000000202" + + command.upgrade(config, "20260316_0027") + + with psycopg.connect(database_urls["admin"]) as conn: + with conn.cursor() as cur: + cur.execute( + """ + INSERT INTO users (id, email, display_name) + VALUES (%s, 'gmail-refresh@example.com', 'Gmail Refresh User') + """, + (user_id,), + ) + cur.execute( + """ + INSERT INTO gmail_accounts ( + id, + user_id, + provider_account_id, + email_address, + display_name, + scope + ) + VALUES ( + %s, + %s, + 'acct-refresh-001', + 'owner@gmail.example', + 'Owner', + 'https://www.googleapis.com/auth/gmail.readonly' + ) + """, + (gmail_account_id, user_id), + ) + cur.execute( + """ + INSERT INTO gmail_account_credentials ( + gmail_account_id, + user_id, + auth_kind, + credential_blob + ) + VALUES ( + %s, + %s, + 'oauth_access_token', + jsonb_build_object( + 'credential_kind', 'gmail_oauth_access_token_v1', + 'access_token', 'token-before-refresh-lifecycle' + ) + ) + """, + (gmail_account_id, user_id), + ) + conn.commit() + + command.upgrade(config, "20260316_0028") + + with psycopg.connect(database_urls["admin"]) as conn: + with conn.cursor() as cur: + cur.execute( + """ + UPDATE gmail_account_credentials + SET credential_blob = jsonb_build_object( + 'credential_kind', 'gmail_oauth_refresh_token_v2', + 'access_token', 'token-after-refresh', + 'refresh_token', 'refresh-001', + 'client_id', 'client-001', + 'client_secret', 'secret-001', + 'access_token_expires_at', '2030-01-01T00:05:00+00:00' + ) + WHERE gmail_account_id = %s + """, + (gmail_account_id,), + ) + cur.execute( + """ + SELECT + credential_blob ->> 'credential_kind', + credential_blob ->> 'access_token', + credential_blob ->> 'refresh_token' + FROM gmail_account_credentials + WHERE gmail_account_id = %s + """, + (gmail_account_id,), + ) + assert cur.fetchone() == ( + "gmail_oauth_refresh_token_v2", + "token-after-refresh", + "refresh-001", + ) + conn.commit() + + command.downgrade(config, "20260316_0027") + + with psycopg.connect(database_urls["admin"]) as conn: + with conn.cursor() as cur: + cur.execute( + """ + SELECT + credential_blob ->> 'credential_kind', + credential_blob ->> 'access_token', + credential_blob ? 'refresh_token' + FROM gmail_account_credentials + WHERE gmail_account_id = %s + """, + (gmail_account_id,), + ) + assert cur.fetchone() == ( + "gmail_oauth_access_token_v1", + "token-after-refresh", + False, + ) + + def test_migrations_upgrade_and_downgrade(database_urls): config = make_alembic_config(database_urls["admin"]) diff --git a/tests/unit/test_20260316_0028_gmail_refresh_token_lifecycle.py b/tests/unit/test_20260316_0028_gmail_refresh_token_lifecycle.py new file mode 100644 index 0000000..0c180c2 --- /dev/null +++ b/tests/unit/test_20260316_0028_gmail_refresh_token_lifecycle.py @@ -0,0 +1,40 @@ +from __future__ import annotations + +import importlib + + +MODULE_NAME = "apps.api.alembic.versions.20260316_0028_gmail_refresh_token_lifecycle" + + +def load_migration_module(): + return importlib.import_module(MODULE_NAME) + + +def test_upgrade_executes_expected_statements_in_order(monkeypatch) -> None: + module = load_migration_module() + executed: list[str] = [] + + monkeypatch.setattr(module.op, "execute", executed.append) + + module.upgrade() + + assert executed == list(module._UPGRADE_STATEMENTS) + + +def test_downgrade_executes_expected_statements_in_order(monkeypatch) -> None: + module = load_migration_module() + executed: list[str] = [] + + monkeypatch.setattr(module.op, "execute", executed.append) + + module.downgrade() + + assert executed == list(module._DOWNGRADE_STATEMENTS) + + +def test_gmail_account_credential_privileges_allow_runtime_updates() -> None: + module = load_migration_module() + + assert module._UPGRADE_STATEMENTS[-1] == ( + "GRANT UPDATE ON gmail_account_credentials TO alicebot_app" + ) diff --git a/tests/unit/test_gmail.py b/tests/unit/test_gmail.py index b84c945..f482807 100644 --- a/tests/unit/test_gmail.py +++ b/tests/unit/test_gmail.py @@ -9,6 +9,7 @@ from alicebot_api.artifacts import TaskArtifactAlreadyExistsError from alicebot_api.contracts import ( GMAIL_PROTECTED_CREDENTIAL_KIND, + GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND, GMAIL_READONLY_SCOPE, GmailAccountConnectInput, GmailMessageIngestInput, @@ -18,6 +19,7 @@ GmailAccountNotFoundError, GmailCredentialInvalidError, GmailCredentialNotFoundError, + GmailCredentialValidationError, GmailMessageUnsupportedError, build_gmail_message_artifact_relative_path, build_gmail_protected_credential_blob, @@ -112,6 +114,24 @@ def get_gmail_account_credential_optional( self.operations.append(("get_gmail_account_credential_optional", gmail_account_id)) return self.gmail_account_credentials.get(gmail_account_id) + def update_gmail_account_credential( + self, + *, + gmail_account_id: UUID, + auth_kind: str, + credential_blob: dict[str, object], + ) -> dict[str, object]: + existing = self.gmail_account_credentials[gmail_account_id] + updated = { + **existing, + "auth_kind": auth_kind, + "credential_blob": credential_blob, + "updated_at": self.base_time + timedelta(hours=1), + } + self.gmail_account_credentials[gmail_account_id] = updated + self.operations.append(("update_gmail_account_credential", gmail_account_id)) + return updated + def get_gmail_account_by_provider_account_id_optional( self, provider_account_id: str, @@ -271,6 +291,67 @@ def test_create_gmail_account_record_persists_protected_credential_and_hides_sec assert store.operations == [("create_gmail_account_credential", account_id)] +def test_create_gmail_account_record_persists_refreshable_protected_credential_and_hides_secret() -> None: + store = GmailStoreStub() + user_id = uuid4() + expires_at = datetime(2030, 1, 1, 0, 0, tzinfo=UTC) + + response = create_gmail_account_record( + store, + user_id=user_id, + request=GmailAccountConnectInput( + provider_account_id="acct-refresh-001", + email_address="owner@example.com", + display_name="Owner", + scope=GMAIL_READONLY_SCOPE, + access_token="token-1", + refresh_token="refresh-1", + client_id="client-1", + client_secret="secret-1", + access_token_expires_at=expires_at, + ), + ) + + account_id = UUID(response["account"]["id"]) + assert response == { + "account": { + "id": str(account_id), + "provider": "gmail", + "auth_kind": "oauth_access_token", + "provider_account_id": "acct-refresh-001", + "email_address": "owner@example.com", + "display_name": "Owner", + "scope": GMAIL_READONLY_SCOPE, + "created_at": response["account"]["created_at"], + "updated_at": response["account"]["updated_at"], + } + } + assert store.gmail_account_credentials[account_id]["credential_blob"] == { + "credential_kind": GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND, + "access_token": "token-1", + "refresh_token": "refresh-1", + "client_id": "client-1", + "client_secret": "secret-1", + "access_token_expires_at": expires_at.isoformat(), + } + assert store.operations == [("create_gmail_account_credential", account_id)] + + +def test_build_gmail_protected_credential_blob_rejects_partial_refresh_bundle() -> None: + with pytest.raises( + GmailCredentialValidationError, + match=( + "gmail refresh credentials must include refresh_token, client_id, client_secret, " + "and access_token_expires_at" + ), + ): + build_gmail_protected_credential_blob( + access_token="token-1", + refresh_token="refresh-1", + client_id="client-1", + ) + + def test_create_gmail_account_record_rejects_duplicate_provider_account_id() -> None: store = GmailStoreStub() user_id = uuid4() @@ -356,6 +437,76 @@ def test_resolve_gmail_access_token_rejects_missing_and_invalid_protected_creden resolve_gmail_access_token(store, gmail_account_id=account_id) +def test_resolve_gmail_access_token_renews_expired_refreshable_credential(monkeypatch) -> None: + store = GmailStoreStub() + expired_at = datetime(2020, 1, 1, 0, 0, tzinfo=UTC) + refreshed_at = datetime(2030, 1, 1, 0, 5, tzinfo=UTC) + account = create_gmail_account_record( + store, + user_id=uuid4(), + request=GmailAccountConnectInput( + provider_account_id="acct-001", + email_address="owner@example.com", + display_name="Owner", + scope=GMAIL_READONLY_SCOPE, + access_token="token-1", + refresh_token="refresh-1", + client_id="client-1", + client_secret="secret-1", + access_token_expires_at=expired_at, + ), + )["account"] + account_id = UUID(account["id"]) + + monkeypatch.setattr( + "alicebot_api.gmail.refresh_gmail_access_token", + lambda **_kwargs: ("token-2", refreshed_at), + ) + + assert resolve_gmail_access_token(store, gmail_account_id=account_id) == "token-2" + assert store.gmail_account_credentials[account_id]["credential_blob"] == { + "credential_kind": GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND, + "access_token": "token-2", + "refresh_token": "refresh-1", + "client_id": "client-1", + "client_secret": "secret-1", + "access_token_expires_at": refreshed_at.isoformat(), + } + assert store.operations[-2:] == [ + ("get_gmail_account_credential_optional", account_id), + ("update_gmail_account_credential", account_id), + ] + + +def test_resolve_gmail_access_token_rejects_invalid_refreshable_protected_credentials() -> None: + store = GmailStoreStub() + account = create_gmail_account_record( + store, + user_id=uuid4(), + request=GmailAccountConnectInput( + provider_account_id="acct-001", + email_address="owner@example.com", + display_name="Owner", + scope=GMAIL_READONLY_SCOPE, + access_token="token-1", + ), + )["account"] + account_id = UUID(account["id"]) + store.gmail_account_credentials[account_id]["credential_blob"] = { + "credential_kind": GMAIL_REFRESHABLE_PROTECTED_CREDENTIAL_KIND, + "access_token": "token-1", + "client_id": "client-1", + "client_secret": "secret-1", + "access_token_expires_at": "2020-01-01T00:00:00+00:00", + } + + with pytest.raises( + GmailCredentialInvalidError, + match=f"gmail account {account_id} has invalid protected credentials", + ): + resolve_gmail_access_token(store, gmail_account_id=account_id) + + def test_ingest_gmail_message_record_writes_rfc822_artifact_and_reuses_artifact_seam( monkeypatch, tmp_path, @@ -480,6 +631,110 @@ def fake_ingest(_store, *, user_id: UUID, request): ] +def test_ingest_gmail_message_record_renews_expired_access_token_before_fetch( + monkeypatch, + tmp_path, +) -> None: + store = GmailStoreStub() + user_id = uuid4() + workspace_id = uuid4() + workspace = store.create_task_workspace( + task_workspace_id=workspace_id, + local_path=str((tmp_path / "workspace").resolve()), + ) + account = create_gmail_account_record( + store, + user_id=user_id, + request=GmailAccountConnectInput( + provider_account_id="acct-001", + email_address="owner@example.com", + display_name="Owner", + scope=GMAIL_READONLY_SCOPE, + access_token="token-expired", + refresh_token="refresh-1", + client_id="client-1", + client_secret="secret-1", + access_token_expires_at=datetime(2020, 1, 1, 0, 0, tzinfo=UTC), + ), + )["account"] + raw_bytes = _build_rfc822_email_bytes(plain_body="hello from gmail") + calls: dict[str, object] = {} + + monkeypatch.setattr( + "alicebot_api.gmail.refresh_gmail_access_token", + lambda **_kwargs: ("token-refreshed", datetime(2030, 1, 1, 0, 5, tzinfo=UTC)), + ) + + def fake_fetch(**kwargs): + calls["fetch_access_token"] = kwargs["access_token"] + return raw_bytes + + monkeypatch.setattr("alicebot_api.gmail.fetch_gmail_message_raw_bytes", fake_fetch) + + monkeypatch.setattr( + "alicebot_api.gmail.register_task_artifact_record", + lambda _store, *, user_id, request: { + "artifact": { + "id": "00000000-0000-0000-0000-000000000123", + "task_id": str(workspace["task_id"]), + "task_workspace_id": str(workspace_id), + "status": "registered", + "ingestion_status": "pending", + "relative_path": Path(request.local_path) + .relative_to(Path(workspace["local_path"])) + .as_posix(), + "media_type_hint": "message/rfc822", + "created_at": "2026-03-16T10:00:00+00:00", + "updated_at": "2026-03-16T10:00:00+00:00", + } + }, + ) + monkeypatch.setattr( + "alicebot_api.gmail.ingest_task_artifact_record", + lambda _store, *, user_id, request: { + "artifact": { + "id": "00000000-0000-0000-0000-000000000123", + "task_id": str(workspace["task_id"]), + "task_workspace_id": str(workspace_id), + "status": "registered", + "ingestion_status": "ingested", + "relative_path": build_gmail_message_artifact_relative_path( + provider_account_id="acct-001", + provider_message_id="msg-001", + ), + "media_type_hint": "message/rfc822", + "created_at": "2026-03-16T10:00:00+00:00", + "updated_at": "2026-03-16T10:00:01+00:00", + }, + "summary": { + "total_count": 1, + "total_characters": 16, + "media_type": "message/rfc822", + "chunking_rule": "normalized_utf8_text_fixed_window_1000_chars_v1", + "order": ["sequence_no_asc", "id_asc"], + }, + }, + ) + + response = ingest_gmail_message_record( + store, + user_id=user_id, + request=GmailMessageIngestInput( + gmail_account_id=UUID(account["id"]), + task_workspace_id=workspace_id, + provider_message_id="msg-001", + ), + ) + + assert response["message"]["artifact_relative_path"] == "gmail/acct-001/msg-001.eml" + assert calls["fetch_access_token"] == "token-refreshed" + assert store.operations[:3] == [ + ("create_gmail_account_credential", UUID(account["id"])), + ("get_gmail_account_credential_optional", UUID(account["id"])), + ("update_gmail_account_credential", UUID(account["id"])), + ] + + def test_ingest_gmail_message_record_rejects_unsupported_message(monkeypatch, tmp_path) -> None: store = GmailStoreStub() user_id = uuid4() @@ -688,9 +943,10 @@ def test_ingest_gmail_message_record_rejects_invalid_protected_credentials_befor ), )["account"] account_id = UUID(account["id"]) - store.gmail_account_credentials[account_id]["credential_blob"] = build_gmail_protected_credential_blob( - access_token="", - ) + store.gmail_account_credentials[account_id]["credential_blob"] = { + "credential_kind": GMAIL_PROTECTED_CREDENTIAL_KIND, + "access_token": "", + } def fail_fetch(**_kwargs): raise AssertionError("fetch_gmail_message_raw_bytes should not be called") diff --git a/tests/unit/test_gmail_main.py b/tests/unit/test_gmail_main.py index 18fc375..24adb3a 100644 --- a/tests/unit/test_gmail_main.py +++ b/tests/unit/test_gmail_main.py @@ -4,6 +4,7 @@ from contextlib import contextmanager from uuid import uuid4 +import pytest import apps.api.src.alicebot_api.main as main_module from apps.api.src.alicebot_api.config import Settings from alicebot_api.gmail import ( @@ -11,6 +12,8 @@ GmailAccountNotFoundError, GmailCredentialInvalidError, GmailCredentialNotFoundError, + GmailCredentialRefreshError, + GmailCredentialValidationError, GmailMessageFetchError, GmailMessageNotFoundError, GmailMessageUnsupportedError, @@ -77,6 +80,55 @@ def fake_create_gmail_account_record(*_args, **_kwargs): } +def test_connect_gmail_account_request_requires_complete_refresh_bundle() -> None: + with pytest.raises(ValueError, match="gmail refresh credentials must include refresh_token"): + main_module.ConnectGmailAccountRequest( + user_id=uuid4(), + provider_account_id="acct-001", + email_address="owner@example.com", + display_name="Owner", + access_token="token-1", + refresh_token="refresh-1", + ) + + +def test_connect_gmail_account_endpoint_maps_invalid_refresh_bundle_to_400(monkeypatch) -> None: + user_id = uuid4() + settings = Settings(database_url="postgresql://app") + + @contextmanager + def fake_user_connection(*_args, **_kwargs): + yield object() + + def fake_create_gmail_account_record(*_args, **_kwargs): + raise GmailCredentialValidationError( + "gmail refresh credentials must include refresh_token, client_id, client_secret, " + "and access_token_expires_at" + ) + + monkeypatch.setattr(main_module, "get_settings", lambda: settings) + monkeypatch.setattr(main_module, "user_connection", fake_user_connection) + monkeypatch.setattr(main_module, "create_gmail_account_record", fake_create_gmail_account_record) + + response = main_module.connect_gmail_account( + main_module.ConnectGmailAccountRequest( + user_id=user_id, + provider_account_id="acct-001", + email_address="owner@example.com", + display_name="Owner", + access_token="token-1", + ) + ) + + assert response.status_code == 400 + assert json.loads(response.body) == { + "detail": ( + "gmail refresh credentials must include refresh_token, client_id, client_secret, " + "and access_token_expires_at" + ) + } + + def test_get_gmail_account_endpoint_maps_not_found_to_404(monkeypatch) -> None: user_id = uuid4() gmail_account_id = uuid4() @@ -230,3 +282,22 @@ def fake_fetch_error(*_args, **_kwargs): assert json.loads(response.body) == { "detail": "gmail message msg-001 could not be fetched" } + + def fake_refresh_error(*_args, **_kwargs): + raise GmailCredentialRefreshError( + f"gmail account {gmail_account_id} access token could not be renewed" + ) + + monkeypatch.setattr(main_module, "ingest_gmail_message_record", fake_refresh_error) + response = main_module.ingest_gmail_message( + gmail_account_id, + "msg-001", + main_module.IngestGmailMessageRequest( + user_id=user_id, + task_workspace_id=task_workspace_id, + ), + ) + assert response.status_code == 502 + assert json.loads(response.body) == { + "detail": f"gmail account {gmail_account_id} access token could not be renewed" + } diff --git a/tests/unit/test_gmail_refresh.py b/tests/unit/test_gmail_refresh.py new file mode 100644 index 0000000..779e454 --- /dev/null +++ b/tests/unit/test_gmail_refresh.py @@ -0,0 +1,167 @@ +from __future__ import annotations + +import json +from datetime import UTC, datetime, timedelta +from io import BytesIO +from urllib.error import HTTPError, URLError +from urllib.parse import parse_qs +from uuid import uuid4 + +import pytest + +from alicebot_api.gmail import ( + GMAIL_TOKEN_REFRESH_TIMEOUT_SECONDS, + GMAIL_TOKEN_REFRESH_URL, + GmailCredentialInvalidError, + GmailCredentialRefreshError, + refresh_gmail_access_token, +) + + +class _FakeHTTPResponse: + def __init__(self, payload: bytes) -> None: + self._payload = payload + + def __enter__(self) -> _FakeHTTPResponse: + return self + + def __exit__(self, exc_type, exc, tb) -> bool: + return False + + def read(self) -> bytes: + return self._payload + + +def _make_http_error(status_code: int) -> HTTPError: + return HTTPError( + GMAIL_TOKEN_REFRESH_URL, + status_code, + "upstream error", + hdrs=None, + fp=BytesIO(b'{"error":"invalid_grant"}'), + ) + + +def test_refresh_gmail_access_token_posts_expected_payload_and_returns_expiry(monkeypatch) -> None: + gmail_account_id = uuid4() + seen: dict[str, object] = {} + + def fake_urlopen(request, timeout: int): + seen["url"] = request.full_url + seen["timeout"] = timeout + seen["content_type"] = request.headers["Content-type"] + seen["accept"] = request.headers["Accept"] + seen["body"] = parse_qs(request.data.decode("utf-8")) + return _FakeHTTPResponse( + json.dumps({"access_token": "token-refreshed", "expires_in": 3600}).encode("utf-8") + ) + + monkeypatch.setattr("alicebot_api.gmail.urlopen", fake_urlopen) + + started_at = datetime.now(UTC) + access_token, expires_at = refresh_gmail_access_token( + gmail_account_id=gmail_account_id, + refresh_token="refresh-001", + client_id="client-001", + client_secret="secret-001", + ) + finished_at = datetime.now(UTC) + + assert access_token == "token-refreshed" + assert started_at + timedelta(seconds=3590) <= expires_at <= finished_at + timedelta(seconds=3610) + assert seen == { + "url": GMAIL_TOKEN_REFRESH_URL, + "timeout": GMAIL_TOKEN_REFRESH_TIMEOUT_SECONDS, + "content_type": "application/x-www-form-urlencoded", + "accept": "application/json", + "body": { + "client_id": ["client-001"], + "client_secret": ["secret-001"], + "refresh_token": ["refresh-001"], + "grant_type": ["refresh_token"], + }, + } + + +@pytest.mark.parametrize("status_code", [400, 401]) +def test_refresh_gmail_access_token_maps_invalid_refresh_rejections_to_invalid_error( + monkeypatch, + status_code: int, +) -> None: + gmail_account_id = uuid4() + + def fake_urlopen(_request, timeout: int): + assert timeout == GMAIL_TOKEN_REFRESH_TIMEOUT_SECONDS + raise _make_http_error(status_code) + + monkeypatch.setattr("alicebot_api.gmail.urlopen", fake_urlopen) + + with pytest.raises( + GmailCredentialInvalidError, + match=f"gmail account {gmail_account_id} refresh credentials were rejected", + ): + refresh_gmail_access_token( + gmail_account_id=gmail_account_id, + refresh_token="refresh-001", + client_id="client-001", + client_secret="secret-001", + ) + + +def test_refresh_gmail_access_token_maps_non_deterministic_http_failure_to_refresh_error( + monkeypatch, +) -> None: + gmail_account_id = uuid4() + + def fake_urlopen(_request, timeout: int): + assert timeout == GMAIL_TOKEN_REFRESH_TIMEOUT_SECONDS + raise _make_http_error(500) + + monkeypatch.setattr("alicebot_api.gmail.urlopen", fake_urlopen) + + with pytest.raises( + GmailCredentialRefreshError, + match=f"gmail account {gmail_account_id} access token could not be renewed", + ): + refresh_gmail_access_token( + gmail_account_id=gmail_account_id, + refresh_token="refresh-001", + client_id="client-001", + client_secret="secret-001", + ) + + +@pytest.mark.parametrize( + ("response_payload", "error"), + [ + (b"not-json", None), + (json.dumps({"expires_in": 3600}).encode("utf-8"), None), + (None, URLError("network down")), + ], +) +def test_refresh_gmail_access_token_maps_malformed_or_transport_failures_to_refresh_error( + monkeypatch, + response_payload: bytes | None, + error: Exception | None, +) -> None: + gmail_account_id = uuid4() + + def fake_urlopen(_request, timeout: int): + assert timeout == GMAIL_TOKEN_REFRESH_TIMEOUT_SECONDS + if error is not None: + raise error + assert response_payload is not None + return _FakeHTTPResponse(response_payload) + + monkeypatch.setattr("alicebot_api.gmail.urlopen", fake_urlopen) + + with pytest.raises( + GmailCredentialRefreshError, + match=f"gmail account {gmail_account_id} access token could not be renewed", + ): + refresh_gmail_access_token( + gmail_account_id=gmail_account_id, + refresh_token="refresh-001", + client_id="client-001", + client_secret="secret-001", + )