122 changes: 56 additions & 66 deletions .ai/active/SPRINT_PACKET.md
@@ -2,133 +2,123 @@

## Sprint Title

Sprint 5O: Read-Only Gmail Connection and Single-Message Ingestion V0
Sprint 5P: Gmail Credential Hardening

## Sprint Type

feature
repair

## Sprint Reason

Sprint 5N proved the Gmail-adjacent document seam locally by ingesting RFC822 email artifacts into the existing chunk substrate. The next safe step is the first live read-only Gmail slice, but only enough to connect one account and ingest one visible Gmail message through that same RFC822-to-chunk seam. This opens connector work without collapsing into search, sync, Calendar, UI, or write-capable behavior.
Sprint 5O successfully opened the first live read-only Gmail seam, but it left a known security debt: Gmail access tokens are still persisted in plaintext on `gmail_accounts`. Before any broader Gmail auth lifecycle, search, sync, Calendar, or UI work, that gap needs to be closed.

## Sprint Intent

Add the first live read-only Gmail connector seam by supporting user-scoped Gmail connection metadata plus ingestion of one selected Gmail message into the existing artifact-ingestion pipeline as an RFC822-style artifact, without adding write actions, background sync, Calendar, or UI.
Harden the narrow Gmail connector seam by replacing plaintext credential storage with an explicit protected credential mechanism and by removing credential exposure from normal account surfaces, without widening into Gmail search, sync, write actions, Calendar, or UI.

## Git Instructions

- Branch Name: `codex/sprint-5o-gmail-connection-single-message-ingestion`
- Branch Name: `codex/sprint-5p-gmail-credential-hardening`
- Base Branch: `main`
- PR Strategy: one sprint branch, one PR, no stacked PRs unless Control Tower explicitly opens a follow-up sprint
- Merge Policy: squash merge only after reviewer `PASS` and explicit Control Tower merge approval

## Why This Sprint

- Sprint 5A shipped deterministic rooted task-workspace provisioning.
- Sprint 5C through 5J shipped the durable artifact, chunk, lexical, semantic, and hybrid compile substrate.
- Sprint 5N shipped narrow local RFC822 parsing on that same artifact-ingestion seam.
- The product brief requires read-only Gmail connectors in v1.
- The narrowest safe connector step is not mailbox sync or UI; it is a user-scoped read-only Gmail connection plus one explicit message ingestion path that reuses the already-accepted RFC822 extraction seam.
- Sprint 5O shipped the first narrow read-only Gmail account and single-message ingestion seam.
- The accepted review explicitly calls out plaintext Gmail credential storage as acceptable only for that narrow prototype slice and recommends hardening before broader connector rollout.
- Connector security is the immediate risk, not more Gmail breadth.
- The narrowest safe next step is credential hardening only, keeping the current single-message Gmail ingestion seam otherwise intact.

## In Scope

- Add schema and migration support for:
- `gmail_accounts`
- Define typed contracts for:
- Gmail account create or connect requests
- Gmail account list and detail responses
- single-message Gmail ingestion requests
- single-message Gmail ingestion responses
- Implement a narrow Gmail connector seam that:
- stores one user-scoped read-only Gmail account connection record with only the metadata needed for later reads
- uses one explicit Gmail read-only auth/config path only
- fetches one selected Gmail message by explicit provider message id
- converts that message into the existing artifact registration plus RFC822-style ingestion pipeline
- persists the resulting artifact under one visible task workspace
- reuses the existing `task_artifacts` and `task_artifact_chunks` contracts
- preserves per-user isolation throughout account read and message ingestion flows
- Implement the minimal API or service paths needed for:
- connecting one Gmail account
- listing Gmail accounts
- reading one Gmail account
- ingesting one Gmail message into one visible task workspace
- Add schema and migration support only as needed to remove plaintext credential storage from `gmail_accounts`, for example:
- protected credential blob or reference fields
- optional token metadata fields if needed for the narrow hardened seam
- Define typed contract changes for:
- Gmail account connect requests if a hardened credential payload shape is required
- Gmail account responses with secrets removed
- any narrow Gmail ingestion error shape changes needed for hardened credential lookup failures
- Implement a narrow Gmail credential seam that:
- persists Gmail credentials through one explicit protected storage mechanism
- does not return secrets through account list or detail responses
- lets the existing single-message Gmail ingestion path resolve credentials through the hardened mechanism
- preserves deterministic account reads and per-user isolation
- Add unit and integration tests for:
- Gmail account persistence
- deterministic account listing
- single-message Gmail ingestion through the existing artifact seam
- rejection of cross-user workspace access
- rejection of unsupported or missing Gmail messages
- protected credential persistence
- absence of secret material in Gmail account responses
- successful single-message Gmail ingestion using the hardened credential path
- deterministic failure when required credentials are missing or invalid
- per-user isolation
- stable response shape
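
As a rough illustration of the "protected credential blob or reference fields" option, the sketch below keeps only an opaque reference on the account row and builds read responses from a type that has no credential fields at all. Everything named here (`SecretStore`, `InMemorySecretStore`, `GmailAccountRow`, `GmailAccountResponse`, `connect_account`, `to_response`, the `credential_ref` column) is hypothetical and not taken from the repo; the in-memory store only stands in for whatever protected mechanism the sprint actually selects.

```python
from __future__ import annotations

import secrets
from dataclasses import asdict, dataclass
from typing import Protocol


class SecretStore(Protocol):
    """Hypothetical protected-storage interface (an assumption, not repo code)."""

    def put(self, secret: str) -> str: ...

    def get(self, ref: str) -> str | None: ...


class InMemorySecretStore:
    """Illustration-only stand-in; a real store would protect secrets at rest."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put(self, secret: str) -> str:
        # Return an opaque reference; the secret itself never leaves the store.
        ref = f"secret_{secrets.token_hex(8)}"
        self._items[ref] = secret
        return ref

    def get(self, ref: str) -> str | None:
        return self._items.get(ref)


@dataclass(frozen=True)
class GmailAccountRow:
    """Assumed hardened row shape: holds a reference, never the token."""

    id: str
    user_id: str
    email_address: str
    credential_ref: str


@dataclass(frozen=True)
class GmailAccountResponse:
    """Assumed read contract: contains no credential fields."""

    id: str
    email_address: str


def connect_account(
    store: SecretStore,
    account_id: str,
    user_id: str,
    email_address: str,
    access_token: str,
) -> GmailAccountRow:
    # The token is accepted on write only and handed straight to the store.
    credential_ref = store.put(access_token)
    return GmailAccountRow(account_id, user_id, email_address, credential_ref)


def to_response(row: GmailAccountRow) -> GmailAccountResponse:
    # Reads are built from an explicit allow-list of non-secret fields.
    return GmailAccountResponse(id=row.id, email_address=row.email_address)
```

Building the response from a separate type, rather than masking fields on the row, matches the packet's stated preference for one explicit mechanism over ad hoc masking or partial hiding.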

## Out of Scope

- No Gmail message search.
- No Gmail search.
- No mailbox sync or backfill jobs.
- No attachment ingestion.
- No write-capable Gmail actions.
- No Calendar connector scope.
- No OAuth UX or web callback UI beyond the minimal backend contract needed to represent a connected account.
- No OAuth UX or callback UI.
- No compile contract changes.
- No runner-style orchestration.
- No UI work.

## Required Deliverables

- Migration for `gmail_accounts`.
- Stable contracts for Gmail account connect/list/detail and single-message ingestion.
- Minimal read-only Gmail account persistence seam.
- Minimal explicit single-message Gmail ingestion path that feeds the existing artifact and RFC822 chunk seams.
- Unit and integration coverage for persistence, isolation, ingestion routing, and response stability.
- Migration removing plaintext credential storage from the live Gmail seam in favor of a protected credential mechanism.
- Stable Gmail account contracts that no longer expose secret material.
- Updated single-message Gmail ingestion path that resolves credentials through the hardened mechanism.
- Unit and integration coverage for credential protection, ingestion continuity, failure handling, and isolation.
- Updated `BUILD_REPORT.md` with exact verification results and explicit deferred scope.

## Acceptance Criteria

- A client can persist one user-scoped read-only Gmail account connection record.
- A client can list and read Gmail account records deterministically.
- A client can ingest one selected Gmail message into one visible task workspace through the existing artifact-ingestion seam.
- Gmail message ingestion results in durable `task_artifacts` and `task_artifact_chunks` rows compatible with existing retrieval and compile behavior.
- Cross-user account and workspace access is rejected deterministically.
- Gmail account records no longer persist plaintext access tokens in the normal application table surface.
- Gmail account list and detail responses do not expose secret material.
- The existing single-message Gmail ingestion path continues to work through the hardened credential mechanism.
- Missing or invalid credentials fail deterministically and do not corrupt task workspace or artifact state.
- `./.venv/bin/python -m pytest tests/unit` passes.
- `./.venv/bin/python -m pytest tests/integration` passes.
- No Gmail search, mailbox sync, attachments, write actions, Calendar, compile-contract, runner, or UI scope enters the sprint.
- No Gmail search, sync, attachments, write actions, Calendar, compile-contract, runner, or UI scope enters the sprint.
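
The criterion that missing or invalid credentials "fail deterministically and do not corrupt task workspace or artifact state" could be met by resolving credentials before any write happens. The sketch below is one possible shape under that assumption; `GmailCredentialError`, the error codes, `Workspace`, `resolve_access_token`, `ingest_message`, and the `fetch` callable are all hypothetical names, not the repo's actual contracts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class GmailCredentialError(Exception):
    """Hypothetical error carrying a stable, machine-readable code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass
class Workspace:
    """Toy stand-in for a task workspace: artifact id -> raw message bytes."""

    artifacts: dict[str, bytes] = field(default_factory=dict)


def resolve_access_token(credentials: dict[str, str], credential_ref: str | None) -> str:
    # `credentials` stands in for the protected store; lookups are by reference.
    if not credential_ref or credential_ref not in credentials:
        raise GmailCredentialError("gmail_credentials_missing")
    token = credentials[credential_ref]
    if not token.strip():
        raise GmailCredentialError("gmail_credentials_invalid")
    return token


def ingest_message(
    workspace: Workspace,
    credentials: dict[str, str],
    credential_ref: str | None,
    message_id: str,
    fetch: Callable[[str, str], bytes],
) -> str:
    # Resolve credentials first so a failure leaves the workspace untouched.
    token = resolve_access_token(credentials, credential_ref)
    raw = fetch(token, message_id)
    artifact_id = f"gmail-{message_id}"
    workspace.artifacts[artifact_id] = raw
    return artifact_id
```

An integration test for this criterion could then assert both the error code and that the workspace holds no new artifact after the failed call.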

## Implementation Constraints

- Keep connector work narrow and boring.
- Reuse the existing rooted workspace, artifact, and RFC822 chunk seams rather than creating a parallel email-content store.
- Keep Gmail handling explicitly read-only.
- Support one explicit selected-message ingestion path only; do not introduce account-wide sync or search in the same sprint.
- Preserve existing retrieval and compile contracts by feeding the already-shipped chunk substrate.
- Keep the repair narrow and boring.
- Do not broaden the Gmail feature surface while fixing credential storage.
- Prefer one explicit protected credential mechanism over ad hoc masking or partial hiding.
- Preserve the current Gmail account and single-message ingestion seams as much as possible outside the security fix.
- If credential hardening needs one minimal rule added to `RULES.md`, keep it scoped to connector-secret handling only.

## Suggested Work Breakdown

1. Add `gmail_accounts` schema and migration.
2. Define Gmail account and single-message ingestion contracts.
3. Implement deterministic Gmail account create, list, and detail behavior.
4. Implement explicit selected-message Gmail ingestion into the existing artifact and RFC822 ingestion seam.
1. Add the schema/migration changes required for protected Gmail credential storage.
2. Update Gmail account contracts so secrets are accepted only on write and never returned on read.
3. Route single-message Gmail ingestion through the hardened credential lookup path.
4. Add deterministic failure handling for missing or invalid protected credentials.
5. Add unit and integration tests.
6. Update `BUILD_REPORT.md` with executed verification.
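
For the test step, one way to check that secrets are "never returned on read" is a small helper that walks a response payload and reports any key that looks secret-bearing. This is only a sketch: `find_secret_keys` and the `SECRET_KEY_HINTS` list are hypothetical, and the hint list would need tuning to the field names the contracts actually use.

```python
from typing import Any

# Assumed substrings that mark a key as secret-bearing; adjust to the real contracts.
SECRET_KEY_HINTS = ("token", "secret", "password")


def find_secret_keys(payload: Any, path: str = "") -> list:
    """Return dotted paths of keys whose names suggest secret material."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            if any(hint in str(key).lower() for hint in SECRET_KEY_HINTS):
                hits.append(child)
            # Recurse so nested objects are checked too.
            hits.extend(find_secret_keys(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_secret_keys(item, f"{path}[{index}]"))
    return hits
```

A unit or integration test could assert that `find_secret_keys(response_json)` is empty for both the account list and account detail responses.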

## Build Report Requirements

`BUILD_REPORT.md` must include:
- the exact Gmail account and single-message ingestion contract changes introduced
- the Gmail message-to-artifact conversion rule used
- the exact Gmail credential contract and schema changes introduced
- the protected credential storage mechanism used
- exact commands run
- unit and integration test results
- one example Gmail account response
- one example single-message ingestion response
- one example Gmail account response proving secret removal
- one example Gmail ingestion response through the hardened path
- what remains intentionally deferred to later milestones

## Review Focus

`REVIEW_REPORT.md` should verify:
- the sprint stayed limited to read-only Gmail connection metadata and single-message ingestion
- Gmail message ingestion reuses the existing rooted workspace, artifact, and RFC822 chunk seams
- persistence, isolation, and ingestion determinism are test-backed
- no hidden Gmail search, mailbox sync, attachments, write actions, Calendar, compile-contract, runner, or UI scope entered the sprint
- the sprint stayed limited to Gmail credential hardening
- plaintext credential persistence is removed from the normal `gmail_accounts` surface
- Gmail account reads no longer expose secrets
- the existing single-message Gmail ingestion path still works through the hardened seam
- no hidden Gmail search, sync, attachments, write actions, Calendar, compile-contract, runner, or UI scope entered the sprint

## Exit Condition

This sprint is complete when the repo can persist deterministic user-scoped read-only Gmail account records and ingest one selected Gmail message into the existing artifact/chunk seam with passing Postgres-backed tests, while still deferring broader Gmail connector behavior, Calendar, and UI.
This sprint is complete when the repo no longer stores plaintext Gmail credentials in the normal application table surface, the existing read-only single-message Gmail ingestion seam still works through the hardened credential path, and the full path is verified with Postgres-backed tests while broader Gmail and Calendar connector work remains deferred.