diff --git a/.ai/active/SPRINT_PACKET.md b/.ai/active/SPRINT_PACKET.md
index e32ab02..6d35e69 100644
--- a/.ai/active/SPRINT_PACKET.md
+++ b/.ai/active/SPRINT_PACKET.md
@@ -2,139 +2,111 @@
 ## Sprint Title
-Sprint 5J: Deterministic Hybrid Artifact Merge in Context Compilation
+Sprint 5K: Project Truth Synchronization After Hybrid Artifact Compile
 ## Sprint Type
-feature
+refactor
 ## Sprint Reason
-Milestone 5 now has compile-path lexical artifact retrieval and compile-path semantic artifact retrieval as separate sections. The next safe step is to merge those two existing artifact candidate paths into one governed compile-time artifact section with explicit deterministic deduplication and provenance rules, while still deferring reranking, connectors, and UI.
+Sprint 5J is implemented and the feature path is still correct, but the live truth artifacts are materially stale. `ARCHITECTURE.md` still describes the repo as current through Sprint 5H, and `ROADMAP.md` still says current through Sprint 5A. Before opening richer document parsing, read-only connectors, or any UI work, Control Tower needs the architecture and roadmap truth reset to the accepted repo state.
 ## Sprint Intent
-Complete the first hybrid artifact retrieval slice by merging the already-implemented lexical and semantic artifact chunk candidate sets inside `POST /v0/context/compile`, using explicit deterministic deduplication and selection rules without adding reranking or model-driven embedding generation.
+Synchronize the live truth artifacts with the implemented and review-passed repo state through Sprint 5J, so future planning, handoff, and review work all start from accurate architecture, roadmap, and current-state documents.
 ## Git Instructions
-- Branch Name: `codex/sprint-5j-hybrid-artifact-merge`
+- Branch Name: `codex/sprint-5k-project-truth-sync`
 - Base Branch: `main`
 - PR Strategy: one sprint branch, one PR, no stacked PRs unless Control Tower explicitly opens a follow-up sprint
 - Merge Policy: squash merge only after reviewer `PASS` and explicit Control Tower merge approval
 ## Why This Sprint
-- Sprint 5A shipped deterministic rooted task-workspace provisioning.
-- Sprint 5C shipped explicit task-artifact registration.
-- Sprint 5D shipped deterministic local artifact ingestion into durable chunk rows.
-- Sprint 5E shipped lexical artifact-chunk retrieval.
-- Sprint 5F shipped compile-path lexical artifact chunk inclusion.
-- Sprint 5G shipped durable artifact-chunk embedding persistence.
-- Sprint 5H shipped direct semantic artifact retrieval.
-- Sprint 5I shipped compile-path semantic artifact retrieval as a separate context section.
-- The next narrow Milestone 5 seam is deterministic hybrid artifact merge, so compile can return one governed artifact section before any reranking or richer document behavior is introduced.
+- Sprint 5I shipped compile-path semantic artifact retrieval.
+- Sprint 5J shipped deterministic hybrid lexical-plus-semantic artifact merge in compile.
+- `ARCHITECTURE.md` is still describing the accepted repo slice through Sprint 5H and still treats compile-path semantic artifact use and hybrid artifact retrieval as deferred.
+- `ROADMAP.md` still says the accepted repo state is current through Sprint 5A.
+- Planning from stale truth at this point would increase scope drift risk just before richer document parsing and connector work.
 ## In Scope
-- Define typed contracts for:
-  - hybrid artifact chunk items in the compiled context pack
-  - source provenance metadata for each included artifact chunk
-  - hybrid artifact retrieval summary metadata
-  - hybrid artifact retrieval trace payloads
-- Extend the compile path so it can:
-  - gather the existing lexical artifact chunk candidates
-  - optionally gather the existing semantic artifact chunk candidates
-  - merge both candidate sets into one artifact section
-  - deduplicate by durable artifact chunk identity
-  - preserve source provenance when a chunk is selected by both paths
-  - apply explicit deterministic selection rules and limits
-  - record hybrid include/exclude and deduplication decisions in `trace_events`
-- Define explicit merge behavior, for example:
-  - lexical-first precedence when both sources compete for the same limit budget
-  - stable tie-breaking after source precedence
-  - predictable handling when a chunk appears in both candidate sets
-- Ensure compile behavior:
-  - excludes non-ingested artifacts
-  - scopes strictly by user ownership
-  - remains deterministic for the same stored data and inputs
-  - leaves memory, entity, and non-artifact sections unchanged
-- Add unit and integration tests for:
-  - deterministic merge ordering
-  - deduplication behavior
-  - dual-source provenance behavior
-  - limit enforcement across merged artifact candidates
-  - exclusion of non-ingested artifacts
-  - per-user isolation through the compile path
-  - response-shape stability for the merged artifact section
+- Audit the accepted implemented slice from the repo and passed sprint reports through Sprint 5J.
+- Update `ARCHITECTURE.md` so it accurately describes the implemented seams through:
+  - compile-path semantic artifact retrieval
+  - deterministic hybrid lexical-plus-semantic artifact merge in compile
+  - current artifact chunk contracts and retrieval boundaries
+- Update `ROADMAP.md` so:
+  - completed/current milestone state reflects the accepted repo state through Sprint 5J
+  - the next delivery focus is framed from the actual shipped artifact retrieval baseline
+  - stale “current position” language is corrected
+- Update `.ai/handoff/CURRENT_STATE.md` so:
+  - implemented areas and risks reflect the repo through Sprint 5J
+  - the current milestone position is correct
+  - the immediate next move matches the next narrow sprint boundary after truth sync
+- Update `BUILD_REPORT.md` with the truth-sync evidence and exact files corrected.
 ## Out of Scope
-- No reranking across merged artifact candidates.
-- No weighted or learned fusion logic.
-- No model or external API calls to generate query embeddings.
-- No richer document parsing beyond the already-shipped local text ingestion seam.
+- No schema changes.
+- No API changes.
+- No runtime code changes.
+- No richer document parsing.
 - No Gmail or Calendar connector scope.
 - No runner-style orchestration.
 - No UI work.
 ## Required Deliverables
-- Stable compile-response contract updates for merged hybrid artifact output.
-- Deterministic hybrid merge logic over the existing lexical and semantic artifact retrieval paths.
-- Trace coverage for merge, deduplication, and exclusion decisions.
-- Unit and integration coverage for hybrid artifact behavior, ordering, validation, and isolation.
-- Updated `BUILD_REPORT.md` with exact verification results and explicit deferred scope.
+- Updated `ARCHITECTURE.md` aligned to the implemented repo state through Sprint 5J.
+- Updated `ROADMAP.md` with correct completed/current/next milestone sequencing.
+- Updated `.ai/handoff/CURRENT_STATE.md` reflecting the actual shipped state and immediate next move.
+- Updated `BUILD_REPORT.md` describing exactly which truth artifacts were synchronized and what evidence was used.
 ## Acceptance Criteria
-- `POST /v0/context/compile` can return one merged artifact section derived from the existing lexical and semantic artifact retrieval paths.
-- The merged section deduplicates artifact chunks by durable identity and preserves source provenance.
-- Merge behavior uses explicit deterministic rules and limits.
-- Non-ingested artifacts are excluded from the merged section.
-- Hybrid artifact merge decisions are persisted in `trace_events`.
-- Result ordering is deterministic for the same stored data and inputs.
-- `./.venv/bin/python -m pytest tests/unit` passes.
-- `./.venv/bin/python -m pytest tests/integration` passes.
-- No reranking, connector, runner, UI, or broader side-effect scope enters the sprint.
+- `ARCHITECTURE.md` describes compile-path semantic artifact retrieval and hybrid artifact compile merge as implemented behavior, not deferred work.
+- `ROADMAP.md` no longer claims the repo is current only through Sprint 5A.
+- `.ai/handoff/CURRENT_STATE.md` no longer describes the repo as current only through Sprint 5D.
+- Truth artifacts clearly distinguish between implemented behavior and later planned work.
+- No runtime, schema, API, connector, runner, or UI changes appear in the sprint diff.
 ## Implementation Constraints
-- Keep hybrid artifact behavior narrow and boring.
-- Reuse the existing lexical and semantic artifact retrieval seams rather than introducing a third retrieval stack.
-- Make source precedence explicit in contracts, code, and tests.
-- Do not introduce weighted scoring, reranking, or learned fusion in this sprint.
-- Keep memory, entity, and non-artifact sections unchanged.
+- Keep this sprint documentation-only and boring.
+- Use accepted repo state and passed sprint reports as evidence, not aspiration.
+- Prefer explicit “implemented now” versus “planned later” boundaries.
+- If a truth artifact cannot be updated confidently from accepted evidence, narrow the statement rather than guessing.
+- Do not widen into product changes just because the architecture text is stale.
 ## Suggested Work Breakdown
-1. Define hybrid artifact output and trace contracts.
-2. Implement deterministic merge and deduplication over existing lexical and semantic artifact candidates.
-3. Add source provenance and hybrid summary metadata.
-4. Record hybrid merge decisions in `trace_events`.
-5. Preserve existing retrieval seams while returning one merged artifact section.
-6. Add unit and integration tests.
-7. Update `BUILD_REPORT.md` with executed verification.
+1. Audit the implemented repo state and accepted sprint reports through Sprint 5J.
+2. Update `ARCHITECTURE.md` to reflect the current shipped seams and boundaries.
+3. Update `ROADMAP.md` to reflect actual completed and current milestone state.
+4. Update `.ai/handoff/CURRENT_STATE.md` to reflect actual current state and the immediate next move.
+5. Update `BUILD_REPORT.md` with exact truth-sync evidence and scope confirmation.
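The documentation-only constraint in this packet lends itself to a mechanical check before review. The following is a minimal sketch and not part of the repo: `ALLOWED_TRUTH_ARTIFACTS` and `out_of_scope_paths` are hypothetical names, and treating `.ai/active/SPRINT_PACKET.md` and `REVIEW_REPORT.md` as allowed paths is an assumption about what a truth-sync diff may touch.

```python
from typing import FrozenSet, Iterable, List

# Assumption: the set of files a documentation-only truth-sync sprint may change.
ALLOWED_TRUTH_ARTIFACTS: FrozenSet[str] = frozenset(
    {
        "ARCHITECTURE.md",
        "ROADMAP.md",
        "BUILD_REPORT.md",
        "REVIEW_REPORT.md",
        ".ai/handoff/CURRENT_STATE.md",
        ".ai/active/SPRINT_PACKET.md",
    }
)


def out_of_scope_paths(
    changed_paths: Iterable[str],
    allowed: FrozenSet[str] = ALLOWED_TRUTH_ARTIFACTS,
) -> List[str]:
    """Return changed paths that fall outside the allowed set, sorted for stable output."""
    return sorted(path for path in set(changed_paths) if path not in allowed)
```

The changed paths could come from `git diff --name-only main...HEAD`. An empty result would support the confirmation that no runtime, schema, or API changes entered the sprint diff.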
 ## Build Report Requirements
 `BUILD_REPORT.md` must include:
-- the exact hybrid artifact merge contract changes introduced
-- the merge precedence and deduplication rule used
-- exact commands run
-- unit and integration test results
-- one example compile request and response showing merged artifact output
-- one example of hybrid artifact retrieval trace events inside one compile run
-- what remains intentionally deferred to later milestones
+- exactly which truth artifacts were updated
+- which accepted reports or repo evidence were used
+- the specific stale statements that were corrected
+- confirmation that no runtime or schema changes were made
+- what remains intentionally deferred after truth synchronization
 ## Review Focus
 `REVIEW_REPORT.md` should verify:
-- the sprint stayed limited to deterministic hybrid artifact merge in the compile path
-- hybrid artifact behavior reuses the existing lexical and semantic retrieval seams
-- merge ordering, deduplication, provenance, exclusion rules, trace visibility, and isolation are deterministic and test-backed
-- no hidden reranking, connector, runner, UI, or broader side-effect scope entered the sprint
+- the sprint stayed documentation-only
+- `ARCHITECTURE.md`, `ROADMAP.md`, and `.ai/handoff/CURRENT_STATE.md` now match the implemented repo state through Sprint 5J
+- compile-path semantic artifact retrieval and hybrid artifact merge are documented accurately
+- milestone sequencing is truthful and current
+- no hidden runtime, schema, API, connector, runner, or UI scope entered the sprint
 ## Exit Condition
-This sprint is complete when the repo can return one deterministic merged artifact section in `POST /v0/context/compile`, derived from the existing lexical and semantic artifact retrieval paths with trace-visible merge decisions and passing Postgres-backed tests, while still deferring reranking, connectors, and UI.
+This sprint is complete when the project truth artifacts accurately describe the implemented repo state through Sprint 5J and future planning can proceed from synchronized architecture, roadmap, and current-state documents.
diff --git a/.ai/handoff/CURRENT_STATE.md b/.ai/handoff/CURRENT_STATE.md
index 680e8af..319fc38 100644
--- a/.ai/handoff/CURRENT_STATE.md
+++ b/.ai/handoff/CURRENT_STATE.md
@@ -2,46 +2,48 @@
 ## Canonical Truth
-- The working repo state is current through Sprint 5D, including post-review follow-up fixes for artifact-ingestion coverage and stale docs.
+- The working repo state is current through Sprint 5J, including compile-path semantic artifact retrieval and deterministic hybrid lexical-plus-semantic artifact merge in compile.
 - Use [PRODUCT_BRIEF.md](/Users/samirusani/Desktop/Codex/AliceBot/PRODUCT_BRIEF.md) for product scope, [ARCHITECTURE.md](/Users/samirusani/Desktop/Codex/AliceBot/ARCHITECTURE.md) for implemented technical boundaries, [ROADMAP.md](/Users/samirusani/Desktop/Codex/AliceBot/ROADMAP.md) for forward planning, and [RULES.md](/Users/samirusani/Desktop/Codex/AliceBot/RULES.md) for durable operating rules.
-- Historical build and review reports have been moved under [docs/archive/sprints](/Users/samirusani/Desktop/Codex/AliceBot/docs/archive/sprints).
+- Historical build and review reports remain the source of sprint-by-sprint detail; the active handoff should stay compact and current.
 ## Implemented Repo Slice
-- `apps/api` is the only shipped product surface.
It implements continuity, tracing, deterministic context compilation, governed memory admission and review, embeddings, semantic retrieval, entities, policy and tool governance, approval persistence and resolution, approved-only `proxy.echo` execution, execution budgets, task/task-step lifecycle reads and mutations, explicit manual continuation lineage, explicit task-step linkage for approval and execution synchronization, deterministic rooted local task-workspace provisioning, explicit task-artifact registration, and narrow local text-artifact ingestion into durable chunk rows.
-- The live schema includes continuity, trace, memory, embedding, entity, governance, `tasks`, `task_steps`, `task_workspaces`, `task_artifacts`, and `task_artifact_chunks` tables with row-level security on user-owned data.
+- `apps/api` is the only shipped product surface. It implements continuity, tracing, deterministic context compilation, governed memory admission and review, embeddings, semantic retrieval, entities, policy and tool governance, approval persistence and resolution, approved-only `proxy.echo` execution, execution budgets, task/task-step lifecycle reads and mutations, explicit manual continuation lineage, explicit task-step linkage for approval and execution synchronization, deterministic rooted local task-workspace provisioning, explicit task-artifact registration, narrow local text-artifact ingestion into durable chunk rows, artifact-chunk embeddings, direct lexical and semantic artifact retrieval, compile-path semantic artifact retrieval, and deterministic hybrid lexical-plus-semantic artifact merge inside the compile response.
+- The live schema includes continuity, trace, memory, embedding, entity, governance, `tasks`, `task_steps`, `task_workspaces`, `task_artifacts`, `task_artifact_chunks`, and `task_artifact_chunk_embeddings` tables with row-level security on user-owned data.
 - `apps/web` and `workers` remain starter scaffolds only.
 ## Current Boundaries
 - Task workspaces are implemented only as deterministic rooted local directories plus durable `task_workspaces` records.
 - Task artifacts are implemented only as explicit rooted local-file registrations under those workspaces plus narrow deterministic ingestion for `text/plain` and `text/markdown`.
+- Artifact retrieval operates only over persisted chunk rows and persisted chunk embeddings for one visible task or one visible artifact at a time; compile does not read raw files directly.
+- Compile can now include artifact chunks from lexical retrieval, semantic retrieval, or a deterministic hybrid merge of both into one artifact section with explicit per-chunk source provenance.
 - The shipped multi-step task path is still explicit and narrow: later steps are appended manually with lineage, while approval and execution synchronization use explicit linked `task_step_id` references.
 - The only execution handler in the repo is the in-process no-external-I/O `proxy.echo` path.
 ## Not Implemented
-- Retrieval, ranking, or embeddings over artifact chunks.
 - Rich document parsing beyond the narrow local text ingestion seam.
 - Read-only Gmail or Calendar connectors.
 - Runner-style orchestration or automatic multi-step progression.
+- Artifact reranking or weighted fusion beyond the current lexical-first hybrid compile merge.
 - Auth beyond the current database user-context model.
 ## Active Risks
 - Memory extraction and retrieval quality remain the main product risk.
 - Auth is still incomplete beyond database user context.
-- Workspace provisioning and artifact ingestion are intentionally narrow and local; broader retrieval, embedding, connector, and rich-document flows still need their own accepted seams.
+- Artifact ingestion and retrieval are intentionally narrow and local; richer document parsing, connectors, and any retrieval changes beyond the shipped hybrid compile contract still need their own accepted seams.
-## Latest Local Verification
+## Latest Accepted Verification
-- Latest review artifact: `PASS WITH FIXES`.
-- Post-review local verification on March 14, 2026:
-  - `./.venv/bin/python -m pytest tests/unit` -> `347 passed`
-  - `./.venv/bin/python -m pytest tests/integration` -> `104 passed`
+- Latest accepted runtime verification totals for the shipped Sprint 5J seams were:
+  - `./.venv/bin/python -m pytest tests/unit` -> `380 passed`
+  - `./.venv/bin/python -m pytest tests/integration` -> `118 passed`
+- Sprint 5K is documentation-only truth synchronization; it does not change runtime, schema, or API behavior.
 ## Planning Guardrails
-- Plan from the implemented Sprint 5D repo state, not from older milestone narratives.
-- Do not describe retrieval, embeddings, connectors, runner work, or broader rich-document handling as shipped.
-- Keep live truth files compact; archive historical detail instead of re-expanding the active context set.
+- Plan from the implemented Sprint 5J repo state, not from older milestone narratives.
+- Do not describe richer document parsing, connectors, runner work, UI work, or artifact reranking beyond the current lexical-first hybrid compile merge as shipped.
+- The immediate next move after this truth-sync sprint is a narrow richer-document-parsing sprint that builds on the existing rooted workspace, durable chunk, and shipped hybrid artifact compile baseline.
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 0d939f7..168429a 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -2,16 +2,16 @@
 ## Current Implemented Slice
-AliceBot now implements the accepted repo slice through Sprint 5H. The shipped backend includes:
+AliceBot now implements the accepted repo slice through Sprint 5J.
The shipped backend includes:
 - foundation continuity storage over `users`, `threads`, `sessions`, and append-only `events`
 - deterministic tracing and context compilation over durable continuity, memory, entity, and entity-edge records
 - governed memory admission, explicit-preference extraction, memory review labels, review queue reads, evaluation summary reads, explicit embedding config and memory-embedding storage, direct semantic retrieval, and deterministic hybrid compile-path memory merge
 - deterministic prompt assembly and one no-tools response path that persists assistant replies as immutable continuity events
 - user-scoped consents, policies, policy evaluation, tool registry, allowlist evaluation, tool routing, approval request persistence, approval resolution, approved-only proxy execution through the in-process `proxy.echo` handler, durable execution review, and execution-budget lifecycle plus enforcement
-- durable `tasks`, `task_steps`, `task_workspaces`, `task_artifacts`, `task_artifact_chunks`, and `task_artifact_chunk_embeddings`, deterministic task-step sequencing, explicit task-step transitions, explicit manual continuation with lineage through `parent_step_id`, `source_approval_id`, and `source_execution_id`, explicit `tool_executions.task_step_id` linkage for execution synchronization, deterministic rooted local task-workspace provisioning, explicit rooted local artifact registration, deterministic local text-artifact ingestion into durable chunk rows, deterministic lexical artifact-chunk retrieval over durable chunk rows, optional compile-path artifact chunk inclusion as a separate context section, explicit user-scoped artifact-chunk embedding persistence tied to existing embedding configs, and explicit task-scoped or artifact-scoped semantic artifact-chunk retrieval over those durable embeddings
+- durable `tasks`, `task_steps`, `task_workspaces`, `task_artifacts`, `task_artifact_chunks`, and `task_artifact_chunk_embeddings`, deterministic
task-step sequencing, explicit task-step transitions, explicit manual continuation with lineage through `parent_step_id`, `source_approval_id`, and `source_execution_id`, explicit `tool_executions.task_step_id` linkage for execution synchronization, deterministic rooted local task-workspace provisioning, explicit rooted local artifact registration, deterministic local text-artifact ingestion into durable chunk rows, deterministic lexical artifact-chunk retrieval over durable chunk rows, explicit user-scoped artifact-chunk embedding persistence tied to existing embedding configs, explicit task-scoped or artifact-scoped semantic artifact-chunk retrieval over those durable embeddings, and compile-path artifact retrieval that can include lexical results, semantic results, or one deterministic hybrid lexical-plus-semantic merged artifact section with per-chunk source provenance
-The current multi-step boundary is narrow and explicit. Manual continuation is implemented and review-passed. Approval resolution and proxy execution now both use explicit task-step linkage rather than first-step inference. Task workspaces are now implemented only as deterministic rooted local boundaries, and task artifacts are now implemented only as explicit rooted local-file registrations, narrow deterministic text ingestion under those workspaces, lexical retrieval over persisted chunk rows, optional compile-path inclusion of retrieved artifact chunks in a separate response section, explicit artifact-chunk embedding storage tied to existing embedding configs, and direct semantic retrieval over those durable artifact-chunk embeddings for one visible task or one visible artifact at a time. Broader runner-style orchestration, automatic multi-step progression, compile-path semantic artifact use, hybrid artifact retrieval, richer document parsing, connectors, and new side-effect surfaces are still planned later and must not be described as live behavior.
+The current multi-step boundary is narrow and explicit. Manual continuation is implemented and review-passed. Approval resolution and proxy execution now both use explicit task-step linkage rather than first-step inference. Task workspaces are now implemented only as deterministic rooted local boundaries, and task artifacts are now implemented only as explicit rooted local-file registrations, narrow deterministic text ingestion under those workspaces, lexical retrieval over persisted chunk rows, explicit artifact-chunk embedding storage tied to existing embedding configs, direct semantic retrieval over those durable artifact-chunk embeddings for one visible task or one visible artifact at a time, and compile-path artifact retrieval that deterministically merges lexical and semantic candidates into one artifact section when both are requested for the same scope. Broader runner-style orchestration, automatic multi-step progression, richer document parsing, connectors, reranking beyond the current lexical-first hybrid merge, and new side-effect surfaces are still planned later and must not be described as live behavior.
 ## Implemented Now
@@ -58,11 +58,11 @@ The current multi-step boundary is narrow and explicit. Manual continuation is i
 ### Repo Boundaries In This Slice
-- `apps/api`: implemented API, store, contracts, service logic, and migrations for continuity, tracing, memory, embeddings, entities, policies, tools, approvals, proxy execution, execution budgets, tasks, task steps, task workspaces, task artifacts, artifact-chunk embeddings, deterministic lexical artifact chunk retrieval, deterministic semantic artifact chunk retrieval over durable embeddings, and narrow compile-path artifact chunk inclusion.
+- `apps/api`: implemented API, store, contracts, service logic, and migrations for continuity, tracing, memory, embeddings, entities, policies, tools, approvals, proxy execution, execution budgets, tasks, task steps, task workspaces, task artifacts, artifact-chunk embeddings, deterministic lexical artifact chunk retrieval, deterministic semantic artifact chunk retrieval over durable embeddings, compile-path semantic artifact retrieval, and deterministic hybrid lexical-plus-semantic artifact merge in compile.
 - `apps/web`: minimal shell only; no shipped workflow UI.
 - `workers`: scaffold only; no background jobs or runner logic are implemented.
 - `infra`: local development bootstrap assets only.
-- `tests`: unit and Postgres-backed integration coverage for the shipped seams above, including Sprint 4O task-step lineage/manual continuation, Sprint 4S step-linked execution synchronization, Sprint 5A task-workspace provisioning, Sprint 5C task-artifact registration, Sprint 5D local artifact ingestion plus chunk reads, Sprint 5E lexical artifact-chunk retrieval, Sprint 5F compile-path artifact chunk integration, Sprint 5G artifact-chunk embedding persistence and reads, and Sprint 5H semantic artifact-chunk retrieval.
+- `tests`: unit and Postgres-backed integration coverage for the shipped seams above, including Sprint 4O task-step lineage/manual continuation, Sprint 4S step-linked execution synchronization, Sprint 5A task-workspace provisioning, Sprint 5C task-artifact registration, Sprint 5D local artifact ingestion plus chunk reads, Sprint 5E lexical artifact-chunk retrieval, Sprint 5F compile-path artifact chunk integration, Sprint 5G artifact-chunk embedding persistence and reads, Sprint 5H direct semantic artifact-chunk retrieval, Sprint 5I compile-path semantic artifact retrieval, and Sprint 5J deterministic hybrid lexical-plus-semantic artifact merge in compile.
 ## Core Flows Implemented Now
@@ -71,10 +71,12 @@ The current multi-step boundary is narrow and explicit. Manual continuation is i
 1. Accept a user-scoped `POST /v0/context/compile` request.
 2. Read durable continuity records in deterministic order.
 3. Merge in active memories, entities, and entity edges through the currently shipped symbolic and optional semantic retrieval paths.
-4. Optionally retrieve artifact chunks through the existing lexical artifact-chunk retrieval seam, scoped to exactly one visible task or one visible artifact per request.
-5. Keep retrieved artifact chunks separate from memory and entity sections, with deterministic per-section limits and ordering.
-6. Persist a `context.compile` trace plus explicit inclusion and exclusion events, including artifact chunk include/exclude decisions.
-7. Return one deterministic `context_pack` describing scope, limits, selected context, artifact chunk results, and trace metadata.
+4. Optionally retrieve artifact chunks through lexical retrieval, semantic retrieval, or both, scoped to exactly one visible task or one visible artifact per request.
+5. Reuse only persisted `task_artifact_chunks` rows and persisted artifact-chunk embeddings during compile; compile does not read raw files.
+6. When both artifact retrieval modes are present for the same scope, merge candidates by durable chunk id into one `artifact_chunks` section, preserve lexical match and semantic score provenance, and apply deterministic lexical-first source precedence.
+7. Keep retrieved artifact chunks separate from memory and entity sections, with deterministic per-section limits, ordering, and summary metadata.
+8. Persist a `context.compile` trace plus explicit inclusion and exclusion events, including artifact chunk deduplication, inclusion, and exclusion decisions.
+9. Return one deterministic `context_pack` describing scope, limits, selected context, artifact chunk results, and trace metadata.
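The hybrid merge described in the added steps 6 and 7 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the shipped compiler code: the `MergedChunk` type, the `merge_artifact_candidates` function, and the dict row shape (`id`, `relative_path`, `sequence_no`, `score`) are hypothetical. The sort key assumes lexical-first precedence followed by lexical rank, semantic rank, `relative_path`, `sequence_no`, and `id`.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MergedChunk:
    # Hypothetical merged candidate; field names are illustrative only.
    chunk_id: str
    relative_path: str
    sequence_no: int
    sources: List[str] = field(default_factory=list)
    lexical_rank: Optional[int] = None
    semantic_rank: Optional[int] = None
    semantic_score: Optional[float] = None


def merge_artifact_candidates(
    lexical: List[dict], semantic: List[dict], limit: int
) -> Tuple[List[MergedChunk], List[MergedChunk]]:
    """Deduplicate by chunk id, keep provenance, and split into (included, excluded)."""
    merged: Dict[str, MergedChunk] = {}
    for rank, row in enumerate(lexical):
        merged[row["id"]] = MergedChunk(
            row["id"], row["relative_path"], row["sequence_no"],
            sources=["lexical"], lexical_rank=rank,
        )
    for rank, row in enumerate(semantic):
        chunk = merged.get(row["id"])
        if chunk is None:
            chunk = MergedChunk(row["id"], row["relative_path"], row["sequence_no"])
            merged[row["id"]] = chunk
        # A chunk already seen lexically becomes dual-source here.
        chunk.sources.append("semantic")
        chunk.semantic_rank = rank
        chunk.semantic_score = row["score"]

    def sort_key(chunk: MergedChunk):
        return (
            0 if "lexical" in chunk.sources else 1,  # lexical-first precedence
            chunk.lexical_rank if chunk.lexical_rank is not None else math.inf,
            chunk.semantic_rank if chunk.semantic_rank is not None else math.inf,
            chunk.relative_path,
            chunk.sequence_no,
            chunk.chunk_id,
        )

    ordered = sorted(merged.values(), key=sort_key)
    return ordered[:limit], ordered[limit:]
```

Because every component of the sort key is derived from stored data and the caller's inputs, the same candidates always produce the same included and excluded lists, which is the property the compile contract relies on.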
 ### Artifact Chunk Retrieval
@@ -83,7 +85,10 @@ The current multi-step boundary is narrow and explicit. Manual continuation is i
 3. Support deterministic lexical artifact-chunk retrieval for one visible task or one visible artifact.
 4. Support deterministic semantic artifact-chunk retrieval for one visible task or one visible artifact, using a caller-supplied query vector plus explicit `embedding_config_id`.
 5. Exclude artifacts whose `ingestion_status` is not `ingested`.
-6. Keep compile-path artifact retrieval separate and lexical-only for now; semantic artifact retrieval remains a direct read seam outside compile.
+6. Reuse those same persisted lexical and semantic retrieval seams inside compile for one visible task or one visible artifact.
+7. When compile receives both lexical and semantic artifact retrieval for the same scope, deduplicate by durable chunk id, preserve per-chunk source provenance, and count dual-source inclusions explicitly.
+8. Order hybrid compile candidates deterministically by source precedence, lexical rank, semantic rank, `relative_path`, `sequence_no`, and `id`.
+9. Return stable summary metadata covering scope, query terms, embedding config, query-vector dimensions, candidate counts, deduplication counts, inclusion counts, exclusion counts, and ordering rules.
 ### Governed Memory And Retrieval
@@ -236,8 +241,8 @@ The current multi-step boundary is narrow and explicit. Manual continuation is i
 ## Testing Coverage Implemented Now
-- Unit and integration tests cover continuity, compiler, response generation, memory admission, review labels, review queue, embeddings, semantic retrieval, artifact semantic retrieval, entities, policies, tools, approvals, proxy execution, execution budgets, and execution review.
-- Sprint 4O, Sprint 4S, Sprint 5A, and Sprint 5C added explicit task lifecycle coverage:
+- Unit and integration tests cover continuity, compiler, response generation, memory admission, review labels, review queue, embeddings, semantic retrieval, compile-path hybrid memory retrieval, artifact lexical retrieval, artifact semantic retrieval, compile-path semantic artifact retrieval, hybrid artifact compile merge, entities, policies, tools, approvals, proxy execution, execution budgets, and execution review.
+- Sprints 4O through 5J added explicit task lifecycle and artifact retrieval coverage:
   - migrations for `tasks`, `task_steps`, and task-step lineage
   - staged/backfilled migration coverage for `tool_executions.task_step_id`
   - task and task-step store contracts
@@ -259,6 +264,10 @@ The current multi-step boundary is narrow and explicit. Manual continuation is i
   - idempotent re-ingestion of already ingested artifacts
   - deterministic lexical artifact-chunk retrieval by task and by artifact
   - compile-path artifact chunk inclusion, exclusion, ordering, and per-user isolation
+  - artifact-chunk embedding write and read coverage
+  - direct semantic artifact-chunk retrieval by task and by artifact
+  - compile-path semantic artifact retrieval including trace visibility, exclusion rules, and scope isolation
+  - deterministic hybrid artifact compile merge with dual-source provenance, deduplication, lexical-first precedence, and shared limit enforcement
   - task-artifact and task-artifact-chunk per-user isolation
   - trace visibility for continuation and transition events
   - user isolation for task and task-step reads and mutations
@@ -269,7 +278,7 @@ The current multi-step boundary is narrow and explicit.
Manual continuation is i The following areas remain planned later and must not be described as implemented: - runner-style orchestration and automatic multi-step progression beyond the current explicit manual continuation seam -- hybrid lexical plus semantic artifact retrieval, compile-path semantic artifact use, and reranking beyond the current direct lexical and direct semantic ordering seams +- artifact reranking, weighted fusion, or precedence changes beyond the current lexical-first hybrid compile merge and direct lexical/direct semantic ordering seams - rich document parsing beyond the current narrow UTF-8 text and markdown ingestion boundary - read-only Gmail and Calendar connectors - broader tool proxying and real-world side effects beyond the current no-I/O `proxy.echo` handler diff --git a/BUILD_REPORT.md b/BUILD_REPORT.md index b8ce9ba..f85f207 100644 --- a/BUILD_REPORT.md +++ b/BUILD_REPORT.md @@ -2,248 +2,76 @@ ## sprint objective -Implement Sprint 5J: merge the existing lexical and semantic artifact chunk retrieval paths inside `POST /v0/context/compile` into one deterministic hybrid artifact section with explicit provenance, deduplication, limits, and trace visibility. +Synchronize the live truth artifacts with the accepted implemented repo state through Sprint 5J so architecture, roadmap, and current-state planning all start from the shipped compile and artifact-retrieval baseline. ## completed work -- Replaced the compile-time split artifact output with one merged `context_pack.artifact_chunks` section. 
-- Added hybrid artifact contracts for: - - per-chunk source provenance - - merged artifact summary metadata - - hybrid artifact decision trace payloads -- Implemented deterministic hybrid artifact merge logic that: - - reuses the existing lexical and semantic retrieval seams - - deduplicates by durable chunk id - - preserves dual-source provenance - - applies lexical-first precedence under the shared compile output limit - - excludes non-ingested artifacts - - emits explicit dedup/include/exclude trace events -- Updated compile summary trace payloads to report hybrid artifact request state, candidate counts, dedup counts, dual-source counts, and limit exclusions. -- Updated prompt assembly to serialize only the merged compile-time artifact section. -- Added unit and integration coverage for: - - lexical-only artifact compile behavior - - semantic-only artifact compile behavior - - hybrid dual-source provenance - - deterministic merge ordering - - limit enforcement across merged candidates - - non-ingested exclusion - - per-user isolation - - response-shape stability - -## exact hybrid artifact merge contract changes introduced - -- Added `ArtifactSelectionSource = Literal["lexical", "semantic"]`. -- Updated `ContextPackArtifactChunk` to carry: - - `source_provenance.sources` - - `source_provenance.lexical_match` - - `source_provenance.semantic_score` -- Expanded `ContextPackArtifactChunkSummary` to include: - - `lexical_requested` - - `semantic_requested` - - `embedding_config_id` - - `query_vector_dimensions` - - `lexical_limit` - - `semantic_limit` - - `lexical_candidate_count` - - `semantic_candidate_count` - - `merged_candidate_count` - - `deduplicated_count` - - `included_lexical_only_count` - - `included_semantic_only_count` - - `included_dual_source_count` - - `source_precedence` - - `lexical_order` - - `semantic_order` - - `merged_order` -- Added `HybridArtifactRetrievalDecisionTracePayload`. 
-- Removed compile-time `semantic_artifact_chunks` and `semantic_artifact_chunk_summary` from `CompiledContextPack`. - -## merge precedence and deduplication rule used - -- Dedup key: durable artifact chunk id. -- Source precedence: `lexical` before `semantic`. -- Merged ordering: - - source precedence - - lexical rank - - semantic rank - - `relative_path` - - `sequence_no` - - `id` -- Final compile limit: - - use `artifact_retrieval.limit` when lexical retrieval is present - - otherwise use `semantic_artifact_retrieval.limit` -- When the same chunk appears in both candidate sets: - - keep one item - - set `source_provenance.sources` to `["lexical", "semantic"]` - - preserve the lexical match payload - - preserve the semantic score - - emit `hybrid_artifact_chunk_deduplicated` +- Updated `ARCHITECTURE.md` to state that the accepted repo slice is current through Sprint 5J instead of Sprint 5H. +- Corrected `ARCHITECTURE.md` so compile-path semantic artifact retrieval and deterministic hybrid lexical-plus-semantic artifact merge are documented as implemented behavior, not deferred work. +- Updated `ARCHITECTURE.md` retrieval and testing sections to reflect the current artifact chunk contracts, hybrid compile ordering, provenance, and test coverage through Sprints 5I and 5J. +- Updated `ROADMAP.md` to state that the accepted repo state is current through Sprint 5J instead of Sprint 5A. +- Reframed `ROADMAP.md` so the next narrow sprint is richer document parsing on top of the shipped rooted workspace, durable chunk, and hybrid artifact compile baseline. +- Updated `.ai/handoff/CURRENT_STATE.md` to state that the working repo is current through Sprint 5J instead of Sprint 5D. +- Corrected `.ai/handoff/CURRENT_STATE.md` so artifact retrieval, artifact embeddings, compile-path semantic artifact retrieval, and the hybrid compile merge are described as shipped. +- Kept the sprint documentation-only: no runtime, schema, API, connector, runner, or UI files were changed. 
+ +## truth-sync evidence used + +- Repo implementation evidence: + - `apps/api/src/alicebot_api/compiler.py` + - `apps/api/src/alicebot_api/contracts.py` + - `apps/api/src/alicebot_api/main.py` +- Accepted repo-test evidence: + - `tests/integration/test_context_compile.py` + - `tests/unit/test_compiler.py` + - `tests/unit/test_main.py` + - `tests/unit/test_response_generation.py` +- Accepted sprint evidence already present in the repo: + - `docs/archive/sprints/2026-03-13-sprint-5a-build-report.md` + - `docs/archive/sprints/2026-03-13-sprint-5a-review-report.md` +- Sprint packet requirements: + - `.ai/active/SPRINT_PACKET.md` + +## specific stale statements corrected + +- `ARCHITECTURE.md` no longer says the accepted repo slice is current only through Sprint 5H. +- `ARCHITECTURE.md` no longer says compile-path semantic artifact use and hybrid artifact retrieval are planned later. +- `ROADMAP.md` no longer says the accepted repo state is current only through Sprint 5A. +- `.ai/handoff/CURRENT_STATE.md` no longer says the working repo state is current only through Sprint 5D. +- `.ai/handoff/CURRENT_STATE.md` no longer says retrieval, ranking, or embeddings over artifact chunks are not implemented. ## incomplete work -- None within Sprint 5J scope. +- None within Sprint 5K scope. 
## files changed -- `apps/api/src/alicebot_api/compiler.py` -- `apps/api/src/alicebot_api/contracts.py` -- `apps/api/src/alicebot_api/response_generation.py` -- `tests/integration/test_context_compile.py` -- `tests/unit/test_compiler.py` -- `tests/unit/test_main.py` -- `tests/unit/test_response_generation.py` +- `ARCHITECTURE.md` +- `ROADMAP.md` +- `.ai/handoff/CURRENT_STATE.md` - `BUILD_REPORT.md` ## tests run -- `./.venv/bin/python -m pytest tests/unit/test_compiler.py tests/unit/test_main.py tests/unit/test_response_generation.py` - - `50 passed in 0.50s` -- `./.venv/bin/python -m pytest tests/unit` - - `380 passed in 0.56s` -- `./.venv/bin/python -m pytest tests/integration/test_context_compile.py` - - initial sandboxed run failed because local Postgres on `localhost:5432` was blocked -- `./.venv/bin/python -m pytest tests/integration` - - `118 passed in 35.59s` - -## unit and integration test results - -- Unit suite: pass -- Integration suite: pass - -## example compile request and merged response - -```json -{ - "request": { - "user_id": "11111111-1111-1111-8111-111111111111", - "thread_id": "22222222-2222-2222-8222-222222222222", - "artifact_retrieval": { - "kind": "task", - "task_id": "33333333-3333-3333-8333-333333333333", - "query": "Alpha beta", - "limit": 2 - }, - "semantic_artifact_retrieval": { - "kind": "task", - "task_id": "33333333-3333-3333-8333-333333333333", - "embedding_config_id": "44444444-4444-4444-8444-444444444444", - "query_vector": [1.0, 0.0, 0.0], - "limit": 2 - } - }, - "response": { - "context_pack": { - "artifact_chunks": [ - { - "id": "55555555-5555-5555-8555-555555555555", - "task_id": "33333333-3333-3333-8333-333333333333", - "task_artifact_id": "66666666-6666-6666-8666-666666666666", - "relative_path": "docs/a.txt", - "media_type": "text/plain", - "sequence_no": 1, - "char_start": 0, - "char_end_exclusive": 14, - "text": "beta alpha doc", - "source_provenance": { - "sources": ["lexical", "semantic"], - "lexical_match": { - 
"matched_query_terms": ["alpha", "beta"], - "matched_query_term_count": 2, - "first_match_char_start": 0 - }, - "semantic_score": 1.0 - } - } - ], - "artifact_chunk_summary": { - "requested": true, - "lexical_requested": true, - "semantic_requested": true, - "scope": { - "kind": "task", - "task_id": "33333333-3333-3333-8333-333333333333" - }, - "query": "Alpha beta", - "query_terms": ["alpha", "beta"], - "embedding_config_id": "44444444-4444-4444-8444-444444444444", - "query_vector_dimensions": 3, - "limit": 2, - "lexical_limit": 2, - "semantic_limit": 2, - "searched_artifact_count": 3, - "lexical_candidate_count": 3, - "semantic_candidate_count": 3, - "merged_candidate_count": 3, - "deduplicated_count": 3, - "included_count": 2, - "included_lexical_only_count": 0, - "included_semantic_only_count": 0, - "included_dual_source_count": 2, - "excluded_uningested_artifact_count": 1, - "excluded_limit_count": 1, - "matching_rule": "casefolded_unicode_word_overlap_unique_query_terms_v1", - "similarity_metric": "cosine_similarity" - } - } - } -} -``` - -## example hybrid artifact trace events inside one compile run - -```json -[ - { - "kind": "context.included", - "payload": { - "entity_type": "artifact_chunk", - "entity_id": "55555555-5555-5555-8555-555555555555", - "reason": "hybrid_artifact_chunk_deduplicated", - "position": 1, - "scope_kind": "task", - "task_id": "33333333-3333-3333-8333-333333333333", - "task_artifact_id": "66666666-6666-6666-8666-666666666666", - "relative_path": "docs/a.txt", - "ingestion_status": "ingested", - "selected_sources": ["lexical", "semantic"], - "matched_query_terms": ["alpha", "beta"], - "score": 1.0 - } - }, - { - "kind": "context.excluded", - "payload": { - "entity_type": "artifact_chunk", - "entity_id": "77777777-7777-7777-8777-777777777777", - "reason": "hybrid_artifact_chunk_limit_exceeded", - "position": 3, - "scope_kind": "task", - "task_id": "33333333-3333-3333-8333-333333333333", - "task_artifact_id": 
"88888888-8888-8888-8888-888888888888", - "relative_path": "notes/c.txt", - "ingestion_status": "ingested", - "selected_sources": ["lexical", "semantic"], - "score": 0.0 - } - } -] -``` +- Documentation-scope verification: + - `rg -n "current through Sprint 5A|current through Sprint 5D|current through Sprint 5H|compile-path semantic artifact use|hybrid artifact retrieval are planned later|Retrieval, ranking, or embeddings over artifact chunks" ARCHITECTURE.md ROADMAP.md .ai/handoff/CURRENT_STATE.md` -> no matches + - `git diff --name-only` + - `git diff --stat -- ARCHITECTURE.md ROADMAP.md .ai/handoff/CURRENT_STATE.md BUILD_REPORT.md` +- Runtime tests were not run because Sprint 5K is documentation-only and made no schema, API, or runtime code changes. ## blockers/issues -- No product blocker. -- Integration tests required elevated localhost access because the sandbox blocked Postgres connections to `localhost:5432`. +- No implementation blockers. +- The repo does not currently contain archived Sprint 5I or Sprint 5J reports under `docs/archive/sprints`, so the durable in-repo evidence for those seams is the shipped code and test suite rather than archived sprint-report files. -## what remains intentionally deferred to later milestones +## intentionally deferred after truth synchronization -- Reranking across lexical and semantic artifact candidates -- Weighted or learned fusion -- Model-generated query embeddings -- Connector-backed artifact retrieval -- Richer document parsing -- Runner orchestration -- UI changes +- Rich document parsing beyond the current `text/plain` and `text/markdown` ingestion seam. +- Read-only Gmail and Calendar connectors. +- Runner-style orchestration and automatic multi-step progression. +- Artifact reranking or weighted fusion beyond the current lexical-first hybrid compile merge. +- UI work. 
## recommended next step -Run review against the new merged compile artifact contract, with attention on downstream consumers that may still expect the removed compile-time `semantic_artifact_chunks` section. +Open one narrow sprint for richer document parsing that preserves the existing rooted workspace, durable chunk, and shipped hybrid artifact compile contracts. diff --git a/REVIEW_REPORT.md b/REVIEW_REPORT.md index d487d25..f2086f3 100644 --- a/REVIEW_REPORT.md +++ b/REVIEW_REPORT.md @@ -1,36 +1,32 @@ verdict: PASS criteria met -- `POST /v0/context/compile` now returns one merged `context_pack.artifact_chunks` section and no longer emits a separate compile-time semantic artifact section. -- The merged artifact section deduplicates by durable chunk id and preserves dual-source provenance through `source_provenance.sources`, `lexical_match`, and `semantic_score`. -- Merge behavior is explicit and deterministic: lexical-first precedence, stable tie-breakers, shared final limit handling, and deterministic trace ordering are implemented in [compiler.py](/Users/samirusani/Desktop/Codex/AliceBot/apps/api/src/alicebot_api/compiler.py). -- Non-ingested artifacts remain excluded from compile output and now emit hybrid exclusion trace events. -- Hybrid merge, deduplication, include, and exclusion decisions are persisted in `trace_events`, including compile summary counts. -- Memory, entity, and non-artifact compile sections remain unchanged apart from expected trace/contract adjacency. -- Unit coverage for compiler/main/response generation passed: `./.venv/bin/python -m pytest tests/unit/test_compiler.py tests/unit/test_main.py tests/unit/test_response_generation.py` -> `50 passed`. -- Relevant Postgres-backed integration coverage passed: `./.venv/bin/python -m pytest tests/integration/test_context_compile.py` -> `12 passed`. 
-- The changed file set stayed within sprint scope: compiler/contracts/prompt serialization/tests/build report only; no reranking, connector, runner, or UI work was introduced. -- `BUILD_REPORT.md` was updated with contract changes, merge rules, commands run, example request/response, trace examples, and deferred scope. +- The sprint stayed documentation-only. `git diff --name-only` shows changes only to `.ai/handoff/CURRENT_STATE.md`, `ARCHITECTURE.md`, `BUILD_REPORT.md`, `ROADMAP.md`, and this review report. +- `ARCHITECTURE.md` describes compile-path semantic artifact retrieval and deterministic hybrid lexical-plus-semantic artifact merge as implemented behavior, not deferred work, and that matches the shipped compile contract and tests. +- `ROADMAP.md` no longer claims the repo is current only through Sprint 5A and now frames the next delivery focus from the actual shipped Sprint 5J artifact-retrieval baseline. +- `.ai/handoff/CURRENT_STATE.md` no longer claims the repo is current only through Sprint 5D and now reflects the shipped Sprint 5J seams accurately. +- The truth artifacts distinguish implemented behavior from deferred work. Rich document parsing, connectors, runner orchestration, UI work, and artifact reranking beyond the shipped lexical-first merge remain clearly deferred. +- No runtime, schema, API, connector, runner, or UI changes appear in the sprint diff. +- `BUILD_REPORT.md` now uses durable in-repo evidence correctly and no longer relies on the overwritten active Sprint 5J build report as a cited source. criteria missed - None. quality issues -- No blocking implementation issues found in the changed code. +- No blocking quality issues found in the final documentation set. regression risks -- The compile response contract removes `context_pack.semantic_artifact_chunks` and `context_pack.semantic_artifact_chunk_summary`. In-repo consumers were updated, but any external consumer not covered by this repository will need the new merged contract. 
-- There is no explicit negative test for the new mixed-input validation path where `artifact_retrieval` and `semantic_artifact_retrieval` target different scopes. The code does reject it with `400`, but that path is not directly exercised. +- No runtime or schema regression risk identified because the sprint remains documentation-only. +- Residual process risk remains that future truth-sync sprints could repeat the same provenance mistake if active artifacts cite other active files that are being replaced in the same sprint. docs issues -- None blocking. `BUILD_REPORT.md` satisfies the sprint packet requirements. +- None blocking. should anything be added to RULES.md? -- No. +- Yes. Add a short rule that active truth artifacts should cite durable in-repo evidence only, not active files being overwritten in the same sprint. should anything update ARCHITECTURE.md? -- No immediate update required. The contract change is localized and already captured in the sprint/build artifacts. +- No. recommended next action -- Accept Sprint 5J as complete and merge after normal approval flow. -- Optional follow-up: add one endpoint/integration test covering mismatched lexical/semantic artifact scopes returning `400`. +- Accept Sprint 5K as complete and merge after normal approval flow. diff --git a/ROADMAP.md b/ROADMAP.md index a963227..a3d9e21 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -2,38 +2,40 @@ ## Current Position -- The accepted repo state is current through Sprint 5A. -- The backend foundation through governance, execution review, task/task-step lifecycle, explicit manual continuation, step-linked approval/execution synchronization, and deterministic rooted task-workspace provisioning is already shipped. -- This roadmap is future-facing from that position; milestone history lives in archived sprint reports, not here. +- The accepted repo state is current through Sprint 5J. 
+- Milestone 5 now ships the rooted local workspace and artifact baseline end to end: workspace provisioning, artifact registration, narrow text ingestion, durable chunk storage, lexical artifact retrieval, compile-path artifact inclusion, artifact-chunk embeddings, direct semantic artifact retrieval, compile-path semantic artifact retrieval, and deterministic hybrid lexical-plus-semantic artifact merge in compile. +- This roadmap is future-facing from that shipped baseline; historical sprint-by-sprint detail lives in accepted build and review artifacts, not here. ## Next Delivery Focus -### Finish Milestone 5 On Top Of The Shipped Workspace Boundary +### Open Richer Document Parsing On Top Of The Shipped Artifact Retrieval Baseline -- Add artifact records and artifact-handling rules that reuse `task_workspaces` instead of inventing a parallel storage seam. -- Add document ingestion and retrieval only after the artifact/workspace boundary is explicit and reviewable. -- Add read-only Gmail and Calendar connectors only after document and workspace boundaries remain deterministic under the current governance model. +- Extend ingestion beyond the current `text/plain` and `text/markdown` seam without changing the rooted `task_workspaces` and durable `task_artifact_chunks` contracts. +- Keep retrieval building on persisted chunk rows and persisted embeddings; new parsing work should feed the existing compile-path lexical/semantic/hybrid artifact retrieval seam rather than inventing a parallel context path. +- Keep the next sprint narrow: richer document parsing first, then reassess connectors only after the parsing seam is accepted. -### Preserve Current Governance And Task Guarantees +### Preserve Current Compile, Governance, And Task Guarantees -- Keep approvals, execution budgets, task/task-step state, and trace visibility deterministic as new Milestone 5 work lands. 
-- Do not widen the current no-external-I/O proxy surface or introduce new consequential side effects without an explicit sprint opening that scope. +- Keep approvals, execution budgets, task/task-step state, and trace visibility deterministic as Milestone 5 continues. +- Preserve the shipped compile contract of one merged artifact section with explicit source provenance, deterministic lexical-first precedence, and trace-visible inclusion and exclusion decisions. +- Do not widen the current no-external-I/O proxy surface or introduce runner, connector, or UI scope until those areas are explicitly opened. -## After Milestone 5 +## After The Next Narrow Sprint -- Revisit broader task orchestration only after the current explicit task-step seams remain stable under workspace, artifact, and document flows. -- Expand tool execution breadth only after governance, review, and budget controls still hold under the wider task surface. -- Address production-facing auth and deployment hardening as the product approaches broader real-world use. +- Open read-only connector work only after richer document parsing remains deterministic under the current artifact and governance seams. +- Revisit workflow UI only after backend document and connector seams are accepted and the truth artifacts stay current. +- Revisit broader task orchestration only after the current explicit task-step seams remain stable under workspace, artifact, document, and connector flows. +- Continue to defer broader tool execution breadth and production auth/deployment hardening until the current governed surface remains stable. ## Dependencies - Live truth docs must stay synchronized with accepted repo state so sprint planning does not start from stale assumptions. -- Artifact and document work should build on the existing rooted local workspace contract. -- Connector work should remain read-only and approval-aware. 
+- Rich document parsing should build on the shipped rooted local workspace, durable artifact chunk, and hybrid compile retrieval contracts. +- Connector work should remain read-only, approval-aware, and downstream of the document parsing seam. - Runner-style orchestration should stay deferred until the repo no longer depends on narrow current-step assumptions for safety and explainability. ## Ongoing Risks - Memory extraction and retrieval quality remain the largest product risk. - Auth beyond database user context is still missing. -- Milestone 5 can drift if artifact, document, connector, and orchestration work are mixed into one sprint instead of landing as narrow seams. +- Milestone 5 can drift if richer document parsing, connectors, UI, and orchestration work are mixed into one sprint instead of landing as narrow seams on top of the shipped artifact retrieval baseline.
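
Reviewer note: the hybrid compile merge rule that these truth artifacts now describe as shipped (deduplicate by durable chunk id, lexical-first source precedence, then lexical rank, semantic rank, `relative_path`, `sequence_no`, and `id`, under one shared limit) can be sketched roughly as below. This is a minimal illustration only, not the code in `compiler.py`: `Chunk`, `MergedChunk`, and `merge_hybrid` are hypothetical names, each input list is assumed to be already ranked and free of internal duplicates, and counting one deduplication per dual-source chunk is an assumption about how `deduplicated_count` is derived.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    # Hypothetical stand-in for a durable artifact chunk row.
    id: str
    relative_path: str
    sequence_no: int


@dataclass
class MergedChunk:
    # Hypothetical merged candidate carrying per-chunk source provenance.
    chunk: Chunk
    sources: list = field(default_factory=list)
    lexical_rank: Optional[int] = None
    semantic_rank: Optional[int] = None


def merge_hybrid(lexical, semantic, limit):
    """Merge two ranked candidate lists into one deterministic ordering.

    Returns (included, excluded_by_limit, deduplicated_count).
    """
    merged = {}
    deduplicated = 0

    # Lexical candidates are registered first, keyed by durable chunk id.
    for rank, chunk in enumerate(lexical):
        merged[chunk.id] = MergedChunk(chunk, ["lexical"], lexical_rank=rank)

    # Semantic candidates either add a new entry or extend the provenance
    # of an entry that lexical retrieval already produced.
    for rank, chunk in enumerate(semantic):
        entry = merged.get(chunk.id)
        if entry is None:
            merged[chunk.id] = MergedChunk(chunk, ["semantic"], semantic_rank=rank)
        else:
            entry.sources.append("semantic")
            entry.semantic_rank = rank
            deduplicated += 1

    def sort_key(entry):
        # A missing rank sorts after every real rank.
        return (
            0 if "lexical" in entry.sources else 1,
            entry.lexical_rank if entry.lexical_rank is not None else math.inf,
            entry.semantic_rank if entry.semantic_rank is not None else math.inf,
            entry.chunk.relative_path,
            entry.chunk.sequence_no,
            entry.chunk.id,
        )

    ordered = sorted(merged.values(), key=sort_key)
    return ordered[:limit], ordered[limit:], deduplicated
```

Because every tie-breaker ends in the chunk `id`, the ordering is total, so the same inputs should always yield the same included and limit-excluded sets. That is the property the roadmap asks later parsing work to preserve.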