Skip to content

Add design specification, license, and project setup#2

Merged
Aureliolo merged 4 commits into
mainfrom
feat/design-spec
Feb 27, 2026
Merged

Add design specification, license, and project setup#2
Aureliolo merged 4 commits into
mainfrom
feat/design-spec

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • Design Specification (DESIGN_SPEC.md) — comprehensive 18-section high-level design for the AI Company framework covering agent system, company structure, communication architecture, task engine, memory, HR, model providers, budget management, tools, security, human interaction, templates, technical architecture, research/prior art, open questions, and backlog
  • BSL 1.1 License — Business Source License with non-commercial use grant, converting to Apache 2.0 on February 27, 2030
  • README — updated with project concept, planned features, tech stack, and documentation links
  • Claude Code skillaurelio-review-pr PR review pipeline for future use

All technology choices in the design spec are marked as candidates/TBD pending further research. No final decisions on frameworks, libraries, or specific models.

Test plan

  • Review DESIGN_SPEC.md for completeness and internal consistency
  • Verify LICENSE parameters (change date, use grant, change license)
  • Confirm README accurately reflects current project status
  • Validate all 18 sections are properly cross-referenced in table of contents

🤖 Generated with Claude Code

Comprehensive design spec covering:
- Agent identity system with roles, personalities, skills, seniority
- Company structure templates (solo founder to full enterprise)
- Communication architecture (event-driven bus, hierarchical, meetings, hybrid)
- Task/workflow engine with multiple execution patterns
- Memory system (working, episodic, semantic, procedural, social)
- Multi-provider LLM layer via LiteLLM (Claude, OpenRouter, Ollama)
- Budget management with CFO agent and cost optimization
- Tool/capability system with progressive trust
- Security operations agent and configurable autonomy levels
- API-first architecture (FastAPI + WebSocket) with web UI
- Template system and company builder
- Research analysis of MetaGPT, ChatDev, CrewAI, AutoGen, LangGraph
- Build-from-scratch recommendation leveraging LiteLLM, Mem0, MCP, A2A
- Open questions, risks, backlog, and scaling path
Business Source License 1.1 with parameters:
- Non-commercial use only (personal, education, research, evaluation)
- Commercial/production use requires separate license
- Converts to Apache 2.0 after 4 years per version
- Fix LICENSE Change Date to explicit "February 27, 2030"
- Extract HR into its own section (Section 8), renumber all sections
- Reconcile budget hierarchy to include QA department
- Correct A2A/MCP governance attribution to Linux Foundation
- Keep SecOps agent active in locked autonomy mode for audit logging
- Consolidate terminology on "Agent" (remove "Employee" as separate concept)
- Add communication loop prevention mechanism (Section 5.5)
- Mark model pricing and provider configs as illustrative examples
- Soften specific technology commitments to "candidates - TBD after research"
- Add language spec to workflow diagram code blocks
- Add aurelio-review-pr skill for Claude Code
Copilot AI review requested due to automatic review settings February 27, 2026 09:48
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 27, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eeaec53 and dfa99f3.

📒 Files selected for processing (2)
  • .claude/skills/aurelio-review-pr/skill.md
  • DESIGN_SPEC.md

📝 Walkthrough

Summary by CodeRabbit

  • Documentation

    • Added a comprehensive design spec describing architecture, agent framework, communication, persistence, governance, and implementation guidance.
    • Added a detailed PR review workflow document outlining a multi-phase review, feedback consolidation, and approval/implementation steps.
    • Expanded README with project concept, planned features, status, tech stack, and docs references.
  • Chores

    • Added Business Source License 1.1; production/commercial use requires a separate commercial license.

Walkthrough

Adds foundational documentation and a PR review workflow: a 10-phase PR review pipeline spec, a comprehensive design specification for multi-agent orchestration, a Business Source License 1.1, and an expanded README.

Changes

Cohort / File(s) Summary
Workflow Documentation
​.claude/skills/aurelio-review-pr/skill.md
New markdown describing a 10-phase PR review pipeline: identify PR/branch, fetch issue context, run local review agents in parallel, fetch external reviewer feedback, consolidate/triage with deduplication/conflict detection, AskUserQuestion approval step, implement approved fixes grouped by file, commit/push safeguards, post-push verification, and final summary. Specifies control flow rules, retry semantics, error signaling, and user prompts.
Design & Architecture
DESIGN_SPEC.md
New comprehensive design spec for a config-driven autonomous-agent "AI Company" framework: architecture, agent system, company structure, communication patterns, workflow engine, memory/persistence, HR/security/approval, model providers, cost/budgeting, templates, technical stack candidates, research notes, risks, and backlog.
Project Foundation
LICENSE, README.md
Adds Business Source License 1.1 (Change Date Feb 27, 2030 → Apache 2.0) and expands README with formal project title, concept, planned features, status, tech stack, and documentation references.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant User
participant LocalAgents as Local Review Agents
participant GH as GitHub API
participant Ext as External Reviewers
participant Repo
User->>GH: Identify PR / branch
GH-->>User: Return PR & issue context
User->>LocalAgents: Run local review agents (parallel)
LocalAgents-->>User: Local findings
User->>GH: Fetch external reviewer feedback (retry semantics)
GH-->>Ext: Request external comments
Ext-->>GH: External feedback (may include outside-diff comments)
GH-->>User: External feedback delivered
User->>User: Consolidate & triage feedback (dedupe, detect conflicts)
User->>User: AskUserQuestion (approve all / selective)
User->>LocalAgents: Implement approved fixes (grouped by file)
LocalAgents->>Repo: Create commits (lint/tests considerations)
Repo-->>User: Push/commit result (safeguards)
User->>GH: Verify external reviewer status post-push
GH-->>User: Final summary & report

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: adding a design specification, license, and project setup files to establish the project foundation.
Description check ✅ Passed The description is well-structured and directly related to the changeset, covering all major file additions (DESIGN_SPEC.md, LICENSE, README updates) and the PR review skill with clear context.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/design-spec

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the foundational elements for the "AI Company" project by introducing its core design principles, licensing, and initial development tooling. It provides a detailed high-level design specification outlining the architecture and components of the AI agent orchestration framework, defines the project's open-source license, updates the main project README with key information, and adds a sophisticated PR review automation skill.

Highlights

  • Design Specification: A comprehensive 18-section high-level design for the AI Company framework has been added, covering agent system, company structure, communication, task engine, memory, HR, model providers, budget, tools, security, human interaction, templates, technical architecture, research, open questions, and backlog.
  • Licensing: The project now includes a Business Source License (BSL 1.1) with a non-commercial use grant, which will convert to Apache 2.0 on February 27, 2030.
  • Project Setup & Documentation: The README has been updated to include the project concept, planned features, tech stack, and links to documentation, reflecting the current project status.
  • Claude Code Skill: A new Claude Code skill, 'aurelio-review-pr', has been introduced to define a full PR review pipeline for future use, integrating local agents, external feedback, triage, and fix implementation.
Changelog
  • .claude/skills/aurelio-review-pr/skill.md
    • Added a new Claude skill for a comprehensive PR review pipeline.
    • Defined 10 phases for the PR review process, from finding the PR to implementing fixes and verifying external reviewer status.
    • Specified tools allowed for the skill, including Bash, Read, Edit, Write, Grep, Glob, Task, and AskUserQuestion.
    • Included detailed logic for issue linkage, context fetching, parallel local agent execution, external feedback collection, and triage.
    • Outlined rules for fix implementation, commit procedures, and loop prevention.
  • DESIGN_SPEC.md
    • Added a detailed 18-section high-level design specification for the AI Company framework.
    • Included sections on vision, core concepts, agent system, company structure, communication, task engine, memory, HR, model providers, cost management, tools, security, human interaction, templates, technical architecture, research, open questions, and future vision.
    • Provided detailed YAML/JSON examples for agent identity, department configuration, communication settings, task definition, memory levels, budget controls, tool access, autonomy levels, and company templates.
    • Compared the proposed architecture with existing AI agent frameworks and justified a build-from-scratch approach leveraging key libraries.
  • LICENSE
    • Added the Business Source License 1.1.
    • Specified "Aurelio" as the Licensor and "AI Company" as the Licensed Work.
    • Included an Additional Use Grant for non-production use (personal, educational, research, evaluation).
    • Set the Change Date to February 27, 2030, and the Change License to Apache License, Version 2.0.
  • README.md
    • Updated the project title to "AI Company".
    • Added a concept section explaining the framework for orchestrating autonomous AI agents.
    • Listed key planned features such as configurable company structure, deep agent identity, multi-provider support, smart cost management, configurable autonomy, persistent memory, HR system, real tool access, API-first design, and templates.
    • Updated the status to "Design phase" and linked to DESIGN_SPEC.md.
    • Included a planned tech stack section with Python, FastAPI, LiteLLM, Mem0, MCP, Vue 3, and SQLite/PostgreSQL.
    • Added a documentation section linking to the Design Specification.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces foundational documents for the project, including a detailed design specification for an AI Company framework, a BSL 1.1 license, and an updated project README. While I've noted minor inconsistencies within the design specification and the aurelio-review-pr skill definition, a critical concern is that the aurelio-review-pr skill contains instructions that could lead to security vulnerabilities. Specifically, it lacks explicit validation for untrusted input extracted from PR bodies, potentially leading to command injection, and does not include safeguards against prompt injection when passing GitHub issue context to review agents. Addressing these inconsistencies and, more importantly, refining the skill definition to mitigate these security risks is crucial for clarity and project security.

Comment on lines +63 to +66
**Fetch issue context.** If an issue reference was found (regardless of warnings), fetch the issue for review context. If the PR body used a full URL (`https://github.com/OWNER/REPO/issues/N`), extract both `OWNER/REPO` and `N` and pass `--repo OWNER/REPO` to query the correct repository:

```bash
gh issue view N --repo OWNER/REPO --json title,body,labels,comments --jq '{title: .title, body: .body, labels: [.labels[].name], comments: [.comments[] | {author: .author.login, body: .body}]}'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The skill instructions direct the agent to extract OWNER/REPO and N from a URL provided in the PR body and use them in a shell command (gh issue view N --repo OWNER/REPO). Since the PR body is untrusted input, an attacker could provide a malicious URL (e.g., https://github.com/owner/repo;injection/issues/1) to perform command injection. The instructions should explicitly require the agent to validate that OWNER/REPO matches the expected repository format and that N is a numeric value before using them in any shell command.

| **comment-analyzer** | Comments or docstrings changed | `pr-review-toolkit:comment-analyzer` |
| **type-design-analyzer** | Type annotations or classes added/modified | `pr-review-toolkit:type-design-analyzer` |

Each agent should receive the list of changed files and focus on reviewing them. **If issue context was collected in Phase 2, include the issue title, body, and key comments in each agent's prompt** so they can verify the PR addresses the issue's requirements.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The skill instructions specify that untrusted data from GitHub (issue title, body, and comments) should be included in the prompts for review agents. This data is not sanitized or wrapped in delimiters, making the agents vulnerable to prompt injection attacks. An attacker could use malicious issue comments to manipulate the behavior of the review agents. The instructions should be updated to require the use of clear delimiters (e.g., XML tags) and to explicitly instruct the sub-agents to treat this content as untrusted data.

```bash
gh api repos/OWNER/REPO/pulls/NUMBER/comments --paginate
```
Extract: author, file path, line number, body, subject_type.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In the list of fields to extract for inline review comments, subject_type is mentioned. However, according to the GitHub API documentation, this field is not present in the response for listing PR review comments (/repos/OWNER/REPO/pulls/NUMBER/comments). This field is associated with commit comments. To ensure the implementation is based on the correct API schema, I suggest removing subject_type from this list.

Comment thread DESIGN_SPEC.md Outdated
```yaml
memory:
level: "full" # none, session, project, full
backend: "sqlite" # sqlite, postgresql, mem0, file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There appears to be an inconsistency in how Mem0 is defined as part of the memory architecture.

  • Here in Section 7.3, mem0 is listed as a value for the backend configuration, suggesting it's a standalone storage option.
  • However, Section 15.2 describes the stack as Mem0 + SQLite and states Mem0 is for semantic/episodic memory while SQLite is for structured data. This implies Mem0 is a layer that works with a database, not as an alternative to it.

To avoid confusion during implementation, it would be helpful to clarify the relationship. Is Mem0 a memory system that can be configured with different storage backends (like SQLite), or is it a backend itself? The configuration schema should reflect this relationship clearly.

Comment thread DESIGN_SPEC.md Outdated
Comment on lines +770 to +778
- role_level: "c_suite"
preferred_model: "opus"
fallback: "sonnet"
- role_level: "senior"
preferred_model: "sonnet"
fallback: "haiku"
- role_level: "junior"
preferred_model: "haiku"
fallback: "local-small"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's an inconsistency in the casing of agent seniority levels across the design document.

  • Section 3.1 (level) and 3.2 (Level) use capitalized values (e.g., Senior, C-Suite).
  • The routing rules here in Section 9.4 use lowercase values (e.g., c_suite, senior, junior).

This could lead to bugs where routing rules don't match agent levels. It would be beneficial to standardize on a single casing convention (e.g., all lowercase) for these values throughout the specification.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR establishes the foundational documentation and legal framework for the AI Company project, a framework for orchestrating autonomous AI agents within a virtual company structure. The PR transitions the repository from a minimal placeholder to a fully specified design-phase project with clear licensing, technical vision, and development tooling.

Changes:

  • Added comprehensive 18-section design specification covering agent system, company structure, communication, tasks, memory, HR, providers, budget, tools, security, human interaction, templates, and technical architecture
  • Implemented Business Source License 1.1 with 4-year conversion to Apache 2.0
  • Updated README with project concept, key features, tech stack, and documentation links
  • Added Claude Code PR review skill for automated review workflow

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
DESIGN_SPEC.md Comprehensive high-level design covering all major system components, with technology choices marked as candidates pending research
LICENSE Business Source License 1.1 with non-production use grant, converting to Apache 2.0 on February 27, 2030
README.md Project overview with concept description, planned features, tech stack, and documentation links
.claude/skills/aurelio-review-pr/skill.md PR review pipeline skill with 10-phase workflow for local agents, external feedback, triage, and fixes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/aurelio-review-pr/skill.md:
- Around line 32-34: Several fenced code blocks (e.g., the block containing "gh
repo view --json nameWithOwner -q .nameWithOwner") lack surrounding blank lines
and nearby sentences lack terminal punctuation; update each affected fenced
block by inserting a blank line before and after the triple-backtick fence and
ensure the sentence immediately preceding or following each fence ends with
proper punctuation (period, question mark, or exclamation). Apply the same fixes
to the other similar fences called out (the blocks around the other listed
snippets) so all fenced code blocks have blank lines around them and all nearby
sentences end with punctuation for MD lint stability.

In `@DESIGN_SPEC.md`:
- Around line 76-97: The fenced block showing the company org tree (the
triple-backtick block starting with "Company ├── Departments[] ...") and
multiple other fenced blocks lack a language identifier and several headings
(e.g., "#### C-Suite / Executive") are missing surrounding blank lines; to fix,
add an appropriate fence language token (e.g., ```text, ```yaml, ```json, or
```bash) to every fenced code block such as the "Company" tree and the other
blocks listed, and ensure each heading has a blank line above and below it (for
example insert a blank line before "#### C-Suite / Executive" and a blank line
after the heading), applying this pattern across the noted ranges (lines with
MD040/MD022 warnings) so markdownlint MD040 and MD022 are resolved.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e040da and eeaec53.

📒 Files selected for processing (4)
  • .claude/skills/aurelio-review-pr/skill.md
  • DESIGN_SPEC.md
  • LICENSE
  • README.md
📜 Review details
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to README.md : Update README.md for significant feature changes

Applied to files:

  • README.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Never defer work—do not suggest "this can be done later" or "consider for a future PR". Complete all requested changes fully.

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Never defer work. Do not suggest 'this can be done later' or 'consider for a future PR'. Complete all requested changes fully.

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: After every push, you MUST check that CI passes. If CI fails, fix the issue immediately and push again until all checks are green. Never walk away from a failing CI pipeline.

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Always create a PR for issue work. When implementing changes for a GitHub issue, create a branch and open a pull request. Do not wait to be asked.

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
🪛 LanguageTool
DESIGN_SPEC.md

[typographical] ~1-~1: To join two clauses or introduce examples, consider using an em dash.
Context: # AI Company - High-Level Design Specification > A fra...

(DASH_RULE)


[typographical] ~193-~193: To join two clauses or introduce examples, consider using an em dash.
Context: ...### Engineering - Software Architect - System design, technology decisions, pat...

(DASH_RULE)


[typographical] ~194-~194: To join two clauses or introduce examples, consider using an em dash.
Context: ...Frontend Developer** (Junior/Mid/Senior) - UI implementation, components, state man...

(DASH_RULE)


[typographical] ~195-~195: To join two clauses or introduce examples, consider using an em dash.
Context: ...Backend Developer* (Junior/Mid/Senior) - APIs, business logic, databases - **Full...

(DASH_RULE)


[typographical] ~196-~196: To join two clauses or introduce examples, consider using an em dash.
Context: ...ll-Stack Developer** (Junior/Mid/Senior) - End-to-end implementation - **DevOps/SRE...

(DASH_RULE)


[typographical] ~197-~197: To join two clauses or introduce examples, consider using an em dash.
Context: ...implementation - DevOps/SRE Engineer - Infrastructure, CI/CD, monitoring, deplo...

(DASH_RULE)


[typographical] ~198-~198: To join two clauses or introduce examples, consider using an em dash.
Context: ...ring, deployment - Database Engineer - Schema design, query optimization, migra...

(DASH_RULE)


[typographical] ~199-~199: To join two clauses or introduce examples, consider using an em dash.
Context: ...tion, migrations - Security Engineer - Security audits, vulnerability assessmen...

(DASH_RULE)


[style] ~383-~383: Consider using a different adjective to strengthen your wording.
Context: ...#### Pattern 4: Hybrid (Recommended for Full Company) Combines all three: - **Messa...

(FULL_ENTIRE)


[style] ~654-~654: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...onality would complement the team? - What model/provider fits the budget? 3. Cand...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~664-~664: You have used the passive voice repeatedly in nearby sentences. To make your writing clearer and easier to read, consider using active voice.
Context: ... 3. Active tasks are reassigned 4. Team is notified ### 8.3 Performance Tracking ```yaml ...

(REP_PASSIVE_VOICE)


[grammar] ~682-~682: Use a hyphen to join words.
Context: ...ce: - Promotion criteria: sustained high quality scores, task complexity handled,...

(QB_NEW_EN_HYPHEN)


[typographical] ~1377-~1377: In American English, use a period after an abbreviation.
Context: ...pid prototyping | ### 16.2 What Exists vs What We Need | Feature | MetaGPT | Cha...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[typographical] ~1381-~1381: To join two clauses or introduce examples, consider using an em dash.
Context: ...ulation | Partial | Partial | No | Yes - complete | | HR (hiring/firing) | No |...

(DASH_RULE)


[typographical] ~1385-~1385: To join two clauses or introduce examples, consider using an em dash.
Context: ...alities | Basic | Basic | Basic | Deep - traits, styles, evolution | | Dynamic ...

(DASH_RULE)


[typographical] ~1386-~1386: To join two clauses or introduce examples, consider using an em dash.
Context: ... team scaling | No | No | Manual | Yes - auto + manual | | Multiple company typ...

(DASH_RULE)


[typographical] ~1387-~1387: To join two clauses or introduce examples, consider using an em dash.
Context: ...company types | No | No | Manual | Yes - templates + builder | | Security ops a...

(DASH_RULE)


[typographical] ~1389-~1389: To join two clauses or introduce examples, consider using an em dash.
Context: ...ble autonomy | No | No | Limited | Yes - full spectrum | | Local + cloud provid...

(DASH_RULE)


[typographical] ~1390-~1390: To join two clauses or introduce examples, consider using an em dash.
Context: ...rs | Partial | Partial | Partial | **Yes - unified abstraction (LiteLLM candidate)*...

(DASH_RULE)


[typographical] ~1391-~1391: To join two clauses or introduce examples, consider using an em dash.
Context: ...racking per agent | No | No | No | Yes - full budget system | | Progressive tru...

(DASH_RULE)


[typographical] ~1398-~1398: In American English, use a period after an abbreviation.
Context: ...Planned (backlog) | ### 16.3 Build vs Fork Decision **Recommendation: Build ...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[style] ~1432-~1432: Consider using the typographical ellipsis character here instead.
Context: ... | Agent A asks Agent B who asks Agent A... | | 8 | Optimal message bus for local-f...

(ELLIPSIS)


[typographical] ~1433-~1433: In American English, use a period after an abbreviation.
Context: ...architecture? | Medium | asyncio queues vs Redis vs embedded broker | | 9 | How to...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[typographical] ~1433-~1433: In American English, use a period after an abbreviation.
Context: ...ure? | Medium | asyncio queues vs Redis vs embedded broker | | 9 | How to handle c...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[typographical] ~1434-~1434: In American English, use a period after an abbreviation.
Context: ...y? | High | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the ...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[typographical] ~1434-~1434: In American English, use a period after an abbreviation.
Context: ...h | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the minimum ...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[style] ~1509-~1509: To make your text as clear as possible to all readers, do not use this foreign term. Possible alternatives are ‘in fact’ or ‘in reality’.
Context: ...rop | | OpenAI API format | OpenAI (de facto standard) | LLM API interface | Via pro...

(DE_FACTO)


[style] ~1509-~1509: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “API”.
Context: ...at** | OpenAI (de facto standard) | LLM API interface | Via provider abstraction layer (LiteL...

(ACRONYM_TAUTOLOGY)


[typographical] ~1513-~1513: To join two clauses or introduce examples, consider using an em dash.
Context: ...s://github.com/FoundationAgents/MetaGPT) - Multi-agent SOP framework (64.5k stars) ...

(DASH_RULE)


[typographical] ~1514-~1514: To join two clauses or introduce examples, consider using an em dash.
Context: ...2.0](https://github.com/openbmb/ChatDev) - Zero-code multi-agent platform (31.2k st...

(DASH_RULE)


[typographical] ~1515-~1515: To join two clauses or introduce examples, consider using an em dash.
Context: ...AI](https://github.com/crewAIInc/crewAI) - Role-based agent collaboration framework...

(DASH_RULE)


[typographical] ~1516-~1516: To join two clauses or introduce examples, consider using an em dash.
Context: ...n](https://github.com/microsoft/autogen) - Microsoft async multi-agent framework - ...

(DASH_RULE)


[typographical] ~1517-~1517: To join two clauses or introduce examples, consider using an em dash.
Context: ...LLM](https://github.com/BerriAI/litellm) - Unified LLM API gateway (100+ providers)...

(DASH_RULE)


[typographical] ~1518-~1518: To join two clauses or introduce examples, consider using an em dash.
Context: ...- Mem0 - Universal memory layer for AI agents - [...

(DASH_RULE)


[typographical] ~1519-~1519: To join two clauses or introduce examples, consider using an em dash.
Context: ...ocol](https://github.com/a2aproject/A2A) - Agent-to-Agent protocol (Linux Foundatio...

(DASH_RULE)


[typographical] ~1520-~1520: To join two clauses or introduce examples, consider using an em dash.
Context: ...extprotocol.io/specification/2025-11-25) - Model Context Protocol - [Langfuse Agent...

(DASH_RULE)


[typographical] ~1521-~1521: To join two clauses or introduce examples, consider using an em dash.
Context: ...com/blog/2025-03-19-ai-agent-comparison) - Framework comparison - [Confluent Event-...

(DASH_RULE)


[typographical] ~1522-~1522: To join two clauses or introduce examples, consider using an em dash.
Context: .../blog/event-driven-multi-agent-systems/) - Multi-agent architecture patterns - [Mic...

(DASH_RULE)


[typographical] ~1523-~1523: To join two clauses or introduce examples, consider using an em dash.
Context: ....io/multi-agent-reference-architecture/) - Enterprise patterns - [OpenRouter](https...

(DASH_RULE)


[typographical] ~1524-~1524: To join two clauses or introduce examples, consider using an em dash.
Context: ...s - OpenRouter - Multi-model API gateway

(DASH_RULE)

README.md

[typographical] ~11-~11: To join two clauses or introduce examples, consider using an em dash.
Context: ...s (Planned) - Any Company Structure - From a 2-person startup to a 50+ enterpr...

(DASH_RULE)


[typographical] ~12-~12: To join two clauses or introduce examples, consider using an em dash.
Context: ...nfig/templates - Deep Agent Identity - Names, personalities, skills, seniority ...

(DASH_RULE)


[typographical] ~13-~13: To join two clauses or introduce examples, consider using an em dash.
Context: ...erformance tracking - Multi-Provider - Anthropic Claude, OpenRouter (400+ model...

(DASH_RULE)


[typographical] ~14-~14: To join two clauses or introduce examples, consider using an em dash.
Context: ... via LiteLLM - Smart Cost Management - Per-agent budget tracking, auto model ro...

(DASH_RULE)


[typographical] ~15-~15: To join two clauses or introduce examples, consider using an em dash.
Context: ...optimization - Configurable Autonomy - From fully autonomous to human-approves-...

(DASH_RULE)


[typographical] ~16-~16: To join two clauses or introduce examples, consider using an em dash.
Context: ...agent in between - Persistent Memory - Agents remember past decisions, code, re...

(DASH_RULE)


[typographical] ~17-~17: To join two clauses or introduce examples, consider using an em dash.
Context: ...relationships (via Mem0) - HR System - Hire, fire, promote agents. HR agent ana...

(DASH_RULE)


[typographical] ~18-~18: To join two clauses or introduce examples, consider using an em dash.
Context: ...tem, git, code execution, web, databases - role-based and sandboxed - API-First...

(DASH_RULE)


[grammar] ~20-~20: Please add a punctuation mark at the end of paragraph.
Context: ...built company templates and interactive builder ## Status Design phase. See [DESI...

(PUNCTUATION_PARAGRAPH_END)


[typographical] ~37-~37: To join two clauses or introduce examples, consider using an em dash.
Context: ...- Design Specification - Full high-level design

(DASH_RULE)

.claude/skills/aurelio-review-pr/skill.md

[style] ~107-~107: The phrase ‘Look for patterns’ is used very frequently. Consider using a less frequent alternative to set your writing apart from others.
Context: ... lines are outside the PR's diff range. Look for patterns like "Outside diff range comments (N)" ...

(LOOK_FOR_STYLE)


[style] ~121-~121: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...tionable items, not just summaries). Important: Use gh api with --jq for filteri...

(EN_REPEATEDWORDS_IMPORTANT)


[style] ~123-~123: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...omplex Python scripts to parse JSON. Important: When review bodies are large (e.g. C...

(EN_REPEATEDWORDS_IMPORTANT)


[grammar] ~139-~139: Please add a punctuation mark at the end of paragraph.
Context: ...heck against CLAUDE.md rules and actual code Deduplication: If multiple sources...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~172-~172: Please add a punctuation mark at the end of paragraph.
Context: ... rely on pre-push hooks and CI for full coverage ## Phase 8: Commit and push After all...

(PUNCTUATION_PARAGRAPH_END)


[style] ~181-~181: Consider using a different verb for a more formal wording.
Context: .... If commit or push fails due to hooks, fix the actual issue and create a NEW commi...

(FIX_RESOLVE)

🪛 markdownlint-cli2 (0.21.0)
DESIGN_SPEC.md

[warning] 76-76: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 178-178: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 185-185: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 192-192: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 201-201: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 207-207: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 212-212: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 218-218: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 256-256: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 332-332: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 352-352: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 365-365: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 492-492: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 602-602: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 693-693: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 947-947: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1020-1020: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1039-1039: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1133-1133: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1180-1180: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1238-1238: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1487-1487: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

.claude/skills/aurelio-review-pr/skill.md

[warning] 32-32: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 102-102: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 104-104: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 110-110: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 112-112: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 116-116: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 118-118: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🔇 Additional comments (2)
README.md (1)

1-37: README update aligns well with this PR scope.

The document now matches the current project state and points readers to the design spec clearly.

Based on learnings: Applies to README.md : Update README.md for significant feature changes.

LICENSE (1)

6-25: BSL parameterization looks consistent and complete.

Line 23 and Line 24 correctly define the conversion trigger and target license, and the use-grant language is explicit.

Comment thread .claude/skills/aurelio-review-pr/skill.md
Comment thread DESIGN_SPEC.md Outdated
…d Copilot

- Add input validation for OWNER/REPO and issue number extracted from PR bodies (command injection fix)
- Add XML delimiters and untrusted-data instructions for issue context passed to sub-agents (prompt injection fix)
- Remove incorrect subject_type field from PR comments extraction
- Clarify Mem0 relationship: it's a memory layer on top of backends, not a backend itself
- Standardize agent level casing in routing rules to match Section 3.1/3.2 conventions
- Add language identifiers to all fenced code blocks (MD040)
- Add blank lines around headings and fenced code blocks (MD022/MD031)
@Aureliolo Aureliolo merged commit 8669a09 into main Feb 27, 2026
1 check was pending
@Aureliolo Aureliolo deleted the feat/design-spec branch February 27, 2026 13:48
Aureliolo added a commit that referenced this pull request Mar 5, 2026
…eviewers

Source changes (tracker.py):
- Fix budget_used_percent/alert_level threshold disagreement (#1)
- Add logger.warning before ValueError in _validate_time_range (#2)
- Split build_summary into _build_agent_spendings, _build_dept_spendings,
  _build_budget_context helpers to meet <50 line guideline (#3)
- Replace _aggregate tuple return with _AggregateResult NamedTuple (#4)
- Rewrite _aggregate as single-pass loop (#5)
- Aggregate departments from AgentSpending objects, not raw records (#6)
- Add DEBUG log in __init__ for object creation (#7)
- Fix module docstring "Section 10.2 service layer" accuracy (#15)
- Add DEBUG entry logs for get_total_cost/get_agent_cost (#16)

Event constants (events.py):
- Add BUDGET_TRACKER_CREATED, BUDGET_TOTAL_COST_QUERIED,
  BUDGET_AGENT_COST_QUERIED, BUDGET_TIME_RANGE_INVALID

Test improvements (test_tracker.py):
- Assert budget_used_percent in all alert-level tests (#8)
- Assert budget_total_monthly in configured summary test (#9)
- Add start==end raises ValueError test (#10)
- Add end-only time filter test for get_total_cost (#11)
- Add multi-record token aggregation assertions (#12)
- Add pytestmark = pytest.mark.timeout(30) (#13)
- Move @pytest.mark.unit to class level (#14)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aureliolo added a commit that referenced this pull request Mar 6, 2026
…t, Gemini, and greptile

- Add produced_artifacts field to AgentRunResult (#1)
- Wrap _log_completion in try/except to preserve valid results (#2)
- Add test for inner TimeoutError propagation without engine timeout (#3)
- Extract _run_loop_with_timeout from _execute (50-line limit) (#4)
- Extract _validate_run_inputs from run() (50-line limit) (#5)
- Rename metrics docstrings from "completed task" to "agent run" + add termination_reason to metrics event (#6)
- Fix raise exc from build_exc chain direction (#7)
- Replace asyncio.wait_for with asyncio.wait for timeout disambiguation (#8)
- Add test for _apply_post_execution_transitions failure resilience (#9)
- Add test for timeout cost recording behavior (#10)
- Fix hardcoded from_status in transition logs (#11)
- Add agent-task ownership check in _validate_task (#12)
- Split test_invalid_timeout_raises into two test methods (#13)
- Add negative validation tests for cost_per_task/duration_seconds (#14)
- Add test_blank_task_id_rejected (#15)
- Update _execute docstring to mention timeout, transitions, metrics (#16)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aureliolo added a commit that referenced this pull request Apr 27, 2026
…tion size, test gaps)

Round of fixes from /aurelio-review-pr:

Gemini high #1 (etag.py): full-body buffering risked memory exhaustion
on large responses and dropped buffered data when streams ended without
a final more_body=False chunk. The middleware now detects more_body=True
on the first body chunk and switches to pass-through mode (no ETag, no
buffering); single-chunk responses keep the buffered ETag path. The
truncated-stream fallback now flushes the captured start + buffered
body so the client sees a clean close.

Gemini high #2 (server.py + lifecycle.py + deployment.md): bumped uvicorn
timeout_graceful_shutdown 30 to 60 to cover the ~51s realistic worst case
(25s drain + 26s service teardown). Recommended k8s
terminationGracePeriodSeconds bumped 45 to 60 with matching docs/yaml
updates.

Gemini medium (drain.py): removed asyncio.Lock around the inflight
counter and idle event. Single-threaded asyncio + zero awaits between
the increment and event ops makes the lock pure overhead per HTTP
request. Module docstring updated to spell out the invariant.

CodeQL high (test_collector_async_init.py): two os.open wrapper stubs
defaulted to mode=0o777. Bumped to 0o600 to mirror production
collector.py and silence the world-writable warning.

silent-failure-hunter SEC-1 (dispatcher.py + slack/ntfy/email adapters
+ lifecycle_builder.py): replaced error=str(exc) with
error_type=type(exc).__name__ + safe_error_description(exc) on every
notification credential path so webhook URLs / SMTP creds cannot leak
into log sinks. The TaskGroup ExceptionGroup logger no longer
f-strings the raw exception list either; it logs sorted error_types.
Phase 2 auto-wire's logger.exception (which can hold operator settings
in frame locals) is now logger.error with the same safe pattern.

silent-failure-hunter MEDIUM (drain.py): _send_drain_response wraps
both ASGI send calls in try/except so a client disconnect during
shutdown logs at debug instead of bubbling out of the lifespan layer.

conventions-enforcer (etag.py): _emit_response was 75 lines (over the
50-line limit). Split into _emit_passthrough / _emit_not_modified /
_emit_with_etag, plus extracted _handle_body_message to keep
ETagMiddleware.__call__ under the C901 complexity cap.

frontend-reviewer (web/src/stores/meetings.ts + approvals.ts): dispose()
now resets module-level request-seq counters and pending-transition
sets, so stale-response checks and optimistic transitions cannot leak
across tests when a different test exercises these stores via a
component import.

docs-consistency (docs/design/notifications.md): documented the new
async start() / __aenter__ / __aexit__ NotificationSink lifecycle
contract, the lifecycle-lock idempotence guarantee, and the
SEC-1-safe error-logging conventions in the dispatcher.

New tests:
- test_etag.py: 304 strips Content-Type/Content-Length, existing
  Cache-Control is not overwritten, streaming response skips ETag
  with no buffering, mid-stream return flushes captured start +
  buffered body.
- test_request_drain.py: lifespan.shutdown drains BEFORE the inner
  app sees the message; inflight counter decrements when inner app
  raises.
- test_collector_async_init.py: start() falls back to a generated
  UUID when asyncio.wait_for times out (5s deadline contract); the
  fallback path logs using_generated_id=True and does not emit the
  TELEMETRY_DEPLOYMENT_ID_LOADED event.

Refs #1600
Aureliolo added a commit that referenced this pull request Apr 27, 2026
…tion size, test gaps)

Round of fixes from /aurelio-review-pr:

Gemini high #1 (etag.py): full-body buffering risked memory exhaustion
on large responses and dropped buffered data when streams ended without
a final more_body=False chunk. The middleware now detects more_body=True
on the first body chunk and switches to pass-through mode (no ETag, no
buffering); single-chunk responses keep the buffered ETag path. The
truncated-stream fallback now flushes the captured start + buffered
body so the client sees a clean close.

Gemini high #2 (server.py + lifecycle.py + deployment.md): bumped uvicorn
timeout_graceful_shutdown 30 to 60 to cover the ~51s realistic worst case
(25s drain + 26s service teardown). Recommended k8s
terminationGracePeriodSeconds bumped 45 to 60 with matching docs/yaml
updates.

Gemini medium (drain.py): removed asyncio.Lock around the inflight
counter and idle event. Single-threaded asyncio + zero awaits between
the increment and event ops makes the lock pure overhead per HTTP
request. Module docstring updated to spell out the invariant.

CodeQL high (test_collector_async_init.py): two os.open wrapper stubs
defaulted to mode=0o777. Bumped to 0o600 to mirror production
collector.py and silence the world-writable warning.

silent-failure-hunter SEC-1 (dispatcher.py + slack/ntfy/email adapters
+ lifecycle_builder.py): replaced error=str(exc) with
error_type=type(exc).__name__ + safe_error_description(exc) on every
notification credential path so webhook URLs / SMTP creds cannot leak
into log sinks. The TaskGroup ExceptionGroup logger no longer
f-strings the raw exception list either; it logs sorted error_types.
Phase 2 auto-wire's logger.exception (which can hold operator settings
in frame locals) is now logger.error with the same safe pattern.

silent-failure-hunter MEDIUM (drain.py): _send_drain_response wraps
both ASGI send calls in try/except so a client disconnect during
shutdown logs at debug instead of bubbling out of the lifespan layer.

conventions-enforcer (etag.py): _emit_response was 75 lines (over the
50-line limit). Split into _emit_passthrough / _emit_not_modified /
_emit_with_etag, plus extracted _handle_body_message to keep
ETagMiddleware.__call__ under the C901 complexity cap.

frontend-reviewer (web/src/stores/meetings.ts + approvals.ts): dispose()
now resets module-level request-seq counters and pending-transition
sets, so stale-response checks and optimistic transitions cannot leak
across tests when a different test exercises these stores via a
component import.

docs-consistency (docs/design/notifications.md): documented the new
async start() / __aenter__ / __aexit__ NotificationSink lifecycle
contract, the lifecycle-lock idempotence guarantee, and the
SEC-1-safe error-logging conventions in the dispatcher.

New tests:
- test_etag.py: 304 strips Content-Type/Content-Length, existing
  Cache-Control is not overwritten, streaming response skips ETag
  with no buffering, mid-stream return flushes captured start +
  buffered body.
- test_request_drain.py: lifespan.shutdown drains BEFORE the inner
  app sees the message; inflight counter decrements when inner app
  raises.
- test_collector_async_init.py: start() falls back to a generated
  UUID when asyncio.wait_for times out (5s deadline contract); the
  fallback path logs using_generated_id=True and does not emit the
  TELEMETRY_DEPLOYMENT_ID_LOADED event.

Refs #1600
Aureliolo added a commit that referenced this pull request Apr 27, 2026
…uild (#1650)

## Summary

Two real CI regressions on the last 4 main commits, both rooted in
changes that landed earlier this month and surfaced quietly. Plus the
downstream cascade they caused.

### #1 — `Publish Fine-Tune (cpu)` digest-mismatch (commit `8178b500`)

`docker manifest push` stdout was being grepped for
`sha256:[0-9a-f]{64}` to capture the manifest digest for cosign signing.
That output is not stable across Docker versions: some emit the digest,
some only print `Created manifest list <ref>`. On a runner whose Docker
happened to print only the latter, the grep returned empty ->
`DIGEST=""` -> the cosign sign step failed with `Push step did not
produce a digest -- cannot sign image`.

**Regressed in PR #1542 (2026-04-24)** when `publish-image` was
extracted into a composite action. Before then, the inline code in
`docker.yml` did `DIGEST="$PUSH_OUT"` (capturing whole stdout). The grep
extraction was a hardening attempt that became a foot-gun on Docker
versions that don't emit the digest.

**Fix:** Switch all three manifest-push sites
(`publish-image/action.yml`, `publish-apko-base/action.yml`,
`docker.yml`) to read the digest back from the registry via `docker
buildx imagetools inspect <ref> --format '{{.Manifest.Digest}}'`. That
command is deterministic across Docker versions and returns the
canonical manifest list digest -- exactly what cosign +
attest-build-provenance expect. buildx is preinstalled on
`ubuntu-latest` runners so no setup step is needed.

### #2 — DAST `ZAP API Scan` BASE_IMAGE blank (commit `8178b500`)

PR #1465 dropped the `:latest` default for `BASE_IMAGE` in
`docker/backend/Dockerfile` to satisfy Scorecard Pinned-Dependencies, on
the premise that "CI always provides BASE_IMAGE". DAST runs `docker
compose -f docker/compose.yml build backend` without supplying it, and
`compose.yml` had no `args:` mapping for `BASE_IMAGE`. The very next
weekly DAST cron failed with `failed to solve: base name (${BASE_IMAGE})
should not be blank`.

**Fix:**
- `docker/compose.yml`: add `build.args.BASE_IMAGE: ${BASE_IMAGE}`
pass-through to the backend service. Empty/unset still fails fast at
Dockerfile parse, preserving the Pinned-Dependencies invariant -- no
`:latest` default reintroduced.
- `.github/workflows/dast.yml`: build the apko-composed Wolfi base
in-workflow via the existing `build-apko-base` composite action, then
pass its `sha-<short>` tag (pinned by definition of the running commit,
no floating registry tag) through the new compose `build.args` mapping.
Adds the `security-events: write` scope required by `build-apko-base`
for Trivy SARIF upload.
- `.github/workflows/docker.yml` (validate-compose): add
`BASE_IMAGE=ci-validation-dummy:placeholder` to the dummy `docker/.env`
so `docker compose config --quiet` doesn't emit a "variable not set"
warning every CI run. The Dockerfile is never built during `compose
config`, so the placeholder is never consumed.

### #3 — `Update Release Notes` cascade (commit `bba4aa37`)

Direct downstream of #1: fine-tune cpu/gpu publish failed -> dev release
tag was never created -> the release-notes step couldn't find
`v0.7.4-dev.15` after 6 attempts. Resolves automatically once #1 lands.

### #4 — CodSpeed neutral on `074a8f11`

Out of scope here. Already tracked as #1640 (investigation issue opened
today).

## Test plan

- `docker buildx imagetools inspect --format '{{.Manifest.Digest}}'` is
the documented way to read a manifest list digest from the registry,
works for both single-arch (fine-tune-cpu) and multi-arch (backend,
sandbox, sidecar, fine-tune-gpu) manifests.
- DAST workflow: the `build-apko-base` composite action does `docker
load` of the built tarball and re-tags to
`ghcr.io/aureliolo/synthorg-backend-base:sha-<short>`, so the subsequent
`docker compose build backend` resolves `BASE_IMAGE` from the local
Docker daemon without going to the registry. Same flow that backend /
sandbox / sidecar / fine-tune builds already use in `docker.yml`.
- `validate-compose` job continues to pass; warning is silenced.
- Pre-commit hooks pass (no Python / Go / Dockerfile changes; YAML /
commitizen / gitleaks / no-em-dashes all green).
- Pre-PR review: 2 agents (`docs-consistency`, `infra-reviewer`). 1
valid finding (validate-compose warning), addressed in the second
commit.

## Review coverage

Pre-reviewed by 2 agents, 1 finding addressed.

## Files

- `.github/actions/publish-image/action.yml` — buildx imagetools digest
read
- `.github/actions/publish-apko-base/action.yml` — buildx imagetools
digest read
- `.github/workflows/docker.yml` — buildx imagetools digest read in
inline manifest path; placeholder BASE_IMAGE in validate-compose
- `.github/workflows/dast.yml` — build apko base in-workflow, pass
BASE_IMAGE through; add `security-events: write` scope; hoist
`TRIVY_VERSION` to workflow env
- `docker/compose.yml` — `args.BASE_IMAGE: ${BASE_IMAGE}` pass-through
on the backend service
Aureliolo added a commit that referenced this pull request May 1, 2026
…ace, list pages, missing surfaces (#1692)

Closes #1687.

## Summary

Bundle of ~140 dashboard/UX findings from the 2026-04-30 audit, plus the
two user-reported critical-path symptoms:

1. **"Could not apply template" trapped state.** The setup wizard's
tier-coverage error now routes the operator back to the Providers step
via a discriminated `error_code = 2004`
(`PROVIDER_TIER_COVERAGE_INSUFFICIENT`). Apply Template is gated
client-side until at least one provider has at least one model; an
inline help banner explains why the button is disabled before the user
has to click. Single in-flight guard prevents double-submit. AgentsStep
surfaces a banner for any agent referencing a missing provider/model
with the same "Go back to Providers step" affordance.

2. **Auth-on-reload glitch.** `handleUnauthorized` has a one-shot
redirect guard so a burst of concurrent 401s during reload triggers
exactly one redirect + websocket disconnect (not N flickers). Backend
distinguishes `auth.session.no_token = 1008` from `auth.session.expired
= 1009` so the dashboard can render login silently on a fresh page load
and toast "Session expired" only when a session actually lapsed.
`LoginPage` setup-status check logs structured context on failure.

## What changed

- **Setup wizard (10/10):** Tier-coverage error code recognition +
Go-back action; client-side gate; double-click guard; backend reword +
new `ProviderTierCoverageInsufficientError`; AgentsStep trapped-state
affordance; password-policy retry+timeout in AccountStep; Quick Setup
auto-covered; completion-error already single-state; template-fetch
retry already gated.
- **Auth-on-reload race (3/3):** One-shot redirect guard; LoginPage
structured logging; backend 401 detail-string discriminator
(`SESSION_NO_TOKEN` / `SESSION_EXPIRED`) extracted into
`synthorg.api.auth_response_discriminator` so `exception_handlers.py`
stays under the 800-line limit. Unrecognised detail strings emit a WARN
log so middleware/handler divergence doesn't silently fall through to
generic UNAUTHORIZED.
- **List-page UX (9/9):** ArtifactsPage / ConnectionsPage / UsersPage
adopt `SearchFilterSort` + `Pagination` + `useListPagination` (UsersPage
gets a client-side search filter); EscalationQueuePage gains a priority
filter (Critical/High/Standard buckets); TaskBoardPage has an
active-filter chip strip with click-to-remove; WorkflowsPage shows the
Esc-to-clear shortcut via `KeyboardShortcutHint`; all 7 detail pages
(Artifact / Agent / Task / Project / Client / Meeting / Provider) gain
prev/next navigation via the new `useDetailNavigation` hook +
`DetailNavBar` component (J/K + arrow keys; self-hides on deep links).
- **Missing UI surfaces (9/9):** New routes for workflow executions
(with Cancel), workflow versions (diff + rollback via
`VersionHistorySection`), webhook receipts inspector, coordination
metrics, meta-analytics, personalities admin, plus budget / company /
evaluation config version-history pages. Two new typed endpoint modules
(`workflow-executions.ts`, `webhooks.ts`) mirror the backend
`WorkflowExecution` and `WebhookReceipt` Pydantic models
field-for-field; matching MSW handlers register in `handlers/index.ts`.
- **Loading / empty / error states (7/7):** `EmptyState` in
PerformanceMetrics + meeting-minutes sub-components;
`ArtifactContentPreview` image-load failure logs structured context;
`ClientDetailPage` satisfaction-history error surfaced; `TrainingPage`
error logging restructured to preserve status code + error type (was
flattened by `sanitizeForLog`).
- **TS strictness:** New `web/src/utils/type-guards.ts` (`isObject` /
`hasKey` / `isArrayOf` / `isString` / `isFiniteNumber` / `parseOrNull` /
etc.) with full unit test coverage. Replaced unsafe `as` casts in
`yaml-to-nodes.ts`, `useWorkflowEditorCallbacks.ts`,
`workflow-editor/persistence.ts`, and the agents / approvals /
notifications stores' WS handlers.
- **Performance (7/7):** `ArtifactContentPreview` `<img>` width/height
(LCP); `ProviderGridItem` + `CeremonyRow` wrapped in `React.memo`;
`CodeEditorPanel` -> `LazyCodeMirrorEditor`; `SettingsSinksPage`
per-field selectors. notifications timer cleanup, theme.reattach
idempotency, and websocket heartbeat closure release were already in
place — verified.
- **Responsive (7/7):** `AgentSpendingTable` horizontal scroll;
`ConnectionFormModal` + `ProviderFormModal` + `TaskCreateDialog`
responsive `max-h`; `ChannelSidebar` hides below `lg:` (1024px);
`WorkflowEditorPage` canvas height responsive (9rem allowance below
`md:`).
- **UX consistency (4/4):** `WorkflowCard` type label uses `StatPill`
(consistent with ArtifactCard); `ConnectionCard` actions consolidated to
top-right (no split top-right + bottom); `CheckpointTable` uses
`StatusBadge` `label` prop. `WorkflowCard` action placement already
matches the per-card menu pattern used elsewhere — verified.
- **Accessibility (2/2):** `ProviderHealthBadge` declares `role="img"`
explicitly when `aria-label` is set; `NotificationItemCard` uses
semantic `<button>` for the main click target with the per-row icon
buttons as siblings (no nested-button HTML).
- **i18n (out of scope):** SynthOrg won't ship translations. The
"centralize 47 strings" checkbox was dropped;
`docs/internal/i18n-strings.{md,json}` +
`scripts/inventory_i18n_strings.py` deleted;
`.claude/skills/codebase-audit/SKILL.md` agent 37 reprompted; new
`docs/design/internationalization.md` documents the no-translation
stance and reversibility trigger; #1417 will be closed with a comment
after this lands. CLAUDE.md spelling rule corrected to International /
British English.

## Test plan

- `npm --prefix web run type-check` — green
- `npm --prefix web run lint` — green
- `npm --prefix web run test` — **2966 passed (243 files)**, including
new specs for type-guards (28), useDetailNavigation (6), DetailNavBar
(6), 9-surface smoke tests (9), AgentsStep trapped-state banner (4),
setup-wizard error-code capture + in-flight guard (4), auth-store with
one-shot guard reset.
- `uv run mypy src/ tests/` — green
- `uv run python -m pytest tests/ -m unit -n 8` — green when run
by-module / sequentially (16,660+ pass). The pre-push parallel-runner
intermittently shows 6-8 xdist worker crashes per run on a different
random subset of unrelated tests each time — these don't reproduce
manually and don't touch any file in this PR. Pushed with `--no-verify`
after confirming green sequential runs (per the documented Windows xdist
workaround).
- New backend pytest cases: 4 in `test_exception_handlers.py` covering
the 401 detail-string discriminator (`Missing authentication` / `Invalid
session cookie` / `Invalid JWT token` / unrecognised → fallthrough);
`test_setup_company_template_gating.py` updated to assert on
`error_detail.error_code == 2004` instead of substring matching.

## Review coverage

Pre-reviewed by 13 agents (`/pre-pr-review`). 18 findings raised, 16
implemented in this PR; 2 were already-resolved or false-positives.
Triage table at `_audit/pre-pr-review/triage.md`.

CRITICAL fixes from review:
- **API contract drift** — `WorkflowExecution` / `WebhookReceipt`
frontend types now mirror backend Pydantic models 1:1 (field renames:
`workflow_id` -> `definition_id`, `triggered_by` -> `activated_by`,
`error_message` -> `error`; status union corrected).
`listWorkflowExecutions` / `listWebhookActivity` now unwrap a flat list
(matches backend `ApiResponse[list[T]]`) instead of an envelope wrapper
that did not exist.

MAJOR fixes from review:
- New MSW handlers for workflow-executions and webhooks endpoints (per
`web/CLAUDE.md` 1:1-mirror rule).
- Type-guard module + DetailNavBar + useDetailNavigation gain unit
tests.
- Smoke tests for the 9 new pages.
- `AccountStep.fetchPolicy` retry loop logs every attempt at the same
level (single ERROR with structured `attempts` array) instead of
WARN-then-ERROR.
- `_discriminate_unauthorized` extracted to its own module to keep
`exception_handlers.py` registry under the 800-line limit; fallthrough
logs WARN.

MEDIUM polish: `ActiveFilterChips` extracted to a named props interface;
`MetaAnalyticsPage` shows fine-grained per-resource error banners;
`WebhookReceiptsPage` table swaps the bogus `bytes` column for
`processed_at`; `companyErrorCode` docstring documents its ephemeral
lifecycle.

## Out of scope

- Backend pagination plumbing (#2)
- API rate-limit polish (#5)
- Real i18n / translation framework (would be #1417 if it were ever
opened)

## Manual reproduction

- **Setup wizard "Could not apply template" trapped state**: log in
fresh, skip provider configuration, reach Company step → Apply Template
is disabled with an inline ErrorBanner explaining why and offering "Go
back to Providers step." Click → land on Providers step. Add a model →
return → Apply succeeds.
- **Auth-on-reload glitch**: log in, navigate to `/dashboard`, clear
cookies in DevTools, reload → single clean redirect to `/login`, no
flash of dashboard chrome, no toast spam, structured `log.error` from
LoginPage's setup-status path visible in the console.
Aureliolo added a commit that referenced this pull request May 3, 2026
The SsrfViolationService recorded mutations on the api.* event namespace
which excludes them from the signed audit chain. Switch to
SECURITY_SSRF_VIOLATION_RECORDED for create and route the resolution
event to SECURITY_SSRF_VIOLATION_ALLOWED / _DENIED based on the new
status so a single audit-chain reader can reconstruct the WHO+WHEN of
allow vs deny without unpacking the payload.

The two read-side events (API_SSRF_VIOLATION_LISTED,
API_SSRF_VIOLATION_FETCH_FAILED) stay on the api.* namespace because
list/fetch failures carry no audit-chain implication. The two now-unused
mutation constants are dropped from observability/events/api.py.

Refs #1733 (#2)
Aureliolo added a commit that referenced this pull request May 4, 2026
Closes #1733.

Bundles every Bucket A item from the 2026-05-03 codebase audit
(mechanical / pattern-replace / add-a-call / add-a-constraint) into one
mega-PR. Per the issue, this is one squash-merge unit; the commit log
inside the branch keeps each topic independently reviewable.

## Acceptance items (21/21 RESOLVED)

### Sub-section 1 — Audit-chain + security misroutes
- **#2 SSRF audit-chain misrouting** — `record()` and `update_status()`
now log under `SECURITY_*` constants; the resolution branch picks
`SECURITY_SSRF_VIOLATION_ALLOWED` / `SECURITY_SSRF_VIOLATION_DENIED`.
Failed resolutions emit a dedicated
`SECURITY_SSRF_VIOLATION_RESOLUTION_FAILED` so SIEM filters cannot
misclassify them as actual decisions. The two `API_*` constants moved
off the audit chain were dropped.
- **#10 Rate limits** — added `oauth.initiate` (10 req / 60s per user)
and `settings.import` (5 req / 3600s per user) to `RATE_LIMIT_POLICIES`;
both routes now apply `per_op_rate_limit_from_policy`.
- **#30 TSA URLs http→https** — DigiCert and Sectigo defaults switched.

### Sub-section 2 — Concurrency + lifecycle locks
- **#6 Race conditions** — `DelegationCircuitBreaker` (sync
`threading.RLock` for the hot path), `TrustService` (`asyncio.Lock`
across the read-modify-write of `_trust_states` + `_change_history`),
and the in-memory memory backend (`_store_lock` separate from
`_connect_lock`).
- **#7 Lifecycle locks** — installed `_lifecycle_lock` on `Worker` and
converted `ApprovalTimeoutScheduler.start()` to async with the canonical
pattern. `ApprovalTimeoutScheduler.stop()` now accepts a `timeout`
argument and sets the unrestartable `_stop_failed` flag on drain timeout
so a subsequent `start()` raises rather than spawning a duplicate task.
- **#9 tar.extractfile leaks** — both sites wrapped in `with` so the
file handle closes deterministically.

### Sub-section 3 — Persistence + migrations + N+1
- **#8 Migration parity** — added `CHECK (json_valid(...))` constraints
on `provider_audit_events.payload` and the three nullable
`preset_overrides` JSON columns via Atlas migration
`20260503181821_json_check_constraints.sql`. Postgres schema unchanged;
JSONB already enforces validity.
- **#13 N+1 batched** — added `save_many` to `ApprovalRepository`
(SQLite + Postgres). `ApprovalStore` collects expired items into
`to_persist` then issues one batched write; expire-callback failures log
at ERROR.
- **#24 Dual-backend conformance** — extended the approval conformance
with `save_many` (round-trip / empty / upsert / duplicate-id batch). The
audit's "9 missing repos" was wrong: investigation showed all nine are
already covered under
`tests/conformance/persistence/{test_auth_repositories,test_connection_repositories,test_ontology_repositories}.py`.

### Sub-section 4 — Settings wiring + Clock seam
- **#16 BackupService + ApprovalTimeoutScheduler** — wired into the
lifespan startup / shutdown. Investigation confirmed 11 of the 13
audit-flagged "dead" settings are already consumed; only `BackupService`
and `ApprovalTimeoutScheduler` were unwired and now are.
- **#17 Clock seam** — injected `clock: Clock | None = None` into 14
time-reading sites: `a2a/well_known.py`,
`api/controllers/{health,events}.py`,
`api/services/idempotency_service.py`, `api/state.py`
(`AppState.clock`),
`communication/bus/{memory,_nats_state,_nats_receive}.py`,
`engine/workflow/ceremony_scheduler.py`,
`integrations/health/checks/database.py`,
`memory/retrieval/hierarchical/workers.py` (3 worker classes),
`meta/validation/ci_validator.py`, `security/uncertainty.py`,
`tools/sandbox/docker_sandbox.py`.

### Sub-section 5 — Pydantic + pagination + magic numbers + Prometheus
- **#18 `extra="forbid"` sweep** — 489 ConfigDicts hardened across the
listed domains; 46 carve-outs preserved on classes that declare
`@computed_field` (Pydantic v2 `model_dump()` includes computed values,
so strict-extra would reject the round trip).
- **#19 Mandatory `limit`** — `limit: int = 100` (was `int | None =
None`) on every persistence list / query / `list_by_*` method (40 sites
across 26 files); the missing-limit check was removed.
- **#22 Magic numbers** — extracted module-level constants in
`engine/routing/scorer.py`, `engine/quality/graders/heuristic.py`, and
`templates/model_matcher.py`.
- **#12 Prometheus `tool_name`** — bounded via
`_LabelSnapshot.tool_names`, refreshed on each
`PrometheusCollector.refresh()`. `validate_tool_name` rejects unknown
values; the registry-fetch path narrows its swallow to `AttributeError /
TypeError / ValueError` and logs at ERROR.

### Sub-section 6 — Frontend + docs + consolidation
- **#21 Cobra Long + Example** — filled on 14 commands.
- **#28 `REJECTION_REASON_REQUIRED`** — extracted to
`web/src/pages/approvals/errors.ts`; 3 inline copies replaced.
- **R4 Normalisation helpers** — `strip_trailing_slash`,
`normalize_optional_string`, `normalize_path` in
`core/normalization.py`; 17 call sites migrated.
- **R8 Web dispatch consolidation** — `getNodeLabel` extracted to
`web/src/pages/org/node-utils.ts`; `budget.ts` replaced two parallel
switch statements with a `DIMENSION_RESOLVERS` strategy table.
- **R9 Tag dedup** — `deduplicate_tags` in `synthorg.memory.utils`;
`MemoryMetadata`, `MemoryQuery`, and `ProcedureLearningRecord` now call
the shared helper.
- **R11 Wrapper removal** — deleted `_group_records_by_agent`; callers
import `group_by_agent` directly.

## Pre-PR review — agent findings addressed in this PR

21 review agents ran on this branch. Critical / Major / Medium findings
all fixed in the final commit:

- **C1 / scheduler stop_failed** — drain timeout now sets
`_stop_failed`; outer `_try_stop` budget exceeds the inner so the
unrestartable guard fires before cancellation.
- **C2 / circuit-breaker TOCTOU** — `check()` runs the entire
OPEN-branch decision under `_state_lock`; the old split between
`get_state()` and the post-hoc cooldown read is gone.
- **C3 / SSRF success-event misrouting** — failure paths route to a
dedicated `SECURITY_SSRF_VIOLATION_RESOLUTION_FAILED`.
- **M1 / trust read outside lock** — `apply_trust_change` holds
`_state_lock` from the read through the `change_history` append.
- **M2 / silent expire-callback failure** — logs at ERROR with
structured fields.
- **M3 / mock-spec gate** — lifecycle test mocks now declare `spec=`.
- **M4 / Prometheus snapshot fixture** — promoted to session scope so
cross-file ordering under xdist `loadfile` cannot leave an empty
snapshot in place.
- **M5 / stale module docstring** — updated to mention the new failure
event.
- **Md1–Md7** — narrowed exception types in `_fetch_tool_names`; split
backup-manifest extraction into archive-corrupt / json-parse-failed /
schema-validation-failed / io-error categories; lifecycle startup
distinguishes `RuntimeError`; SSE revalidate iteration cap raised with
rationale; `save_many` empty-input post-condition asserted;
`_list_from_repo_locked` docstring expanded; CLAUDE.md and
`docs/reference/conventions.md` updated to reflect the audit-imposed
conventions.

False-positive findings (PEP 758 `except A, B:` flagged as a syntax
error in three agents, "missing `limit` arg" in one, "missing Postgres
peer migration" in one) were classified INVALID with reasoning recorded
in `_audit/pre-pr-review/triage.md`.

## New regression coverage

- `tests/unit/security/timeout/test_scheduler_lifecycle_locks.py` —
concurrent `start()` consolidates on a single task, `stop(timeout=...)`
sets `_stop_failed` on drain timeout, `start()` after `_stop_failed`
raises `RuntimeError`, `stop(timeout=0)` rejects.
- `tests/unit/communication/loop_prevention/test_circuit_breaker.py` —
TOCTOU regression: the OPEN branch acquires `_state_lock` exactly once,
and a cooldown elapsing mid-check still produces a stable result.
- `tests/conformance/persistence/test_json_constraints_sqlite.py` —
SQLite-only `IntegrityError` checks for the new `CHECK json_valid`
constraints (Postgres arm auto-skips since JSONB validates implicitly).
- `tests/conformance/persistence/test_approval_repository.py` — added
`save_many` duplicate-id-within-batch test (settles to last) and
post-condition assertion on the empty-batch no-op.

## Verification

- `uv run ruff check src/ tests/` — clean
- `uv run ruff format src/ tests/` — clean
- `uv run mypy src/ tests/` — clean (3,630 source files)
- `uv run python -m pytest tests/ -m unit -n 8` — 26,460 passed, 16
skipped (platform-only)
- `npm --prefix web run lint` — clean
- `npm --prefix web run type-check` — clean
- `npm --prefix web run test` — 2,994 passed
- `go -C cli vet ./...` — clean
- `go -C cli test ./...` — clean
- `go -C cli build ./...` — clean

## Notes for the reviewer

- Issue #1733 framed the work as 6 PR-sized buckets but explicitly opted
for one squash-merge. Sub-section commits are listed above for
navigation.
- Audit-correction calls (already noted in the issue body): #16 (11 of
13 settings already wired; only 2 bootstrap hooks were truly missing),
#24 (the 9 listed repos already had dual-backend conformance), and the
per-section path corrections.
Aureliolo added a commit that referenced this pull request May 4, 2026
14 valid findings from 18 review agents:

CRITICAL #1: restore issue-spec janitor defaults (5min interval + 24h idle TTL) in synthorg.settings.definitions.communication and lifecycle_helpers fallback constants. CRITICAL #4: fix broken anchor link in docs/api/layer.md by adding an Auth section to docs/api/core.md that documents synthorg.core.auth.{config,models,session,refresh_record,roles}.

HIGH #2: move sub.last_active=now mutation back inside _lock so the publish path matches the _Subscriber dataclass docstring's locking-invariant claim. HIGH #3: wrap EventStreamHub._janitor_loop body in try/except, re-raise CancelledError/MemoryError/RecursionError, log every other failure under new EVENT_STREAM_HUB_JANITOR_FAILED constant and continue. HIGH #8: add test_stop_timeout_marks_unrestartable using monkeypatch on asyncio.wait_for to deterministically exercise the timeout branch.

MEDIUM #5: add 5 new event-stream-hub event-constant rows (STARTED/STOPPED/STOP_TIMEOUT/JANITOR_PRUNED/JANITOR_FAILED) to docs/design/observability.md events table. MEDIUM #6: link new docs/reference/retry-patterns.md from CLAUDE.md reference index. MEDIUM #7: document the two new event-stream janitor settings in observability.md Runtime Settings. MEDIUM #9: add test_subscribe_initialises_last_active. MEDIUM #10: add parametrised test_constructor_rejects_invalid_arguments.

INFO #11/#12/#13/#14: useChannelHandler JSDoc note about useCallback wrapping + inline cleanup-ordering comment; retry-patterns.md maintainer note about updating per-pattern Sites lists; _Subscriber dataclass comment documenting the frozen-by-default deviation.
Aureliolo added a commit that referenced this pull request May 4, 2026
## Summary

Migrates every plain-`Exception` class in `src/synthorg/` to a
`DomainError`-rooted hierarchy and ships
`scripts/check_domain_error_hierarchy.py` -- an AST gate that prevents
new violations. The gate fails closed: any new class with a direct
`Exception` / `RuntimeError` / `LookupError` / `PermissionError` /
`ValueError` / `TypeError` / `KeyError` / `IndexError` /
`AttributeError` / `OSError` / `IOError` base must either reach
`DomainError` via another base or carry a per-line `# lint-allow:
domain-error-hierarchy -- <reason>` opt-out.

The gate landed alongside the migration; the frozen baseline is
**empty**, so `--no-baseline` exit 0 on first commit.

Closes #1738.

## Migrations

**Cited families (Sub-tasks 2A-2D)**:
- `backup/errors.py` -- `BackupError` + 8 subclasses;
`handle_backup_error` removed in favour of `handle_domain_error` reading
ClassVars.
- `config/errors.py` -- `ConfigError` chain (incl.
`ConfigValidationError` -> 422).
- `core/registry/errors.py` -- `StrategyFactoryError` chain.
- `templates/errors.py` -- auto-rooted via `ConfigError`.
- `hr/errors.py` (HRError + ~22 subclasses), `hr/scaling/errors.py`,
`security/trust/errors.py`.
- `settings/errors.py` -- `SettingNotFoundError` -> 404,
`SettingValidationError` -> 422.
- `meta/mcp/errors.py` -- `ArgumentValidationError` (->
`ValidationError`/422), `GuardrailViolationError` (->
`ForbiddenError`/403). The MCP-layer `domain_code` ClassVar
(`"invalid_argument"` / `"guardrail_violated"`) is preserved alongside
the new RFC 9457 metadata.

**5 version-controllers** -- Budget / Company / EvaluationConfig / Role
/ Workflow `VersionController` switch from
`Response(content=ApiResponse[...](error=...), status_code=...)` to
`raise NotFoundError / ValidationError / VersionConflictError`. The
`ApiResponse[T]` envelope shape is preserved on the wire; the
centralised `handle_domain_error` produces it via MRO dispatch.

**Sweep (Sub-task 3)**:
- `core/persistence_errors.py` -- `PersistenceError(Exception)` ->
`PersistenceError(DomainError)`. **Inner `VersionConflictError` renamed
to `PersistenceVersionConflictError`** to dispel the name collision with
`core.domain_errors.VersionConflictError`. Custom handlers
(`handle_record_not_found`, `handle_persistence_integrity_error`,
`handle_duplicate_record`, `handle_persistence_error`) stay registered
so persistence-layer 4xx responses keep their fixed public messages
instead of leaking record identifiers.
- ~16 scattered classes across `a2a/`, `api/auth/`, `api/cursor.py`,
`communication/{event_stream, mcp_errors, meeting/agent_caller}`,
`engine/quality/decomposers/llm.py`, `engine/strategy/principles.py`,
`engine/workflow/{blueprint_errors, service}.py`, `meta/{appliers,
errors}.py`, `telemetry/{privacy, reporters}.py`.
- 7 internal-sentinel classes carry per-line allowlist markers
(`TsaError`, MCP handler-local `_NotFoundError` / `_ConflictError`,
statistical Welch errors, dotted-path validators, internal
CAS/transaction-rollback sentinels) where stdlib bases are deliberate
per the docstrings.

## AST gate

- `scripts/check_domain_error_hierarchy.py` (~625 lines): two-pass AST
resolver -- pass 1 indexes every `class` definition under
`src/synthorg/`, pass 2 computes the closure of `(module, class)` pairs
reaching `DomainError`. Every direct stdlib base outside that closure is
a violation unless suppressed.
- Per-line opt-out: `# lint-allow: domain-error-hierarchy -- <non-empty
reason>` (regex-based to survive ruff-format multi-line class headers).
- Frozen baseline: `scripts/domain_error_hierarchy_baseline.txt`
(header-only on land; `--update-baseline` regenerates).
- Pre-push hook in `.pre-commit-config.yaml` (`stages: [pre-push]`).
- 27 unit tests in
`tests/unit/scripts/test_check_domain_error_hierarchy.py` covering
positive cases (DomainError direct subclass, intermediate, deeper MRO
chain, allowlist marker, baseline suppress, module-import alias),
negative cases (every forbidden stdlib base), baseline format, baseline
drift, `--no-baseline`, `--update-baseline`.

## Tests

**Pre-existing unit suite**: 26597 passed, 16 skipped. The `pytest
tests/ -m unit -n 8` baseline is preserved.

**New tests**:
- `tests/unit/scripts/test_check_domain_error_hierarchy.py` -- 27 tests
for the gate.
-
`tests/unit/api/test_exception_handlers.py::TestMigratedDomainErrorFamilies`
-- 14 parametrized e2e tests covering ClassVar -> HTTP mapping for every
migrated HR / Scaling / Trust / Settings / Config / registry family.
Catches `status_code` / `error_code` / `error_category` typos that the
AST gate cannot.
-
`tests/unit/api/controllers/test_workflow_versions.py::TestRollback::test_rollback_persistence_version_conflict_translates_to_409`
-- pins the `PersistenceVersionConflictError -> VersionConflictError`
translation in the workflow rollback controller (the only late-path
translation in the repo).
-
`tests/unit/api/controllers/test_workflow_versions.py::test_diff_same_version_returns_validation_error`
-- now asserts the full RFC 9457 envelope (error_code, error_category,
retryable), not just the status code.

## Documentation

- `CLAUDE.md` -- "Errors" bullet now lists the full 11-base forbidden
set and references the new gate + opt-out marker.
- `docs/reference/errors.md` -- new "Domain-error-hierarchy gate"
section.
- `docs/reference/conventions.md` Section 6 -- cross-links the new gate.

## Pre-PR review

Ran `/pre-pr-review` with 14 specialized agents. 12 valid findings: 2
CRITICAL + 5 MAJOR + 3 MEDIUM + 2 MINOR/INFO. **All implemented in the
second commit on this branch.**

- **CRITICAL #1** -- `TicketLimitExceededError` no longer leaks
`user_id` via `str(exc)` on the 429 response (the message is logged
server-side; the class `default_message` provides the user-safe
fallback).
- **CRITICAL #2** -- `CLAUDE.md` forbidden-bases list expanded from 5 to
11 entries to match what the gate actually enforces.
- **MAJOR #3** -- removed `# Issue #1738.` line from
`scripts/check_domain_error_hierarchy.py` and the baseline header (issue
refs belong in PR body, not source).
- **MAJOR #4** -- `workflow_rollback_service.py` docstrings updated to
reference the renamed `PersistenceVersionConflictError`.
- **MAJOR #5-7** -- test-coverage strengthening (full RFC 9457 envelope
assertions, e2e family-mapping tests, late-path translation regression
test).
- **MEDIUM #8** -- `ConfigError` + `ConfigValidationError` declare
instance attributes in the class body for mypy + IDE visibility.
- **MEDIUM #9** -- `ConstraintViolationError.status_code = 400` carries
a docstring note explaining why this deliberately diverges from the
2xxx-implies-422 convention (the dedicated
`handle_persistence_integrity_error` hardcodes 400).
- **MINOR/INFO #11-13** -- `WorkflowDefinitionRevisionMismatchError`
docstring notes the inherited 409, `_resolve_base` docstring documents
the deeply-chained-attribute edge case.

Reviewer agents with **zero findings**: conventions-enforcer,
logging-audit, persistence-reviewer, issue-resolution-verifier,
api-contract-drift, infra-reviewer.

## Test plan

- `uv run python -m pytest tests/ -m unit -n 8` -- 26597 passed, 16
skipped.
- `uv run python scripts/check_domain_error_hierarchy.py --no-baseline`
-- exit 0.
- `uv run mypy src/ tests/` -- 0 issues across 3648 files.
- `uv run ruff check src/ tests/ scripts/` -- clean.
- `uv run ruff format src/ tests/ scripts/ --check` -- 3684 files
already formatted.
- `uv run pre-commit run --hook-stage pre-push domain-error-hierarchy
--all-files` -- passes.

## Push note

Pre-push isolation regression gate (`pytest --count 2 -x -n 8` over the
affected suite) reports xdist worker crashes in 8 unrelated test files
(auth/postgres_session_store, controllers/{collaboration,
department_mutations, meetings, provider_health}, auth/{csrf,
lockout_store}, webhook_receipt_cleanup_loop). Manual single-pass green;
isolation-gate doubled-run on Python 3.14 + Windows ProactorEventLoop
crashes the worker, marking 16 of the new gate tests as collateral
damage. Filed #1755 with full details + suggested fixes. Pushed with
`--no-verify` per per-push authorization.

## Issue tracker

- Closes #1738.
- Tick the #1738 box on the master tracking issue #1735 after
squash-merge.
- See #1755 for the pre-push isolation gate flake (out of scope here).
Aureliolo added a commit that referenced this pull request May 9, 2026
- exception_handlers: `_clamp_log_value` accepts `float` (gemini #1)
- subworkflow_service: `SubworkflowHasParentsError.parent_ids` primitive
  projection so parent context surfaces in logs (gemini #2)
- check_no_controller_response_for_domain_errors: matcher catches
  qualified `litestar.Response` via `func_name.endswith('.Response')`
  (coderabbit #3)
- workflow_versions: log rollback context (`definition_id`,
  `target_version`) before raising `VersionConflictError` so the
  call-site context isn't lost when the centralised handler dispatches
  (coderabbit #4)
- errors.test.ts: 422 fixture uses `REQUEST_VALIDATION_ERROR` instead
  of `RATE_LIMITED` (coderabbit #5)
Aureliolo added a commit that referenced this pull request May 9, 2026
- exception_handlers: `_clamp_log_value` accepts `float` (gemini #1)
- subworkflow_service: `SubworkflowHasParentsError.parent_ids` primitive
  projection so parent context surfaces in logs (gemini #2)
- check_no_controller_response_for_domain_errors: matcher catches
  qualified `litestar.Response` via `func_name.endswith('.Response')`
  (coderabbit #3)
- workflow_versions: log rollback context (`definition_id`,
  `target_version`) before raising `VersionConflictError` so the
  call-site context isn't lost when the centralised handler dispatches
  (coderabbit #4)
- errors.test.ts: 422 fixture uses `REQUEST_VALIDATION_ERROR` instead
  of `RATE_LIMITED` (coderabbit #5)
Aureliolo added a commit that referenced this pull request May 10, 2026
Refs #1850.

This PR addresses sub-tasks **D + F + G + H + I** of #1850. Sub-tasks A
+ E shipped in #1843; the remaining five were orphaned. The umbrella
issue stays open until every sub-task is checked off, so this PR uses
`Refs #1850` (NOT `Closes`) to avoid re-triggering the auto-close
pitfall that closed #1778 prematurely.

## Sub-task D — Remove `openapi-typescript`

The 2026-04 audit framing was "migrate to openapi-typescript" but
verified state contradicted that intent: the dep was already installed,
only 2 files imported the bundle, and 31 hand-written type files
coexisted alongside it. Hand-written types win.

- Drop `openapi-typescript@7.13.0` from `web/package.json`
`devDependencies`.
- Drop the `generate-types` / `pretype-check` / `prebuild` npm scripts.
- Migrate `web/src/api/types/enums.ts` (16 enum types) to hand-written
`(typeof FOO_VALUES)[number]` union literals; runtime arrays become the
single source of truth.
- Migrate `web/src/api/endpoints/reports.ts` (3 types) to hand-written
shapes; values cross-checked against the Pydantic source at
`src/synthorg/budget/report_config.py:13` and
`src/synthorg/api/controllers/reports.py:35,47`.
- Regenerate `web/package-lock.json` (19 transitive deps removed).
- Drop the now-unused `web/src/api/types/generated.d.ts` line from
`.gitignore`.

## Sub-task F — `compare_ci` helper + callsite migration

The audit assumed `compare_ci` already existed; it didn't.

- Add `compare_ci(a: str, b: str) -> bool` to
`src/synthorg/core/normalization.py` (Unicode-safe via `casefold()`,
whitespace-stripping, idempotent). Hypothesis property test pins
`compare_ci(a, b) == (normalize_identifier(a) ==
normalize_identifier(b))`.
- Migrate 26 inline `.lower() ==` / `.casefold() ==` / `.lower() !=`
callsites:
- 23 `compare_ci(...)` migrations (single-string equality +
filter-by-name predicates).
- 3 `find_by_name_ci(...)` migrations collapsing private `_find_*`
helpers in `org_mutations.py` and `hr/registry.py:206`.
- 4 bytes-typed ASGI header sites (`csrf.py:179`/`:205`,
`etag.py:219`/`:361`) stay inline with a one-line exemption comment;
`compare_ci` is `str`-only by design.

## Sub-task G — Collapse Winner+Hybrid processors

The two processor classes had near-identical bodies and differed only in
whether they accepted `RejectDecision`.

- Replace `WinnerSelectProcessor` and `HybridDecisionProcessor` with a
single `HumanDecisionProcessor(mode: Literal["winner", "hybrid"])` at
`src/synthorg/communication/conflict_resolution/escalation/processors.py`.
- Public `mode` property exposes the discriminator for factory
introspection without leaking `_mode`.
- Constructor performs runtime `mode` validation (defensive; raises
`ValueError` on invalid values).
- `decided_by` parameter typed as `NotBlankStr` to match the
`DecisionProcessor` Protocol.
- Factory mapping (`_DECISION_PROCESSOR_FACTORIES`) becomes lambda-keyed
to construct the right `mode`.
- `escalation/__init__.py` re-exports the new class; `human_strategy.py`
default switched.
- Tests updated to assert `processor.mode == "winner" / "hybrid"`
(stronger than the prior `isinstance` check); error message regex still
matches the verbatim 422-translation text.

## Sub-task H — `AuthService` class docstring

Replace the 1-line docstring at `src/synthorg/api/auth/service.py:52`
with a structured docstring covering: purpose, **async vs sync
rationale** (Argon2 ops use `asyncio.to_thread`; refresh-token persist
awaits a repo write; everything else is sync), **thread-safety** (frozen
`_config` + stateless `argon2`/`jwt`; safe to share across the
request-handler pool), and **out-of-scope** (token revocation, session
storage, SYSTEM-token minting).

## Sub-task I — Protocols audit (audit-only, no removals)

237 `Protocol` classes inventoried at
`docs/reference/protocols-audit.md`. Each entry recorded with module
path, line, `@runtime_checkable` status, direct-inheritance count, and
test-usage count. Recommendations:

- **193 KEEP** (≥1 explicit subclass, plug-in suffix,
factory-registered, or `testuse ≥ 3`).
- **31 REMOVE** candidates (flag-only — must be re-verified at cleanup
time).
- **13 REVIEW** (`_PrivatePrefixed` typing seams; intent inspection
needed).
- **0 MAKE-RUNTIME-CHECKABLE** in this PR (deferred to per-area cleanup
PRs after surveying `isinstance` sites).

Cleanup itself is intentionally out of scope; follow-up issues will be
filed per area after this PR lands so each can hyperlink to a specific
row in the merged doc.

## Pre-PR review

13 review agents ran (10 main roster + 3 mini-pass). 12 findings
consolidated, 6 implemented in the post-review `refactor: apply pre-PR
review feedback` commit:

- (#1) added `protocols-audit.md` to the CLAUDE.md "Reference (load on
demand)" list.
- (#2) constructor mode validation in `HumanDecisionProcessor`.
- (#3) tightened `decided_by: str` → `NotBlankStr` in the new processor.
- (#4) scrubbed a stale `issue #1687` reference in
`web/src/__tests__/pages/new-surfaces.test.tsx`.
- (#5) replaced f-string `msg` with
`HR_TRAINING_SELECTOR_CONFIG_INVALID` event constant in two source
selectors.
- (#6) serialised `SettingsChangeDispatcher._running` reset through
`_lifecycle_lock` via a scheduled `_reset_running_under_lock` coroutine;
tests wait on the new `_post_done_tasks` set.

2 false positives rejected (PEP 758 `except A, B:` syntax flagged by an
agent unfamiliar with Python 3.12+; security "redundant whitespace
stripping" notes self-classified as Safe).

Issue-resolution-verifier confirmed all 5 sub-tasks RESOLVED at 99-100%
confidence.

## Test plan

- `uv run ruff check src/ tests/` — clean.
- `uv run mypy src/ tests/` — `Success: no issues found in 3698 source
files`.
- `uv run python -m pytest tests/ -m unit` — 23,745 passed (6
pre-existing flakes confirmed on clean main).
- Touched-area: 3,279 unit tests pass under sequential and `-n8
--dist=loadfile` schedulers.
- `npm --prefix web run lint` + `type-check` + `test` — 3,047 web tests
pass.
- `npm --prefix web run build` — production build succeeds (864ms).
- Final greps verify zero remaining `openapi-typescript` /
`WinnerSelectProcessor` / `HybridDecisionProcessor` references and only
the 4 documented bytes-typed sites carry the inline `.lower() ==` /
`.lower() !=` pattern.

## Local cleanup note

After pulling this branch, contributors should delete any stale
`web/src/api/types/generated.d.ts` from their worktree — it is no longer
produced and is no longer gitignored.
Aureliolo added a commit that referenced this pull request May 11, 2026
#1862)

Closes #1855.

## What changes

### Gate quality (sub-task 1)

- Rewrite `scripts/check_mock_spec.py` with AST-based detection of bare
`Mock` / `AsyncMock` / `MagicMock` at typed boundaries only. Catches:
- positional / kwarg argument to a non-Mock-class callable (Pattern
A1/A2),
- `m: T = Mock()` annotated local where `T` is not Mock-typed (Pattern
A3),
- `return Mock()` / `yield Mock()` / `yield from Mock()` from a
typed-return function (Pattern A4).
- Skips: `.return_value` chains, attribute-bag scratch objects, dict /
list / set / tuple-literal values, name-binding indirect substitutions
(which would require cross-module callee resolution).
- Delete `scripts/mock_spec_baseline.txt` (3046 entries). Gate runs
zero-tolerance against the live tree.

### Helper + ladder (sub-task 2)

- Add `tests/_shared/mock_of.py` with `mock_of[T](**overrides)` factory
backed by `create_autospec(T, instance=True, spec_set=True)`. Re-export
from `tests/_shared/__init__.py`.
- Add `tests/_shared/test_mock_of.py` with 8 unit tests (spec
enforcement, override application, unknown-override rejection, Protocol
support, async methods, subscript requirement, isinstance compatibility,
no state leak between calls).
- Document the test-double ladder in `docs/reference/conventions.md`
section 12.1 (FakeClock → `create_autospec` / `mock_of` →
`SimpleNamespace` → bare `MagicMock` forbidden at typed boundary).
Update CLAUDE.md Testing bullet with the pointer.

### Drain (sub-task 3)

- Add `scripts/check_mock_spec_ratchet.py` PreToolUse hook plus
registrations in `.claude/settings.json` and
`.opencode/plugins/synthorg-hooks.ts`. Blocks (a) edits to `tests/*.py`
that raise the gate's CATCH count for the touched file, (b) edits to
`scripts/check_mock_spec.py` that remove `_Verdict.CATCH` branches.
Drives organic drive-by tightening: every edit reduces or holds.
- Convert every surviving Pattern A site across `tests/` to
`spec=ConcreteType` / `mock_of[T]` / `FakeClock` / `SimpleNamespace`.
- Rewrite `tests/unit/scripts/test_check_mock_spec.py` with rule-table
coverage.

## Pre-PR review

14 review agents ran (12 main roster + 2 mini-pass). 9 valid findings
consolidated, 9 implemented in the follow-up `fix: address pre-PR review
feedback` commit. 4 false positives rejected (chiefly PEP 758 `except
OSError, UnicodeDecodeError:` flagged by 3 agents unfamiliar with Python
3.14 syntax; empirically verified to parse and catch correctly).

Implemented:

- **#1 docs-consistency (CRITICAL):** add `check_mock_spec_ratchet.py`
to PreToolUse hooks section in `docs/reference/convention-gates.md`.
- **#2 docs-consistency (CRITICAL):** backfill
`check_error_codes_ts_in_sync.py` and
`check_no_controller_response_for_domain_errors.py` in the gate
inventory; update `<!--RS:convention_gates-->` macro,
`data/runtime_stats.yaml` source, and the CLAUDE.md reference list
count.
- **#3 pr-test-analyzer (CRITICAL):** add
`tests/unit/scripts/test_check_mock_spec_ratchet.py` with 18 unit tests
covering envelope parsing (Bash / unknown tool / no file_path / non-.py
/ out-of-scope / `tests/_shared/` exemption / no-op edit /
old_string-not-in-before), ratchet enforcement (hold / raise / reduce /
Write tool no-op + raise), gate-edit protection (preserve / remove / add
branches), and fail-open path with stderr surfacing.
- **#4 pr-test-analyzer (MAJOR):** add
`test_yield_from_typed_return_caught` locking the `ast.YieldFrom`
branch.
- **#5 pr-test-analyzer (MAJOR):** add `test_set_element_skipped`
locking the `ast.Set` collection-skip branch.
- **#6 silent-failure-hunter (IMPORTANT):** stderr-log the two fail-open
`except Exception:` swallows in `_scan_text` and `_check_test_file` so
transitional failures surface instead of silently allowing edits.
- **#7 tool-parity-checker (MINOR):** list `check_mock_spec_ratchet.py`
in the OpenCode plugin header comment.
- **#8 + #9 comment-analyzer (SUGGESTION):** docstring on `_load_gate()`
plus five AST predicate helpers (`_is_empty_splat`, `_is_literal_none`,
`_has_spec_positional`, `_has_spec_keyword`, `_has_dynamic_splat`).

Also fixed an unrelated mypy strict failure: three test files imported
`VersionRepository` / `FineTuneCheckpointRepository` /
`FineTuneRunRepository` from `synthorg.persistence.protocol`, but those
names are TYPE_CHECKING-only re-exports there; routed through the source
protocol modules (`version_protocol`, `fine_tune_protocol`).

Issue-resolution-verifier confirmed all 5 acceptance criteria RESOLVED
at 92-100 % confidence. The stretch goal said "file kept as a structural
marker" but the branch deletes `scripts/mock_spec_baseline.txt`
entirely; the CLAUDE.md text now explicitly documents "zero-tolerance,
no baseline" so the intent is unambiguous.

## Test plan

- `uv run ruff check src/ tests/ scripts/` — clean.
- `uv run mypy src/ tests/` — `Success: no issues found in 3714 source
files`.
- `uv run python -m pytest tests/ -m unit` — 28,255 passed, 18 skipped
(Windows / logfire-extra environmental skips).
- New coverage: 50 gate + ratchet unit tests pass (32 existing + 2 added
+ 16 new ratchet tests).
- `uv run python scripts/check_convention_gate_inventory.py` — exit 0
(meta-gate parity preserved).

## Local cleanup note

Contributors pulling this branch should delete any stale
`scripts/mock_spec_baseline.txt` from their worktree; the gate now runs
zero-tolerance and the file is no longer produced.
Aureliolo added a commit that referenced this pull request May 13, 2026
Bundle of the valid findings surfaced by the /pre-pr-review pass over
the two existing bug-fix commits (fix(providers): stop sync from
deleting models and fix(dashboard): unblock click-through flows).
4 / 19 valid findings were rejected as agent error (PEP 758 except
syntax confused with Python 2, healthcheck-filter mistakenly claimed
to drop spans / metrics, module-level pytestmark already applied,
_try_stop event parameter is not dead).

Fixes landed:

- **#1 backend-parity**: SQLite mcp_installation_repo now translates
  IntegrityError to ConstraintViolationError (matches Postgres);
  added _classify_sqlite_constraint helper that maps the message
  text to a stable token (SQLite has no named constraints).
  Locked under conformance test
  test_save_raises_constraint_violation_on_unknown_connection that
  exercises both backends through the shared backend fixture.
- **#2 contract-drift**: web/src/api/types/agents.ts AgentConfig
  gains tier, model_requirement, personality_preset, and
  strategic_output_mode (last two were pre-existing drift; the
  first two land in this PR).
- **#4 docstring drift**: SyncModelsConfirmDialog docstring now
  matches the runtime default (replace_existing=false).
- **#5 catastrophising language**: trimmed the
  _SENSITIVE_PROVIDER_FIELDS comment to the factual operator-
  readability rationale.
- **#7 tier type narrowing**: AgentConfig.tier moved from str to
  Literal so typos fail at config load.
- **#8 missing state-transition log**: ConnectionCatalog.update_health
  now emits HEALTH_STATUS_TRANSITIONED at INFO when the status
  actually changes (gated so a quiet prober does not flood logs).
- **#9 docstring vs inline comment**: ConnectionCatalog
  .rebind_repository docstring focuses on the contract; the
  concurrency reasoning moved to an inline comment next to
  async with self._cache_lock.
- **#10 fail-fast guard**: added
  _assert_sensitive_fields_complete() at module-import time so a
  future credential-bearing field on ProviderConfig (anything
  matching *_key / *_token / *_secret / *password*) that is not
  listed in _SENSITIVE_PROVIDER_FIELDS crashes import instead of
  leaking on the next audit row.
- **#11 trimmed comment**: app.py default-construct block down to
  the load-bearing why.
- **#12 redundant non-null assertions**: build-org-tree.ts findCeo
  now uses destructuring + truthy check instead of [0]!.
- **#13 sanitizeForLog consistency**: budget.ts error args now
  pass through sanitizeForLog explicitly to match the rest of the
  store layer; the logger already auto-sanitises but the explicit
  wrap is project convention.
- **#15 trimmed Raises section**: Postgres save() docstring drops
  the HTTP-status implementation detail.
- **#18 useCallback for delete-model handler**: extracted the
  inline arrow at ConfirmDialog onOpenChange so the reference is
  stable across renders.

Tests added (covering #14):

- TestAppStateSimulationRuntime: 5 tests pinning the four branches
  of has_simulation_runtime + the always-true contract of
  has_client_simulation_state.
- TestRebindRepository: 4 tests for repo swap, cache invalidation,
  subsequent-read backend selection, and double-rebind safety.
- ollama-cloud routing: confirms /api/tags is used (not /models).
- TestDiffProviderUpdate: 3 tests for changed-only output,
  redaction sentinels, and an additional cross-check between
  ProviderConfig fields and the SENSITIVE list.
- test_destructive_sync_guard: 4 tests for the
  _reject_destructive_empty_discovery branches.
- AgentConfig tier / model_requirement round-trip + tier rejection
  for invalid values.
- conformance test for ConstraintViolationError parity across
  Postgres and SQLite (referenced under #1 above).

Pre-reviewed by 21 agents (16 standard + 5 mini-pass) through
/pre-pr-review; 19 valid findings raised, 15 fixed in code, 4
rejected as agent error (documented in _audit/pre-pr-review/
triage.md).

uv run python -m pytest tests/ -n 8: 29957 passed, 49 skipped.
uv run mypy src/ tests/: clean.
uv run ruff check src/ tests/: clean.
npm --prefix web run lint / type-check / test: clean (3075 tests).
Aureliolo added a commit that referenced this pull request May 13, 2026
Bundle of the valid findings surfaced by the /pre-pr-review pass over
the two existing bug-fix commits (fix(providers): stop sync from
deleting models and fix(dashboard): unblock click-through flows).
4 / 19 valid findings were rejected as agent error (PEP 758 except
syntax confused with Python 2, healthcheck-filter mistakenly claimed
to drop spans / metrics, module-level pytestmark already applied,
_try_stop event parameter is not dead).

Fixes landed:

- **#1 backend-parity**: SQLite mcp_installation_repo now translates
  IntegrityError to ConstraintViolationError (matches Postgres);
  added _classify_sqlite_constraint helper that maps the message
  text to a stable token (SQLite has no named constraints).
  Locked under conformance test
  test_save_raises_constraint_violation_on_unknown_connection that
  exercises both backends through the shared backend fixture.
- **#2 contract-drift**: web/src/api/types/agents.ts AgentConfig
  gains tier, model_requirement, personality_preset, and
  strategic_output_mode (last two were pre-existing drift; the
  first two land in this PR).
- **#4 docstring drift**: SyncModelsConfirmDialog docstring now
  matches the runtime default (replace_existing=false).
- **#5 catastrophising language**: trimmed the
  _SENSITIVE_PROVIDER_FIELDS comment to the factual operator-
  readability rationale.
- **#7 tier type narrowing**: AgentConfig.tier moved from str to
  Literal so typos fail at config load.
- **#8 missing state-transition log**: ConnectionCatalog.update_health
  now emits HEALTH_STATUS_TRANSITIONED at INFO when the status
  actually changes (gated so a quiet prober does not flood logs).
- **#9 docstring vs inline comment**: ConnectionCatalog
  .rebind_repository docstring focuses on the contract; the
  concurrency reasoning moved to an inline comment next to
  async with self._cache_lock.
- **#10 fail-fast guard**: added
  _assert_sensitive_fields_complete() at module-import time so a
  future credential-bearing field on ProviderConfig (anything
  matching *_key / *_token / *_secret / *password*) that is not
  listed in _SENSITIVE_PROVIDER_FIELDS crashes import instead of
  leaking on the next audit row.
- **#11 trimmed comment**: app.py default-construct block down to
  the load-bearing why.
- **#12 redundant non-null assertions**: build-org-tree.ts findCeo
  now uses destructuring + truthy check instead of [0]!.
- **#13 sanitizeForLog consistency**: budget.ts error args now
  pass through sanitizeForLog explicitly to match the rest of the
  store layer; the logger already auto-sanitises but the explicit
  wrap is project convention.
- **#15 trimmed Raises section**: Postgres save() docstring drops
  the HTTP-status implementation detail.
- **#18 useCallback for delete-model handler**: extracted the
  inline arrow at ConfirmDialog onOpenChange so the reference is
  stable across renders.

Tests added (covering #14):

- TestAppStateSimulationRuntime: 5 tests pinning the four branches
  of has_simulation_runtime + the always-true contract of
  has_client_simulation_state.
- TestRebindRepository: 4 tests for repo swap, cache invalidation,
  subsequent-read backend selection, and double-rebind safety.
- ollama-cloud routing: confirms /api/tags is used (not /models).
- TestDiffProviderUpdate: 3 tests for changed-only output,
  redaction sentinels, and an additional cross-check between
  ProviderConfig fields and the SENSITIVE list.
- test_destructive_sync_guard: 4 tests for the
  _reject_destructive_empty_discovery branches.
- AgentConfig tier / model_requirement round-trip + tier rejection
  for invalid values.
- conformance test for ConstraintViolationError parity across
  Postgres and SQLite (referenced under #1 above).

Pre-reviewed by 21 agents (16 standard + 5 mini-pass) through
/pre-pr-review; 19 valid findings raised, 15 fixed in code, 4
rejected as agent error (documented in _audit/pre-pr-review/
triage.md).

uv run python -m pytest tests/ -n 8: 29957 passed, 49 skipped.
uv run mypy src/ tests/: clean.
uv run ruff check src/ tests/: clean.
npm --prefix web run lint / type-check / test: clean (3075 tests).
Aureliolo added a commit that referenced this pull request May 16, 2026
…1 reviewer-config)

- Forward cancellation token in _mine_negatives_from_pairs so the
  encoding phase of hard-negative mining is interruptible
  (CodeRabbit + Gemini, fine_tune.py:462).
- Restructure embedding-evaluation.md pipeline stages: split the old
  Stage 3 evaluation bullet into a dedicated Stage 4, renumber Deploy
  to Stage 5 so the doc matches the fine_tune.py module header
  (5-stage pipeline) and removes stage-ownership ambiguity (CodeRabbit).
- Add .gemini/styleguide.md + .gemini/config.yaml so Gemini Code Assist
  stops flagging PEP 649 TYPE_CHECKING annotations as NameError risks
  on every PR, and reads project conventions from CLAUDE.md.

Skipped (factually wrong against current code):
- Gemini #1/#2/#3 (NameError on CancellationToken / EvalMetrics in
  TYPE_CHECKING block): Python 3.14 PEP 649 evaluates annotations
  lazily; existing unquoted CancellationToken | None at fine_tune.py:568
  has been in CI without issue. CLAUDE.md mandates this convention.
- CodeRabbit RS-marker findings on docs/design/memory.md and
  docs/reference/embedding-evaluation.md: out of scope for
  scripts/check_doc_numeric_macros.py, which scopes to README + 4 docs
  (neither file present) and matches digits only adjacent to specific
  stat nouns (tests/providers/agents/stars/releases).
Aureliolo added a commit that referenced this pull request May 16, 2026
…1 reviewer-config)

- Forward cancellation token in _mine_negatives_from_pairs so the
  encoding phase of hard-negative mining is interruptible
  (CodeRabbit + Gemini, fine_tune.py:462).
- Restructure embedding-evaluation.md pipeline stages: split the old
  Stage 3 evaluation bullet into a dedicated Stage 4, renumber Deploy
  to Stage 5 so the doc matches the fine_tune.py module header
  (5-stage pipeline) and removes stage-ownership ambiguity (CodeRabbit).
- Add .gemini/styleguide.md + .gemini/config.yaml so Gemini Code Assist
  stops flagging PEP 649 TYPE_CHECKING annotations as NameError risks
  on every PR, and reads project conventions from CLAUDE.md.

Skipped (factually wrong against current code):
- Gemini #1/#2/#3 (NameError on CancellationToken / EvalMetrics in
  TYPE_CHECKING block): Python 3.14 PEP 649 evaluates annotations
  lazily; existing unquoted CancellationToken | None at fine_tune.py:568
  has been in CI without issue. CLAUDE.md mandates this convention.
- CodeRabbit RS-marker findings on docs/design/memory.md and
  docs/reference/embedding-evaluation.md: out of scope for
  scripts/check_doc_numeric_macros.py, which scopes to README + 4 docs
  (neither file present) and matches digits only adjacent to specific
  stat nouns (tests/providers/agents/stars/releases).
Aureliolo added a commit that referenced this pull request May 18, 2026
…derabbit)

CodeRabbit #2: the AutonomyChangeStrategy verdict was computed but never threaded into the apply path, so non-HUMAN_ONLY strategies were inert. AutonomyUpdate gains granted_by_strategy; a granting strategy now produces an auto-decided (APPROVED, decided_by=strategy:<name>) approval item and the registry applies the level change immediately. HUMAN_ONLY (default) unchanged -- still pends. docs/design/security.md amended (user-approved deviation from queue-is-sole-apply-driver). Adds registry granted-path test.
Aureliolo added a commit that referenced this pull request May 18, 2026
…derabbit)

CodeRabbit #2: the AutonomyChangeStrategy verdict was computed but never threaded into the apply path, so non-HUMAN_ONLY strategies were inert. AutonomyUpdate gains granted_by_strategy; a granting strategy now produces an auto-decided (APPROVED, decided_by=strategy:<name>) approval item and the registry applies the level change immediately. HUMAN_ONLY (default) unchanged -- still pends. docs/design/security.md amended (user-approved deviation from queue-is-sole-apply-driver). Adds registry granted-path test.
Aureliolo pushed a commit that referenced this pull request May 20, 2026
Round 2 covers CodeRabbit's re-review of round 1's commit.

Comment / docstring hygiene (CR-#1, #2, #5):
- app.py: rewrite the SQLite+persistent-ApprovalStore hard-block
  comment without migration framing; describe the invariant the
  guard enforces, not the migration's history.
- approval_store.py: trim ``has_persistent_repo`` docstring to
  current-state behaviour only.
- chief_of_staff/config.py: replace real vendor names
  (OpenAI/Claude/GPT-4o/LiteLLM) in propose-* constant comments
  with project-approved generic wording.

App startup ordering (CR-#8 duplicate):
- app.py: move conversational_proposal_repo wiring above the
  ``provider_registry is None: return`` early-return so the repo
  attaches even on boots without an LLM provider (preserves the
  approve/reject path for already-parked conversational approvals,
  which was the stated rationale for the round-1 decoupling).

Conversational-intake resume (CR-#3 + CR-#4):
- _approval_review_gate.py: split try_conversational_intake_resume
  (187 lines, well over the <50 line budget) into orchestrator +
  three branch-specific helpers:
    * _load_conversational_proposal: source check + repo + lookup.
    * _reject_conversational_proposal: PENDING -> REJECTED CAS.
    * _execute_conversational_proposal: PENDING -> EXECUTING ->
      EXECUTED CAS with revert-on-failure.
  The orchestrator is now 43 lines and reads as a flow table.
- _approval_review_gate.py: ``_reread_approval_item`` returning
  ``None`` for a conversational approval now raises
  ServiceUnavailableError. Returning False would route the call
  through to non-conversational flows and silently strand the
  decided approval. Docstring updated to spell out the
  flow-specific semantics.

Cross-tenant privacy (CR-#7, Critical):
- propose.py: ``_resolve_conversation`` now rejects conversation-id
  reuse across users. A caller who learns another conversation_id
  could otherwise append turns and have prior history fed back to
  the model -- a cross-tenant break. Mapped to NotFound so the
  endpoint cannot be used to probe existence either.

TOCTOU on sequence allocation (CR-#6, Heavy lift):
- conversation_repo.py (SQLite + Postgres): bounded
  retry-on-IntegrityError in ``append`` when the
  (conversation_id, sequence) UNIQUE constraint fires. The repo
  re-queries the live max sequence, model_copies the turn, and
  retries up to 3 times. Other constraint violations (FK miss,
  CHECK on content/role) still translate directly to
  ConstraintViolationError without retry.

Tests:
- test_propose.py: new test_other_user_resume_blocked.
- test_conversational_intake_resume.py: new
  test_unreadable_approval_item_raises (covers the new
  raise-on-None path).
- test_conversation_repository.py: renamed
  test_append_duplicate_sequence_rejected ->
  test_append_duplicate_sequence_resequenced; asserts the new
  race-safe semantics (second append lands at sequence + 1).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants