Add design specification, license, and project setup#2
Conversation
Comprehensive design spec covering: - Agent identity system with roles, personalities, skills, seniority - Company structure templates (solo founder to full enterprise) - Communication architecture (event-driven bus, hierarchical, meetings, hybrid) - Task/workflow engine with multiple execution patterns - Memory system (working, episodic, semantic, procedural, social) - Multi-provider LLM layer via LiteLLM (Claude, OpenRouter, Ollama) - Budget management with CFO agent and cost optimization - Tool/capability system with progressive trust - Security operations agent and configurable autonomy levels - API-first architecture (FastAPI + WebSocket) with web UI - Template system and company builder - Research analysis of MetaGPT, ChatDev, CrewAI, AutoGen, LangGraph - Build-from-scratch recommendation leveraging LiteLLM, Mem0, MCP, A2A - Open questions, risks, backlog, and scaling path
Business Source License 1.1 with parameters: - Non-commercial use only (personal, education, research, evaluation) - Commercial/production use requires separate license - Converts to Apache 2.0 after 4 years per version
- Fix LICENSE Change Date to explicit "February 27, 2030" - Extract HR into its own section (Section 8), renumber all sections - Reconcile budget hierarchy to include QA department - Correct A2A/MCP governance attribution to Linux Foundation - Keep SecOps agent active in locked autonomy mode for audit logging - Consolidate terminology on "Agent" (remove "Employee" as separate concept) - Add communication loop prevention mechanism (Section 5.5) - Mark model pricing and provider configs as illustrative examples - Soften specific technology commitments to "candidates - TBD after research" - Add language spec to workflow diagram code blocks - Add aurelio-review-pr skill for Claude Code
|
Caution Review failedThe pull request is closed. ℹ️ Recent review infoConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (2)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds foundational documentation and a PR review workflow: a 10-phase PR review pipeline spec, a comprehensive design specification for multi-agent orchestration, a Business Source License 1.1, and an expanded README. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request establishes the foundational elements for the "AI Company" project by introducing its core design principles, licensing, and initial development tooling. It provides a detailed high-level design specification outlining the architecture and components of the AI agent orchestration framework, defines the project's open-source license, updates the main project README with key information, and adds a sophisticated PR review automation skill. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces foundational documents for the project, including a detailed design specification for an AI Company framework, a BSL 1.1 license, and an updated project README. While I've noted minor inconsistencies within the design specification and the aurelio-review-pr skill definition, a critical concern is that the aurelio-review-pr skill contains instructions that could lead to security vulnerabilities. Specifically, it lacks explicit validation for untrusted input extracted from PR bodies, potentially leading to command injection, and does not include safeguards against prompt injection when passing GitHub issue context to review agents. Addressing these inconsistencies and, more importantly, refining the skill definition to mitigate these security risks is crucial for clarity and project security.
| **Fetch issue context.** If an issue reference was found (regardless of warnings), fetch the issue for review context. If the PR body used a full URL (`https://github.com/OWNER/REPO/issues/N`), extract both `OWNER/REPO` and `N` and pass `--repo OWNER/REPO` to query the correct repository: | ||
|
|
||
| ```bash | ||
| gh issue view N --repo OWNER/REPO --json title,body,labels,comments --jq '{title: .title, body: .body, labels: [.labels[].name], comments: [.comments[] | {author: .author.login, body: .body}]}' |
There was a problem hiding this comment.
The skill instructions direct the agent to extract OWNER/REPO and N from a URL provided in the PR body and use them in a shell command (gh issue view N --repo OWNER/REPO). Since the PR body is untrusted input, an attacker could provide a malicious URL (e.g., https://github.com/owner/repo;injection/issues/1) to perform command injection. The instructions should explicitly require the agent to validate that OWNER/REPO matches the expected repository format and that N is a numeric value before using them in any shell command.
| | **comment-analyzer** | Comments or docstrings changed | `pr-review-toolkit:comment-analyzer` | | ||
| | **type-design-analyzer** | Type annotations or classes added/modified | `pr-review-toolkit:type-design-analyzer` | | ||
|
|
||
| Each agent should receive the list of changed files and focus on reviewing them. **If issue context was collected in Phase 2, include the issue title, body, and key comments in each agent's prompt** so they can verify the PR addresses the issue's requirements. |
There was a problem hiding this comment.
The skill instructions specify that untrusted data from GitHub (issue title, body, and comments) should be included in the prompts for review agents. This data is not sanitized or wrapped in delimiters, making the agents vulnerable to prompt injection attacks. An attacker could use malicious issue comments to manipulate the behavior of the review agents. The instructions should be updated to require the use of clear delimiters (e.g., XML tags) and to explicitly instruct the sub-agents to treat this content as untrusted data.
| ```bash | ||
| gh api repos/OWNER/REPO/pulls/NUMBER/comments --paginate | ||
| ``` | ||
| Extract: author, file path, line number, body, subject_type. |
There was a problem hiding this comment.
In the list of fields to extract for inline review comments, subject_type is mentioned. However, according to the GitHub API documentation, this field is not present in the response for listing PR review comments (/repos/OWNER/REPO/pulls/NUMBER/comments). This field is associated with commit comments. To ensure the implementation is based on the correct API schema, I suggest removing subject_type from this list.
| ```yaml | ||
| memory: | ||
| level: "full" # none, session, project, full | ||
| backend: "sqlite" # sqlite, postgresql, mem0, file |
There was a problem hiding this comment.
There appears to be an inconsistency in how Mem0 is defined as part of the memory architecture.
- Here in Section 7.3,
mem0is listed as a value for thebackendconfiguration, suggesting it's a standalone storage option. - However, Section 15.2 describes the stack as
Mem0 + SQLiteand statesMem0is for semantic/episodic memory whileSQLiteis for structured data. This impliesMem0is a layer that works with a database, not as an alternative to it.
To avoid confusion during implementation, it would be helpful to clarify the relationship. Is Mem0 a memory system that can be configured with different storage backends (like SQLite), or is it a backend itself? The configuration schema should reflect this relationship clearly.
| - role_level: "c_suite" | ||
| preferred_model: "opus" | ||
| fallback: "sonnet" | ||
| - role_level: "senior" | ||
| preferred_model: "sonnet" | ||
| fallback: "haiku" | ||
| - role_level: "junior" | ||
| preferred_model: "haiku" | ||
| fallback: "local-small" |
There was a problem hiding this comment.
There's an inconsistency in the casing of agent seniority levels across the design document.
- Section 3.1 (
level) and 3.2 (Level) use capitalized values (e.g.,Senior,C-Suite). - The routing rules here in Section 9.4 use lowercase values (e.g.,
c_suite,senior,junior).
This could lead to bugs where routing rules don't match agent levels. It would be beneficial to standardize on a single casing convention (e.g., all lowercase) for these values throughout the specification.
There was a problem hiding this comment.
Pull request overview
This PR establishes the foundational documentation and legal framework for the AI Company project, a framework for orchestrating autonomous AI agents within a virtual company structure. The PR transitions the repository from a minimal placeholder to a fully specified design-phase project with clear licensing, technical vision, and development tooling.
Changes:
- Added comprehensive 18-section design specification covering agent system, company structure, communication, tasks, memory, HR, providers, budget, tools, security, human interaction, templates, and technical architecture
- Implemented Business Source License 1.1 with 4-year conversion to Apache 2.0
- Updated README with project concept, key features, tech stack, and documentation links
- Added Claude Code PR review skill for automated review workflow
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| DESIGN_SPEC.md | Comprehensive high-level design covering all major system components, with technology choices marked as candidates pending research |
| LICENSE | Business Source License 1.1 with non-production use grant, converting to Apache 2.0 on February 27, 2030 |
| README.md | Project overview with concept description, planned features, tech stack, and documentation links |
| .claude/skills/aurelio-review-pr/skill.md | PR review pipeline skill with 10-phase workflow for local agents, external feedback, triage, and fixes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/skills/aurelio-review-pr/skill.md:
- Around line 32-34: Several fenced code blocks (e.g., the block containing "gh
repo view --json nameWithOwner -q .nameWithOwner") lack surrounding blank lines
and nearby sentences lack terminal punctuation; update each affected fenced
block by inserting a blank line before and after the triple-backtick fence and
ensure the sentence immediately preceding or following each fence ends with
proper punctuation (period, question mark, or exclamation). Apply the same fixes
to the other similar fences called out (the blocks around the other listed
snippets) so all fenced code blocks have blank lines around them and all nearby
sentences end with punctuation for MD lint stability.
In `@DESIGN_SPEC.md`:
- Around line 76-97: The fenced block showing the company org tree (the
triple-backtick block starting with "Company ├── Departments[] ...") and
multiple other fenced blocks lack a language identifier and several headings
(e.g., "#### C-Suite / Executive") are missing surrounding blank lines; to fix,
add an appropriate fence language token (e.g., ```text, ```yaml, ```json, or
```bash) to every fenced code block such as the "Company" tree and the other
blocks listed, and ensure each heading has a blank line above and below it (for
example insert a blank line before "#### C-Suite / Executive" and a blank line
after the heading), applying this pattern across the noted ranges (lines with
MD040/MD022 warnings) so markdownlint MD040 and MD022 are resolved.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
.claude/skills/aurelio-review-pr/skill.mdDESIGN_SPEC.mdLICENSEREADME.md
📜 Review details
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to README.md : Update README.md for significant feature changes
Applied to files:
README.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Never defer work—do not suggest "this can be done later" or "consider for a future PR". Complete all requested changes fully.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Never defer work. Do not suggest 'this can be done later' or 'consider for a future PR'. Complete all requested changes fully.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: After every push, you MUST check that CI passes. If CI fails, fix the issue immediately and push again until all checks are green. Never walk away from a failing CI pipeline.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Always create a PR for issue work. When implementing changes for a GitHub issue, create a branch and open a pull request. Do not wait to be asked.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
🪛 LanguageTool
DESIGN_SPEC.md
[typographical] ~1-~1: To join two clauses or introduce examples, consider using an em dash.
Context: # AI Company - High-Level Design Specification > A fra...
(DASH_RULE)
[typographical] ~193-~193: To join two clauses or introduce examples, consider using an em dash.
Context: ...### Engineering - Software Architect - System design, technology decisions, pat...
(DASH_RULE)
[typographical] ~194-~194: To join two clauses or introduce examples, consider using an em dash.
Context: ...Frontend Developer** (Junior/Mid/Senior) - UI implementation, components, state man...
(DASH_RULE)
[typographical] ~195-~195: To join two clauses or introduce examples, consider using an em dash.
Context: ...Backend Developer* (Junior/Mid/Senior) - APIs, business logic, databases - **Full...
(DASH_RULE)
[typographical] ~196-~196: To join two clauses or introduce examples, consider using an em dash.
Context: ...ll-Stack Developer** (Junior/Mid/Senior) - End-to-end implementation - **DevOps/SRE...
(DASH_RULE)
[typographical] ~197-~197: To join two clauses or introduce examples, consider using an em dash.
Context: ...implementation - DevOps/SRE Engineer - Infrastructure, CI/CD, monitoring, deplo...
(DASH_RULE)
[typographical] ~198-~198: To join two clauses or introduce examples, consider using an em dash.
Context: ...ring, deployment - Database Engineer - Schema design, query optimization, migra...
(DASH_RULE)
[typographical] ~199-~199: To join two clauses or introduce examples, consider using an em dash.
Context: ...tion, migrations - Security Engineer - Security audits, vulnerability assessmen...
(DASH_RULE)
[style] ~383-~383: Consider using a different adjective to strengthen your wording.
Context: ...#### Pattern 4: Hybrid (Recommended for Full Company) Combines all three: - **Messa...
(FULL_ENTIRE)
[style] ~654-~654: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...onality would complement the team? - What model/provider fits the budget? 3. Cand...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~664-~664: You have used the passive voice repeatedly in nearby sentences. To make your writing clearer and easier to read, consider using active voice.
Context: ... 3. Active tasks are reassigned 4. Team is notified ### 8.3 Performance Tracking ```yaml ...
(REP_PASSIVE_VOICE)
[grammar] ~682-~682: Use a hyphen to join words.
Context: ...ce: - Promotion criteria: sustained high quality scores, task complexity handled,...
(QB_NEW_EN_HYPHEN)
[typographical] ~1377-~1377: In American English, use a period after an abbreviation.
Context: ...pid prototyping | ### 16.2 What Exists vs What We Need | Feature | MetaGPT | Cha...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1381-~1381: To join two clauses or introduce examples, consider using an em dash.
Context: ...ulation | Partial | Partial | No | Yes - complete | | HR (hiring/firing) | No |...
(DASH_RULE)
[typographical] ~1385-~1385: To join two clauses or introduce examples, consider using an em dash.
Context: ...alities | Basic | Basic | Basic | Deep - traits, styles, evolution | | Dynamic ...
(DASH_RULE)
[typographical] ~1386-~1386: To join two clauses or introduce examples, consider using an em dash.
Context: ... team scaling | No | No | Manual | Yes - auto + manual | | Multiple company typ...
(DASH_RULE)
[typographical] ~1387-~1387: To join two clauses or introduce examples, consider using an em dash.
Context: ...company types | No | No | Manual | Yes - templates + builder | | Security ops a...
(DASH_RULE)
[typographical] ~1389-~1389: To join two clauses or introduce examples, consider using an em dash.
Context: ...ble autonomy | No | No | Limited | Yes - full spectrum | | Local + cloud provid...
(DASH_RULE)
[typographical] ~1390-~1390: To join two clauses or introduce examples, consider using an em dash.
Context: ...rs | Partial | Partial | Partial | **Yes - unified abstraction (LiteLLM candidate)*...
(DASH_RULE)
[typographical] ~1391-~1391: To join two clauses or introduce examples, consider using an em dash.
Context: ...racking per agent | No | No | No | Yes - full budget system | | Progressive tru...
(DASH_RULE)
[typographical] ~1398-~1398: In American English, use a period after an abbreviation.
Context: ...Planned (backlog) | ### 16.3 Build vs Fork Decision **Recommendation: Build ...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[style] ~1432-~1432: Consider using the typographical ellipsis character here instead.
Context: ... | Agent A asks Agent B who asks Agent A... | | 8 | Optimal message bus for local-f...
(ELLIPSIS)
[typographical] ~1433-~1433: In American English, use a period after an abbreviation.
Context: ...architecture? | Medium | asyncio queues vs Redis vs embedded broker | | 9 | How to...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1433-~1433: In American English, use a period after an abbreviation.
Context: ...ure? | Medium | asyncio queues vs Redis vs embedded broker | | 9 | How to handle c...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1434-~1434: In American English, use a period after an abbreviation.
Context: ...y? | High | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the ...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1434-~1434: In American English, use a period after an abbreviation.
Context: ...h | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the minimum ...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[style] ~1509-~1509: To make your text as clear as possible to all readers, do not use this foreign term. Possible alternatives are ‘in fact’ or ‘in reality’.
Context: ...rop | | OpenAI API format | OpenAI (de facto standard) | LLM API interface | Via pro...
(DE_FACTO)
[style] ~1509-~1509: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “API”.
Context: ...at** | OpenAI (de facto standard) | LLM API interface | Via provider abstraction layer (LiteL...
(ACRONYM_TAUTOLOGY)
[typographical] ~1513-~1513: To join two clauses or introduce examples, consider using an em dash.
Context: ...s://github.com/FoundationAgents/MetaGPT) - Multi-agent SOP framework (64.5k stars) ...
(DASH_RULE)
[typographical] ~1514-~1514: To join two clauses or introduce examples, consider using an em dash.
Context: ...2.0](https://github.com/openbmb/ChatDev) - Zero-code multi-agent platform (31.2k st...
(DASH_RULE)
[typographical] ~1515-~1515: To join two clauses or introduce examples, consider using an em dash.
Context: ...AI](https://github.com/crewAIInc/crewAI) - Role-based agent collaboration framework...
(DASH_RULE)
[typographical] ~1516-~1516: To join two clauses or introduce examples, consider using an em dash.
Context: ...n](https://github.com/microsoft/autogen) - Microsoft async multi-agent framework - ...
(DASH_RULE)
[typographical] ~1517-~1517: To join two clauses or introduce examples, consider using an em dash.
Context: ...LLM](https://github.com/BerriAI/litellm) - Unified LLM API gateway (100+ providers)...
(DASH_RULE)
[typographical] ~1518-~1518: To join two clauses or introduce examples, consider using an em dash.
Context: ...- Mem0 - Universal memory layer for AI agents - [...
(DASH_RULE)
[typographical] ~1519-~1519: To join two clauses or introduce examples, consider using an em dash.
Context: ...ocol](https://github.com/a2aproject/A2A) - Agent-to-Agent protocol (Linux Foundatio...
(DASH_RULE)
[typographical] ~1520-~1520: To join two clauses or introduce examples, consider using an em dash.
Context: ...extprotocol.io/specification/2025-11-25) - Model Context Protocol - [Langfuse Agent...
(DASH_RULE)
[typographical] ~1521-~1521: To join two clauses or introduce examples, consider using an em dash.
Context: ...com/blog/2025-03-19-ai-agent-comparison) - Framework comparison - [Confluent Event-...
(DASH_RULE)
[typographical] ~1522-~1522: To join two clauses or introduce examples, consider using an em dash.
Context: .../blog/event-driven-multi-agent-systems/) - Multi-agent architecture patterns - [Mic...
(DASH_RULE)
[typographical] ~1523-~1523: To join two clauses or introduce examples, consider using an em dash.
Context: ....io/multi-agent-reference-architecture/) - Enterprise patterns - [OpenRouter](https...
(DASH_RULE)
[typographical] ~1524-~1524: To join two clauses or introduce examples, consider using an em dash.
Context: ...s - OpenRouter - Multi-model API gateway
(DASH_RULE)
README.md
[typographical] ~11-~11: To join two clauses or introduce examples, consider using an em dash.
Context: ...s (Planned) - Any Company Structure - From a 2-person startup to a 50+ enterpr...
(DASH_RULE)
[typographical] ~12-~12: To join two clauses or introduce examples, consider using an em dash.
Context: ...nfig/templates - Deep Agent Identity - Names, personalities, skills, seniority ...
(DASH_RULE)
[typographical] ~13-~13: To join two clauses or introduce examples, consider using an em dash.
Context: ...erformance tracking - Multi-Provider - Anthropic Claude, OpenRouter (400+ model...
(DASH_RULE)
[typographical] ~14-~14: To join two clauses or introduce examples, consider using an em dash.
Context: ... via LiteLLM - Smart Cost Management - Per-agent budget tracking, auto model ro...
(DASH_RULE)
[typographical] ~15-~15: To join two clauses or introduce examples, consider using an em dash.
Context: ...optimization - Configurable Autonomy - From fully autonomous to human-approves-...
(DASH_RULE)
[typographical] ~16-~16: To join two clauses or introduce examples, consider using an em dash.
Context: ...agent in between - Persistent Memory - Agents remember past decisions, code, re...
(DASH_RULE)
[typographical] ~17-~17: To join two clauses or introduce examples, consider using an em dash.
Context: ...relationships (via Mem0) - HR System - Hire, fire, promote agents. HR agent ana...
(DASH_RULE)
[typographical] ~18-~18: To join two clauses or introduce examples, consider using an em dash.
Context: ...tem, git, code execution, web, databases - role-based and sandboxed - API-First...
(DASH_RULE)
[grammar] ~20-~20: Please add a punctuation mark at the end of paragraph.
Context: ...built company templates and interactive builder ## Status Design phase. See [DESI...
(PUNCTUATION_PARAGRAPH_END)
[typographical] ~37-~37: To join two clauses or introduce examples, consider using an em dash.
Context: ...- Design Specification - Full high-level design
(DASH_RULE)
.claude/skills/aurelio-review-pr/skill.md
[style] ~107-~107: The phrase ‘Look for patterns’ is used very frequently. Consider using a less frequent alternative to set your writing apart from others.
Context: ... lines are outside the PR's diff range. Look for patterns like "Outside diff range comments (N)" ...
(LOOK_FOR_STYLE)
[style] ~121-~121: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...tionable items, not just summaries). Important: Use gh api with --jq for filteri...
(EN_REPEATEDWORDS_IMPORTANT)
[style] ~123-~123: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...omplex Python scripts to parse JSON. Important: When review bodies are large (e.g. C...
(EN_REPEATEDWORDS_IMPORTANT)
[grammar] ~139-~139: Please add a punctuation mark at the end of paragraph.
Context: ...heck against CLAUDE.md rules and actual code Deduplication: If multiple sources...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~172-~172: Please add a punctuation mark at the end of paragraph.
Context: ... rely on pre-push hooks and CI for full coverage ## Phase 8: Commit and push After all...
(PUNCTUATION_PARAGRAPH_END)
[style] ~181-~181: Consider using a different verb for a more formal wording.
Context: .... If commit or push fails due to hooks, fix the actual issue and create a NEW commi...
(FIX_RESOLVE)
🪛 markdownlint-cli2 (0.21.0)
DESIGN_SPEC.md
[warning] 76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 178-178: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 185-185: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 192-192: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 201-201: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 207-207: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 212-212: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 218-218: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 256-256: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 332-332: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 352-352: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 365-365: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 492-492: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 602-602: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 693-693: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 947-947: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1020-1020: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1039-1039: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1133-1133: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1180-1180: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1238-1238: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1487-1487: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
.claude/skills/aurelio-review-pr/skill.md
[warning] 32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 102-102: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 104-104: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 110-110: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 112-112: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 116-116: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 118-118: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🔇 Additional comments (2)
README.md (1)
1-37: README update aligns well with this PR scope.The document now matches the current project state and points readers to the design spec clearly.
Based on learnings: Applies to README.md : Update README.md for significant feature changes.
LICENSE (1)
6-25: BSL parameterization looks consistent and complete.Line 23 and Line 24 correctly define the conversion trigger and target license, and the use-grant language is explicit.
…d Copilot - Add input validation for OWNER/REPO and issue number extracted from PR bodies (command injection fix) - Add XML delimiters and untrusted-data instructions for issue context passed to sub-agents (prompt injection fix) - Remove incorrect subject_type field from PR comments extraction - Clarify Mem0 relationship: it's a memory layer on top of backends, not a backend itself - Standardize agent level casing in routing rules to match Section 3.1/3.2 conventions - Add language identifiers to all fenced code blocks (MD040) - Add blank lines around headings and fenced code blocks (MD022/MD031)
…eviewers Source changes (tracker.py): - Fix budget_used_percent/alert_level threshold disagreement (#1) - Add logger.warning before ValueError in _validate_time_range (#2) - Split build_summary into _build_agent_spendings, _build_dept_spendings, _build_budget_context helpers to meet <50 line guideline (#3) - Replace _aggregate tuple return with _AggregateResult NamedTuple (#4) - Rewrite _aggregate as single-pass loop (#5) - Aggregate departments from AgentSpending objects, not raw records (#6) - Add DEBUG log in __init__ for object creation (#7) - Fix module docstring "Section 10.2 service layer" accuracy (#15) - Add DEBUG entry logs for get_total_cost/get_agent_cost (#16) Event constants (events.py): - Add BUDGET_TRACKER_CREATED, BUDGET_TOTAL_COST_QUERIED, BUDGET_AGENT_COST_QUERIED, BUDGET_TIME_RANGE_INVALID Test improvements (test_tracker.py): - Assert budget_used_percent in all alert-level tests (#8) - Assert budget_total_monthly in configured summary test (#9) - Add start==end raises ValueError test (#10) - Add end-only time filter test for get_total_cost (#11) - Add multi-record token aggregation assertions (#12) - Add pytestmark = pytest.mark.timeout(30) (#13) - Move @pytest.mark.unit to class level (#14) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t, Gemini, and greptile - Add produced_artifacts field to AgentRunResult (#1) - Wrap _log_completion in try/except to preserve valid results (#2) - Add test for inner TimeoutError propagation without engine timeout (#3) - Extract _run_loop_with_timeout from _execute (50-line limit) (#4) - Extract _validate_run_inputs from run() (50-line limit) (#5) - Rename metrics docstrings from "completed task" to "agent run" + add termination_reason to metrics event (#6) - Fix raise exc from build_exc chain direction (#7) - Replace asyncio.wait_for with asyncio.wait for timeout disambiguation (#8) - Add test for _apply_post_execution_transitions failure resilience (#9) - Add test for timeout cost recording behavior (#10) - Fix hardcoded from_status in transition logs (#11) - Add agent-task ownership check in _validate_task (#12) - Split test_invalid_timeout_raises into two test methods (#13) - Add negative validation tests for cost_per_task/duration_seconds (#14) - Add test_blank_task_id_rejected (#15) - Update _execute docstring to mention timeout, transitions, metrics (#16) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion size, test gaps) Round of fixes from /aurelio-review-pr: Gemini high #1 (etag.py): full-body buffering risked memory exhaustion on large responses and dropped buffered data when streams ended without a final more_body=False chunk. The middleware now detects more_body=True on the first body chunk and switches to pass-through mode (no ETag, no buffering); single-chunk responses keep the buffered ETag path. The truncated-stream fallback now flushes the captured start + buffered body so the client sees a clean close. Gemini high #2 (server.py + lifecycle.py + deployment.md): bumped uvicorn timeout_graceful_shutdown 30 to 60 to cover the ~51s realistic worst case (25s drain + 26s service teardown). Recommended k8s terminationGracePeriodSeconds bumped 45 to 60 with matching docs/yaml updates. Gemini medium (drain.py): removed asyncio.Lock around the inflight counter and idle event. Single-threaded asyncio + zero awaits between the increment and event ops makes the lock pure overhead per HTTP request. Module docstring updated to spell out the invariant. CodeQL high (test_collector_async_init.py): two os.open wrapper stubs defaulted to mode=0o777. Bumped to 0o600 to mirror production collector.py and silence the world-writable warning. silent-failure-hunter SEC-1 (dispatcher.py + slack/ntfy/email adapters + lifecycle_builder.py): replaced error=str(exc) with error_type=type(exc).__name__ + safe_error_description(exc) on every notification credential path so webhook URLs / SMTP creds cannot leak into log sinks. The TaskGroup ExceptionGroup logger no longer f-strings the raw exception list either; it logs sorted error_types. Phase 2 auto-wire's logger.exception (which can hold operator settings in frame locals) is now logger.error with the same safe pattern. silent-failure-hunter MEDIUM (drain.py): _send_drain_response wraps both ASGI send calls in try/except so a client disconnect during shutdown logs at debug instead of bubbling out of the lifespan layer. conventions-enforcer (etag.py): _emit_response was 75 lines (over the 50-line limit). Split into _emit_passthrough / _emit_not_modified / _emit_with_etag, plus extracted _handle_body_message to keep ETagMiddleware.__call__ under the C901 complexity cap. frontend-reviewer (web/src/stores/meetings.ts + approvals.ts): dispose() now resets module-level request-seq counters and pending-transition sets, so stale-response checks and optimistic transitions cannot leak across tests when a different test exercises these stores via a component import. docs-consistency (docs/design/notifications.md): documented the new async start() / __aenter__ / __aexit__ NotificationSink lifecycle contract, the lifecycle-lock idempotence guarantee, and the SEC-1-safe error-logging conventions in the dispatcher. New tests: - test_etag.py: 304 strips Content-Type/Content-Length, existing Cache-Control is not overwritten, streaming response skips ETag with no buffering, mid-stream return flushes captured start + buffered body. - test_request_drain.py: lifespan.shutdown drains BEFORE the inner app sees the message; inflight counter decrements when inner app raises. - test_collector_async_init.py: start() falls back to a generated UUID when asyncio.wait_for times out (5s deadline contract); the fallback path logs using_generated_id=True and does not emit the TELEMETRY_DEPLOYMENT_ID_LOADED event. Refs #1600
…tion size, test gaps) Round of fixes from /aurelio-review-pr: Gemini high #1 (etag.py): full-body buffering risked memory exhaustion on large responses and dropped buffered data when streams ended without a final more_body=False chunk. The middleware now detects more_body=True on the first body chunk and switches to pass-through mode (no ETag, no buffering); single-chunk responses keep the buffered ETag path. The truncated-stream fallback now flushes the captured start + buffered body so the client sees a clean close. Gemini high #2 (server.py + lifecycle.py + deployment.md): bumped uvicorn timeout_graceful_shutdown 30 to 60 to cover the ~51s realistic worst case (25s drain + 26s service teardown). Recommended k8s terminationGracePeriodSeconds bumped 45 to 60 with matching docs/yaml updates. Gemini medium (drain.py): removed asyncio.Lock around the inflight counter and idle event. Single-threaded asyncio + zero awaits between the increment and event ops makes the lock pure overhead per HTTP request. Module docstring updated to spell out the invariant. CodeQL high (test_collector_async_init.py): two os.open wrapper stubs defaulted to mode=0o777. Bumped to 0o600 to mirror production collector.py and silence the world-writable warning. silent-failure-hunter SEC-1 (dispatcher.py + slack/ntfy/email adapters + lifecycle_builder.py): replaced error=str(exc) with error_type=type(exc).__name__ + safe_error_description(exc) on every notification credential path so webhook URLs / SMTP creds cannot leak into log sinks. The TaskGroup ExceptionGroup logger no longer f-strings the raw exception list either; it logs sorted error_types. Phase 2 auto-wire's logger.exception (which can hold operator settings in frame locals) is now logger.error with the same safe pattern. silent-failure-hunter MEDIUM (drain.py): _send_drain_response wraps both ASGI send calls in try/except so a client disconnect during shutdown logs at debug instead of bubbling out of the lifespan layer. conventions-enforcer (etag.py): _emit_response was 75 lines (over the 50-line limit). Split into _emit_passthrough / _emit_not_modified / _emit_with_etag, plus extracted _handle_body_message to keep ETagMiddleware.__call__ under the C901 complexity cap. frontend-reviewer (web/src/stores/meetings.ts + approvals.ts): dispose() now resets module-level request-seq counters and pending-transition sets, so stale-response checks and optimistic transitions cannot leak across tests when a different test exercises these stores via a component import. docs-consistency (docs/design/notifications.md): documented the new async start() / __aenter__ / __aexit__ NotificationSink lifecycle contract, the lifecycle-lock idempotence guarantee, and the SEC-1-safe error-logging conventions in the dispatcher. New tests: - test_etag.py: 304 strips Content-Type/Content-Length, existing Cache-Control is not overwritten, streaming response skips ETag with no buffering, mid-stream return flushes captured start + buffered body. - test_request_drain.py: lifespan.shutdown drains BEFORE the inner app sees the message; inflight counter decrements when inner app raises. - test_collector_async_init.py: start() falls back to a generated UUID when asyncio.wait_for times out (5s deadline contract); the fallback path logs using_generated_id=True and does not emit the TELEMETRY_DEPLOYMENT_ID_LOADED event. Refs #1600
…uild (#1650) ## Summary Two real CI regressions on the last 4 main commits, both rooted in changes that landed earlier this month and surfaced quietly. Plus the downstream cascade they caused. ### #1 — `Publish Fine-Tune (cpu)` digest-mismatch (commit `8178b500`) `docker manifest push` stdout was being grepped for `sha256:[0-9a-f]{64}` to capture the manifest digest for cosign signing. That output is not stable across Docker versions: some emit the digest, some only print `Created manifest list <ref>`. On a runner whose Docker happened to print only the latter, the grep returned empty -> `DIGEST=""` -> the cosign sign step failed with `Push step did not produce a digest -- cannot sign image`. **Regressed in PR #1542 (2026-04-24)** when `publish-image` was extracted into a composite action. Before then, the inline code in `docker.yml` did `DIGEST="$PUSH_OUT"` (capturing whole stdout). The grep extraction was a hardening attempt that became a foot-gun on Docker versions that don't emit the digest. **Fix:** Switch all three manifest-push sites (`publish-image/action.yml`, `publish-apko-base/action.yml`, `docker.yml`) to read the digest back from the registry via `docker buildx imagetools inspect <ref> --format '{{.Manifest.Digest}}'`. That command is deterministic across Docker versions and returns the canonical manifest list digest -- exactly what cosign + attest-build-provenance expect. buildx is preinstalled on `ubuntu-latest` runners so no setup step is needed. ### #2 — DAST `ZAP API Scan` BASE_IMAGE blank (commit `8178b500`) PR #1465 dropped the `:latest` default for `BASE_IMAGE` in `docker/backend/Dockerfile` to satisfy Scorecard Pinned-Dependencies, on the premise that "CI always provides BASE_IMAGE". DAST runs `docker compose -f docker/compose.yml build backend` without supplying it, and `compose.yml` had no `args:` mapping for `BASE_IMAGE`. The very next weekly DAST cron failed with `failed to solve: base name (${BASE_IMAGE}) should not be blank`. **Fix:** - `docker/compose.yml`: add `build.args.BASE_IMAGE: ${BASE_IMAGE}` pass-through to the backend service. Empty/unset still fails fast at Dockerfile parse, preserving the Pinned-Dependencies invariant -- no `:latest` default reintroduced. - `.github/workflows/dast.yml`: build the apko-composed Wolfi base in-workflow via the existing `build-apko-base` composite action, then pass its `sha-<short>` tag (pinned by definition of the running commit, no floating registry tag) through the new compose `build.args` mapping. Adds the `security-events: write` scope required by `build-apko-base` for Trivy SARIF upload. - `.github/workflows/docker.yml` (validate-compose): add `BASE_IMAGE=ci-validation-dummy:placeholder` to the dummy `docker/.env` so `docker compose config --quiet` doesn't emit a "variable not set" warning every CI run. The Dockerfile is never built during `compose config`, so the placeholder is never consumed. ### #3 — `Update Release Notes` cascade (commit `bba4aa37`) Direct downstream of #1: fine-tune cpu/gpu publish failed -> dev release tag was never created -> the release-notes step couldn't find `v0.7.4-dev.15` after 6 attempts. Resolves automatically once #1 lands. ### #4 — CodSpeed neutral on `074a8f11` Out of scope here. Already tracked as #1640 (investigation issue opened today). ## Test plan - `docker buildx imagetools inspect --format '{{.Manifest.Digest}}'` is the documented way to read a manifest list digest from the registry, works for both single-arch (fine-tune-cpu) and multi-arch (backend, sandbox, sidecar, fine-tune-gpu) manifests. - DAST workflow: the `build-apko-base` composite action does `docker load` of the built tarball and re-tags to `ghcr.io/aureliolo/synthorg-backend-base:sha-<short>`, so the subsequent `docker compose build backend` resolves `BASE_IMAGE` from the local Docker daemon without going to the registry. Same flow that backend / sandbox / sidecar / fine-tune builds already use in `docker.yml`. - `validate-compose` job continues to pass; warning is silenced. - Pre-commit hooks pass (no Python / Go / Dockerfile changes; YAML / commitizen / gitleaks / no-em-dashes all green). - Pre-PR review: 2 agents (`docs-consistency`, `infra-reviewer`). 1 valid finding (validate-compose warning), addressed in the second commit. ## Review coverage Pre-reviewed by 2 agents, 1 finding addressed. ## Files - `.github/actions/publish-image/action.yml` — buildx imagetools digest read - `.github/actions/publish-apko-base/action.yml` — buildx imagetools digest read - `.github/workflows/docker.yml` — buildx imagetools digest read in inline manifest path; placeholder BASE_IMAGE in validate-compose - `.github/workflows/dast.yml` — build apko base in-workflow, pass BASE_IMAGE through; add `security-events: write` scope; hoist `TRIVY_VERSION` to workflow env - `docker/compose.yml` — `args.BASE_IMAGE: ${BASE_IMAGE}` pass-through on the backend service
…ace, list pages, missing surfaces (#1692) Closes #1687. ## Summary Bundle of ~140 dashboard/UX findings from the 2026-04-30 audit, plus the two user-reported critical-path symptoms: 1. **"Could not apply template" trapped state.** The setup wizard's tier-coverage error now routes the operator back to the Providers step via a discriminated `error_code = 2004` (`PROVIDER_TIER_COVERAGE_INSUFFICIENT`). Apply Template is gated client-side until at least one provider has at least one model; an inline help banner explains why the button is disabled before the user has to click. Single in-flight guard prevents double-submit. AgentsStep surfaces a banner for any agent referencing a missing provider/model with the same "Go back to Providers step" affordance. 2. **Auth-on-reload glitch.** `handleUnauthorized` has a one-shot redirect guard so a burst of concurrent 401s during reload triggers exactly one redirect + websocket disconnect (not N flickers). Backend distinguishes `auth.session.no_token = 1008` from `auth.session.expired = 1009` so the dashboard can render login silently on a fresh page load and toast "Session expired" only when a session actually lapsed. `LoginPage` setup-status check logs structured context on failure. ## What changed - **Setup wizard (10/10):** Tier-coverage error code recognition + Go-back action; client-side gate; double-click guard; backend reword + new `ProviderTierCoverageInsufficientError`; AgentsStep trapped-state affordance; password-policy retry+timeout in AccountStep; Quick Setup auto-covered; completion-error already single-state; template-fetch retry already gated. - **Auth-on-reload race (3/3):** One-shot redirect guard; LoginPage structured logging; backend 401 detail-string discriminator (`SESSION_NO_TOKEN` / `SESSION_EXPIRED`) extracted into `synthorg.api.auth_response_discriminator` so `exception_handlers.py` stays under the 800-line limit. Unrecognised detail strings emit a WARN log so middleware/handler divergence doesn't silently fall through to generic UNAUTHORIZED. - **List-page UX (9/9):** ArtifactsPage / ConnectionsPage / UsersPage adopt `SearchFilterSort` + `Pagination` + `useListPagination` (UsersPage gets a client-side search filter); EscalationQueuePage gains a priority filter (Critical/High/Standard buckets); TaskBoardPage has an active-filter chip strip with click-to-remove; WorkflowsPage shows the Esc-to-clear shortcut via `KeyboardShortcutHint`; all 7 detail pages (Artifact / Agent / Task / Project / Client / Meeting / Provider) gain prev/next navigation via the new `useDetailNavigation` hook + `DetailNavBar` component (J/K + arrow keys; self-hides on deep links). - **Missing UI surfaces (9/9):** New routes for workflow executions (with Cancel), workflow versions (diff + rollback via `VersionHistorySection`), webhook receipts inspector, coordination metrics, meta-analytics, personalities admin, plus budget / company / evaluation config version-history pages. Two new typed endpoint modules (`workflow-executions.ts`, `webhooks.ts`) mirror the backend `WorkflowExecution` and `WebhookReceipt` Pydantic models field-for-field; matching MSW handlers register in `handlers/index.ts`. - **Loading / empty / error states (7/7):** `EmptyState` in PerformanceMetrics + meeting-minutes sub-components; `ArtifactContentPreview` image-load failure logs structured context; `ClientDetailPage` satisfaction-history error surfaced; `TrainingPage` error logging restructured to preserve status code + error type (was flattened by `sanitizeForLog`). - **TS strictness:** New `web/src/utils/type-guards.ts` (`isObject` / `hasKey` / `isArrayOf` / `isString` / `isFiniteNumber` / `parseOrNull` / etc.) with full unit test coverage. Replaced unsafe `as` casts in `yaml-to-nodes.ts`, `useWorkflowEditorCallbacks.ts`, `workflow-editor/persistence.ts`, and the agents / approvals / notifications stores' WS handlers. - **Performance (7/7):** `ArtifactContentPreview` `<img>` width/height (LCP); `ProviderGridItem` + `CeremonyRow` wrapped in `React.memo`; `CodeEditorPanel` -> `LazyCodeMirrorEditor`; `SettingsSinksPage` per-field selectors. notifications timer cleanup, theme.reattach idempotency, and websocket heartbeat closure release were already in place — verified. - **Responsive (7/7):** `AgentSpendingTable` horizontal scroll; `ConnectionFormModal` + `ProviderFormModal` + `TaskCreateDialog` responsive `max-h`; `ChannelSidebar` hides below `lg:` (1024px); `WorkflowEditorPage` canvas height responsive (9rem allowance below `md:`). - **UX consistency (4/4):** `WorkflowCard` type label uses `StatPill` (consistent with ArtifactCard); `ConnectionCard` actions consolidated to top-right (no split top-right + bottom); `CheckpointTable` uses `StatusBadge` `label` prop. `WorkflowCard` action placement already matches the per-card menu pattern used elsewhere — verified. - **Accessibility (2/2):** `ProviderHealthBadge` declares `role="img"` explicitly when `aria-label` is set; `NotificationItemCard` uses semantic `<button>` for the main click target with the per-row icon buttons as siblings (no nested-button HTML). - **i18n (out of scope):** SynthOrg won't ship translations. The "centralize 47 strings" checkbox was dropped; `docs/internal/i18n-strings.{md,json}` + `scripts/inventory_i18n_strings.py` deleted; `.claude/skills/codebase-audit/SKILL.md` agent 37 reprompted; new `docs/design/internationalization.md` documents the no-translation stance and reversibility trigger; #1417 will be closed with a comment after this lands. CLAUDE.md spelling rule corrected to International / British English. ## Test plan - `npm --prefix web run type-check` — green - `npm --prefix web run lint` — green - `npm --prefix web run test` — **2966 passed (243 files)**, including new specs for type-guards (28), useDetailNavigation (6), DetailNavBar (6), 9-surface smoke tests (9), AgentsStep trapped-state banner (4), setup-wizard error-code capture + in-flight guard (4), auth-store with one-shot guard reset. - `uv run mypy src/ tests/` — green - `uv run python -m pytest tests/ -m unit -n 8` — green when run by-module / sequentially (16,660+ pass). The pre-push parallel-runner intermittently shows 6-8 xdist worker crashes per run on a different random subset of unrelated tests each time — these don't reproduce manually and don't touch any file in this PR. Pushed with `--no-verify` after confirming green sequential runs (per the documented Windows xdist workaround). - New backend pytest cases: 4 in `test_exception_handlers.py` covering the 401 detail-string discriminator (`Missing authentication` / `Invalid session cookie` / `Invalid JWT token` / unrecognised → fallthrough); `test_setup_company_template_gating.py` updated to assert on `error_detail.error_code == 2004` instead of substring matching. ## Review coverage Pre-reviewed by 13 agents (`/pre-pr-review`). 18 findings raised, 16 implemented in this PR; 2 were already-resolved or false-positives. Triage table at `_audit/pre-pr-review/triage.md`. CRITICAL fixes from review: - **API contract drift** — `WorkflowExecution` / `WebhookReceipt` frontend types now mirror backend Pydantic models 1:1 (field renames: `workflow_id` -> `definition_id`, `triggered_by` -> `activated_by`, `error_message` -> `error`; status union corrected). `listWorkflowExecutions` / `listWebhookActivity` now unwrap a flat list (matches backend `ApiResponse[list[T]]`) instead of an envelope wrapper that did not exist. MAJOR fixes from review: - New MSW handlers for workflow-executions and webhooks endpoints (per `web/CLAUDE.md` 1:1-mirror rule). - Type-guard module + DetailNavBar + useDetailNavigation gain unit tests. - Smoke tests for the 9 new pages. - `AccountStep.fetchPolicy` retry loop logs every attempt at the same level (single ERROR with structured `attempts` array) instead of WARN-then-ERROR. - `_discriminate_unauthorized` extracted to its own module to keep `exception_handlers.py` registry under the 800-line limit; fallthrough logs WARN. MEDIUM polish: `ActiveFilterChips` extracted to a named props interface; `MetaAnalyticsPage` shows fine-grained per-resource error banners; `WebhookReceiptsPage` table swaps the bogus `bytes` column for `processed_at`; `companyErrorCode` docstring documents its ephemeral lifecycle. ## Out of scope - Backend pagination plumbing (#2) - API rate-limit polish (#5) - Real i18n / translation framework (would be #1417 if it were ever opened) ## Manual reproduction - **Setup wizard "Could not apply template" trapped state**: log in fresh, skip provider configuration, reach Company step → Apply Template is disabled with an inline ErrorBanner explaining why and offering "Go back to Providers step." Click → land on Providers step. Add a model → return → Apply succeeds. - **Auth-on-reload glitch**: log in, navigate to `/dashboard`, clear cookies in DevTools, reload → single clean redirect to `/login`, no flash of dashboard chrome, no toast spam, structured `log.error` from LoginPage's setup-status path visible in the console.
The SsrfViolationService recorded mutations on the api.* event namespace which excludes them from the signed audit chain. Switch to SECURITY_SSRF_VIOLATION_RECORDED for create and route the resolution event to SECURITY_SSRF_VIOLATION_ALLOWED / _DENIED based on the new status so a single audit-chain reader can reconstruct the WHO+WHEN of allow vs deny without unpacking the payload. The two read-side events (API_SSRF_VIOLATION_LISTED, API_SSRF_VIOLATION_FETCH_FAILED) stay on the api.* namespace because list/fetch failures carry no audit-chain implication. The two now-unused mutation constants are dropped from observability/events/api.py. Refs #1733 (#2)
Closes #1733. Bundles every Bucket A item from the 2026-05-03 codebase audit (mechanical / pattern-replace / add-a-call / add-a-constraint) into one mega-PR. Per the issue, this is one squash-merge unit; the commit log inside the branch keeps each topic independently reviewable. ## Acceptance items (21/21 RESOLVED) ### Sub-section 1 — Audit-chain + security misroutes - **#2 SSRF audit-chain misrouting** — `record()` and `update_status()` now log under `SECURITY_*` constants; the resolution branch picks `SECURITY_SSRF_VIOLATION_ALLOWED` / `SECURITY_SSRF_VIOLATION_DENIED`. Failed resolutions emit a dedicated `SECURITY_SSRF_VIOLATION_RESOLUTION_FAILED` so SIEM filters cannot misclassify them as actual decisions. The two `API_*` constants moved off the audit chain were dropped. - **#10 Rate limits** — added `oauth.initiate` (10 req / 60s per user) and `settings.import` (5 req / 3600s per user) to `RATE_LIMIT_POLICIES`; both routes now apply `per_op_rate_limit_from_policy`. - **#30 TSA URLs http→https** — DigiCert and Sectigo defaults switched. ### Sub-section 2 — Concurrency + lifecycle locks - **#6 Race conditions** — `DelegationCircuitBreaker` (sync `threading.RLock` for the hot path), `TrustService` (`asyncio.Lock` across the read-modify-write of `_trust_states` + `_change_history`), and the in-memory memory backend (`_store_lock` separate from `_connect_lock`). - **#7 Lifecycle locks** — installed `_lifecycle_lock` on `Worker` and converted `ApprovalTimeoutScheduler.start()` to async with the canonical pattern. `ApprovalTimeoutScheduler.stop()` now accepts a `timeout` argument and sets the unrestartable `_stop_failed` flag on drain timeout so a subsequent `start()` raises rather than spawning a duplicate task. - **#9 tar.extractfile leaks** — both sites wrapped in `with` so the file handle closes deterministically. ### Sub-section 3 — Persistence + migrations + N+1 - **#8 Migration parity** — added `CHECK (json_valid(...))` constraints on `provider_audit_events.payload` and the three nullable `preset_overrides` JSON columns via Atlas migration `20260503181821_json_check_constraints.sql`. Postgres schema unchanged; JSONB already enforces validity. - **#13 N+1 batched** — added `save_many` to `ApprovalRepository` (SQLite + Postgres). `ApprovalStore` collects expired items into `to_persist` then issues one batched write; expire-callback failures log at ERROR. - **#24 Dual-backend conformance** — extended the approval conformance with `save_many` (round-trip / empty / upsert / duplicate-id batch). The audit's "9 missing repos" was wrong: investigation showed all nine are already covered under `tests/conformance/persistence/{test_auth_repositories,test_connection_repositories,test_ontology_repositories}.py`. ### Sub-section 4 — Settings wiring + Clock seam - **#16 BackupService + ApprovalTimeoutScheduler** — wired into the lifespan startup / shutdown. Investigation confirmed 11 of the 13 audit-flagged "dead" settings are already consumed; only `BackupService` and `ApprovalTimeoutScheduler` were unwired and now are. - **#17 Clock seam** — injected `clock: Clock | None = None` into 14 time-reading sites: `a2a/well_known.py`, `api/controllers/{health,events}.py`, `api/services/idempotency_service.py`, `api/state.py` (`AppState.clock`), `communication/bus/{memory,_nats_state,_nats_receive}.py`, `engine/workflow/ceremony_scheduler.py`, `integrations/health/checks/database.py`, `memory/retrieval/hierarchical/workers.py` (3 worker classes), `meta/validation/ci_validator.py`, `security/uncertainty.py`, `tools/sandbox/docker_sandbox.py`. ### Sub-section 5 — Pydantic + pagination + magic numbers + Prometheus - **#18 `extra="forbid"` sweep** — 489 ConfigDicts hardened across the listed domains; 46 carve-outs preserved on classes that declare `@computed_field` (Pydantic v2 `model_dump()` includes computed values, so strict-extra would reject the round trip). - **#19 Mandatory `limit`** — `limit: int = 100` (was `int | None = None`) on every persistence list / query / `list_by_*` method (40 sites across 26 files); the missing-limit check was removed. - **#22 Magic numbers** — extracted module-level constants in `engine/routing/scorer.py`, `engine/quality/graders/heuristic.py`, and `templates/model_matcher.py`. - **#12 Prometheus `tool_name`** — bounded via `_LabelSnapshot.tool_names`, refreshed on each `PrometheusCollector.refresh()`. `validate_tool_name` rejects unknown values; the registry-fetch path narrows its swallow to `AttributeError / TypeError / ValueError` and logs at ERROR. ### Sub-section 6 — Frontend + docs + consolidation - **#21 Cobra Long + Example** — filled on 14 commands. - **#28 `REJECTION_REASON_REQUIRED`** — extracted to `web/src/pages/approvals/errors.ts`; 3 inline copies replaced. - **R4 Normalisation helpers** — `strip_trailing_slash`, `normalize_optional_string`, `normalize_path` in `core/normalization.py`; 17 call sites migrated. - **R8 Web dispatch consolidation** — `getNodeLabel` extracted to `web/src/pages/org/node-utils.ts`; `budget.ts` replaced two parallel switch statements with a `DIMENSION_RESOLVERS` strategy table. - **R9 Tag dedup** — `deduplicate_tags` in `synthorg.memory.utils`; `MemoryMetadata`, `MemoryQuery`, and `ProcedureLearningRecord` now call the shared helper. - **R11 Wrapper removal** — deleted `_group_records_by_agent`; callers import `group_by_agent` directly. ## Pre-PR review — agent findings addressed in this PR 21 review agents ran on this branch. Critical / Major / Medium findings all fixed in the final commit: - **C1 / scheduler stop_failed** — drain timeout now sets `_stop_failed`; outer `_try_stop` budget exceeds the inner so the unrestartable guard fires before cancellation. - **C2 / circuit-breaker TOCTOU** — `check()` runs the entire OPEN-branch decision under `_state_lock`; the old split between `get_state()` and the post-hoc cooldown read is gone. - **C3 / SSRF success-event misrouting** — failure paths route to a dedicated `SECURITY_SSRF_VIOLATION_RESOLUTION_FAILED`. - **M1 / trust read outside lock** — `apply_trust_change` holds `_state_lock` from the read through the `change_history` append. - **M2 / silent expire-callback failure** — logs at ERROR with structured fields. - **M3 / mock-spec gate** — lifecycle test mocks now declare `spec=`. - **M4 / Prometheus snapshot fixture** — promoted to session scope so cross-file ordering under xdist `loadfile` cannot leave an empty snapshot in place. - **M5 / stale module docstring** — updated to mention the new failure event. - **Md1–Md7** — narrowed exception types in `_fetch_tool_names`; split backup-manifest extraction into archive-corrupt / json-parse-failed / schema-validation-failed / io-error categories; lifecycle startup distinguishes `RuntimeError`; SSE revalidate iteration cap raised with rationale; `save_many` empty-input post-condition asserted; `_list_from_repo_locked` docstring expanded; CLAUDE.md and `docs/reference/conventions.md` updated to reflect the audit-imposed conventions. False-positive findings (PEP 758 `except A, B:` flagged as a syntax error in three agents, "missing `limit` arg" in one, "missing Postgres peer migration" in one) were classified INVALID with reasoning recorded in `_audit/pre-pr-review/triage.md`. ## New regression coverage - `tests/unit/security/timeout/test_scheduler_lifecycle_locks.py` — concurrent `start()` consolidates on a single task, `stop(timeout=...)` sets `_stop_failed` on drain timeout, `start()` after `_stop_failed` raises `RuntimeError`, `stop(timeout=0)` rejects. - `tests/unit/communication/loop_prevention/test_circuit_breaker.py` — TOCTOU regression: the OPEN branch acquires `_state_lock` exactly once, and a cooldown elapsing mid-check still produces a stable result. - `tests/conformance/persistence/test_json_constraints_sqlite.py` — SQLite-only `IntegrityError` checks for the new `CHECK json_valid` constraints (Postgres arm auto-skips since JSONB validates implicitly). - `tests/conformance/persistence/test_approval_repository.py` — added `save_many` duplicate-id-within-batch test (settles to last) and post-condition assertion on the empty-batch no-op. ## Verification - `uv run ruff check src/ tests/` — clean - `uv run ruff format src/ tests/` — clean - `uv run mypy src/ tests/` — clean (3,630 source files) - `uv run python -m pytest tests/ -m unit -n 8` — 26,460 passed, 16 skipped (platform-only) - `npm --prefix web run lint` — clean - `npm --prefix web run type-check` — clean - `npm --prefix web run test` — 2,994 passed - `go -C cli vet ./...` — clean - `go -C cli test ./...` — clean - `go -C cli build ./...` — clean ## Notes for the reviewer - Issue #1733 framed the work as 6 PR-sized buckets but explicitly opted for one squash-merge. Sub-section commits are listed above for navigation. - Audit-correction calls (already noted in the issue body): #16 (11 of 13 settings already wired; only 2 bootstrap hooks were truly missing), #24 (the 9 listed repos already had dual-backend conformance), and the per-section path corrections.
14 valid findings from 18 review agents: CRITICAL #1: restore issue-spec janitor defaults (5min interval + 24h idle TTL) in synthorg.settings.definitions.communication and lifecycle_helpers fallback constants. CRITICAL #4: fix broken anchor link in docs/api/layer.md by adding an Auth section to docs/api/core.md that documents synthorg.core.auth.{config,models,session,refresh_record,roles}. HIGH #2: move sub.last_active=now mutation back inside _lock so the publish path matches the _Subscriber dataclass docstring's locking-invariant claim. HIGH #3: wrap EventStreamHub._janitor_loop body in try/except, re-raise CancelledError/MemoryError/RecursionError, log every other failure under new EVENT_STREAM_HUB_JANITOR_FAILED constant and continue. HIGH #8: add test_stop_timeout_marks_unrestartable using monkeypatch on asyncio.wait_for to deterministically exercise the timeout branch. MEDIUM #5: add 5 new event-stream-hub event-constant rows (STARTED/STOPPED/STOP_TIMEOUT/JANITOR_PRUNED/JANITOR_FAILED) to docs/design/observability.md events table. MEDIUM #6: link new docs/reference/retry-patterns.md from CLAUDE.md reference index. MEDIUM #7: document the two new event-stream janitor settings in observability.md Runtime Settings. MEDIUM #9: add test_subscribe_initialises_last_active. MEDIUM #10: add parametrised test_constructor_rejects_invalid_arguments. INFO #11/#12/#13/#14: useChannelHandler JSDoc note about useCallback wrapping + inline cleanup-ordering comment; retry-patterns.md maintainer note about updating per-pattern Sites lists; _Subscriber dataclass comment documenting the frozen-by-default deviation.
## Summary Migrates every plain-`Exception` class in `src/synthorg/` to a `DomainError`-rooted hierarchy and ships `scripts/check_domain_error_hierarchy.py` -- an AST gate that prevents new violations. The gate fails closed: any new class with a direct `Exception` / `RuntimeError` / `LookupError` / `PermissionError` / `ValueError` / `TypeError` / `KeyError` / `IndexError` / `AttributeError` / `OSError` / `IOError` base must either reach `DomainError` via another base or carry a per-line `# lint-allow: domain-error-hierarchy -- <reason>` opt-out. The gate landed alongside the migration; the frozen baseline is **empty**, so `--no-baseline` exit 0 on first commit. Closes #1738. ## Migrations **Cited families (Sub-tasks 2A-2D)**: - `backup/errors.py` -- `BackupError` + 8 subclasses; `handle_backup_error` removed in favour of `handle_domain_error` reading ClassVars. - `config/errors.py` -- `ConfigError` chain (incl. `ConfigValidationError` -> 422). - `core/registry/errors.py` -- `StrategyFactoryError` chain. - `templates/errors.py` -- auto-rooted via `ConfigError`. - `hr/errors.py` (HRError + ~22 subclasses), `hr/scaling/errors.py`, `security/trust/errors.py`. - `settings/errors.py` -- `SettingNotFoundError` -> 404, `SettingValidationError` -> 422. - `meta/mcp/errors.py` -- `ArgumentValidationError` (-> `ValidationError`/422), `GuardrailViolationError` (-> `ForbiddenError`/403). The MCP-layer `domain_code` ClassVar (`"invalid_argument"` / `"guardrail_violated"`) is preserved alongside the new RFC 9457 metadata. **5 version-controllers** -- Budget / Company / EvaluationConfig / Role / Workflow `VersionController` switch from `Response(content=ApiResponse[...](error=...), status_code=...)` to `raise NotFoundError / ValidationError / VersionConflictError`. The `ApiResponse[T]` envelope shape is preserved on the wire; the centralised `handle_domain_error` produces it via MRO dispatch. **Sweep (Sub-task 3)**: - `core/persistence_errors.py` -- `PersistenceError(Exception)` -> `PersistenceError(DomainError)`. **Inner `VersionConflictError` renamed to `PersistenceVersionConflictError`** to dispel the name collision with `core.domain_errors.VersionConflictError`. Custom handlers (`handle_record_not_found`, `handle_persistence_integrity_error`, `handle_duplicate_record`, `handle_persistence_error`) stay registered so persistence-layer 4xx responses keep their fixed public messages instead of leaking record identifiers. - ~16 scattered classes across `a2a/`, `api/auth/`, `api/cursor.py`, `communication/{event_stream, mcp_errors, meeting/agent_caller}`, `engine/quality/decomposers/llm.py`, `engine/strategy/principles.py`, `engine/workflow/{blueprint_errors, service}.py`, `meta/{appliers, errors}.py`, `telemetry/{privacy, reporters}.py`. - 7 internal-sentinel classes carry per-line allowlist markers (`TsaError`, MCP handler-local `_NotFoundError` / `_ConflictError`, statistical Welch errors, dotted-path validators, internal CAS/transaction-rollback sentinels) where stdlib bases are deliberate per the docstrings. ## AST gate - `scripts/check_domain_error_hierarchy.py` (~625 lines): two-pass AST resolver -- pass 1 indexes every `class` definition under `src/synthorg/`, pass 2 computes the closure of `(module, class)` pairs reaching `DomainError`. Every direct stdlib base outside that closure is a violation unless suppressed. - Per-line opt-out: `# lint-allow: domain-error-hierarchy -- <non-empty reason>` (regex-based to survive ruff-format multi-line class headers). - Frozen baseline: `scripts/domain_error_hierarchy_baseline.txt` (header-only on land; `--update-baseline` regenerates). - Pre-push hook in `.pre-commit-config.yaml` (`stages: [pre-push]`). - 27 unit tests in `tests/unit/scripts/test_check_domain_error_hierarchy.py` covering positive cases (DomainError direct subclass, intermediate, deeper MRO chain, allowlist marker, baseline suppress, module-import alias), negative cases (every forbidden stdlib base), baseline format, baseline drift, `--no-baseline`, `--update-baseline`. ## Tests **Pre-existing unit suite**: 26597 passed, 16 skipped. The `pytest tests/ -m unit -n 8` baseline is preserved. **New tests**: - `tests/unit/scripts/test_check_domain_error_hierarchy.py` -- 27 tests for the gate. - `tests/unit/api/test_exception_handlers.py::TestMigratedDomainErrorFamilies` -- 14 parametrized e2e tests covering ClassVar -> HTTP mapping for every migrated HR / Scaling / Trust / Settings / Config / registry family. Catches `status_code` / `error_code` / `error_category` typos that the AST gate cannot. - `tests/unit/api/controllers/test_workflow_versions.py::TestRollback::test_rollback_persistence_version_conflict_translates_to_409` -- pins the `PersistenceVersionConflictError -> VersionConflictError` translation in the workflow rollback controller (the only late-path translation in the repo). - `tests/unit/api/controllers/test_workflow_versions.py::test_diff_same_version_returns_validation_error` -- now asserts the full RFC 9457 envelope (error_code, error_category, retryable), not just the status code. ## Documentation - `CLAUDE.md` -- "Errors" bullet now lists the full 11-base forbidden set and references the new gate + opt-out marker. - `docs/reference/errors.md` -- new "Domain-error-hierarchy gate" section. - `docs/reference/conventions.md` Section 6 -- cross-links the new gate. ## Pre-PR review Ran `/pre-pr-review` with 14 specialized agents. 12 valid findings: 2 CRITICAL + 5 MAJOR + 3 MEDIUM + 2 MINOR/INFO. **All implemented in the second commit on this branch.** - **CRITICAL #1** -- `TicketLimitExceededError` no longer leaks `user_id` via `str(exc)` on the 429 response (the message is logged server-side; the class `default_message` provides the user-safe fallback). - **CRITICAL #2** -- `CLAUDE.md` forbidden-bases list expanded from 5 to 11 entries to match what the gate actually enforces. - **MAJOR #3** -- removed `# Issue #1738.` line from `scripts/check_domain_error_hierarchy.py` and the baseline header (issue refs belong in PR body, not source). - **MAJOR #4** -- `workflow_rollback_service.py` docstrings updated to reference the renamed `PersistenceVersionConflictError`. - **MAJOR #5-7** -- test-coverage strengthening (full RFC 9457 envelope assertions, e2e family-mapping tests, late-path translation regression test). - **MEDIUM #8** -- `ConfigError` + `ConfigValidationError` declare instance attributes in the class body for mypy + IDE visibility. - **MEDIUM #9** -- `ConstraintViolationError.status_code = 400` carries a docstring note explaining why this deliberately diverges from the 2xxx-implies-422 convention (the dedicated `handle_persistence_integrity_error` hardcodes 400). - **MINOR/INFO #11-13** -- `WorkflowDefinitionRevisionMismatchError` docstring notes the inherited 409, `_resolve_base` docstring documents the deeply-chained-attribute edge case. Reviewer agents with **zero findings**: conventions-enforcer, logging-audit, persistence-reviewer, issue-resolution-verifier, api-contract-drift, infra-reviewer. ## Test plan - `uv run python -m pytest tests/ -m unit -n 8` -- 26597 passed, 16 skipped. - `uv run python scripts/check_domain_error_hierarchy.py --no-baseline` -- exit 0. - `uv run mypy src/ tests/` -- 0 issues across 3648 files. - `uv run ruff check src/ tests/ scripts/` -- clean. - `uv run ruff format src/ tests/ scripts/ --check` -- 3684 files already formatted. - `uv run pre-commit run --hook-stage pre-push domain-error-hierarchy --all-files` -- passes. ## Push note Pre-push isolation regression gate (`pytest --count 2 -x -n 8` over the affected suite) reports xdist worker crashes in 8 unrelated test files (auth/postgres_session_store, controllers/{collaboration, department_mutations, meetings, provider_health}, auth/{csrf, lockout_store}, webhook_receipt_cleanup_loop). Manual single-pass green; isolation-gate doubled-run on Python 3.14 + Windows ProactorEventLoop crashes the worker, marking 16 of the new gate tests as collateral damage. Filed #1755 with full details + suggested fixes. Pushed with `--no-verify` per per-push authorization. ## Issue tracker - Closes #1738. - Tick the #1738 box on the master tracking issue #1735 after squash-merge. - See #1755 for the pre-push isolation gate flake (out of scope here).
- exception_handlers: `_clamp_log_value` accepts `float` (gemini #1) - subworkflow_service: `SubworkflowHasParentsError.parent_ids` primitive projection so parent context surfaces in logs (gemini #2) - check_no_controller_response_for_domain_errors: matcher catches qualified `litestar.Response` via `func_name.endswith('.Response')` (coderabbit #3) - workflow_versions: log rollback context (`definition_id`, `target_version`) before raising `VersionConflictError` so the call-site context isn't lost when the centralised handler dispatches (coderabbit #4) - errors.test.ts: 422 fixture uses `REQUEST_VALIDATION_ERROR` instead of `RATE_LIMITED` (coderabbit #5)
- exception_handlers: `_clamp_log_value` accepts `float` (gemini #1) - subworkflow_service: `SubworkflowHasParentsError.parent_ids` primitive projection so parent context surfaces in logs (gemini #2) - check_no_controller_response_for_domain_errors: matcher catches qualified `litestar.Response` via `func_name.endswith('.Response')` (coderabbit #3) - workflow_versions: log rollback context (`definition_id`, `target_version`) before raising `VersionConflictError` so the call-site context isn't lost when the centralised handler dispatches (coderabbit #4) - errors.test.ts: 422 fixture uses `REQUEST_VALIDATION_ERROR` instead of `RATE_LIMITED` (coderabbit #5)
Refs #1850. This PR addresses sub-tasks **D + F + G + H + I** of #1850. Sub-tasks A + E shipped in #1843; the remaining five were orphaned. The umbrella issue stays open until every sub-task is checked off, so this PR uses `Refs #1850` (NOT `Closes`) to avoid re-triggering the auto-close pitfall that closed #1778 prematurely. ## Sub-task D — Remove `openapi-typescript` The 2026-04 audit framing was "migrate to openapi-typescript" but verified state contradicted that intent: the dep was already installed, only 2 files imported the bundle, and 31 hand-written type files coexisted alongside it. Hand-written types win. - Drop `openapi-typescript@7.13.0` from `web/package.json` `devDependencies`. - Drop the `generate-types` / `pretype-check` / `prebuild` npm scripts. - Migrate `web/src/api/types/enums.ts` (16 enum types) to hand-written `(typeof FOO_VALUES)[number]` union literals; runtime arrays become the single source of truth. - Migrate `web/src/api/endpoints/reports.ts` (3 types) to hand-written shapes; values cross-checked against the Pydantic source at `src/synthorg/budget/report_config.py:13` and `src/synthorg/api/controllers/reports.py:35,47`. - Regenerate `web/package-lock.json` (19 transitive deps removed). - Drop the now-unused `web/src/api/types/generated.d.ts` line from `.gitignore`. ## Sub-task F — `compare_ci` helper + callsite migration The audit assumed `compare_ci` already existed; it didn't. - Add `compare_ci(a: str, b: str) -> bool` to `src/synthorg/core/normalization.py` (Unicode-safe via `casefold()`, whitespace-stripping, idempotent). Hypothesis property test pins `compare_ci(a, b) == (normalize_identifier(a) == normalize_identifier(b))`. - Migrate 26 inline `.lower() ==` / `.casefold() ==` / `.lower() !=` callsites: - 23 `compare_ci(...)` migrations (single-string equality + filter-by-name predicates). - 3 `find_by_name_ci(...)` migrations collapsing private `_find_*` helpers in `org_mutations.py` and `hr/registry.py:206`. - 4 bytes-typed ASGI header sites (`csrf.py:179`/`:205`, `etag.py:219`/`:361`) stay inline with a one-line exemption comment; `compare_ci` is `str`-only by design. ## Sub-task G — Collapse Winner+Hybrid processors The two processor classes had near-identical bodies and differed only in whether they accepted `RejectDecision`. - Replace `WinnerSelectProcessor` and `HybridDecisionProcessor` with a single `HumanDecisionProcessor(mode: Literal["winner", "hybrid"])` at `src/synthorg/communication/conflict_resolution/escalation/processors.py`. - Public `mode` property exposes the discriminator for factory introspection without leaking `_mode`. - Constructor performs runtime `mode` validation (defensive; raises `ValueError` on invalid values). - `decided_by` parameter typed as `NotBlankStr` to match the `DecisionProcessor` Protocol. - Factory mapping (`_DECISION_PROCESSOR_FACTORIES`) becomes lambda-keyed to construct the right `mode`. - `escalation/__init__.py` re-exports the new class; `human_strategy.py` default switched. - Tests updated to assert `processor.mode == "winner" / "hybrid"` (stronger than the prior `isinstance` check); error message regex still matches the verbatim 422-translation text. ## Sub-task H — `AuthService` class docstring Replace the 1-line docstring at `src/synthorg/api/auth/service.py:52` with a structured docstring covering: purpose, **async vs sync rationale** (Argon2 ops use `asyncio.to_thread`; refresh-token persist awaits a repo write; everything else is sync), **thread-safety** (frozen `_config` + stateless `argon2`/`jwt`; safe to share across the request-handler pool), and **out-of-scope** (token revocation, session storage, SYSTEM-token minting). ## Sub-task I — Protocols audit (audit-only, no removals) 237 `Protocol` classes inventoried at `docs/reference/protocols-audit.md`. Each entry recorded with module path, line, `@runtime_checkable` status, direct-inheritance count, and test-usage count. Recommendations: - **193 KEEP** (≥1 explicit subclass, plug-in suffix, factory-registered, or `testuse ≥ 3`). - **31 REMOVE** candidates (flag-only — must be re-verified at cleanup time). - **13 REVIEW** (`_PrivatePrefixed` typing seams; intent inspection needed). - **0 MAKE-RUNTIME-CHECKABLE** in this PR (deferred to per-area cleanup PRs after surveying `isinstance` sites). Cleanup itself is intentionally out of scope; follow-up issues will be filed per area after this PR lands so each can hyperlink to a specific row in the merged doc. ## Pre-PR review 13 review agents ran (10 main roster + 3 mini-pass). 12 findings consolidated, 6 implemented in the post-review `refactor: apply pre-PR review feedback` commit: - (#1) added `protocols-audit.md` to the CLAUDE.md "Reference (load on demand)" list. - (#2) constructor mode validation in `HumanDecisionProcessor`. - (#3) tightened `decided_by: str` → `NotBlankStr` in the new processor. - (#4) scrubbed a stale `issue #1687` reference in `web/src/__tests__/pages/new-surfaces.test.tsx`. - (#5) replaced f-string `msg` with `HR_TRAINING_SELECTOR_CONFIG_INVALID` event constant in two source selectors. - (#6) serialised `SettingsChangeDispatcher._running` reset through `_lifecycle_lock` via a scheduled `_reset_running_under_lock` coroutine; tests wait on the new `_post_done_tasks` set. 2 false positives rejected (PEP 758 `except A, B:` syntax flagged by an agent unfamiliar with Python 3.12+; security "redundant whitespace stripping" notes self-classified as Safe). Issue-resolution-verifier confirmed all 5 sub-tasks RESOLVED at 99-100% confidence. ## Test plan - `uv run ruff check src/ tests/` — clean. - `uv run mypy src/ tests/` — `Success: no issues found in 3698 source files`. - `uv run python -m pytest tests/ -m unit` — 23,745 passed (6 pre-existing flakes confirmed on clean main). - Touched-area: 3,279 unit tests pass under sequential and `-n8 --dist=loadfile` schedulers. - `npm --prefix web run lint` + `type-check` + `test` — 3,047 web tests pass. - `npm --prefix web run build` — production build succeeds (864ms). - Final greps verify zero remaining `openapi-typescript` / `WinnerSelectProcessor` / `HybridDecisionProcessor` references and only the 4 documented bytes-typed sites carry the inline `.lower() ==` / `.lower() !=` pattern. ## Local cleanup note After pulling this branch, contributors should delete any stale `web/src/api/types/generated.d.ts` from their worktree — it is no longer produced and is no longer gitignored.
#1862) Closes #1855. ## What changes ### Gate quality (sub-task 1) - Rewrite `scripts/check_mock_spec.py` with AST-based detection of bare `Mock` / `AsyncMock` / `MagicMock` at typed boundaries only. Catches: - positional / kwarg argument to a non-Mock-class callable (Pattern A1/A2), - `m: T = Mock()` annotated local where `T` is not Mock-typed (Pattern A3), - `return Mock()` / `yield Mock()` / `yield from Mock()` from a typed-return function (Pattern A4). - Skips: `.return_value` chains, attribute-bag scratch objects, dict / list / set / tuple-literal values, name-binding indirect substitutions (which would require cross-module callee resolution). - Delete `scripts/mock_spec_baseline.txt` (3046 entries). Gate runs zero-tolerance against the live tree. ### Helper + ladder (sub-task 2) - Add `tests/_shared/mock_of.py` with `mock_of[T](**overrides)` factory backed by `create_autospec(T, instance=True, spec_set=True)`. Re-export from `tests/_shared/__init__.py`. - Add `tests/_shared/test_mock_of.py` with 8 unit tests (spec enforcement, override application, unknown-override rejection, Protocol support, async methods, subscript requirement, isinstance compatibility, no state leak between calls). - Document the test-double ladder in `docs/reference/conventions.md` section 12.1 (FakeClock → `create_autospec` / `mock_of` → `SimpleNamespace` → bare `MagicMock` forbidden at typed boundary). Update CLAUDE.md Testing bullet with the pointer. ### Drain (sub-task 3) - Add `scripts/check_mock_spec_ratchet.py` PreToolUse hook plus registrations in `.claude/settings.json` and `.opencode/plugins/synthorg-hooks.ts`. Blocks (a) edits to `tests/*.py` that raise the gate's CATCH count for the touched file, (b) edits to `scripts/check_mock_spec.py` that remove `_Verdict.CATCH` branches. Drives organic drive-by tightening: every edit reduces or holds. - Convert every surviving Pattern A site across `tests/` to `spec=ConcreteType` / `mock_of[T]` / `FakeClock` / `SimpleNamespace`. - Rewrite `tests/unit/scripts/test_check_mock_spec.py` with rule-table coverage. ## Pre-PR review 14 review agents ran (12 main roster + 2 mini-pass). 9 valid findings consolidated, 9 implemented in the follow-up `fix: address pre-PR review feedback` commit. 4 false positives rejected (chiefly PEP 758 `except OSError, UnicodeDecodeError:` flagged by 3 agents unfamiliar with Python 3.14 syntax; empirically verified to parse and catch correctly). Implemented: - **#1 docs-consistency (CRITICAL):** add `check_mock_spec_ratchet.py` to PreToolUse hooks section in `docs/reference/convention-gates.md`. - **#2 docs-consistency (CRITICAL):** backfill `check_error_codes_ts_in_sync.py` and `check_no_controller_response_for_domain_errors.py` in the gate inventory; update `<!--RS:convention_gates-->` macro, `data/runtime_stats.yaml` source, and the CLAUDE.md reference list count. - **#3 pr-test-analyzer (CRITICAL):** add `tests/unit/scripts/test_check_mock_spec_ratchet.py` with 18 unit tests covering envelope parsing (Bash / unknown tool / no file_path / non-.py / out-of-scope / `tests/_shared/` exemption / no-op edit / old_string-not-in-before), ratchet enforcement (hold / raise / reduce / Write tool no-op + raise), gate-edit protection (preserve / remove / add branches), and fail-open path with stderr surfacing. - **#4 pr-test-analyzer (MAJOR):** add `test_yield_from_typed_return_caught` locking the `ast.YieldFrom` branch. - **#5 pr-test-analyzer (MAJOR):** add `test_set_element_skipped` locking the `ast.Set` collection-skip branch. - **#6 silent-failure-hunter (IMPORTANT):** stderr-log the two fail-open `except Exception:` swallows in `_scan_text` and `_check_test_file` so transitional failures surface instead of silently allowing edits. - **#7 tool-parity-checker (MINOR):** list `check_mock_spec_ratchet.py` in the OpenCode plugin header comment. - **#8 + #9 comment-analyzer (SUGGESTION):** docstring on `_load_gate()` plus five AST predicate helpers (`_is_empty_splat`, `_is_literal_none`, `_has_spec_positional`, `_has_spec_keyword`, `_has_dynamic_splat`). Also fixed an unrelated mypy strict failure: three test files imported `VersionRepository` / `FineTuneCheckpointRepository` / `FineTuneRunRepository` from `synthorg.persistence.protocol`, but those names are TYPE_CHECKING-only re-exports there; routed through the source protocol modules (`version_protocol`, `fine_tune_protocol`). Issue-resolution-verifier confirmed all 5 acceptance criteria RESOLVED at 92-100 % confidence. The stretch goal said "file kept as a structural marker" but the branch deletes `scripts/mock_spec_baseline.txt` entirely; the CLAUDE.md text now explicitly documents "zero-tolerance, no baseline" so the intent is unambiguous. ## Test plan - `uv run ruff check src/ tests/ scripts/` — clean. - `uv run mypy src/ tests/` — `Success: no issues found in 3714 source files`. - `uv run python -m pytest tests/ -m unit` — 28,255 passed, 18 skipped (Windows / logfire-extra environmental skips). - New coverage: 50 gate + ratchet unit tests pass (32 existing + 2 added + 16 new ratchet tests). - `uv run python scripts/check_convention_gate_inventory.py` — exit 0 (meta-gate parity preserved). ## Local cleanup note Contributors pulling this branch should delete any stale `scripts/mock_spec_baseline.txt` from their worktree; the gate now runs zero-tolerance and the file is no longer produced.
Bundle of the valid findings surfaced by the /pre-pr-review pass over the two existing bug-fix commits (fix(providers): stop sync from deleting models and fix(dashboard): unblock click-through flows). 4 / 19 valid findings were rejected as agent error (PEP 758 except syntax confused with Python 2, healthcheck-filter mistakenly claimed to drop spans / metrics, module-level pytestmark already applied, _try_stop event parameter is not dead). Fixes landed: - **#1 backend-parity**: SQLite mcp_installation_repo now translates IntegrityError to ConstraintViolationError (matches Postgres); added _classify_sqlite_constraint helper that maps the message text to a stable token (SQLite has no named constraints). Locked under conformance test test_save_raises_constraint_violation_on_unknown_connection that exercises both backends through the shared backend fixture. - **#2 contract-drift**: web/src/api/types/agents.ts AgentConfig gains tier, model_requirement, personality_preset, and strategic_output_mode (last two were pre-existing drift; the first two land in this PR). - **#4 docstring drift**: SyncModelsConfirmDialog docstring now matches the runtime default (replace_existing=false). - **#5 catastrophising language**: trimmed the _SENSITIVE_PROVIDER_FIELDS comment to the factual operator- readability rationale. - **#7 tier type narrowing**: AgentConfig.tier moved from str to Literal so typos fail at config load. - **#8 missing state-transition log**: ConnectionCatalog.update_health now emits HEALTH_STATUS_TRANSITIONED at INFO when the status actually changes (gated so a quiet prober does not flood logs). - **#9 docstring vs inline comment**: ConnectionCatalog .rebind_repository docstring focuses on the contract; the concurrency reasoning moved to an inline comment next to async with self._cache_lock. - **#10 fail-fast guard**: added _assert_sensitive_fields_complete() at module-import time so a future credential-bearing field on ProviderConfig (anything matching *_key / *_token / *_secret / *password*) that is not listed in _SENSITIVE_PROVIDER_FIELDS crashes import instead of leaking on the next audit row. - **#11 trimmed comment**: app.py default-construct block down to the load-bearing why. - **#12 redundant non-null assertions**: build-org-tree.ts findCeo now uses destructuring + truthy check instead of [0]!. - **#13 sanitizeForLog consistency**: budget.ts error args now pass through sanitizeForLog explicitly to match the rest of the store layer; the logger already auto-sanitises but the explicit wrap is project convention. - **#15 trimmed Raises section**: Postgres save() docstring drops the HTTP-status implementation detail. - **#18 useCallback for delete-model handler**: extracted the inline arrow at ConfirmDialog onOpenChange so the reference is stable across renders. Tests added (covering #14): - TestAppStateSimulationRuntime: 5 tests pinning the four branches of has_simulation_runtime + the always-true contract of has_client_simulation_state. - TestRebindRepository: 4 tests for repo swap, cache invalidation, subsequent-read backend selection, and double-rebind safety. - ollama-cloud routing: confirms /api/tags is used (not /models). - TestDiffProviderUpdate: 3 tests for changed-only output, redaction sentinels, and an additional cross-check between ProviderConfig fields and the SENSITIVE list. - test_destructive_sync_guard: 4 tests for the _reject_destructive_empty_discovery branches. - AgentConfig tier / model_requirement round-trip + tier rejection for invalid values. - conformance test for ConstraintViolationError parity across Postgres and SQLite (referenced under #1 above). Pre-reviewed by 21 agents (16 standard + 5 mini-pass) through /pre-pr-review; 19 valid findings raised, 15 fixed in code, 4 rejected as agent error (documented in _audit/pre-pr-review/ triage.md). uv run python -m pytest tests/ -n 8: 29957 passed, 49 skipped. uv run mypy src/ tests/: clean. uv run ruff check src/ tests/: clean. npm --prefix web run lint / type-check / test: clean (3075 tests).
Bundle of the valid findings surfaced by the /pre-pr-review pass over the two existing bug-fix commits (fix(providers): stop sync from deleting models and fix(dashboard): unblock click-through flows). 4 / 19 valid findings were rejected as agent error (PEP 758 except syntax confused with Python 2, healthcheck-filter mistakenly claimed to drop spans / metrics, module-level pytestmark already applied, _try_stop event parameter is not dead). Fixes landed: - **#1 backend-parity**: SQLite mcp_installation_repo now translates IntegrityError to ConstraintViolationError (matches Postgres); added _classify_sqlite_constraint helper that maps the message text to a stable token (SQLite has no named constraints). Locked under conformance test test_save_raises_constraint_violation_on_unknown_connection that exercises both backends through the shared backend fixture. - **#2 contract-drift**: web/src/api/types/agents.ts AgentConfig gains tier, model_requirement, personality_preset, and strategic_output_mode (last two were pre-existing drift; the first two land in this PR). - **#4 docstring drift**: SyncModelsConfirmDialog docstring now matches the runtime default (replace_existing=false). - **#5 catastrophising language**: trimmed the _SENSITIVE_PROVIDER_FIELDS comment to the factual operator- readability rationale. - **#7 tier type narrowing**: AgentConfig.tier moved from str to Literal so typos fail at config load. - **#8 missing state-transition log**: ConnectionCatalog.update_health now emits HEALTH_STATUS_TRANSITIONED at INFO when the status actually changes (gated so a quiet prober does not flood logs). - **#9 docstring vs inline comment**: ConnectionCatalog .rebind_repository docstring focuses on the contract; the concurrency reasoning moved to an inline comment next to async with self._cache_lock. - **#10 fail-fast guard**: added _assert_sensitive_fields_complete() at module-import time so a future credential-bearing field on ProviderConfig (anything matching *_key / *_token / *_secret / *password*) that is not listed in _SENSITIVE_PROVIDER_FIELDS crashes import instead of leaking on the next audit row. - **#11 trimmed comment**: app.py default-construct block down to the load-bearing why. - **#12 redundant non-null assertions**: build-org-tree.ts findCeo now uses destructuring + truthy check instead of [0]!. - **#13 sanitizeForLog consistency**: budget.ts error args now pass through sanitizeForLog explicitly to match the rest of the store layer; the logger already auto-sanitises but the explicit wrap is project convention. - **#15 trimmed Raises section**: Postgres save() docstring drops the HTTP-status implementation detail. - **#18 useCallback for delete-model handler**: extracted the inline arrow at ConfirmDialog onOpenChange so the reference is stable across renders. Tests added (covering #14): - TestAppStateSimulationRuntime: 5 tests pinning the four branches of has_simulation_runtime + the always-true contract of has_client_simulation_state. - TestRebindRepository: 4 tests for repo swap, cache invalidation, subsequent-read backend selection, and double-rebind safety. - ollama-cloud routing: confirms /api/tags is used (not /models). - TestDiffProviderUpdate: 3 tests for changed-only output, redaction sentinels, and an additional cross-check between ProviderConfig fields and the SENSITIVE list. - test_destructive_sync_guard: 4 tests for the _reject_destructive_empty_discovery branches. - AgentConfig tier / model_requirement round-trip + tier rejection for invalid values. - conformance test for ConstraintViolationError parity across Postgres and SQLite (referenced under #1 above). Pre-reviewed by 21 agents (16 standard + 5 mini-pass) through /pre-pr-review; 19 valid findings raised, 15 fixed in code, 4 rejected as agent error (documented in _audit/pre-pr-review/ triage.md). uv run python -m pytest tests/ -n 8: 29957 passed, 49 skipped. uv run mypy src/ tests/: clean. uv run ruff check src/ tests/: clean. npm --prefix web run lint / type-check / test: clean (3075 tests).
…1 reviewer-config) - Forward cancellation token in _mine_negatives_from_pairs so the encoding phase of hard-negative mining is interruptible (CodeRabbit + Gemini, fine_tune.py:462). - Restructure embedding-evaluation.md pipeline stages: split the old Stage 3 evaluation bullet into a dedicated Stage 4, renumber Deploy to Stage 5 so the doc matches the fine_tune.py module header (5-stage pipeline) and removes stage-ownership ambiguity (CodeRabbit). - Add .gemini/styleguide.md + .gemini/config.yaml so Gemini Code Assist stops flagging PEP 649 TYPE_CHECKING annotations as NameError risks on every PR, and reads project conventions from CLAUDE.md. Skipped (factually wrong against current code): - Gemini #1/#2/#3 (NameError on CancellationToken / EvalMetrics in TYPE_CHECKING block): Python 3.14 PEP 649 evaluates annotations lazily; existing unquoted CancellationToken | None at fine_tune.py:568 has been in CI without issue. CLAUDE.md mandates this convention. - CodeRabbit RS-marker findings on docs/design/memory.md and docs/reference/embedding-evaluation.md: out of scope for scripts/check_doc_numeric_macros.py, which scopes to README + 4 docs (neither file present) and matches digits only adjacent to specific stat nouns (tests/providers/agents/stars/releases).
…1 reviewer-config) - Forward cancellation token in _mine_negatives_from_pairs so the encoding phase of hard-negative mining is interruptible (CodeRabbit + Gemini, fine_tune.py:462). - Restructure embedding-evaluation.md pipeline stages: split the old Stage 3 evaluation bullet into a dedicated Stage 4, renumber Deploy to Stage 5 so the doc matches the fine_tune.py module header (5-stage pipeline) and removes stage-ownership ambiguity (CodeRabbit). - Add .gemini/styleguide.md + .gemini/config.yaml so Gemini Code Assist stops flagging PEP 649 TYPE_CHECKING annotations as NameError risks on every PR, and reads project conventions from CLAUDE.md. Skipped (factually wrong against current code): - Gemini #1/#2/#3 (NameError on CancellationToken / EvalMetrics in TYPE_CHECKING block): Python 3.14 PEP 649 evaluates annotations lazily; existing unquoted CancellationToken | None at fine_tune.py:568 has been in CI without issue. CLAUDE.md mandates this convention. - CodeRabbit RS-marker findings on docs/design/memory.md and docs/reference/embedding-evaluation.md: out of scope for scripts/check_doc_numeric_macros.py, which scopes to README + 4 docs (neither file present) and matches digits only adjacent to specific stat nouns (tests/providers/agents/stars/releases).
…derabbit) CodeRabbit #2: the AutonomyChangeStrategy verdict was computed but never threaded into the apply path, so non-HUMAN_ONLY strategies were inert. AutonomyUpdate gains granted_by_strategy; a granting strategy now produces an auto-decided (APPROVED, decided_by=strategy:<name>) approval item and the registry applies the level change immediately. HUMAN_ONLY (default) unchanged -- still pends. docs/design/security.md amended (user-approved deviation from queue-is-sole-apply-driver). Adds registry granted-path test.
…derabbit) CodeRabbit #2: the AutonomyChangeStrategy verdict was computed but never threaded into the apply path, so non-HUMAN_ONLY strategies were inert. AutonomyUpdate gains granted_by_strategy; a granting strategy now produces an auto-decided (APPROVED, decided_by=strategy:<name>) approval item and the registry applies the level change immediately. HUMAN_ONLY (default) unchanged -- still pends. docs/design/security.md amended (user-approved deviation from queue-is-sole-apply-driver). Adds registry granted-path test.
Round 2 covers CodeRabbit's re-review of round 1's commit. Comment / docstring hygiene (CR-#1, #2, #5): - app.py: rewrite the SQLite+persistent-ApprovalStore hard-block comment without migration framing; describe the invariant the guard enforces, not the migration's history. - approval_store.py: trim ``has_persistent_repo`` docstring to current-state behaviour only. - chief_of_staff/config.py: replace real vendor names (OpenAI/Claude/GPT-4o/LiteLLM) in propose-* constant comments with project-approved generic wording. App startup ordering (CR-#8 duplicate): - app.py: move conversational_proposal_repo wiring above the ``provider_registry is None: return`` early-return so the repo attaches even on boots without an LLM provider (preserves the approve/reject path for already-parked conversational approvals, which was the stated rationale for the round-1 decoupling). Conversational-intake resume (CR-#3 + CR-#4): - _approval_review_gate.py: split try_conversational_intake_resume (187 lines, well over the <50 line budget) into orchestrator + three branch-specific helpers: * _load_conversational_proposal: source check + repo + lookup. * _reject_conversational_proposal: PENDING -> REJECTED CAS. * _execute_conversational_proposal: PENDING -> EXECUTING -> EXECUTED CAS with revert-on-failure. The orchestrator is now 43 lines and reads as a flow table. - _approval_review_gate.py: ``_reread_approval_item`` returning ``None`` for a conversational approval now raises ServiceUnavailableError. Returning False would route the call through to non-conversational flows and silently strand the decided approval. Docstring updated to spell out the flow-specific semantics. Cross-tenant privacy (CR-#7, Critical): - propose.py: ``_resolve_conversation`` now rejects conversation-id reuse across users. A caller who learns another conversation_id could otherwise append turns and have prior history fed back to the model -- a cross-tenant break. Mapped to NotFound so the endpoint cannot be used to probe existence either. TOCTOU on sequence allocation (CR-#6, Heavy lift): - conversation_repo.py (SQLite + Postgres): bounded retry-on-IntegrityError in ``append`` when the (conversation_id, sequence) UNIQUE constraint fires. The repo re-queries the live max sequence, model_copies the turn, and retries up to 3 times. Other constraint violations (FK miss, CHECK on content/role) still translate directly to ConstraintViolationError without retry. Tests: - test_propose.py: new test_other_user_resume_blocked. - test_conversational_intake_resume.py: new test_unreadable_approval_item_raises (covers the new raise-on-None path). - test_conversation_repository.py: renamed test_append_duplicate_sequence_rejected -> test_append_duplicate_sequence_resequenced; asserts the new race-safe semantics (second append lands at sequence + 1).
Summary
DESIGN_SPEC.md) — comprehensive 18-section high-level design for the AI Company framework covering agent system, company structure, communication architecture, task engine, memory, HR, model providers, budget management, tools, security, human interaction, templates, technical architecture, research/prior art, open questions, and backlogaurelio-review-prPR review pipeline for future useAll technology choices in the design spec are marked as candidates/TBD pending further research. No final decisions on frameworks, libraries, or specific models.
Test plan
🤖 Generated with Claude Code