feat: add approval workflow gates to TaskEngine #365

Closed
Aureliolo wants to merge 5 commits into main from feat/task-engine-approval

@Aureliolo
Owner

Summary

  • Approval gate integration: Wire SecOps ESCALATE verdicts and request_human_approval tool calls into the engine execution loop, parking agent context when human approval is required
  • RequestHumanApprovalTool: Agent-callable tool that creates ApprovalItem in the approval store and signals execution parking via metadata
  • ApprovalGate service: Coordinates context serialization (via ParkService), persistence (via ParkedContextRepository), and resume with decision message injection
  • Loop integration: Both ReactLoop and PlanExecuteLoop check for escalations after tool execution and return PARKED termination reason
  • ToolInvoker escalation tracking: Detects ESCALATE verdicts and parking metadata, exposes pending_escalations with proper tuple[EscalationInfo, ...] typing
  • API controller hardening: requested_by bound to auth user (not request body), prompt injection mitigation in resume messages, broadened WebSocket error handling
  • WebSocket fix: Frontend approval store now uses approval_id key from backend payload and fetches full items via API
  • Observability: 8 structured event constants for approval gate lifecycle
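
The loop-integration bullets above can be pictured with a minimal sketch. This is not the PR's code: `TerminationReason`, `FakeInvoker`, `run_turn`, and the fields of `EscalationInfo` are hypothetical stand-ins, and only the names `pending_escalations` and `PARKED` come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class TerminationReason(Enum):
    # Hypothetical enum; the description only names the PARKED reason.
    COMPLETED = "completed"
    PARKED = "parked"


@dataclass(frozen=True)
class EscalationInfo:
    # Field names are assumptions made for illustration.
    approval_id: str
    tool_name: str


class FakeInvoker:
    """Stand-in for ToolInvoker: records escalations raised during a call."""

    def __init__(self) -> None:
        self._pending: list[EscalationInfo] = []

    @property
    def pending_escalations(self) -> tuple[EscalationInfo, ...]:
        # Expose an immutable, precisely typed snapshot.
        return tuple(self._pending)

    def invoke(self, tool_name: str, needs_approval: bool) -> str:
        self._pending.clear()  # each call starts with a clean slate
        if needs_approval:
            self._pending.append(EscalationInfo(f"approval-{tool_name}", tool_name))
        return f"handled {tool_name}"


def run_turn(invoker: FakeInvoker, tool_name: str, needs_approval: bool) -> TerminationReason:
    invoker.invoke(tool_name, needs_approval)
    # After tool execution, the loop checks for escalations and parks if any exist.
    if invoker.pending_escalations:
        return TerminationReason.PARKED
    return TerminationReason.COMPLETED
```

In this sketch the loop never inspects tool results directly; it only asks the invoker whether anything is waiting on a human, which keeps the parking decision in one place.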

Closes #258, closes #259

Review coverage

Pre-reviewed by 15 specialized agents (docs-consistency, code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, api-contract-drift, test-quality-reviewer, async-concurrency-reviewer, issue-resolution-verifier).

43 findings identified and addressed across Critical (14), Major (23), and Medium (6) severities. Key fixes:

  • Security: identity spoofing, prompt injection, schema length limits
  • Lifecycle: premature context deletion, orphaned records, zombie parking states
  • Type safety: tuple[Any, ...] → tuple[EscalationInfo, ...]
  • Frontend: WebSocket payload mismatch (real-time updates were completely broken)
  • Tests: markers, precise assertions, new error path coverage

Test plan

  • uv run ruff check src/ tests/ — clean
  • uv run mypy src/ tests/ — clean (926 files, strict)
  • uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80 — 7540 passed, 94.60% coverage
  • Verify WebSocket real-time updates work in dashboard (manual)
  • Verify approval create/approve/reject flows via API (manual)

Implement the approval gate layer that parks agent execution when
human approval is required and provides infrastructure for resuming
after a decision. Also adds the `request_human_approval` tool for
agents to explicitly request approval.

- Add EscalationInfo/ResumePayload models and approval gate events
- Add ApprovalGate service (park/resume via ParkService)
- Add RequestHumanApprovalTool (creates ApprovalItem, signals parking)
- Add ToolInvoker escalation tracking (ESCALATE verdicts + tool metadata)
- Integrate approval gate into execute_tool_calls and both loop types
- Wire into AgentEngine, AppState, and approvals controller resume path

Closes #258
Closes #259
Copilot AI review requested due to automatic review settings March 13, 2026 22:42
@github-actions
Contributor

github-actions Bot commented Mar 13, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Contributor

coderabbitai Bot commented Mar 13, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8ecd92da-7556-4e7d-8ab0-1ec05fd56931

📥 Commits

Reviewing files that changed from the base of the PR and between 0664529 and 91823f4.

📒 Files selected for processing (1)
  • src/ai_company/api/controllers/approvals.py

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Human approval workflow system for sensitive agent actions with request/approve/reject capabilities
    • Parked context support for agents awaiting approval decisions
    • Real-time WebSocket updates for approval status changes
    • Enhanced approval item expiration with callback handling
  • Documentation

    • Added approval gate integration details to API, engine, and tools documentation
    • Expanded logging conventions with structured event guidance and examples
  • Bug Fixes

    • Improved approval decision conflict detection and error handling
    • Enhanced WebSocket payload validation for approval events

Walkthrough

This PR implements approval workflow gates in the engine by introducing an ApprovalGate service that coordinates parking and resumption of agent execution pending human approval. It adds a request_human_approval tool, integrates approval checks into execution loops, updates the API layer for approval handling, and extends frontend WebSocket handling to fetch full approval items.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Approval Gate Core: src/ai_company/engine/approval_gate.py, src/ai_company/engine/approval_gate_models.py | New ApprovalGate service to manage parking/resumption with EscalationInfo and ResumePayload models; handles context serialization, persistence, and resume flow with comprehensive error handling. |
| Tool & Invoker Integration: src/ai_company/tools/approval_tool.py, src/ai_company/tools/invoker.py, src/ai_company/tools/registry.py, src/ai_company/tools/__init__.py | RequestHumanApprovalTool added for escalation initiation; ToolInvoker extended with pending_escalations tracking for security and parking-based escalations; ToolRegistry.all_tools() method added. |
| Engine Loop Integration: src/ai_company/engine/agent_engine.py, src/ai_company/engine/loop_helpers.py, src/ai_company/engine/plan_execute_loop.py, src/ai_company/engine/react_loop.py, src/ai_company/engine/__init__.py | Approval gate wired into execution loops; execute_tool_calls extended with approval gate parameter and parking flow; PlanExecuteLoop and ReactLoop now accept approval_gate and pass it through; AgentEngine creates and configures approval gate when approval_store is present. |
| API Layer: src/ai_company/api/approval_store.py, src/ai_company/api/controllers/approvals.py, src/ai_company/api/dto.py, src/ai_company/api/state.py | ApprovalStore.save_if_pending() added for conditional persistence; approvals controller updated with resume triggering and conflict detection; CreateApprovalRequest.requested_by removed (set from auth user); AppState now holds ApprovalGate reference. |
| Factory & Configuration: src/ai_company/engine/_security_factory.py | New factory module providing make_security_interceptor and registry_with_approval_tool; wires approval store into SecurityInterceptionStrategy and augments tool registry with approval tool. |
| Park/Resume Services: src/ai_company/security/timeout/park_service.py, src/ai_company/security/timeout/parked_context.py | ParkService and ParkedContext updated to support optional task_id (None for taskless agents). |
| Observability: src/ai_company/observability/events/approval_gate.py | New module defining APPROVAL_GATE_* event constants for initialization, escalation detection, context parking/resumption, and failure tracking. |
| Frontend: web/src/stores/approvals.ts | WebSocket event handlers refactored to fetch full approval items by approval_id instead of working with partial payloads; added async flow for submitted/approved/rejected/expired events. |
| Documentation & Tests: CLAUDE.md, README.md, docs/design/engine.md, tests/unit/engine/test_approval_gate.py, tests/unit/engine/test_approval_gate_models.py, tests/unit/engine/test_loop_helpers_approval.py, tests/unit/tools/test_approval_tool.py, tests/unit/tools/test_invoker_escalation.py, tests/unit/observability/test_events.py, web/src/__tests__/stores/approvals.test.ts, tests/unit/api/controllers/test_approvals.py, tests/unit/api/test_dto.py | Updated documentation reflecting approval gate integration; comprehensive test coverage for approval gate lifecycle, escalation tracking, tool execution, and frontend WS handling. |

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent Loop
    participant ToolInvoker as Tool Invoker
    participant Gate as ApprovalGate
    participant Store as ParkService
    participant Repo as ParkedContextRepository
    participant API as Approval API

    Agent->>ToolInvoker: execute_tool_calls(with approval_gate)
    ToolInvoker->>ToolInvoker: invoke tools
    ToolInvoker->>Gate: check pending_escalations
    Gate->>Agent: has escalations?
    
    alt Escalation Detected
        Agent->>Gate: should_park(escalations)
        Gate-->>Agent: EscalationInfo
        Agent->>Store: serialize context
        Store-->>Agent: serialized context
        Agent->>Repo: persist parked context
        Repo-->>Agent: ParkedContext with approval_id
        Agent-->>Agent: return PARKED ExecutionResult
    else No Escalation
        Agent-->>Agent: continue execution normally
    end
sequenceDiagram
    participant API as Approval API
    participant Store as ApprovalStore
    participant Gate as ApprovalGate
    participant ParkSvc as ParkService
    participant Repo as ParkedContextRepository
    participant AgentLoop as Agent Loop

    API->>Store: approve(approval_id, decision)
    Store->>Store: save_if_pending(item)
    Store-->>API: updated item or None
    
    alt Approval Saved
        API->>Gate: resume_context(approval_id)
        Gate->>ParkSvc: deserialize context
        Gate->>Repo: load parked context
        Repo-->>Gate: ParkedContext
        Gate-->>API: (AgentContext, decision)
        API->>API: _trigger_resume with decision_reason
        API->>AgentLoop: inject resume message
        AgentLoop-->>AgentLoop: continue with approval decision
        Gate->>Repo: delete parked context
    else Conflict (Not PENDING)
        API-->>API: raise ConflictError
    end
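
The approve path in the second diagram depends on `save_if_pending` writing only while the stored item is still pending. A minimal in-memory sketch of that conditional write follows; the `ApprovalItem` shape, the status strings, the lock-based store, and the `decide` helper are assumptions for illustration, not the PR's implementation.

```python
import threading
from dataclasses import dataclass
from typing import Optional


class ConflictError(Exception):
    """Raised when a decision arrives for an item that is no longer pending."""


@dataclass(frozen=True)
class ApprovalItem:
    # Minimal hypothetical shape; the real model has more fields.
    id: str
    status: str = "pending"


class InMemoryApprovalStore:
    def __init__(self) -> None:
        self._items: dict[str, ApprovalItem] = {}
        self._lock = threading.Lock()

    def add(self, item: ApprovalItem) -> None:
        with self._lock:
            self._items[item.id] = item

    def save_if_pending(self, item: ApprovalItem) -> Optional[ApprovalItem]:
        # Check and write under one lock so two concurrent decisions cannot both win.
        with self._lock:
            current = self._items.get(item.id)
            if current is None or current.status != "pending":
                return None
            self._items[item.id] = item
            return item


def decide(store: InMemoryApprovalStore, approval_id: str, status: str) -> ApprovalItem:
    updated = store.save_if_pending(ApprovalItem(approval_id, status))
    if updated is None:
        # Mirrors the "Conflict (Not PENDING)" branch of the diagram.
        raise ConflictError(f"approval {approval_id} is not pending")
    return updated
```

The first decision wins and any later decision on the same item surfaces as a conflict rather than silently overwriting the earlier one.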

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: add approval workflow gates to TaskEngine' is clear, specific, and directly reflects the primary change: integrating approval gating into the engine execution pipeline. |
| Description check | ✅ Passed | The description is comprehensive and related to the changeset, detailing approval gate integration, RequestHumanApprovalTool, ApprovalGate service, loop integration, ToolInvoker escalation tracking, API hardening, WebSocket fixes, and observability additions. |
| Linked Issues check | ✅ Passed | The PR fully implements both linked issues: #258 wires SecOps ESCALATE verdicts into the engine loop with context parking [ApprovalGate, loop_helpers, ReactLoop, PlanExecuteLoop], and #259 delivers the RequestHumanApprovalTool with ApprovalItem creation and resume injection [approval_tool, approval_gate]. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to approval workflow gating: core approval gate implementation, engine loop integration, API controller updates, ToolInvoker escalation tracking, WebSocket corrections, and comprehensive test coverage. No unrelated refactoring or feature creep detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive human approval workflow into the TaskEngine, allowing agents to pause execution and request human intervention for sensitive actions. It establishes a dedicated approval gate service, integrates it deeply into the agent execution loops, and provides a new tool for agents to explicitly request approvals. The changes also include significant API and frontend enhancements to support this new functionality, ensuring secure and observable approval processes.

Highlights

  • Approval Gate Integration: Integrated SecOps ESCALATE verdicts and request_human_approval tool calls into the engine execution loop, parking agent context when human approval is required.
  • RequestHumanApprovalTool: Introduced an agent-callable tool that creates an ApprovalItem in the approval store and signals execution parking via metadata.
  • ApprovalGate Service: Developed a new service to coordinate context serialization, persistence, and resume with decision message injection for approval workflows.
  • Execution Loop Integration: Both ReactLoop and PlanExecuteLoop now check for escalations after tool execution and return a PARKED termination reason when approval is needed.
  • ToolInvoker Escalation Tracking: Enhanced ToolInvoker to detect ESCALATE verdicts and parking metadata, exposing pending_escalations with proper typing.
  • API Controller Hardening: Improved security by binding requested_by to the authenticated user, mitigating prompt injection in resume messages, and broadening WebSocket error handling.
  • WebSocket Frontend Fix: Corrected the frontend approval store to use the approval_id key from the backend payload and fetch full items via API, resolving real-time update issues.
  • Observability: Added 8 structured event constants to track the approval gate lifecycle for enhanced monitoring.
Changelog
  • CLAUDE.md
    • Updated descriptions for api/, engine/, and tools/ directories to reflect approval gate integration.
    • Added APPROVAL_GATE_ESCALATION_DETECTED to the list of observability event constants.
  • README.md
    • Updated the project status to indicate that approval workflow gates are now implemented.
  • docs/design/engine.md
    • Added documentation explaining how the ApprovalGate handles escalations, context parking, and resume within the engine's run process.
  • src/ai_company/api/approval_store.py
    • Added robust error handling for the _on_expire callback to prevent failures from propagating.
  • src/ai_company/api/controllers/approvals.py
    • Improved error handling for WebSocket event publishing.
    • Introduced a new helper function _log_approval_decision for consistent logging of approval outcomes.
    • Enforced that the requested_by field in create_approval is populated from the authenticated user, not the request body, and added an UnauthorizedError check.
    • Refactored approve and reject methods to use the new _log_approval_decision helper.
  • src/ai_company/api/state.py
    • Imported ApprovalGate and added it as a managed dependency within the AppState.
  • src/ai_company/engine/__init__.py
    • Imported and exposed ApprovalGate, EscalationInfo, and ResumePayload from the new approval gate modules.
  • src/ai_company/engine/agent_engine.py
    • Modified the AgentEngine constructor to initialize the ApprovalGate and integrate it into the execution loop.
    • Added methods _make_approval_gate and _make_default_loop to configure the approval gate and default ReactLoop.
    • Updated _make_tool_invoker to dynamically register the RequestHumanApprovalTool if an approval store is available.
  • src/ai_company/engine/approval_gate.py
    • Added new file implementing the ApprovalGate service, responsible for parking and resuming agent contexts based on approval decisions.
  • src/ai_company/engine/approval_gate_models.py
    • Added new file defining EscalationInfo and ResumePayload Pydantic models for data transfer within the approval gate system.
  • src/ai_company/engine/loop_helpers.py
    • Imported approval gate related events and models.
    • Modified execute_tool_calls to incorporate approval_gate checks and park the agent context if an escalation is detected.
    • Added a new helper function _park_for_approval to encapsulate the logic for parking an agent context.
  • src/ai_company/engine/plan_execute_loop.py
    • Modified the PlanExecuteLoop constructor to accept an optional approval_gate instance.
    • Updated _handle_step_tool_calls to pass the approval_gate to execute_tool_calls for escalation handling.
  • src/ai_company/engine/react_loop.py
    • Modified the ReactLoop constructor to accept an optional approval_gate instance.
    • Updated _process_turn_response to pass the approval_gate to execute_tool_calls for escalation handling.
  • src/ai_company/observability/events/approval_gate.py
    • Added new file defining constants for various approval gate lifecycle events.
  • src/ai_company/tools/__init__.py
    • Imported and exposed the new RequestHumanApprovalTool.
  • src/ai_company/tools/approval_tool.py
    • Added new file implementing the RequestHumanApprovalTool, allowing agents to request human approval for actions.
  • src/ai_company/tools/invoker.py
    • Added _pending_escalations to track escalations and a pending_escalations property for external access.
    • Modified _check_security to populate _pending_escalations when a security verdict is ESCALATE and includes an approval_id.
    • Refactored invoke and invoke_all methods to clear _pending_escalations at the start of each call.
    • Introduced _track_parking_metadata to detect and record parking requests from tool execution results.
  • src/ai_company/tools/registry.py
    • Added an all_tools method to retrieve all registered tool instances.
  • tests/unit/engine/test_approval_gate.py
    • Added new unit tests for the ApprovalGate service, covering parking, resuming, and message building.
  • tests/unit/engine/test_approval_gate_models.py
    • Added new unit tests for the EscalationInfo and ResumePayload models, verifying their structure and immutability.
  • tests/unit/engine/test_loop_helpers_approval.py
    • Added new unit tests for the approval gate integration within loop helpers, specifically execute_tool_calls.
  • tests/unit/observability/test_events.py
    • Updated the test to include the new approval_gate domain module in the discovery process.
  • tests/unit/tools/test_approval_tool.py
    • Added new unit tests for the RequestHumanApprovalTool, covering item creation, metadata, risk classification, and error handling.
  • tests/unit/tools/test_invoker_escalation.py
    • Added new unit tests for ToolInvoker's escalation tracking, including security verdicts and tool metadata.
  • web/src/stores/approvals.ts
    • Updated handleWsEvent to correctly process WebSocket events using approval_id from the backend payload.
    • Implemented fetching full approval items via API for new submissions and status changes.
    • Added logic to remove approval items from the local list if they are deleted or expired on the backend.
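
The invoker entries in the changelog above mention `_track_parking_metadata`, which detects parking requests in tool results. A rough sketch of that detection step is below. The `ToolResult` shape, the `approval_id` metadata key, and the decision to ignore malformed requests are assumptions; only the `requires_parking` key is named elsewhere in this thread.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ToolResult:
    # Hypothetical result shape used only for this sketch.
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class EscalationInfo:
    # Field names are assumptions made for illustration.
    approval_id: str
    tool_call_id: str


def track_parking_metadata(tool_call_id: str, result: ToolResult) -> Optional[EscalationInfo]:
    """Return an escalation record if the tool asked for parking, else None."""
    meta = result.metadata
    if not meta.get("requires_parking"):
        return None
    approval_id = meta.get("approval_id")
    # A parking request without a usable approval id could not be resumed later,
    # so this sketch drops it instead of recording a half-formed escalation.
    if not isinstance(approval_id, str) or not approval_id.strip():
        return None
    return EscalationInfo(approval_id=approval_id, tool_call_id=tool_call_id)
```

Returning a typed record (or nothing) keeps the list of pending escalations free of partially populated entries.
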
Activity
  • The pull request was pre-reviewed by 15 specialized agents, including docs-consistency, code-reviewer, python-reviewer, security-reviewer, and test-quality-reviewer.
  • A total of 43 findings were identified and addressed across Critical (14), Major (23), and Medium (6) severities.
  • Key fixes included security vulnerabilities (identity spoofing, prompt injection, schema length limits), lifecycle issues (premature context deletion, orphaned records, zombie parking states), type safety improvements, a critical WebSocket payload mismatch fix, and enhanced test coverage.
  • The author has performed static analysis checks (ruff check, mypy) and unit tests (pytest) with high coverage (94.60%).
  • Manual verification steps are planned for WebSocket real-time updates and approval flows via API.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 13, 2026 22:43 — with GitHub Actions Inactive
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a comprehensive approval workflow system, a significant and well-executed feature. The changes are extensive, touching the agent engine, API, tools, and frontend, and include the addition of an ApprovalGate service, a RequestHumanApprovalTool, and robust context parking/resuming logic. The implementation quality is high, with strong attention to security (e.g., preventing identity spoofing in the approvals API), robustness (e.g., improved WebSocket event handling), and observability. The inclusion of thorough unit tests for all new components is commendable. I have one suggestion regarding inconsistent logging of fatal errors in the ToolInvoker for improved diagnostics.

Comment thread src/ai_company/tools/invoker.py Outdated
Comment on lines 453 to 454
        except MemoryError, RecursionError:
            raise
Contributor

medium

Inconsistent logging for non-recoverable errors. This except block for MemoryError and RecursionError no longer logs the exception before re-raising, but other methods in this class like _safe_deepcopy_args and _execute_tool still do. For consistent error handling and diagnostics, it's better to either log these fatal errors consistently in all methods where they are caught and re-raised, or handle logging only at a higher level. I'd suggest restoring the logging here to match the behavior of the other methods.

        except (MemoryError, RecursionError) as exc:
            logger.exception(
                TOOL_INVOKE_NON_RECOVERABLE,
                tool_call_id=tool_call.id,
                tool_name=tool_call.name,
                error=f"{type(exc).__name__}: {exc}",
            )
            raise

@greptile-apps

greptile-apps Bot commented Mar 13, 2026

Greptile Summary

This PR wires approval workflow gates into the TaskEngine execution loop, adding RequestHumanApprovalTool, ApprovalGate, and the supporting plumbing to park agent contexts when escalations occur and resume them after a human decision. The architecture is sound — EscalationInfo/ResumePayload models are well-typed, build_resume_message uses repr() to isolate user-supplied content, save_if_pending closes a race condition on concurrent decisions, and the frontend WebSocket handler is correctly rewritten to use the approval_id key and re-fetch full items from the API.
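
The `repr()` technique mentioned above can be sketched as follows. The function signature and message wording here are hypothetical, not the PR's `build_resume_message`; the point is only how `repr()` quotes reviewer-supplied text and escapes its newlines. It reduces the risk of injected instructions and is not a complete defence.

```python
from typing import Optional


def build_resume_message(approval_id: str, approved: bool, reason: Optional[str] = None) -> str:
    """Build the message injected into the agent context on resume (sketch)."""
    decision = "APPROVED" if approved else "REJECTED"
    lines = [f"Approval {approval_id!r} was {decision} by a human reviewer."]
    if reason:
        # repr() wraps the text in quotes and escapes newlines, so reviewer-supplied
        # content stays on one line and reads as data rather than as new instructions.
        lines.append(f"Reviewer comment (untrusted data): {reason!r}")
    return "\n".join(lines)
```

A comment containing a line break followed by "Ignore previous instructions" ends up as a single quoted string with a literal `\n` inside it, instead of starting a fresh line in the prompt.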

Critical issue:

  • Python 3 SyntaxError across all new files: Every new except clause that re-raises fatal errors uses except MemoryError, RecursionError: — the Python 2 bare-comma form that is a SyntaxError in Python 3. Any module containing this pattern will fail to import. Affects approval_store.py, approval_gate.py, loop_helpers.py, approval_tool.py, invoker.py, and approvals.py (controller). The fix is except (MemoryError, RecursionError): in all cases. (The existing pre-PR code in invoker.py already uses the correct parenthesised form.)
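
For reference, the parenthesised form is accepted by every Python 3 release. To my knowledge the unparenthesised form without an `as` clause only became legal with PEP 758 in Python 3.14, so whether the flagged lines fail to import depends on the interpreter the project targets. A small sketch of the portable pattern, where `run_guarded` is a hypothetical helper rather than code from the PR:

```python
def run_guarded(fn):
    """Call fn, letting fatal errors propagate and summarising everything else."""
    try:
        return fn()
    except (MemoryError, RecursionError):
        # Non-recoverable: re-raise untouched so a higher-level handler sees them.
        raise
    except Exception as exc:
        # Recoverable errors are converted into a short description.
        return f"error: {type(exc).__name__}"
```

The parentheses make the clause a tuple of exception types, which is unambiguous on older interpreters where `except A, B:` is rejected by the parser.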

Other findings:

  • APPROVAL_GATE_CONTEXT_PARKED is emitted in the "no task_execution" debug path inside _park_for_approval before park_context() is called, producing a spurious or duplicate event on both failure and success paths.
  • In the frontend approvals.ts, total.value is incremented unconditionally when an approval.submitted WS event arrives while filters are active, even though the new item may not satisfy the current filter criteria; this can cause the displayed total to drift from the true count.
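
The first finding above concerns event ordering. One way to avoid a spurious or duplicate "parked" event is to emit it once, and only after parking has succeeded. The sketch below assumes a callable `park_context` and a `log` function; the constant values and the failure constant's name are hypothetical.

```python
APPROVAL_GATE_CONTEXT_PARKED = "approval_gate.context.parked"  # value is an assumption
APPROVAL_GATE_CONTEXT_PARK_FAILED = "approval_gate.context.park_failed"  # hypothetical name


def park_for_approval(park_context, log):
    """Park the context and report the outcome with exactly one event."""
    try:
        parked_id = park_context()
    except Exception as exc:
        # Failure path: report the failure and never claim the context was parked.
        log(APPROVAL_GATE_CONTEXT_PARK_FAILED, error=type(exc).__name__)
        return None
    # Success path: the parked event fires only after the context is persisted.
    log(APPROVAL_GATE_CONTEXT_PARKED, parked_id=parked_id)
    return parked_id
```

Any preliminary debug logging, such as noting a missing task execution, would use its own event name so dashboards counting parked contexts stay accurate.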

Confidence Score: 1/5

  • Not safe to merge — all new modules will fail to import in Python 3 due to a systematic syntax error.
  • The PR's architecture and security improvements are well-thought-out, but all six new/modified backend files consistently use except MemoryError, RecursionError: — a Python 2 bare-comma except form that is a SyntaxError in Python 3. Any attempt to import approval_store, approval_gate, loop_helpers, approval_tool, invoker (new sections), or approvals (controller) will raise SyntaxError before any code runs. This single pattern blocks the entire feature from functioning and must be resolved before the PR can land.
  • src/ai_company/api/approval_store.py, src/ai_company/engine/approval_gate.py, src/ai_company/engine/loop_helpers.py, src/ai_company/tools/approval_tool.py, src/ai_company/tools/invoker.py, src/ai_company/api/controllers/approvals.py — all contain except MemoryError, RecursionError: (Python 3 SyntaxError).

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/ai_company/api/approval_store.py | Adds save_if_pending for race-condition-safe concurrent decisions and wraps the _on_expire callback in a try/except. New except block uses Python 2 bare-comma syntax (except MemoryError, RecursionError:) which is a SyntaxError in Python 3 — the entire module will fail to import. |
| src/ai_company/api/controllers/approvals.py | Adds _trigger_resume (log-only stub for future scheduler), binds requested_by to auth user (security improvement), uses save_if_pending for optimistic locking, and broadens WebSocket error handling. _trigger_resume correctly avoids consuming the parked record. Python 2 except syntax also present at line 100. |
| src/ai_company/engine/approval_gate.py | New ApprovalGate service coordinates parking and resume. Good lifecycle management (deserialization failure preserves parked record; delete failure logs but still returns context). build_resume_message uses repr() to isolate user-supplied content — solid prompt-injection mitigation. All four except MemoryError, RecursionError: clauses are Python 3 SyntaxErrors. |
| src/ai_company/engine/loop_helpers.py | Adds _park_for_approval helper and threads approval_gate through execute_tool_calls. Five except MemoryError, RecursionError: clauses are Python 3 SyntaxErrors. Additionally, APPROVAL_GATE_CONTEXT_PARKED is emitted before parking actually occurs in the "no task_execution" debug path. |
| src/ai_company/tools/approval_tool.py | New RequestHumanApprovalTool creates ApprovalItem in the store and returns requires_parking metadata. Correctly uses APPROVAL_GATE_RISK_CLASSIFIED for no-classifier fallback and APPROVAL_GATE_ESCALATION_DETECTED for success log. Two except MemoryError, RecursionError: clauses are Python 3 SyntaxErrors. |
| src/ai_company/tools/invoker.py | Adds pending_escalations property and _track_parking_metadata method. Correctly clears escalations at the start of invoke() and invoke_all(), and sorts escalations deterministically by tool-call index after invoke_all. Three except MemoryError, RecursionError: clauses are Python 3 SyntaxErrors (existing code in same file already uses the correct parenthesised form). |
| src/ai_company/api/dto.py | Adds action_type format validator (category:action), removes requested_by from create payload (now bound server-side from auth), and adds max_length=4096 to ApproveRequest.comment. Well-tested and clean. |
| src/ai_company/engine/approval_gate_models.py | New frozen Pydantic models EscalationInfo and ResumePayload with NotBlankStr validation on all string fields. Clean, well-typed, and immutable. No issues. |
| web/src/stores/approvals.ts | Rewrites WS handler to use approval_id key from backend payload and fetch full items via API. The fire-and-forget async IIFE pattern is appropriate. One logic issue: total is unconditionally incremented on approval.submitted even when active filters are set and the new item may not match them, causing count drift. |
| src/ai_company/engine/react_loop.py | Adds optional approval_gate constructor parameter and threads it into execute_tool_calls. Minimal, clean change; no issues. |
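
The dto.py row mentions a category:action format check and a 4096-character comment limit. A standard-library sketch of equivalent checks is below. The PR reportedly uses Pydantic validators; the regular expression, function names, and error messages here are assumptions about what such validation might look like.

```python
import re
from typing import Optional

# Assumed grammar: lowercase identifiers on both sides of a single colon.
_ACTION_TYPE_RE = re.compile(r"[a-z][a-z0-9_]*:[a-z][a-z0-9_]*")
MAX_COMMENT_LENGTH = 4096  # limit quoted in the overview above


def validate_action_type(value: str) -> str:
    """Accept only the 'category:action' shape."""
    if not _ACTION_TYPE_RE.fullmatch(value):
        raise ValueError(f"action_type must look like 'category:action', got {value!r}")
    return value


def validate_comment(value: Optional[str]) -> Optional[str]:
    """Reject oversized comments before they reach storage or a resume message."""
    if value is not None and len(value) > MAX_COMMENT_LENGTH:
        raise ValueError(f"comment exceeds {MAX_COMMENT_LENGTH} characters")
    return value
```

Bounding the comment length matters here because the comment can later be echoed into an agent's context on resume.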

Sequence Diagram

sequenceDiagram
    participant Agent
    participant ToolInvoker
    participant ApprovalGate
    participant ParkService
    participant ParkedContextRepo
    participant ApprovalStore
    participant ReactLoop
    participant HumanReviewer
    participant ApprovalsController

    Note over Agent,ReactLoop: Escalation detected during tool execution

    Agent->>ToolInvoker: invoke(tool_call)
    alt SecOps ESCALATE verdict
        ToolInvoker->>ToolInvoker: append to _pending_escalations (approval_id from verdict)
    else RequestHumanApprovalTool
        ToolInvoker->>ApprovalStore: add(ApprovalItem)
        ToolInvoker->>ToolInvoker: _track_parking_metadata() → append EscalationInfo
    end

    ReactLoop->>ToolInvoker: pending_escalations
    ReactLoop->>ApprovalGate: should_park(escalations)
    ApprovalGate-->>ReactLoop: EscalationInfo (first escalation)

    ReactLoop->>ApprovalGate: park_context(escalation, context, agent_id, task_id)
    ApprovalGate->>ParkService: park(context, approval_id, ...)
    ParkService-->>ApprovalGate: ParkedContext (serialized JSON)
    ApprovalGate->>ParkedContextRepo: save(parked)
    ApprovalGate-->>ReactLoop: ParkedContext

    ReactLoop-->>Agent: ExecutionResult(PARKED, metadata={approval_id})

    Note over HumanReviewer,ApprovalsController: Human makes a decision

    HumanReviewer->>ApprovalsController: POST /approvals/{id}/approve or /reject
    ApprovalsController->>ApprovalStore: save_if_pending(updated_item)
    ApprovalsController->>ApprovalsController: _log_approval_decision()
    ApprovalsController->>ApprovalsController: _trigger_resume() — logs intent only
    Note right of ApprovalsController: Future scheduler will call<br/>ApprovalGate.resume_context()<br/>(not yet implemented)

    Note over ApprovalGate,ParkedContextRepo: Resume path (future scheduler)
    ApprovalGate->>ParkedContextRepo: get_by_approval(approval_id)
    ParkedContextRepo-->>ApprovalGate: ParkedContext
    ApprovalGate->>ParkService: resume(parked)
    ParkService-->>ApprovalGate: AgentContext
    ApprovalGate->>ParkedContextRepo: delete(parked.id)
    ApprovalGate-->>Agent: (AgentContext, parked_id) + resume_message injected
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/ai_company/api/approval_store.py
Line: 172-173

Comment:
**Python 3 `except` tuple syntax missing parentheses — SyntaxError across all new files**

`except MemoryError, RecursionError:` is Python 2 syntax. In Python 3 the comma-separated form (without parentheses) is a `SyntaxError`; the tuple must be explicitly wrapped. Any module that contains this form will fail to import entirely.

The same bug appears throughout all the new files added in this PR:
- `src/ai_company/api/approval_store.py:172`
- `src/ai_company/api/controllers/approvals.py:100`
- `src/ai_company/engine/approval_gate.py:127, 149, 202, 216`
- `src/ai_company/engine/loop_helpers.py:69, 114, 186, 303, 387`
- `src/ai_company/tools/approval_tool.py:162, 238`
- `src/ai_company/tools/invoker.py:212, 295, 639`

Note: the existing (pre-PR) code in `invoker.py` already uses the correct form `except (MemoryError, RecursionError) as exc:` at lines 453, 539, 575, and 698 — confirming this is an inconsistency introduced only in the new additions.

Fix every occurrence by wrapping the two types in parentheses:

```suggestion
                except (MemoryError, RecursionError):
                    raise
```

The pattern to apply globally is:
```python
# Wrong (Python 2)
except MemoryError, RecursionError:

# Correct (Python 3)
except (MemoryError, RecursionError):
```
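
Whether the comma form parses depends on the interpreter. The path-based instructions quoted later in this review state that the project targets Python 3.14 and PEP 758, which permits `except A, B:` without parentheses when no `as` clause is used. A hedged way to check a given environment is to compile both forms; the helper name `parses` is illustrative only.

```python
import sys

UNPARENTHESIZED = """
try:
    pass
except MemoryError, RecursionError:
    raise
"""

PARENTHESIZED = """
try:
    pass
except (MemoryError, RecursionError):
    raise
"""


def parses(source: str) -> bool:
    """Return True if this interpreter accepts the snippet's syntax."""
    try:
        compile(source, "<except-check>", "exec")
    except SyntaxError:
        return False
    return True


print(sys.version_info[:2], parses(PARENTHESIZED), parses(UNPARENTHESIZED))
```

The parenthesized form parses on every Python 3 release, so it is the portable choice when older interpreters must import the module. The `as exc` variant still requires parentheses under PEP 758.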

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/ai_company/engine/loop_helpers.py
Line: 372-378

Comment:
**`APPROVAL_GATE_CONTEXT_PARKED` emitted before parking has occurred**

This debug log fires when `ctx.task_execution` is `None` — i.e. before `park_context()` is even called. `APPROVAL_GATE_CONTEXT_PARKED` implies the context *has* been parked, so any log aggregation or alerting rule watching for `approval_gate.context.parked` will see a spurious event even if the subsequent `park_context()` call fails.

An informational constant such as `APPROVAL_GATE_ESCALATION_DETECTED` (or a dedicated "parking skipped task_id" diagnostic) is more accurate here. The successful-park event is already emitted inside `ApprovalGate.park_context()`, so this line also produces a duplicate on the happy path.

```suggestion
        logger.debug(
            APPROVAL_GATE_ESCALATION_DETECTED,
            approval_id=escalation.approval_id,
            agent_id=agent_id,
            note="No task_execution on context — task_id will be None",
        )
```
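
The ordering concern can be illustrated with a small sketch in which the parked event is emitted only after the park call succeeds. The `events` list standing in for the structured logger, the `park` callable, and the string value of `APPROVAL_GATE_ESCALATION_DETECTED` are assumptions for illustration; only the constant names and the `approval_gate.context.parked` event string appear in this comment.

```python
from collections.abc import Callable

APPROVAL_GATE_ESCALATION_DETECTED = "approval_gate.escalation.detected"  # assumed value
APPROVAL_GATE_CONTEXT_PARKED = "approval_gate.context.parked"

events: list[str] = []  # stand-in for a structured logger sink


def park_with_events(park: Callable[[], None]) -> bool:
    """Emit 'detected' up front and 'parked' only once parking succeeded."""
    events.append(APPROVAL_GATE_ESCALATION_DETECTED)
    try:
        park()
    except Exception:
        # Parking failed: report it to the caller and emit no 'parked' event,
        # so alerting rules never see a park that did not happen.
        return False
    events.append(APPROVAL_GATE_CONTEXT_PARKED)
    return True


def failing_park() -> None:
    raise RuntimeError("repository unavailable")


ok = park_with_events(lambda: None)
failed = park_with_events(failing_park)
```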

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: web/src/stores/approvals.ts
Line: 82-91

Comment:
**`total` incremented unconditionally when filters are active, regardless of whether the new item matches them**

When `activeFilters.value` is truthy, `total.value++` is incremented for every `approval.submitted` event even though the newly submitted item may not satisfy the current filter (e.g. the user is filtering by `status: 'rejected'` while a new `pending` item arrives). This inflates the displayed count and can cause pagination to request pages that don't exist.

A stricter approach would be to skip the increment when the item's status/risk_level/action_type don't match the active filter criteria, or to do a quick attribute check on the already-fetched `item` object:

```typescript
if (activeFilters.value) {
  // Only bump total if the item matches the active filter constraints
  const f = activeFilters.value
  const matches =
    (!f.status      || item.status     === f.status) &&
    (!f.risk_level  || item.risk_level === f.risk_level) &&
    (!f.action_type || item.action_type === f.action_type)
  if (matches) total.value++
} else {
  approvals.value = [item, ...approvals.value]
  total.value++
}
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 91823f4

Comment thread src/ai_company/engine/approval_gate.py
Comment thread src/ai_company/tools/approval_tool.py
Comment thread src/ai_company/engine/loop_helpers.py

Copilot AI left a comment

Pull request overview

This PR adds an approval-gating mechanism to the agent execution engine so SecOps “ESCALATE” verdicts and explicit request_human_approval tool calls can park execution pending a human decision, along with API/UI plumbing and observability events to support the workflow.

Changes:

  • Introduces ApprovalGate + models, and integrates escalation/parking checks into ReAct and Plan-and-Execute loops.
  • Adds RequestHumanApprovalTool and extends ToolInvoker to track and expose pending_escalations.
  • Hardens approvals API identity handling and fixes frontend WebSocket approval updates by fetching full items via API.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 6 comments.

| File | Description |
|------|-------------|
| web/src/stores/approvals.ts | Updates WS approval event handling to use approval_id and fetch full items from API |
| tests/unit/tools/test_invoker_escalation.py | Adds tests for escalation tracking in ToolInvoker |
| tests/unit/tools/test_approval_tool.py | Adds tests for RequestHumanApprovalTool behavior/validation |
| tests/unit/observability/test_events.py | Registers approval_gate module in event discovery tests |
| tests/unit/engine/test_loop_helpers_approval.py | Adds tests for loop helper parking behavior when escalations occur |
| tests/unit/engine/test_approval_gate_models.py | Adds tests for EscalationInfo / ResumePayload validation |
| tests/unit/engine/test_approval_gate.py | Adds tests for ApprovalGate park/resume and resume-message formatting |
| src/ai_company/tools/registry.py | Adds ToolRegistry.all_tools() for enumerating tool instances |
| src/ai_company/tools/invoker.py | Tracks escalations (SecOps + parking metadata) via pending_escalations |
| src/ai_company/tools/approval_tool.py | Implements RequestHumanApprovalTool (creates ApprovalItem + parking metadata) |
| src/ai_company/tools/\_\_init\_\_.py | Exports RequestHumanApprovalTool |
| src/ai_company/observability/events/approval_gate.py | Adds approval-gate lifecycle event constants |
| src/ai_company/engine/react_loop.py | Wires optional ApprovalGate into ReAct loop tool execution |
| src/ai_company/engine/plan_execute_loop.py | Wires optional ApprovalGate into Plan-and-Execute step tool execution |
| src/ai_company/engine/loop_helpers.py | Adds parking path returning TerminationReason.PARKED after tool calls |
| src/ai_company/engine/approval_gate_models.py | Adds frozen Pydantic models for escalation + resume payload |
| src/ai_company/engine/approval_gate.py | Adds ApprovalGate service (park/resume + resume-message builder) |
| src/ai_company/engine/agent_engine.py | Constructs ApprovalGate, injects into default loop, registers approval tool when configured |
| src/ai_company/engine/\_\_init\_\_.py | Exposes approval-gate types from the engine package |
| src/ai_company/api/state.py | Adds approval_gate to app state |
| src/ai_company/api/controllers/approvals.py | Hardens requested_by binding to auth user; refactors decision logging; broadens WS publish error handling |
| src/ai_company/api/approval_store.py | Wraps on_expire callback in exception handling |
| docs/design/engine.md | Documents new approval-gate parking behavior in the engine pipeline |
| README.md | Updates status text to reflect approval gates being implemented |
| CLAUDE.md | Updates repo/module documentation to mention approval gate + approval tool |
Comments suppressed due to low confidence (1)

src/ai_company/api/controllers/approvals.py:105

  • _publish_approval_event() now catches a broad Exception. If asyncio.CancelledError is an Exception in the runtime, this will swallow request cancellation and continue processing. Consider explicitly re-raising cancellation (and any other control-flow exceptions you treat as non-recoverable) alongside MemoryError/RecursionError.
    try:
        channels_plugin = _get_channels_plugin(request)
        channels_plugin.publish(
            event.model_dump_json(),
            channels=[CHANNEL_APPROVALS],
        )
    except MemoryError, RecursionError:
        raise
    except Exception:
        logger.warning(
            API_APPROVAL_PUBLISH_FAILED,
            approval_id=item.id,
            event_type=event_type.value,
            exc_info=True,
        )
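
Whether `except Exception` can swallow cancellation depends on the class hierarchy. On Python 3.8 and later, `asyncio.CancelledError` derives from `BaseException`, which the following check confirms for the running interpreter. The `guarded` coroutine is an illustrative stand-in for a handler with a broad `except Exception`, not the controller code itself.

```python
import asyncio

is_plain_exception = issubclass(asyncio.CancelledError, Exception)


async def guarded() -> str:
    """Illustrative handler: a broad `except Exception` around an await."""
    try:
        await asyncio.sleep(10)
    except Exception:
        return "swallowed"
    return "finished"


async def main() -> bool:
    task = asyncio.create_task(guarded())
    await asyncio.sleep(0)  # let the task start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True  # cancellation propagated past `except Exception`
    return False


cancel_propagated = asyncio.run(main())
```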


risk_level=verdict.risk_level,
reason=verdict.reason,
),
)
Comment thread src/ai_company/engine/loop_helpers.py Outdated
turns,
metadata={
"approval_id": escalation.approval_id,
"parking_failed": str(parking_failed),
Comment thread docs/design/engine.md Outdated
Comment on lines +452 to +453
parking is needed. If so, the context is serialized via `ParkService`,
persisted, and the loop returns a `PARKED` result.
Comment thread src/ai_company/engine/approval_gate.py Outdated
self._park_service = park_service
self._parked_context_repo = parked_context_repo
logger.debug(
APPROVAL_GATE_ESCALATION_DETECTED,
Comment on lines +203 to +207
logger.debug(
APPROVAL_GATE_ESCALATION_DETECTED,
action_type=action_type,
note="No risk classifier — defaulting to HIGH",
)
Comment thread web/src/stores/approvals.ts Outdated
Comment on lines +92 to +94
// Item may have been deleted — remove from local list
approvals.value = approvals.value.filter((a) => a.id !== approvalId)
total.value = Math.max(0, total.value - 1)

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/observability/test_events.py (1)

175-225: 🧹 Nitpick | 🔵 Trivial

Add direct assertions for the new approval_gate module.

This only proves the module is discoverable. If src/ai_company/observability/events/approval_gate.py ships with a missing constant or a misspelled value, this file still passes as long as the module exists and the remaining strings are unique. Please add a test_approval_gate_events_exist() block like the other domains.

Based on learnings, "Event names must always use constants from domain-specific modules under ai_company.observability.events."
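
A direct assertion block might look like the sketch below. The `approval_gate` namespace is a stand-in so the example runs on its own; in the real test it would be the imported module. Only `APPROVAL_GATE_CONTEXT_PARKED` with its `approval_gate.context.parked` value and the name `APPROVAL_GATE_ESCALATION_DETECTED` are taken from this review; the second string value is assumed.

```python
from types import SimpleNamespace

# Stand-in for `ai_company.observability.events.approval_gate`.
approval_gate = SimpleNamespace(
    APPROVAL_GATE_CONTEXT_PARKED="approval_gate.context.parked",
    APPROVAL_GATE_ESCALATION_DETECTED="approval_gate.escalation.detected",  # assumed
)


def test_approval_gate_events_exist() -> None:
    expected = {
        "APPROVAL_GATE_CONTEXT_PARKED": "approval_gate.context.parked",
        "APPROVAL_GATE_ESCALATION_DETECTED": "approval_gate.escalation.detected",
    }
    for name, value in expected.items():
        # A missing constant raises AttributeError; a typo fails the equality.
        assert getattr(approval_gate, name) == value


test_approval_gate_events_exist()
```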

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/observability/test_events.py` around lines 175 - 225, Add a new
unit test in tests/unit/observability/test_events.py named
test_approval_gate_events_exist that imports
ai_company.observability.events.approval_gate and asserts the specific expected
event constants (or their string values) are present and correct (similar style
to other domain-specific tests), so that existence and correctness of constants
in approval_gate.py are validated rather than merely the module being
discoverable; reference the existing test_all_domain_modules_discovered for
placement and use approval_gate module symbols to construct the assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/api/approval_store.py`:
- Around line 151-160: The except clause currently catches the built-in
MemoryError but not the project-specific MemoryError defined in
ai_company.memory.errors, so re-raised project errors fall into the generic
Exception handler; import the project error (e.g., from ai_company.memory.errors
import MemoryError as ProjectMemoryError) and update the except line to
explicitly re-raise both RecursionError and the project MemoryError (or include
both types in the except tuple used for re-raising) before the generic Exception
that logs via logger.exception(API_APPROVAL_EXPIRED, approval_id=item.id,
note="on_expire callback failed"); ensure the symbol _on_expire remains where
called and no error types are swallowed.

In `@src/ai_company/engine/agent_engine.py`:
- Around line 150-152: The default approval gate created by
_make_approval_gate() is missing the parked_context_repo so
ApprovalGate.resume_context() is a no-op and PARKED runs can't be resumed;
update the factory so the ApprovalGate created in _make_approval_gate (and any
place that constructs ApprovalGate in this class) receives the persistent
parked-context storage (self._approval_store or its parked_context_repo
attribute) as the parked_context_repo argument when constructing ApprovalGate,
ensuring resume_context() works and PARKED runs are resumable.

In `@src/ai_company/engine/approval_gate.py`:
- Around line 214-231: The current resume flow ignores the boolean return from
ParkedContextRepository.delete() and always logs APPROVAL_GATE_CONTEXT_RESUMED;
update the code in the method containing the await
self._parked_context_repo.delete(parked.id) call (e.g., the resume handler in
approval_gate.py) to capture the delete result, and if it returns False treat it
as a cleanup failure: log/exception with APPROVAL_GATE_RESUME_DELETE_FAILED
including approval_id and parked_id (and do not log
APPROVAL_GATE_CONTEXT_RESUMED), otherwise proceed to log
APPROVAL_GATE_CONTEXT_RESUMED and return (context, parked.id); preserve existing
exception handling for MemoryError/RecursionError.
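
A sketch of the suggested handling, assuming `delete()` returns `True` only when a row was removed. `FakeRepo`, the `log` list, and both event string values are hypothetical; the constant names are the ones mentioned in the finding.

```python
import asyncio

APPROVAL_GATE_CONTEXT_RESUMED = "approval_gate.context.resumed"  # assumed value
APPROVAL_GATE_RESUME_DELETE_FAILED = "approval_gate.resume.delete_failed"  # assumed


class FakeRepo:
    """Hypothetical repository whose delete() reports whether a row was removed."""

    def __init__(self, ids: set[str]) -> None:
        self._ids = set(ids)

    async def delete(self, parked_id: str) -> bool:
        if parked_id in self._ids:
            self._ids.remove(parked_id)
            return True
        return False


async def cleanup_after_resume(repo: FakeRepo, parked_id: str, log: list[str]) -> bool:
    deleted = await repo.delete(parked_id)
    if not deleted:
        # Treat a False result as a cleanup failure, not a successful resume.
        log.append(APPROVAL_GATE_RESUME_DELETE_FAILED)
        return False
    log.append(APPROVAL_GATE_CONTEXT_RESUMED)
    return True


log: list[str] = []
repo = FakeRepo({"p-1"})
first = asyncio.run(cleanup_after_resume(repo, "p-1", log))
second = asyncio.run(cleanup_after_resume(repo, "p-1", log))
```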

In `@src/ai_company/engine/loop_helpers.py`:
- Around line 345-406: The metadata currently stores parking_failed as a string
in _park_for_approval; change it to store a boolean instead (i.e.,
parking_failed: parking_failed) when calling build_result so downstream
consumers receive a true boolean under the "parking_failed" metadata key; ensure
any serialization/consumption points that expect a string are updated to accept
a boolean (references: function _park_for_approval, local variable
parking_failed, build_result call, metadata key "parking_failed").

In `@src/ai_company/tools/approval_tool.py`:
- Around line 96-176: The execute() method is too long and should be split into
small helpers: keep execute() to orchestrate only (extract validation, item
construction, persistence, and result shaping into private methods).
Specifically, replace the inline logic with calls to helpers such as
_validate_action_type(action_type) (already exists),
_build_approval_item(action_type, title, description) to construct the
ApprovalItem (move the ApprovalItem import and all field population there),
_persist_approval_item(item) to run the try/except around await
self._approval_store.add(item) (re-raise MemoryError/RecursionError and on
Exception call logger.exception(APPROVAL_GATE_ESCALATION_FAILED, ...) and return
an error ToolExecutionResult), and _format_success_result(approval_id,
action_type, risk_level, title) to return the success ToolExecutionResult and
log APPROVAL_GATE_ESCALATION_DETECTED. After extraction, execute() should call
these helpers and return early on validation/persistence errors so the function
body remains under the 50-line limit.

In `@tests/unit/engine/test_approval_gate.py`:
- Around line 51-274: Tests repeat setup of MagicMock/AsyncMock objects
(park_service, repo, parked) across many test coroutines in TestParking and
TestResumeContext; extract these into pytest fixtures to DRY the test file.
Create fixtures named e.g. park_service, parked_mock (or parked), and
parked_repo (or repo) that return the configured MagicMock/AsyncMock instances
and use them by adding them as parameters to tests that currently call
ApprovalGate(...)/park_context()/resume_context(), updating tests like
test_calls_park_service, test_persists_to_repo_when_available,
test_successful_resume, etc., to accept the fixtures and remove the repeated
wiring inside each test while preserving any test-specific side_effects or
return overrides.
- Around line 279-310: The three tests test_approved_without_reason,
test_rejected_with_reason, and test_approved_with_reason duplicate the same
structure calling ApprovalGate.build_resume_message and asserting substrings;
refactor them into a single `@pytest.mark.parametrize` test that iterates over
cases (id, approved boolean, decided_by, decision_reason, expected_substrings)
and calls ApprovalGate.build_resume_message once per case, asserting each
expected substring is in the returned msg; update or remove the original three
test functions and keep descriptive parameter values to preserve coverage.

In `@tests/unit/engine/test_loop_helpers_approval.py`:
- Line 3: The test unnecessarily uses PropertyMock to stub pending_escalations;
replace type(invoker).pending_escalations =
PropertyMock(return_value=escalations) with a simple instance attribute
assignment (e.g., invoker.pending_escalations = escalations or setattr(invoker,
"pending_escalations", escalations)) in the helper and the other occurrence
around lines 55–64, and remove PropertyMock from the imports in the file
(adjusting the import list to only AsyncMock, MagicMock, patch).
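
For a `MagicMock`, plain attribute assignment is enough to stub a value that production code reads as a property. A minimal sketch, with a tuple of strings standing in for the `EscalationInfo` items:

```python
from unittest.mock import MagicMock

escalations = ("escalation-1",)  # stand-in for a tuple of EscalationInfo

invoker = MagicMock()
# Instance-level assignment; no PropertyMock on type(invoker) is needed.
invoker.pending_escalations = escalations

# Without the assignment, attribute access auto-creates a child mock instead.
other = MagicMock()
```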

In `@web/src/stores/approvals.ts`:
- Around line 72-80: The catch currently treats any error from
approvalsApi.getApproval as a delete; change it to inspect the error response
and only remove/ignore the item when the error is a definitive "not found" (HTTP
404 or 410) — for other errors (timeouts, 5xx, network) leave approvals.value
and total.value untouched and trigger the existing list refresh/fallback path
instead of decrementing state; locate both call sites to
approvalsApi.getApproval (the try/catch around getApproval and the similar block
at lines handling the other event) and update their catch handlers to branch on
error status (404/410 vs others) so only 404/410 perform the remove/ignore
behavior and other errors preserve current state and perform a refresh.
- Around line 71-77: When a websocket-driven update fetches an approval via
approvalsApi.getApproval(approvalId), do not unconditionally prepend it when
activeFilters.value is falsy; instead run the current filter predicate against
the fetched item (use the same filter logic used to build approvals.value) and
insert the item only if it matches, or remove it from approvals.value if it no
longer matches, and update total.value accordingly (increment when inserting a
new matching item, decrement when removing a previously-present matching item).
Apply the same re-filter/insert-or-remove logic to the other WS-handling block
(the code around the approvals update at lines similar to 86-90) so both
websocket update paths respect activeFilters.value (including empty objects) and
keep total.value in sync.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: afb7d111-796d-4bf3-800d-74573c983a99

📥 Commits

Reviewing files that changed from the base of the PR and between 494013f and b999b65.

📒 Files selected for processing (25)
  • CLAUDE.md
  • README.md
  • docs/design/engine.md
  • src/ai_company/api/approval_store.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/api/state.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/registry.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/tools/test_invoker_escalation.py
  • web/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Agent
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (8)
web/src/**/*.{vue,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard: Vue 3 + PrimeVue + Tailwind CSS, organized by feature in src/components/, src/stores/, src/views/. Enforce with ESLint and vue-tsc type-checking.

Files:

  • web/src/stores/approvals.ts
web/src/stores/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Frontend state management: Pinia stores organized by feature in src/stores/

Files:

  • web/src/stores/approvals.ts
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/test_invoker_escalation.py
  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/api/state.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/tools/__init__.py
  • tests/unit/tools/test_approval_tool.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/tools/registry.py
  • src/ai_company/engine/approval_gate_models.py
  • tests/unit/engine/test_approval_gate_models.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/api/controllers/approvals.py
  • tests/unit/engine/test_approval_gate.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/react_loop.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/test_invoker_escalation.py
  • tests/unit/tools/test_approval_tool.py
  • tests/unit/engine/test_approval_gate_models.py
  • tests/unit/engine/test_approval_gate.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, ConfigDict. Use @computed_field for derived values instead of storing redundant fields (e.g., TokenUsage.total_tokens).
Use NotBlankStr from core.types for all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name for loggers (not _logger, not log).
Event names must always use constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import dire...

Files:

  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/api/state.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/tools/registry.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/react_loop.py
src/ai_company/**/[!_]*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)

Files:

  • src/ai_company/engine/plan_execute_loop.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/api/state.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/tools/registry.py
  • src/ai_company/engine/approval_gate_models.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/react_loop.py
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

When approved deviations occur, update the relevant docs/design/ page to reflect the new reality.

Files:

  • docs/design/engine.md
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation source in docs/ built with Zensical. Design spec in docs/design/ (7 pages). Architecture in docs/architecture/. Roadmap in docs/roadmap/. Security in docs/security.md.

Files:

  • docs/design/engine.md
🧠 Learnings (9)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Event names must always use constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • tests/unit/observability/test_events.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/observability/events/approval_gate.py
  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/ai_company/**/[!_]*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • src/ai_company/api/state.py
  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Structured logging: always use `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : NEVER use `import logging`, `logging.getLogger()`, or `print()` in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Always use `logger` as the variable name for loggers (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • CLAUDE.md
🧬 Code graph analysis (18)
web/src/stores/approvals.ts (1)
web/src/api/types.ts (1)
  • WsEvent (519-524)
tests/unit/engine/test_loop_helpers_approval.py (7)
src/ai_company/engine/approval_gate.py (3)
  • ApprovalGate (39-263)
  • should_park (71-90)
  • park_context (92-161)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/engine/loop_helpers.py (1)
  • execute_tool_calls (240-342)
src/ai_company/engine/loop_protocol.py (2)
  • ExecutionResult (79-140)
  • TerminationReason (28-36)
src/ai_company/providers/enums.py (1)
  • FinishReason (15-22)
src/ai_company/providers/models.py (3)
  • CompletionResponse (257-306)
  • ToolCall (96-119)
  • ToolResult (122-135)
src/ai_company/engine/run_result.py (1)
  • termination_reason (64-66)
tests/unit/tools/test_invoker_escalation.py (5)
src/ai_company/providers/models.py (1)
  • ToolCall (96-119)
src/ai_company/security/models.py (2)
  • SecurityVerdict (35-67)
  • SecurityVerdictType (23-32)
src/ai_company/tools/base.py (3)
  • BaseTool (57-184)
  • ToolExecutionResult (25-54)
  • description (138-140)
src/ai_company/tools/invoker.py (5)
  • ToolInvoker (73-756)
  • registry (123-125)
  • pending_escalations (128-136)
  • invoke (343-365)
  • invoke_all (694-756)
src/ai_company/tools/registry.py (1)
  • ToolRegistry (30-126)
src/ai_company/engine/plan_execute_loop.py (2)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (39-263)
src/ai_company/engine/approval_gate.py (5)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
src/ai_company/security/timeout/parked_context.py (1)
  • ParkedContext (19-64)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/engine/context.py (1)
  • AgentContext (87-307)
src/ai_company/api/state.py (1)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (39-263)
src/ai_company/tools/invoker.py (2)
src/ai_company/core/enums.py (1)
  • ApprovalRiskLevel (443-449)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/engine/__init__.py (3)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (39-263)
src/ai_company/engine/approval_gate_models.py (2)
  • EscalationInfo (14-33)
  • ResumePayload (36-51)
src/ai_company/tools/__init__.py (1)
src/ai_company/tools/approval_tool.py (1)
  • RequestHumanApprovalTool (31-208)
tests/unit/tools/test_approval_tool.py (3)
src/ai_company/api/approval_store.py (3)
  • ApprovalStore (27-162)
  • get (61-73)
  • add (42-59)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
src/ai_company/tools/approval_tool.py (2)
  • RequestHumanApprovalTool (31-208)
  • execute (96-176)
src/ai_company/api/approval_store.py (2)
src/ai_company/api/app.py (1)
  • _on_expire (74-99)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/tools/registry.py (1)
src/ai_company/tools/base.py (2)
  • BaseTool (57-184)
  • name (123-125)
src/ai_company/engine/approval_gate_models.py (1)
src/ai_company/tools/base.py (1)
  • action_type (133-135)
tests/unit/engine/test_approval_gate_models.py (1)
src/ai_company/engine/approval_gate_models.py (2)
  • EscalationInfo (14-33)
  • ResumePayload (36-51)
src/ai_company/tools/approval_tool.py (5)
src/ai_company/core/enums.py (1)
  • ToolCategory (294-308)
src/ai_company/tools/base.py (6)
  • BaseTool (57-184)
  • ToolExecutionResult (25-54)
  • description (138-140)
  • category (128-130)
  • action_type (133-135)
  • parameters_schema (143-151)
src/ai_company/api/approval_store.py (2)
  • ApprovalStore (27-162)
  • add (42-59)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
tests/unit/engine/test_approval_gate.py (3)
src/ai_company/engine/approval_gate.py (5)
  • ApprovalGate (39-263)
  • should_park (71-90)
  • park_context (92-161)
  • resume_context (163-231)
  • build_resume_message (234-263)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
src/ai_company/engine/agent_engine.py (6)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (39-263)
src/ai_company/security/timeout/park_service.py (1)
  • ParkService (29-144)
src/ai_company/core/agent.py (1)
  • AgentIdentity (266-342)
src/ai_company/tools/registry.py (2)
  • ToolRegistry (30-126)
  • all_tools (102-104)
src/ai_company/tools/approval_tool.py (1)
  • RequestHumanApprovalTool (31-208)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
src/ai_company/engine/react_loop.py (3)
tests/unit/engine/conftest.py (1)
  • engine (449-460)
src/ai_company/api/state.py (1)
  • approval_gate (139-141)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (39-263)
🪛 LanguageTool
CLAUDE.md

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🔇 Additional comments (24)
src/ai_company/engine/__init__.py (1)

9-10: LGTM!

The new public API exports for ApprovalGate, EscalationInfo, and ResumePayload are correctly imported and added to __all__ in alphabetical order, maintaining consistency with the existing export pattern.

docs/design/engine.md (1)

450-467: LGTM!

The documentation accurately captures the new escalation-handling flow and PARKED termination behavior. The description aligns with the implementation in loop_helpers.py and ApprovalGate, including the interaction with ToolInvoker.pending_escalations, ParkService, and the approval-timeout policy.

tests/unit/tools/test_approval_tool.py (4)

1-53: LGTM!

Well-structured test file with proper pytest markers (unit, timeout(30)). The fixtures correctly set up the ApprovalStore and RequestHumanApprovalTool instances. The TestToolCreation class comprehensively validates tool properties including name, action_type, and parameters schema.


56-144: LGTM!

Thorough execution tests covering the happy path, metadata validation, default risk level, content verification, and the task_id=None scenario. The assertions correctly verify both the ToolExecutionResult metadata and the persisted ApprovalItem state.


177-204: LGTM!

Good use of @pytest.mark.parametrize to cover multiple invalid action_type formats efficiently. The test cases cover edge cases like missing category, missing action, extra colons, and whitespace-only values.


220-256: LGTM!

The error handling test correctly verifies graceful degradation when the store fails. The monkeypatch approach is appropriate for simulating store failures in unit tests.

src/ai_company/api/controllers/approvals.py (4)

97-105: LGTM!

Correct use of PEP 758 except syntax (except MemoryError, RecursionError:) and proper error handling pattern — re-raising critical errors while logging warnings for recoverable failures.


148-166: LGTM!

Clean helper function that centralizes approval decision logging with appropriate event constants. The docstring correctly notes that context resumption is out of scope for this controller.


260-278: Good security hardening: requested_by bound to authenticated user.

The change correctly enforces that requested_by is populated from the authenticated user's username rather than the request body, preventing potential spoofing. The UnauthorizedError is appropriately raised when authentication is missing.


367-371: LGTM!

Correctly delegates to _log_approval_decision for consistent observability across approval decisions.

src/ai_company/engine/approval_gate_models.py (2)

14-33: LGTM!

Well-designed frozen Pydantic model with proper use of NotBlankStr for all identifier fields and ApprovalRiskLevel enum for type safety. The docstring clearly documents each attribute's purpose.


36-51: LGTM!

Clean model design for approval decision payloads. The optional decision_reason field correctly uses NotBlankStr | None to ensure non-blank values when provided.

src/ai_company/engine/loop_helpers.py (3)

11-13: LGTM!

Correct import of event constant from the domain-specific observability module. Based on learnings: "Event names must always use constants from domain-specific modules under ai_company.observability.events".


240-248: LGTM!

Clean API extension with a keyword-only approval_gate parameter defaulting to None. This preserves backward compatibility while enabling the escalation parking flow when configured.


329-342: LGTM!

The escalation check is correctly placed after tool execution completes. The flow properly delegates to _park_for_approval when an escalation warrants parking.

src/ai_company/engine/react_loop.py (2)

65-70: LGTM!

Clean constructor extension with keyword-only parameter and None default, maintaining backward compatibility.


217-224: LGTM!

The approval_gate is correctly forwarded to execute_tool_calls, enabling the escalation parking flow when configured.

src/ai_company/api/state.py (4)

14-14: LGTM!

Correct import added with noqa: TC001 for type-checking compliance.


39-49: LGTM!

The __slots__ tuple is correctly updated with the new _approval_gate entry, maintaining alphabetical order.


61-66: LGTM!

Constructor correctly accepts the optional approval_gate parameter and stores it in the private attribute.


138-141: LGTM!

The property correctly returns ApprovalGate | None rather than using _require_service, which aligns with the design that approval gate is optional and doesn't require a 503 error when unconfigured.

tests/unit/engine/test_approval_gate.py (3)

12-30: Solid test module scaffolding and helper setup.

pytestmark usage and _make_escalation helper keep the tests consistent and readable.


33-46: should_park behavior coverage is concise and correct.

The empty-tuple and first-escalation assertions match the intended contract exactly.


51-274: Good depth on parking/resume lifecycle paths.

These tests cover success and failure branches (serialization, persistence, deserialization, cleanup failure) with the right behavioral assertions.

Comment on lines +151 to +160
try:
    self._on_expire(expired)
except MemoryError, RecursionError:
    raise
except Exception:
    logger.exception(
        API_APPROVAL_EXPIRED,
        approval_id=item.id,
        note="on_expire callback failed",
    )

⚠️ Potential issue | 🟠 Major

Import the project MemoryError here.

src/ai_company/memory/errors.py defines the project-specific MemoryError, and src/ai_company/api/app.py deliberately re-raises it from _on_expire(). Because this module never imports that type, Line 153 only catches the built-in MemoryError; the project exception falls into except Exception and gets downgraded to a log.

Suggested fix
 from ai_company.api.errors import ConflictError
 from ai_company.core.approval import ApprovalItem  # noqa: TC001
 from ai_company.core.enums import (
     ApprovalRiskLevel,
     ApprovalStatus,
 )
+from ai_company.memory.errors import MemoryError as StoreMemoryError
 ...
-                except MemoryError, RecursionError:
+                except StoreMemoryError, MemoryError, RecursionError:
                     raise

As per coding guidelines, "Handle errors explicitly, never silently swallow exceptions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/approval_store.py` around lines 151 - 160, The except
clause currently catches the built-in MemoryError but not the project-specific
MemoryError defined in ai_company.memory.errors, so re-raised project errors
fall into the generic Exception handler; import the project error (e.g., from
ai_company.memory.errors import MemoryError as ProjectMemoryError) and update
the except line to explicitly re-raise both RecursionError and the project
MemoryError (or include both types in the except tuple used for re-raising)
before the generic Exception that logs via
logger.exception(API_APPROVAL_EXPIRED, approval_id=item.id, note="on_expire
callback failed"); ensure the symbol _on_expire remains where called and no
error types are swallowed.
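
A minimal sketch of the shadowing problem described in this comment. It assumes the project's `MemoryError` derives from `Exception` rather than from the built-in (the class body is not shown in this thread); `ProjectMemoryError`, `run_callback`, and `failing` are hypothetical names. The exception tuple is parenthesized so the snippet also runs on interpreters that do not accept the unparenthesized PEP 758 form used in the PR.

```python
import builtins


class ProjectMemoryError(Exception):
    """Stand-in for ai_company.memory.errors.MemoryError (assumed base class)."""


def run_callback(callback, reraise):
    """Call callback; re-raise the listed types, downgrade anything else."""
    try:
        callback()
    except reraise:
        raise
    except Exception as exc:
        # Mirrors the generic handler that only logs the failure.
        return f"logged: {type(exc).__name__}"
    return "ok"


def failing():
    raise ProjectMemoryError("memory backend unavailable")


# Built-in names only: the project error falls through to `except Exception`.
swallowed = run_callback(failing, (builtins.MemoryError, RecursionError))

# With the project type listed, the error propagates to the caller.
try:
    run_callback(
        failing,
        (ProjectMemoryError, builtins.MemoryError, RecursionError),
    )
    propagated = False
except ProjectMemoryError:
    propagated = True
```

If the project class does subclass the built-in `MemoryError`, the original clause already catches it and the extra import is unnecessary, which is why the finding should be verified against the class definition first.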

Comment thread src/ai_company/engine/agent_engine.py
Comment on lines +214 to +231
try:
    await self._parked_context_repo.delete(parked.id)
except MemoryError, RecursionError:
    raise
except Exception:
    logger.exception(
        APPROVAL_GATE_RESUME_DELETE_FAILED,
        approval_id=approval_id,
        parked_id=parked.id,
        note="Context resumed but parked record not cleaned up",
    )

logger.info(
    APPROVAL_GATE_CONTEXT_RESUMED,
    approval_id=approval_id,
    parked_id=parked.id,
)
return context, parked.id

⚠️ Potential issue | 🟠 Major

Check the repository's delete() result before claiming resume success.

ParkedContextRepository.delete() returns bool, but this method ignores a False result and still logs APPROVAL_GATE_CONTEXT_RESUMED. If cleanup fails silently, the parked record is still present and the same approval can be resumed again.

Possible fix
         try:
-            await self._parked_context_repo.delete(parked.id)
+            deleted = await self._parked_context_repo.delete(parked.id)
         except MemoryError, RecursionError:
             raise
         except Exception:
             logger.exception(
                 APPROVAL_GATE_RESUME_DELETE_FAILED,
                 approval_id=approval_id,
                 parked_id=parked.id,
                 note="Context resumed but parked record not cleaned up",
             )
+        else:
+            if not deleted:
+                logger.warning(
+                    APPROVAL_GATE_RESUME_DELETE_FAILED,
+                    approval_id=approval_id,
+                    parked_id=parked.id,
+                    note="Context resumed but parked record not cleaned up",
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/approval_gate.py` around lines 214 - 231, The current
resume flow ignores the boolean return from ParkedContextRepository.delete() and
always logs APPROVAL_GATE_CONTEXT_RESUMED; update the code in the method
containing the await self._parked_context_repo.delete(parked.id) call (e.g., the
resume handler in approval_gate.py) to capture the delete result, and if it
returns False treat it as a cleanup failure: log/exception with
APPROVAL_GATE_RESUME_DELETE_FAILED including approval_id and parked_id (and do
not log APPROVAL_GATE_CONTEXT_RESUMED), otherwise proceed to log
APPROVAL_GATE_CONTEXT_RESUMED and return (context, parked.id); preserve existing
exception handling for MemoryError/RecursionError.
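
The control flow in the "Possible fix" diff can be sketched in isolation as follows. `FakeRepo` and `cleanup_parked` are hypothetical names, and the event strings are placeholders collected in a list instead of being sent to the project logger. The `else` branch runs only when `delete()` returned without raising, so a failed delete is reported once, either as an exception or as a `False` result.

```python
import asyncio


class FakeRepo:
    """Hypothetical stand-in for ParkedContextRepository."""

    def __init__(self, outcome):
        self._outcome = outcome  # True, False, or an Exception instance

    async def delete(self, parked_id: str) -> bool:
        if isinstance(self._outcome, Exception):
            raise self._outcome
        return self._outcome


async def cleanup_parked(repo, parked_id: str) -> list[str]:
    """Return the events that would be logged for one cleanup attempt."""
    events: list[str] = []
    try:
        deleted = await repo.delete(parked_id)
    except (MemoryError, RecursionError):
        raise
    except Exception:
        events.append("RESUME_DELETE_FAILED (exception)")
    else:
        # Reached only when delete() did not raise, so no double report.
        if not deleted:
            events.append("RESUME_DELETE_FAILED (returned False)")
    events.append("CONTEXT_RESUMED")
    return events


ok = asyncio.run(cleanup_parked(FakeRepo(True), "p-1"))
missing = asyncio.run(cleanup_parked(FakeRepo(False), "p-1"))
errored = asyncio.run(cleanup_parked(FakeRepo(RuntimeError("db down")), "p-1"))
```

This follows the diff, which still emits the resumed event after a failed cleanup. The agent prompt asks for the resumed event to be skipped in that case, which would need an early return or a flag in the two failure branches.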

Comment thread src/ai_company/engine/loop_helpers.py
Comment thread src/ai_company/tools/approval_tool.py
Comment on lines +279 to +310
def test_approved_without_reason(self) -> None:
    msg = ApprovalGate.build_resume_message(
        "approval-1",
        approved=True,
        decided_by="admin",
    )
    assert "APPROVED" in msg
    assert "approval-1" in msg
    assert "admin" in msg
    assert "data only" in msg

def test_rejected_with_reason(self) -> None:
    msg = ApprovalGate.build_resume_message(
        "approval-1",
        approved=False,
        decided_by="reviewer",
        decision_reason="Too risky for production",
    )
    assert "REJECTED" in msg
    assert "approval-1" in msg
    assert "reviewer" in msg
    assert "Too risky for production" in msg

def test_approved_with_reason(self) -> None:
    msg = ApprovalGate.build_resume_message(
        "approval-1",
        approved=True,
        decided_by="admin",
        decision_reason="Looks good",
    )
    assert "APPROVED" in msg
    assert "Looks good" in msg

🧹 Nitpick | 🔵 Trivial

Parameterize message-variant assertions to reduce duplication.

These three tests are structurally the same and are a good fit for a single parameterized test.

♻️ Suggested refactor
 class TestBuildResumeMessage:
-    def test_approved_without_reason(self) -> None:
-        ...
-
-    def test_rejected_with_reason(self) -> None:
-        ...
-
-    def test_approved_with_reason(self) -> None:
-        ...
+    @pytest.mark.parametrize(
+        ("approved", "decided_by", "decision_reason", "expected_tokens"),
+        [
+            (True, "admin", None, ["APPROVED", "approval-1", "admin", "data only"]),
+            (
+                False,
+                "reviewer",
+                "Too risky for production",
+                ["REJECTED", "approval-1", "reviewer", "Too risky for production"],
+            ),
+            (True, "admin", "Looks good", ["APPROVED", "Looks good"]),
+        ],
+    )
+    def test_build_resume_message_variants(
+        self,
+        approved: bool,
+        decided_by: str,
+        decision_reason: str | None,
+        expected_tokens: list[str],
+    ) -> None:
+        msg = ApprovalGate.build_resume_message(
+            "approval-1",
+            approved=approved,
+            decided_by=decided_by,
+            decision_reason=decision_reason,
+        )
+        for token in expected_tokens:
+            assert token in msg

As per coding guidelines, "Prefer @pytest.mark.parametrize for testing similar cases."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_approval_gate.py` around lines 279 - 310, The three
tests test_approved_without_reason, test_rejected_with_reason, and
test_approved_with_reason duplicate the same structure calling
ApprovalGate.build_resume_message and asserting substrings; refactor them into a
single `@pytest.mark.parametrize` test that iterates over cases (id, approved
boolean, decided_by, decision_reason, expected_substrings) and calls
ApprovalGate.build_resume_message once per case, asserting each expected
substring is in the returned msg; update or remove the original three test
functions and keep descriptive parameter values to preserve coverage.

Comment thread tests/unit/engine/test_loop_helpers_approval.py Outdated
Comment thread tests/unit/engine/test_loop_helpers_approval.py Outdated
Comment thread web/src/stores/approvals.ts Outdated
Comment thread web/src/stores/approvals.ts Outdated
@codecov

codecov Bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 75.30864% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.75%. Comparing base (494013f) to head (91823f4).
⚠️ Report is 5 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/ai_company/engine/_security_factory.py | 40.38% | 28 Missing and 3 partials ⚠️ |
| src/ai_company/api/controllers/approvals.py | 53.84% | 10 Missing and 2 partials ⚠️ |
| src/ai_company/api/approval_store.py | 33.33% | 8 Missing and 2 partials ⚠️ |
| src/ai_company/tools/invoker.py | 66.66% | 7 Missing and 1 partial ⚠️ |
| src/ai_company/tools/approval_tool.py | 89.83% | 6 Missing ⚠️ |
| src/ai_company/engine/approval_gate.py | 94.02% | 4 Missing ⚠️ |
| src/ai_company/api/dto.py | 75.00% | 2 Missing and 1 partial ⚠️ |
| src/ai_company/engine/agent_engine.py | 80.00% | 2 Missing and 1 partial ⚠️ |
| src/ai_company/engine/loop_helpers.py | 85.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #365      +/-   ##
==========================================
- Coverage   93.90%   93.75%   -0.16%     
==========================================
  Files         447      452       +5     
  Lines       20819    21082     +263     
  Branches     2011     2034      +23     
==========================================
+ Hits        19551    19765     +214     
- Misses        981     1021      +40     
- Partials      287      296       +9     

Critical: empty task_id ValidationError, parking metadata bypass,
PARKED→ERROR on park failure, wire parked_context_repo, TOCTOU race
via save_if_pending, resume trigger wiring in approve/reject.

Security: prompt injection mitigation, max_length on comment,
ttl_seconds cap, action_type format validation, remove phantom
requested_by field.

Logging: wrong event constants for init/risk/park-info, consistent
MemoryError logging, new INITIALIZED/RISK_CLASSIFIED/RESUME_TRIGGERED
constants.

Code quality: extract security/tool factories (962→860 lines), split
execute() into helpers, deterministic escalation order, boolean
parking_failed, check delete() result, risk classifier error handling.

Frontend: async handler type fix, WS test payload/await fix,
filter-aware WS, 404 vs transient error discrimination, total desync
fix, empty catch logging.

Tests: park failure expects ERROR, module-level markers, extracted
fixtures, simplified PropertyMock, updated resume message assertions.

Docs: persistence qualifier, CLAUDE.md events, docstring fixes.
Copilot AI review requested due to automatic review settings March 13, 2026 23:36
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 13, 2026 23:37 — with GitHub Actions Inactive

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (2)
web/src/stores/approvals.ts (1)

76-95: ⚠️ Potential issue | 🟡 Minor

WS approval.submitted handler still doesn't verify filter match.

When activeFilters.value is truthy, the code bumps total without checking if the fetched item actually matches the active filters. This can cause total to drift from the actual filtered count.

Consider verifying the fetched item against active filters and only incrementing total (and optionally inserting into approvals.value) when it matches:

🔧 Suggested approach
           case 'approval.submitted':
             if (!approvals.value.some((a) => a.id === approvalId)) {
               try {
                 const item = await approvalsApi.getApproval(approvalId)
-                if (activeFilters.value) {
-                  // Filters are active — item may not match; just bump total
-                  total.value++
-                } else {
+                const matchesFilters = !activeFilters.value || itemMatchesFilters(item, activeFilters.value)
+                if (matchesFilters) {
                   approvals.value = [item, ...approvals.value]
                   total.value++
                 }

You would need to implement an itemMatchesFilters helper that checks the item's status, risk_level, etc. against activeFilters.value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/stores/approvals.ts` around lines 76 - 95, The handler for
'approval.submitted' currently increments total.value when activeFilters.value
is set without validating the new item against those filters; implement an
itemMatchesFilters(item, filters) helper and after fetching the item via
approvalsApi.getApproval(approvalId) call it when activeFilters.value is truthy
— only increment total.value (and insert item into approvals.value if you want
it visible) when itemMatchesFilters returns true; keep existing 404/410 handling
and only fall back to the previous behavior when no filters are active.
tests/unit/engine/test_approval_gate.py (1)

296-329: 🧹 Nitpick | 🔵 Trivial

Consider parameterizing similar message tests.

These three tests follow the same pattern and could be consolidated using @pytest.mark.parametrize. However, the current form is readable and the tests are few, so this is optional.

Parameterized version (optional)
@pytest.mark.parametrize(
    ("approved", "decided_by", "decision_reason", "expected_tokens"),
    [
        (True, "admin", None, ["APPROVED", "approval-1", "admin", "[SYSTEM:"]),
        (False, "reviewer", "Too risky for production", 
         ["REJECTED", "approval-1", "reviewer", "Too risky for production", "USER-SUPPLIED REASON", "untrusted data"]),
        (True, "admin", "Looks good", ["APPROVED", "Looks good", "USER-SUPPLIED REASON"]),
    ],
)
def test_build_resume_message_variants(
    self,
    approved: bool,
    decided_by: str,
    decision_reason: str | None,
    expected_tokens: list[str],
) -> None:
    msg = ApprovalGate.build_resume_message(
        "approval-1",
        approved=approved,
        decided_by=decided_by,
        decision_reason=decision_reason,
    )
    for token in expected_tokens:
        assert token in msg

As per coding guidelines: "Prefer @pytest.mark.parametrize for testing similar cases."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_approval_gate.py` around lines 296 - 329, The three
repetitive tests (test_approved_without_reason, test_rejected_with_reason,
test_approved_with_reason) should be consolidated into a single parametrized
test that calls ApprovalGate.build_resume_message with different (approved,
decided_by, decision_reason) inputs and asserts expected substrings; add a
pytest.mark.parametrize decorator (e.g., test_build_resume_message_variants)
with rows for each case and loop over expected_tokens asserting each is in msg,
then remove the three original test_... functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Line 154: The sentence listing event constants after "Event names" is missing
a comma after "e.g."; update the phrase "e.g. `PROVIDER_CALL_START` ..." to
"e.g., `PROVIDER_CALL_START` ..." so the abbreviation is punctuated consistently
in the long list (look for the "Event names" paragraph and the "e.g." before the
`PROVIDER_CALL_START`/events constant list).

In `@src/ai_company/api/controllers/approvals.py`:
- Around line 219-225: Replace the success event used in the exception handler:
create a new event constant APPROVAL_GATE_RESUME_FAILED in
ai_company.observability.events.approval_gate, import it into approvals.py, and
change the logger.warning call inside the except block to use
APPROVAL_GATE_RESUME_FAILED (keeping the same message, approval_id, note and
exc_info=True) instead of APPROVAL_GATE_RESUME_TRIGGERED so failure logging is
correctly categorized.

In `@src/ai_company/engine/approval_gate.py`:
- Around line 213-232: The warning "delete() returned False" can be emitted
incorrectly when delete() actually raised and was logged by logger.exception;
fix by tracking whether delete raised: introduce a boolean (e.g. delete_errored
= False) before calling self._parked_context_repo.delete, set delete_errored =
True inside the broad except Exception block that calls
logger.exception(APPROVAL_GATE_RESUME_DELETE_FAILED, ...), and change the final
check to only log the warning when not deleted and not delete_errored (if not
deleted and not delete_errored: logger.warning(...)). This uses the existing
symbols self._parked_context_repo.delete, deleted, logger.exception,
logger.warning, and APPROVAL_GATE_RESUME_DELETE_FAILED.

In `@src/ai_company/engine/loop_helpers.py`:
- Around line 373-378: The debug log currently emits the
APPROVAL_GATE_CONTEXT_PARKED event while only noting a missing task_id; change
this to a semantically correct diagnostic: either remove the parked event
constant and call logger.debug with a plain message about the missing
task_execution/task_id, or add a new event constant (e.g.,
APPROVAL_GATE_CONTEXT_PARK_NO_TASK) to
ai_company.observability.events.approval_gate and use that here instead of
APPROVAL_GATE_CONTEXT_PARKED; update the logger.debug invocation in
loop_helpers.py (the call that passes approval_id, agent_id and note="No
task_execution on context — task_id will be None") to use the chosen option so
observability events remain accurate.

In `@src/ai_company/tools/invoker.py`:
- Around line 633-634: The construction of ApprovalRiskLevel from
result.metadata can raise ValueError for invalid strings; update the code that
builds the object (the ApprovalRiskLevel(...) call using
result.metadata.get("risk_level", "high")) to validate or convert the metadata
value first: read risk_str = result.metadata.get("risk_level"), check if
risk_str is a valid member of ApprovalRiskLevel (or wrap
ApprovalRiskLevel(risk_str) in a try/except ValueError), and on failure log a
clear metadata validation warning and fall back to a safe default (e.g., "high")
before passing into ApprovalRiskLevel; ensure the change is applied where
ApprovalRiskLevel is instantiated so invalid metadata never raises an uncaught
ValueError.

In `@web/src/__tests__/stores/approvals.test.ts`:
- Around line 286-290: Add an afterEach hook that calls vi.restoreAllMocks() to
fully restore spies between tests (in addition to the existing
vi.clearAllMocks()); also change the axios.isAxiosError spy at the second
location to capture the original implementation (like the first instance) and
delegate to it for non-matching errors by storing (await
import('axios')).default.isAxiosError in originalIsAxiosError and calling
originalIsAxiosError(err) when err !== axiosError so the spy preserves the
original behavior.

In `@web/src/stores/approvals.ts`:
- Around line 96-117: The handler for
'approval.approved'/'approval.rejected'/'approval.expired' currently replaces
the item in approvals.value regardless of current activeFilters; instead, after
fetching the updated approval via approvalsApi.getApproval(approvalId) check
whether the updated item matches the current activeFilters (use whatever
predicate/function you use elsewhere to test filter membership) and if it does
replace the item in approvals.value, otherwise remove it from approvals.value
and decrement total.value accordingly; keep the existing 404/410 removal branch
as-is and perform this filter-check/replace-or-remove logic in the same switch
case that currently updates approvals.value.

---

Duplicate comments:
In `@tests/unit/engine/test_approval_gate.py`:
- Around line 296-329: The three repetitive tests (test_approved_without_reason,
test_rejected_with_reason, test_approved_with_reason) should be consolidated
into a single parametrized test that calls ApprovalGate.build_resume_message
with different (approved, decided_by, decision_reason) inputs and asserts
expected substrings; add a pytest.mark.parametrize decorator (e.g.,
test_build_resume_message_variants) with rows for each case and loop over
expected_tokens asserting each is in msg, then remove the three original
test_... functions.

In `@web/src/stores/approvals.ts`:
- Around line 76-95: The handler for 'approval.submitted' currently increments
total.value when activeFilters.value is set without validating the new item
against those filters; implement an itemMatchesFilters(item, filters) helper and
after fetching the item via approvalsApi.getApproval(approvalId) call it when
activeFilters.value is truthy — only increment total.value (and insert item into
approvals.value if you want it visible) when itemMatchesFilters returns true;
keep existing 404/410 handling and only fall back to the previous behavior when
no filters are active.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cf611785-f95c-4d33-8547-1af4d9797c41

📥 Commits

Reviewing files that changed from the base of the PR and between b999b65 and 3b5068f.

📒 Files selected for processing (22)
  • CLAUDE.md
  • docs/design/engine.md
  • src/ai_company/api/approval_store.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/api/dto.py
  • src/ai_company/api/ws_models.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/engine/loop_helpers.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/tools/invoker.py
  • tests/unit/api/controllers/test_approvals.py
  • tests/unit/api/test_dto.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/observability/test_events.py
  • web/src/__tests__/stores/approvals.test.ts
  • web/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Greptile Review
  • GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (9)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/api/controllers/test_approvals.py
  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/engine/_security_factory.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/api/ws_models.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/api/dto.py
  • src/ai_company/tools/approval_tool.py
  • tests/unit/api/test_dto.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/api/approval_store.py
  • tests/unit/engine/test_approval_gate.py
  • src/ai_company/engine/loop_helpers.py
  • tests/unit/observability/test_events.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/api/controllers/test_approvals.py
  • tests/unit/engine/test_loop_helpers_approval.py
  • tests/unit/api/test_dto.py
  • tests/unit/engine/test_approval_gate.py
  • tests/unit/observability/test_events.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, ConfigDict. Use @computed_field for derived values instead of storing redundant fields (e.g., TokenUsage.total_tokens).
Use NotBlankStr from core.types for all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER use import logging, logging.getLogger(), or print() in application code.
Always use logger as the variable name for loggers (not _logger, not log).
Event names must always use constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import dire...

Files:

  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/engine/_security_factory.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/api/ws_models.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/api/dto.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/engine/loop_helpers.py
src/ai_company/**/[!_]*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)

Files:

  • src/ai_company/security/timeout/park_service.py
  • src/ai_company/security/timeout/parked_context.py
  • src/ai_company/api/controllers/approvals.py
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/api/ws_models.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/approval_gate.py
  • src/ai_company/api/dto.py
  • src/ai_company/tools/approval_tool.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/api/approval_store.py
  • src/ai_company/engine/loop_helpers.py
web/src/**/*.{vue,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard: Vue 3 + PrimeVue + Tailwind CSS, organized by feature in src/components/, src/stores/, src/views/. Enforce with ESLint and vue-tsc type-checking.

Files:

  • web/src/stores/approvals.ts
  • web/src/__tests__/stores/approvals.test.ts
web/src/stores/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Frontend state management: Pinia stores organized by feature in src/stores/

Files:

  • web/src/stores/approvals.ts
web/src/__tests__/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Web dashboard tests: Vitest unit tests organized by feature in __tests__/

Files:

  • web/src/__tests__/stores/approvals.test.ts
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

When approved deviations occur, update the relevant docs/design/ page to reflect the new reality.

Files:

  • docs/design/engine.md
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation source in docs/ built with Zensical. Design spec in docs/design/ (7 pages). Architecture in docs/architecture/. Roadmap in docs/roadmap/. Security in docs/security.md.

Files:

  • docs/design/engine.md
🧠 Learnings (16)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/ai_company/**/[!_]*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Event names must always use constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • CLAUDE.md
  • src/ai_company/observability/events/approval_gate.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/loop_helpers.py
  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Structured logging: always use `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : NEVER use `import logging`, `logging.getLogger()`, or `print()` in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Always use `logger` as the variable name for loggers (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.

Applied to files:

  • web/src/stores/approvals.ts
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/stores/**/*.ts : Frontend state management: Pinia stores organized by feature in src/stores/

Applied to files:

  • web/src/stores/approvals.ts
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/__tests__/**/*.ts : Web dashboard tests: Vitest unit tests organized by feature in __tests__/

Applied to files:

  • web/src/__tests__/stores/approvals.test.ts
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Use Pydantic v2 with `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields (e.g., `TokenUsage.total_tokens`).

Applied to files:

  • src/ai_company/api/dto.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.

Applied to files:

  • tests/unit/engine/test_approval_gate.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Tests must use markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`

Applied to files:

  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/observability/test_events.py
🧬 Code graph analysis (14)
src/ai_company/security/timeout/park_service.py (1)
src/ai_company/engine/parallel_models.py (1)
  • task_id (87-89)
src/ai_company/security/timeout/parked_context.py (2)
src/ai_company/engine/parallel_models.py (1)
  • task_id (87-89)
src/ai_company/tools/base.py (1)
  • description (138-140)
src/ai_company/api/controllers/approvals.py (5)
src/ai_company/api/state.py (2)
  • approval_gate (139-141)
  • AppState (23-163)
src/ai_company/engine/approval_gate.py (3)
  • ApprovalGate (40-274)
  • resume_context (162-239)
  • build_resume_message (242-274)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
src/ai_company/api/approval_store.py (2)
  • get (61-73)
  • save_if_pending (125-142)
src/ai_company/api/errors.py (2)
  • UnauthorizedError (59-65)
  • ConflictError (41-47)
tests/unit/engine/test_loop_helpers_approval.py (4)
src/ai_company/engine/approval_gate.py (3)
  • ApprovalGate (40-274)
  • should_park (71-90)
  • park_context (92-160)
src/ai_company/engine/loop_helpers.py (1)
  • execute_tool_calls (241-343)
src/ai_company/engine/loop_protocol.py (2)
  • ExecutionResult (79-140)
  • TerminationReason (28-36)
src/ai_company/providers/models.py (1)
  • CompletionResponse (257-306)
web/src/stores/approvals.ts (2)
src/ai_company/api/ws_models.py (1)
  • WsEvent (48-73)
web/src/api/types.ts (1)
  • WsEvent (519-524)
src/ai_company/tools/invoker.py (3)
src/ai_company/core/enums.py (1)
  • ApprovalRiskLevel (443-449)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/tools/errors.py (1)
  • ToolExecutionError (55-56)
web/src/__tests__/stores/approvals.test.ts (2)
web/src/stores/approvals.ts (1)
  • useApprovalStore (8-136)
web/src/api/types.ts (1)
  • WsEvent (519-524)
src/ai_company/engine/approval_gate.py (5)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
src/ai_company/security/timeout/parked_context.py (1)
  • ParkedContext (19-66)
src/ai_company/engine/approval_gate_models.py (1)
  • EscalationInfo (14-33)
src/ai_company/engine/context.py (1)
  • AgentContext (87-307)
src/ai_company/api/dto.py (1)
src/ai_company/engine/parallel_models.py (1)
  • task_id (87-89)
src/ai_company/tools/approval_tool.py (6)
src/ai_company/core/enums.py (1)
  • ToolCategory (294-308)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/tools/base.py (6)
  • BaseTool (57-184)
  • ToolExecutionResult (25-54)
  • description (138-140)
  • category (128-130)
  • action_type (133-135)
  • parameters_schema (143-151)
src/ai_company/api/approval_store.py (1)
  • add (42-59)
src/ai_company/security/timeout/risk_tier_classifier.py (1)
  • DefaultRiskTierClassifier (62-101)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
tests/unit/api/test_dto.py (3)
src/ai_company/tools/base.py (2)
  • action_type (133-135)
  • description (138-140)
src/ai_company/core/enums.py (1)
  • ApprovalRiskLevel (443-449)
src/ai_company/api/dto.py (1)
  • CreateApprovalRequest (196-249)
src/ai_company/engine/agent_engine.py (5)
src/ai_company/engine/_security_factory.py (2)
  • make_security_interceptor (39-117)
  • registry_with_approval_tool (120-148)
src/ai_company/engine/approval_gate.py (1)
  • ApprovalGate (40-274)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/engine/loop_protocol.py (1)
  • ExecutionLoop (151-189)
src/ai_company/engine/react_loop.py (1)
  • ReactLoop (55-269)
src/ai_company/api/approval_store.py (4)
src/ai_company/core/approval.py (1)
  • ApprovalItem (24-96)
src/ai_company/core/enums.py (1)
  • ApprovalStatus (434-440)
src/ai_company/api/app.py (1)
  • _on_expire (74-99)
src/ai_company/memory/errors.py (1)
  • MemoryError (13-14)
tests/unit/engine/test_approval_gate.py (3)
src/ai_company/engine/approval_gate.py (3)
  • ApprovalGate (40-274)
  • park_context (92-160)
  • build_resume_message (242-274)
src/ai_company/persistence/repositories.py (1)
  • ParkedContextRepository (199-267)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
🪛 LanguageTool
CLAUDE.md

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🔇 Additional comments (37)
src/ai_company/api/ws_models.py (1)

45-45: LGTM!

The new APPROVAL_RESUMED enum member follows the established naming convention and aligns with the other approval-related WebSocket event types.

src/ai_company/security/timeout/parked_context.py (1)

41-43: LGTM!

The change to make task_id optional (NotBlankStr | None) correctly supports the taskless agents scenario. The default of None and updated description align with the PR objectives.

src/ai_company/security/timeout/park_service.py (1)

37-45: LGTM!

The signature change to accept optional task_id is well-implemented. The equality check at lines 92-97 correctly handles None comparison (Python's None == None evaluates to True), and the docstring clearly documents the taskless agent use case.

web/src/stores/approvals.ts (1)

87-93: LGTM on error handling differentiation.

The error handling now correctly distinguishes between definitive 404/410 responses (item genuinely gone) and transient errors (timeouts, 5xx). This addresses the previous review concern about treating all fetch errors as deletes.

Also applies to: 104-115

src/ai_company/api/dto.py (2)

221-232: LGTM!

The action_type field validator correctly enforces the category:action format. Using strip() to check for whitespace-only parts and the _ACTION_TYPE_PARTS constant for the magic number are good practices.
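
For readers without the diff open, the check being praised is roughly of this shape. This is a sketch only: the real validator is a Pydantic field validator on `CreateApprovalRequest`, and the function name and error wording below are assumptions.

```python
_ACTION_TYPE_PARTS = 2  # exactly "category" and "action"


def validate_action_type(value: str) -> str:
    """Return value unchanged if it looks like 'category:action'."""
    parts = value.split(":")
    # Reject a wrong number of segments and empty or whitespace-only segments.
    if len(parts) != _ACTION_TYPE_PARTS or not all(p.strip() for p in parts):
        msg = f"action_type must be 'category:action', got {value!r}"
        raise ValueError(msg)
    return value
```

Under this sketch `"code:merge"` passes, while `"code_merge"`, `"code:"`, and `"a:b:c"` are rejected.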


217-217: LGTM!

The ttl_seconds upper bound of 604800 (7 days) provides reasonable protection against excessive TTL values while maintaining flexibility.

CLAUDE.md (1)

101-101: LGTM!

Documentation updates accurately reflect the new approval gate integration across the API, engine, and tools modules.

Also applies to: 107-107, 115-115

src/ai_company/engine/_security_factory.py (2)

39-117: LGTM!

The make_security_interceptor function has proper defensive checks for configuration mismatches (autonomy without security config), appropriate logging before raising errors, and clean construction of the security stack with conditional detector inclusion.


120-148: LGTM!

The registry_with_approval_tool factory cleanly handles the optional approval store case and uses deferred imports to avoid circular dependencies. The pattern of building a new registry with existing tools plus the approval tool maintains immutability.

src/ai_company/tools/approval_tool.py (4)

97-133: LGTM!

The execute method is well-structured as an orchestrator, delegating to focused helper methods for validation, risk classification, persistence, and result building. This addresses the previous review concern about method length.


162-163: LGTM on PEP 758 exception syntax.

The except MemoryError, RecursionError: syntax correctly uses the Python 3.14 PEP 758 format (comma-separated, no parentheses) as required by the coding guidelines.

Also applies to: 238-239


135-178: LGTM!

The _persist_item helper properly handles persistence errors, re-raises critical exceptions (MemoryError, RecursionError), and logs failures with context before returning an error result.


229-252: LGTM!

The _classify_risk method correctly implements fail-safe behavior by defaulting to HIGH when classification fails or no classifier is configured, aligning with the D19 design spec mentioned in DefaultRiskTierClassifier.

src/ai_company/observability/events/approval_gate.py (1)

1-16: LGTM!

The event constants are well-structured, following the approval_gate.* naming convention consistently. The coverage spans the full approval gate lifecycle (initialization, escalation detection, context parking/resume, and failure cases).

tests/unit/observability/test_events.py (1)

135-135: LGTM!

The module-level pytestmark consolidation is cleaner than per-class markers, and adding "approval_gate" to the expected domain modules ensures the new event constants file is properly discovered by the test suite.

Also applies to: 176-178

src/ai_company/engine/approval_gate.py (1)

241-274: Good prompt injection mitigation in build_resume_message.

The use of repr() wrapping for user-supplied values and explicit labeling of untrusted data is a solid defense against prompt injection in the resume flow.

src/ai_company/engine/agent_engine.py (3)

139-142: LGTM — approval gate wiring is correctly sequenced.

The initialization order ensures _approval_store and _parked_context_repo are assigned before _make_approval_gate() is called, and the approval gate is created before _make_default_loop(). This addresses the previously flagged issue about wiring the parked_context_repo.


597-617: LGTM — approval gate factory properly guards on approval store.

The factory correctly returns None when no approval store is configured, ensuring the execution loop skips approval-gate checks in that case. The late import of ParkService avoids circular dependencies.


631-655: LGTM — tool invoker properly augmented with approval tool.

The registry augmentation via registry_with_approval_tool correctly passes the identity and task_id, enabling the approval tool when an approval store is configured.

tests/unit/api/controllers/test_approvals.py (1)

22-27: LGTM — action_type format updated to match new validation.

The change from "code_merge" to "code:merge" aligns with the CreateApprovalRequest validator that now enforces the "category:action" format.

tests/unit/api/test_dto.py (1)

40-45: LGTM — action_type format consistently updated across all metadata tests.

All four test cases now use "deploy:release" to comply with the "category:action" format requirement, allowing the tests to properly exercise the metadata validation logic without failing on action_type validation.

Also applies to: 51-56, 62-67, 71-76

docs/design/engine.md (1)

450-468: LGTM — documentation accurately describes the new approval gate flow.

The additions clearly explain the escalation detection, parking mechanism, and how PARKED termination interacts with the task lifecycle. The documentation aligns well with the implementation in ApprovalGate and loop_helpers.

src/ai_company/api/approval_store.py (2)

125-142: LGTM — save_if_pending correctly mitigates TOCTOU race.

The method applies lazy expiration check before comparing status, ensuring concurrent expirations don't result in saving a decision on an already-expired item. Returning None when the item is no longer PENDING allows callers to detect and handle concurrent decisions.


170-179: LGTM — exception handling in _check_expiration is appropriate.

The try/except properly re-raises MemoryError and RecursionError as critical errors, while logging other callback failures without disrupting the expiration flow. This aligns with the coding guideline to handle errors explicitly.

tests/unit/engine/test_loop_helpers_approval.py (1)

1-205: LGTM! Comprehensive test coverage for approval gate integration.

The test file properly covers the key scenarios:

  • No approval gate returns context normally
  • With gate but no escalation returns context
  • Escalation triggers parking with correct metadata
  • Park failure correctly returns ERROR (not PARKED)

Past review feedback has been addressed: PropertyMock removed in favor of simple attribute assignment, and park failure now expects TerminationReason.ERROR as appropriate.

src/ai_company/engine/loop_helpers.py (2)

241-343: Well-structured approval gate integration.

The execute_tool_calls function cleanly integrates the approval gate check after tool results are processed. The flow correctly:

  • Checks for escalations only when approval_gate is provided
  • Delegates to _park_for_approval for the parking logic
  • Returns updated context when no parking is needed

346-419: Proper error handling in parking flow.

The _park_for_approval helper correctly:

  • Returns ERROR (not PARKED) when parking fails — this ensures non-resumable failures are not masked
  • Uses boolean parking_failed in metadata (addresses past review feedback)
  • Re-raises MemoryError and RecursionError appropriately
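
The ERROR-versus-PARKED distinction can be sketched as follows. This is a synchronous simplification with hypothetical names (`ParkOutcome`, `park_for_approval`); the real helper is part of the async loop and returns an `ExecutionResult`.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class TerminationReason(str, Enum):
    """Subset of the real enum; only the two members needed here."""

    PARKED = "parked"
    ERROR = "error"


@dataclass(frozen=True)
class ParkOutcome:
    reason: TerminationReason
    metadata: dict[str, Any] = field(default_factory=dict)


def park_for_approval(approval_id: str, park: Callable[[str], None]) -> ParkOutcome:
    """Park the context; report ERROR if it could not be persisted."""
    try:
        park(approval_id)
    except (MemoryError, RecursionError):
        raise  # critical errors are never converted into a result
    except Exception as exc:
        # A context that was not saved cannot be resumed, so reporting
        # PARKED here would hide a non-resumable failure.
        return ParkOutcome(
            TerminationReason.ERROR,
            {"approval_id": approval_id, "parking_failed": True, "error": str(exc)},
        )
    return ParkOutcome(TerminationReason.PARKED, {"approval_id": approval_id})
```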
src/ai_company/api/controllers/approvals.py (3)

101-109: Correct PEP 758 except syntax and error handling.

The exception handling properly re-raises MemoryError and RecursionError while allowing other exceptions to be logged as warnings without disrupting the approval flow.


319-322: Authentication enforcement for create_approval.

Good security hardening — requested_by is now bound to the authenticated user's username rather than accepting it from the request payload, mitigating potential spoofing.


410-418: TOCTOU race condition mitigated with save_if_pending.

Using save_if_pending instead of a separate check-then-save properly handles concurrent approval decisions. The ConflictError (409) response is appropriate for this scenario.

src/ai_company/tools/invoker.py (3)

120-136: Escalation tracking property is well-documented.

The pending_escalations property clearly documents when it's populated (ESCALATE verdicts, parking metadata) and when it's cleared (start of invoke/invoke_all). The tuple return type ensures immutability for callers.

Note: The past review concern about "approval gating discovered too late" is an architectural limitation — concurrent tool calls can complete before parking is checked. This would require a fundamentally different invocation model to address.


773-779: Deterministic escalation ordering.

Sorting escalations by original tool-call index ensures consistent behavior regardless of concurrent execution order. This is important for reproducible parking decisions.
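
As a rough illustration of index-based ordering, consider the sketch below. `EscalationInfo` is reduced to two assumed fields, and `asyncio.gather` is used instead of the `TaskGroup` the project prefers, only so that the sketch runs on older interpreters.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationInfo:
    """Reduced stand-in; the real model has its own fields."""

    approval_id: str
    tool_name: str


async def _run_tool(
    index: int,
    tool_name: str,
    delay: float,
    sink: list[tuple[int, EscalationInfo]],
) -> None:
    await asyncio.sleep(delay)  # simulated tool latency
    # Record the original call index alongside the escalation.
    sink.append((index, EscalationInfo(f"approval-{index}", tool_name)))


async def collect_escalations(
    calls: list[tuple[str, float]],
) -> tuple[EscalationInfo, ...]:
    """Run tools concurrently, then order escalations by call index."""
    sink: list[tuple[int, EscalationInfo]] = []
    await asyncio.gather(
        *(_run_tool(i, name, delay, sink) for i, (name, delay) in enumerate(calls))
    )
    return tuple(info for _, info in sorted(sink, key=lambda pair: pair[0]))
```

Even when the second call finishes first, the returned tuple follows the order in which the model issued the tool calls, so the escalation chosen for parking does not depend on timing.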


601-652: Defensive escalation tracking with fail-closed behavior.

The _track_parking_metadata helper properly:

  • Checks both requires_parking=True and presence of approval_id
  • Re-raises MemoryError/RecursionError
  • Raises ToolExecutionError on tracking failure to prevent silent bypass of approval gate
tests/unit/engine/test_approval_gate.py (2)

34-54: Good use of fixtures to reduce test setup duplication.

The park_service, parked_mock, and repo fixtures cleanly extract the repeated mock wiring that was noted in past reviews. This improves maintainability.


57-290: Comprehensive test coverage for ApprovalGate.

The tests thoroughly cover:

  • should_park: empty input and multiple escalations
  • park_context: service calls, repo persistence, no-repo path, serialization errors, repo save errors
  • resume_context: successful resume, unknown approval, no repo, deletion after resume, deserialization failure, delete failure resilience

Edge cases like delete failure not losing context (lines 269-290) are particularly valuable for reliability.

web/src/__tests__/stores/approvals.test.ts (2)

6-6: Good async test synchronization for handleWsEvent.

Using flushPromises here is a solid fit since handleWsEvent schedules async work and returns synchronously.


163-279: WS event coverage is strong and aligned with the new approval_id contract.

These cases now validate fetch-on-event behavior, duplicate prevention, and total-count handling under filtered and error conditions well.

Also applies to: 341-366

Comment thread CLAUDE.md
- **Never** use `import logging` / `logging.getLogger()` / `print()` in application code
- **Variable name**: always `logger` (not `_logger`, not `log`)
- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `API_ROUTE_NOT_FOUND` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `TRUST_EVALUATE_START` from `events.trust`, `PROMOTION_EVALUATE_START` from `events.promotion`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`, `MEMORY_BACKEND_CONNECTED` from `events.memory`, `MEMORY_ENTRY_STORED` from `events.memory`, `MEMORY_BACKEND_SYSTEM_ERROR` from `events.memory`, `AUTONOMY_ACTION_AUTO_APPROVED` from `events.autonomy`, `TIMEOUT_POLICY_EVALUATED` from `events.timeout`, `PERSISTENCE_AUDIT_ENTRY_SAVED` from `events.persistence`, `TASK_ENGINE_STARTED` from `events.task_engine`, `COORDINATION_STARTED` from `events.coordination`, `COMMUNICATION_DISPATCH_START` from `events.communication`, `COMPANY_STARTED` from `events.company`, `CONFIG_LOADED` from `events.config`, `CORRELATION_ID_CREATED` from `events.correlation`, `DECOMPOSITION_STARTED` from `events.decomposition`, `DELEGATION_STARTED` from `events.delegation`, `EXECUTION_LOOP_STARTED` from `events.execution`, `GIT_OPERATION_START` from `events.git`, `PARALLEL_EXECUTION_STARTED` from `events.parallel`, `PERSONALITY_LOADED` from `events.personality`, `QUOTA_CHECKED` from `events.quota`, `ROLE_ASSIGNED` from 
`events.role`, `ROUTING_STARTED` from `events.routing`, `SANDBOX_EXECUTE_START` from `events.sandbox`, `TASK_CREATED` from `events.task`, `TASK_ASSIGNMENT_STARTED` from `events.task_assignment`, `TASK_ROUTING_STARTED` from `events.task_routing`, `TEMPLATE_LOADED` from `events.template`, `TOOL_INVOKE_START` from `events.tool`, `WORKSPACE_CREATED` from `events.workspace`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `API_ROUTE_NOT_FOUND` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `TRUST_EVALUATE_START` from `events.trust`, `PROMOTION_EVALUATE_START` from `events.promotion`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`, `MEMORY_BACKEND_CONNECTED` from `events.memory`, `MEMORY_ENTRY_STORED` from `events.memory`, `MEMORY_BACKEND_SYSTEM_ERROR` from `events.memory`, `AUTONOMY_ACTION_AUTO_APPROVED` from `events.autonomy`, `TIMEOUT_POLICY_EVALUATED` from `events.timeout`, `PERSISTENCE_AUDIT_ENTRY_SAVED` from `events.persistence`, `TASK_ENGINE_STARTED` from `events.task_engine`, `COORDINATION_STARTED` from `events.coordination`, `COMMUNICATION_DISPATCH_START` from `events.communication`, `COMPANY_STARTED` from `events.company`, `CONFIG_LOADED` from `events.config`, `CORRELATION_ID_CREATED` from `events.correlation`, `DECOMPOSITION_STARTED` from `events.decomposition`, `DELEGATION_STARTED` from `events.delegation`, `EXECUTION_LOOP_STARTED` from `events.execution`, `GIT_OPERATION_START` from `events.git`, `PARALLEL_EXECUTION_STARTED` from `events.parallel`, `PERSONALITY_LOADED` from `events.personality`, `QUOTA_CHECKED` from `events.quota`, `ROLE_ASSIGNED` from 
`events.role`, `ROUTING_STARTED` from `events.routing`, `SANDBOX_EXECUTE_START` from `events.sandbox`, `TASK_CREATED` from `events.task`, `TASK_ASSIGNMENT_STARTED` from `events.task_assignment`, `TASK_ROUTING_STARTED` from `events.task_routing`, `TEMPLATE_LOADED` from `events.template`, `TOOL_INVOKE_START` from `events.tool`, `WORKSPACE_CREATED` from `events.workspace`, `APPROVAL_GATE_ESCALATION_DETECTED` from `events.approval_gate`, `APPROVAL_GATE_INITIALIZED` from `events.approval_gate`, `APPROVAL_GATE_RISK_CLASSIFIED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_TRIGGERED` from `events.approval_gate`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

🧹 Nitpick | 🔵 Trivial

Consider adding comma after "e.g." for consistency.

Static analysis flagged the missing comma. While minor, it improves readability in the dense event constant listing.

🧰 Tools
🪛 LanguageTool

[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 154, The sentence listing event constants after "Event
names" is missing a comma after "e.g."; update the phrase "e.g.
`PROVIDER_CALL_START` ..." to "e.g., `PROVIDER_CALL_START` ..." so the
abbreviation is punctuated consistently in the long list (look for the "Event
names" paragraph and the "e.g." before the `PROVIDER_CALL_START`/events constant
list).

Comment on lines +219 to +225
except Exception:
    logger.warning(
        APPROVAL_GATE_RESUME_TRIGGERED,
        approval_id=approval_id,
        note="Resume trigger failed — decision was saved successfully",
        exc_info=True,
    )

⚠️ Potential issue | 🟡 Minor

Success event used for failure logging.

APPROVAL_GATE_RESUME_TRIGGERED semantically indicates a successful resume trigger. Using it for the failure path (with exc_info=True) will confuse observability dashboards and alerting.

Consider using a separate event constant like APPROVAL_GATE_RESUME_FAILED for this error path.

Suggested fix
     except Exception:
         logger.warning(
-            APPROVAL_GATE_RESUME_TRIGGERED,
+            APPROVAL_GATE_RESUME_FAILED,
             approval_id=approval_id,
             note="Resume trigger failed — decision was saved successfully",
             exc_info=True,
         )

You'll need to add APPROVAL_GATE_RESUME_FAILED to ai_company.observability.events.approval_gate and import it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/api/controllers/approvals.py` around lines 219 - 225, Replace
the success event used in the exception handler: create a new event constant
APPROVAL_GATE_RESUME_FAILED in ai_company.observability.events.approval_gate,
import it into approvals.py, and change the logger.warning call inside the
except block to use APPROVAL_GATE_RESUME_FAILED (keeping the same message,
approval_id, note and exc_info=True) instead of APPROVAL_GATE_RESUME_TRIGGERED
so failure logging is correctly categorized.
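
A minimal sketch of the split described above, assuming a stdlib `logging` logger in place of the project's structured logger. The event string values, the `trigger_resume` helper, and its `resume` callback are hypothetical and only illustrate keeping the success and failure paths under separate event names:

```python
import logging

# Hypothetical event values; the real constants live in
# ai_company.observability.events.approval_gate and are not shown in this thread.
APPROVAL_GATE_RESUME_TRIGGERED = "approval_gate.resume.triggered"
APPROVAL_GATE_RESUME_FAILED = "approval_gate.resume.failed"

logger = logging.getLogger("approvals_sketch")


def trigger_resume(approval_id: str, resume) -> bool:
    """Best-effort resume that logs success and failure under different events."""
    try:
        resume(approval_id)
    except Exception:
        # The failure path gets its own event so dashboards can tell it apart.
        logger.warning(
            APPROVAL_GATE_RESUME_FAILED,
            extra={"approval_id": approval_id},
            exc_info=True,
        )
        return False
    logger.info(
        APPROVAL_GATE_RESUME_TRIGGERED,
        extra={"approval_id": approval_id},
    )
    return True
```

With this shape, an alert keyed on the failure event never fires for a successful resume, and the success event stays usable as a plain counter.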

Comment on lines +213 to +232
deleted = False
try:
    deleted = await self._parked_context_repo.delete(parked.id)
except MemoryError, RecursionError:
    raise
except Exception:
    logger.exception(
        APPROVAL_GATE_RESUME_DELETE_FAILED,
        approval_id=approval_id,
        parked_id=parked.id,
        note="Context resumed but parked record not cleaned up",
    )

if not deleted:
    logger.warning(
        APPROVAL_GATE_RESUME_DELETE_FAILED,
        approval_id=approval_id,
        parked_id=parked.id,
        note="delete() returned False — parked record may still exist",
    )

⚠️ Potential issue | 🟡 Minor

Misleading warning message when delete throws an exception.

If delete() raises an exception, it's logged via logger.exception(), but then the if not deleted: block also fires with the note "delete() returned False" — which is incorrect since it threw, not returned False. This creates redundant/confusing logging.

Suggested fix
         deleted = False
+        delete_exception = False
         try:
             deleted = await self._parked_context_repo.delete(parked.id)
         except MemoryError, RecursionError:
             raise
         except Exception:
+            delete_exception = True
             logger.exception(
                 APPROVAL_GATE_RESUME_DELETE_FAILED,
                 approval_id=approval_id,
                 parked_id=parked.id,
                 note="Context resumed but parked record not cleaned up",
             )

-        if not deleted:
+        if not deleted and not delete_exception:
             logger.warning(
                 APPROVAL_GATE_RESUME_DELETE_FAILED,
                 approval_id=approval_id,
                 parked_id=parked.id,
                 note="delete() returned False — parked record may still exist",
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        deleted = False
-        try:
-            deleted = await self._parked_context_repo.delete(parked.id)
-        except MemoryError, RecursionError:
-            raise
-        except Exception:
-            logger.exception(
-                APPROVAL_GATE_RESUME_DELETE_FAILED,
-                approval_id=approval_id,
-                parked_id=parked.id,
-                note="Context resumed but parked record not cleaned up",
-            )
-        if not deleted:
-            logger.warning(
-                APPROVAL_GATE_RESUME_DELETE_FAILED,
-                approval_id=approval_id,
-                parked_id=parked.id,
-                note="delete() returned False — parked record may still exist",
-            )
+        deleted = False
+        delete_exception = False
+        try:
+            deleted = await self._parked_context_repo.delete(parked.id)
+        except MemoryError, RecursionError:
+            raise
+        except Exception:
+            delete_exception = True
+            logger.exception(
+                APPROVAL_GATE_RESUME_DELETE_FAILED,
+                approval_id=approval_id,
+                parked_id=parked.id,
+                note="Context resumed but parked record not cleaned up",
+            )
+        if not deleted and not delete_exception:
+            logger.warning(
+                APPROVAL_GATE_RESUME_DELETE_FAILED,
+                approval_id=approval_id,
+                parked_id=parked.id,
+                note="delete() returned False — parked record may still exist",
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/approval_gate.py` around lines 213 - 232, The warning
"delete() returned False" can be emitted incorrectly when delete() actually
raised and was logged by logger.exception; fix by tracking whether delete
raised: introduce a boolean (e.g. delete_errored = False) before calling
self._parked_context_repo.delete, set delete_errored = True inside the broad
except Exception block that calls
logger.exception(APPROVAL_GATE_RESUME_DELETE_FAILED, ...), and change the final
check to only log the warning when not deleted and not delete_errored (if not
deleted and not delete_errored: logger.warning(...)). This uses the existing
symbols self._parked_context_repo.delete, deleted, logger.exception,
logger.warning, and APPROVAL_GATE_RESUME_DELETE_FAILED.
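
One possible alternative to a tracking flag is a `try`/`except`/`else` layout, where the "returned False" warning can only run when `delete()` did not raise. The sketch below is an illustration under assumptions, not the PR's code: `cleanup_parked`, the event string value, and the `repo` object are hypothetical, stdlib `logging` stands in for the project logger, and the parenthesized `except` form is used only so the snippet also runs on interpreters older than 3.14.

```python
import asyncio
import logging

logger = logging.getLogger("approval_gate_sketch")

# Hypothetical event value; the real constant is not shown in this thread.
APPROVAL_GATE_RESUME_DELETE_FAILED = "approval_gate.resume.delete_failed"


async def cleanup_parked(repo, parked_id: str) -> bool:
    """Delete a parked record, logging exactly one event per failure mode."""
    try:
        deleted = await repo.delete(parked_id)
    except (MemoryError, RecursionError):
        raise  # never swallow these
    except Exception:
        logger.exception(
            APPROVAL_GATE_RESUME_DELETE_FAILED,
            extra={"parked_id": parked_id, "note": "delete() raised"},
        )
        return False
    else:
        # Reached only when delete() returned normally.
        if not deleted:
            logger.warning(
                APPROVAL_GATE_RESUME_DELETE_FAILED,
                extra={"parked_id": parked_id, "note": "delete() returned False"},
            )
        return bool(deleted)
```

Either layout emits a single log entry per failed cleanup; the flag version keeps the original control flow, while the `else` version makes the two failure modes mutually exclusive by construction.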

Comment on lines +373 to +378
logger.debug(
    APPROVAL_GATE_CONTEXT_PARKED,
    approval_id=escalation.approval_id,
    agent_id=agent_id,
    note="No task_execution on context — task_id will be None",
)

⚠️ Potential issue | 🟡 Minor

Misleading event constant for debug log.

APPROVAL_GATE_CONTEXT_PARKED is logged before parking actually occurs. This log is meant to note a missing task_id, not a successful park. Using a "parked" event here is semantically incorrect and could confuse observability tooling.

Consider using a DEBUG-level log without the parked event constant, or introduce a separate event for this diagnostic note.

Suggested fix
     else:
-        logger.debug(
-            APPROVAL_GATE_CONTEXT_PARKED,
+        logger.debug(
+            "approval_gate_context_park_no_task",
             approval_id=escalation.approval_id,
             agent_id=agent_id,
             note="No task_execution on context — task_id will be None",
         )

Alternatively, define a new event constant like APPROVAL_GATE_CONTEXT_PARK_NO_TASK in events.approval_gate if this diagnostic is important for observability.

Based on learnings: "Event names must always use constants from domain-specific modules under ai_company.observability.events" — the current usage misapplies the PARKED event for a pre-park diagnostic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/loop_helpers.py` around lines 373 - 378, The debug log
currently emits the APPROVAL_GATE_CONTEXT_PARKED event while only noting a
missing task_id; change this to a semantically correct diagnostic: either remove
the parked event constant and call logger.debug with a plain message about the
missing task_execution/task_id, or add a new event constant (e.g.,
APPROVAL_GATE_CONTEXT_PARK_NO_TASK) to
ai_company.observability.events.approval_gate and use that here instead of
APPROVAL_GATE_CONTEXT_PARKED; update the logger.debug invocation in
loop_helpers.py (the call that passes approval_id, agent_id and note="No
task_execution on context — task_id will be None") to use the chosen option so
observability events remain accurate.

Comment on lines +633 to +634
risk_level=ApprovalRiskLevel(
    result.metadata.get("risk_level", "high"),

⚠️ Potential issue | 🟡 Minor

Invalid risk_level metadata causes unhandled exception.

If result.metadata["risk_level"] contains an invalid value (not in ApprovalRiskLevel enum), ApprovalRiskLevel(...) raises ValueError. While the outer except Exception catches this, the error message won't clearly indicate a metadata validation issue.

Consider validating explicitly or using a safer fallback:

Safer risk_level handling
+                    raw_level = result.metadata.get("risk_level", "high")
+                    try:
+                        risk = ApprovalRiskLevel(raw_level)
+                    except ValueError:
+                        risk = ApprovalRiskLevel.HIGH
+
                     self._pending_escalations.append(
                         _EscalationInfo(
                             approval_id=str(result.metadata["approval_id"]),
                             tool_call_id=tool_call.id,
                             tool_name=tool.name,
                             action_type=tool.action_type,
-                            risk_level=ApprovalRiskLevel(
-                                result.metadata.get("risk_level", "high"),
-                            ),
+                            risk_level=risk,
                             reason="Agent requested human approval",
                         ),
                     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/invoker.py` around lines 633 - 634, The construction of
ApprovalRiskLevel from result.metadata can raise ValueError for invalid strings;
update the code that builds the object (the ApprovalRiskLevel(...) call using
result.metadata.get("risk_level", "high")) to validate or convert the metadata
value first: read risk_str = result.metadata.get("risk_level"), check if
risk_str is a valid member of ApprovalRiskLevel (or wrap
ApprovalRiskLevel(risk_str) in a try/except ValueError), and on failure log a
clear metadata validation warning and fall back to a safe default (e.g., "high")
before passing into ApprovalRiskLevel; ensure the change is applied where
ApprovalRiskLevel is instantiated so invalid metadata never raises an uncaught
ValueError.
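
The validation step above could be factored into a small helper. This is a sketch under assumptions: the `ApprovalRiskLevel` members other than `HIGH` are guessed for illustration, `parse_risk_level` is a hypothetical name, and stdlib `logging` stands in for the project logger.

```python
import logging
from enum import Enum

logger = logging.getLogger("invoker_sketch")


class ApprovalRiskLevel(str, Enum):
    """Stand-in enum; only HIGH / "high" is confirmed by the review thread."""

    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def parse_risk_level(metadata: dict) -> ApprovalRiskLevel:
    """Convert tool metadata to a risk level, falling back to HIGH."""
    raw = metadata.get("risk_level", "high")
    try:
        return ApprovalRiskLevel(raw)
    except ValueError:
        # Invalid metadata is reported explicitly instead of surfacing as a
        # generic error from an outer handler.
        logger.warning("invalid risk_level metadata %r; using HIGH", raw)
        return ApprovalRiskLevel.HIGH
```

Falling back to the most cautious known level means a malformed value can only make the approval stricter, never more permissive.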

Comment on lines +286 to +290
const originalIsAxiosError = (await import('axios')).default.isAxiosError
vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => {
  if (err === axiosError) return true
  return originalIsAxiosError(err)
})

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify spy restoration hygiene in this test file (read-only).
rg -n "vi\\.spyOn\\(" web/src/__tests__/stores/approvals.test.ts -C2
rg -n "vi\\.(clearAllMocks|restoreAllMocks)" web/src/__tests__/stores/approvals.test.ts -C2

Repository: Aureliolo/synthorg

Length of output: 1006


🏁 Script executed:

cat -n web/src/__tests__/stores/approvals.test.ts | head -100

Repository: Aureliolo/synthorg

Length of output: 3921


🏁 Script executed:

cat -n web/src/__tests__/stores/approvals.test.ts | tail -100

Repository: Aureliolo/synthorg

Length of output: 4523


🏁 Script executed:

# Check for afterEach in the entire test file
rg -n "afterEach|afterAll" web/src/__tests__/stores/approvals.test.ts -C2

Repository: Aureliolo/synthorg

Length of output: 44


Add afterEach hook with vi.restoreAllMocks() and fix axios.isAxiosError spy at lines 316–319.

Line 42 calls vi.clearAllMocks(), which only clears call history—not spy implementations. The axios.isAxiosError spies can leak into later tests. Additionally, the spy at lines 316–319 returns false for unmatched errors instead of delegating to the original implementation, creating inconsistency with lines 286–290.

Two fixes required:

  1. Add afterEach block to restore all spies globally
  2. Update lines 316–319 to preserve the original isAxiosError implementation
Suggested fixes
-import { describe, it, expect, beforeEach, vi } from 'vitest'
+import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest'
...
  beforeEach(() => {
    setActivePinia(createPinia())
    vi.clearAllMocks()
  })
+
+  afterEach(() => {
+    vi.restoreAllMocks()
+  })

At lines 316–319, update to match the pattern at lines 286–290:

+      const originalIsAxiosError = (await import('axios')).default.isAxiosError
       vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => {
         if (err === axiosError) return true
-        return false
+        return originalIsAxiosError(err)
       })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/__tests__/stores/approvals.test.ts` around lines 286 - 290, Add an
afterEach hook that calls vi.restoreAllMocks() to fully restore spies between
tests (in addition to the existing vi.clearAllMocks()); also change the
axios.isAxiosError spy at the second location to capture the original
implementation (like the first instance) and delegate to it for non-matching
errors by storing (await import('axios')).default.isAxiosError in
originalIsAxiosError and calling originalIsAxiosError(err) when err !==
axiosError so the spy preserves the original behavior.

Comment on lines +96 to +117
case 'approval.approved':
case 'approval.rejected':
case 'approval.expired':
  try {
    const updated = await approvalsApi.getApproval(approvalId)
    approvals.value = approvals.value.map((a) =>
      a.id === approvalId ? updated : a,
    )
  } catch (err) {
    if (axios.isAxiosError(err) && (err.response?.status === 404 || err.response?.status === 410)) {
      // Item genuinely gone — remove from local list
      const lengthBefore = approvals.value.length
      approvals.value = approvals.value.filter((a) => a.id !== approvalId)
      const removed = lengthBefore - approvals.value.length
      if (removed > 0) {
        total.value = Math.max(0, total.value - removed)
      }
    } else {
      console.warn('Failed to fetch approval:', approvalId, err)
    }
  }
  break

⚠️ Potential issue | 🟡 Minor

Status-change handler updates items regardless of current filters.

When an approval is approved, rejected, or expired, the code updates the item in place (lines 101-103) even if the updated status no longer matches activeFilters. This can leave non-matching items visible until the next manual refresh.

Consider filtering out items that no longer match after status changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/stores/approvals.ts` around lines 96 - 117, The handler for
'approval.approved'/'approval.rejected'/'approval.expired' currently replaces
the item in approvals.value regardless of current activeFilters; instead, after
fetching the updated approval via approvalsApi.getApproval(approvalId) check
whether the updated item matches the current activeFilters (use whatever
predicate/function you use elsewhere to test filter membership) and if it does
replace the item in approvals.value, otherwise remove it from approvals.value
and decrement total.value accordingly; keep the existing 404/410 removal branch
as-is and perform this filter-check/replace-or-remove logic in the same switch
case that currently updates approvals.value.


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
tests/unit/engine/test_approval_gate.py (1)

294-330: 🧹 Nitpick | 🔵 Trivial

Parameterize message-variant assertions to reduce duplication.

These three tests are structurally identical and are a good fit for a single parameterized test.

As per coding guidelines, "Prefer @pytest.mark.parametrize for testing similar cases."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_approval_gate.py` around lines 294 - 330, Replace the
three nearly identical tests in TestBuildResumeMessage with a single
`@pytest.mark.parametrize` test that iterates over cases for
ApprovalGate.build_resume_message; for each tuple include (approved, decided_by,
decision_reason, expected_substrings) and assert all expected substrings are in
the returned msg (e.g., "APPROVED"/"REJECTED", approval ID, actor, optional
reason, and markers like "USER-SUPPLIED REASON" or "untrusted data" when
applicable). Keep the call to ApprovalGate.build_resume_message("approval-1",
...) and name the param set to make failures readable; remove the three
individual test_* methods after adding the parameterized test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/unit/engine/test_approval_gate.py`:
- Around line 294-330: Replace the three nearly identical tests in
TestBuildResumeMessage with a single `@pytest.mark.parametrize` test that iterates
over cases for ApprovalGate.build_resume_message; for each tuple include
(approved, decided_by, decision_reason, expected_substrings) and assert all
expected substrings are in the returned msg (e.g., "APPROVED"/"REJECTED",
approval ID, actor, optional reason, and markers like "USER-SUPPLIED REASON" or
"untrusted data" when applicable). Keep the call to
ApprovalGate.build_resume_message("approval-1", ...) and name the param set to
make failures readable; remove the three individual test_* methods after adding
the parameterized test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e5d8def0-b19b-4e4e-b300-716b9404a82c

📥 Commits

Reviewing files that changed from the base of the PR and between 3b5068f and 0664529.

📒 Files selected for processing (1)
  • tests/unit/engine/test_approval_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/engine/test_approval_gate.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/engine/test_approval_gate.py
🧠 Learnings (1)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.

Applied to files:

  • tests/unit/engine/test_approval_gate.py
🧬 Code graph analysis (1)
tests/unit/engine/test_approval_gate.py (2)
src/ai_company/engine/approval_gate.py (5)
  • ApprovalGate (40-274)
  • should_park (71-90)
  • park_context (92-160)
  • resume_context (162-239)
  • build_resume_message (242-274)
src/ai_company/security/timeout/park_service.py (3)
  • ParkService (29-144)
  • park (37-106)
  • resume (108-144)
🔇 Additional comments (5)
tests/unit/engine/test_approval_gate.py (5)

1-13: LGTM!

Module setup correctly applies unit test markers and timeout. No from __future__ import annotations as required for Python 3.14+.


16-31: LGTM!

Clean test factory with sensible defaults. The noqa: PLR0913 is appropriate for test helper functions.


34-55: LGTM!

Well-designed fixtures that compose correctly. The parked_mock fixture's dependency on park_service ensures consistent mock state across tests.


58-70: LGTM!

Tests correctly verify the first-or-none semantics of should_park. Using a real ParkService() is acceptable here since the method doesn't interact with internal service state.


73-291: LGTM!

Comprehensive test coverage for park_context and resume_context flows. Good edge case handling:

  • Serialization/persistence error propagation (lines 144-182)
  • Parked record preservation on deserialization failure (lines 250-268)
  • Context returned despite delete failure (lines 270-291)

Comment thread src/ai_company/api/controllers/approvals.py Outdated
Comment thread src/ai_company/api/controllers/approvals.py Outdated

Copilot AI left a comment


Pull request overview

Adds an approval-gating workflow to the execution engine so SecOps ESCALATE verdicts and the new request_human_approval tool can “park” an execution pending a human decision, with supporting API/WebSocket and dashboard updates.

Changes:

  • Introduces ApprovalGate, EscalationInfo/ResumePayload, and loop/helper integration to terminate runs as PARKED when approval is required.
  • Adds RequestHumanApprovalTool and extends ToolInvoker + ToolRegistry to track and surface pending escalations.
  • Hardens approvals API (auth-bound requested_by, conflict-safe saves) and fixes frontend WS handling to use approval_id + re-fetch items.

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `web/src/stores/approvals.ts` | Updates WS approval event handling to use approval_id and re-fetch full items. |
| `web/src/__tests__/stores/approvals.test.ts` | Expands WS event test coverage for new fetch-based behavior and error paths. |
| `tests/unit/tools/test_invoker_escalation.py` | Adds unit tests for ToolInvoker.pending_escalations (SecOps + tool metadata). |
| `tests/unit/tools/test_approval_tool.py` | Adds unit tests for RequestHumanApprovalTool behavior, validation, and failure handling. |
| `tests/unit/observability/test_events.py` | Updates event discovery/markers to include new approval_gate domain module. |
| `tests/unit/engine/test_loop_helpers_approval.py` | Tests loop helper parking behavior and error handling when escalations occur. |
| `tests/unit/engine/test_approval_gate_models.py` | Tests Pydantic models for escalation + resume payloads. |
| `tests/unit/engine/test_approval_gate.py` | Tests ApprovalGate park/resume behavior and resume message construction. |
| `tests/unit/api/test_dto.py` | Updates DTO tests for new action_type format validation and removed fields. |
| `tests/unit/api/controllers/test_approvals.py` | Updates controller tests for new create payload semantics. |
| `src/ai_company/tools/registry.py` | Adds all_tools() to enumerate tool instances deterministically. |
| `src/ai_company/tools/invoker.py` | Tracks escalation info from SecOps ESCALATE verdicts and tool parking metadata. |
| `src/ai_company/tools/approval_tool.py` | Implements RequestHumanApprovalTool that creates ApprovalItem and signals parking. |
| `src/ai_company/tools/__init__.py` | Exports RequestHumanApprovalTool. |
| `src/ai_company/security/timeout/parked_context.py` | Allows task_id to be None for taskless agents. |
| `src/ai_company/security/timeout/park_service.py` | Makes ParkService.park() accept optional task_id. |
| `src/ai_company/observability/events/approval_gate.py` | Adds structured event constants for approval gate lifecycle. |
| `src/ai_company/engine/react_loop.py` | Wires optional approval gate into ReAct loop tool-call execution path. |
| `src/ai_company/engine/plan_execute_loop.py` | Wires optional approval gate into Plan/Execute tool-call execution path. |
| `src/ai_company/engine/loop_helpers.py` | Adds escalation check after tool execution and parks context via approval gate. |
| `src/ai_company/engine/approval_gate_models.py` | Adds frozen models for escalation tracking and resume decision payloads. |
| `src/ai_company/engine/approval_gate.py` | Implements park/resume coordination and safe resume-message formatting. |
| `src/ai_company/engine/agent_engine.py` | Integrates approval gate + tool registry augmentation; extracts security factory. |
| `src/ai_company/engine/_security_factory.py` | Centralizes SecOps interceptor creation and registry augmentation with approval tool. |
| `src/ai_company/engine/__init__.py` | Exposes approval gate types from engine package. |
| `src/ai_company/api/ws_models.py` | Adds approval.resumed WS event type. |
| `src/ai_company/api/state.py` | Stores optional ApprovalGate on AppState. |
| `src/ai_company/api/dto.py` | Tightens DTO validation (action_type format, TTL bounds, comment max length). |
| `src/ai_company/api/controllers/approvals.py` | Binds requested_by to auth user; uses conflict-safe saves; attempts resume trigger. |
| `src/ai_company/api/approval_store.py` | Adds save_if_pending() and hardens on_expire callback error handling. |
| `docs/design/engine.md` | Documents parking behavior on escalations in engine run flow. |
| `README.md` | Updates status to reflect approval gate implementation. |
| `CLAUDE.md` | Updates repo module descriptions and event-constant guidance to include approval gate. |


Comment on lines +316 to +319
vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => {
  if (err === axiosError) return true
  return false
})
Comment on lines +197 to +216
try:
    result = await approval_gate.resume_context(approval_id)
    if result is None:
        return

    _ctx, parked_id = result
    resume_message = ApprovalGate.build_resume_message(
        approval_id,
        approved=approved,
        decided_by=decided_by,
        decision_reason=decision_reason,
    )
    logger.info(
        APPROVAL_GATE_RESUME_TRIGGERED,
        approval_id=approval_id,
        parked_id=parked_id,
        approved=approved,
        decided_by=decided_by,
        resume_message_length=len(resume_message),
    )
Comment on lines +180 to +185
"""Best-effort resume of a parked agent context after a decision.

If an ``ApprovalGate`` is configured, loads the parked context,
builds a resume message, and publishes a WebSocket event.
Failures are logged at WARNING and never propagate to the caller.

Comment on lines +41 to +43
task_id: NotBlankStr | None = Field(
    default=None, description="Task identifier (None for taskless agents)"
)
Comment on lines +79 to +86
const item = await approvalsApi.getApproval(approvalId)
if (activeFilters.value) {
  // Filters are active — item may not match; just bump total
  total.value++
} else {
  approvals.value = [item, ...approvals.value]
  total.value++
}
Comment on lines +285 to +290
// Patch axios.isAxiosError to recognize our mock
const originalIsAxiosError = (await import('axios')).default.isAxiosError
vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => {
  if (err === axiosError) return true
  return originalIsAxiosError(err)
})
- _trigger_resume no longer calls resume_context() which would
  delete the parked record before a scheduler can consume it —
  now only logs the resume trigger for scheduler observation
- Remove dead APPROVAL_RESUMED WsEventType (never emitted)
- Remove unused ApprovalGate import from controller
Comment on lines +172 to +173
except MemoryError, RecursionError:
    raise

Python 3 except tuple syntax missing parentheses — SyntaxError across all new files

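except MemoryError, RecursionError: is the Python 2 spelling. On Python 3 releases before 3.14, the comma-separated form without parentheses is a SyntaxError, and any module that contains it fails to import entirely. Python 3.14 (PEP 758) accepts the unparenthesized form, but only when there is no as clause, so whether this breaks depends on the interpreter version the project targets. The parenthesized tuple is valid on every Python 3 version.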

The same bug appears throughout all the new files added in this PR:

  • src/ai_company/api/approval_store.py:172
  • src/ai_company/api/controllers/approvals.py:100
  • src/ai_company/engine/approval_gate.py:127, 149, 202, 216
  • src/ai_company/engine/loop_helpers.py:69, 114, 186, 303, 387
  • src/ai_company/tools/approval_tool.py:162, 238
  • src/ai_company/tools/invoker.py:212, 295, 639

Note: the existing (pre-PR) code in invoker.py already uses the correct form except (MemoryError, RecursionError) as exc: at lines 453, 539, 575, and 698 — confirming this is an inconsistency introduced only in the new additions.

Fix every occurrence by wrapping the two types in parentheses:

Suggested change
- except MemoryError, RecursionError:
-     raise
+ except (MemoryError, RecursionError):
+     raise

The pattern to apply globally is:

# Wrong (Python 2)
except MemoryError, RecursionError:

# Correct (Python 3)
except (MemoryError, RecursionError):
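
A minimal sketch of the pattern under discussion, assuming a best-effort helper that logs ordinary failures but must never swallow `MemoryError` or `RecursionError`. The names `run_best_effort` and `unparenthesized_form_compiles` are hypothetical and do not come from this PR. The second helper only probes how the running interpreter treats the unparenthesized spelling.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("approval_gate_example")


def run_best_effort(action: Callable[[], object]) -> bool:
    """Run ``action``; log ordinary failures, re-raise fatal ones.

    Hypothetical helper for illustration only.
    """
    try:
        action()
    except (MemoryError, RecursionError):
        # Parenthesized tuple: accepted by every Python 3 version.
        raise
    except Exception:
        # Ordinary failures are logged and reported, never propagated.
        logger.warning("best-effort action failed", exc_info=True)
        return False
    return True


def unparenthesized_form_compiles() -> bool:
    """Report whether this interpreter accepts ``except A, B:``."""
    source = (
        "try:\n"
        "    pass\n"
        "except MemoryError, RecursionError:\n"
        "    raise\n"
    )
    try:
        compile(source, "<except-syntax-check>", "exec")
    except SyntaxError:
        return False
    return True
```

Running `unparenthesized_form_compiles()` on the interpreter the project targets is a quick way to confirm whether the flagged lines are an import-time failure there. The parenthesized form in `run_best_effort` needs no such check.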

Comment on lines +372 to +378
else:
    logger.debug(
        APPROVAL_GATE_CONTEXT_PARKED,
        approval_id=escalation.approval_id,
        agent_id=agent_id,
        note="No task_execution on context — task_id will be None",
    )

APPROVAL_GATE_CONTEXT_PARKED emitted before parking has occurred

This debug log fires when ctx.task_execution is None — i.e. before park_context() is even called. APPROVAL_GATE_CONTEXT_PARKED implies the context has been parked, so any log aggregation or alerting rule watching for approval_gate.context.parked will see a spurious event even if the subsequent park_context() call fails.

An informational constant such as APPROVAL_GATE_ESCALATION_DETECTED (or a dedicated "parking skipped task_id" diagnostic) is more accurate here. The successful-park event is already emitted inside ApprovalGate.park_context(), so this line also produces a duplicate on the happy path.

Suggested change
- else:
-     logger.debug(
-         APPROVAL_GATE_CONTEXT_PARKED,
-         approval_id=escalation.approval_id,
-         agent_id=agent_id,
-         note="No task_execution on context — task_id will be None",
-     )
+ logger.debug(
+     APPROVAL_GATE_ESCALATION_DETECTED,
+     approval_id=escalation.approval_id,
+     agent_id=agent_id,
+     note="No task_execution on context — task_id will be None",
+ )
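
The ordering concern can be sketched as follows: emit a detection event before parking, and the parked event only after the park call has returned. This is an illustration under stated assumptions, not the PR's `ApprovalGate` API. `park_with_events`, the `park` callable, and the `approval_gate.escalation.detected` string are hypothetical; only `approval_gate.context.parked` is taken from the comment above.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("approval_gate_example")

# The "detected" string is an assumed value for illustration.
APPROVAL_GATE_ESCALATION_DETECTED = "approval_gate.escalation.detected"
APPROVAL_GATE_CONTEXT_PARKED = "approval_gate.context.parked"


def park_with_events(approval_id: str, park: Callable[[str], str]) -> str:
    """Park a context, logging each lifecycle event at the right moment.

    ``park`` is a hypothetical stand-in for the real parking call; it
    takes the approval id and returns the parked-context id.
    """
    # Safe to emit up front: it only claims that an escalation was seen.
    logger.debug(
        "%s approval_id=%s", APPROVAL_GATE_ESCALATION_DETECTED, approval_id
    )
    # If this raises, the parked event below is never emitted.
    parked_id = park(approval_id)
    logger.info(
        "%s approval_id=%s parked_id=%s",
        APPROVAL_GATE_CONTEXT_PARKED,
        approval_id,
        parked_id,
    )
    return parked_id
```

With this ordering, an alert keyed on the parked event fires only for contexts that were actually parked, and it is emitted once per successful park.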

Comment on lines +82 to +91
      total.value++
    } else {
      approvals.value = [item, ...approvals.value]
      total.value++
    }
  } catch (err) {
    if (axios.isAxiosError(err) && (err.response?.status === 404 || err.response?.status === 410)) {
      // Item genuinely gone — skip
    } else {
      console.warn('Failed to fetch approval:', approvalId, err)

total incremented unconditionally when filters are active, regardless of whether the new item matches them

When activeFilters.value is truthy, total.value++ is incremented for every approval.submitted event even though the newly submitted item may not satisfy the current filter (e.g. the user is filtering by status: 'rejected' while a new pending item arrives). This inflates the displayed count and can cause pagination to request pages that don't exist.

A stricter approach is to check the already-fetched item object against the active filter criteria (status, risk_level, action_type) and skip the increment when it does not match:

if (activeFilters.value) {
  // Only bump total if the item matches the active filter constraints
  const f = activeFilters.value
  const matches =
    (!f.status      || item.status     === f.status) &&
    (!f.risk_level  || item.risk_level === f.risk_level) &&
    (!f.action_type || item.action_type === f.action_type)
  if (matches) total.value++
} else {
  approvals.value = [item, ...approvals.value]
  total.value++
}

Development

Successfully merging this pull request may close these issues.

  • feat: implement request_human_approval tool for approval workflow
  • feat: implement approval workflow gates in engine

2 participants