feat: add approval workflow gates to TaskEngine#365
Conversation
Implement the approval gate layer that parks agent execution when human approval is required and provides infrastructure for resuming after a decision. Also adds the `request_human_approval` tool for agents to explicitly request approval. - Add EscalationInfo/ResumePayload models and approval gate events - Add ApprovalGate service (park/resume via ParkService) - Add RequestHumanApprovalTool (creates ApprovalItem, signals parking) - Add ToolInvoker escalation tracking (ESCALATE verdicts + tool metadata) - Integrate approval gate into execute_tool_calls and both loop types - Wire into AgentEngine, AppState, and approvals controller resume path Closes #258 Closes #259
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughSummary by CodeRabbitRelease Notes
WalkthroughThis PR implements approval workflow gates in the engine by introducing an ApprovalGate service that coordinates parking and resumption of agent execution pending human approval. It adds a request_human_approval tool, integrates approval checks into execution loops, updates the API layer for approval handling, and extends frontend WebSocket handling to fetch full approval items. Changes
Sequence Diagram(s)sequenceDiagram
participant Agent as Agent Loop
participant ToolInvoker as Tool Invoker
participant Gate as ApprovalGate
participant Store as ParkService
participant Repo as ParkedContextRepository
participant API as Approval API
Agent->>ToolInvoker: execute_tool_calls(with approval_gate)
ToolInvoker->>ToolInvoker: invoke tools
ToolInvoker->>Gate: check pending_escalations
Gate->>Agent: has escalations?
alt Escalation Detected
Agent->>Gate: should_park(escalations)
Gate-->>Agent: EscalationInfo
Agent->>Store: serialize context
Store-->>Agent: serialized context
Agent->>Repo: persist parked context
Repo-->>Agent: ParkedContext with approval_id
Agent-->>Agent: return PARKED ExecutionResult
else No Escalation
Agent-->>Agent: continue execution normally
end
sequenceDiagram
participant API as Approval API
participant Store as ApprovalStore
participant Gate as ApprovalGate
participant ParkSvc as ParkService
participant Repo as ParkedContextRepository
participant AgentLoop as Agent Loop
API->>Store: approve(approval_id, decision)
Store->>Store: save_if_pending(item)
Store-->>API: updated item or None
alt Approval Saved
API->>Gate: resume_context(approval_id)
Gate->>ParkSvc: deserialize context
Gate->>Repo: load parked context
Repo-->>Gate: ParkedContext
Gate-->>API: (AgentContext, decision)
API->>API: _trigger_resume with decision_reason
API->>AgentLoop: inject resume message
AgentLoop-->>AgentLoop: continue with approval decision
Gate->>Repo: delete parked context
else Conflict (Not PENDING)
API-->>API: raise ConflictError
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
📝 Coding Plan
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a comprehensive human approval workflow into the TaskEngine, allowing agents to pause execution and request human intervention for sensitive actions. It establishes a dedicated approval gate service, integrates it deeply into the agent execution loops, and provides a new tool for agents to explicitly request approvals. The changes also include significant API and frontend enhancements to support this new functionality, ensuring secure and observable approval processes. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive approval workflow system, a significant and well-executed feature. The changes are extensive, touching the agent engine, API, tools, and frontend, and include the addition of an ApprovalGate service, a RequestHumanApprovalTool, and robust context parking/resuming logic. The implementation quality is high, with strong attention to security (e.g., preventing identity spoofing in the approvals API), robustness (e.g., improved WebSocket event handling), and observability. The inclusion of thorough unit tests for all new components is commendable. I have one suggestion regarding inconsistent logging of fatal errors in the ToolInvoker for improved diagnostics.
| except MemoryError, RecursionError: | ||
| raise |
There was a problem hiding this comment.
Inconsistent logging for non-recoverable errors. This except block for MemoryError and RecursionError no longer logs the exception before re-raising, but other methods in this class like _safe_deepcopy_args and _execute_tool still do. For consistent error handling and diagnostics, it's better to either log these fatal errors consistently in all methods where they are caught and re-raised, or handle logging only at a higher level. I'd suggest restoring the logging here to match the behavior of the other methods.
except (MemoryError, RecursionError) as exc:
logger.exception(
TOOL_INVOKE_NON_RECOVERABLE,
tool_call_id=tool_call.id,
tool_name=tool_call.name,
error=f"{type(exc).__name__}: {exc}",
)
raise
Greptile SummaryThis PR wires approval workflow gates into the Critical issue:
Other findings:
Confidence Score: 1/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Agent
participant ToolInvoker
participant ApprovalGate
participant ParkService
participant ParkedContextRepo
participant ApprovalStore
participant ReactLoop
participant HumanReviewer
participant ApprovalsController
Note over Agent,ReactLoop: Escalation detected during tool execution
Agent->>ToolInvoker: invoke(tool_call)
alt SecOps ESCALATE verdict
ToolInvoker->>ToolInvoker: append to _pending_escalations (approval_id from verdict)
else RequestHumanApprovalTool
ToolInvoker->>ApprovalStore: add(ApprovalItem)
ToolInvoker->>ToolInvoker: _track_parking_metadata() → append EscalationInfo
end
ReactLoop->>ToolInvoker: pending_escalations
ReactLoop->>ApprovalGate: should_park(escalations)
ApprovalGate-->>ReactLoop: EscalationInfo (first escalation)
ReactLoop->>ApprovalGate: park_context(escalation, context, agent_id, task_id)
ApprovalGate->>ParkService: park(context, approval_id, ...)
ParkService-->>ApprovalGate: ParkedContext (serialized JSON)
ApprovalGate->>ParkedContextRepo: save(parked)
ApprovalGate-->>ReactLoop: ParkedContext
ReactLoop-->>Agent: ExecutionResult(PARKED, metadata={approval_id})
Note over HumanReviewer,ApprovalsController: Human makes a decision
HumanReviewer->>ApprovalsController: POST /approvals/{id}/approve or /reject
ApprovalsController->>ApprovalStore: save_if_pending(updated_item)
ApprovalsController->>ApprovalsController: _log_approval_decision()
ApprovalsController->>ApprovalsController: _trigger_resume() — logs intent only
Note right of ApprovalsController: Future scheduler will call<br/>ApprovalGate.resume_context()<br/>(not yet implemented)
Note over ApprovalGate,ParkedContextRepo: Resume path (future scheduler)
ApprovalGate->>ParkedContextRepo: get_by_approval(approval_id)
ParkedContextRepo-->>ApprovalGate: ParkedContext
ApprovalGate->>ParkService: resume(parked)
ParkService-->>ApprovalGate: AgentContext
ApprovalGate->>ParkedContextRepo: delete(parked.id)
ApprovalGate-->>Agent: (AgentContext, parked_id) + resume_message injected
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/ai_company/api/approval_store.py
Line: 172-173
Comment:
**Python 3 `except` tuple syntax missing parentheses — SyntaxError across all new files**
`except MemoryError, RecursionError:` is Python 2 syntax. In Python 3 the comma-separated form (without parentheses) is a `SyntaxError`; the tuple must be explicitly wrapped. Any module that contains this form will fail to import entirely.
The same bug appears throughout all the new files added in this PR:
- `src/ai_company/api/approval_store.py:172`
- `src/ai_company/api/controllers/approvals.py:100`
- `src/ai_company/engine/approval_gate.py:127, 149, 202, 216`
- `src/ai_company/engine/loop_helpers.py:69, 114, 186, 303, 387`
- `src/ai_company/tools/approval_tool.py:162, 238`
- `src/ai_company/tools/invoker.py:212, 295, 639`
Note: the existing (pre-PR) code in `invoker.py` already uses the correct form `except (MemoryError, RecursionError) as exc:` at lines 453, 539, 575, and 698 — confirming this is an inconsistency introduced only in the new additions.
Fix every occurrence by wrapping the two types in parentheses:
```suggestion
except (MemoryError, RecursionError):
raise
```
The pattern to apply globally is:
```python
# Wrong (Python 2)
except MemoryError, RecursionError:
# Correct (Python 3)
except (MemoryError, RecursionError):
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/ai_company/engine/loop_helpers.py
Line: 372-378
Comment:
**`APPROVAL_GATE_CONTEXT_PARKED` emitted before parking has occurred**
This debug log fires when `ctx.task_execution` is `None` — i.e. before `park_context()` is even called. `APPROVAL_GATE_CONTEXT_PARKED` implies the context *has* been parked, so any log aggregation or alerting rule watching for `approval_gate.context.parked` will see a spurious event even if the subsequent `park_context()` call fails.
An informational constant such as `APPROVAL_GATE_ESCALATION_DETECTED` (or a dedicated "parking skipped task_id" diagnostic) is more accurate here. The successful-park event is already emitted inside `ApprovalGate.park_context()`, so this line also produces a duplicate on the happy path.
```suggestion
logger.debug(
APPROVAL_GATE_ESCALATION_DETECTED,
approval_id=escalation.approval_id,
agent_id=agent_id,
note="No task_execution on context — task_id will be None",
)
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: web/src/stores/approvals.ts
Line: 82-91
Comment:
**`total` incremented unconditionally when filters are active, regardless of whether the new item matches them**
When `activeFilters.value` is truthy, `total.value++` is incremented for every `approval.submitted` event even though the newly submitted item may not satisfy the current filter (e.g. the user is filtering by `status: 'rejected'` while a new `pending` item arrives). This inflates the displayed count and can cause pagination to request pages that don't exist.
A stricter approach would be to skip the increment when the item's status/risk_level/action_type don't match the active filter criteria, or to do a quick attribute check on the already-fetched `item` object:
```typescript
if (activeFilters.value) {
// Only bump total if the item matches the active filter constraints
const f = activeFilters.value
const matches =
(!f.status || item.status === f.status) &&
(!f.risk_level || item.risk_level === f.risk_level) &&
(!f.action_type || item.action_type === f.action_type)
if (matches) total.value++
} else {
approvals.value = [item, ...approvals.value]
total.value++
}
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 91823f4 |
There was a problem hiding this comment.
Pull request overview
This PR adds an approval-gating mechanism to the agent execution engine so SecOps “ESCALATE” verdicts and explicit request_human_approval tool calls can park execution pending a human decision, along with API/UI plumbing and observability events to support the workflow.
Changes:
- Introduces
ApprovalGate+ models, and integrates escalation/parking checks into ReAct and Plan-and-Execute loops. - Adds
RequestHumanApprovalTooland extendsToolInvokerto track and exposepending_escalations. - Hardens approvals API identity handling and fixes frontend WebSocket approval updates by fetching full items via API.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| web/src/stores/approvals.ts | Updates WS approval event handling to use approval_id and fetch full items from API |
| tests/unit/tools/test_invoker_escalation.py | Adds tests for escalation tracking in ToolInvoker |
| tests/unit/tools/test_approval_tool.py | Adds tests for RequestHumanApprovalTool behavior/validation |
| tests/unit/observability/test_events.py | Registers approval_gate module in event discovery tests |
| tests/unit/engine/test_loop_helpers_approval.py | Adds tests for loop helper parking behavior when escalations occur |
| tests/unit/engine/test_approval_gate_models.py | Adds tests for EscalationInfo / ResumePayload validation |
| tests/unit/engine/test_approval_gate.py | Adds tests for ApprovalGate park/resume and resume-message formatting |
| src/ai_company/tools/registry.py | Adds ToolRegistry.all_tools() for enumerating tool instances |
| src/ai_company/tools/invoker.py | Tracks escalations (SecOps + parking metadata) via pending_escalations |
| src/ai_company/tools/approval_tool.py | Implements RequestHumanApprovalTool (creates ApprovalItem + parking metadata) |
| src/ai_company/tools/init.py | Exports RequestHumanApprovalTool |
| src/ai_company/observability/events/approval_gate.py | Adds approval-gate lifecycle event constants |
| src/ai_company/engine/react_loop.py | Wires optional ApprovalGate into ReAct loop tool execution |
| src/ai_company/engine/plan_execute_loop.py | Wires optional ApprovalGate into Plan-and-Execute step tool execution |
| src/ai_company/engine/loop_helpers.py | Adds parking path returning TerminationReason.PARKED after tool calls |
| src/ai_company/engine/approval_gate_models.py | Adds frozen Pydantic models for escalation + resume payload |
| src/ai_company/engine/approval_gate.py | Adds ApprovalGate service (park/resume + resume-message builder) |
| src/ai_company/engine/agent_engine.py | Constructs ApprovalGate, injects into default loop, registers approval tool when configured |
| src/ai_company/engine/init.py | Exposes approval-gate types from the engine package |
| src/ai_company/api/state.py | Adds approval_gate to app state |
| src/ai_company/api/controllers/approvals.py | Hardens requested_by binding to auth user; refactors decision logging; broadens WS publish error handling |
| src/ai_company/api/approval_store.py | Wraps on_expire callback in exception handling |
| docs/design/engine.md | Documents new approval-gate parking behavior in the engine pipeline |
| README.md | Updates status text to reflect approval gates being implemented |
| CLAUDE.md | Updates repo/module documentation to mention approval gate + approval tool |
Comments suppressed due to low confidence (1)
src/ai_company/api/controllers/approvals.py:105
_publish_approval_event()now catches a broadException. Ifasyncio.CancelledErroris anExceptionin the runtime, this will swallow request cancellation and continue processing. Consider explicitly re-raising cancellation (and any other control-flow exceptions you treat as non-recoverable) alongsideMemoryError/RecursionError.
try:
channels_plugin = _get_channels_plugin(request)
channels_plugin.publish(
event.model_dump_json(),
channels=[CHANNEL_APPROVALS],
)
except MemoryError, RecursionError:
raise
except Exception:
logger.warning(
API_APPROVAL_PUBLISH_FAILED,
approval_id=item.id,
event_type=event_type.value,
exc_info=True,
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| risk_level=verdict.risk_level, | ||
| reason=verdict.reason, | ||
| ), | ||
| ) |
| turns, | ||
| metadata={ | ||
| "approval_id": escalation.approval_id, | ||
| "parking_failed": str(parking_failed), |
| parking is needed. If so, the context is serialized via `ParkService`, | ||
| persisted, and the loop returns a `PARKED` result. |
| self._park_service = park_service | ||
| self._parked_context_repo = parked_context_repo | ||
| logger.debug( | ||
| APPROVAL_GATE_ESCALATION_DETECTED, |
| logger.debug( | ||
| APPROVAL_GATE_ESCALATION_DETECTED, | ||
| action_type=action_type, | ||
| note="No risk classifier — defaulting to HIGH", | ||
| ) |
| // Item may have been deleted — remove from local list | ||
| approvals.value = approvals.value.filter((a) => a.id !== approvalId) | ||
| total.value = Math.max(0, total.value - 1) |
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/observability/test_events.py (1)
175-225: 🧹 Nitpick | 🔵 TrivialAdd direct assertions for the new
approval_gatemodule.This only proves the module is discoverable. If
src/ai_company/observability/events/approval_gate.pyships with a missing constant or a misspelled value, this file still passes as long as the module exists and the remaining strings are unique. Please add atest_approval_gate_events_exist()block like the other domains.Based on learnings, "Event names must always use constants from domain-specific modules under
ai_company.observability.events."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/observability/test_events.py` around lines 175 - 225, Add a new unit test in tests/unit/observability/test_events.py named test_approval_gate_events_exist that imports ai_company.observability.events.approval_gate and asserts the specific expected event constants (or their string values) are present and correct (similar style to other domain-specific tests), so that existence and correctness of constants in approval_gate.py are validated rather than merely the module being discoverable; reference the existing test_all_domain_modules_discovered for placement and use approval_gate module symbols to construct the assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/api/approval_store.py`:
- Around line 151-160: The except clause currently catches the built-in
MemoryError but not the project-specific MemoryError defined in
ai_company.memory.errors, so re-raised project errors fall into the generic
Exception handler; import the project error (e.g., from ai_company.memory.errors
import MemoryError as ProjectMemoryError) and update the except line to
explicitly re-raise both RecursionError and the project MemoryError (or include
both types in the except tuple used for re-raising) before the generic Exception
that logs via logger.exception(API_APPROVAL_EXPIRED, approval_id=item.id,
note="on_expire callback failed"); ensure the symbol _on_expire remains where
called and no error types are swallowed.
In `@src/ai_company/engine/agent_engine.py`:
- Around line 150-152: The default approval gate created by
_make_approval_gate() is missing the parked_context_repo so
ApprovalGate.resume_context() is a no-op and PARKED runs can't be resumed;
update the factory so the ApprovalGate created in _make_approval_gate (and any
place that constructs ApprovalGate in this class) receives the persistent
parked-context storage (self._approval_store or its parked_context_repo
attribute) as the parked_context_repo argument when constructing ApprovalGate,
ensuring resume_context() works and PARKED runs are resumable.
In `@src/ai_company/engine/approval_gate.py`:
- Around line 214-231: The current resume flow ignores the boolean return from
ParkedContextRepository.delete() and always logs APPROVAL_GATE_CONTEXT_RESUMED;
update the code in the method containing the await
self._parked_context_repo.delete(parked.id) call (e.g., the resume handler in
approval_gate.py) to capture the delete result, and if it returns False treat it
as a cleanup failure: log/exception with APPROVAL_GATE_RESUME_DELETE_FAILED
including approval_id and parked_id (and do not log
APPROVAL_GATE_CONTEXT_RESUMED), otherwise proceed to log
APPROVAL_GATE_CONTEXT_RESUMED and return (context, parked.id); preserve existing
exception handling for MemoryError/RecursionError.
In `@src/ai_company/engine/loop_helpers.py`:
- Around line 345-406: The metadata currently stores parking_failed as a string
in _park_for_approval; change it to store a boolean instead (i.e.,
parking_failed: parking_failed) when calling build_result so downstream
consumers receive a true boolean under the "parking_failed" metadata key; ensure
any serialization/consumption points that expect a string are updated to accept
a boolean (references: function _park_for_approval, local variable
parking_failed, build_result call, metadata key "parking_failed").
In `@src/ai_company/tools/approval_tool.py`:
- Around line 96-176: The execute() method is too long and should be split into
small helpers: keep execute() to orchestrate only (extract validation, item
construction, persistence, and result shaping into private methods).
Specifically, replace the inline logic with calls to helpers such as
_validate_action_type(action_type) (already exists),
_build_approval_item(action_type, title, description) to construct the
ApprovalItem (move the ApprovalItem import and all field population there),
_persist_approval_item(item) to run the try/except around await
self._approval_store.add(item) (re-raise MemoryError/RecursionError and on
Exception call logger.exception(APPROVAL_GATE_ESCALATION_FAILED, ...) and return
an error ToolExecutionResult), and _format_success_result(approval_id,
action_type, risk_level, title) to return the success ToolExecutionResult and
log APPROVAL_GATE_ESCALATION_DETECTED. After extraction, execute() should call
these helpers and return early on validation/persistence errors so the function
body remains under the 50-line limit.
In `@tests/unit/engine/test_approval_gate.py`:
- Around line 51-274: Tests repeat setup of MagicMock/AsyncMock objects
(park_service, repo, parked) across many test coroutines in TestParking and
TestResumeContext; extract these into pytest fixtures to DRY the test file.
Create fixtures named e.g. park_service, parked_mock (or parked), and
parked_repo (or repo) that return the configured MagicMock/AsyncMock instances
and use them by adding them as parameters to tests that currently call
ApprovalGate(...)/park_context()/resume_context(), updating tests like
test_calls_park_service, test_persists_to_repo_when_available,
test_successful_resume, etc., to accept the fixtures and remove the repeated
wiring inside each test while preserving any test-specific side_effects or
return overrides.
- Around line 279-310: The three tests test_approved_without_reason,
test_rejected_with_reason, and test_approved_with_reason duplicate the same
structure calling ApprovalGate.build_resume_message and asserting substrings;
refactor them into a single `@pytest.mark.parametrize` test that iterates over
cases (id, approved boolean, decided_by, decision_reason, expected_substrings)
and calls ApprovalGate.build_resume_message once per case, asserting each
expected substring is in the returned msg; update or remove the original three
test functions and keep descriptive parameter values to preserve coverage.
In `@tests/unit/engine/test_loop_helpers_approval.py`:
- Line 3: The test unnecessarily uses PropertyMock to stub pending_escalations;
replace type(invoker).pending_escalations =
PropertyMock(return_value=escalations) with a simple instance attribute
assignment (e.g., invoker.pending_escalations = escalations or setattr(invoker,
"pending_escalations", escalations)) in the helper and the other occurrence
around lines 55–64, and remove PropertyMock from the imports in the file
(adjusting the import list to only AsyncMock, MagicMock, patch).
In `@web/src/stores/approvals.ts`:
- Around line 72-80: The catch currently treats any error from
approvalsApi.getApproval as a delete; change it to inspect the error response
and only remove/ignore the item when the error is a definitive "not found" (HTTP
404 or 410) — for other errors (timeouts, 5xx, network) leave approvals.value
and total.value untouched and trigger the existing list refresh/fallback path
instead of decrementing state; locate both call sites to
approvalsApi.getApproval (the try/catch around getApproval and the similar block
at lines handling the other event) and update their catch handlers to branch on
error status (404/410 vs others) so only 404/410 perform the remove/ignore
behavior and other errors preserve current state and perform a refresh.
- Around line 71-77: When a websocket-driven update fetches an approval via
approvalsApi.getApproval(approvalId), do not unconditionally prepend it when
activeFilters.value is falsy; instead run the current filter predicate against
the fetched item (use the same filter logic used to build approvals.value) and
insert the item only if it matches, or remove it from approvals.value if it no
longer matches, and update total.value accordingly (increment when inserting a
new matching item, decrement when removing a previously-present matching item).
Apply the same re-filter/insert-or-remove logic to the other WS-handling block
(the code around the approvals update at lines similar to 86-90) so both
websocket update paths respect activeFilters.value (including empty objects) and
keep total.value in sync.
---
Outside diff comments:
In `@tests/unit/observability/test_events.py`:
- Around line 175-225: Add a new unit test in
tests/unit/observability/test_events.py named test_approval_gate_events_exist
that imports ai_company.observability.events.approval_gate and asserts the
specific expected event constants (or their string values) are present and
correct (similar style to other domain-specific tests), so that existence and
correctness of constants in approval_gate.py are validated rather than merely
the module being discoverable; reference the existing
test_all_domain_modules_discovered for placement and use approval_gate module
symbols to construct the assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: afb7d111-796d-4bf3-800d-74573c983a99
📒 Files selected for processing (25)
CLAUDE.mdREADME.mddocs/design/engine.mdsrc/ai_company/api/approval_store.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/api/state.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/approval_gate_models.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/react_loop.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/tools/__init__.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/tools/invoker.pysrc/ai_company/tools/registry.pytests/unit/engine/test_approval_gate.pytests/unit/engine/test_approval_gate_models.pytests/unit/engine/test_loop_helpers_approval.pytests/unit/observability/test_events.pytests/unit/tools/test_approval_tool.pytests/unit/tools/test_invoker_escalation.pyweb/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Agent
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (8)
web/src/**/*.{vue,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Web dashboard: Vue 3 + PrimeVue + Tailwind CSS, organized by feature in src/components/, src/stores/, src/views/. Enforce with ESLint and vue-tsc type-checking.
Files:
web/src/stores/approvals.ts
web/src/stores/**/*.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Frontend state management: Pinia stores organized by feature in src/stores/
Files:
web/src/stores/approvals.ts
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax:except A, B:(no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001. Tests must usetest-provider,test-small-001, etc.
Files:
tests/unit/engine/test_loop_helpers_approval.pytests/unit/observability/test_events.pytests/unit/tools/test_invoker_escalation.pysrc/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/api/state.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/tools/invoker.pysrc/ai_company/engine/__init__.pysrc/ai_company/tools/__init__.pytests/unit/tools/test_approval_tool.pysrc/ai_company/api/approval_store.pysrc/ai_company/tools/registry.pysrc/ai_company/engine/approval_gate_models.pytests/unit/engine/test_approval_gate_models.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/api/controllers/approvals.pytests/unit/engine/test_approval_gate.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/react_loop.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Tests must use markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: useasyncio_mode = "auto"— no manual@pytest.mark.asyncioneeded.
Test timeout: 30 seconds per test.
Prefer@pytest.mark.parametrizefor testing similar cases.
Files:
tests/unit/engine/test_loop_helpers_approval.pytests/unit/observability/test_events.pytests/unit/tools/test_invoker_escalation.pytests/unit/tools/test_approval_tool.pytests/unit/engine/test_approval_gate_models.pytests/unit/engine/test_approval_gate.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, usecopy.deepcopy()at construction plusMappingProxyTypewrapping for read-only enforcement.
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention and usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 withBaseModel,model_validator,computed_field,ConfigDict. Use@computed_fieldfor derived values instead of storing redundant fields (e.g.,TokenUsage.total_tokens).
UseNotBlankStrfromcore.typesfor all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER useimport logging,logging.getLogger(), orprint()in application code.
Always useloggeras the variable name for loggers (not_logger, notlog).
Event names must always use constants from domain-specific modules underai_company.observability.events(e.g.,PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import dire...
Files:
src/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/api/state.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/tools/invoker.pysrc/ai_company/engine/__init__.pysrc/ai_company/tools/__init__.pysrc/ai_company/api/approval_store.pysrc/ai_company/tools/registry.pysrc/ai_company/engine/approval_gate_models.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/react_loop.py
src/ai_company/**/[!_]*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every module with business logic MUST have:
from ai_company.observability import get_loggerthenlogger = get_logger(__name__)
Files:
src/ai_company/engine/plan_execute_loop.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/api/state.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/tools/invoker.pysrc/ai_company/api/approval_store.pysrc/ai_company/tools/registry.pysrc/ai_company/engine/approval_gate_models.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/react_loop.py
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
When approved deviations occur, update the relevant
docs/design/page to reflect the new reality.
Files:
docs/design/engine.md
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Documentation source in
docs/built with Zensical. Design spec indocs/design/(7 pages). Architecture indocs/architecture/. Roadmap indocs/roadmap/. Security indocs/security.md.
Files:
docs/design/engine.md
🧠 Learnings (9)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Event names must always use constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
tests/unit/observability/test_events.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/observability/events/approval_gate.pyCLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/ai_company/**/[!_]*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
src/ai_company/api/state.pyCLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Structured logging: always use `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : NEVER use `import logging`, `logging.getLogger()`, or `print()` in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Always use `logger` as the variable name for loggers (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : DEBUG logging for object creation, internal flow, and entry/exit of key functions.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging.
Applied to files:
CLAUDE.md
🧬 Code graph analysis (18)
web/src/stores/approvals.ts (1)
web/src/api/types.ts (1)
WsEvent(519-524)
tests/unit/engine/test_loop_helpers_approval.py (7)
src/ai_company/engine/approval_gate.py (3)
ApprovalGate(39-263)should_park(71-90)park_context(92-161)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/engine/loop_helpers.py (1)
execute_tool_calls(240-342)src/ai_company/engine/loop_protocol.py (2)
ExecutionResult(79-140)TerminationReason(28-36)src/ai_company/providers/enums.py (1)
FinishReason(15-22)src/ai_company/providers/models.py (3)
CompletionResponse(257-306)ToolCall(96-119)ToolResult(122-135)src/ai_company/engine/run_result.py (1)
termination_reason(64-66)
tests/unit/tools/test_invoker_escalation.py (5)
src/ai_company/providers/models.py (1)
ToolCall(96-119)src/ai_company/security/models.py (2)
SecurityVerdict(35-67)SecurityVerdictType(23-32)src/ai_company/tools/base.py (3)
BaseTool(57-184)ToolExecutionResult(25-54)description(138-140)src/ai_company/tools/invoker.py (5)
ToolInvoker(73-756)registry(123-125)pending_escalations(128-136)invoke(343-365)invoke_all(694-756)src/ai_company/tools/registry.py (1)
ToolRegistry(30-126)
src/ai_company/engine/plan_execute_loop.py (2)
src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(39-263)
src/ai_company/engine/approval_gate.py (5)
src/ai_company/persistence/repositories.py (1)
ParkedContextRepository(199-267)src/ai_company/security/timeout/park_service.py (3)
ParkService(29-144)park(37-106)resume(108-144)src/ai_company/security/timeout/parked_context.py (1)
ParkedContext(19-64)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/engine/context.py (1)
AgentContext(87-307)
src/ai_company/api/state.py (1)
src/ai_company/engine/approval_gate.py (1)
ApprovalGate(39-263)
src/ai_company/tools/invoker.py (2)
src/ai_company/core/enums.py (1)
ApprovalRiskLevel(443-449)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)
src/ai_company/engine/__init__.py (3)
src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(39-263)src/ai_company/engine/approval_gate_models.py (2)
EscalationInfo(14-33)ResumePayload(36-51)
src/ai_company/tools/__init__.py (1)
src/ai_company/tools/approval_tool.py (1)
RequestHumanApprovalTool(31-208)
tests/unit/tools/test_approval_tool.py (3)
src/ai_company/api/approval_store.py (3)
ApprovalStore(27-162)get(61-73)add(42-59)src/ai_company/security/timeout/risk_tier_classifier.py (1)
DefaultRiskTierClassifier(62-101)src/ai_company/tools/approval_tool.py (2)
RequestHumanApprovalTool(31-208)execute(96-176)
src/ai_company/api/approval_store.py (2)
src/ai_company/api/app.py (1)
_on_expire(74-99)src/ai_company/memory/errors.py (1)
MemoryError(13-14)
src/ai_company/tools/registry.py (1)
src/ai_company/tools/base.py (2)
BaseTool(57-184)name(123-125)
src/ai_company/engine/approval_gate_models.py (1)
src/ai_company/tools/base.py (1)
action_type(133-135)
tests/unit/engine/test_approval_gate_models.py (1)
src/ai_company/engine/approval_gate_models.py (2)
EscalationInfo(14-33)ResumePayload(36-51)
src/ai_company/tools/approval_tool.py (5)
src/ai_company/core/enums.py (1)
ToolCategory(294-308)src/ai_company/tools/base.py (6)
BaseTool(57-184)ToolExecutionResult(25-54)description(138-140)category(128-130)action_type(133-135)parameters_schema(143-151)src/ai_company/api/approval_store.py (2)
ApprovalStore(27-162)add(42-59)src/ai_company/security/timeout/risk_tier_classifier.py (1)
DefaultRiskTierClassifier(62-101)src/ai_company/memory/errors.py (1)
MemoryError(13-14)
tests/unit/engine/test_approval_gate.py (3)
src/ai_company/engine/approval_gate.py (5)
ApprovalGate(39-263)should_park(71-90)park_context(92-161)resume_context(163-231)build_resume_message(234-263)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/security/timeout/park_service.py (3)
ParkService(29-144)park(37-106)resume(108-144)
src/ai_company/engine/agent_engine.py (6)
src/ai_company/engine/approval_gate.py (1)
ApprovalGate(39-263)src/ai_company/security/timeout/park_service.py (1)
ParkService(29-144)src/ai_company/core/agent.py (1)
AgentIdentity(266-342)src/ai_company/tools/registry.py (2)
ToolRegistry(30-126)all_tools(102-104)src/ai_company/tools/approval_tool.py (1)
RequestHumanApprovalTool(31-208)src/ai_company/security/timeout/risk_tier_classifier.py (1)
DefaultRiskTierClassifier(62-101)
src/ai_company/engine/react_loop.py (3)
tests/unit/engine/conftest.py (1)
engine(449-460)src/ai_company/api/state.py (1)
approval_gate(139-141)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(39-263)
🪛 LanguageTool
CLAUDE.md
[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🔇 Additional comments (24)
src/ai_company/engine/__init__.py (1)
9-10: LGTM!The new public API exports for
ApprovalGate,EscalationInfo, andResumePayloadare correctly imported and added to__all__in alphabetical order, maintaining consistency with the existing export pattern.docs/design/engine.md (1)
450-467: LGTM!The documentation accurately captures the new escalation-handling flow and PARKED termination behavior. The description aligns with the implementation in
loop_helpers.pyandApprovalGate, including the interaction withToolInvoker.pending_escalations,ParkService, and the approval-timeout policy.tests/unit/tools/test_approval_tool.py (4)
1-53: LGTM!Well-structured test file with proper pytest markers (
unit,timeout(30)). The fixtures correctly set up theApprovalStoreandRequestHumanApprovalToolinstances. TheTestToolCreationclass comprehensively validates tool properties including name, action_type, and parameters schema.
56-144: LGTM!Thorough execution tests covering the happy path, metadata validation, default risk level, content verification, and the
task_id=Nonescenario. The assertions correctly verify both theToolExecutionResultmetadata and the persistedApprovalItemstate.
177-204: LGTM!Good use of
@pytest.mark.parametrizeto cover multiple invalidaction_typeformats efficiently. The test cases cover edge cases like missing category, missing action, extra colons, and whitespace-only values.
220-256: LGTM!The error handling test correctly verifies graceful degradation when the store fails. The monkeypatch approach is appropriate for simulating store failures in unit tests.
src/ai_company/api/controllers/approvals.py (4)
97-105: LGTM!Correct use of PEP 758 except syntax (
except MemoryError, RecursionError:) and proper error handling pattern — re-raising critical errors while logging warnings for recoverable failures.
148-166: LGTM!Clean helper function that centralizes approval decision logging with appropriate event constants. The docstring correctly notes that context resumption is out of scope for this controller.
260-278: Good security hardening:requested_bybound to authenticated user.The change correctly enforces that
requested_byis populated from the authenticated user's username rather than the request body, preventing potential spoofing. TheUnauthorizedErroris appropriately raised when authentication is missing.
367-371: LGTM!Correctly delegates to
_log_approval_decisionfor consistent observability across approval decisions.src/ai_company/engine/approval_gate_models.py (2)
14-33: LGTM!Well-designed frozen Pydantic model with proper use of
NotBlankStrfor all identifier fields andApprovalRiskLevelenum for type safety. The docstring clearly documents each attribute's purpose.
36-51: LGTM!Clean model design for approval decision payloads. The optional
decision_reasonfield correctly usesNotBlankStr | Noneto ensure non-blank values when provided.src/ai_company/engine/loop_helpers.py (3)
11-13: LGTM!Correct import of event constant from the domain-specific observability module. Based on learnings: "Event names must always use constants from domain-specific modules under
ai_company.observability.events".
240-248: LGTM!Clean API extension with a keyword-only
approval_gateparameter defaulting toNone. This preserves backward compatibility while enabling the escalation parking flow when configured.
329-342: LGTM!The escalation check is correctly placed after tool execution completes. The flow properly delegates to
_park_for_approvalwhen an escalation warrants parking.src/ai_company/engine/react_loop.py (2)
65-70: LGTM!Clean constructor extension with keyword-only parameter and
Nonedefault, maintaining backward compatibility.
217-224: LGTM!The
approval_gateis correctly forwarded toexecute_tool_calls, enabling the escalation parking flow when configured.src/ai_company/api/state.py (4)
14-14: LGTM!Correct import added with
noqa: TC001for type-checking compliance.
39-49: LGTM!The
__slots__tuple is correctly updated with the new_approval_gateentry, maintaining alphabetical order.
61-66: LGTM!Constructor correctly accepts the optional
approval_gateparameter and stores it in the private attribute.
138-141: LGTM!The property correctly returns
ApprovalGate | Nonerather than using_require_service, which aligns with the design that approval gate is optional and doesn't require a 503 error when unconfigured.tests/unit/engine/test_approval_gate.py (3)
12-30: Solid test module scaffolding and helper setup.
pytestmarkusage and_make_escalationhelper keep the tests consistent and readable.
33-46:should_parkbehavior coverage is concise and correct.The empty-tuple and first-escalation assertions match the intended contract exactly.
51-274: Good depth on parking/resume lifecycle paths.These tests cover success and failure branches (serialization, persistence, deserialization, cleanup failure) with the right behavioral assertions.
| try: | ||
| self._on_expire(expired) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception: | ||
| logger.exception( | ||
| API_APPROVAL_EXPIRED, | ||
| approval_id=item.id, | ||
| note="on_expire callback failed", | ||
| ) |
There was a problem hiding this comment.
Import the project MemoryError here.
src/ai_company/memory/errors.py defines the project-specific MemoryError, and src/ai_company/api/app.py deliberately re-raises it from _on_expire(). Because this module never imports that type, Line 153 only catches the built-in MemoryError; the project exception falls into except Exception and gets downgraded to a log.
Suggested fix
from ai_company.api.errors import ConflictError
from ai_company.core.approval import ApprovalItem # noqa: TC001
from ai_company.core.enums import (
ApprovalRiskLevel,
ApprovalStatus,
)
+from ai_company.memory.errors import MemoryError as StoreMemoryError
...
- except MemoryError, RecursionError:
+ except StoreMemoryError, MemoryError, RecursionError:
raiseAs per coding guidelines, "Handle errors explicitly, never silently swallow exceptions."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/api/approval_store.py` around lines 151 - 160, The except
clause currently catches the built-in MemoryError but not the project-specific
MemoryError defined in ai_company.memory.errors, so re-raised project errors
fall into the generic Exception handler; import the project error (e.g., from
ai_company.memory.errors import MemoryError as ProjectMemoryError) and update
the except line to explicitly re-raise both RecursionError and the project
MemoryError (or include both types in the except tuple used for re-raising)
before the generic Exception that logs via
logger.exception(API_APPROVAL_EXPIRED, approval_id=item.id, note="on_expire
callback failed"); ensure the symbol _on_expire remains where called and no
error types are swallowed.
| try: | ||
| await self._parked_context_repo.delete(parked.id) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception: | ||
| logger.exception( | ||
| APPROVAL_GATE_RESUME_DELETE_FAILED, | ||
| approval_id=approval_id, | ||
| parked_id=parked.id, | ||
| note="Context resumed but parked record not cleaned up", | ||
| ) | ||
|
|
||
| logger.info( | ||
| APPROVAL_GATE_CONTEXT_RESUMED, | ||
| approval_id=approval_id, | ||
| parked_id=parked.id, | ||
| ) | ||
| return context, parked.id |
There was a problem hiding this comment.
Check the repository's delete() result before claiming resume success.
ParkedContextRepository.delete() returns bool, but this method ignores a False result and still logs APPROVAL_GATE_CONTEXT_RESUMED. If cleanup fails silently, the parked record is still present and the same approval can be resumed again.
Possible fix
try:
- await self._parked_context_repo.delete(parked.id)
+ deleted = await self._parked_context_repo.delete(parked.id)
except MemoryError, RecursionError:
raise
except Exception:
logger.exception(
APPROVAL_GATE_RESUME_DELETE_FAILED,
approval_id=approval_id,
parked_id=parked.id,
note="Context resumed but parked record not cleaned up",
)
+ else:
+ if not deleted:
+ logger.warning(
+ APPROVAL_GATE_RESUME_DELETE_FAILED,
+ approval_id=approval_id,
+ parked_id=parked.id,
+ note="Context resumed but parked record not cleaned up",
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/approval_gate.py` around lines 214 - 231, The current
resume flow ignores the boolean return from ParkedContextRepository.delete() and
always logs APPROVAL_GATE_CONTEXT_RESUMED; update the code in the method
containing the await self._parked_context_repo.delete(parked.id) call (e.g., the
resume handler in approval_gate.py) to capture the delete result, and if it
returns False treat it as a cleanup failure: log/exception with
APPROVAL_GATE_RESUME_DELETE_FAILED including approval_id and parked_id (and do
not log APPROVAL_GATE_CONTEXT_RESUMED), otherwise proceed to log
APPROVAL_GATE_CONTEXT_RESUMED and return (context, parked.id); preserve existing
exception handling for MemoryError/RecursionError.
| def test_approved_without_reason(self) -> None: | ||
| msg = ApprovalGate.build_resume_message( | ||
| "approval-1", | ||
| approved=True, | ||
| decided_by="admin", | ||
| ) | ||
| assert "APPROVED" in msg | ||
| assert "approval-1" in msg | ||
| assert "admin" in msg | ||
| assert "data only" in msg | ||
|
|
||
| def test_rejected_with_reason(self) -> None: | ||
| msg = ApprovalGate.build_resume_message( | ||
| "approval-1", | ||
| approved=False, | ||
| decided_by="reviewer", | ||
| decision_reason="Too risky for production", | ||
| ) | ||
| assert "REJECTED" in msg | ||
| assert "approval-1" in msg | ||
| assert "reviewer" in msg | ||
| assert "Too risky for production" in msg | ||
|
|
||
| def test_approved_with_reason(self) -> None: | ||
| msg = ApprovalGate.build_resume_message( | ||
| "approval-1", | ||
| approved=True, | ||
| decided_by="admin", | ||
| decision_reason="Looks good", | ||
| ) | ||
| assert "APPROVED" in msg | ||
| assert "Looks good" in msg |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Parameterize message-variant assertions to reduce duplication.
These three tests are structurally the same and are a good fit for a single parameterized test.
♻️ Suggested refactor
class TestBuildResumeMessage:
- def test_approved_without_reason(self) -> None:
- ...
-
- def test_rejected_with_reason(self) -> None:
- ...
-
- def test_approved_with_reason(self) -> None:
- ...
+ `@pytest.mark.parametrize`(
+ ("approved", "decided_by", "decision_reason", "expected_tokens"),
+ [
+ (True, "admin", None, ["APPROVED", "approval-1", "admin", "data only"]),
+ (
+ False,
+ "reviewer",
+ "Too risky for production",
+ ["REJECTED", "approval-1", "reviewer", "Too risky for production"],
+ ),
+ (True, "admin", "Looks good", ["APPROVED", "Looks good"]),
+ ],
+ )
+ def test_build_resume_message_variants(
+ self,
+ approved: bool,
+ decided_by: str,
+ decision_reason: str | None,
+ expected_tokens: list[str],
+ ) -> None:
+ msg = ApprovalGate.build_resume_message(
+ "approval-1",
+ approved=approved,
+ decided_by=decided_by,
+ decision_reason=decision_reason,
+ )
+ for token in expected_tokens:
+ assert token in msgAs per coding guidelines, "Prefer @pytest.mark.parametrize for testing similar cases."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_approved_without_reason(self) -> None: | |
| msg = ApprovalGate.build_resume_message( | |
| "approval-1", | |
| approved=True, | |
| decided_by="admin", | |
| ) | |
| assert "APPROVED" in msg | |
| assert "approval-1" in msg | |
| assert "admin" in msg | |
| assert "data only" in msg | |
| def test_rejected_with_reason(self) -> None: | |
| msg = ApprovalGate.build_resume_message( | |
| "approval-1", | |
| approved=False, | |
| decided_by="reviewer", | |
| decision_reason="Too risky for production", | |
| ) | |
| assert "REJECTED" in msg | |
| assert "approval-1" in msg | |
| assert "reviewer" in msg | |
| assert "Too risky for production" in msg | |
| def test_approved_with_reason(self) -> None: | |
| msg = ApprovalGate.build_resume_message( | |
| "approval-1", | |
| approved=True, | |
| decided_by="admin", | |
| decision_reason="Looks good", | |
| ) | |
| assert "APPROVED" in msg | |
| assert "Looks good" in msg | |
| `@pytest.mark.parametrize`( | |
| ("approved", "decided_by", "decision_reason", "expected_tokens"), | |
| [ | |
| (True, "admin", None, ["APPROVED", "approval-1", "admin", "data only"]), | |
| ( | |
| False, | |
| "reviewer", | |
| "Too risky for production", | |
| ["REJECTED", "approval-1", "reviewer", "Too risky for production"], | |
| ), | |
| (True, "admin", "Looks good", ["APPROVED", "Looks good"]), | |
| ], | |
| ) | |
| def test_build_resume_message_variants( | |
| self, | |
| approved: bool, | |
| decided_by: str, | |
| decision_reason: str | None, | |
| expected_tokens: list[str], | |
| ) -> None: | |
| msg = ApprovalGate.build_resume_message( | |
| "approval-1", | |
| approved=approved, | |
| decided_by=decided_by, | |
| decision_reason=decision_reason, | |
| ) | |
| for token in expected_tokens: | |
| assert token in msg |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_approval_gate.py` around lines 279 - 310, The three
tests test_approved_without_reason, test_rejected_with_reason, and
test_approved_with_reason duplicate the same structure calling
ApprovalGate.build_resume_message and asserting substrings; refactor them into a
single `@pytest.mark.parametrize` test that iterates over cases (id, approved
boolean, decided_by, decision_reason, expected_substrings) and calls
ApprovalGate.build_resume_message once per case, asserting each expected
substring is in the returned msg; update or remove the original three test
functions and keep descriptive parameter values to preserve coverage.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #365 +/- ##
==========================================
- Coverage 93.90% 93.75% -0.16%
==========================================
Files 447 452 +5
Lines 20819 21082 +263
Branches 2011 2034 +23
==========================================
+ Hits 19551 19765 +214
- Misses 981 1021 +40
- Partials 287 296 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Critical: empty task_id ValidationError, parking metadata bypass, PARKED→ERROR on park failure, wire parked_context_repo, TOCTOU race via save_if_pending, resume trigger wiring in approve/reject. Security: prompt injection mitigation, max_length on comment, ttl_seconds cap, action_type format validation, remove phantom requested_by field. Logging: wrong event constants for init/risk/park-info, consistent MemoryError logging, new INITIALIZED/RISK_CLASSIFIED/RESUME_TRIGGERED constants. Code quality: extract security/tool factories (962→860 lines), split execute() into helpers, deterministic escalation order, boolean parking_failed, check delete() result, risk classifier error handling. Frontend: async handler type fix, WS test payload/await fix, filter-aware WS, 404 vs transient error discrimination, total desync fix, empty catch logging. Tests: park failure expects ERROR, module-level markers, extracted fixtures, simplified PropertyMock, updated resume message assertions. Docs: persistence qualifier, CLAUDE.md events, docstring fixes.
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (2)
web/src/stores/approvals.ts (1)
76-95:⚠️ Potential issue | 🟡 MinorWS
approval.submittedhandler still doesn't verify filter match.When
activeFilters.valueis truthy, the code bumpstotalwithout checking if the fetched item actually matches the active filters. This can causetotalto drift from the actual filtered count.Consider verifying the fetched item against active filters and only incrementing
total(and optionally inserting intoapprovals.value) when it matches:🔧 Suggested approach
case 'approval.submitted': if (!approvals.value.some((a) => a.id === approvalId)) { try { const item = await approvalsApi.getApproval(approvalId) - if (activeFilters.value) { - // Filters are active — item may not match; just bump total - total.value++ - } else { + const matchesFilters = !activeFilters.value || itemMatchesFilters(item, activeFilters.value) + if (matchesFilters) { approvals.value = [item, ...approvals.value] total.value++ }You would need to implement an
itemMatchesFiltershelper that checks the item'sstatus,risk_level, etc. againstactiveFilters.value.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/stores/approvals.ts` around lines 76 - 95, The handler for 'approval.submitted' currently increments total.value when activeFilters.value is set without validating the new item against those filters; implement an itemMatchesFilters(item, filters) helper and after fetching the item via approvalsApi.getApproval(approvalId) call it when activeFilters.value is truthy — only increment total.value (and insert item into approvals.value if you want it visible) when itemMatchesFilters returns true; keep existing 404/410 handling and only fall back to the previous behavior when no filters are active.tests/unit/engine/test_approval_gate.py (1)
296-329: 🧹 Nitpick | 🔵 TrivialConsider parameterizing similar message tests.
These three tests follow the same pattern and could be consolidated using
@pytest.mark.parametrize. However, the current form is readable and the tests are few, so this is optional.Parameterized version (optional)
`@pytest.mark.parametrize`( ("approved", "decided_by", "decision_reason", "expected_tokens"), [ (True, "admin", None, ["APPROVED", "approval-1", "admin", "[SYSTEM:"]), (False, "reviewer", "Too risky for production", ["REJECTED", "approval-1", "reviewer", "Too risky for production", "USER-SUPPLIED REASON", "untrusted data"]), (True, "admin", "Looks good", ["APPROVED", "Looks good", "USER-SUPPLIED REASON"]), ], ) def test_build_resume_message_variants( self, approved: bool, decided_by: str, decision_reason: str | None, expected_tokens: list[str], ) -> None: msg = ApprovalGate.build_resume_message( "approval-1", approved=approved, decided_by=decided_by, decision_reason=decision_reason, ) for token in expected_tokens: assert token in msgAs per coding guidelines: "Prefer
@pytest.mark.parametrizefor testing similar cases."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/test_approval_gate.py` around lines 296 - 329, The three repetitive tests (test_approved_without_reason, test_rejected_with_reason, test_approved_with_reason) should be consolidated into a single parametrized test that calls ApprovalGate.build_resume_message with different (approved, decided_by, decision_reason) inputs and asserts expected substrings; add a pytest.mark.parametrize decorator (e.g., test_build_resume_message_variants) with rows for each case and loop over expected_tokens asserting each is in msg, then remove the three original test_... functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Line 154: The sentence listing event constants after "Event names" is missing
a comma after "e.g."; update the phrase "e.g. `PROVIDER_CALL_START` ..." to
"e.g., `PROVIDER_CALL_START` ..." so the abbreviation is punctuated consistently
in the long list (look for the "Event names" paragraph and the "e.g." before the
`PROVIDER_CALL_START`/events constant list).
In `@src/ai_company/api/controllers/approvals.py`:
- Around line 219-225: Replace the success event used in the exception handler:
create a new event constant APPROVAL_GATE_RESUME_FAILED in
ai_company.observability.events.approval_gate, import it into approvals.py, and
change the logger.warning call inside the except block to use
APPROVAL_GATE_RESUME_FAILED (keeping the same message, approval_id, note and
exc_info=True) instead of APPROVAL_GATE_RESUME_TRIGGERED so failure logging is
correctly categorized.
In `@src/ai_company/engine/approval_gate.py`:
- Around line 213-232: The warning "delete() returned False" can be emitted
incorrectly when delete() actually raised and was logged by logger.exception;
fix by tracking whether delete raised: introduce a boolean (e.g. delete_errored
= False) before calling self._parked_context_repo.delete, set delete_errored =
True inside the broad except Exception block that calls
logger.exception(APPROVAL_GATE_RESUME_DELETE_FAILED, ...), and change the final
check to only log the warning when not deleted and not delete_errored (if not
deleted and not delete_errored: logger.warning(...)). This uses the existing
symbols self._parked_context_repo.delete, deleted, logger.exception,
logger.warning, and APPROVAL_GATE_RESUME_DELETE_FAILED.
In `@src/ai_company/engine/loop_helpers.py`:
- Around line 373-378: The debug log currently emits the
APPROVAL_GATE_CONTEXT_PARKED event while only noting a missing task_id; change
this to a semantically correct diagnostic: either remove the parked event
constant and call logger.debug with a plain message about the missing
task_execution/task_id, or add a new event constant (e.g.,
APPROVAL_GATE_CONTEXT_PARK_NO_TASK) to
ai_company.observability.events.approval_gate and use that here instead of
APPROVAL_GATE_CONTEXT_PARKED; update the logger.debug invocation in
loop_helpers.py (the call that passes approval_id, agent_id and note="No
task_execution on context — task_id will be None") to use the chosen option so
observability events remain accurate.
In `@src/ai_company/tools/invoker.py`:
- Around line 633-634: The construction of ApprovalRiskLevel from
result.metadata can raise ValueError for invalid strings; update the code that
builds the object (the ApprovalRiskLevel(...) call using
result.metadata.get("risk_level", "high")) to validate or convert the metadata
value first: read risk_str = result.metadata.get("risk_level"), check if
risk_str is a valid member of ApprovalRiskLevel (or wrap
ApprovalRiskLevel(risk_str) in a try/except ValueError), and on failure log a
clear metadata validation warning and fall back to a safe default (e.g., "high")
before passing into ApprovalRiskLevel; ensure the change is applied where
ApprovalRiskLevel is instantiated so invalid metadata never raises an uncaught
ValueError.
In `@web/src/__tests__/stores/approvals.test.ts`:
- Around line 286-290: Add an afterEach hook that calls vi.restoreAllMocks() to
fully restore spies between tests (in addition to the existing
vi.clearAllMocks()); also change the axios.isAxiosError spy at the second
location to capture the original implementation (like the first instance) and
delegate to it for non-matching errors by storing (await
import('axios')).default.isAxiosError in originalIsAxiosError and calling
originalIsAxiosError(err) when err !== axiosError so the spy preserves the
original behavior.
In `@web/src/stores/approvals.ts`:
- Around line 96-117: The handler for
'approval.approved'/'approval.rejected'/'approval.expired' currently replaces
the item in approvals.value regardless of current activeFilters; instead, after
fetching the updated approval via approvalsApi.getApproval(approvalId) check
whether the updated item matches the current activeFilters (use whatever
predicate/function you use elsewhere to test filter membership) and if it does
replace the item in approvals.value, otherwise remove it from approvals.value
and decrement total.value accordingly; keep the existing 404/410 removal branch
as-is and perform this filter-check/replace-or-remove logic in the same switch
case that currently updates approvals.value.
---
Duplicate comments:
In `@tests/unit/engine/test_approval_gate.py`:
- Around line 296-329: The three repetitive tests (test_approved_without_reason,
test_rejected_with_reason, test_approved_with_reason) should be consolidated
into a single parametrized test that calls ApprovalGate.build_resume_message
with different (approved, decided_by, decision_reason) inputs and asserts
expected substrings; add a pytest.mark.parametrize decorator (e.g.,
test_build_resume_message_variants) with rows for each case and loop over
expected_tokens asserting each is in msg, then remove the three original
test_... functions.
In `@web/src/stores/approvals.ts`:
- Around line 76-95: The handler for 'approval.submitted' currently increments
total.value when activeFilters.value is set without validating the new item
against those filters; implement an itemMatchesFilters(item, filters) helper and
after fetching the item via approvalsApi.getApproval(approvalId) call it when
activeFilters.value is truthy — only increment total.value (and insert item into
approvals.value if you want it visible) when itemMatchesFilters returns true;
keep existing 404/410 handling and only fall back to the previous behavior when
no filters are active.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cf611785-f95c-4d33-8547-1af4d9797c41
📒 Files selected for processing (22)
CLAUDE.mddocs/design/engine.mdsrc/ai_company/api/approval_store.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/api/dto.pysrc/ai_company/api/ws_models.pysrc/ai_company/engine/_security_factory.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/engine/loop_helpers.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/security/timeout/park_service.pysrc/ai_company/security/timeout/parked_context.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/tools/invoker.pytests/unit/api/controllers/test_approvals.pytests/unit/api/test_dto.pytests/unit/engine/test_approval_gate.pytests/unit/engine/test_loop_helpers_approval.pytests/unit/observability/test_events.pyweb/src/__tests__/stores/approvals.test.tsweb/src/stores/approvals.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Greptile Review
- GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (9)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax:except A, B:(no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001. Tests must usetest-provider,test-small-001, etc.
Files:
tests/unit/api/controllers/test_approvals.pysrc/ai_company/security/timeout/park_service.pysrc/ai_company/security/timeout/parked_context.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/engine/_security_factory.pytests/unit/engine/test_loop_helpers_approval.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/api/ws_models.pysrc/ai_company/tools/invoker.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/api/dto.pysrc/ai_company/tools/approval_tool.pytests/unit/api/test_dto.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/api/approval_store.pytests/unit/engine/test_approval_gate.pysrc/ai_company/engine/loop_helpers.pytests/unit/observability/test_events.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Tests must use markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: useasyncio_mode = "auto"— no manual@pytest.mark.asyncioneeded.
Test timeout: 30 seconds per test.
Prefer@pytest.mark.parametrizefor testing similar cases.
Files:
tests/unit/api/controllers/test_approvals.pytests/unit/engine/test_loop_helpers_approval.pytests/unit/api/test_dto.pytests/unit/engine/test_approval_gate.pytests/unit/observability/test_events.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: All public functions must have type hints. Enforce via mypy strict mode.
Google-style docstrings required on public classes and functions, enforced by ruff D rules.
Create new objects instead of mutating existing ones. For non-Pydantic internal collections, usecopy.deepcopy()at construction plusMappingProxyTypewrapping for read-only enforcement.
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention and usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 withBaseModel,model_validator,computed_field,ConfigDict. Use@computed_fieldfor derived values instead of storing redundant fields (e.g.,TokenUsage.total_tokens).
UseNotBlankStrfromcore.typesfor all identifier/name fields in Pydantic models, including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants, instead of manual whitespace validators.
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries: user input, external APIs, config files.
NEVER useimport logging,logging.getLogger(), orprint()in application code.
Always useloggeras the variable name for loggers (not_logger, notlog).
Event names must always use constants from domain-specific modules underai_company.observability.events(e.g.,PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import dire...
Files:
src/ai_company/security/timeout/park_service.pysrc/ai_company/security/timeout/parked_context.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/engine/_security_factory.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/api/ws_models.pysrc/ai_company/tools/invoker.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/api/dto.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/api/approval_store.pysrc/ai_company/engine/loop_helpers.py
src/ai_company/**/[!_]*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every module with business logic MUST have:
from ai_company.observability import get_loggerthenlogger = get_logger(__name__)
Files:
src/ai_company/security/timeout/park_service.pysrc/ai_company/security/timeout/parked_context.pysrc/ai_company/api/controllers/approvals.pysrc/ai_company/observability/events/approval_gate.pysrc/ai_company/api/ws_models.pysrc/ai_company/tools/invoker.pysrc/ai_company/engine/approval_gate.pysrc/ai_company/api/dto.pysrc/ai_company/tools/approval_tool.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/api/approval_store.pysrc/ai_company/engine/loop_helpers.py
web/src/**/*.{vue,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Web dashboard: Vue 3 + PrimeVue + Tailwind CSS, organized by feature in src/components/, src/stores/, src/views/. Enforce with ESLint and vue-tsc type-checking.
Files:
web/src/stores/approvals.tsweb/src/__tests__/stores/approvals.test.ts
web/src/stores/**/*.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Frontend state management: Pinia stores organized by feature in src/stores/
Files:
web/src/stores/approvals.ts
web/src/__tests__/**/*.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Web dashboard tests: Vitest unit tests organized by feature in tests/
Files:
web/src/__tests__/stores/approvals.test.ts
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
When approved deviations occur, update the relevant
docs/design/page to reflect the new reality.
Files:
docs/design/engine.md
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Documentation source in
docs/built with Zensical. Design spec indocs/design/(7 pages). Architecture indocs/architecture/. Roadmap indocs/roadmap/. Security indocs/security.md.
Files:
docs/design/engine.md
🧠 Learnings (16)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/ai_company/**/[!_]*.py : Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Event names must always use constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
CLAUDE.mdsrc/ai_company/observability/events/approval_gate.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/loop_helpers.pytests/unit/observability/test_events.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Structured logging: always use `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : NEVER use `import logging`, `logging.getLogger()`, or `print()` in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Always use `logger` as the variable name for loggers (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : DEBUG logging for object creation, internal flow, and entry/exit of key functions.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Pure data models, enums, and re-exports do NOT need logging.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.
Applied to files:
web/src/stores/approvals.ts
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/stores/**/*.ts : Frontend state management: Pinia stores organized by feature in src/stores/
Applied to files:
web/src/stores/approvals.ts
📚 Learning: 2026-03-13T21:03:58.907Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.907Z
Learning: Applies to web/src/__tests__/**/*.ts : Web dashboard tests: Vitest unit tests organized by feature in __tests__/
Applied to files:
web/src/__tests__/stores/approvals.test.ts
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to src/**/*.py : Use Pydantic v2 with `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields (e.g., `TokenUsage.total_tokens`).
Applied to files:
src/ai_company/api/dto.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.
Applied to files:
tests/unit/engine/test_approval_gate.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Tests must use markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`
Applied to files:
tests/unit/observability/test_events.py
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.
Applied to files:
tests/unit/observability/test_events.py
🧬 Code graph analysis (14)
src/ai_company/security/timeout/park_service.py (1)
src/ai_company/engine/parallel_models.py (1)
task_id(87-89)
src/ai_company/security/timeout/parked_context.py (2)
src/ai_company/engine/parallel_models.py (1)
task_id(87-89)src/ai_company/tools/base.py (1)
description(138-140)
src/ai_company/api/controllers/approvals.py (5)
src/ai_company/api/state.py (2)
approval_gate(139-141)AppState(23-163)src/ai_company/engine/approval_gate.py (3)
ApprovalGate(40-274)resume_context(162-239)build_resume_message(242-274)src/ai_company/memory/errors.py (1)
MemoryError(13-14)src/ai_company/api/approval_store.py (2)
get(61-73)save_if_pending(125-142)src/ai_company/api/errors.py (2)
UnauthorizedError(59-65)ConflictError(41-47)
tests/unit/engine/test_loop_helpers_approval.py (4)
src/ai_company/engine/approval_gate.py (3)
ApprovalGate(40-274)should_park(71-90)park_context(92-160)src/ai_company/engine/loop_helpers.py (1)
execute_tool_calls(241-343)src/ai_company/engine/loop_protocol.py (2)
ExecutionResult(79-140)TerminationReason(28-36)src/ai_company/providers/models.py (1)
CompletionResponse(257-306)
web/src/stores/approvals.ts (2)
src/ai_company/api/ws_models.py (1)
WsEvent(48-73)web/src/api/types.ts (1)
WsEvent(519-524)
src/ai_company/tools/invoker.py (3)
src/ai_company/core/enums.py (1)
ApprovalRiskLevel(443-449)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/tools/errors.py (1)
ToolExecutionError(55-56)
web/src/__tests__/stores/approvals.test.ts (2)
web/src/stores/approvals.ts (1)
useApprovalStore(8-136)web/src/api/types.ts (1)
WsEvent(519-524)
src/ai_company/engine/approval_gate.py (5)
src/ai_company/persistence/repositories.py (1)
ParkedContextRepository(199-267)src/ai_company/security/timeout/park_service.py (3)
ParkService(29-144)park(37-106)resume(108-144)src/ai_company/security/timeout/parked_context.py (1)
ParkedContext(19-66)src/ai_company/engine/approval_gate_models.py (1)
EscalationInfo(14-33)src/ai_company/engine/context.py (1)
AgentContext(87-307)
src/ai_company/api/dto.py (1)
src/ai_company/engine/parallel_models.py (1)
task_id(87-89)
src/ai_company/tools/approval_tool.py (6)
src/ai_company/core/enums.py (1)
ToolCategory(294-308)src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/tools/base.py (6)
BaseTool(57-184)ToolExecutionResult(25-54)description(138-140)category(128-130)action_type(133-135)parameters_schema(143-151)src/ai_company/api/approval_store.py (1)
add(42-59)src/ai_company/security/timeout/risk_tier_classifier.py (1)
DefaultRiskTierClassifier(62-101)src/ai_company/memory/errors.py (1)
MemoryError(13-14)
tests/unit/api/test_dto.py (3)
src/ai_company/tools/base.py (2)
action_type(133-135)description(138-140)src/ai_company/core/enums.py (1)
ApprovalRiskLevel(443-449)src/ai_company/api/dto.py (1)
CreateApprovalRequest(196-249)
src/ai_company/engine/agent_engine.py (5)
src/ai_company/engine/_security_factory.py (2)
make_security_interceptor(39-117)registry_with_approval_tool(120-148)src/ai_company/engine/approval_gate.py (1)
ApprovalGate(40-274)src/ai_company/persistence/repositories.py (1)
ParkedContextRepository(199-267)src/ai_company/engine/loop_protocol.py (1)
ExecutionLoop(151-189)src/ai_company/engine/react_loop.py (1)
ReactLoop(55-269)
src/ai_company/api/approval_store.py (4)
src/ai_company/core/approval.py (1)
ApprovalItem(24-96)src/ai_company/core/enums.py (1)
ApprovalStatus(434-440)src/ai_company/api/app.py (1)
_on_expire(74-99)src/ai_company/memory/errors.py (1)
MemoryError(13-14)
tests/unit/engine/test_approval_gate.py (3)
src/ai_company/engine/approval_gate.py (3)
ApprovalGate(40-274)park_context(92-160)build_resume_message(242-274)src/ai_company/persistence/repositories.py (1)
ParkedContextRepository(199-267)src/ai_company/security/timeout/park_service.py (3)
ParkService(29-144)park(37-106)resume(108-144)
🪛 LanguageTool
CLAUDE.md
[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🔇 Additional comments (37)
src/ai_company/api/ws_models.py (1)
45-45: LGTM!The new
APPROVAL_RESUMEDenum member follows the established naming convention and aligns with the other approval-related WebSocket event types.src/ai_company/security/timeout/parked_context.py (1)
41-43: LGTM!The change to make
task_idoptional (NotBlankStr | None) correctly supports the taskless agents scenario. The default ofNoneand updated description align with the PR objectives.src/ai_company/security/timeout/park_service.py (1)
37-45: LGTM!The signature change to accept optional
task_idis well-implemented. The equality check at lines 92-97 correctly handlesNonecomparison (Python'sNone == Noneevaluates toTrue), and the docstring clearly documents the taskless agent use case.web/src/stores/approvals.ts (1)
87-93: LGTM on error handling differentiation.The error handling now correctly distinguishes between definitive 404/410 responses (item genuinely gone) and transient errors (timeouts, 5xx). This addresses the previous review concern about treating all fetch errors as deletes.
Also applies to: 104-115
src/ai_company/api/dto.py (2)
221-232: LGTM!The
action_typefield validator correctly enforces thecategory:actionformat. Usingstrip()to check for whitespace-only parts and the_ACTION_TYPE_PARTSconstant for the magic number are good practices.
217-217: LGTM!The
ttl_secondsupper bound of 604800 (7 days) provides reasonable protection against excessive TTL values while maintaining flexibility.CLAUDE.md (1)
101-101: LGTM!Documentation updates accurately reflect the new approval gate integration across the API, engine, and tools modules.
Also applies to: 107-107, 115-115
src/ai_company/engine/_security_factory.py (2)
39-117: LGTM!The
make_security_interceptorfunction has proper defensive checks for configuration mismatches (autonomy without security config), appropriate logging before raising errors, and clean construction of the security stack with conditional detector inclusion.
120-148: LGTM!The
registry_with_approval_toolfactory cleanly handles the optional approval store case and uses deferred imports to avoid circular dependencies. The pattern of building a new registry with existing tools plus the approval tool maintains immutability.src/ai_company/tools/approval_tool.py (4)
97-133: LGTM!The
executemethod is well-structured as an orchestrator, delegating to focused helper methods for validation, risk classification, persistence, and result building. This addresses the previous review concern about method length.
162-163: LGTM on PEP 758 exception syntax.The
except MemoryError, RecursionError:syntax correctly uses the Python 3.14 PEP 758 format (comma-separated, no parentheses) as required by the coding guidelines.Also applies to: 238-239
135-178: LGTM!The
_persist_itemhelper properly handles persistence errors, re-raises critical exceptions (MemoryError,RecursionError), and logs failures with context before returning an error result.
229-252: LGTM!The
_classify_riskmethod correctly implements fail-safe behavior by defaulting toHIGHwhen classification fails or no classifier is configured, aligning with the D19 design spec mentioned inDefaultRiskTierClassifier.src/ai_company/observability/events/approval_gate.py (1)
1-16: LGTM!The event constants are well-structured, following the
approval_gate.*naming convention consistently. The coverage spans the full approval gate lifecycle (initialization, escalation detection, context parking/resume, and failure cases).tests/unit/observability/test_events.py (1)
135-135: LGTM!The module-level
pytestmarkconsolidation is cleaner than per-class markers, and adding"approval_gate"to the expected domain modules ensures the new event constants file is properly discovered by the test suite.Also applies to: 176-178
src/ai_company/engine/approval_gate.py (1)
241-274: Good prompt injection mitigation inbuild_resume_message.The use of
repr()wrapping for user-supplied values and explicit labeling of untrusted data is a solid defense against prompt injection in the resume flow.src/ai_company/engine/agent_engine.py (3)
139-142: LGTM — approval gate wiring is correctly sequenced.The initialization order ensures
_approval_storeand_parked_context_repoare assigned before_make_approval_gate()is called, and the approval gate is created before_make_default_loop(). This addresses the previously flagged issue about wiring theparked_context_repo.
597-617: LGTM — approval gate factory properly guards on approval store.The factory correctly returns
Nonewhen no approval store is configured, ensuring the execution loop skips approval-gate checks in that case. The late import ofParkServiceavoids circular dependencies.
631-655: LGTM — tool invoker properly augmented with approval tool.The registry augmentation via
registry_with_approval_toolcorrectly passes the identity and task_id, enabling the approval tool when an approval store is configured.tests/unit/api/controllers/test_approvals.py (1)
22-27: LGTM — action_type format updated to match new validation.The change from
"code_merge"to"code:merge"aligns with theCreateApprovalRequestvalidator that now enforces the"category:action"format.tests/unit/api/test_dto.py (1)
40-45: LGTM — action_type format consistently updated across all metadata tests.All four test cases now use
"deploy:release"to comply with the"category:action"format requirement, allowing the tests to properly exercise the metadata validation logic without failing on action_type validation.Also applies to: 51-56, 62-67, 71-76
docs/design/engine.md (1)
450-468: LGTM — documentation accurately describes the new approval gate flow.The additions clearly explain the escalation detection, parking mechanism, and how
PARKEDtermination interacts with the task lifecycle. The documentation aligns well with the implementation inApprovalGateandloop_helpers.src/ai_company/api/approval_store.py (2)
125-142: LGTM —save_if_pendingcorrectly mitigates TOCTOU race.The method applies lazy expiration check before comparing status, ensuring concurrent expirations don't result in saving a decision on an already-expired item. Returning
Nonewhen the item is no longerPENDINGallows callers to detect and handle concurrent decisions.
170-179: LGTM — exception handling in_check_expirationis appropriate.The try/except properly re-raises
MemoryErrorandRecursionErroras critical errors, while logging other callback failures without disrupting the expiration flow. This aligns with the coding guideline to handle errors explicitly.tests/unit/engine/test_loop_helpers_approval.py (1)
1-205: LGTM! Comprehensive test coverage for approval gate integration.The test file properly covers the key scenarios:
- No approval gate returns context normally
- With gate but no escalation returns context
- Escalation triggers parking with correct metadata
- Park failure correctly returns ERROR (not PARKED)
Past review feedback has been addressed:
PropertyMockremoved in favor of simple attribute assignment, and park failure now expectsTerminationReason.ERRORas appropriate.src/ai_company/engine/loop_helpers.py (2)
241-343: Well-structured approval gate integration.The
execute_tool_callsfunction cleanly integrates the approval gate check after tool results are processed. The flow correctly:
- Checks for escalations only when
approval_gateis provided- Delegates to
_park_for_approvalfor the parking logic- Returns updated context when no parking is needed
346-419: Proper error handling in parking flow.The
_park_for_approvalhelper correctly:
- Returns
ERROR(notPARKED) when parking fails — this ensures non-resumable failures are not masked- Uses boolean
parking_failedin metadata (addresses past review feedback)- Re-raises
MemoryErrorandRecursionErrorappropriatelysrc/ai_company/api/controllers/approvals.py (3)
101-109: Correct PEP 758 except syntax and error handling.The exception handling properly re-raises
MemoryErrorandRecursionErrorwhile allowing other exceptions to be logged as warnings without disrupting the approval flow.
319-322: Authentication enforcement for create_approval.Good security hardening —
requested_byis now bound to the authenticated user's username rather than accepting it from the request payload, mitigating potential spoofing.
410-418: TOCTOU race condition mitigated with save_if_pending.Using
save_if_pendinginstead of a separate check-then-save properly handles concurrent approval decisions. TheConflictError(409) response is appropriate for this scenario.src/ai_company/tools/invoker.py (3)
120-136: Escalation tracking property is well-documented.The
pending_escalationsproperty clearly documents when it's populated (ESCALATE verdicts, parking metadata) and when it's cleared (start ofinvoke/invoke_all). The tuple return type ensures immutability for callers.Note: The past review concern about "approval gating discovered too late" is an architectural limitation — concurrent tool calls can complete before parking is checked. This would require a fundamentally different invocation model to address.
773-779: Deterministic escalation ordering.Sorting escalations by original tool-call index ensures consistent behavior regardless of concurrent execution order. This is important for reproducible parking decisions.
601-652: Defensive escalation tracking with fail-closed behavior.The
_track_parking_metadatahelper properly:
- Checks both
requires_parking=Trueand presence ofapproval_id- Re-raises
MemoryError/RecursionError- Raises
ToolExecutionErroron tracking failure to prevent silent bypass of approval gatetests/unit/engine/test_approval_gate.py (2)
34-54: Good use of fixtures to reduce test setup duplication.The
park_service,parked_mock, andrepofixtures cleanly extract the repeated mock wiring that was noted in past reviews. This improves maintainability.
57-290: Comprehensive test coverage for ApprovalGate.The tests thoroughly cover:
should_park: empty input and multiple escalationspark_context: service calls, repo persistence, no-repo path, serialization errors, repo save errorsresume_context: successful resume, unknown approval, no repo, deletion after resume, deserialization failure, delete failure resilienceEdge cases like delete failure not losing context (lines 269-290) are particularly valuable for reliability.
web/src/__tests__/stores/approvals.test.ts (2)
6-6: Good async test synchronization forhandleWsEvent.Using
flushPromiseshere is a solid fit sincehandleWsEventschedules async work and returns synchronously.
163-279: WS event coverage is strong and aligned with the newapproval_idcontract.These cases now validate fetch-on-event behavior, duplicate prevention, and total-count handling under filtered and error conditions well.
Also applies to: 341-366
| - **Never** use `import logging` / `logging.getLogger()` / `print()` in application code | ||
| - **Variable name**: always `logger` (not `_logger`, not `log`) | ||
| - **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `API_ROUTE_NOT_FOUND` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `TRUST_EVALUATE_START` from `events.trust`, `PROMOTION_EVALUATE_START` from `events.promotion`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`, `MEMORY_BACKEND_CONNECTED` from `events.memory`, `MEMORY_ENTRY_STORED` from `events.memory`, `MEMORY_BACKEND_SYSTEM_ERROR` from `events.memory`, `AUTONOMY_ACTION_AUTO_APPROVED` from `events.autonomy`, `TIMEOUT_POLICY_EVALUATED` from `events.timeout`, `PERSISTENCE_AUDIT_ENTRY_SAVED` from `events.persistence`, `TASK_ENGINE_STARTED` from `events.task_engine`, `COORDINATION_STARTED` from `events.coordination`, `COMMUNICATION_DISPATCH_START` from `events.communication`, `COMPANY_STARTED` from `events.company`, `CONFIG_LOADED` from `events.config`, `CORRELATION_ID_CREATED` from `events.correlation`, `DECOMPOSITION_STARTED` from `events.decomposition`, `DELEGATION_STARTED` from `events.delegation`, `EXECUTION_LOOP_STARTED` from `events.execution`, `GIT_OPERATION_START` from `events.git`, `PARALLEL_EXECUTION_STARTED` from `events.parallel`, `PERSONALITY_LOADED` from `events.personality`, `QUOTA_CHECKED` from `events.quota`, `ROLE_ASSIGNED` from `events.role`, `ROUTING_STARTED` from `events.routing`, `SANDBOX_EXECUTE_START` from `events.sandbox`, `TASK_CREATED` from `events.task`, `TASK_ASSIGNMENT_STARTED` from `events.task_assignment`, `TASK_ROUTING_STARTED` from `events.task_routing`, `TEMPLATE_LOADED` from `events.template`, `TOOL_INVOKE_START` from `events.tool`, `WORKSPACE_CREATED` from `events.workspace`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT` | ||
| - **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `API_ROUTE_NOT_FOUND` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `TRUST_EVALUATE_START` from `events.trust`, `PROMOTION_EVALUATE_START` from `events.promotion`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`, `MEMORY_BACKEND_CONNECTED` from `events.memory`, `MEMORY_ENTRY_STORED` from `events.memory`, `MEMORY_BACKEND_SYSTEM_ERROR` from `events.memory`, `AUTONOMY_ACTION_AUTO_APPROVED` from `events.autonomy`, `TIMEOUT_POLICY_EVALUATED` from `events.timeout`, `PERSISTENCE_AUDIT_ENTRY_SAVED` from `events.persistence`, `TASK_ENGINE_STARTED` from `events.task_engine`, `COORDINATION_STARTED` from `events.coordination`, `COMMUNICATION_DISPATCH_START` from `events.communication`, `COMPANY_STARTED` from `events.company`, `CONFIG_LOADED` from `events.config`, `CORRELATION_ID_CREATED` from `events.correlation`, `DECOMPOSITION_STARTED` from `events.decomposition`, `DELEGATION_STARTED` from `events.delegation`, `EXECUTION_LOOP_STARTED` from `events.execution`, `GIT_OPERATION_START` from `events.git`, `PARALLEL_EXECUTION_STARTED` from `events.parallel`, `PERSONALITY_LOADED` from `events.personality`, `QUOTA_CHECKED` from `events.quota`, `ROLE_ASSIGNED` from `events.role`, `ROUTING_STARTED` from `events.routing`, `SANDBOX_EXECUTE_START` from `events.sandbox`, `TASK_CREATED` from `events.task`, `TASK_ASSIGNMENT_STARTED` from `events.task_assignment`, `TASK_ROUTING_STARTED` from `events.task_routing`, `TEMPLATE_LOADED` from `events.template`, `TOOL_INVOKE_START` from `events.tool`, `WORKSPACE_CREATED` from `events.workspace`, `APPROVAL_GATE_ESCALATION_DETECTED` from `events.approval_gate`, `APPROVAL_GATE_INITIALIZED` from `events.approval_gate`, `APPROVAL_GATE_RISK_CLASSIFIED` from `events.approval_gate`, `APPROVAL_GATE_RESUME_TRIGGERED` from `events.approval_gate`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT` |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding comma after "e.g." for consistency.
Static analysis flagged the missing comma. While minor, it improves readability in the dense event constant listing.
🧰 Tools
🪛 LanguageTool
[style] ~154-~154: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CLAUDE.md` at line 154, The sentence listing event constants after "Event
names" is missing a comma after "e.g."; update the phrase "e.g.
`PROVIDER_CALL_START` ..." to "e.g., `PROVIDER_CALL_START` ..." so the
abbreviation is punctuated consistently in the long list (look for the "Event
names" paragraph and the "e.g." before the `PROVIDER_CALL_START`/events constant
list).
| except Exception: | ||
| logger.warning( | ||
| APPROVAL_GATE_RESUME_TRIGGERED, | ||
| approval_id=approval_id, | ||
| note="Resume trigger failed — decision was saved successfully", | ||
| exc_info=True, | ||
| ) |
There was a problem hiding this comment.
Success event used for failure logging.
APPROVAL_GATE_RESUME_TRIGGERED semantically indicates a successful resume trigger. Using it for the failure path (with exc_info=True) will confuse observability dashboards and alerting.
Consider using a separate event constant like APPROVAL_GATE_RESUME_FAILED for this error path.
Suggested fix
except Exception:
logger.warning(
- APPROVAL_GATE_RESUME_TRIGGERED,
+ APPROVAL_GATE_RESUME_FAILED,
approval_id=approval_id,
note="Resume trigger failed — decision was saved successfully",
exc_info=True,
)You'll need to add APPROVAL_GATE_RESUME_FAILED to ai_company.observability.events.approval_gate and import it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/api/controllers/approvals.py` around lines 219 - 225, Replace
the success event used in the exception handler: create a new event constant
APPROVAL_GATE_RESUME_FAILED in ai_company.observability.events.approval_gate,
import it into approvals.py, and change the logger.warning call inside the
except block to use APPROVAL_GATE_RESUME_FAILED (keeping the same message,
approval_id, note and exc_info=True) instead of APPROVAL_GATE_RESUME_TRIGGERED
so failure logging is correctly categorized.
| deleted = False | ||
| try: | ||
| deleted = await self._parked_context_repo.delete(parked.id) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception: | ||
| logger.exception( | ||
| APPROVAL_GATE_RESUME_DELETE_FAILED, | ||
| approval_id=approval_id, | ||
| parked_id=parked.id, | ||
| note="Context resumed but parked record not cleaned up", | ||
| ) | ||
|
|
||
| if not deleted: | ||
| logger.warning( | ||
| APPROVAL_GATE_RESUME_DELETE_FAILED, | ||
| approval_id=approval_id, | ||
| parked_id=parked.id, | ||
| note="delete() returned False — parked record may still exist", | ||
| ) |
There was a problem hiding this comment.
Misleading warning message when delete throws an exception.
If delete() raises an exception, it's logged via logger.exception(), but then the if not deleted: block also fires with the note "delete() returned False" — which is incorrect since it threw, not returned False. This creates redundant/confusing logging.
Suggested fix
deleted = False
+ delete_exception = False
try:
deleted = await self._parked_context_repo.delete(parked.id)
except MemoryError, RecursionError:
raise
except Exception:
+ delete_exception = True
logger.exception(
APPROVAL_GATE_RESUME_DELETE_FAILED,
approval_id=approval_id,
parked_id=parked.id,
note="Context resumed but parked record not cleaned up",
)
- if not deleted:
+ if not deleted and not delete_exception:
logger.warning(
APPROVAL_GATE_RESUME_DELETE_FAILED,
approval_id=approval_id,
parked_id=parked.id,
note="delete() returned False — parked record may still exist",
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| deleted = False | |
| try: | |
| deleted = await self._parked_context_repo.delete(parked.id) | |
| except MemoryError, RecursionError: | |
| raise | |
| except Exception: | |
| logger.exception( | |
| APPROVAL_GATE_RESUME_DELETE_FAILED, | |
| approval_id=approval_id, | |
| parked_id=parked.id, | |
| note="Context resumed but parked record not cleaned up", | |
| ) | |
| if not deleted: | |
| logger.warning( | |
| APPROVAL_GATE_RESUME_DELETE_FAILED, | |
| approval_id=approval_id, | |
| parked_id=parked.id, | |
| note="delete() returned False — parked record may still exist", | |
| ) | |
| deleted = False | |
| delete_exception = False | |
| try: | |
| deleted = await self._parked_context_repo.delete(parked.id) | |
| except MemoryError, RecursionError: | |
| raise | |
| except Exception: | |
| delete_exception = True | |
| logger.exception( | |
| APPROVAL_GATE_RESUME_DELETE_FAILED, | |
| approval_id=approval_id, | |
| parked_id=parked.id, | |
| note="Context resumed but parked record not cleaned up", | |
| ) | |
| if not deleted and not delete_exception: | |
| logger.warning( | |
| APPROVAL_GATE_RESUME_DELETE_FAILED, | |
| approval_id=approval_id, | |
| parked_id=parked.id, | |
| note="delete() returned False — parked record may still exist", | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/approval_gate.py` around lines 213 - 232, The warning
"delete() returned False" can be emitted incorrectly when delete() actually
raised and was logged by logger.exception; fix by tracking whether delete
raised: introduce a boolean (e.g. delete_errored = False) before calling
self._parked_context_repo.delete, set delete_errored = True inside the broad
except Exception block that calls
logger.exception(APPROVAL_GATE_RESUME_DELETE_FAILED, ...), and change the final
check to only log the warning when not deleted and not delete_errored (if not
deleted and not delete_errored: logger.warning(...)). This uses the existing
symbols self._parked_context_repo.delete, deleted, logger.exception,
logger.warning, and APPROVAL_GATE_RESUME_DELETE_FAILED.
| logger.debug( | ||
| APPROVAL_GATE_CONTEXT_PARKED, | ||
| approval_id=escalation.approval_id, | ||
| agent_id=agent_id, | ||
| note="No task_execution on context — task_id will be None", | ||
| ) |
There was a problem hiding this comment.
Misleading event constant for debug log.
APPROVAL_GATE_CONTEXT_PARKED is logged before parking actually occurs. This log is meant to note a missing task_id, not a successful park. Using a "parked" event here is semantically incorrect and could confuse observability tooling.
Consider using a DEBUG-level log without the parked event constant, or introduce a separate event for this diagnostic note.
Suggested fix
else:
- logger.debug(
- APPROVAL_GATE_CONTEXT_PARKED,
+ logger.debug(
+ "approval_gate_context_park_no_task",
approval_id=escalation.approval_id,
agent_id=agent_id,
note="No task_execution on context — task_id will be None",
)Alternatively, define a new event constant like APPROVAL_GATE_CONTEXT_PARK_NO_TASK in events.approval_gate if this diagnostic is important for observability.
Based on learnings: "Event names must always use constants from domain-specific modules under ai_company.observability.events" — the current usage misapplies the PARKED event for a pre-park diagnostic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/loop_helpers.py` around lines 373 - 378, The debug log
currently emits the APPROVAL_GATE_CONTEXT_PARKED event while only noting a
missing task_id; change this to a semantically correct diagnostic: either remove
the parked event constant and call logger.debug with a plain message about the
missing task_execution/task_id, or add a new event constant (e.g.,
APPROVAL_GATE_CONTEXT_PARK_NO_TASK) to
ai_company.observability.events.approval_gate and use that here instead of
APPROVAL_GATE_CONTEXT_PARKED; update the logger.debug invocation in
loop_helpers.py (the call that passes approval_id, agent_id and note="No
task_execution on context — task_id will be None") to use the chosen option so
observability events remain accurate.
| risk_level=ApprovalRiskLevel( | ||
| result.metadata.get("risk_level", "high"), |
There was a problem hiding this comment.
Invalid risk_level metadata causes unhandled exception.
If result.metadata["risk_level"] contains an invalid value (not in ApprovalRiskLevel enum), ApprovalRiskLevel(...) raises ValueError. While the outer except Exception catches this, the error message won't clearly indicate a metadata validation issue.
Consider validating explicitly or using a safer fallback:
Safer risk_level handling
+ raw_level = result.metadata.get("risk_level", "high")
+ try:
+ risk = ApprovalRiskLevel(raw_level)
+ except ValueError:
+ risk = ApprovalRiskLevel.HIGH
+
self._pending_escalations.append(
_EscalationInfo(
approval_id=str(result.metadata["approval_id"]),
tool_call_id=tool_call.id,
tool_name=tool.name,
action_type=tool.action_type,
- risk_level=ApprovalRiskLevel(
- result.metadata.get("risk_level", "high"),
- ),
+ risk_level=risk,
reason="Agent requested human approval",
),
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/invoker.py` around lines 633 - 634, The construction of
ApprovalRiskLevel from result.metadata can raise ValueError for invalid strings;
update the code that builds the object (the ApprovalRiskLevel(...) call using
result.metadata.get("risk_level", "high")) to validate or convert the metadata
value first: read risk_str = result.metadata.get("risk_level"), check if
risk_str is a valid member of ApprovalRiskLevel (or wrap
ApprovalRiskLevel(risk_str) in a try/except ValueError), and on failure log a
clear metadata validation warning and fall back to a safe default (e.g., "high")
before passing into ApprovalRiskLevel; ensure the change is applied where
ApprovalRiskLevel is instantiated so invalid metadata never raises an uncaught
ValueError.
| const originalIsAxiosError = (await import('axios')).default.isAxiosError | ||
| vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => { | ||
| if (err === axiosError) return true | ||
| return originalIsAxiosError(err) | ||
| }) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify spy restoration hygiene in this test file (read-only).
rg -n "vi\\.spyOn\\(" web/src/__tests__/stores/approvals.test.ts -C2
rg -n "vi\\.(clearAllMocks|restoreAllMocks)" web/src/__tests__/stores/approvals.test.ts -C2Repository: Aureliolo/synthorg
Length of output: 1006
🏁 Script executed:
cat -n web/src/__tests__/stores/approvals.test.ts | head -100Repository: Aureliolo/synthorg
Length of output: 3921
🏁 Script executed:
cat -n web/src/__tests__/stores/approvals.test.ts | tail -100Repository: Aureliolo/synthorg
Length of output: 4523
🏁 Script executed:
# Check for afterEach in the entire test file
rg -n "afterEach|afterAll" web/src/__tests__/stores/approvals.test.ts -C2Repository: Aureliolo/synthorg
Length of output: 44
Add afterEach hook with vi.restoreAllMocks() and fix axios.isAxiosError spy at lines 316–319.
Line 42 calls vi.clearAllMocks(), which only clears call history—not spy implementations. The axios.isAxiosError spies can leak into later tests. Additionally, the spy at lines 316–319 returns false for unmatched errors instead of delegating to the original implementation, creating inconsistency with lines 286–290.
Two fixes required:
- Add
afterEachblock to restore all spies globally - Update lines 316–319 to preserve the original
isAxiosErrorimplementation
Suggested fixes
-import { describe, it, expect, beforeEach, vi } from 'vitest'
+import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest'
...
beforeEach(() => {
setActivePinia(createPinia())
vi.clearAllMocks()
})
+
+ afterEach(() => {
+ vi.restoreAllMocks()
+ })At lines 316–319, update to match the pattern at lines 286–290:
+ const originalIsAxiosError = (await import('axios')).default.isAxiosError
vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => {
if (err === axiosError) return true
- return false
+ return originalIsAxiosError(err)
})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/src/__tests__/stores/approvals.test.ts` around lines 286 - 290, Add an
afterEach hook that calls vi.restoreAllMocks() to fully restore spies between
tests (in addition to the existing vi.clearAllMocks()); also change the
axios.isAxiosError spy at the second location to capture the original
implementation (like the first instance) and delegate to it for non-matching
errors by storing (await import('axios')).default.isAxiosError in
originalIsAxiosError and calling originalIsAxiosError(err) when err !==
axiosError so the spy preserves the original behavior.
| case 'approval.approved': | ||
| case 'approval.rejected': | ||
| case 'approval.expired': | ||
| try { | ||
| const updated = await approvalsApi.getApproval(approvalId) | ||
| approvals.value = approvals.value.map((a) => | ||
| a.id === approvalId ? updated : a, | ||
| ) | ||
| } catch (err) { | ||
| if (axios.isAxiosError(err) && (err.response?.status === 404 || err.response?.status === 410)) { | ||
| // Item genuinely gone — remove from local list | ||
| const lengthBefore = approvals.value.length | ||
| approvals.value = approvals.value.filter((a) => a.id !== approvalId) | ||
| const removed = lengthBefore - approvals.value.length | ||
| if (removed > 0) { | ||
| total.value = Math.max(0, total.value - removed) | ||
| } | ||
| } else { | ||
| console.warn('Failed to fetch approval:', approvalId, err) | ||
| } | ||
| } | ||
| break |
There was a problem hiding this comment.
Status-change handler updates items regardless of current filters.
When an approval is approved/rejected/expired, the code updates the item in-place (line 101-103) even if the updated status no longer matches activeFilters. This can leave non-matching items visible until the next manual refresh.
Consider filtering out items that no longer match after status changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/src/stores/approvals.ts` around lines 96 - 117, The handler for
'approval.approved'/'approval.rejected'/'approval.expired' currently replaces
the item in approvals.value regardless of current activeFilters; instead, after
fetching the updated approval via approvalsApi.getApproval(approvalId) check
whether the updated item matches the current activeFilters (use whatever
predicate/function you use elsewhere to test filter membership) and if it does
replace the item in approvals.value, otherwise remove it from approvals.value
and decrement total.value accordingly; keep the existing 404/410 removal branch
as-is and perform this filter-check/replace-or-remove logic in the same switch
case that currently updates approvals.value.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tests/unit/engine/test_approval_gate.py (1)
294-330: 🧹 Nitpick | 🔵 TrivialParameterize message-variant assertions to reduce duplication.
These three tests are structurally identical and are a good fit for a single parameterized test.
As per coding guidelines, "Prefer
@pytest.mark.parametrizefor testing similar cases."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/test_approval_gate.py` around lines 294 - 330, Replace the three nearly identical tests in TestBuildResumeMessage with a single `@pytest.mark.parametrize` test that iterates over cases for ApprovalGate.build_resume_message; for each tuple include (approved, decided_by, decision_reason, expected_substrings) and assert all expected substrings are in the returned msg (e.g., "APPROVED"/"REJECTED", approval ID, actor, optional reason, and markers like "USER-SUPPLIED REASON" or "untrusted data" when applicable). Keep the call to ApprovalGate.build_resume_message("approval-1", ...) and name the param set to make failures readable; remove the three individual test_* methods after adding the parameterized test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/unit/engine/test_approval_gate.py`:
- Around line 294-330: Replace the three nearly identical tests in
TestBuildResumeMessage with a single `@pytest.mark.parametrize` test that iterates
over cases for ApprovalGate.build_resume_message; for each tuple include
(approved, decided_by, decision_reason, expected_substrings) and assert all
expected substrings are in the returned msg (e.g., "APPROVED"/"REJECTED",
approval ID, actor, optional reason, and markers like "USER-SUPPLIED REASON" or
"untrusted data" when applicable). Keep the call to
ApprovalGate.build_resume_message("approval-1", ...) and name the param set to
make failures readable; remove the three individual test_* methods after adding
the parameterized test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e5d8def0-b19b-4e4e-b300-716b9404a82c
📒 Files selected for processing (1)
tests/unit/engine/test_approval_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python version 3.14+ with PEP 649 native lazy annotations required.
Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax:except A, B:(no parentheses) — ruff enforces this on Python 3.14.
Line length must be 88 characters, enforced by ruff.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001. Tests must usetest-provider,test-small-001, etc.
Files:
tests/unit/engine/test_approval_gate.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Tests must use markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow
Test coverage minimum: 80% (enforced in CI).
Async tests: useasyncio_mode = "auto"— no manual@pytest.mark.asyncioneeded.
Test timeout: 30 seconds per test.
Prefer@pytest.mark.parametrizefor testing similar cases.
Files:
tests/unit/engine/test_approval_gate.py
🧠 Learnings (1)
📚 Learning: 2026-03-13T21:03:58.906Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-13T21:03:58.906Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.
Applied to files:
tests/unit/engine/test_approval_gate.py
🧬 Code graph analysis (1)
tests/unit/engine/test_approval_gate.py (2)
src/ai_company/engine/approval_gate.py (5)
ApprovalGate(40-274)should_park(71-90)park_context(92-160)resume_context(162-239)build_resume_message(242-274)src/ai_company/security/timeout/park_service.py (3)
ParkService(29-144)park(37-106)resume(108-144)
🔇 Additional comments (5)
tests/unit/engine/test_approval_gate.py (5)
1-13: LGTM!Module setup correctly applies unit test markers and timeout. No
from __future__ import annotationsas required for Python 3.14+.
16-31: LGTM!Clean test factory with sensible defaults. The
noqa: PLR0913is appropriate for test helper functions.
34-55: LGTM!Well-designed fixtures that compose correctly. The
parked_mockfixture's dependency onpark_serviceensures consistent mock state across tests.
58-70: LGTM!Tests correctly verify the first-or-none semantics of
should_park. Using a realParkService()is acceptable here since the method doesn't interact with internal service state.
73-291: LGTM!Comprehensive test coverage for
park_contextandresume_contextflows. Good edge case handling:
- Serialization/persistence error propagation (lines 144-182)
- Parked record preservation on deserialization failure (lines 250-268)
- Context returned despite delete failure (lines 270-291)
There was a problem hiding this comment.
Pull request overview
Adds an approval-gating workflow to the execution engine so SecOps ESCALATE verdicts and the new request_human_approval tool can “park” an execution pending a human decision, with supporting API/WebSocket and dashboard updates.
Changes:
- Introduces
ApprovalGate,EscalationInfo/ResumePayload, and loop/helper integration to terminate runs asPARKEDwhen approval is required. - Adds
RequestHumanApprovalTooland extendsToolInvoker+ToolRegistryto track and surface pending escalations. - Hardens approvals API (auth-bound
requested_by, conflict-safe saves) and fixes frontend WS handling to useapproval_id+ re-fetch items.
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| web/src/stores/approvals.ts | Updates WS approval event handling to use approval_id and re-fetch full items. |
| web/src/tests/stores/approvals.test.ts | Expands WS event test coverage for new fetch-based behavior and error paths. |
| tests/unit/tools/test_invoker_escalation.py | Adds unit tests for ToolInvoker.pending_escalations (SecOps + tool metadata). |
| tests/unit/tools/test_approval_tool.py | Adds unit tests for RequestHumanApprovalTool behavior, validation, and failure handling. |
| tests/unit/observability/test_events.py | Updates event discovery/markers to include new approval_gate domain module. |
| tests/unit/engine/test_loop_helpers_approval.py | Tests loop helper parking behavior and error handling when escalations occur. |
| tests/unit/engine/test_approval_gate_models.py | Tests Pydantic models for escalation + resume payloads. |
| tests/unit/engine/test_approval_gate.py | Tests ApprovalGate park/resume behavior and resume message construction. |
| tests/unit/api/test_dto.py | Updates DTO tests for new action_type format validation and removed fields. |
| tests/unit/api/controllers/test_approvals.py | Updates controller tests for new create payload semantics. |
| src/ai_company/tools/registry.py | Adds all_tools() to enumerate tool instances deterministically. |
| src/ai_company/tools/invoker.py | Tracks escalation info from SecOps ESCALATE verdicts and tool parking metadata. |
| src/ai_company/tools/approval_tool.py | Implements RequestHumanApprovalTool that creates ApprovalItem and signals parking. |
| src/ai_company/tools/init.py | Exports RequestHumanApprovalTool. |
| src/ai_company/security/timeout/parked_context.py | Allows task_id to be None for taskless agents. |
| src/ai_company/security/timeout/park_service.py | Makes ParkService.park() accept optional task_id. |
| src/ai_company/observability/events/approval_gate.py | Adds structured event constants for approval gate lifecycle. |
| src/ai_company/engine/react_loop.py | Wires optional approval gate into ReAct loop tool-call execution path. |
| src/ai_company/engine/plan_execute_loop.py | Wires optional approval gate into Plan/Execute tool-call execution path. |
| src/ai_company/engine/loop_helpers.py | Adds escalation check after tool execution and parks context via approval gate. |
| src/ai_company/engine/approval_gate_models.py | Adds frozen models for escalation tracking and resume decision payloads. |
| src/ai_company/engine/approval_gate.py | Implements park/resume coordination and safe resume-message formatting. |
| src/ai_company/engine/agent_engine.py | Integrates approval gate + tool registry augmentation; extracts security factory. |
| src/ai_company/engine/_security_factory.py | Centralizes SecOps interceptor creation and registry augmentation with approval tool. |
| src/ai_company/engine/init.py | Exposes approval gate types from engine package. |
| src/ai_company/api/ws_models.py | Adds approval.resumed WS event type. |
| src/ai_company/api/state.py | Stores optional ApprovalGate on AppState. |
| src/ai_company/api/dto.py | Tightens DTO validation (action_type format, TTL bounds, comment max length). |
| src/ai_company/api/controllers/approvals.py | Binds requested_by to auth user; uses conflict-safe saves; attempts resume trigger. |
| src/ai_company/api/approval_store.py | Adds save_if_pending() and hardens on_expire callback error handling. |
| docs/design/engine.md | Documents parking behavior on escalations in engine run flow. |
| README.md | Updates status to reflect approval gate implementation. |
| CLAUDE.md | Updates repo module descriptions and event-constant guidance to include approval gate. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => { | ||
| if (err === axiosError) return true | ||
| return false | ||
| }) |
| try: | ||
| result = await approval_gate.resume_context(approval_id) | ||
| if result is None: | ||
| return | ||
|
|
||
| _ctx, parked_id = result | ||
| resume_message = ApprovalGate.build_resume_message( | ||
| approval_id, | ||
| approved=approved, | ||
| decided_by=decided_by, | ||
| decision_reason=decision_reason, | ||
| ) | ||
| logger.info( | ||
| APPROVAL_GATE_RESUME_TRIGGERED, | ||
| approval_id=approval_id, | ||
| parked_id=parked_id, | ||
| approved=approved, | ||
| decided_by=decided_by, | ||
| resume_message_length=len(resume_message), | ||
| ) |
| """Best-effort resume of a parked agent context after a decision. | ||
|
|
||
| If an ``ApprovalGate`` is configured, loads the parked context, | ||
| builds a resume message, and publishes a WebSocket event. | ||
| Failures are logged at WARNING and never propagate to the caller. | ||
|
|
| task_id: NotBlankStr | None = Field( | ||
| default=None, description="Task identifier (None for taskless agents)" | ||
| ) |
| const item = await approvalsApi.getApproval(approvalId) | ||
| if (activeFilters.value) { | ||
| // Filters are active — item may not match; just bump total | ||
| total.value++ | ||
| } else { | ||
| approvals.value = [item, ...approvals.value] | ||
| total.value++ | ||
| } |
| // Patch axios.isAxiosError to recognize our mock | ||
| const originalIsAxiosError = (await import('axios')).default.isAxiosError | ||
| vi.spyOn((await import('axios')).default, 'isAxiosError').mockImplementation((err) => { | ||
| if (err === axiosError) return true | ||
| return originalIsAxiosError(err) | ||
| }) |
- _trigger_resume no longer calls resume_context() which would delete the parked record before a scheduler can consume it — now only logs the resume trigger for scheduler observation - Remove dead APPROVAL_RESUMED WsEventType (never emitted) - Remove unused ApprovalGate import from controller
| except MemoryError, RecursionError: | ||
| raise |
There was a problem hiding this comment.
Python 3 except tuple syntax missing parentheses — SyntaxError across all new files
except MemoryError, RecursionError: is Python 2 syntax. In Python 3 the comma-separated form (without parentheses) is a SyntaxError; the tuple must be explicitly wrapped. Any module that contains this form will fail to import entirely.
The same bug appears throughout all the new files added in this PR:
src/ai_company/api/approval_store.py:172src/ai_company/api/controllers/approvals.py:100src/ai_company/engine/approval_gate.py:127, 149, 202, 216src/ai_company/engine/loop_helpers.py:69, 114, 186, 303, 387src/ai_company/tools/approval_tool.py:162, 238src/ai_company/tools/invoker.py:212, 295, 639
Note: the existing (pre-PR) code in invoker.py already uses the correct form except (MemoryError, RecursionError) as exc: at lines 453, 539, 575, and 698 — confirming this is an inconsistency introduced only in the new additions.
Fix every occurrence by wrapping the two types in parentheses:
| except MemoryError, RecursionError: | |
| raise | |
| except (MemoryError, RecursionError): | |
| raise |
The pattern to apply globally is:
# Wrong (Python 2)
except MemoryError, RecursionError:
# Correct (Python 3)
except (MemoryError, RecursionError):Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/api/approval_store.py
Line: 172-173
Comment:
**Python 3 `except` tuple syntax missing parentheses — SyntaxError across all new files**
`except MemoryError, RecursionError:` is Python 2 syntax. In Python 3 the comma-separated form (without parentheses) is a `SyntaxError`; the tuple must be explicitly wrapped. Any module that contains this form will fail to import entirely.
The same bug appears throughout all the new files added in this PR:
- `src/ai_company/api/approval_store.py:172`
- `src/ai_company/api/controllers/approvals.py:100`
- `src/ai_company/engine/approval_gate.py:127, 149, 202, 216`
- `src/ai_company/engine/loop_helpers.py:69, 114, 186, 303, 387`
- `src/ai_company/tools/approval_tool.py:162, 238`
- `src/ai_company/tools/invoker.py:212, 295, 639`
Note: the existing (pre-PR) code in `invoker.py` already uses the correct form `except (MemoryError, RecursionError) as exc:` at lines 453, 539, 575, and 698 — confirming this is an inconsistency introduced only in the new additions.
Fix every occurrence by wrapping the two types in parentheses:
```suggestion
except (MemoryError, RecursionError):
raise
```
The pattern to apply globally is:
```python
# Wrong (Python 2)
except MemoryError, RecursionError:
# Correct (Python 3)
except (MemoryError, RecursionError):
```
How can I resolve this? If you propose a fix, please make it concise.| else: | ||
| logger.debug( | ||
| APPROVAL_GATE_CONTEXT_PARKED, | ||
| approval_id=escalation.approval_id, | ||
| agent_id=agent_id, | ||
| note="No task_execution on context — task_id will be None", | ||
| ) |
There was a problem hiding this comment.
APPROVAL_GATE_CONTEXT_PARKED emitted before parking has occurred
This debug log fires when ctx.task_execution is None — i.e. before park_context() is even called. APPROVAL_GATE_CONTEXT_PARKED implies the context has been parked, so any log aggregation or alerting rule watching for approval_gate.context.parked will see a spurious event even if the subsequent park_context() call fails.
An informational constant such as APPROVAL_GATE_ESCALATION_DETECTED (or a dedicated "parking skipped task_id" diagnostic) is more accurate here. The successful-park event is already emitted inside ApprovalGate.park_context(), so this line also produces a duplicate on the happy path.
| else: | |
| logger.debug( | |
| APPROVAL_GATE_CONTEXT_PARKED, | |
| approval_id=escalation.approval_id, | |
| agent_id=agent_id, | |
| note="No task_execution on context — task_id will be None", | |
| ) | |
| logger.debug( | |
| APPROVAL_GATE_ESCALATION_DETECTED, | |
| approval_id=escalation.approval_id, | |
| agent_id=agent_id, | |
| note="No task_execution on context — task_id will be None", | |
| ) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/loop_helpers.py
Line: 372-378
Comment:
**`APPROVAL_GATE_CONTEXT_PARKED` emitted before parking has occurred**
This debug log fires when `ctx.task_execution` is `None` — i.e. before `park_context()` is even called. `APPROVAL_GATE_CONTEXT_PARKED` implies the context *has* been parked, so any log aggregation or alerting rule watching for `approval_gate.context.parked` will see a spurious event even if the subsequent `park_context()` call fails.
An informational constant such as `APPROVAL_GATE_ESCALATION_DETECTED` (or a dedicated "parking skipped task_id" diagnostic) is more accurate here. The successful-park event is already emitted inside `ApprovalGate.park_context()`, so this line also produces a duplicate on the happy path.
```suggestion
logger.debug(
APPROVAL_GATE_ESCALATION_DETECTED,
approval_id=escalation.approval_id,
agent_id=agent_id,
note="No task_execution on context — task_id will be None",
)
```
How can I resolve this? If you propose a fix, please make it concise.| total.value++ | ||
| } else { | ||
| approvals.value = [item, ...approvals.value] | ||
| total.value++ | ||
| } | ||
| } catch (err) { | ||
| if (axios.isAxiosError(err) && (err.response?.status === 404 || err.response?.status === 410)) { | ||
| // Item genuinely gone — skip | ||
| } else { | ||
| console.warn('Failed to fetch approval:', approvalId, err) |
There was a problem hiding this comment.
total incremented unconditionally when filters are active, regardless of whether the new item matches them
When activeFilters.value is truthy, total.value++ is incremented for every approval.submitted event even though the newly submitted item may not satisfy the current filter (e.g. the user is filtering by status: 'rejected' while a new pending item arrives). This inflates the displayed count and can cause pagination to request pages that don't exist.
A stricter approach would be to skip the increment when the item's status/risk_level/action_type don't match the active filter criteria, or to do a quick attribute check on the already-fetched item object:
if (activeFilters.value) {
// Only bump total if the item matches the active filter constraints
const f = activeFilters.value
const matches =
(!f.status || item.status === f.status) &&
(!f.risk_level || item.risk_level === f.risk_level) &&
(!f.action_type || item.action_type === f.action_type)
if (matches) total.value++
} else {
approvals.value = [item, ...approvals.value]
total.value++
}Prompt To Fix With AI
This is a comment left during a code review.
Path: web/src/stores/approvals.ts
Line: 82-91
Comment:
**`total` incremented unconditionally when filters are active, regardless of whether the new item matches them**
When `activeFilters.value` is truthy, `total.value++` is incremented for every `approval.submitted` event even though the newly submitted item may not satisfy the current filter (e.g. the user is filtering by `status: 'rejected'` while a new `pending` item arrives). This inflates the displayed count and can cause pagination to request pages that don't exist.
A stricter approach would be to skip the increment when the item's status/risk_level/action_type don't match the active filter criteria, or to do a quick attribute check on the already-fetched `item` object:
```typescript
if (activeFilters.value) {
// Only bump total if the item matches the active filter constraints
const f = activeFilters.value
const matches =
(!f.status || item.status === f.status) &&
(!f.risk_level || item.risk_level === f.risk_level) &&
(!f.action_type || item.action_type === f.action_type)
if (matches) total.value++
} else {
approvals.value = [item, ...approvals.value]
total.value++
}
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
request_human_approvaltool calls into the engine execution loop, parking agent context when human approval is requiredRequestHumanApprovalTool: Agent-callable tool that createsApprovalItemin the approval store and signals execution parking via metadataApprovalGateservice: Coordinates context serialization (viaParkService), persistence (viaParkedContextRepository), and resume with decision message injectionReactLoopandPlanExecuteLoopcheck for escalations after tool execution and returnPARKEDtermination reasonToolInvokerescalation tracking: Detects ESCALATE verdicts and parking metadata, exposespending_escalationswith propertuple[EscalationInfo, ...]typingrequested_bybound to auth user (not request body), prompt injection mitigation in resume messages, broadened WebSocket error handlingapproval_idkey from backend payload and fetches full items via APICloses #258, closes #259
Review coverage
Pre-reviewed by 15 specialized agents (docs-consistency, code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, api-contract-drift, test-quality-reviewer, async-concurrency-reviewer, issue-resolution-verifier).
43 findings identified and addressed across Critical (14), Major (23), and Medium (6) severities. Key fixes:
tuple[Any, ...]→tuple[EscalationInfo, ...]Test plan
uv run ruff check src/ tests/— cleanuv run mypy src/ tests/— clean (926 files, strict)uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80— 7540 passed, 94.60% coverage