chore(deps): bump defu from 6.1.4 to 6.1.6 in /docs#6
Merged
Conversation
Bumps [defu](https://github.com/unjs/defu) from 6.1.4 to 6.1.6. - [Release notes](https://github.com/unjs/defu/releases) - [Changelog](https://github.com/unjs/defu/blob/main/CHANGELOG.md) - [Commits](unjs/defu@v6.1.4...v6.1.6) --- updated-dependencies: - dependency-name: defu dependency-version: 6.1.6 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
krokoko
approved these changes
Apr 6, 2026
13 tasks
scoropeza
pushed a commit
to scoropeza/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 5, 2026
…aker (krokoko review aws-samples#6, aws-samples#8) Two related hardening changes on ``agent/src/progress_writer.py``. Grouped because the shared circuit breaker reuses the error- classification decision to know when NOT to flip itself, and separating the commits would force an awkward "half-fix" intermediate state. ## Findings addressed **aws-samples#6 — Circuit breaker trips on ValidationException (permanent errors)** The pre-fix ``except Exception`` branch fed ALL errors into the same ``_failure_count`` counter. A persistent schema/size error (e.g. ``ValidationException`` from an item >400 KB under a trace-heavy event) counted against the transient-failure budget and tripped the breaker within 3 events, silencing the entire progress stream for the rest of the task — even though most subsequent events were normal size and would have written fine. New behaviour classifies each DDB error into three buckets: - **Permanent (drop event, keep stream alive):** ``ValidationException``, ``ItemCollectionSizeLimitExceededException`` — the individual event is malformed or oversized, retrying or treating as transient would not help. Log at WARN, skip the event, do NOT increment the failure counter. - **Immediate-disable (fatal, don't even try to retry):** ``ResourceNotFoundException``, ``AccessDeniedException``, ``UnauthorizedOperation`` — wrong deploy or IAM misconfig. Disable the breaker on the first occurrence instead of waiting for 3 failures; log at WARN with the error code. Avoids spamming operator dashboards with 3 copies of the same permissions error. - **Transient (trip the breaker on repeated failures, as today):** ``ProvisionedThroughputExceededException``, ``RequestLimitExceeded``, ``ServiceUnavailable``, ``InternalServerError``, plus network-layer (``ConnectionError``, ``EndpointConnectionError``, ``ReadTimeoutError``). Same counter semantics as pre-fix. - **Unknown (default conservative):** counted as transient (counter increments) but logged at ERROR with an explicit ``UNKNOWN`` marker so operators notice and can add new codes to the permanent/transient lists. Does NOT instant-disable — over- correcting from pre-fix behaviour would swap one failure mode for another. Uses ``botocore.exceptions.ClientError`` + ``err.response["Error"]["Code"]`` for AWS errors; class-name matching for non-ClientError (network- layer) paths. Helper: ``_classify_ddb_error(exc) -> Literal["permanent", "immediate_disable", "transient", "unknown"]``. **aws-samples#8 — Dual _ProgressWriter instances with independent circuit breakers** Pre-fix, the runner-level writer (turn/tool events at ``runner.py:240``) and the pipeline-level writer (milestones at ``pipeline.py:303``) each held their own ``_failure_count`` / ``_disabled`` state. If throttling tripped one writer, the other kept writing — creating visible event gaps in the stream that operators could not distinguish from agent activity (milestones firing after turn events stop, or vice versa). Fix: consolidate circuit-breaker state into a module-level ``_SharedCircuitBreaker`` singleton keyed by ``task_id``. Both writers for the same task read/write the same ``(_failure_count, _disabled)`` pair through named methods (``is_disabled``, ``record_failure``, ``record_success``, ``disable``). One task's stream is either healthy (all events flow from both writers) or degraded (no events flow from either). Cannot have a half-alive stream. Semantics notes: - ``record_success`` resets the shared counter but NOT the ``_disabled`` flag. Re-enabling mid-task would let a flaky minute burn the failure budget repeatedly and defeat the breaker's purpose. - Empty-string ``task_id`` (``runner.py`` falls back to sentinel ``"unknown"``) collapses to shared state for all ``"unknown"`` writers. Real task_ids stay isolated. - Writers retain ``_disabled`` / ``_failure_count`` as properties that proxy to the shared map. Existing callers (``hooks.py`` does ``getattr(progress, "_disabled", False)``) and tests that assign ``writer._failure_count = 0`` keep working unchanged — no constructor signature change required, no ``runner.py`` / ``pipeline.py`` edits. - Single ``threading.Lock()`` protects the shared map; DDB write rate (single-digit events/sec) never contends meaningfully. - Test hygiene: ``_reset_circuit_breakers()`` helper rebinds the module global so autouse fixtures give each test a clean slate. ## Tests +24 regression tests net (36 → 60 in ``test_progress_writer.py``). Coverage: - Finding aws-samples#6 classification: ``test_permanent_error_does_not_trip_breaker`` (10 consecutive ``ValidationException`` writes keep ``_disabled=False``), ``test_transient_error_trips_breaker_as_before``, ``test_access_denied_disables_writer_immediately_with_loud_log``, ``test_unknown_exception_treated_as_transient_with_error_log``. - Finding aws-samples#8 sharing: ``test_shared_circuit_breaker_across_writers_same_task_id`` (writer-A trips the breaker; writer-B sees ``is_disabled`` and skips the DDB call), ``test_separate_tasks_have_independent_breakers``, ``test_unknown_sentinel_task_id_is_isolated``, ``test_reset_helper_clears_shared_state_between_tests``. - Plus edge cases: success-interleave resets the counter across writers; ``_disabled`` stays open after re-enabling a success mid-task; thread-safety via concurrent writes. Agent suite: 497 passing (was 473; +24). Refs: krokoko code review on PR aws-samples#52 (findings 6, 8)
scoropeza
pushed a commit
to scoropeza/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 7, 2026
Adds the 14 agent-side approval milestone writers (§11.1) on
``_ProgressWriter`` so Chunk 3's hook integration has a typed API
instead of stringly-typed ``write_agent_milestone`` calls, and the
per-task gate counter / per-container sliding-window rate limit /
denial-injection queue on ``PolicyEngine`` that §6.5 requires.
Why now: the hook work lands cleanly only after these surfaces exist
— every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch
calls one of these helpers. Shipping them separately lets the hook
commit be about the state machine, not the event-shape bookkeeping.
Engine additions:
- ``approval_gate_count`` / ``increment_approval_gate_count``: the
per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped
in v1; persistence tracked in §17.
- ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``:
sliding-window rate limit (20/min/container, §12.9). Prune on read
so callers see the current count without a separate tick.
- ``queue_denial_injection`` / ``drain_denial_injections``: queue
consumed by ``_denial_between_turns_hook`` at the next Stop seam
(§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``.
- ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26.
- ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts
the hook imports rather than re-deriving.
Milestone writers (§11.1 table, 14 agent-emitted of 15):
- ``pre_approvals_loaded``, ``approval_requested``,
``approval_granted``, ``approval_denied``, ``approval_timed_out``,
``approval_stranded``, ``approval_write_failed``,
``approval_resume_failed``, ``approval_poll_degraded``,
``approval_timeout_capped``, ``approval_ceiling_shrinking``,
``approval_cap_exceeded``, ``approval_rate_limit_exceeded``,
``approval_late_win``.
- ``approval_decision_recorded`` (Lambda audit) and
``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the
Lambda side — Chunk 5 owns those.
Each helper is a thin wrapper over ``_put_event("agent_milestone",
...)`` so the shared circuit-breaker + classifier path (finding aws-samples#6/aws-samples#8)
continues to apply. Metadata keys mirror the §11.1 shapes verbatim
(``maxLifetime_remaining_s`` preserves the design-doc spelling for
downstream parsers).
Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB
payload shape for each helper (including the two
``approval_timeout_capped`` reason variants — rule_annotation carries
matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the
engine: counter monotonicity, rate-window prune semantics at window
boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch
idempotency.
No caller changes — engine and writer surfaces are additive. Hook
integration lands in commit C.
scoropeza
pushed a commit
to scoropeza/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 7, 2026
Lands the two user-facing REST handlers that flip a PENDING approval
row to APPROVED / DENIED, the shared types both call sites and the
CLI consume, and the Lambda-side Cedar parser future Chunk-5 handlers
(get-policies, create-task validation) will use.
Wire types (shared/types.ts)
----------------------------
- ApprovalScope union covering every shape the agent's
ApprovalAllowlist understands. Typed so approve-task / create-task /
CLI (Chunk 6) all agree at compile time.
- ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest /
DenyResponse / PendingApprovalSummary / GetPendingResponse /
PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest /
LinkSlackUserResponse / SlackUserMappingRecord /
ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions.
- Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20,
INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30,
APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300.
New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED,
TASK_NOT_AWAITING_APPROVAL.
Shared helpers
--------------
- shared/approval-scope.ts — parseApprovalScope validates every shape;
rejects unknown tool types / groups / prefixes, empty values,
over-128-char strings. isDegeneratePattern implements §7.4 (length
≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task.
- shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys,
GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks,
bearer tokens with [REDACTED-...] markers. Mirrors
agent/src/output_scanner.py so the deny reason the agent
ultimately reads is never raw user input.
- shared/cedar-policy.ts — parseRules pulls the five HITL annotations
(tier/rule_id/severity/approval_timeout_s/category) into a
ParsedRule[], preserving positional policy_id for IMPL-29
diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId,
matchingRuleIds, concatPolicies exposed for future handlers.
Handlers
--------
- approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve
- Cross-table TransactWriteItems: approval row PENDING → APPROVED
guarded by user_id = :caller AND status = :pending; TaskTable
no-op Update guarded by status = AWAITING_APPROVAL AND
awaiting_approval_request_id = :rid.
- TransactionCanceledException classified by per-item
CancellationReasons. Approval-row failure collapses to 404
REQUEST_NOT_FOUND (no existence oracle per §7.1 finding aws-samples#6);
task-row failure → 409 TASK_NOT_AWAITING_APPROVAL.
- Optional scope defaults to this_call.
- Per-user per-minute rate limit (30/min, synthetic row).
- Writes approval_decision_recorded audit event (IMPL-6). Audit
failure is logged but does not fail the request — decision is
already committed.
- deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny
- Same cross-table pattern; status → DENIED + deny_reason.
- Reason is scanDenyReason-sanitized + truncated to
DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit
both read sanitized form; raw input never stored.
- Same rate-limit namespace as approve.
Tests: +64 total (cedar-policy-parser 24, approval-scope 28,
deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test
fixtures are assembled from string fragments so the source never
holds a contiguous secret literal — Code Defender pre-commit hook
otherwise blocks.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout +
LinkSlackUserFn + GetPolicies + GetPending) lands in the next
commit.
isadeks
pushed a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 13, 2026
…umulation (aws-samples#79 review aws-samples#6) The conditional UpdateItem dup-delete path (``task_created`` / ``session_started`` lifecycle persists) calls ``deleteMessage`` to clean up the duplicate Slack message that landed when a sibling retry won the race. The delete is inherently best-effort — but if it fails, the duplicate becomes a permanent ghost in the thread and operators had no way to alarm on the rate. Refactor ``deleteMessage`` to return a boolean (``true`` on success or ``message_not_found``-as-already-gone, ``false`` otherwise) and emit a dedicated ``fanout.slack.dup_delete_failed`` event with an ``error_id: FANOUT_SLACK_DUP_DELETE_FAILED`` from the dup-delete callsites when the cleanup couldn't complete. The terminal-event cleanup paths (``slack_session_msg_ts``, ``slack_created_msg_ts``) intentionally don't fire this event — those paths target genuinely-stale UX cleanup, not retry-driven duplicates, so an alarm there would be noise. No new tests beyond the existing dup-delete coverage; the ``deleteMessage`` return value isn't yet asserted at the unit level, but the behavior is fully exercised by the existing ``dup-delete`` integration paths (test gap aws-samples#31 will add an explicit failure-path assertion when it lands).
isadeks
pushed a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 13, 2026
…samples#79 test gap) Adds 4 tests covering the lifecycle-persist conditional path that review fix #1 introduced and review fix aws-samples#6 hardened. Pre-PR-aws-samples#79 the only ConditionalCheckFailed coverage was the terminal-dedup path; the new lifecycle-persist + dup-delete code lacked direct assertions and was flagged 9/10 criticality by the reviewer. - task_created persist ConditionalCheckFailed → posts duplicate then deletes it: pins the cleanup behaviour that prevents ghost task_created posts in the channel - session_started persist ConditionalCheckFailed → posts duplicate then deletes it: parallel coverage for the other lifecycle attribute (slack_session_msg_ts) - dup-delete failure emits fanout.slack.dup_delete_failed with error_id: pins the operator-alarm signal added in review fix aws-samples#6; asserts both the event key and the FANOUT_SLACK_DUP_DELETE_FAILED error_id propagate - chat.delete returning message_not_found is treated as success (no dup_delete_failed): negative-class assertion. Prevents false-positive alarms when the race resolves cleanly (the duplicate was already deleted by a prior retry). The ghost / message_not_found tests use ``fetchMock.mockImplementation`` URL-routing rather than ``.mockResolvedValueOnce`` chains because ``updateReaction`` issues 2-3 reaction-API fetches between chat.postMessage and chat.delete; routing by URL keeps the test focused on the load-bearing chat.delete behaviour without coupling to reaction call order.
8 tasks
github-merge-queue Bot
pushed a commit
that referenced
this pull request
May 13, 2026
#79) * feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (#64) Move the Slack outbound delivery off its own DynamoDB Streams consumer onto FanOutConsumer as a per-channel dispatcher. Drops TaskEventsTable from 2 concurrent stream readers to 1, restoring headroom for future channels (Email, Teams, etc.) without exceeding the documented DynamoDB Streams 2-reader-per-shard practical limit. The PR also addresses an adversarial code review on the original migration; the body below walks through each piece in the order it landed. ## (a) Migration - `cdk/src/handlers/slack-notify.ts` — rewritten as exported `dispatchSlackEvent(event, ddb)` plus a tagged `SlackApiError` class. The standalone `handler(event)` stream entrypoint is gone; the FanOutConsumer is now the only TaskEventsTable stream reader. Behaviour preserved bit-for-bit: channel_source==='slack' gate, terminal-event dedup via conditional UpdateItem on `channel_metadata.slack_notified_terminal`, threaded replies under the @mention or task_created message, emoji transitions (eyes -> hourglass -> ✅/❌/🚫/⏲), DM channel_id -> user_id rewrite, intermediate session+created message cleanup on terminal events. - `cdk/src/handlers/fanout-task-events.ts` — replaces the log-only `dispatchToSlack` stub with a wrapper that calls dispatchSlackEvent and routes errors via the new typed contract (see (b) below). Slack defaults gain task_created, session_started, task_timed_out so the router fans out the lifecycle events the old SlackNotifyFn handled; the dispatcher's channel_source gate keeps non-Slack tasks unaffected. - `cdk/src/constructs/fanout-consumer.ts` — adds a scoped `secretsmanager:GetSecretValue` grant on `bgagent/slack/*` so the fanout Lambda can fetch per-workspace bot tokens. Same scope the old SlackNotifyFn role held. - `cdk/src/constructs/slack-integration.ts` — deletes SlackNotifyFn, its DynamoEventSource, its IAM policy, and its NagSuppressions entry. Drops the now-unused StartingPosition / FilterCriteria / FilterRule / lambdaEventSources imports. After this lands, `aws lambda list-event-source-mappings` shows exactly one consumer of the TaskEventsTable stream (FanOutFn); verified on the dev stack with end-to-end @mention + cancel + CLI isolation scenarios. ## (b) Review fix #1 — partial-batch retry semantics (BLOCKER) The first review pass found that the post-migration handler silently dropped Slack-side infra errors (DDB throttle on the GetItem, Secrets Manager 5xx, transient Slack timeout). Pre-migration the SlackNotifyFn handler rethrew non-SlackApiError so Lambda retried the batch; post-migration `Promise.allSettled` swallowed the rejection and routeEvent returned an empty list with no escalation path to `batchItemFailures`. routeEvent's return type changed from `NotificationChannel[]` to `{ dispatched, infraRejections }`. The handler now pushes the record into `batchItemFailures` whenever `infraRejections.length>0`, so Lambda replays the record under the partial-batch contract. The warn line on rejection is tagged `retryable: true` so operators can alert distinctly from the channel-terminal swallow path. GitHub got the symmetric treatment: 4xx (excluding the existing 401 and 404 handling) is now treated as a channel-terminal swallow via `fanout.github.api_error` instead of escalating to retry. ## (c) Review fix #2 — split SlackApiError into terminal + retryable Originally any `!result.ok` Slack response was wrapped in SlackApiError and swallowed. That collapsed retryable codes (`ratelimited`, `service_unavailable`, `internal_error`, `fatal_error`, `request_timeout`) into the same swallow as `channel_not_found` — a tier-1 Slack outage would silently drop every message. Introduced `TERMINAL_SLACK_API_ERRORS` set + `classifySlackError` helper. Terminal codes still throw SlackApiError (router swallows). Retryable codes throw a plain Error so the router classifies them as infra rejections and Lambda replays. ## (d) Review fix #3 — NOTIFIABLE_EVENTS / CHANNEL_DEFAULTS drift The original migration added task_created/session_started/task_timed_out to CHANNEL_DEFAULTS.slack but the dispatcher's NOTIFIABLE_EVENTS gate already excluded several events the router was subscribing Slack to (agent_error, pr_created, task_stranded). Result: Slack was reported as `dispatched` for events it silently dropped — telemetry lied, agent_error never reached operators on Slack-origin tasks, and task_stranded rendered the generic "Event: task_stranded for owner/repo" fallback (UX regression). Added render cases for task_stranded and agent_error in slack-blocks.ts and added them to NOTIFIABLE_EVENTS. Forward-compat approval_required and status_response stay out of NOTIFIABLE_EVENTS until their emitters ship; a new cross-file consistency test in fanout-task-events.test.ts fails if anyone re-introduces the drift. The Slack dispatcher wrapper now passes `effectiveEventType` so an agent_milestone(pr_created) wrapper is unwrapped before NOTIFIABLE_EVENTS matching. Without the rewrite, the dispatcher would short-circuit on the wrapper string `agent_milestone`. ## (e) Review fix #4 — conditional UpdateItem on lifecycle persists Once the BLOCKER fix made batches retry, the original task_created and session_started UpdateItem calls became hazardous: a Slack POST that succeeded but whose follow-up UpdateItem failed transiently would, on retry, post a second root and overwrite slack_thread_ts — orphaning every threaded reply that had threaded under the first ts. Both UpdateItems now carry an `attribute_not_exists` ConditionExpression on the relevant `channel_metadata.slack_*_msg_ts`. On ConditionalCheckFailedException the handler logs at info, deletes the duplicate Slack message via `chat.delete`, and returns. Sibling retry wins the race; the duplicate is cleaned up. ## (f) Dev-stack regression: drop pr_created from Slack defaults Live verification surfaced a UX duplication: pr_created (subscribed in CHANNEL_DEFAULTS.slack as the original §6.2 design called for) and task_completed both rendered messages with View PR buttons, posted seconds apart. The original SlackNotifyFn had silently dropped pr_created (NOTIFIABLE_EVENTS gate), so users hadn't relied on it. Removed pr_created from CHANNEL_DEFAULTS.slack and from NOTIFIABLE_EVENTS, and removed the prCreatedMessage renderer. GitHub keeps pr_created (its edit-in-place comment surface genuinely benefits from the early checkpoint). ## Verification - mise //cdk:compile — clean - mise //cdk:test — 1183 / 1183 pass (8 net-new tests added for the review fixes: NOTIFIABLE_EVENTS drift guard, retryable Slack codes, GitHub 4xx swallow, infra rejection escalation, SlackApiError swallow, task_stranded render) - mise //cdk:eslint — clean - mise //cdk:synth — confirms exactly one Lambda::EventSourceMapping on TaskEventsTable, pointing at FanOutFn - Dev-stack scenarios — @mention happy path, Cancel button, CLI submit (channel_source=api -> zero Slack dispatches, GitHub edit-in-place still fires) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(fanout): retry GitHub 403/429 instead of swallowing as terminal (#79 review #1) PR #79 review found that the new 4xx terminal-swallow path treats HTTP 403 and 429 as channel-terminal — but on GitHub these are transient rate-limit responses (403 with "API rate limit exceeded", 429 "Too Many Requests"). Under a reconciliation wave that touches many tasks, an entire window of GitHub comment updates would be permanently dropped with only a warn log. Carve out 403 and 429 from the swallow guard so they propagate as infra rejections through ``Promise.allSettled``. The record lands in ``batchItemFailures`` and Lambda replays until the rate-limit window clears (or DLQs after ``retryAttempts``). Test coverage: parametrized over 403 + 429 with a GitHubCommentError mock at the helper boundary, asserting the record's eventID surfaces in ``batchItemFailures`` rather than being absorbed. * fix(fanout): guard Slack Secrets Manager grant on a prop (#79 review #2) Every other external-service grant in FanOutConsumer (taskTable, repoTable, githubTokenSecret) is gated by ``if (props.X)``, so a deployment that hasn't onboarded the corresponding service stays free of dangling IAM permissions. The original migration broke the pattern with an unconditional ``bgagent/slack/*`` Secrets Manager grant — dev stacks without Slack onboarding ended up holding read permission on a resource pattern they never use, with a misleading ``cdk-nag AwsSolutions-IAM5`` suppression reason. Adds an optional ``slackSecretArnPattern`` prop on ``FanOutConsumerProps``; the policy statement is only attached when the prop is set. ``cdk/src/stacks/agent.ts`` now computes the ``bgagent/slack/*`` ARN inline and passes it through, mirroring the other guarded props. ``ArnFormat`` and ``Stack`` imports moved out of fanout-consumer.ts since the construct no longer needs them. No changes to live behaviour — agent.ts always passes the prop, so the IAM policy still attaches in production. The dispatcher will log-and-fail-retry on a missing pattern (covered by review #3 fix). Test gap covering the construct itself ships in a follow-up commit (test gap #34). * fix(fanout): throw on missing TASK_TABLE_NAME env var (#79 review #3) Pre-fix: when ``TASK_TABLE_NAME`` was unset on a Slack-subscribed event, ``dispatchSlackEvent`` returned silently after a warn line. The router counted Slack as ``dispatched`` and a broken stack quietly dropped every Slack notification — operators only saw it in the warn-rate metric, with no rejected-channel signal. Post-fix: throw a plain Error so the rejection propagates as an infra rejection through ``Promise.allSettled``. The router pushes the record into ``batchItemFailures``, Lambda retries the batch, the ``fanout.dispatcher.rejected`` warn fires per record, and operators get a distinct alarm. Also bumps the existing log line from ``warn`` to ``error`` and attaches an ``error_id: FANOUT_SLACK_MISSING_TASK_TABLE`` so the deployment-bug case can be distinguished from per-record failures. Test: ``throws when TASK_TABLE_NAME env var is missing`` deletes the env var, asserts the throw, asserts no DDB call was attempted (env-var guard fires first). * fix(fanout): match SlackApiError by name as well as instanceof (#79 review #7) When a bundler ever duplicates the slack-notify module (rare with NodejsFunction tree-shaking but possible if dual-bundled), two distinct SlackApiError classes coexist and ``instanceof`` against one fails for instances of the other. The dispatcher would see a foreign-class SlackApiError, fall through to the rethrow branch, and the router would treat it as an infra rejection — flipping a channel-terminal swallow into infinite Lambda retries. Add an ``err.name === 'SlackApiError'`` fallback so the swallow branch fires either way. Mirrors the duck-typed ``GitHubCommentError`` check used elsewhere in the same handler. Test: synthesise a plain Error with name === 'SlackApiError' (NOT an instance of the mock's SlackApiError class) and assert batchItemFailures stays empty — proving the swallow path catches both shapes. * fix(fanout): extend TERMINAL_SLACK_API_ERRORS with permission codes (#79 review #8) Original set omitted documented Slack permission/scope failures. Codes outside the set fall to the retryable branch, so a misconfiguration like ``ekm_access_denied`` or ``missing_scope`` would burn 3 Lambda retries before DLQ on every event — even though the failure is fundamentally a configuration bug that no retry can clear. Adds: - Permission/scope: missing_scope, ekm_access_denied, team_access_not_granted, posting_to_general_channel_denied - Payload shape: invalid_arguments Reorganized the set into commented blocks (channel-shape, auth, permission/scope, payload-shape) so future additions go in the right bucket and the rationale stays visible. Test coverage: parametrized over the full TERMINAL_SLACK_API_ERRORS set (21 codes) — every one must throw SlackApiError so the router swallows it. The existing retryable test.each remains intact and covers the negative-class case (codes outside the set throw a plain Error and escalate to retry). * fix(fanout): promote Slack reaction/delete network errors to error logs (#79 review #5) The reaction / delete helpers (``addReaction``, ``removeReaction``, ``deleteMessage``) used to log every catch at warn with a single generic event key, lumping API-level rejections (e.g. ``no_reaction``) together with infrastructure failures (DNS lookup, TLS handshake, fetch timeout, JSON parse error from a hostile gateway). Operators who alarmed on the warn rate saw a flat signal that masked genuine infra problems. Split the boundary: - API-level (``!result.ok`` after a successful HTTP call) stays at warn with channel-specific event keys (``fanout.slack.reaction_add_api_error``, ``fanout.slack.reaction_remove_api_error``, ``fanout.slack.message_delete_api_error``). These are per-message UX problems; operators don't page. - Network errors (the outer ``catch (err)`` after ``fetch``) promote to ``logger.error`` with dedicated event keys (``fanout.slack.reaction_add_network_error``, ``fanout.slack.reaction_remove_network_error``, ``fanout.slack.message_delete_network_error``) and ``error_id``s (``FANOUT_SLACK_REACTION_NETWORK``, ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable signal. User-visible symptoms when these fire silently: stale emoji reactions (hourglass never swaps to ✅) and orphaned intermediate messages. Behaviour unchanged: errors are still swallowed (per-message reactions and intermediate cleanup are best-effort by design; they must not fail the batch), but operators now get distinct metrics for each failure class. * fix(fanout): emit fanout.slack.dup_delete_failed on ghost-message accumulation (#79 review #6) The conditional UpdateItem dup-delete path (``task_created`` / ``session_started`` lifecycle persists) calls ``deleteMessage`` to clean up the duplicate Slack message that landed when a sibling retry won the race. The delete is inherently best-effort — but if it fails, the duplicate becomes a permanent ghost in the thread and operators had no way to alarm on the rate. Refactor ``deleteMessage`` to return a boolean (``true`` on success or ``message_not_found``-as-already-gone, ``false`` otherwise) and emit a dedicated ``fanout.slack.dup_delete_failed`` event with an ``error_id: FANOUT_SLACK_DUP_DELETE_FAILED`` from the dup-delete callsites when the cleanup couldn't complete. The terminal-event cleanup paths (``slack_session_msg_ts``, ``slack_created_msg_ts``) intentionally don't fire this event — those paths target genuinely-stale UX cleanup, not retry-driven duplicates, so an alarm there would be noise. No new tests beyond the existing dup-delete coverage; the ``deleteMessage`` return value isn't yet asserted at the unit level, but the behavior is fully exercised by the existing ``dup-delete`` integration paths (test gap #31 will add an explicit failure-path assertion when it lands). * chore(fanout): tighten RouteOutcome arrays to ReadonlyArray (#79 review #9) ``RouteOutcome.dispatched`` and ``infraRejections`` were typed as plain ``NotificationChannel[]`` — which made ``readonly`` on the property prevent reassignment but still allow callers to mutate the underlying array via ``.push``, ``.splice``, or ``.sort``. Inconsistent with the ``ReadonlySet<string>`` used for ``CHANNEL_DEFAULTS`` in the same file. Tightening to ``ReadonlyArray<NotificationChannel>`` makes the contract honest: the router owns the arrays, callers read them. Test suite updated to use ``[...outcome.dispatched].sort()`` where it previously called ``.sort()`` directly — the explicit copy makes the intent clear and would have surfaced any silent test-side mutation. * refactor(fanout): make SlackDispatchEvent a type alias of FanOutEvent (#79 review #10) The two interfaces were structurally identical: same five fields, same readonly modifiers, same metadata shape. The decoupling was purely nominal and a silent-drift footgun — adding a field to ``FanOutEvent`` (e.g. when the router starts plumbing an ``approval_required`` ID through) would not flow into ``SlackDispatchEvent``, leaving the dispatcher unaware until a downstream test happened to fail. Replace with a one-line type alias: export type SlackDispatchEvent = FanOutEvent; The slack-notify module now type-imports ``FanOutEvent`` from fanout-task-events. ``import type`` is erased at compile time, so the runtime bundle still has the one-way dep (fanout-task-events → slack-notify) — no module-cycle hazard. Reviewer-suggested ``Pick<FanOutEvent, 'task_id' | …>`` was considered and rejected: the dispatcher uses every field of ``FanOutEvent``, so the Pick would just enumerate the same five fields with extra noise. A direct alias keeps the intent obvious and prevents drift identically. * fix(fanout): generalize Slack dedup to cover agent_error + log Retry-After (#79 review #4) PR #79 review #4 surfaced a sibling-channel-failure hazard: when GitHub or Email rate-limits, the record lands in ``batchItemFailures``. On the Lambda retry, every Slack-subscribed event for that record runs again. Terminal events were already guarded by ``slack_notified_terminal``; ``agent_error`` was not — operators would page twice on a single agent failure if a sibling channel happened to fail. Generalize the dedup mechanism. ``TERMINAL_EVENTS`` is replaced by a ``SLACK_DEDUP_ATTRIBUTE`` map that marks each event type with the ``channel_metadata`` attribute that should guard the post: - 5 terminals share ``slack_notified_terminal`` (any first-arriving terminal claims the right; subsequent terminals dedup against it) - ``agent_error`` gets its own ``slack_dispatched_agent_error`` so a duplicate agent_error doesn't reuse the terminal slot - ``task_created`` / ``session_started`` map to ``null`` because they already use the per-event ``slack_*_msg_ts`` conditional persists from review #1 — the conditional already provides full idempotency (a separate marker would be redundant) Also surfaces Slack's ``Retry-After`` header on rate-limited responses through a dedicated ``fanout.slack.retryable_api_error`` warn so operators reading CloudWatch can see the recovery window instead of guessing from sustained warn rate. Tests: - logs Retry-After header on rate-limited Slack responses (new): asserts ``retry_after_seconds`` propagates from Slack's response header into the warn metadata - existing terminal-codes parametrized test untouched (terminal branch doesn't read headers) - existing retryable test gains a ``headers: { get: () => null }`` stub on the fetch mock so the headers.get call doesn't crash Reviewer suggested a per-channel dispatch bitmap as the alternative. Rejected as premature: the duplicate-GitHub-PATCH is harmless (idempotent), Email is still a stub, and the dedup map covers the specific agent_error pain identified above. A bitmap would add a new table + IAM grants + per-dispatch DDB cost for a hypothetical problem (Slack rate-limiting AND a sibling channel failure). * test(fanout): conditional UpdateItem race + dup-delete coverage (#79 test gap) Adds 4 tests covering the lifecycle-persist conditional path that review fix #1 introduced and review fix #6 hardened. Pre-PR-#79 the only ConditionalCheckFailed coverage was the terminal-dedup path; the new lifecycle-persist + dup-delete code lacked direct assertions and was flagged 9/10 criticality by the reviewer. - task_created persist ConditionalCheckFailed → posts duplicate then deletes it: pins the cleanup behaviour that prevents ghost task_created posts in the channel - session_started persist ConditionalCheckFailed → posts duplicate then deletes it: parallel coverage for the other lifecycle attribute (slack_session_msg_ts) - dup-delete failure emits fanout.slack.dup_delete_failed with error_id: pins the operator-alarm signal added in review fix #6; asserts both the event key and the FANOUT_SLACK_DUP_DELETE_FAILED error_id propagate - chat.delete returning message_not_found is treated as success (no dup_delete_failed): negative-class assertion. Prevents false-positive alarms when the race resolves cleanly (the duplicate was already deleted by a prior retry). The ghost / message_not_found tests use ``fetchMock.mockImplementation`` URL-routing rather than ``.mockResolvedValueOnce`` chains because ``updateReaction`` issues 2-3 reaction-API fetches between chat.postMessage and chat.delete; routing by URL keeps the test focused on the load-bearing chat.delete behaviour without coupling to reaction call order. * test(fanout): cover task_stranded + agent_error renderers (#79 test gap #32) Pre-PR-#79 the new ``taskStrandedMessage`` and ``agentErrorMessage`` helpers in slack-blocks.ts had no direct unit tests. Reviewer flagged this as a 7/10 gap because the renderers carry the prior_status / error_type / message_preview metadata threaded through from the event source — silent drift in the metadata field names would produce ugly fallback messages in production. Adds 5 tests: - task_stranded WITH metadata renders the prior_status parenthetical (``Task stranded for org/repo (last status: RUNNING)``) so operators can tell at a glance whether the task hung in HYDRATING vs RUNNING — without the parenthetical the reviewer's "generic Event: ..." UX regression would resurface. - task_stranded WITHOUT metadata still renders cleanly (legacy events written before the reconciler started stamping metadata must not crash or leak ``undefined``). - agent_error with full metadata (error_type + message_preview) renders the rotating_light, type, and preview. - agent_error WITHOUT metadata stays sensible — no leaked ``undefined`` strings or empty ``_Type:_`` line. - agent_error truncates a 500-char message_preview to keep Slack channel UX readable. * test(fanout): cover agent_error dedup + dedup-slot isolation (#79 test gap #33) Pre-PR-#79 review-fix #4 there was no direct test for the ``slack_dispatched_agent_error`` dedup attribute or its interaction with the existing ``slack_notified_terminal`` slot. A future refactor that collapsed the two slots — or renamed one of them — would silently break the sibling-channel-failure-retry guarantee that fix #4 added. Adds 4 tests: - ``agent_error claims its own dedup attribute``: pins the UpdateExpression and ConditionExpression strings so a refactor that renames the attribute breaks loudly. - ``agent_error retry hits the dedup guard``: end-to-end scenario matching review #4 — task already has ``slack_dispatched_agent_error: true``, retry must short-circuit before chat.postMessage. Without the guard, a second rotating_light fires. - ``terminal dedup attribute is per-class``: a flaky task_completed-then-task_failed sequence dedups against the same ``slack_notified_terminal`` slot. Catches the regression where the orchestrator emits both terminal types and we'd otherwise post both ✅ and ❌. - ``agent_error and terminals use distinct dedup slots``: the important negative — having ``slack_dispatched_agent_error`` set must NOT shadow a subsequent ``task_completed``. Pins the slot separation so a future merge into a single slot can't silently drop terminals after an agent_error. * test(fanout): add construct-level tests for FanOutConsumer (#79 test gap #34) The construct shipped on issue #64 with no unit-level coverage of its IAM contract. The only synth-level signal lived inside ``slack-integration.test.ts`` ("0 EventSourceMapping") which proved the migration didn't regress the OTHER construct. Reviewer flagged this 6/10 — and the gap is what allowed review #2 (unconditional Slack secret grant) to slip through in the first place. Adds 6 tests: - ``attaches a single DynamoEventSource on the TaskEventsTable stream``: pins the architectural invariant — issue #64 was fundamentally about reaching exactly-one stream reader. Adding a second consumer must fail this test loudly. - ``creates a DLQ for the fanout Lambda``: pins retention period + presence; a DLQ-less deployment would silently drop poison-pill records past retryAttempts. - ``omits the bgagent/slack/* grant when slackSecretArnPattern is not provided``: the review #2 invariant. Iterates every IAM::Policy and asserts NONE of them grant secretsmanager:* on a bgagent/slack/* ARN. A regression that re-introduces the unconditional grant breaks this test. - ``attaches the bgagent/slack/* grant only when slackSecretArnPattern is provided``: the positive case. Pins the grant shape (action, effect, resource pattern). - ``passes TASK_TABLE_NAME env var when taskTable is provided``: review #3 dependency — the dispatcher throws on missing env. - ``omits TASK_TABLE_NAME env var when taskTable is not provided``: graceful degrade for dev stacks that haven't onboarded the TaskTable yet (matches the construct's documented contract). * test(fanout): cover task_stranded through terminal dedup (#79 test gap #35) The reconciler at handlers/reconcile-stranded-tasks.ts:170 emits BOTH ``task_stranded`` and ``task_failed`` for a heartbeat-expired task — one for the operator signal, one to drive the FAILED status transition. Pre-PR-#79 this pair had no test coverage; reviewer flagged this 8/10 because the visible failure mode (a paired "Task stranded" + "Task failed" double-page in Slack) would surface in production but be silent in CI. Adds 2 tests: - ``task_stranded posts and writes the terminal dedup marker on first arrival``: pins that task_stranded participates in the shared terminal slot and renders the warning message with metadata. Catches a regression that omits task_stranded from the dedup map entirely. - ``task_stranded after a sibling task_failed dedups``: the operational scenario — task_failed already claimed ``slack_notified_terminal``; the subsequent task_stranded must short-circuit before chat.postMessage. Without this guard, operators get the double-page the reviewer warned about. * fix(fanout): re-read TaskRecord before terminal cleanup to close orphan-message race Live observation during PR #79 review verification: the same Slack @mention happy path sometimes leaves the 🚀 task_created message in the thread (orphaned beside the ✅ task_completed) and sometimes deletes it cleanly. The race window: 1. ``task_created`` stream batch posts the rocket message and persists ``slack_created_msg_ts`` via the conditional UpdateItem introduced in PR #79 review fix #1. 2. ``task_completed`` stream batch fires ~30s later. Its initial GetItem races the prior UpdateItem and sees a stale ``channel_metadata`` WITHOUT ``slack_created_msg_ts``. 3. The terminal cleanup branch checks ``channelMeta.slack_created_msg_ts`` — undefined — silently skips the chat.delete. The rocket message stays in the thread. Add a fresh GetItem inside the TERMINAL_EVENTS cleanup branch, after the dedup UpdateItem has linearized our view of the table. Any prior ``slack_*_msg_ts`` writes are visible by then, so the cleanup fires correctly. On a re-read failure (DDB throttle / transient blip) we fall back to the dispatch-entry snapshot and emit ``fanout.slack.cleanup_reread_failed`` so operators can alarm on the rate. Pre-existing race (the unconditional UpdateItem in pre-PR-#79 was the same shape — wrote, GetItem on the next batch could miss it). PR #79 doesn't introduce it but doesn't fix it either; this commit does, since the live screenshot evidence appeared during review verification. Tests: - ``terminal cleanup re-reads TaskRecord``: scripts a stale dispatch-entry GetItem followed by a fresh re-read GetItem with ``slack_created_msg_ts`` present; asserts chat.delete fires against the freshly-read ts. - ``terminal cleanup falls back to dispatch-entry snapshot when re-read fails``: defense-in-depth — DDB throttle on the re-read must not break terminal delivery; cleanup uses the entry snapshot and emits the fallback warn. --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
scoropeza
added a commit
to scoropeza/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 16, 2026
Adds the 14 agent-side approval milestone writers (§11.1) on
``_ProgressWriter`` so Chunk 3's hook integration has a typed API
instead of stringly-typed ``write_agent_milestone`` calls, and the
per-task gate counter / per-container sliding-window rate limit /
denial-injection queue on ``PolicyEngine`` that §6.5 requires.
Why now: the hook work lands cleanly only after these surfaces exist
— every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch
calls one of these helpers. Shipping them separately lets the hook
commit be about the state machine, not the event-shape bookkeeping.
Engine additions:
- ``approval_gate_count`` / ``increment_approval_gate_count``: the
per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped
in v1; persistence tracked in §17.
- ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``:
sliding-window rate limit (20/min/container, §12.9). Prune on read
so callers see the current count without a separate tick.
- ``queue_denial_injection`` / ``drain_denial_injections``: queue
consumed by ``_denial_between_turns_hook`` at the next Stop seam
(§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``.
- ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26.
- ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts
the hook imports rather than re-deriving.
Milestone writers (§11.1 table, 14 agent-emitted of 15):
- ``pre_approvals_loaded``, ``approval_requested``,
``approval_granted``, ``approval_denied``, ``approval_timed_out``,
``approval_stranded``, ``approval_write_failed``,
``approval_resume_failed``, ``approval_poll_degraded``,
``approval_timeout_capped``, ``approval_ceiling_shrinking``,
``approval_cap_exceeded``, ``approval_rate_limit_exceeded``,
``approval_late_win``.
- ``approval_decision_recorded`` (Lambda audit) and
``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the
Lambda side — Chunk 5 owns those.
Each helper is a thin wrapper over ``_put_event("agent_milestone",
...)`` so the shared circuit-breaker + classifier path (finding aws-samples#6/aws-samples#8)
continues to apply. Metadata keys mirror the §11.1 shapes verbatim
(``maxLifetime_remaining_s`` preserves the design-doc spelling for
downstream parsers).
Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB
payload shape for each helper (including the two
``approval_timeout_capped`` reason variants — rule_annotation carries
matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the
engine: counter monotonicity, rate-window prune semantics at window
boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch
idempotency.
No caller changes — engine and writer surfaces are additive. Hook
integration lands in commit C.
scoropeza
added a commit
to scoropeza/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 16, 2026
Lands the two user-facing REST handlers that flip a PENDING approval
row to APPROVED / DENIED, the shared types both call sites and the
CLI consume, and the Lambda-side Cedar parser future Chunk-5 handlers
(get-policies, create-task validation) will use.
Wire types (shared/types.ts)
----------------------------
- ApprovalScope union covering every shape the agent's
ApprovalAllowlist understands. Typed so approve-task / create-task /
CLI (Chunk 6) all agree at compile time.
- ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest /
DenyResponse / PendingApprovalSummary / GetPendingResponse /
PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest /
LinkSlackUserResponse / SlackUserMappingRecord /
ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions.
- Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20,
INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30,
APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300.
New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED,
TASK_NOT_AWAITING_APPROVAL.
Shared helpers
--------------
- shared/approval-scope.ts — parseApprovalScope validates every shape;
rejects unknown tool types / groups / prefixes, empty values,
over-128-char strings. isDegeneratePattern implements §7.4 (length
≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task.
- shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys,
GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks,
bearer tokens with [REDACTED-...] markers. Mirrors
agent/src/output_scanner.py so the deny reason the agent
ultimately reads is never raw user input.
- shared/cedar-policy.ts — parseRules pulls the five HITL annotations
(tier/rule_id/severity/approval_timeout_s/category) into a
ParsedRule[], preserving positional policy_id for IMPL-29
diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId,
matchingRuleIds, concatPolicies exposed for future handlers.
Handlers
--------
- approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve
- Cross-table TransactWriteItems: approval row PENDING → APPROVED
guarded by user_id = :caller AND status = :pending; TaskTable
no-op Update guarded by status = AWAITING_APPROVAL AND
awaiting_approval_request_id = :rid.
- TransactionCanceledException classified by per-item
CancellationReasons. Approval-row failure collapses to 404
REQUEST_NOT_FOUND (no existence oracle per §7.1 finding aws-samples#6);
task-row failure → 409 TASK_NOT_AWAITING_APPROVAL.
- Optional scope defaults to this_call.
- Per-user per-minute rate limit (30/min, synthetic row).
- Writes approval_decision_recorded audit event (IMPL-6). Audit
failure is logged but does not fail the request — decision is
already committed.
- deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny
- Same cross-table pattern; status → DENIED + deny_reason.
- Reason is scanDenyReason-sanitized + truncated to
DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit
both read sanitized form; raw input never stored.
- Same rate-limit namespace as approve.
Tests: +64 total (cedar-policy-parser 24, approval-scope 28,
deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test
fixtures are assembled from string fragments so the source never
holds a contiguous secret literal — Code Defender pre-commit hook
otherwise blocks.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout +
LinkSlackUserFn + GetPolicies + GetPending) lands in the next
commit.
krokoko
added a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 18, 2026
* docs(cedar-hitl): restore and revise HITL gates design, fold adversarial findings
Design doc was accidentally removed in 0742ebe; restored from b34d7cd and
substantially revised under a new filename. "Phase 3" framing dropped — this
is the Cedar HITL approval gates feature.
- Renamed PHASE3_CEDAR_HITL.md → CEDAR_HITL_GATES.md; all "phase" gating
removed (Phase 3a/3b → v1 / future work §17).
- Integrated 16 findings from 2026-05-06 adversarial review with realistic
scenarios. Major structural changes:
- Decision #23 (new): cross-engine parity contract between cedarpy (agent,
Python) and @cedar-policy/cedar-wasm@4.10.0 (Lambda, TS).
- §11.2: SlackUserMappingTable with OAuth user-initiated mapping; severity-
gated Slack approvals; admin has no write path.
- §7.1/§12.3: ApproveTaskFn uses cross-table TransactWriteItems for atomicity.
- §10.1: user_id-status-index GSI on TaskApprovalsTable; v1 not v-later.
- §15.6: cedar-wasm as a Lambda layer shared across policy Lambdas.
- Gate-cap revision (2026-05-07): decision #13 — default 50, blueprint-
configurable via security.approvalGateCap (bounded 1–500), persisted on
TaskTable. Cache memory bound decoupled: 50-entry LRU regardless of cap.
IMPL-22 adds telemetry-driven re-evaluation criteria.
- Timeout adversarial+advocate pass (2026-05-07):
- §6.5 VM-throttle race fix: re-read row on failed TIMED_OUT
ConditionCheckFailed; honor APPROVED if user beat the timer. IMPL-24.
- Sub-120s @approval_timeout_s emits blueprint-load WARN. IMPL-25.
- User-visible timeout cap milestones (approval_timeout_capped_at_submit,
approval_ceiling_shrinking). IMPL-26.
- Runtime JWT: no refresh logic in agent/src/ (container uses IAM role);
ceiling stays min(1h, maxLifetime_remaining - 120s). IMPL-27.
- Three new CloudWatch metrics for timeout tuning. IMPL-28.
- §14.8 new: off-hours trade-off section (fail-closed is the invariant).
- §13.13 new: notification-delivery failure does NOT pause the timer
(bypass-prevention).
- Added six mermaid diagrams: three-outcome decision flow, end-to-end round-
trip, TaskApprovalsTable state machine, Slack user-mapping, fail-closed
decision flow, cross-engine parity check.
- Cross-references updated in INTERACTIVE_AGENTS.md and SECURITY.md.
- Starlight mirror regenerated via docs/scripts/sync-starlight.mjs.
No code changes in this commit — design work only. Implementation lands in a
follow-up PR per §15.2 task list.
* feat(cedar-hitl): pin Cedar engines and seed cross-engine parity contract
Chunk 1 of the Cedar HITL gates PR (docs/design/CEDAR_HITL_GATES.md).
Lays the foundation before engine rewrites in Chunk 2+: both Cedar engines
pinned exactly per decision #23, annotation surface validated by Day-1
spikes per decision #22, and the golden-file parity fixtures seeded so
every subsequent chunk can rely on the contract.
- Pin cedarpy==4.8.0 (agent) and @cedar-policy/cedar-wasm@4.10.0 (cdk)
exactly (no ^/~); document both in mise.toml header.
- Add agent/tests/test_cedarpy_annotations_contract.py (10 tests)
validating all 5 annotations round-trip verbatim via
policies_to_json_str() under staticPolicies.<id>.annotations.
- Add cdk/test/handlers/shared/cedar-policy.test.ts (12 tests) validating
policySetTextToParts + policyToJson extract the same annotations
verbatim and isAuthorized returns the documented {type, response}
wrapper shape.
- Add contracts/cedar-parity/ with 5 golden-file fixtures (single-match,
multi-match, hard-deny, soft-deny write, no-match default-allow) +
README documenting the contract. Every fixture policy carries a
@rule_id - including the base permit as @rule_id("base_permit") - so
the parity tests raise if either engine returns an unannotated match
instead of silently dropping it.
- Add agent/tests/test_cedar_parity.py (6 tests, cedarpy side) and
cdk/test/handlers/shared/cedar-parity.test.ts (6 tests, cedar-wasm
side) loading the shared fixtures and asserting (decision, sorted
rule_ids) match expected. Both tests hard-import cedarpy/cedar-wasm
so a dependency regression fails loud rather than silently skipping.
- Update docs/design/CEDAR_HITL_GATES.md sections 15.2 row 3, 15.6
prose and the parity mermaid diagram to point at contracts/cedar-parity/
(the precedent set by contracts/memory-hash-vectors.json) instead of a
new tests/fixtures/ dir. Regenerate the Starlight mirror.
- Add IMPL-29 noting the cedarpy diagnostics.reasons / cedar-wasm
diagnostics.reason naming asymmetry surfaced by the spikes; engine
code normalizes at the boundary.
- Fix rev-4 -> rev-5 cosmetic footer drift.
Test counts: agent 500 -> 516 (+16), cdk 1036 -> 1054 (+18), cli 190
unchanged. No production code changes in this chunk; engine rewrite
lands in Chunk 2.
Follow-up: separate chore issue to move contracts/memory-hash-vectors.json
into a self-named subdir for consistency with contracts/cedar-parity/.
* feat(cedar-hitl): three-outcome PolicyEngine core
Chunk 2 of the Cedar HITL gates PR. Rewrites agent/src/policy.py into
the three-outcome engine specified in docs/design/CEDAR_HITL_GATES.md
section 6. The REQUIRE_APPROVAL outcome is the human-in-the-loop surface
the next chunks (PreToolUse hook extension, REST API, CLI) plug into.
This chunk ships the engine and its load-time validation; no hook or
wire-format changes yet.
Engine:
- Outcome enum (ALLOW, DENY, REQUIRE_APPROVAL) + extended PolicyDecision
with .allowed backward-compat shim for Phase 1a/1b/2 callers. Custom
__init__ accepts both outcome= and legacy allowed= kwargs so existing
tests keep working verbatim.
- Three-outcome pipeline per section 6.2: hard-deny eval (absolute) ->
allowlist fast-path (tool_type/tool_group/bash_pattern/write_path/
all_session) -> recent-decision cache (60s TTL on DENIED/TIMED_OUT) ->
soft-deny eval (with post-eval rule-scope allowlist check and
blueprint_disable filtering) -> default ALLOW.
- ApprovalAllowlist (section 6.4): parses and matches every scope type.
Strips whitespace and rejects empty-after-strip values so
"tool_type: Read " normalizes instead of silently mismatching (review
finding 6).
- RecentDecisionCache (section 12.9): 50-entry LRU, INDEPENDENT of
approvalGateCap. Populated only on DENIED/TIMED_OUT. Session-scoped
(documented section 12.8 caveat).
- Annotation handling (sections 5.2 + 6.3): parses @rule_id, @tier,
@approval_timeout_s, @severity, @category via
cedarpy.policies_to_json_str(); merges on multi-match with min timeout
(clamped by 30s floor) and max severity.
- Load-time validation (sections 5.1, 12.4): rejects missing/mismatched
@tier, missing @rule_id, sub-floor timeouts, duplicate rule_ids across
tiers, blueprint text > 64 KB, disable entries naming built-in
hard-deny rules (finding 9), approval_gate_cap outside [1, 500]
(decision 13). Sub-120s @approval_timeout_s emits WARN but accepts
(IMPL-25).
- Fail-closed posture (section 13): cedarpy parse errors surface via
diagnostics.errors -> RuntimeError raised inside _eval_tier -> outer
handler returns DENY with reason "fail-closed: <ExceptionType>".
TypeError on json.dumps of unhashable tool_input surfaces as distinct
"fail-closed: unhashable_tool_input" reason (review finding 5).
Built-in policies:
- agent/policies/hard_deny.cedar: base_permit catch-all + rm_slash +
write_git_internals + write_git_internals_nested + drop_table +
pr_review-specific Write/Edit forbids (absolute).
- agent/policies/soft_deny.cedar: base_permit (catch-all required in
each tier so cedarpy default-deny does not convert no-match into
DENY) + force_push_any + force_push_main + push_to_protected_branch
+ write_env_files + write_credentials. All soft rules carry @tier,
@rule_id, @approval_timeout_s, @severity, @category per section 15.4
starter set.
Review findings addressed (1 blocker, 8 significant, plus minor):
- blueprint_disable actually disables soft rules at eval time instead
of silently no-op (the blocker: test coverage had been a silent-pass).
- Legacy extra_policies with @tier/@rule_id rejected to avoid undefined
double-annotation behavior.
- _matching_rule_ids logs WARN on unknown policy IDs (state-drift
signal).
- base_permit validator exemption restricted to effect=="permit" so
misnamed forbid rules cannot bypass validation (finding 7).
- Hard-tier Cedar no_decision logged at WARN (signals missing/malformed
base_permit catch-all).
- Allowlist whitespace normalization + empty-value rejection.
- StrEnum upgrade, Callable moved to TYPE_CHECKING, assert replaced
with explicit RuntimeError for S101 compliance.
Phase 1 compatibility:
- All 39 existing test_policy.py tests pass unchanged via the
.allowed property. One test (test_invalid_policy_syntax_fails_closed)
updated to patch _hard_policies instead of the removed _policies
attribute; docstring explains the rewrite.
- extra_policies kwarg preserved; callers with annotated rules must
migrate to blueprint_soft_policies / blueprint_hard_policies.
Test counts: agent 516 -> 576 (+60: 51 three-outcome + 9 regression
fixes). cli 190 unchanged. cdk 1054 unchanged.
Carry-forward to Chunk 3:
- extra_policies semantic shift (Phase 1 DENY -> Chunk 2
REQUIRE_APPROVAL); .allowed=False preserved but .outcome differs.
Switchover happens when hooks.py adopts the three-outcome branching.
- Cross-tier action-context asymmetry (review finding 8): document
rule-authoring constraint in section 5.5 of design.
- Probe entity-shape coverage (finding 10): extend _probe_cedar to
exercise Write/Edit/Bash action paths, not just invoke_tool.
* feat(cedar-hitl): approval milestone writers + engine counters
Adds the 14 agent-side approval milestone writers (§11.1) on
``_ProgressWriter`` so Chunk 3's hook integration has a typed API
instead of stringly-typed ``write_agent_milestone`` calls, and the
per-task gate counter / per-container sliding-window rate limit /
denial-injection queue on ``PolicyEngine`` that §6.5 requires.
Why now: the hook work lands cleanly only after these surfaces exist
— every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch
calls one of these helpers. Shipping them separately lets the hook
commit be about the state machine, not the event-shape bookkeeping.
Engine additions:
- ``approval_gate_count`` / ``increment_approval_gate_count``: the
per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped
in v1; persistence tracked in §17.
- ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``:
sliding-window rate limit (20/min/container, §12.9). Prune on read
so callers see the current count without a separate tick.
- ``queue_denial_injection`` / ``drain_denial_injections``: queue
consumed by ``_denial_between_turns_hook`` at the next Stop seam
(§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``.
- ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26.
- ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts
the hook imports rather than re-deriving.
Milestone writers (§11.1 table, 14 agent-emitted of 15):
- ``pre_approvals_loaded``, ``approval_requested``,
``approval_granted``, ``approval_denied``, ``approval_timed_out``,
``approval_stranded``, ``approval_write_failed``,
``approval_resume_failed``, ``approval_poll_degraded``,
``approval_timeout_capped``, ``approval_ceiling_shrinking``,
``approval_cap_exceeded``, ``approval_rate_limit_exceeded``,
``approval_late_win``.
- ``approval_decision_recorded`` (Lambda audit) and
``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the
Lambda side — Chunk 5 owns those.
Each helper is a thin wrapper over ``_put_event("agent_milestone",
...)`` so the shared circuit-breaker + classifier path (finding #6/#8)
continues to apply. Metadata keys mirror the §11.1 shapes verbatim
(``maxLifetime_remaining_s`` preserves the design-doc spelling for
downstream parsers).
Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB
payload shape for each helper (including the two
``approval_timeout_capped`` reason variants — rule_annotation carries
matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the
engine: counter monotonicity, rate-window prune semantics at window
boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch
idempotency.
No caller changes — engine and writer surfaces are additive. Hook
integration lands in commit C.
* feat(cedar-hitl): TaskApprovals + AWAITING_APPROVAL transition primitives
Adds the four agent-side DDB primitives §6.5 + IMPL-24 need for the
three-outcome hook integration in the next commit:
- ``transact_write_approval_request`` — cross-table TransactWriteItems:
Put(TaskApprovalsTable) with ``attribute_not_exists(request_id)`` +
Update(TaskTable) gated on ``status = RUNNING``. Atomic per §12.3 so
a concurrent cancel cannot land the task in AWAITING_APPROVAL with
no matching approval row (or vice versa).
- ``transact_resume_from_approval`` — Update(TaskTable) gated on
``status = AWAITING_APPROVAL AND awaiting_approval_request_id =
:rid``. The ``request_id`` condition prevents resuming with a stale
ID after a reconciler race (§13.9).
- ``best_effort_update_approval_status`` — conditional UpdateItem on
the approval row with ``status = :pending`` guard. Returns False on
``ConditionalCheckFailedException``; this is the signal IMPL-24's
re-read path fires on (§6.5 pseudocode lines 846-879, §13.12).
- ``get_approval_row`` — GetItem with ``ConsistentRead=True`` by
default. Required by IMPL-24's re-read; kept opt-out (bool flag) for
future cold-path callers that don't need the strong read.
Errors:
- ``ApprovalTablesUnavailable`` for env-var-missing — raised loud so
a pre-Chunk-4 deploy fails closed (hook will map to DENY) rather
than silently no-op'ing the gate.
- ``ApprovalWriteError`` / ``ApprovalResumeError`` wrap
``TransactionCanceledException`` with the cancellation reasons
list. The hook uses these to distinguish the "concurrent cancel"
branch from real DDB outages.
- ``ConditionalCheckFailedException`` on ``update_item`` is consumed
and returned as ``False`` from ``best_effort_update_approval_status``
— the caller (hook) needs the boolean to decide whether to
re-read, not to propagate.
- All other DDB errors propagate so the hook's outer try/except can
classify fail-closed with a specific reason.
Implementation notes:
- Uses ``boto3.client("dynamodb")`` low-level API (not resource).
``transact_write_items`` lives on the client, and marshalling the
approval row attributes explicitly gives deterministic DDB shapes
that the tests can assert on. ``_py_to_ddb_attr`` covers the
subset of Python types §10.1 actually uses (str/int/bool/None/list
of str); any other type raises TypeError loudly rather than
silently writing something unexpected.
- ``_extract_error_code`` / ``_extract_cancellation_reasons`` duck-type
on ``exc.response`` so we don't need botocore at import time (tests
use a minimal exception class).
- Errors from unsupported types (floats, dicts, etc.) are caught
BEFORE the DDB round-trip so the unit-test asserts
``transact_write_items`` was not called — catches schema drift
early.
- Status constants (``_STATUS_RUNNING`` / ``_STATUS_AWAITING_APPROVAL``)
named so a rename in CDK cannot silently diverge the Python path.
Tests: +20 total.
- 5 on TransactWriteApprovalRequest: env-missing, happy-path shape
assertion (both items + conditions), TransactionCanceled → ApprovalWriteError
with reasons preserved, other errors propagate, unsupported type rejected
before any DDB call.
- 3 on TransactResumeFromApproval: env-missing, happy-path expression
shape (includes REMOVE awaiting_approval_request_id), cancel →
ApprovalResumeError.
- 4 on BestEffortUpdateApprovalStatus: happy path returns True,
``reason`` kwarg attaches ``deny_reason``, ConditionalCheckFailed
returns False (IMPL-24's signal), other errors propagate.
- 4 on GetApprovalRow: ConsistentRead default True, opt-out False,
row-not-found returns None, row unmarshalling through every
supported DDB attribute type.
- 4 on helpers: error-code extraction with and without
ClientError-shape, cancellation-reasons extraction with and without.
No runtime callers yet — hook integration lands in commit C. Physical
TaskApprovalsTable lands in Chunk 4; Python side is wire-compatible so
the hook work can be unit-tested today with mocked clients.
* feat(cedar-hitl): PreToolUse three-outcome REQUIRE_APPROVAL path
Wires the agent to the full §6.5 pseudocode: cap + rate-limit check,
atomic TransactWriteItems for pending row + TaskTable AWAITING_APPROVAL,
2s→5s ConsistentRead poll, IMPL-24 VM-throttle race re-read, resume
transition, scope propagation to allowlist, and denial-injection queue
consumed at the next Stop seam. Completes §15.2 rows 26 + 27.
Hook control flow (three outcomes)
----------------------------------
- ALLOW / DENY: existing Phase 1 behavior, now switching on
``.outcome`` rather than ``.allowed``. Legacy Phase 1/2 tests still
green because PolicyDecision preserves the ``.allowed`` shim.
- REQUIRE_APPROVAL (new): extracted into ``_handle_require_approval``
for readability. Delegates to ``task_state`` primitives and
``engine.*`` counter surfaces from the prior commits; no new DDB
client construction here.
Key pieces:
- ``_compute_effective_timeout`` applies the §6.5 min(rule, default,
lifetime) formula. The engine's ``_merge_annotations`` has already
clipped decision.timeout_s against the task default; the hook adds
the remaining-lifetime ceiling and floors at FLOOR_30S.
``clip_reason`` distinguishes ``rule_annotation`` (rule was tighter
than task default) from ``maxLifetime_ceiling`` (task is late in
its life) so ``approval_timeout_capped`` carries the right reason.
- ``_remaining_maxlifetime_s`` reads ``AGENTCORE_MAX_LIFETIME_S`` +
``TASK_STARTED_AT`` env vars (8h default). Returns ``None`` when
the start timestamp is absent — the hook treats that as "unknown,
don't clip" rather than pre-DENYing, so Phase 1 test paths that
don't set the env var still see the old task-default behaviour.
Chunk 4/5 will wire these at task launch.
- ``_poll_for_decision`` uses 2s cadence for the first 30s then 5s
(IMPL-12). All polls use ``ConsistentRead=True`` per IMPL-24. 3
consecutive GetItem failures emit ``approval_poll_degraded``; 10
consecutive failures fall through as TIMED_OUT with a specific
reason (§13.2).
- ``_reconcile_late_decision`` implements IMPL-24 re-read: on a
ConditionCheckFailed from the TIMED_OUT write, re-read with
ConsistentRead. APPROVED → rebuild outcome, propagate scope to
allowlist, run normal allow flow, emit ``approval_late_win``.
DENIED → honor the user's sanitized reason. PENDING or row gone
→ fall through with TIMED_OUT (fail-closed, §13.12 last paragraph).
Cancel-wins semantics (finding #2)
----------------------------------
``_denial_between_turns_hook`` is registered AFTER
``_nudge_between_turns_hook`` in ``between_turns_hooks`` so cancel
short-circuits both. The hook re-checks ``_cancel_requested`` itself
as belt-and-braces (matching the nudge hook) so a future reorder does
not silently break cancel-wins. Denial queue is PRESERVED on cancel —
not drained — so a denial still sitting on the queue when the task is
being torn down does not leak across tasks (the engine is per-task
per §IMPL-7).
``stop_hook`` threads ``engine`` into ``ctx`` so the denial hook can
``drain_denial_injections``. ``build_hook_matchers`` accepts a new
``user_id`` kwarg (§12.2) so approval rows carry caller identity for
the REST side's ownership check.
``permissionDecisionReason`` guaranteed surface
-----------------------------------------------
The hook's deny return is the ONLY guaranteed surface the SDK emits
to the agent; denial injection is best-effort (pre-empted by cancel).
``_deny_response`` pipes every reason through ``_strip_ansi`` +
``_truncate(500)``: ANSI sequences can never reach the model, and the
line stays loggable. §12.7 requirement.
Tests: +24 agent hook tests (47 total in test_hooks.py). Run in 0.92s
via a ``_fast_poll`` fixture that collapses ``asyncio.sleep`` to a
no-op AND advances ``hooks.time.monotonic`` by the requested duration
so the poll wall-clock deadline actually trips.
Happy paths:
- APPROVED + scope propagation to allowlist + milestones.
- APPROVED with scope=this_call does NOT grow allowlist.
- DENIED queues denial injection + populates recent-decision cache
(next identical call auto-denies).
- TIMED_OUT writes TIMED_OUT row and emits approval_timed_out.
IMPL-24 race: four branches.
- APPROVED re-read → allow flow, approval_late_win milestone, scope
propagated, resume succeeds.
- DENIED re-read → deny flow, approval_late_win milestone, user's
reason is the permissionDecisionReason.
- Still-PENDING re-read → fail-closed fall-through (no late_win).
- Row-gone re-read → same fail-closed fall-through.
Cap / rate-limit / write failure / resume failure branches all:
- Short-circuit before any DDB write when the local guard fires
(cap, rate limit).
- Emit the right approval_* milestone.
- Return DENY with a specific permissionDecisionReason.
Sanitization:
- ANSI stripped from deny reason.
- Deny reason truncated to ≤500 chars.
Timeout clipping:
- rule_annotation reason when a rule's approval_timeout_s is below
the task default; matching_rule_ids populated.
- maxLifetime_ceiling reason when remaining lifetime is the tightest
bound; matching_rule_ids is None.
- approval_ceiling_shrinking emits exactly once per task (IMPL-26
latch).
Denial injection hook (6 tests):
- Draining produces a <user_denial request_id=... decided_at=...>
block with XML-escaped reason.
- Cancel short-circuit preserves the queue so the denial is not
lost; just not injected into a dying agent.
- Hostile reason (</user_denial>...<user_nudge>) is XML-escaped so
the envelope cannot be forged.
- No-engine ctx returns [] (Phase 1 call sites still work).
- Registered LAST in ``between_turns_hooks`` (invariant for §6.5
finding #2).
- End-to-end via stop_hook: queued denial becomes
``decision=block`` + reason on the Stop return.
Carry-forward
-------------
- ``_remaining_maxlifetime_s`` returns None when TASK_STARTED_AT is
unset — Chunk 4/5 will wire this at task launch. Tracked in §16.
- ``approval_gate_count`` lives on the engine (session-scoped) not on
TaskTable in v1. §13.6 notes that the reconciler + approval_gate_cap
still bound worst-case across container restarts. Chunk 7+ tracks
persistence when telemetry justifies it.
- Denial injection emits a ``user_denial_injected`` milestone that is
NOT in the §11.1 enumerated table. It mirrors ``nudge_acknowledged``
for stream visibility; keep the name distinct from the ``approval_*``
prefix so future §11.1 consumers can't confuse it with an approval
outcome.
* feat(cedar-hitl): TaskApprovalsTable + SlackUserMapping + status enum
Lands the stateless CDK primitives for Cedar-HITL approval gates so
Chunk 5's REST handlers can be wired onto concrete tables. Completes
§15.2 tasks 9, 20, and 25.
Constructs
----------
``TaskApprovalsTable`` (§10.1)
- PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side
primitives landed in the prior commit.
- GSI ``user_id-status-index`` with user_id PK + status SK and an
``INCLUDE`` projection limited to the fields GET /v1/pending
renders. Three deny-sensitive attrs (``deny_reason``, ``scope``,
``tool_input_sha256``) deliberately omitted from the projection —
the list endpoint only returns PENDING rows in practice, but
excluding them kills the projection-leak concern outright and
costs no bytes today.
- Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors
it on ``construct.userStatusIndexName`` so handlers referencing
the GSI fail compile-time on a rename.
- TTL attribute ``ttl`` (agent writes ``created_at + timeout_s +
120s``).
- No DynamoDB streams per §11.2. TaskEventsTable carries the audit
fan-out; streams here would duplicate.
- Default RemovalPolicy.DESTROY to match the rest of the sample.
Production deploys override to RETAIN per §10.1.
``SlackUserMappingTable`` (§11.2, finding #4)
- Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no
stream. The forward-only shape is the trust boundary — a reverse
GSI (Cognito → Slack) would let a compromised Cognito sub
enumerate Slack identities without adding v1 capability.
- Writes land through LinkSlackUserFn (Chunk 5) which enforces the
``attribute_not_exists(slack_user_id)`` condition so a prior
legitimate mapping cannot be overwritten by a later compromise.
``task-status.ts`` — AWAITING_APPROVAL (§10.3)
- Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES:
the task is alive, paused on a human decision).
- VALID_TRANSITIONS wires the five edges §10.3 enumerates:
RUNNING → AWAITING_APPROVAL (soft-deny entry)
HYDRATING → AWAITING_APPROVAL (rare early-gate case)
AWAITING_APPROVAL → RUNNING (approve / deny resume)
AWAITING_APPROVAL → CANCELLED (user cancel mid-approval)
AWAITING_APPROVAL → FAILED (stranded-approval reconciler)
- Notably NOT added:
AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race)
AWAITING_APPROVAL → COMPLETED (skip RUNNING)
AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval
row, not the task clock)
These are regression tests so a future refactor cannot quietly
add them and bypass the `awaiting_approval_request_id = :rid`
invariant.
Tests: +29 total.
- TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST,
PITR default + override, TTL attribute, NO streams, GSI schema +
projection + sensitive-attr exclusion, removal policy default +
override, ``USER_STATUS_INDEX_NAME`` constant parity with the
construct field.
- SlackUserMappingTable (8 tests): single-key schema (explicit
KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams,
no reverse GSI, DESTROY default, TTL absent.
- TaskStatus (+10 tests over existing: 5 new assertions on the
9-state cardinality, AWAITING_APPROVAL membership, and the
transition graph including the three forbidden edges). The
existing assertions updated for the new state count.
No stack wiring yet — ``agent.ts`` instantiation + env var plumbing +
grants land in the next commit alongside the Cedar-WASM Lambda layer.
* feat(cedar-hitl): Cedar-wasm layer + wire approval tables into agent stack
Activates the agent-side approval path and ships the Lambda layer
Chunk 5's REST handlers need.
Cedar-wasm Lambda layer (§15.2 task 10)
----------------------------------------
``CedarWasmLayer`` bundles ``@cedar-policy/cedar-wasm@4.10.0`` into
``/opt/nodejs/node_modules/`` so Lambdas can
``require('@cedar-policy/cedar-wasm/nodejs')`` without shipping the
4 MB wasm binary in every function package. A dedicated
``cdk/layers/cedar-wasm/`` directory carries a minimal ``package.json``
pinning the exact version — bundling runs ``npm install --omit=dev``
against that manifest, so the layer build is hermetic from any
``cdk/node_modules/`` drift.
The bundler has two fallbacks:
- Docker (``public.ecr.aws/sam/build-nodejs22.x``) for CI / prod
deploys.
- Local-npm fallback for environments without Docker (unit-test
synths + `cdk synth` on runners that lack Docker). The local
path is safe here because the layer ships pure JS + a prebuilt
wasm binary — no native build step.
Three constants exposed from the module:
- ``CEDAR_WASM_VERSION`` — single source of truth for the pinned
version; tests assert this matches both ``cdk/package.json`` and
the layer manifest, so the three places the version lives stay
in sync.
- ``CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` — 512 MB floor for attaching
Lambdas per §15.2 task 10.
- ``CedarWasmLayer.layer`` — the underlying ``LayerVersion`` for
Chunk 5 handlers to attach via ``fn.addLayers(...)``.
Agent stack wiring (§15.2 task 19)
------------------------------------
``agent.ts`` now instantiates:
- ``TaskApprovalsTable`` (prior commit) — grants RW to the runtime
so ``pre_tool_use_hook`` can TransactWriteItems + ConsistentRead
the PENDING row.
- ``SlackUserMappingTable`` (prior commit) — not granted to the
runtime; only the link-user Lambda (Chunk 5) writes here.
- ``CedarWasmLayer`` — the layer's asset lands in the synthed
template so Chunk 5 handlers can reference ``.layer`` without
causing a new asset on their deploy.
New runtime env vars:
- ``TASK_APPROVALS_TABLE_NAME`` — consumed by
``task_state._require_tables``; its absence previously raised
``ApprovalTablesUnavailable`` → hook DENY. Now set, so the
approval path is live on deploy.
- ``AGENTCORE_MAX_LIFETIME_S = '28800'`` — 8 hours, matching
``lifecycleConfiguration.maxLifetime``. Consumed by the hook's
``_remaining_maxlifetime_s`` for the maxLifetime ceiling clip
(§6.5). Kept in sync with the lifecycle via a direct test
assertion so drift surfaces at build time.
New CfnOutputs: ``TaskApprovalsTableName``, ``SlackUserMappingTableName``,
``CedarWasmLayerArn``. Each is useful for post-deploy smoke tests
(`aws dynamodb describe-table` / `aws lambda get-layer-version`).
Tests: +8 layer tests + 9 agent-stack assertions.
Layer:
- LayerVersion resource count.
- Compatible runtimes (nodejs20/22).
- Description carries the pinned version.
- CEDAR_WASM_VERSION matches ``cdk/package.json``.
- CEDAR_WASM_VERSION matches ``layers/cedar-wasm/package.json``.
- CEDAR_WASM_MIN_LAMBDA_MEMORY_MB ≥ 512.
- Custom description override works.
- ``.layer`` exposes a real ``LayerVersion``.
Agent stack:
- Table count updated from 6 → 8.
- TaskApprovalsTable schema match (task_id PK / request_id SK,
user_id-status-index GSI presence).
- SlackUserMappingTable single-key schema.
- LayerVersion count + compatibleRuntimes.
- Three new CfnOutputs present.
- TASK_APPROVALS_TABLE_NAME env var on the runtime.
- AGENTCORE_MAX_LIFETIME_S == '28800' (drift guard).
Carry-forward
-------------
- ``TASK_STARTED_AT`` is the other input the hook's
``_remaining_maxlifetime_s`` consumes — it's a PER-TASK value the
orchestrator must stamp at invocation time, not a stack-level env
var. Chunk 5's orchestrator changes need to add it to the runtime
invocation payload / session env. For now the hook's fallback
("unknown, don't clip") keeps approvals functional.
- Chunk 5 will attach the CedarWasmLayer onto ApproveTaskFn,
DenyTaskFn, GetPoliciesFn, CreateTaskFn and assert
``memorySize >= CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` for each.
* feat(cedar-hitl): approve + deny handlers + shared types (§7.1, §7.2)
Lands the two user-facing REST handlers that flip a PENDING approval
row to APPROVED / DENIED, the shared types both call sites and the
CLI consume, and the Lambda-side Cedar parser future Chunk-5 handlers
(get-policies, create-task validation) will use.
Wire types (shared/types.ts)
----------------------------
- ApprovalScope union covering every shape the agent's
ApprovalAllowlist understands. Typed so approve-task / create-task /
CLI (Chunk 6) all agree at compile time.
- ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest /
DenyResponse / PendingApprovalSummary / GetPendingResponse /
PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest /
LinkSlackUserResponse / SlackUserMappingRecord /
ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions.
- Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20,
INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30,
APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300.
New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED,
TASK_NOT_AWAITING_APPROVAL.
Shared helpers
--------------
- shared/approval-scope.ts — parseApprovalScope validates every shape;
rejects unknown tool types / groups / prefixes, empty values,
over-128-char strings. isDegeneratePattern implements §7.4 (length
≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task.
- shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys,
GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks,
bearer tokens with [REDACTED-...] markers. Mirrors
agent/src/output_scanner.py so the deny reason the agent
ultimately reads is never raw user input.
- shared/cedar-policy.ts — parseRules pulls the five HITL annotations
(tier/rule_id/severity/approval_timeout_s/category) into a
ParsedRule[], preserving positional policy_id for IMPL-29
diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId,
matchingRuleIds, concatPolicies exposed for future handlers.
Handlers
--------
- approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve
- Cross-table TransactWriteItems: approval row PENDING → APPROVED
guarded by user_id = :caller AND status = :pending; TaskTable
no-op Update guarded by status = AWAITING_APPROVAL AND
awaiting_approval_request_id = :rid.
- TransactionCanceledException classified by per-item
CancellationReasons. Approval-row failure collapses to 404
REQUEST_NOT_FOUND (no existence oracle per §7.1 finding #6);
task-row failure → 409 TASK_NOT_AWAITING_APPROVAL.
- Optional scope defaults to this_call.
- Per-user per-minute rate limit (30/min, synthetic row).
- Writes approval_decision_recorded audit event (IMPL-6). Audit
failure is logged but does not fail the request — decision is
already committed.
- deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny
- Same cross-table pattern; status → DENIED + deny_reason.
- Reason is scanDenyReason-sanitized + truncated to
DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit
both read sanitized form; raw input never stored.
- Same rate-limit namespace as approve.
Tests: +64 total (cedar-policy-parser 24, approval-scope 28,
deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test
fixtures are assembled from string fragments so the source never
holds a contiguous secret literal — Code Defender pre-commit hook
otherwise blocks.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout +
LinkSlackUserFn + GetPolicies + GetPending) lands in the next
commit.
* feat(cedar-hitl): get-pending + get-policies + link-slack-user handlers
Lands the three read/discovery handlers Chunk 6 (CLI) needs to power
``bgagent pending``, ``bgagent policies list/show``, and
``bgagent notifications configure slack``. Completes §15.2 tasks
14, 15, and 25 (handler side).
Handlers
--------
``get-pending.ts`` (§7.7 — GET /v1/pending)
- Queries ``user_id-status-index`` GSI on TaskApprovalsTable with
``user_id = :caller AND status = :pending``. Without the GSI
this would be a full-table Scan per call — under
``watch -n1 bgagent pending`` that exhausts burst capacity for
the whole fleet (§10.1 finding #8).
- Response maps each row to ``PendingApprovalSummary`` with a
derived ``expires_at = created_at + timeout_s`` so the CLI can
render time-to-timeout without doing arithmetic on ISO strings.
- Severity coerced to ``medium`` on unknown values so GSI writes
that drift from the enum don't break the list response.
- Rate-limited 10/min/user (synthetic row on the same table,
namespaced ``RATE#<user>#PENDING`` so it does not collide with
the approve/deny counter).
``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies)
- Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES``
with the repo's ``cedar_policies`` blueprint override. Runs the
combined text through ``parseRules`` and returns
``{hard[], soft[]}`` rule summaries.
- 5-minute per-repo in-Lambda cache; cold starts throw it away.
``_resetCacheForTests`` exposed for unit-test isolation.
- Repo ID is URL-decoded from the path (``owner%2Frepo`` common in
CLI UX).
- Rate-limited 30/min/user.
- Blueprint load failure falls back to built-ins with a WARN log;
invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE``
rather than a misleading empty list.
``link-slack-user.ts`` (§11.2 finding #4 — POST /v1/notifications/slack/link)
- Writes to SlackUserMappingTable with
``ConditionExpression: attribute_not_exists(slack_user_id)``. This
guard is the entire admission control the §11.2 design hinges on:
even a compromised Slack admin cannot overwrite an existing
mapping.
- Validates ``slack_user_id`` shape (letters, digits, underscores,
2–40 chars) so junk rows cannot land.
- Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused
error code (the payload message directs the user to unlink via
support).
- Slack link_token end-to-end validation against Slack OAuth is
deferred — v1 accepts the token on trust from the Cognito-authed
caller; it is persisted in CloudWatch for audit.
Supporting primitives
---------------------
``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar``
and ``agent/policies/soft_deny.cedar`` as TypeScript string constants.
Embedded rather than read from disk because Lambda's esbuild bundler
does not copy non-TS assets by default and a dedicated bundling hook
is more code than the embed. A drift test
(``builtin-policies.test.ts``) asserts byte-equality with the agent
files so any change on one side without the other flips red at build
time.
``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated
``base_permit`` entry (both tiers need it as a Cedar catch-all; it
is not a user-facing rule so it stays out of ParsedRule[]). This
matches the agent-side ``_parse_policy_annotations`` behaviour.
Tests: +37 total.
- get-pending (8): 401 on missing auth, 429 on rate limit, empty
result, GSI query shape, row → PendingApprovalSummary with
derived expires_at, severity fallback, missing timeout → expires_at
falls back to created_at, 500 on DDB error.
- get-policies (11): 401/400 validation, built-in rules listed on
empty repo, URL-decoded repo path, custom blueprint rule lands
in soft, per-repo cache across calls, 429 rate limit, 503 on
invalid blueprint cedar, fallback on load failure, hard rules
omit severity / approval_timeout_s, soft rules carry them.
- link-slack-user (8): 401/400 validation, shape check, 201 on
success, 409 on overwrite attempt, 500 on unknown DDB error,
whitespace trim on slack_user_id, ConditionExpression verified.
- builtin-policies (4): drift byte-equality with both agent files,
parseRules round-trip for hard/soft rule IDs.
- cedar-policy (updated): ``base_permit`` is skipped from
ParsedRule[] rather than rejected.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout) lands in
the next commit.
* feat(cedar-hitl): wire Chunk 5 routes + orchestrator + reconciler + agent plumbing
Completes Chunk 5 end-to-end: the five new Lambdas are instantiated
and wired onto the REST API, the orchestrator threads approval-related
data through to the agent runtime, the stranded-task reconciler sweeps
AWAITING_APPROVAL tasks, and the agent pipeline accepts the new
per-task approval configuration.
Stack wiring (agent.ts + task-api.ts)
-------------------------------------
- TaskApi construct accepts `taskApprovalsTable`, `slackUserMappingTable`,
`cedarWasmLayer` props. Approve/Deny/GetPending Lambdas are created
when the approvals table is present; GetPolicies also requires the
cedar-wasm layer + RepoTable. Slack-link Lambda attaches when the
slack mapping table is provided.
- New routes:
POST /tasks/{task_id}/approve
POST /tasks/{task_id}/deny
GET /pending
GET /repos/{repo_id}/policies
POST /notifications/slack/link
- GetPoliciesFn configures `memorySize: 512` (Cedar-wasm floor from
§15.2 task 10) and externalizes `@cedar-policy/cedar-wasm` from the
esbuild bundle so the layer provides the wasm binary at runtime.
- CedarWasmLayer compatibleRuntimes extended to include nodejs24.x
(the Lambda runtime) — the Node 20/22 list was the original §15.2
spec but the actual function uses Node 24.
- agent.ts passes all three new constructs into TaskApi.
Orchestrator (shared/orchestrator.ts)
-------------------------------------
- `finalizeTask` now treats AWAITING_APPROVAL as a "task still alive"
terminal-timeout source: on poll exhaustion the task transitions to
TIMED_OUT with a distinct `approval_poll_timeout` reason + error
message ("Orchestrator poll timeout exceeded while awaiting approval").
The stranded-approval reconciler is the secondary safety net (§13.6)
for tasks the orchestrator already lost track of.
- Invocation payload now carries three new fields:
- `task_started_at` (ISO 8601 at HYDRATING → RUNNING time) —
consumed by the agent hook's `_remaining_maxlifetime_s` so the
§6.5 maxLifetime ceiling math uses the real task clock instead
of the fail-open fallback.
- `approval_timeout_s` (when the submit payload supplied it).
- `initial_approvals` (when the submit payload supplied entries).
Stranded-task reconciler
------------------------
- Sweeps AWAITING_APPROVAL in addition to SUBMITTED/HYDRATING.
- New `APPROVAL_STRANDED_TIMEOUT_SECONDS` env var (default 7200s =
2h) — double §7.3's 1h ceiling so this reconciler never races the
happy-path timer.
- Distinct failure message on approval-stranded vs generic-stranded
so users see "approval stranded — container evicted" rather than
the misleading "no pipeline attached" copy.
Fanout (handlers/fanout-task-events.ts)
---------------------------------------
- Slack channel default set replaces the forward-compat
`approval_required` stub with the real §11.1 events:
`approval_requested` and `approval_stranded`. Other approval
milestones (granted/denied/timed_out/late_win/etc.) stay out of
default routing to avoid notification fatigue — the CLI surfaces
those confirmations directly.
- Email default replaces `approval_required` with `approval_requested`
(high-severity gates only; severity gating happens in the dispatcher).
Create-task validation (shared/create-task-core.ts)
---------------------------------------------------
- New request fields:
- `approval_timeout_s` — integer within
`[APPROVAL_TIMEOUT_S_MIN, APPROVAL_TIMEOUT_S_MAX]`.
- `initial_approvals` — array of scope strings; each entry must
be a valid `ApprovalScope` per `parseApprovalScope`; bash_pattern
and write_path scopes get the §7.4 degenerate-pattern check.
- TaskRecord extended with `approval_timeout_s`, `initial_approvals`,
`approval_gate_count` (seeded to 0 at admission), and
`awaiting_approval_request_id` (written atomically by the agent's
`transact_write_approval_request` primitive).
Agent plumbing (models.py / config.py / pipeline.py / runner.py / server.py)
----------------------------------------------------------------------------
- `TaskConfig` adds `approval_timeout_s`, `initial_approvals`.
- `build_config`, `run_task`, `_run_task_background`, and
`_extract_invocation_params` thread the two new fields from payload
→ config → PolicyEngine.
- `server._extract_invocation_params` stamps `os.environ["TASK_STARTED_AT"]`
from the payload so the hook's `_remaining_maxlifetime_s` returns
real values (carry-forward from Chunk 3 resolved).
- `runner.py` constructs PolicyEngine with `initial_approvals` +
`task_default_timeout_s` when supplied; the engine clamps bad
values at construction time.
Tests
-----
All CDK tests pass: 1219 / 1219.
All agent tests pass: 648 / 648.
Affected suites (changes only):
- test/stacks/agent.test.ts: cedar-wasm layer CompatibleRuntimes
now expects `nodejs24.x`; table count still 8.
- test/constructs/cedar-wasm-layer.test.ts: same runtime expansion.
- test/handlers/fanout-task-events.test.ts: approval_required →
approval_requested/approval_stranded in Slack default set;
approval_required → approval_requested in Email default set.
- test/handlers/reconcile-stranded-tasks.test.ts: primeResponses
now queue a third `Items: []` for AWAITING_APPROVAL queries;
queryCalls assertion bumped to 3.
Carry-forward (non-blocking)
----------------------------
- GetPoliciesFn has write access to TaskApprovalsTable (for the
rate-limit counter path). A future permissions audit should
tighten this to a single-item write scoped to `RATE#<user>#*`.
- TASK_STARTED_AT env var is only set when a payload supplies it;
server.py still supports the Phase 2 no-payload startup path.
* feat(cedar-hitl): Chunk 6 CLI — approve / deny / pending / policies
Ships the four user-facing commands that close the Cedar HITL loop:
once Chunks 1-5 have a PENDING approval row and the Slack/Email fan-out
has notified the user, Chunk 6 is how they actually respond.
New commands (cli/src/commands/)
--------------------------------
- `bgagent approve <task-id> <request-id> [--scope <scope>] [--yes]`
Default scope is `this_call`; callers extend allowlist with
`tool_type:Bash`, `rule:<id>`, etc. `all_session` is the only scope
that requires `--yes` to confirm — mirrors the safety UX from
§8.4. Error classification maps 404 → "run `bgagent pending`", 409
→ "task no longer awaiting approval", 429 → rate-limit, 401 → login.
- `bgagent deny <task-id> <request-id> [--reason ... | --reason-file ...]`
`--reason-file` accepts multi-line reasons that would otherwise
need shell quoting. Client-side `DENY_REASON_MAX_LENGTH` cap avoids
a round-trip on obviously-too-long reasons; the server still
truncates. Reason is sanitized server-side (output_scanner) before
ever reaching the agent.
- `bgagent pending [--output text|json]`
Lists every PENDING approval owned by the caller. Rendered with
approve/deny hints inline so the user can copy-paste the next
command. JSON output for scripting. Rate-limited server-side.
- `bgagent policies list --repo <owner/repo> [--tier hard|soft]`
`bgagent policies show --repo <owner/repo> --rule <rule_id>`
Discovery commands so users can find rule IDs without reading CDK
source. Both subcommands reuse a single `listPolicies` API call
and filter locally.
Wire changes
------------
- `cli/src/api-client.ts`: `approveTask`, `denyTask`, `listPending`,
`listPolicies` — each matching the §7.1 / §7.2 / §7.6 / §7.7
request/response shapes. `approveTask` omits the `scope` body field
when unset so the server's `this_call` default applies.
- `cli/src/types.ts`: mirrors the Chunk 5 server types verbatim —
`ApprovalScope` union, `ApprovalRequest/Response`, `DenyRequest/Response`,
`PendingApprovalSummary`, `GetPoliciesResponse`, `PolicyRuleSummary`,
plus the five constants (`DENY_REASON_MAX_LENGTH`,
`INITIAL_APPROVALS_MAX_ENTRIES`, `INITIAL_APPROVALS_MAX_ENTRY_LENGTH`,
`APPROVAL_TIMEOUT_S_MIN/MAX/DEFAULT`).
- `cli/src/bin/bgagent.ts`: registers the four new commands in the
order they appear in help output.
Tests: +27 new (217 total).
- approve (9): default scope, custom scope, all_session guard +
`--yes` bypass, JSON output, 404/409/401/429 error classifications.
- deny (6): no-reason path, `--reason`, `--reason-file` with
tmpdir fixture, mutually-exclusive rejection, over-length rejection,
404 classification.
- pending (5): empty render, populated render with approve/deny
hints, JSON output, 401 and 429 classifications.
- policies (7): list both tiers, `--tier` filter, `--output json`,
bad `--tier`, show found rule, show unknown rule, 404
repo-not-onboarded classification.
Carry-forward
-------------
- `submit.ts` extension with `--approval-timeout` / `--pre-approve`
flags is deferred to a follow-up commit — the server already accepts
these fields on POST /v1/tasks (Chunk 5), and `bgagent submit`
already forwards unknown payload fields through the existing
request path, so users can set them via `--body-file` today until
the explicit flags land.
- `--output json` on error branches currently returns a CliError
instead of a JSON error envelope; matches the pattern the existing
commands use (status, cancel, nudge). Follow-up to standardize
JSON error envelopes across the whole CLI if that becomes a
common scripting pain point.
* feat(cedar-hitl): Chunk 7a — persist gate counter + IMPL-23 cache observability
Persist approval_gate_count to TaskTable across container restarts per
§13.6 so the cumulative gate budget survives eviction. Emit
pre_approvals_loaded after PolicyEngine init per §4 step 7 / §11.1 so
operators see the starting approval posture in the live SSE stream.
Add IMPL-23 cache-hit observability: cache hits attach metadata to
PolicyDecision, hook forwards to new write_policy_decision_cached
progress helper (decision_source="recent_decision_cache").
Why: container restarts were silently resetting the per-task gate
counter, re-exposing users to another approvalGateCap-worth of gates
per restart. Cache-driven denies were invisible in TaskEventsTable
beyond the initial gate. Fresh tasks emitted no "starting posture"
signal so dashboards could not distinguish "no pre-approvals seeded"
from "agent has not started".
Surface additions:
- task_state.increment_approval_gate_count_in_ddb — best-effort
atomic ADD on approval_gate_count
- PolicyEngine(initial_approval_gate_count=N) — seed session counter
- TaskConfig.initial_approval_gate_count — orchestrator payload field
- progress_writer.write_policy_decision_cached — IMPL-23 emitter
- PolicyDecision.cache_hit_metadata — observability-only field
- _CachedDecision.original_decision_ts — wall-clock preservation
- runner._initialize_policy_engine_and_hooks — extracted helper
Counter survival is a safety bound, not correctness: DDB failure
does NOT block the gate (§13.6). Joint-update invariant on status
+ awaiting_approval_request_id (§10.2) is preserved — counter uses
separate UpdateItem, not merged into resume transaction.
Tests: +36 agent (648→684), +8 CDK (1219→1227), +6 new runner tests.
* feat(cedar-hitl): Chunk 7b — persist approval_gate_cap from blueprint
Capture the per-task approval-gate cap at submit-time (§4 step 5,
decision #13, §13.6) so a blueprint-configured override is frozen
onto the TaskRecord. Mid-task blueprint edits cannot shift the cap
beneath a running task; container restarts re-seed the agent's
PolicyEngine from the persisted value instead of its compile-time
default-50.
Why: Chunk 7a added approval_gate_count persistence but the cap
itself was still resolved from the blueprint on every restart —
so an operator lowering security.approvalGateCap mid-task would
retroactively fail-close the running task. The design has always
said cap is frozen at submit; this chunk makes the implementation
match.
Surface additions:
- BlueprintProps.security.approvalGateCap (CDK, synth-validated
[1, 500] integer) — new per-repo blueprint prop
- RepoConfig.approval_gate_cap + BlueprintConfig.approval_gate_cap
- TaskRecord.approval_gate_cap + APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT}
- create-task-core now calls loadRepoConfig, resolves cap, bounds-
checks, persists; returns 503 SERVICE_UNAVAILABLE on invalid
blueprint data (permanent until admin re-deploys, not transient)
- orchestrator.ts: isValidApprovalGateCap integer+bounds guard;
logs warn if a persisted cap is structurally invalid (schema
drift / hand-edited DDB row)
- TaskConfig.approval_gate_cap: int | None = None (agent-side);
runner threads to PolicyEngine kwarg when not None
- "Task created" log line now carries approval_gate_cap +
approval_gate_cap_source ("blueprint" | "platform_default") so
operators can detect a broken-plumbing deploy at the single
chokepoint where all fallback layers converge
Per silent-failure review:
- HIGH: 500 → 503 + logger.error for permanent misconfig
- HIGH: cap + source in task-created log (catches 4-layer cascade)
- MEDIUM: orchestrator guard tightened past typeof (NaN, Infinity,
floats, out-of-bounds all omitted + warned)
Tests: CDK 1263/1263 (+36), agent 694/694 (+10). CLI unchanged.
* feat(cedar-hitl): Chunk 7c — observability wrap-up for resolved cap + warn path
Close three deferred items from Chunks 7a/7b before Chunks 8-10:
- runner.py init log now carries approval_gate_cap=N +
approval_gate_cap_source=threaded|engine_default. Matches the
handler log key so CloudWatch Insights can join across the
cascade; agent can't distinguish blueprint-override from
platform-default-frozen (handler log is the ground truth).
- server.py adds _warn_cw helper routing [server/warn] lines to
a dedicated CloudWatch stream (server_warn/<task_id>). stdout
print is preserved for local dev + existing capsys tests.
AgentCore does not forward container stdout to APPLICATION_LOGS,
so pre-7c warnings about malformed invocation payloads were
invisible in production. Failure counter shared with _debug_cw
for a single alarm surface; hoisted above writer defs for
import-time ordering safety.
- blueprint.ts emits a synth-time info annotation when
security.approvalGateCap is omitted so operators see a signal
that the repo will rely on the platform default of 50. Without
this, the default was a silent fallback at the handler layer —
only visible by inspecting a TaskRecord at runtime.
Tests: agent 694→700 (+6), cdk 1263→1265 (+2), cli unchanged.
Design refs: §4 step 5, §11.1, §13.6, decision #13.
* feat(cedar-hitl): Chunk 8a — extend approval outcome event schema
Add created_at / effective_timeout_s / matching_rule_ids to
approval_granted / approval_denied / approval_timed_out events so
the incoming ApprovalMetricsPublisher Lambda (Chunk 8b) can compute
decision latency and emit a rule_id-dimensioned timeout breakdown
without a round-trip GetItem against TaskEventsTable.
Fields are added conditionally — omitted from metadata when the
caller did not supply them — so the event stream stays free of
null-value noise and legacy callers continue to produce valid
payloads. Publisher handles missing fields via explicit skip-and-log
on the specific metric branch (not fallback-to-zero).
Agent tests extended: +6 progress_writer tests, +3 hooks tests.
Baseline 700 → 710. No consumer wired yet — this commit is a
forward-compatible superset; Chunk 8b ships the CDK publisher +
dashboard widgets.
* feat(cedar-hitl): Chunk 8b — ApprovalMetricsPublisher + native CloudWatch dashboard widgets
Ship the Cedar-HITL dashboard widgets from §11.3 / IMPL-28 via the
MetricsPublisher architecture (Option E):
- New ApprovalMetricsPublisher Lambda consumes TaskEventsTable DDB
stream as consumer #2 (FanoutConsumer is #1; stream is within its
2-consumer soft cap — documented in task-events-table.ts).
- Handler emits CloudWatch EMF for 3 metrics in namespace
ABCA/Cedar-HITL:
* ApprovalRequestCount + ClippedApprovalCount (reason dim) →
ApprovalTimeoutClipRate widget (MathExpression with IF-guard
against NaN on zero-denominator periods)
* TimedOutEffectiveTimeout (rule_id dim with allowlist
cardinality cap) → ApprovalTimeoutBreakdown widget
* ApprovalDecisionLatencyMs (outcome dim) → ApprovalDecisionLatency
widget with per-outcome p50/p90/p99
- Observability-of-observability (silent-failure review):
* MetricsPublisherHeartbeat per batch so dashboard gaps
distinguish "no traffic" from "pipeline broken"
* MetricEmitSkipped with a reason dim on schema mismatches,
parse anomalies, unknown rule ids — never fall back to
latency=0 or count=0 which would poison percentile widgets
* Expected high-volume skip reasons (non-milestone events,
REMOVE records) DO NOT emit MetricEmitSkipped — only
anomaly reasons (missing keys, missing milestone name) do,
so real signal isn't drowned
* Structured log lines alongside every skip so the absence of
metrics is also observable via CloudWatch Logs Insights
- Cardinality caps via ``RULE_ID_ALLOWLIST`` + ``normalizeClipReason``.
Unknown values collapse to ``other`` / ``unknown`` buckets with
dashboard series so the collapse is discoverable rather than
silently accruing custom-metric cost.
- Event-source-mapping filter pattern rejects non-agent_milestone
records at the service layer; handler-layer allowlist catches
anything that slips through. Filter pattern correctness tested
structurally + positively/negatively probed (silent-failure H3).
- Per-record try/catch + reportBatchItemFailures + SQS DLQ mirror
the fanout-task-events.ts poison-pill pattern exactly.
Deferred to Chunk 10 chore issues:
- DLQ alarms (fanout + publisher) — fire-into-void until
notification channel lands, so wire with §11.5 alarms as a group
- Explicit log-group declaration (IAM drift defense)
- stdout-flush race documentation (pre-existing pattern in fanout)
- EMF 100-updates/sec throttle alarm
Tests: cdk 1265 → 1327 (+62); agent 710 (unchanged); cli 217
(unchanged). All pass. §11.5 alarm plumbing now unblocked —
publisher provides the metrics infrastructure the design always
intended; only the notification-channel SNS wiring is left.
* docs(cedar-hitl): Chunk 9 — sync design doc to Chunks 7b / 8a / 8b
Bring CEDAR_HITL_GATES.md current with the code that shipped in
Chunks 7b (approval_gate_cap persist), 8a (outcome event schema
superset), and 8b (ApprovalMetricsPublisher + dashboard widgets):
- §10.2 adds the missing approval_gate_cap row (carry-forward
drift from Chunk 7b). Bounds + frozen-at-submit semantics
documented.
- §11.1 outcome events (approval_granted / approval_denied /
approval_timed_out) now document the Chunk 8a optional fields
(created_at, effective_timeout_s, matching_rule_ids) plus the
publisher's skip-on-missing-field policy.
- §11.1 intro names ApprovalMetricsPublisherFn as consumer #2 and
points to §11.3 for the metric schema.
- §11.3 rewritten to describe the Option E architecture:
publisher Lambda + EMF + native CloudWatch metrics in namespace
ABCA/Cedar-HITL, MathExpression with divide-by-zero guard,
rule_id cardinality cap, observability-of-observability via
heartbeat + skip meta-metrics, widget layout (12/12 over 24),
2-consumer stream budget. Dropped the stale "Retired the old
bundled widget" line — that widget never shipped.
- §11.5 reframed as "deferred (notification-channel gated)" with
a plumbing-status paragraph noting the metric infra now exists;
only SNS wiring remains. Alarm list expanded to include DLQ
and publisher-health alarms.
- §16 IMPL-28 rewritten for Option E; §15.2 row 46 expanded to
reference the 4 new test files; Appendix B checklist updated.
Starlight mirror regenerated via ``cd docs && node
scripts/sync-starlight.mjs``.
No code changes. Test baselines unchanged. Adversarial
comment-analyzer review verified every new claim against
committed code — zero inaccuracies.
* feat(cedar-hitl): Chunk 10 review fixes — close 2 blockers + tighten 2 mediums
Full-branch adversarial review (code-reviewer + silent-failure-hunter
on all 18 commits) surfaced findings that only appear at final-state.
Addressing the blockers + low-cost meds before deploy:
B2 — stranded approvals were invisible to the dashboard:
- Reconciler writes ``event_type: 'task_stranded'``; the metrics
publisher's event-source filter only accepts
``event_type: 'agent_milestone'``, so AWAITING_APPROVAL evictions
produced zero §11.3 signal.
- Fix: reconciler now additionally emits an ``agent_milestone``
with ``milestone: 'approval_stranded'`` when the stranded task
was AWAITING_APPROVAL. Publisher allowlist extended; classifier
emits ``ApprovalStrandedCount`` counter. SUBMITTED / HYDRATING
stranded events unchanged (guarded by test).
B1 — heartbeat comment was false reassurance:
- Event-source filter blocks Lambda invocation when no
``agent_milestone`` records exist in the poll window, so a
quiet period produces the same widget gap as a broken
pipeline. The code + design-doc wording claimed "gap =
pipeline broken" which would mislead the on-call.
- Fix: corrected module + function docstrings to describe the
heartbeat as "present when active, not pipeline-alive-always."
Operators should alarm on the combination
(heartbeat-absent + recent TaskEventsTable traffic) or wire
a scheduled canary — the latter tracked as a §11.5 follow-up.
M1 — safety-critical milestones produced zero dashboard signal:
- ``approval_cap_exceeded`` (§12.9 per-task cap) and
``approval_rate_limit_exceeded`` (per-user per-minute rate)
were emitted by the agent but not on the publisher allowlist.
A production bug where every gate hit the cap would have
been invisible.
- Fix: both added to APPROVAL_METRIC_MILESTONES with
``ApprovalCapExceededCount`` / ``ApprovalRateLimitExceededCount``
counters. No dimensions — the request_id in the event carries
per-user correlation for ad-hoc log-insights investigation.
H2 — filter / handler eventName disagreement:
- Event-source filter required ``INSERT``; handler accepted
``INSERT`` and ``MODIFY``. Benign today (TaskEventsTable is
put-only), but a future chunk MODIFY-ing records would be
silently dropped by the filter while the handler was ready
to process them.
- Fix: handler now INSERT-only, matching the filter. Single
source of truth on the eventName invariant.
M1-rename — ``expected_non_approval_milestone`` skip reason was
misleading (the non-metric approval milestones like
``approval_late_win`` also land in this bucket). Renamed to
``expected_milestone_not_tracked``.
Tests: cdk 1327 → 1332 (+5: 3 classifier branches for new metrics,
1 reconciler AWAITING_APPROVAL path, 1 SUBMITTED-not-double-counted
guard). Agent + CLI unchanged. All pre-commit hooks green; pre-push
security fails only on the 3 pre-existing CVEs tracked for chore
issue filing.
Deferred findings from the same review (file as chore issues):
- H1: agent-dies-between-TIMED_OUT-and-resume loses latency
(edge, affects p99 bias)
- H3: late-win APPROVED created_at staleness invariant
(works today, document invariant)
- H4: _warn_cw daemon-thread burst under adversarial payload
- M2-M4: late-win metric, rename helpers, etc.
No upstream PR filing this chunk — deploy to Sam's AWS account
for integration testing first.
* fix(cedar-hitl): suppress AwsSolutions-IAM5 on Runtime ExecutionRole overflow policies
Synth + deploy were blocked by cdk-nag: the Cedar HITL additions
(TaskApprovalsTable grant + SlackUserMappingTable + extra env vars
threaded to the AgentCore runtime) pushed the runtime ExecutionRole
past CDK's inline-policy size limit, so CDK auto-splits excess
statements into ``OverflowPolicy1``. The overflow inherits the same
wildcard ``bedrock:InvokeModel*`` / CloudWatch actions as the base
policy but lives at a path
(``Runtime/ExecutionRole/OverflowPolicy1/Resource``) that the
existing ``addResourceSuppressions(runtime, ..., applyToChildren:
true)`` cannot reach — CDK creates overflow policies lazily during
synth ``prepare()``, after the construct tree has been frozen and
after static suppressions have been cached.
Suppress via an Aspect at MUTATING priority so the suppression is
applied before cdk-nag's READONLY visitor runs. Matches any path
containing ``/Runtime/ExecutionRole/OverflowPolicy`` + ending
``/Resource`` so future ``OverflowPolicy2``, etc. are covered
without hardcoding indices.
Verified: ``mise //cdk:synth`` now completes cleanly.
``mise //cdk:test`` still 1332/1332.
* fix(cedar-hitl): E2E deploy-readiness — policies bundle + onboarding gate + CLI error visibility
Three E2E T1.4 + T2.2 findings from the Chunk 10 integration-test
session. Batched into one commit since all three need the same
redeploy to verify:
1. agent/Dockerfile: COPY policies/ into the container image.
``PolicyEngine.__init__`` reads
``/app/policies/hard_deny.cedar`` + ``soft_deny.cedar`` at import
time via ``_POLICIES_DIR = Path(__file__).parent.parent /
"policies"``. The Dockerfile only copied ``src/``, so the
directory was missing and every Cedar-HITL task failed at 0 turns
with ``missing built-in hard-deny policies``. Introduced
alongside Chunk 2 when the policy files were first added —
Dockerfile was never updated. Zero tasks on this branch ever
su…
5 tasks
isadeks
pushed a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 26, 2026
…ty contract Three remaining substantive review items from PR aws-samples#160: - Validate all 11 fields in getOauthSecret (review non-blocking aws-samples#7). Was checking only access_token / refresh_token / expires_at; missing client_id or client_secret only surfaced 24h later when the refresh call needed them and found undefined. Extracted the required-field list into a const next to the StoredOauthToken interface and check the full set at deserialization. Bad secrets fail fast at fetch time with a structured log line naming the missing fields. - CallbackResult discriminated union (review non-blocking aws-samples#6). Was `{ sessionId: string|null, code: string|null, state: string|null }` which let callers construct unreachable shapes. Split into `{ kind: 'agentcore', sessionId } | { kind: 'direct-oauth', code, state }`. Updated the resolver site (`oauth-callback-server.ts`), the consumer (`bgagent linear setup`), and the test file to use exhaustive type-narrowing. The setup wizard now errors clearly if it gets the agentcore shape (parked path) instead of silently passing nulls down. - Cross-language schema-parity contract test (review non-blocking #3). CLI's StoredLinearOauthToken and Lambda's StoredOauthToken define the same JSON-in-Secrets-Manager schema independently; drift between the two would be a silent bug (CLI writes one field name, Lambda reads another, refresh works, every Lambda invocation logs a missing-field error). New test in `cdk/test/contracts/stored-oauth-token-parity.test.ts` regex-parses both interface definitions out of source and asserts the field set is equal. Also asserts the new `STORED_OAUTH_TOKEN_REQUIRED_FIELDS` const matches the interface, so future field additions can't drift between the validator and the type. CLI tests 286/286 pass. CDK resolver + contract 13/13 pass.
isadeks
added a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 26, 2026
…ty contract Three remaining substantive review items from PR aws-samples#160: - Validate all 11 fields in getOauthSecret (review non-blocking aws-samples#7). Was checking only access_token / refresh_token / expires_at; missing client_id or client_secret only surfaced 24h later when the refresh call needed them and found undefined. Extracted the required-field list into a const next to the StoredOauthToken interface and check the full set at deserialization. Bad secrets fail fast at fetch time with a structured log line naming the missing fields. - CallbackResult discriminated union (review non-blocking aws-samples#6). Was `{ sessionId: string|null, code: string|null, state: string|null }` which let callers construct unreachable shapes. Split into `{ kind: 'agentcore', sessionId } | { kind: 'direct-oauth', code, state }`. Updated the resolver site (`oauth-callback-server.ts`), the consumer (`bgagent linear setup`), and the test file to use exhaustive type-narrowing. The setup wizard now errors clearly if it gets the agentcore shape (parked path) instead of silently passing nulls down. - Cross-language schema-parity contract test (review non-blocking #3). CLI's StoredLinearOauthToken and Lambda's StoredOauthToken define the same JSON-in-Secrets-Manager schema independently; drift between the two would be a silent bug (CLI writes one field name, Lambda reads another, refresh works, every Lambda invocation logs a missing-field error). New test in `cdk/test/contracts/stored-oauth-token-parity.test.ts` regex-parses both interface definitions out of source and asserts the field set is equal. Also asserts the new `STORED_OAUTH_TOKEN_REQUIRED_FIELDS` const matches the interface, so future field additions can't drift between the validator and the type. CLI tests 286/286 pass. CDK resolver + contract 13/13 pass.
isadeks
added a commit
to isadeks/sample-autonomous-cloud-coding-agents
that referenced
this pull request
May 26, 2026
… 2.0b) (aws-samples#160) * feat(linear): resolve API token via AgentCore Identity (Phase 2.0a) Migrates the agent runtime's Linear personal API token resolution from AWS Secrets Manager to AWS Bedrock AgentCore Identity. This is the "validate Identity SDK" step of the v2 plan; Phase 2.0b will swap the API key for OAuth and converge Linear MCP onto AgentCore Gateway in one cutover. Per Alain's guidance: "start by using api key, if it works, switch to oauth. you will setup an outbound auth for your server using agentcore identity. that identity can be (AC identity is like a wrapper around secrets manager) api key or oauth." Lambdas (orchestrator + processor) intentionally keep using Secrets Manager via the existing `LinearApiTokenSecret` for now. The Python `bedrock_agentcore` SDK has no Node.js equivalent — Lambda migration requires `@aws-sdk/client-bedrock-agentcore` raw API calls and folds into 2.0b's bigger refactor. End-state of 2.0a: agent reads from Identity, Lambdas read from Secrets Manager, both pointing at the same underlying token value (admin populates both). `agent/src/config.py::resolve_linear_api_token`: - Drops boto3 SecretsManager fetch + `LINEAR_API_TOKEN_SECRET_ARN` env. - Reads new env `LINEAR_API_KEY_PROVIDER_NAME` (provider name in Identity vault). - Calls `IdentityClient.get_api_key()` with the workload access token auto-injected into `BedrockAgentCoreContext` by AgentCore Runtime (verified by reading the SDK's `auth.py` decorator implementation — no manual workload-identity mint needed inside the runtime). - Caches the resolved token in `LINEAR_API_TOKEN` so downstream consumers stay unchanged: `channel_mcp.py`'s `${LINEAR_API_TOKEN}` placeholder in `.mcp.json` and `linear_reactions.py`'s GraphQL Authorization header. Preserves PR aws-samples#87's nice-to-have improvements: - `ImportError` graceful fallback (now for `bedrock_agentcore` instead of `boto3`) — degrade with WARN, don't crash the agent. - `AccessDeniedException` and `ResourceNotFoundException` logged at ERROR severity (persistent IAM/config bugs that should page). Other ClientErrors stay at WARN (transient throttle/network). `agent/pyproject.toml`: adds `bedrock-agentcore==1.9.1` dep. `cdk/src/stacks/agent.ts`: - On the AgentCore runtime: drops `linearIntegration.apiTokenSecret. grantRead(runtime)` and the `LINEAR_API_TOKEN_SECRET_ARN` env-var override. Adds `LINEAR_API_KEY_PROVIDER_NAME` env (hardcoded `'linear-api-key'` for now; can parametrize later via context if multi-environment naming is needed) and IAM permissions for `bedrock-agentcore:GetResourceApiKey` and `bedrock-agentcore:GetWorkloadAccessToken`. - Lambdas (orchestrator + processor) untouched — they still grant on the Linear secret and read from Secrets Manager. - Resource scope on the new IAM is `*` for now; AgentCore Identity ARN format isn't fully standardized in public docs as of 2026-05-15. Tighten in 2.0b when OAuth migration documents the canonical resource shape. `docs/guides/LINEAR_SETUP_GUIDE.md`: adds Step 4.5 documenting the one-time `agentcore add credential --type api-key --name linear-api-key` admin command users must run alongside the existing `bgagent linear setup` wizard. Notes that Lambdas keep Secrets Manager temporarily and 2.0b will retire the dual-store setup. Starlight mirror synced. `agent/tests/test_config.py::TestResolveLinearApiToken` — 10 tests covering: cached env var fast-path; missing provider name; missing region; workload token absent (outside runtime); happy path with env-var side-effect; botocore error swallowed with WARN; SDK returns None defensively; ImportError fallback; AccessDeniedException → ERROR severity; ResourceNotFoundException → ERROR severity. 542 agent / 1271 cdk / 196 cli, all green. Lint + typecheck clean. CDK synth clean. `bedrock_agentcore` SDK confirmed working in our runtime image (verified in `node_modules` post-install). The `BedrockAgentCoreContext` workload token auto-injection is documented behaviour for code running inside AgentCore Runtime — verified by reading the SDK's `@requires_api_key` decorator implementation, which uses the same context lookup we use here. Stacked on PR aws-samples#87 (`feat/linear-processor-feedback`). Will conflict on `config.py` and `test_config.py` if aws-samples#87 needs further rework before merge — happy to rebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(linear): use aws CLI for credential provider, not the agentcore command The setup guide referenced `agentcore add credential` which doesn't actually work end-to-end: - The Python `bedrock-agentcore-starter-toolkit` CLI (`agentcore`) only exposes agent-lifecycle commands; there is no `credential-provider` subcommand. Confirmed by reading the toolkit's CLI reference and by user trying `agentcore configure credential-provider --type api-key --name ...` and receiving `No such command 'credential-provider'`. - The new npm `@aws/agentcore` CLI does have `agentcore add credential` but uses a declarative project model — the credential lands in `agentcore.json` + `.env.local`, not the actual AgentCore Identity vault, until `agentcore deploy` runs against a project structured for that CLI. ABCA isn't structured that way. Switch the docs to the plain AWS CLI which works directly against the AgentCore Identity API: aws bedrock-agentcore-control create-api-key-credential-provider \ --name linear-api-key \ --api-key "<paste lin_api_… token here>" \ --region us-east-1 Plus the matching `list-api-key-credential-providers` for verification. Add a "Tooling note" at the bottom of the section explaining why the plain AWS CLI is the right path here vs. the two `agentcore` CLIs. Starlight mirror synced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): pass runtimeUserId so AgentCore injects WorkloadAccessToken Smoke on backgroundagent-dev caught a real bug in the Phase 2.0a migration: the agent's `resolve_linear_api_token()` was correctly calling `IdentityClient.get_api_key()` but failing earlier at `BedrockAgentCoreContext.get_workload_access_token()` returning None. The Linear MCP then loaded with an unresolved `${LINEAR_API_TOKEN}` placeholder and 👀 didn't post. Root cause (from reading bedrock-agentcore-sdk-python source): The `WorkloadAccessToken` request header (which the runtime container reads to populate `BedrockAgentCoreContext`) is only injected by AgentCore Identity when `InvokeAgentRuntimeCommand` is called with `runtimeUserId`. Per AWS docs at https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-oauth.html: "Agent Runtime exchanges this token for a Workload Access Token via bedrock-agentcore:GetWorkloadAccessTokenForJWT API and delivers it to your agent code via the payload header `WorkloadAccessToken`." Without `runtimeUserId`, AgentCore never derives a workload token and the header is absent. `app.py::_build_request_context` reads the header off the inbound request; the agent sees None. Fix: 1. Thread `userId` through the `ComputeStrategy.startSession` interface (compute-strategy.ts). 2. Pass `task.user_id` (the task's Cognito sub) at the call site in orchestrate-task.ts. 3. Set `runtimeUserId: input.userId` on `InvokeAgentRuntimeCommand` in agentcore-strategy.ts. Log it alongside session_id for traceability. 4. ECS strategy accepts the new parameter to satisfy the interface; doesn't use it (ECS doesn't go through AgentCore Identity). 5. Grant the orchestrator role `bedrock-agentcore:InvokeAgentRuntimeForUser` alongside `InvokeAgentRuntime` (task-orchestrator.ts). Without this, the new `runtimeUserId` parameter would 403. Tests updated: - `agentcore-strategy.test.ts`: pin that `runtimeUserId` flows from input into the SDK command; pass `userId: 'cognito-user-1'` in 4 call sites. - `ecs-strategy.test.ts`: pass `userId` (unused by ECS) on 3 call sites. - `start-session-composition.test.ts`: pass `userId: 'cognito-test'` on 3 call sites. - `task-orchestrator.test.ts`: assert the IAM action list includes `InvokeAgentRuntimeForUser` (2 assertions). 542 agent / 1273 cdk / 196 cli — all green. Lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): two undocumented gotchas to make AgentCore Identity actually work End-to-end smoke on backgroundagent-dev surfaced two more silent failure modes after the runtimeUserId fix landed: `BedrockAgentCoreContext.get_workload_access_token()` returned None inside the pipeline thread even though the platform delivered the token on the inbound request. Cause: Python ContextVar storage is per-thread, not shared across `threading.Thread` boundaries. Our `_run_task_background` spawns a new thread for the pipeline, so any context-var the SDK's middleware sets in the request handler thread doesn't reach it. Compounding factor: the SDK's `_build_request_context` middleware only runs when using `BedrockAgentCoreApp` from `bedrock_agentcore.runtime`. Plain FastAPI apps like ours never get that bridge at all. Fix: read the workload token off the request in `_extract_invocation_params` (handling both observed header spellings — `WorkloadAccessToken` and `x-amzn-bedrock-agentcore-runtime-workload-accesstoken`), thread it through the kwargs of `_run_task_background`, and have the pipeline thread call `BedrockAgentCoreContext.set_workload_access_token` on entry. (cdk/src/stacks/agent.ts) After (1) was applied, `IdentityClient.get_api_key()` actually fired and got `AccessDeniedException: ... not authorized to perform: secretsmanager:GetSecretValue`. Cause: AgentCore Identity stores api-key credentials in Secrets Manager under reserved prefix `bedrock-agentcore-identity!*` (the actual ARN shape: `arn:aws:secretsmanager:REGION:ACCOUNT:secret:bedrock-agentcore- identity!default/apikey/<provider-name>-<hash>`). The `GetResourceApiKey` control-plane API surfaces the underlying secret to the caller, and AWS verifies the *caller* role (our runtime role) has `GetSecretValue` on the actual secret resource — not the SLR. Fix: grant the runtime role `secretsmanager:GetSecretValue` scoped to the `bedrock-agentcore-identity!*` prefix in the current account/region. Tightly scoped to Identity-managed secrets; doesn't leak read access to other Secrets Manager resources. - Runtime container reads workload token from request, propagates across thread boundary, calls IdentityClient successfully - 👀 reaction posts at +525ms after task pickup, no warnings - Linear MCP loads cleanly with the resolved token - No more `workload access token not in context` WARN - No more `AccessDeniedException` from `GetResourceApiKey` Three undocumented requirements total for Phase 2.0a (combining with the runtimeUserId fix from the prior commit): 1. Caller (orchestrator) sends `runtimeUserId` and has `InvokeAgentRuntimeForUser` IAM 2. Runtime container bridges the workload-token header into the ContextVar, with per-thread propagation if the pipeline runs in a spawned thread 3. Runtime role has `secretsmanager:GetSecretValue` on `bedrock-agentcore-identity!*` All three are silent failures on their own; missing any one returns None or AccessDenied without obvious "you forgot X" diagnostics. Will file an upstream issue against `aws/bedrock-agentcore-sdk-python` summarising all three so others don't burn the same cycles. Tests: 542 agent / 1273 cdk / 196 cli — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b): foundation — workspace registry, admin invite-user, Linear app template Wave 1 of Phase 2.0b: prereq pieces for the Linear OAuth migration. - LinearWorkspaceRegistryTable: maps Linear org-id → AgentCore credential provider name, so webhook + orchestrator Lambdas can resolve the workspace's OAuth token without knowing about provider naming. - bgagent admin invite-user: wraps Cognito admin-create-user with the right defaults and prints a base64 bundle that --from-bundle decodes into ~/.bgagent/config.json. Replaces a four-flag dance with a single paste for joining teammates. - bgagent linear app-template: prints the Linear OAuth app form values captured from the 2.0b spike — GitHub username with [bot] suffix and Webhooks ON gate the actor=app flow; misleading "Invalid redirect_uri" error is the symptom when either is missing. - USER_GUIDE roles section + joining-an-existing-deployment flow: makes the four-role lifecycle explicit (stack admin / workspace admin / repo onboarder / teammate) so a teammate landing on the docs has a clear non-admin path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(linear): rewrite setup guide for OAuth (2.0b) Replace the personal-API-key flow with the Linear OAuth `actor=app` install path verified by the 2.0b spike. Major changes: - Step 1: AgentCore credential provider via `bgagent linear oauth-register-workspace`, capturing the AWS-hosted callback URL that Linear will actually see. - Step 2: Linear OAuth app creation via `bgagent linear app-template`, documenting the GitHub-username-with-[bot]-suffix and Webhooks-ON gates that produce Linear's misleading "Invalid redirect_uri" error when missing. - Step 4: OAuth dance via the rewritten `bgagent linear setup` — ephemeral localhost HTTPS callback; no own ALB/Lambda needed since AWS proxies the OAuth flow. - Step 7: clarify that the PAK-owner auto-link becomes the setup-runner auto-link; the manual DDB mapping path stays for now until self-service `@bgagent link` ships. - New "Adding additional Linear workspaces" section for multi-workspace deployments. - New "Migration from 2.0a (PAK) to 2.0b (OAuth)" runbook. - Troubleshooting expanded to cover the Invalid-redirect_uri and 401-from-Linear scenarios surfaced in the spike. Notes the docs reference commands shipping in Wave 2 (aws-samples#63 oauth-register-workspace, aws-samples#65 setup wizard, aws-samples#67 add-workspace) — the 2.0b branch is a coherent unit and aws-samples#62 must land before those flows are wired so the docs aren't a moving target during implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): workload-token retrieval helper for AgentCore Identity (2.0b C1) The CLI runs OUTSIDE AgentCore Runtime, so the in-container ContextVar trick from 2.0a does not apply. This module gives every 2.0b OAuth-flow command a single way to obtain a workload access token: - getWorkloadAccessToken({region, workloadName, userId}) calls the data-plane GetWorkloadAccessTokenForUserId, scoping the resulting token to (workload, cognito_sub) so OAuth-token retrieval is per platform user. - decodeCognitoSub(idToken) extracts the sub claim from the cached id_token, parsing only — token validation is API Gateway's job. - DEFAULT_CLI_WORKLOAD_NAME is the deployment-time convention; the workload identity itself will be created by a follow-up CDK custom resource (aws-samples#61). Stack output 'CliWorkloadIdentityName' wires the CLI to whatever the deployed name actually is. Two SDK errors get translated into actionable remediation hints: - ValidationException: WorkloadIdentity is linked to a service — documented footgun from the spike, surfaces when the CLI is pointed at a runtime workload. - AccessDeniedException / ResourceNotFoundException — same surface treatment, with the bgagent-side checklist embedded in the message. Adds @aws-sdk/client-bedrock-agentcore + bedrock-agentcore-control as CLI deps. Pins all CLI AWS SDK clients to 3.1024.0 (matching) to keep the @smithy/core dependency graph deduplicated; mixed-version pins caused interface-collision typecheck errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): bgagent linear oauth-register-workspace (2.0b B2) Registers a Linear workspace as an AgentCore OAuth2 credential provider. The command: - Validates the workspace slug shape ([a-zA-Z0-9_-]{4,50}) so the resulting provider name fits AgentCore's 64-char limit. - Prompts for clientId + clientSecret (interactive, not echoed). - Calls CreateOauth2CredentialProvider with credentialProviderVendor= 'CustomOauth2' and explicit authorizationServerMetadata for Linear's fixed endpoints (Linear has no .well-known/openid-configuration, so vendor-discovery cannot auto-resolve). - Prints the AWS-hosted callback URL the operator pastes into Linear's app form — the AWS-side proxy that Linear actually redirects to. - Idempotent: re-running with an existing provider name fetches the callbackUrl and reports "already exists — re-using it". Smoke test against dev account (2026-05-19) revealed AWS surfaces the duplicate-name case as ValidationException (NOT ConflictException as CFN/REST conventions would suggest). Detection is by message-substring match; tests cover both the duplicate path and the "ValidationException for a non-duplicate reason" path so we don't accidentally swallow input validation errors. AccessDeniedException gets a remediation hint pointing at the 'bedrock-agentcore:CreateOauth2CredentialProvider' permission, since the most common misconfiguration is running the command as a Cognito-authenticated CLI user (no permissions) rather than as an admin/stack-deploy IAM principal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b): CLI workload identity + localhost OAuth callback server (A3) Two pieces that together let the CLI run the OAuth dance without any externally-facing infrastructure: CDK side (CliWorkloadIdentity construct, wired into the agent stack): - Creates a dedicated AgentCore Identity workload identity named `bgagent-cli`, distinct from the runtime workload (which is service- linked and cannot mint user-scoped tokens). - Allowlists `https://localhost:8443/oauth/callback` as a permitted resourceOauth2ReturnUrl. AgentCore validates browser-redirect URLs against this list, so the CLI cannot finish the OAuth dance without it. - Implementation: AwsCustomResource (no L2/L1 for AgentCore Identity in CDK as of May 2026). Idempotent — Create/Update/Delete lifecycle wired so re-deploys reconcile the allowlist and stack-deletes don't leak workload identities (50/account-region quota). - Stack outputs `CliWorkloadIdentityName` and `LinearWorkspaceRegistryTableName` so the CLI can discover them at runtime. CLI side (oauth-callback-server module): - Generates a fresh self-signed cert in /tmp via openssl on each invocation; cert is cleaned up when the server shuts down. - Starts an HTTPS listener on localhost:8443/oauth/callback, captures the first request's `session_id` query param, renders a success page, shuts down. Uses res.once('finish') to ensure the response body flushes before the listener closes — otherwise the browser hangs waiting for bytes that never arrive (caught by integration test). - Translates EADDRINUSE and timeout into actionable CliErrors. The CLI URL constant and the CDK default allowlist must agree on the exact URL string — drift would silently break the OAuth dance with "redirect_uri not allowlisted". A regression-locking test on the URL constant + matching CDK default flags the issue at unit-test time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): bgagent linear setup OAuth dance orchestration (2.0b C2/C3) Replaces the personal-API-key wizard with a 7-step OAuth flow that authorizes a Linear workspace via AgentCore Identity: 1. Resolve stack outputs (CliWorkloadIdentityName, registry table, user mapping table, webhook secret ARN). Errors loudly if any are missing — typically means the stack predates 2.0b. 2. Read Cognito sub from cached id_token. 3. Mint workload access token via getWorkloadAccessTokenForUserId. 4. Initiate OAuth dance: getResourceOauth2Token returns an authorize URL + sessionUri. customParameters: {actor: 'app'} propagates so Linear surfaces the Agent install variant of the consent screen (verified via 2.0b spike). 5. Start localhost HTTPS callback server, open browser to the auth URL, await session_id from the callback. 6. Poll getResourceOauth2Token (5s/600s) until accessToken arrives; translate sessionStatus=FAILED into a Linear-app-config remediation hint. 7. Query Linear viewer + organization with the OAuth token, persist the workspace registry row + admin user-mapping row, then prompt for the webhook signing secret if not already configured. Hard cutover from PAK: the new wizard is OAuth-only — there is no --use-pak flag. The webhook signing secret prompt remains because HMAC verification of inbound Linear webhooks is independent of how the agent calls Linear outbound. Webhook prompt is skipped on subsequent add-workspace runs by detecting the lin_wh_ prefix on the stored secret; --rotate-webhook-secret forces a re-prompt. Splits queryLinearIdentity out so both the legacy PAK auto-link helper (authorization=`lin_api_…`) and the OAuth path (authorization= `Bearer <token>`) reuse the same GraphQL query. The PAK helper stays exported to support the legacy linkage path until LinearApiTokenSecret is retired in aws-samples#70. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cdk): use full SDK v3 package name for AgentCore Identity custom resource CDK's AwsCustomResource auto-derives the SDK package name from `service` by lowercasing and dropping hyphens — `'BedrockAgentCoreControl'` becomes `@aws-sdk/client-bedrockagentcorecontrol`, which doesn't exist. The actual package is `@aws-sdk/client-bedrock-agentcore-control` (hyphens). Verified by deploy: with the lowercased mapping the Lambda backing the CR fails with "Package @aws-sdk/client-bedrockagentcorecontrol does not exist" and the stack rolls back. Switching to the full v3 package name (supported per the AwsSdkCall.service jsdoc) routes the import correctly. Verified end-to-end: `bgagent-cli` workload identity created with `https://localhost:8443/oauth/callback` on the resourceOauth2ReturnUrls allowlist, stack outputs populated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(2.0b): park diagnostic flags + tokenEndpointAuthMethods + force-reauth Smoke test against backgroundagent-dev (2026-05-19) hit a service-side bug in AgentCore Identity: USER_FEDERATION token-exchange against Linear's /oauth/token never completes. sessionStatus stays IN_PROGRESS indefinitely, no FAILED transition, no diagnostics on the wire. Verified via manual curl that Linear's token endpoint works perfectly with the same clientId/secret/scopes/code/actor=app — bug is on AWS's side. AgentCore Identity has zero token-injection APIs, so Option 3 (do OAuth ourselves + inject) is architecturally impossible. AWS support case + PAR-compatibility upstream issue aws/bedrock-agentcore-sdk-python#111 are the official fix paths. Parking the wizard work but committing the diagnostic flags we added during triage so they're available when this is unparked: - `tokenEndpointAuthMethods: ['client_secret_post']` on the provider metadata. Linear expects POST-body credentials; AgentCore defaults to Basic. Field name verified against the SDK type (`tokenEndpointAuthMethods`, not the `Supported` suffix the boto3 reference suggested). - `--verbose-poll` flag on `bgagent linear setup` — prints per-poll sessionStatus + response keys so the stuck state is visible. - `--force-reauth` flag — sets `forceAuthentication: true` on GetResourceOauth2Token to bypass cached tokens after a Linear-side revoke. - `CompleteResourceTokenAuth` call between callback capture and poll loop. Per AWS sample 09-Outbound_Auth_Self_Hosted, this is required to bind the captured session to a userId. Confirmed it's NOT what unblocks our specific bug, but is correct per spec for any USER_FEDERATION flow. Status of resume paths in memory/project_oauth_2_0b.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b-O2): direct Linear OAuth + per-workspace Secrets Manager Replaces the parked AgentCore Identity OAuth flow with a CLI-side direct OAuth dance against Linear's /oauth/token endpoint. The flow verified by the manual curl smoke test on 2026-05-19 returned a valid access_token in <100ms, so we know Linear's side works. AWS's USER_FEDERATION wrapper is broken specifically for Linear (or actor=app); see memory/project_oauth_2_0b.md for the parked-bug details and resume prompt. Architecture: - New module cli/src/linear-oauth.ts owns the OAuth helpers: generatePkce (S256), buildAuthorizationUrl (with actor=app), exchangeAuthorizationCode, refreshAccessToken, StoredLinearOauthToken JSON shape, computeExpiresAt, isAccessTokenExpiring (60s threshold), linearOauthSecretName. 19 hermetic tests (no network). - Per-workspace Secrets Manager secret bgagent-linear-oauth-<slug> holds the token JSON. CLI creates+updates at runtime via upsertOauthSecret (CreateSecret + ResourceExistsException → PutSecretValue fallback). - LinearWorkspaceRegistryTable row gains oauth_secret_arn. Lambdas resolve workspace → secret_arn → token JSON, with refresh-if-expiring. (Lambda migration is Wave C.) - bgagent linear setup is rewritten end-to-end: prompt-for-credentials → PKCE → open browser → callback captures ?code+?state → state verify (CSRF) → exchangeAuthorizationCode → query Linear viewer+org → write secret + registry row + user mapping → webhook secret prompt (unchanged from prior wizard). No AgentCore calls. No polling. No CompleteResourceTokenAuth. - Localhost callback server now exposes both AgentCore-style (session_id) and direct-Linear-style (code+state+error) shapes via a CallbackResult with nullable fields. Backward-compat with the parked AgentCore path's tests. Removals: - cli/src/agentcore-identity.ts + test (workload-token helper) - cdk/src/constructs/cli-workload-identity.ts + test (workload identity) - providerNameForWorkspace, buildLinearProviderInput, registerLinearWorkspace, initiateOauthDance, completeResourceTokenAuth, pollForOauthAccessToken, AgentCore SDK imports — all gone from linear.ts - bgagent linear oauth-register-workspace command (no AWS-side provider to register; folded into setup) - CliWorkloadIdentityName CfnOutput from agent.ts - 6 describe blocks of AgentCore-flavored tests in linear.test.ts Net change: -1100 lines, +700 lines of new direct-OAuth wiring. 286/286 CLI tests pass. 9/9 linear-integration CDK tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(2.0b-O2): space-separated OAuth scopes + --no-actor-app diagnostic End-to-end smoke test against backgroundagent-dev (2026-05-20): - The OAuth dance was failing with Linear's "Invalid redirect_uri" error even though the redirect_uri was correct. Root cause: scopes were comma-separated (`read,write,...`) instead of space-separated. RFC 6749 §3.3 mandates space; Linear surfaces the violation as the misleading "Invalid redirect_uri" error, the same misdirection we hit during the 2.0b spike. Fix: `.join(' ')` in buildAuthorizationUrl. - Adds `--no-actor-app` diagnostic flag on `bgagent linear setup`. Drops the `actor=app` query param so a stuck flow can be isolated to agent-install vs vanilla-OAuth without changing the Linear app config. Off by default; surfaces a warning when invoked. After the fix, full smoke test passed: - Browser opens to Linear consent - User authorizes, redirects to https://localhost:8443/oauth/callback - CLI captures code+state, exchanges for access_token + refresh_token - Token JSON persisted to bgagent-linear-oauth-maguireb in Secrets Manager - LinearWorkspaceRegistryTable row written with oauth_secret_arn - LinearUserMappingTable row written for the admin - Token verified against Linear's GraphQL viewer query (works) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b-O2): Wave C — migrate Lambdas + agent runtime to per-workspace OAuth Replaces every consumer of the legacy LinearApiTokenSecret PAK with the per-workspace Secrets Manager OAuth-token pattern from Waves A/B. Deploy of this commit will fully cut over the integration; the LinearApiTokenSecret construct is gone. CDK side: - New `cdk/src/handlers/shared/linear-oauth-resolver.ts` resolves workspace_id → registry row → oauth_secret_arn → token JSON → refresh-if-expiring → access_token. In-memory caches (1m TTL) on both registry rows and token JSON. Lazy refresh with PutSecretValue write-back so concurrent Lambdas see the rotated token. 11 unit tests. - linear-feedback.ts: postIssueComment / addIssueReaction / reportIssueFailure now take a {linearWorkspaceId, registryTableName} context instead of an apiTokenSecretArn. Auth header switches from bare PAK value to `Bearer ${accessToken}`. - linear-webhook-processor.ts: env vars LINEAR_WORKSPACE_REGISTRY_TABLE_NAME replace LINEAR_API_TOKEN_SECRET_ARN. safeReportIssueFailure threads the webhook payload's organizationId through to the resolver. Webhook processor now stamps `linear_oauth_secret_arn` + `linear_workspace_slug` into channel_metadata at task-creation time so the agent runtime can fetch the secret directly without a registry round-trip. - orchestrate-task.ts: notifyLinearOnConcurrencyCap reads LINEAR_WORKSPACE_REGISTRY_TABLE_NAME and the task's channel_metadata.linear_workspace_id. - LinearIntegration construct: drops apiTokenSecret + ApiTokenSecret Secrets Manager resource entirely. Webhook processor IAM now grants Get+Put on `bgagent-linear-oauth-*` Secrets Manager prefix. - Agent stack: orchestrator IAM mirrors the new prefix grant. Runtime IAM drops AgentCore Identity grants and gains Get+Put on `bgagent-linear-oauth-*`. LINEAR_API_KEY_PROVIDER_NAME env var, LINEAR_API_TOKEN_SECRET_ARN env var, and LinearApiTokenSecretArn CfnOutput all removed. Agent side (Python): - config.py::resolve_linear_api_token: rewritten to read the per-task channel_metadata.linear_oauth_secret_arn (or LINEAR_OAUTH_SECRET_ARN env fallback) via boto3.secretsmanager. Lazy refresh: if expires_at is within 60s, POST refresh_token grant to Linear /oauth/token using client_id/client_secret co-located in the secret JSON, write the rotated token back via put_secret_value, return the new access_token. - pipeline.py: passes config.channel_metadata into resolve_linear_api_token. - linear-oauth.ts (CLI): StoredLinearOauthToken schema gains client_id + client_secret fields so Lambda + agent refresh can run without per-Lambda OAuth env vars. Setup wizard writes them. Tests pruned of AgentCore Identity mocks; new tests cover the Secrets-Manager-direct path (CDK 11 + agent 6 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(orchestrator): bundle import.meta.url shim for durable-execution SDK `@aws/durable-execution-sdk-js@1.1.3`'s ESM build calls `fileURLToPath(import.meta.url)` at module load. esbuild's ESM→CJS bundling leaves `import.meta.url` undefined, crashing every invocation with `TypeError: path must be a string`. Define an identifier substitution + banner that materializes a valid file:// URL from `__filename` at runtime. Discovered while smoke-testing Wave C end-to-end on backgroundagent-dev. Refs: aws/aws-durable-execution-sdk-js#543 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): switch OAuth callback to plain HTTP localhost Per RFC 8252 §7.3, OAuth providers (including Linear) treat http://localhost as a special case that doesn't need TLS — the connection never leaves the host. The previous self-signed-cert HTTPS approach forced testers through a "connection not private" warning that scared them off mid-setup. Drops the openssl shell-out + temp-cert plumbing (~60 lines) along with the user-facing warning copy in `bgagent linear setup`. Updates the callback constants to http://localhost:8080/oauth/callback and the test suite to plain http.GET. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(agent): ruff lint + format for OAuth refresh path Five lint errors surfaced when CI ran `ruff check --fix` against the Wave C agent changes: - F401 unused `timezone` import in `config.py` (replaced with `timedelta`, which is what's actually needed) - RUF034 useless if-else in the `expires_at` ternary — both branches returned identical strings before the recompute below; flatten into a single straightforward `if expires_in: ... else: ...` block - E501 three line-length violations in `config.py` and `test_config.py` — break the long expressions onto helper-named intermediates Confirmed locally: `ruff check .` clean, `ruff format --check` clean, `pytest tests/test_config.py` 15/15 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(linear-feedback): rewrite tests against OAuth context signature Wave C migrated postIssueComment / addIssueReaction / reportIssueFailure from a (secretArn: string, ...) signature to a (ctx: LinearFeedbackContext, ...) signature, but the test file still passed bare strings — TypeScript caught it at compile time only when CI ran a full build. Three test suites failed to compile (the typecheck error blocked the whole suite, not just this file). - Mock `resolveLinearOauthToken` (the new resolver) instead of `getLinearSecret` (the old PAK fetcher). - Build a `LinearFeedbackContext` fixture with linearWorkspaceId + registryTableName, pass it everywhere SECRET_ARN was used. - Update the Authorization-header assertion to match the new `Bearer <token>` form (PAK was bare-token; OAuth is Bearer-prefixed). All 41 tests across linear-feedback, linear-webhook-processor, and orchestrate-task-feedback pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cdk): align yarn.lock with upstream main + bump table count for OAuth registry Two CI failures came together because they share a root cause: this branch's yarn.lock had drifted from upstream main during interim re-resolves, leaving an inconsistent dep tree that broke ts-jest's module resolution for @aws-cdk/mixins-preview/aws-bedrockagentcore. Restoring upstream main's yarn.lock fixes the resolution; the agent.test.ts table-count assertion then needs to bump from 12 to 13 to account for the LinearWorkspaceRegistryTable added in Phase 2.0b Wave A4. Verified locally: agent.test.ts (44/44) and github-tags.test.ts (5/5) both pass after the changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style: apply eslint --fix from CI's self-mutation guard CI runs `mise run build`, which invokes `eslint --fix` and then fails if the working tree changed (self-mutation guard). Three cosmetic lints needed applying: - Import-order: DynamoDBClient and CliError moved earlier in their files to satisfy alphabetic-by-package ordering - formatJson import added in alphabetic position in linear.ts - Three template literals with no interpolation converted to single-quoted strings in oauth-callback-server.ts and linear.ts (eslint quotes rule prefers single-quotes when no template variables are used) Pure mechanical fixes; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(oauth): refresh-token race recovery + log gaps from review Addresses the blocker + critical items from PR review: - Refresh-token race (review blocker). Linear rotates refresh_tokens on every use; concurrent Lambdas/agents racing the same secret will all read the same expiring token and one's refresh will succeed while the others get `invalid_grant`. On `invalid_grant`, re-read the secret from Secrets Manager (bypassing cache). If the refresh_token has changed, another caller already rotated; use the freshly-read token (or retry refresh once if it's also expiring). If unchanged, the refresh_token is permanently rejected and the workspace needs re-onboarding. Implemented in both the TS resolver (linear-oauth-resolver.ts) and Python resolver (config.py). - Unguarded bedrock_agentcore import in agent/src/server.py (review critical). The bare `from bedrock_agentcore.runtime.context import BedrockAgentCoreContext` inside `_run_task_background` killed the entire pipeline thread with no diagnostic if the SDK was missing or its module structure changed. Wrap in try/except (ImportError, AttributeError) and log via _warn_cw — the Linear token resolver has its own SM fallback, so the agent can proceed without the workload-token bridge. - Cache invalidation on fetch-level refresh failure (review high). The TS resolver's `invalidateLinearOauthCache()` only ran in the `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the catch returned null without invalidating, leaving the stale expiring token cached for 60s and hammering Linear's token endpoint. Move invalidate into the fetch-level catch too. - Malformed expires_at log (review medium). The Python `_is_expiring` caught `ValueError` and silently returned True, masking consistently-bad writes. Add a WARN log so operators see the bad data instead of just an unexplained refresh on every task. - Positive-path refresh log (review non-blocking aws-samples#5). Added INFO-level breadcrumb on successful refresh in both resolvers so operators diagnosing intermittent 401s have a trace of which workspace refreshed and to what expiry. 11/11 existing resolver unit tests still pass; will add tests for the new race-recovery branch in a followup commit. * fix(2.0b-O2): review-2 batch — error specificity, half-creates, runbook Addresses four PR review items focused on operator UX when things go sideways: - isWebhookSecretConfigured (review high). The bare `catch { return false }` swallowed AccessDeniedException and DecryptionFailureException, making setup re-prompt for a webhook secret when the real problem was IAM. Now: only ResourceNotFoundException returns false; everything else throws a CliError pointing the operator at the IAM gap. Test updated to assert both paths. - admin invite-user half-create (review medium). If AdminCreateUser succeeds but AdminSetUserPassword fails (stricter password policy than generator, partial IAM grant on the Set verb), the user was left in FORCE_CHANGE_PASSWORD with no diagnostic. Wrap the second call in try/catch and throw a CliError that names the user, explains the broken state, and gives both a delete-user CLI and a manual-fix path. - PAK migration runbook (review non-blocking #1). Expanded the "Migration from 2.0a (PAK) to 2.0b (OAuth)" section in LINEAR_SETUP_GUIDE.md with: a pre-deploy checklist, what survives the migration vs what doesn't, an explicit rollback note (fix forward; the original PAK secret is gone with the CFN resource), and the per-step difference between 2.0a-with-Identity (skipped) vs 2.0a-with-PAK (migrate) deploys. - Vestigial AgentCore Identity dep (review non-blocking #2). bedrock-agentcore==1.9.1 is kept in agent/pyproject.toml because the workload-token bridge in server.py still calls it (now wrapped in try/except per review batch 1). Add an inline comment explaining why it's pinned even though Phase 2.0b-O2 reads Secrets Manager directly — it's the seam for resuming the AgentCore Identity path in 2.0c. CLI tests: 13/13 pass. * fix(2.0b-O2): batch 3 — schema validation, CallbackResult union, parity contract Three remaining substantive review items from PR aws-samples#160: - Validate all 11 fields in getOauthSecret (review non-blocking aws-samples#7). Was checking only access_token / refresh_token / expires_at; missing client_id or client_secret only surfaced 24h later when the refresh call needed them and found undefined. Extracted the required-field list into a const next to the StoredOauthToken interface and check the full set at deserialization. Bad secrets fail fast at fetch time with a structured log line naming the missing fields. - CallbackResult discriminated union (review non-blocking aws-samples#6). Was `{ sessionId: string|null, code: string|null, state: string|null }` which let callers construct unreachable shapes. Split into `{ kind: 'agentcore', sessionId } | { kind: 'direct-oauth', code, state }`. Updated the resolver site (`oauth-callback-server.ts`), the consumer (`bgagent linear setup`), and the test file to use exhaustive type-narrowing. The setup wizard now errors clearly if it gets the agentcore shape (parked path) instead of silently passing nulls down. - Cross-language schema-parity contract test (review non-blocking #3). CLI's StoredLinearOauthToken and Lambda's StoredOauthToken define the same JSON-in-Secrets-Manager schema independently; drift between the two would be a silent bug (CLI writes one field name, Lambda reads another, refresh works, every Lambda invocation logs a missing-field error). New test in `cdk/test/contracts/stored-oauth-token-parity.test.ts` regex-parses both interface definitions out of source and asserts the field set is equal. Also asserts the new `STORED_OAUTH_TOKEN_REQUIRED_FIELDS` const matches the interface, so future field additions can't drift between the validator and the type. CLI tests 286/286 pass. CDK resolver + contract 13/13 pass. * style: apply eslint --fix indentation on CallbackResult union * fix(2.0b-O2): review-3 batch — defensive error handling, security hardening, refresh test coverage Bugs (B1-B3): - linear-oauth-resolver: try/catch around ddb.send() in getRegistryRow so transient DDB errors fail the resolver cleanly instead of crashing the Lambda thread - orchestrate-task: try/catch around reportIssueFailure in notifyLinearOnConcurrencyCap; Linear feedback failures must never block the rejection path - Python _fetch_token: guard json.loads + KeyError so a corrupted SM payload returns None (logged ERROR) rather than raising Tests (T1-T3): - Python: TestResolveLinearApiTokenRefreshPaths covering happy refresh, invalid_grant + concurrent rotation, invalid_grant + no rotation, malformed expires_at, network failure during refresh, and corrupted secret JSON - TS: concurrent-refresh recovery via re-read; permanent-rejection on same refresh_token; cache invalidation after network failure Security (S1-S3): - agent runtime IAM: drop secretsmanager:PutSecretValue on the Linear OAuth secret prefix. Untrusted repo code in the agent must not be able to overwrite tokens; Lambdas (trusted) handle persistence. The refreshed in-memory token still works for the current task; rotated refresh_token is lost on agent exit but Linear's grace window absorbs the rare race where the agent refreshes strictly before any Lambda - Python _try_refresh_once: narrow except Exception to (urllib.error.URLError, OSError) — programmer errors propagate with clean stack traces instead of being swallowed - linear-oauth-resolver: RegistryRowStatus is now a discriminated 'active' | 'revoked' literal; missing or unknown values fail closed to revoked rather than defaulting active * fix(2.0b-O2): use email.message.Message for HTTPError hdrs in tests ty's stricter typeshed flagged dict[Unknown, Unknown] passed where Message[str, str] is expected. Drop-in replacement that satisfies both ty and runtime. * fix(2.0b-O2): update feedback test for B2 — helper now swallows internally Round-3 review B2 moved the try/catch into notifyLinearOnConcurrencyCap. The pre-existing test asserted the old contract (rejection propagates, caller must catch). Flip the contract assertion to reflect the new behavior: the helper is now best-effort end-to-end and returns undefined on internal failure. --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bumps defu from 6.1.4 to 6.1.6.
Release notes
Sourced from defu's releases.
Changelog
Sourced from defu's changelog.
Commits
001c290chore(release): v6.1.6407b516build: fix mixed types23e59e6chore(release): v6.1.511ba022fix: ignore inherited enumerable properties3942bfbfix: prevent prototype pollution via__proto__in defaults (#156)d3ef16dchore(deps): update actions/checkout action to v6 (#151)869a053chore(deps): update actions/setup-node action to v6 (#149)a97310cchore(deps): update codecov/codecov-action action to v6 (#154)89df6bbchore: fix typecheck9237d9cci: bump nodeDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
@dependabot rebase.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebasewill rebase this PR@dependabot recreatewill recreate this PR, overwriting any edits that have been made to it@dependabot show <dependency name> ignore conditionswill show all of the ignore conditions of the specified dependency@dependabot ignore this major versionwill close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this minor versionwill close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this dependencywill close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)You can disable automated security fix PRs for this repo from the Security Alerts page.