feat(linear): add Linear integration, with Slack review fixes#63
Conversation
Adds full Slack integration enabling users to submit coding tasks by mentioning @shoof in any channel or DM, with real-time emoji reactions and threaded notifications showing task progress. Key features: - @mention task submission with natural language repo extraction - Emoji reaction progression: 👀 → ⌛ → ✅ - Threaded notifications (created, started, completed/failed) - Cancel button with instant feedback - DM support for private task submissions - OAuth multi-workspace install flow - `bgagent slack setup` CLI wizard for zero-friction onboarding - Account linking via /bgagent link New files: - CDK constructs: SlackIntegration, SlackInstallationTable, SlackUserMappingTable - Lambda handlers: events, commands, command-processor, interactions, oauth-callback, link, notify - Shared utilities: slack-verify, slack-blocks - CLI: slack.ts (setup, link commands) - Docs: SLACK_SETUP_GUIDE.md with screenshots
# Conflicts: # docs/guides/DEVELOPER_GUIDE.md # docs/guides/USER_GUIDE.md # docs/src/content/docs/developer-guide/Installation.md # docs/src/content/docs/developer-guide/Repository-preparation.md # docs/src/content/docs/user-guide/Overview.md
Add SLACK_SETUP_GUIDE.md to the explicit route map in sync-starlight.mjs, mirror it to using/Slack-setup-guide.md, and wire it into the astro sidebar. Without this, links to the Slack setup guide rendered as /architecture/... and the page itself never appeared on the site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1-20) Full response to the 20-point review on PR aws-samples#42. Grouped by theme below rather than strictly by item number so related edits stay together. **Signature + auth (items 2, 3, 9)** - Item 2: url_verification challenge used to be answered before signature verification, exposing the endpoint as a Slack handler without auth. Now every request is verified first. During initial setup — before the signing secret is populated — url_verification is still answered so the Slack App can be wired up, but every other request is rejected. - Item 3: previously any request with X-Slack-Retry-Num was acked with 200 immediately. Transient failures on app_uninstalled or tokens_revoked silently dropped the revocation and left the bot token live forever. Security-critical event retries now reprocess; UX events still short- circuit on retry. - Item 9: slack-verify's 5-minute secret cache had no invalidation. After a signing-secret rotation, warm Lambdas rejected every request for up to 5 minutes. New verifySlackRequest helper does verify-then-retry with forced refresh; invalidateSlackSecretCache exported for explicit eviction. All three authed handlers (events, commands, interactions) go through verifySlackRequest. **Error surfacing (items 4, 7, 20)** - Item 4: slack-notify caught every error per-record and logged warn, which made retryAttempts and bisectBatchOnError useless. Now distinguishes SlackApiError (swallow — not retryable via stream replay) from infra errors (rethrow so Lambda retries the batch). - Item 7: six .catch(() => {}) swallows in slack-events and slack-command-processor replaced with a shared slackFetch helper in shared/slack-api.ts that logs at warn. Benign Slack errors (already_reacted, no_reaction) are still silenced as expected. - Item 20: safeJsonParse in slack-notify logs the parse error and the truncated payload preview instead of swallowing silently. **Revocation ordering (item 6)** - revokeInstallation previously deleted the bot-token secret even if the DDB update to mark the installation revoked had failed, leaving DB status "active" with no backing token. Now the secret delete runs only if the DDB update succeeded; both steps throw on failure so Slack retries the event end-to-end. **UX fallbacks (items 8, 10)** - Item 8: when the command-processor Lambda invocation fails from handleAppMention, swap 👀 → ❌ and reply in-thread, mirroring the no-repo-found path. Previously the user was left with a stuck 👀. - Item 10: checkChannelAccess fails closed on definitively-blocking Slack errors (channel_not_found, not_in_channel, missing_scope). Transient or unknown errors (ratelimited, internal_error, network blips) fail open so a Slack outage doesn't block task submission; slack-notify surfaces any real delivery failure downstream. **Dedup ordering (item 17)** - Moved the channel_source === 'slack' check before the terminal dedup write so we don't poke non-Slack task records' channel_metadata with a Slack-specific marker. **Dead code + type safety (items 1, 13, 14)** - Item 1: delete unused handleStatus / handleCancel / statusEmoji from slack-command-processor. They were fully implemented but never wired into the switch(subcommand). Sam's PR aws-samples#52 roadmap delivers `bgagent status` via the CLI; the Slack Cancel button already exists. - Item 13: MentionPayload used to extend SlackCommandPayload and set phantom fields (`response_url: ''`, etc.). Refactored into a discriminated union `CommandProcessorEvent = SlashCommandEvent | MentionEvent`. slack-events.ts constructs MentionEvent directly; handleSubmit takes MentionEvent only. - Item 14: SlackBlock was a single interface requiring `as unknown as SlackBlock` to construct actions blocks. Now SectionBlock | ActionsBlock discriminated union with typed button variants. Casts removed; tests updated with narrowing helpers. **Shared utilities (items 15, 16, 19)** - Items 15/16: extract ChannelSource union from shared/types.ts and use it on both TaskCreationContext.channelSource and TaskRecord.channel_source. Extract truncate() and formatDuration() into shared/slack-format.ts so they stop drifting between slack-blocks and slack-command-processor. - Item 19: formatDuration now coerces via Number() for all callers, fixing the inconsistency where taskTimedOutMessage skipped the cast. **CLI error surfacing (item 11)** - cli/src/commands/slack.ts:getStackOutput used to swallow every CloudFormation error as "not deployed yet". Now only swallows ValidationError / does-not-exist; auth and permission errors bubble up to the user. **Excess IAM grant (item 18)** - slackCommandsFn had readSlackSecretsPolicy attached, but the slash- command acknowledger only needs the signing secret (already granted separately). Grant removed; comment added explaining the intent. **OAuth CSRF (item 5)** - Documented the tradeoff in slack-oauth-callback.ts. Slack's client_secret requirement on the code exchange is the primary defense; a full state nonce requires per-session signed storage and is tracked as a follow-up coordinated with PR aws-samples#52's dispatcher rework. **Not addressed in this PR (architectural feedback for follow-up)** Alain's "What I would like to see" items 1-5 — dispatcher pattern, notification-state extraction from channel_metadata, ADR for the adapter contract — are scoped to the PR that will rebase onto Sam's PR aws-samples#52 (feat(interactive-agents)) once it lands. Sam's locked plan establishes the FanOutConsumer router that those items depend on.
Addresses review item 12 / general feedback 4: the shared utilities (slack-verify, slack-blocks) and CDK constructs had solid coverage but all 7 Lambda handlers had zero unit tests, a departure from the rest of the codebase. Added test files covering the most load-bearing paths: - slack-events.test.ts (11 tests) — url_verification challenge with/ without signature (item 2), retry semantics (drop non-critical, reprocess app_uninstalled + tokens_revoked, item 3), revoke-installation ordering (item 6: no secret delete on DDB failure), app_mention forwarding, no-repo + Lambda-failure reaction swaps (item 8) - slack-commands.test.ts (4 tests) — signature rejection, inline help, async processor invocation, graceful processor failure - slack-interactions.test.ts (6 tests) — signature rejection, cancel_task happy path + owner rejection + terminal-state warn, unknown action_id, unlinked account - slack-notify.test.ts (8 tests) — non-Slack task short-circuit, dedup ordering after source check (item 17), dedup write skip on duplicate, SlackApiError swallow vs infra-error rethrow (item 4), non-INSERT / non-notifiable filter, malformed metadata logging (item 20) - slack-link.test.ts (6 tests) — 401 without JWT, 400 without code, 404 on missing/stale code, happy path write+delete, code normalization - slack-oauth-callback.test.ts (5 tests) — missing code, unpopulated client secret, token exchange failure, create-secret-on-RNF path, restore-then-update path for reinstall after uninstall - slack-command-processor.test.ts (9 tests) — legacy source-less payload normalization, slash-submit-rejection UX, mention link-prompt + repo validation + private channel guard (item 10 hard failure) + transient fail-open (item 10 softening), createTaskCore invocation, link code persistence, help text The regression tests for items 2, 3, 4, 6, 9 all fail the buggy pre-fix version of the code, verified by running them against the unchanged source during review.
Addresses blind-review feedback S2 + S3. **S2 — checkChannelAccess transient failures now fail open.** The prior "fail closed on any unknown Slack error" was too strict: a 30-second Slack ratelimited or internal_error during conversations.info would block every Slack task submission for that window, even though the notification path downstream could have recovered. Now: - Hard failures (channel_not_found, not_in_channel, missing_scope) still fail closed — these definitively mean the bot cannot post there, and silently creating an undeliverable task is the worse UX. - Transient or unknown errors (ratelimited, internal_error, fatal_error, network blips) fail open. slack-notify surfaces any real delivery failure when the event plays through. Regression test added: `mention submit fails open on transient Slack errors` asserts createTaskCore IS called when conversations.info returns `ratelimited`. **S3 — USER_GUIDE Slack bullet drops "check status".** handleStatus was deleted in this branch (review item 1) because Sam's locked plan in PR aws-samples#52 delivers `bgagent status` via the CLI rather than a Slack slash subcommand. The user-facing docs still claimed Slack could "check status" via `/bgagent`. Rewritten to describe the shipped behavior: @mention-based submission plus threaded progress notifications with reaction-based status. Starlight content regenerated via `node docs/scripts/sync-starlight.mjs`.
Trivial enabler for the Linear integration branch. No functional change: all existing code already uses the broad ChannelSource union; adding 'linear' makes the Linear-origin task records type-check once the inbound adapter starts writing them.
- shared/linear-verify.ts: HMAC-SHA256 hex verify over raw body,
60s webhookTimestamp replay window, secret caching + rotation retry.
- LinearProjectMappingTable: linear_project_id → repo + label_filter.
- LinearUserMappingTable: {workspace}#{user} confirmed / pending#{code}
with TTL and PlatformUserIndex GSI.
- linear-webhook.ts: verify + dedup (60s TTL on (issue_id, action))
+ async-invoke processor; replies fast inside Linear's 5s budget.
- linear-webhook-processor.ts: Issue label-add detection, project→repo
+ actor→platform user resolution, createTaskCore with
channelSource 'linear' + linear_* channel metadata.
- linear-link.ts: mirror of slack-link.ts.
- LinearIntegration construct: provisions both mapping tables, the webhook dedup table, 3 Lambdas (webhook / processor / link), 2 Secrets Manager placeholders, and API Gateway routes under /v1/linear/*. Deliberately no DynamoDB Streams consumer — outbound updates go through the Linear MCP from inside the agent container. - Stack: instantiates LinearIntegration in agent.ts and emits CfnOutputs for the two secret ARNs and both table names, consumed by `bgagent linear setup` / onboard-project.
- Orchestrator invoke payload now carries `channel_source` +
`channel_metadata`, so the agent knows which inbound channel a task came
from (and the Linear issue identifier for Linear-origin tasks).
- TaskConfig + build_config accept the new fields; server.py extracts them
from the /invocations payload and forwards into pipeline.run_task.
- channel_mcp.py: new `configure_channel_mcp(repo_dir, channel_source)`.
For channel_source=='linear' it merges a `linear-server` entry into
`.mcp.json` (keyed so tools surface as `mcp__linear-server__*`, matching
the verified MCP spec). For any other channel it's a no-op — the gate
that keeps Slack/API tasks from loading Linear tools.
- config.py: `resolve_linear_api_token()` reads LinearApiTokenSecret via
LINEAR_API_TOKEN_SECRET_ARN and caches into LINEAR_API_TOKEN so the SDK
child process inherits it for the MCP's ${LINEAR_API_TOKEN} placeholder.
- pipeline.py: resolves the token + writes .mcp.json just before run_agent.
- prompt_builder.py: appends a short Linear-specific addendum naming
`mcp__linear-server__save_comment` / `update_issue` by tool name.
- Stack: grants runtime read on LinearApiTokenSecret and sets the secret
ARN in the AgentCore runtime environment via CFN override.
- run.sh: forwards LINEAR_API_TOKEN / LINEAR_API_TOKEN_SECRET_ARN into the
local container for manual testing.
- cli/src/commands/linear.ts: new `bgagent linear` command group:
- `setup` — prompts for the Linear webhook signing secret and personal
API key, populates both placeholders in Secrets Manager, prints the
next-step checklist including the webhook URL.
- `link <code>` — calls POST /v1/linear/link with the Cognito JWT.
- `onboard-project <linear-project-id> --repo owner/repo [--label bgagent]`
— writes the Linear project → repo mapping row directly to
LinearProjectMappingTable via DynamoDBDocumentClient (admin IAM path,
matching the current onboarding UX for Slack + webhooks).
- api-client + types: LinearLinkResponse type + linearLink() method.
- bgagent.ts: registers the new command group.
- cli/package.json: adds @aws-sdk/client-dynamodb + @aws-sdk/lib-dynamodb
so the CLI can write project mappings.
- docs/guides/LINEAR_SETUP_GUIDE.md: new end-to-end setup walkthrough
mirroring SLACK_SETUP_GUIDE, covering API-key generation, webhook
configuration, onboarding, linking, and troubleshooting.
- docs/scripts/sync-starlight.mjs + docs/astro.config.mjs: routing + sidebar
entry for the new guide.
- docs/guides/USER_GUIDE.md: Linear now shows up as the 5th channel.
- cdk/test/stacks/agent.test.ts: DDB table count bumped 7 → 10 to cover
the three new Linear tables.
New test coverage for the Linear integration:
- cdk/test/handlers/linear-webhook.test.ts — HMAC pass/fail, stale
webhookTimestamp, non-Issue type is 200/no-op, missing data.id is 400,
dedup-hit short-circuits processor invoke, Lambda invoke failure surfaces
as 500 so Linear retries, malformed JSON with valid signature is 400.
- cdk/test/handlers/linear-webhook-processor.test.ts — project-not-onboarded
and status=removed skip, label filter gates on create AND on update
(updatedFrom.labelIds transition), unlinked Linear actor skips, happy
path asserts channel_source='linear' + full linear_* channelMetadata,
custom label_filter honored per project mapping.
- cdk/test/handlers/linear-link.test.ts — mirror of slack-link.test.ts:
401/400/404 error paths + confirmed-mapping write + code normalization.
- cdk/test/constructs/linear-integration.test.ts — resource-count
assertions (3 DDB tables, 3 Lambdas, 2 secrets, ZERO DDB Streams
mappings), env-var wiring for both handlers, dedup-table TTL shape.
- agent/tests/test_channel_mcp.py — channel gate (slack/api/webhook/empty
are no-ops; only linear writes), .mcp.json shape (server key
`linear-server`, hosted URL, ${LINEAR_API_TOKEN} placeholder), merge
semantics (preserves other servers, overwrites stale linear entry,
tolerates malformed JSON), missing-repo_dir guard.
All 853 CDK tests and 294 agent tests pass locally.
Exempt /v1/linear/webhook from AWSManagedRulesCommonRuleSet via a scopeDownStatement. Linear Issue webhook payloads routinely exceed the ruleset's default 8 KB body-size limit (observed 8614 bytes), and the full payload is needed by the webhook Lambda for HMAC verification. The route has stronger, bespoke protection already: - HMAC verification against LinearWebhookSecret on the raw body - webhookTimestamp replay guard (60s window) - Structured JSON parsing with strict field validation - RateLimitRule (1000/IP/5min) is priority 3 and still applies CRS continues to block on every other route (user API, Slack, etc.).
The prior dedup key was issueId#action, which collapsed distinct label-toggle deliveries on the same issue into a single dispatch — a label remove/reapply within 60s would be silently dropped. Also, the prior TTL (60s) was shorter than Linear's retry horizon (+1m, +1h, +6h). A dispatched-but-ack-lost delivery would re-dispatch on the +1h or +6h retry after the dedup row had expired, double- creating the task. Fix: - Compose the dedup key as issueId#action#webhookTimestamp. Linear reuses webhookTimestamp across retries of one delivery (correctly collapsing them) but assigns a fresh timestamp to each new delivery (no collapse). This is the primitive documented in linear.app's webhook spec; the payload's webhookId field is the webhook-config id, not per-delivery. - Raise DEDUP_TTL_SECONDS to 8h, comfortably over Linear's 7h retry horizon. Added a regression test that two distinct deliveries for the same issue both dispatch, and an assertion that the TTL is >= 7h.
When shouldTrigger rejects an update event, add current_labels, updated_from_keys, updated_from_label_ids, and current_label_ids to the log line so we can diagnose cases where the filter drops a valid label-added transition. No behaviour change; diagnostic-only.
Three CLI UX improvements discovered during first-time setup: 1. Auto-link the token owner during `bgagent linear setup`. After storing the webhook secret and API key, query Linear's `viewer` + `organization` via GraphQL using the token, decode the caller's Cognito sub from the cached id_token, and write the LinearUserMapping row directly (status='active', link_method='auto_setup'). This skips the manual code-exchange ceremony for the common case where the person installing ABCA for their workspace is the same person who'll trigger tasks. Failures are fail-open: if GraphQL errors, the API key is malformed, or the CLI isn't logged in, setup prints a warning and continues. Secrets are still stored; the user can run the link flow manually. 2. `bgagent linear list-projects` — lists all projects visible to the stored API token with their full UUIDs. Solves the "Linear project URL contains a truncated UUID" footgun. 3. UUID validation on `onboard-project`. RFC-4122 shape check up- front, with a clear error message pointing at `list-projects`. Avoids silent mapping-row writes for slug-shaped UUIDs that would never match a real Linear payload. Tests cover the auto-link branch, fail-open cases, and the GraphQL wire shape.
Mirror the Slack integration's terminal-emoji status UX on Linear issues: post 👀 when the agent container starts, swap it for ✅ on success or ❌ on failure (including the pipeline's outer-except crash path). Implementation notes: - Linear's MCP v1 does not expose a reactions tool (verified 2026-05-06 against the `tools` array in the Claude Agent SDK's system init). We use direct GraphQL against `https://api.linear.app/graphql` with the same token the MCP server uses. Retire this module in favour of a prompt addendum once Linear MCP ships `create_reaction`. - Gated on `channel_source == 'linear'` + `linear_issue_id` in channel_metadata. No-op for Slack/API/webhook channels. - `react_task_started` returns the reaction id; the pipeline threads it into `react_task_finished` so we can `reactionDelete` the 👀 before posting the terminal emoji. Matches Slack's behaviour of not leaving a lingering "watching" marker. - All GraphQL failures are logged and swallowed — reactions are advisory UX, never load-bearing. - Crash-path (pipeline.py outer except): calls react_task_finished(success=False, ...) as a best-effort so a crashed/failed task doesn't leave a dangling 👀. Tests cover: - Channel gate (slack/api/webhook all no-op; linear with no issue_id also no-op) - GraphQL wire shape and variables - Delete-then-create swap on terminal transition - Fallback path when started_reaction_id is None - All error swallowing paths (HTTP error, connection exception, GraphQL errors, missing token)
The Linear MCP's tool for updating an issue is
mcp__linear-server__save_issue, not update_issue. Earlier prompt
referenced the wrong name; the agent silently skipped the state
transition on small/fast tasks because the advisory framing
("use its tools to keep the reporter informed") let the model
deprioritize the calls.
Rewrite the addendum as a required three-step contract:
1. On start — save_comment 🤖 Starting
2. On PR open — save_comment with PR URL + save_issue (transition
to In Review; fall back to In Progress; skip if neither state
exists rather than loop list_issue_statuses)
3. On completion/failure — save_comment with final status
Added explicit "If neither exists, skip the state transition — the
PR comment alone is enough. Do not invent state names or loop on
list_issue_statuses." so an impoverished workspace state-set
doesn't trigger retry behaviour.
test_background_thread_failure_503_and_backup_terminal_write polled /ping for _background_pipeline_failed = True, but the flag flips several lines before write_terminal runs in the same except block. On the first run after a fresh import the poll could win the race and mock_write.assert_called() would fire before the call happened. Swap the poll-the-flag pattern for a wait-until-the-background-thread- has-exited loop, which guarantees both side effects (the flag flip and the backup write_terminal call) have completed before we assert. Observed flaky 2026-05-06 in a full-suite run; reliably green after the fix across 5 consecutive runs.
Restructure the setup guide so the webhook URL is available before the user needs to create the webhook (prior ordering was circular). - Step 1: generate Linear API key - Step 2: run `bgagent linear setup` — it prints the webhook URL and pauses at the signing-secret prompt - Step 3: create the webhook in Linear, copy the signing secret - Step 4: back in the terminal, paste the signing secret + API key; auto-link runs for the token owner - Step 5: `bgagent linear onboard-project <uuid>` — using the `list-projects` command to find the full UUID (the URL slug is truncated) - Step 6: manual code-exchange flow, now marked as fallback for linking additional Linear users beyond the token owner Synced to docs/src/content/docs/using/Linear-setup-guide.md via mise //docs:sync.
Merges Sam's PR aws-samples#52 (FanOutConsumer dispatcher) and adjacent upstream changes on top of the Linear integration work. Key reconciliations: - `ChannelSource` union keeps all four members ('api', 'webhook', 'slack', 'linear') with upstream's better docstring. Fixed an upstream test that expected 'slack' to be an invalid value. - `TaskRecord` / `TaskConfig` / pipeline signatures now include both our `channel_source` + `channel_metadata` and upstream's `trace` + `user_id` fields (all additive). - Agent pipeline runs both new hooks in order: Linear reactions (swap 👀 for ✅/❌) then upstream's `_maybe_upload_trace`, so the reaction fires before any trace S3 upload that could be slow. - Crash path invokes both Linear reaction (best-effort ❌) and `_maybe_upload_trace` (debug artifact) so a crashed task still produces a meaningful Linear signal and a usable trace. - `server.py` adopts upstream's refactored _validate_required_params / _spawn_background structure; our `channel_source` + `channel_metadata` are threaded through the params dict. - WAF scope-down (Linear webhook path exemption from CRS body-size rule) and our 8h dedup TTL are preserved unchanged. - Agent test count went from 306 to 526; CDK suite count went from 853 to 1121. All green. FanOutConsumer (new from aws-samples#52) and LinearIntegration sit side-by-side in `stacks/agent.ts` — FanOutConsumer is independent of Linear's wiring. Linear continues to route outbound updates via agent-side MCP rather than subscribing to the fanout dispatcher (by design; keeps per-task Linear comment/state updates in the container where the task actually runs). See things-to-do.md for the "evaluate linear-notify under FanOutConsumer" follow-up. Keeping it parked: MCP-based outbound is already shipped and validated; moving to a fanout subscriber would be a strategy change, not a bug fix.
Roadmap doc has no Linear content; referring to it misled readers. Replace the link with a plain explanation of the admin-assisted path.
Describe the v1 linking model accurately: ABCA ships the consumer command (`bgagent linear link <code>`) but no Linear-side code generator — the comment-triggered flow is a planned v1.1 follow-up. Design around API-token-owner-as-sole-submitter until then.
Fix two `no-duplicate-imports` violations in the Slack tests by merging `import type` + `import` pairs into single imports. ESLint/ruff also applied cosmetic reformatting (whitespace alignment in object literals, line-splitting long inline objects). One `# noqa: S105` added to `LINEAR_API_TOKEN_ENV` — it's an env var *name*, not a secret value, so ruff's hardcoded-password rule is a false positive there. No behavioural changes. Ensures `mise run build` passes the self- mutation guard in .github/workflows/build.yml.
|
Feedback from automated review: Required before merge #: 1 Recommended follow-ups (v1.1)
|
|
Just adding as a follow up: will open an issue after to use AgentCore gateway The current approach isn't wrong — it's a conscious shortcut for v1. But as the project adds more external MCP integrations, the .mcp.json injection pattern doesn't scale. The Gateway is the project's stated integration point for external tools, and Linear should converge on it. |
1) Linear webhook: roll back dedup row on Lambda invoke failure Previously, the dedup row was written before invoking the processor Lambda. If InvokeCommand failed (throttle, 5xx, etc.), the handler returned 500 and relied on Linear's retry schedule (+1m / +1h / +6h) to redeliver — but all three retries fell inside the 8h dedup TTL, so each one hit ConditionalCheckFailedException and silently acked as "already processed." Net: a single transient invoke failure dropped the task for 8 hours. Fix: on invoke failure, delete the dedup row before returning 500 so the next retry can reserve it again. A secondary try/catch logs (but does not re-raise) if the delete itself fails — the primary invoke error is the user-visible signal. Added two regression tests. 2) CLI ChannelSource sync with CDK cli/src/types.ts:30 still had 'api' | 'webhook'; CDK had 'api' | 'webhook' | 'slack' | 'linear' since the Slack/Linear inbound adapters landed. Any CLI switch on TaskDetail.channel_source was narrowing against a stale union. Widened the CLI type and refreshed the jsdoc to match the CDK side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the thorough review — both blockers fixed in 7d652a4. #1 — Dedup row on invoke failure. Agreed this is a silent-drop trap; the 8h TTL fully swallows Linear's +1m/+1h/+6h retry horizon. Fix was the "delete on invoke-error" shape (not "write after invoke") — keeping the write before invoke preserves the dedup guarantee for the common case where a retry races an in-flight ack. Added two regression tests: one asserts the #2 — CLI v1.1 follow-ups — all five parked in our internal backlog and slotted for the v1.1 pass:
AgentCore Gateway note — acknowledged. Convergence on Gateway as the MCP integration point makes sense as the project adds more external MCPs; happy to pick that up as a follow-up once the issue lands. Full test counts post-fix: CDK 1170 (+2 rollback tests), agent 526, CLI 196 — all green. |
nits Wraps the v1.1 polish theme from PR aws-samples#87. Five small additions, all agent-side or docs: State-on-start (the user-visible one): - prompt_builder._channel_prompt_addendum now instructs the agent to transition the originating Linear issue to `In Progress` (or `Todo` fallback) at agent-start, mirroring the existing `In Review` chain fired at PR-open. Closes the gap where the issue stayed at `Backlog` during real agent work — only the 👀 reaction and "🤖 Starting" comment signaled progress, while humans-using-Linear expect the state column to reflect "being worked." Skips if the issue is already in `In Progress` or any later state; doesn't loop on list_issue_statuses. Alain aws-samples#63 review nits (4 small surgical changes): - linear_reactions.py: auth-failure circuit breaker. Track consecutive 401/403s; after 3 strikes, log ERROR once and short-circuit all later _graphql calls (return None) until the container restarts. Resets on any 2xx response. Replaces the prior behaviour where revoked tokens flooded CloudWatch with WARNs and wasted Linear API quota indefinitely. - pipeline.py: declare `linear_eyes_reaction_id: str | None = None` explicitly before the try block instead of relying on `locals().get("linear_eyes_reaction_id")` in the crash handler. Functionally identical; survives refactors and reads cleanly. - config.py::resolve_linear_api_token: narrow `except Exception` to `(BotoCoreError, ClientError)` from botocore.exceptions. Switch `print()` to `shell.log("WARN", ...)` so warnings join the structured log stream the rest of the agent uses. - LINEAR_SETUP_GUIDE.md + cli/src/commands/linear.ts: stop telling users to run `bgagent linear link <code>` when auto-link fails — the code generator is a v3 feature that doesn't ship in v1, so the suggestion was misleading. Replaced with explicit admin-assisted fallback (DynamoDB put-item with steps to find workspaceId, viewerId, Cognito sub) and a clear "this command exists but is non-functional in v1" note. Tests: 532 agent + 1268 cdk + 196 cli, all green. Deployed to backgroundagent-dev. Smoke-tested 👀-on-start (156ms, agent unblocked) in the prior commit; state-on-start smoke is the next manual step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- linear_reactions: log a single DEBUG line when the auth circuit breaker short-circuits a call, so the path isn't zero-trace once open. - config: split the `(BotoCoreError, ClientError)` catch so `AccessDeniedException` logs at ERROR instead of WARN — IAM misconfig is persistent and should page someone, not blend into transient warnings. Also drop the personal name from the inline reference to the aws-samples#63 review. - linear-webhook-processor: tighten `buildCreateTaskFailureMessage` param types to `number` / `string` (no `| undefined`) — the only caller passes `APIGatewayProxyResult` fields which are always defined. Removes dead fallback-to-`'unknown'` branches. - test_config: 2 new tests covering the split exception path (AccessDenied → ERROR; ResourceNotFound → WARN). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…weep, nits (#87) * feat(linear): comment + react on pre-container task-creation failures Closes the silent-drop UX gap that appeared whenever a Linear-triggered task was rejected before the agent container started — the user would apply the trigger label, see nothing happen, and have no way to know why. Reactions and progress comments are emitted by the agent container; nothing fired until that point, so all upstream rejections were invisible on the Linear side. This commit wires a best-effort GraphQL feedback path covering all six distinct rejection points: In `linear-webhook-processor.ts` (pre-`createTaskCore`): 1. Issue has no projectId → "isn't in a project" comment 2. Project not onboarded / removed → "isn't onboarded; admin can run `bgagent linear onboard-project`" comment 3. Webhook missing organization or actor → diagnostic comment 4. Linear actor has no linked platform user → "v1 only the API-token owner can submit; multi-user OAuth is on the v3 roadmap" comment 5. `createTaskCore` returns non-201 → message branched on status: guardrail/validation block surfaces the user-facing error string; 503 prompts the user to re-apply the label; other 4xx/5xx falls through to a generic message. In `orchestrate-task.ts` (post-201, in admission control): 6. User concurrency cap rejection → "concurrency limit; wait for one to finish, then re-apply the label" comment. All five processor paths and the orchestrator path call a shared helper, `reportIssueFailure(secretArn, issueId, message)`, that runs the comment and ❌ reaction in parallel via `Promise.allSettled`. The helper: - Reuses the existing 5-minute `getLinearSecret` cache from `linear-verify.ts` (no extra Secrets Manager hits on warm Lambdas). - Swallows network, auth, and GraphQL errors with WARN logs — Linear feedback is advisory and must never gate the rejection path. - Posts to Linear's hosted GraphQL endpoint; mutation shapes match `agent/src/linear_reactions.py` (`commentCreate`, `reactionCreate`). CDK plumbing: - `linear-integration.ts` — wires `LINEAR_API_TOKEN_SECRET_ARN` into the webhook processor and grants read on the existing `LinearIntegration.apiTokenSecret`. - `agent.ts` — grants the same secret to `orchestrator.fn` and populates the env var. The grant is unconditional; the orchestrator only invokes the helper when `task.channel_source === 'linear'`. The non-Linear case is a hard no-op at the call site — `notifyLinear- OnConcurrencyCap` early-returns on `channel_source !== 'linear'`, and the processor only handles Linear payloads. Slack/API/webhook tasks are unaffected. Tests (28 new; 1240 → 1268, all green): - `cdk/test/handlers/shared/linear-feedback.test.ts` (13 tests): mutation shape, auth header, error swallowing in 4 distinct failure modes (secret-resolution null, non-2xx, GraphQL `errors`, network throw), `Promise.allSettled` partial-success semantics. - `cdk/test/handlers/linear-webhook-processor.test.ts` (10 new tests in a `user-visible feedback` describe block): one assertion per rejection path + happy-path-doesn't-fire + filter-rejection-doesn't- fire (the latter is intentional UX — the processor sees many events that aren't tasks, and dropping a comment on each would be noisy). - `cdk/test/handlers/orchestrate-task-feedback.test.ts` (5 tests): new file; covers `notifyLinearOnConcurrencyCap` directly with `withDurableExecution` mocked. Asserts the linear path fires; the api/webhook/slack paths no-op; missing metadata, missing env, and undefined `channel_metadata` all no-op cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(linear): finish v1.1 polish — state-on-start + Alain #63 nits Wraps the v1.1 polish theme from PR #87. Five small additions, all agent-side or docs: State-on-start (the user-visible one): - prompt_builder._channel_prompt_addendum now instructs the agent to transition the originating Linear issue to `In Progress` (or `Todo` fallback) at agent-start, mirroring the existing `In Review` chain fired at PR-open. Closes the gap where the issue stayed at `Backlog` during real agent work — only the 👀 reaction and "🤖 Starting" comment signaled progress, while humans-using-Linear expect the state column to reflect "being worked." Skips if the issue is already in `In Progress` or any later state; doesn't loop on list_issue_statuses. Alain #63 review nits (4 small surgical changes): - linear_reactions.py: auth-failure circuit breaker. Track consecutive 401/403s; after 3 strikes, log ERROR once and short-circuit all later _graphql calls (return None) until the container restarts. Resets on any 2xx response. Replaces the prior behaviour where revoked tokens flooded CloudWatch with WARNs and wasted Linear API quota indefinitely. - pipeline.py: declare `linear_eyes_reaction_id: str | None = None` explicitly before the try block instead of relying on `locals().get("linear_eyes_reaction_id")` in the crash handler. Functionally identical; survives refactors and reads cleanly. - config.py::resolve_linear_api_token: narrow `except Exception` to `(BotoCoreError, ClientError)` from botocore.exceptions. Switch `print()` to `shell.log("WARN", ...)` so warnings join the structured log stream the rest of the agent uses. - LINEAR_SETUP_GUIDE.md + cli/src/commands/linear.ts: stop telling users to run `bgagent linear link <code>` when auto-link fails — the code generator is a v3 feature that doesn't ship in v1, so the suggestion was misleading. Replaced with explicit admin-assisted fallback (DynamoDB put-item with steps to find workspaceId, viewerId, Cognito sub) and a clear "this command exists but is non-functional in v1" note. Tests: 532 agent + 1268 cdk + 196 cli, all green. Deployed to backgroundagent-dev. Smoke-tested 👀-on-start (156ms, agent unblocked) in the prior commit; state-on-start smoke is the next manual step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(format): apply ruff format Whitespace-only changes flagged by CI's self-mutation guard. No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): address PR #87 must-fix review items - linear_reactions: guard auth-circuit globals with `_auth_state_lock` so the daemon sweep thread and the main thread can't race the read-modify-write on `_consecutive_auth_failures` / `_auth_circuit_open`. - linear_reactions: wrap the daemon sweep target in `_sweep_stale_reactions_safe` so an unexpected exception logs at ERROR instead of dying silently (stderr from a daemon thread doesn't reliably reach CloudWatch). - linear_reactions: only increment the sweep delete counter when `_graphql(_DELETE_MUTATION, ...)` actually returns a non-None response — previously the summary log overstated success. - config: hoist `import boto3` out of the catch-narrowed try/except so an `ImportError` (boto3 missing from the image) degrades to a WARN log instead of crashing the agent. - orchestrate-task: wrap `notifyLinearOnConcurrencyCap` in a defensive try/catch — durable-execution retries the entire admission-control step on throw, which would re-fire `failTask` + `emitTaskEvent` and produce duplicate events. - tests: 1 new throw-propagation test for `notifyLinearOnConcurrencyCap`, 3 new tests for `resolve_linear_api_token` (cached env, no-arn, ImportError fallback). Auto-reset fixture in `test_linear_reactions.py` now also resets the circuit-breaker globals between tests so future cases don't leak state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(linear): address PR #87 nice-to-have review items - linear_reactions: log a single DEBUG line when the auth circuit breaker short-circuits a call, so the path isn't zero-trace once open. - config: split the `(BotoCoreError, ClientError)` catch so `AccessDeniedException` logs at ERROR instead of WARN — IAM misconfig is persistent and should page someone, not blend into transient warnings. Also drop the personal name from the inline reference to the #63 review. - linear-webhook-processor: tighten `buildCreateTaskFailureMessage` param types to `number` / `string` (no `| undefined`) — the only caller passes `APIGatewayProxyResult` fields which are always defined. Removes dead fallback-to-`'unknown'` branches. - test_config: 2 new tests covering the split exception path (AccessDenied → ERROR; ResourceNotFound → WARN). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): address PR #87 re-review must-fix items - linear-webhook-processor: extracted `safeReportIssueFailure` helper and routed all 5 bare `await reportIssueFailure(...)` call sites through it. The helper is uniformly non-throwing — wraps the call in try/catch to defend against a future signature refactor that could break the helper's `Promise.allSettled` never-throw contract. A synchronous throw would otherwise propagate, fail the Lambda, and trigger SQS retries on a poison message. - linear-webhook-processor: dropped the `!` non-null assertion on `process.env.LINEAR_API_TOKEN_SECRET_ARN` at module scope. The helper now guards on `!API_TOKEN_SECRET_ARN` and logs a single clear `Skipping Linear feedback: LINEAR_API_TOKEN_SECRET_ARN not set` diagnostic per skip — matches the orchestrator pattern in `notifyLinearOnConcurrencyCap`. - test_linear_reactions: new `TestAuthCircuitBreaker` class with 5 tests covering the previously-untested circuit: * 3 consecutive 401/403s open the circuit * Once open, calls short-circuit without hitting the network * A 2xx between failures resets the counter * Non-auth status (500) doesn't increment the counter * 401 and 403 are both treated as auth failures - test_linear-webhook-processor: 2 new tests assert `safeReportIssueFailure` swallows both synchronous throws and async rejections from the underlying helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(linear): address PR #87 re-review nice-to-have items - test_config: cover the BotoCoreError branch of `resolve_linear_api_token` with an `EndpointConnectionError` case. The PR-#87 split into ClientError + BotoCoreError branches previously had no test on the BotoCoreError path. - test_linear_reactions: new `test_sweep_preserves_just_posted_eyes_via_exclude_id` exercises the `exclude_id` filter — the existing sweep test never collided prior reaction ids with the newly posted one, so the branch was effectively dead code in tests. The new test plants the just- posted 👀 in the prior reactions list and asserts it survives the sweep while an older ❌ is deleted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): annotate circuit breaker globals so ty doesn't narrow `ty check` infers `_consecutive_auth_failures = 0` as `Literal[0]` and `_auth_circuit_open = False` as `Literal[False]`, which then rejects the legitimate runtime flips (and the test fixture that resets them between cases). Adding explicit `int` / `bool` annotations widens the inferred type and fixes the CI typecheck failure introduced in `f4633be`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
Replace the personal-API-key flow with the Linear OAuth `actor=app` install path verified by the 2.0b spike. Major changes: - Step 1: AgentCore credential provider via `bgagent linear oauth-register-workspace`, capturing the AWS-hosted callback URL that Linear will actually see. - Step 2: Linear OAuth app creation via `bgagent linear app-template`, documenting the GitHub-username-with-[bot]-suffix and Webhooks-ON gates that produce Linear's misleading "Invalid redirect_uri" error when missing. - Step 4: OAuth dance via the rewritten `bgagent linear setup` — ephemeral localhost HTTPS callback; no own ALB/Lambda needed since AWS proxies the OAuth flow. - Step 7: clarify that the PAK-owner auto-link becomes the setup-runner auto-link; the manual DDB mapping path stays for now until self-service `@bgagent link` ships. - New "Adding additional Linear workspaces" section for multi-workspace deployments. - New "Migration from 2.0a (PAK) to 2.0b (OAuth)" runbook. - Troubleshooting expanded to cover the Invalid-redirect_uri and 401-from-Linear scenarios surfaced in the spike. Notes the docs reference commands shipping in Wave 2 (aws-samples#63 oauth-register-workspace, aws-samples#65 setup wizard, aws-samples#67 add-workspace) — the 2.0b branch is a coherent unit and aws-samples#62 must land before those flows are wired so the docs aren't a moving target during implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 2.0b) (aws-samples#160) * feat(linear): resolve API token via AgentCore Identity (Phase 2.0a) Migrates the agent runtime's Linear personal API token resolution from AWS Secrets Manager to AWS Bedrock AgentCore Identity. This is the "validate Identity SDK" step of the v2 plan; Phase 2.0b will swap the API key for OAuth and converge Linear MCP onto AgentCore Gateway in one cutover. Per Alain's guidance: "start by using api key, if it works, switch to oauth. you will setup an outbound auth for your server using agentcore identity. that identity can be (AC identity is like a wrapper around secrets manager) api key or oauth." Lambdas (orchestrator + processor) intentionally keep using Secrets Manager via the existing `LinearApiTokenSecret` for now. The Python `bedrock_agentcore` SDK has no Node.js equivalent — Lambda migration requires `@aws-sdk/client-bedrock-agentcore` raw API calls and folds into 2.0b's bigger refactor. End-state of 2.0a: agent reads from Identity, Lambdas read from Secrets Manager, both pointing at the same underlying token value (admin populates both). `agent/src/config.py::resolve_linear_api_token`: - Drops boto3 SecretsManager fetch + `LINEAR_API_TOKEN_SECRET_ARN` env. - Reads new env `LINEAR_API_KEY_PROVIDER_NAME` (provider name in Identity vault). - Calls `IdentityClient.get_api_key()` with the workload access token auto-injected into `BedrockAgentCoreContext` by AgentCore Runtime (verified by reading the SDK's `auth.py` decorator implementation — no manual workload-identity mint needed inside the runtime). - Caches the resolved token in `LINEAR_API_TOKEN` so downstream consumers stay unchanged: `channel_mcp.py`'s `${LINEAR_API_TOKEN}` placeholder in `.mcp.json` and `linear_reactions.py`'s GraphQL Authorization header. Preserves PR aws-samples#87's nice-to-have improvements: - `ImportError` graceful fallback (now for `bedrock_agentcore` instead of `boto3`) — degrade with WARN, don't crash the agent. - `AccessDeniedException` and `ResourceNotFoundException` logged at ERROR severity (persistent IAM/config bugs that should page). Other ClientErrors stay at WARN (transient throttle/network). `agent/pyproject.toml`: adds `bedrock-agentcore==1.9.1` dep. `cdk/src/stacks/agent.ts`: - On the AgentCore runtime: drops `linearIntegration.apiTokenSecret. grantRead(runtime)` and the `LINEAR_API_TOKEN_SECRET_ARN` env-var override. Adds `LINEAR_API_KEY_PROVIDER_NAME` env (hardcoded `'linear-api-key'` for now; can parametrize later via context if multi-environment naming is needed) and IAM permissions for `bedrock-agentcore:GetResourceApiKey` and `bedrock-agentcore:GetWorkloadAccessToken`. - Lambdas (orchestrator + processor) untouched — they still grant on the Linear secret and read from Secrets Manager. - Resource scope on the new IAM is `*` for now; AgentCore Identity ARN format isn't fully standardized in public docs as of 2026-05-15. Tighten in 2.0b when OAuth migration documents the canonical resource shape. `docs/guides/LINEAR_SETUP_GUIDE.md`: adds Step 4.5 documenting the one-time `agentcore add credential --type api-key --name linear-api-key` admin command users must run alongside the existing `bgagent linear setup` wizard. Notes that Lambdas keep Secrets Manager temporarily and 2.0b will retire the dual-store setup. Starlight mirror synced. `agent/tests/test_config.py::TestResolveLinearApiToken` — 10 tests covering: cached env var fast-path; missing provider name; missing region; workload token absent (outside runtime); happy path with env-var side-effect; botocore error swallowed with WARN; SDK returns None defensively; ImportError fallback; AccessDeniedException → ERROR severity; ResourceNotFoundException → ERROR severity. 542 agent / 1271 cdk / 196 cli, all green. Lint + typecheck clean. CDK synth clean. `bedrock_agentcore` SDK confirmed working in our runtime image (verified in `node_modules` post-install). The `BedrockAgentCoreContext` workload token auto-injection is documented behaviour for code running inside AgentCore Runtime — verified by reading the SDK's `@requires_api_key` decorator implementation, which uses the same context lookup we use here. Stacked on PR aws-samples#87 (`feat/linear-processor-feedback`). Will conflict on `config.py` and `test_config.py` if aws-samples#87 needs further rework before merge — happy to rebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(linear): use aws CLI for credential provider, not the agentcore command The setup guide referenced `agentcore add credential` which doesn't actually work end-to-end: - The Python `bedrock-agentcore-starter-toolkit` CLI (`agentcore`) only exposes agent-lifecycle commands; there is no `credential-provider` subcommand. Confirmed by reading the toolkit's CLI reference and by user trying `agentcore configure credential-provider --type api-key --name ...` and receiving `No such command 'credential-provider'`. - The new npm `@aws/agentcore` CLI does have `agentcore add credential` but uses a declarative project model — the credential lands in `agentcore.json` + `.env.local`, not the actual AgentCore Identity vault, until `agentcore deploy` runs against a project structured for that CLI. ABCA isn't structured that way. Switch the docs to the plain AWS CLI which works directly against the AgentCore Identity API: aws bedrock-agentcore-control create-api-key-credential-provider \ --name linear-api-key \ --api-key "<paste lin_api_… token here>" \ --region us-east-1 Plus the matching `list-api-key-credential-providers` for verification. Add a "Tooling note" at the bottom of the section explaining why the plain AWS CLI is the right path here vs. the two `agentcore` CLIs. Starlight mirror synced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): pass runtimeUserId so AgentCore injects WorkloadAccessToken Smoke on backgroundagent-dev caught a real bug in the Phase 2.0a migration: the agent's `resolve_linear_api_token()` was correctly calling `IdentityClient.get_api_key()` but failing earlier at `BedrockAgentCoreContext.get_workload_access_token()` returning None. The Linear MCP then loaded with an unresolved `${LINEAR_API_TOKEN}` placeholder and 👀 didn't post. Root cause (from reading bedrock-agentcore-sdk-python source): The `WorkloadAccessToken` request header (which the runtime container reads to populate `BedrockAgentCoreContext`) is only injected by AgentCore Identity when `InvokeAgentRuntimeCommand` is called with `runtimeUserId`. Per AWS docs at https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-oauth.html: "Agent Runtime exchanges this token for a Workload Access Token via bedrock-agentcore:GetWorkloadAccessTokenForJWT API and delivers it to your agent code via the payload header `WorkloadAccessToken`." Without `runtimeUserId`, AgentCore never derives a workload token and the header is absent. `app.py::_build_request_context` reads the header off the inbound request; the agent sees None. Fix: 1. Thread `userId` through the `ComputeStrategy.startSession` interface (compute-strategy.ts). 2. Pass `task.user_id` (the task's Cognito sub) at the call site in orchestrate-task.ts. 3. Set `runtimeUserId: input.userId` on `InvokeAgentRuntimeCommand` in agentcore-strategy.ts. Log it alongside session_id for traceability. 4. ECS strategy accepts the new parameter to satisfy the interface; doesn't use it (ECS doesn't go through AgentCore Identity). 5. Grant the orchestrator role `bedrock-agentcore:InvokeAgentRuntimeForUser` alongside `InvokeAgentRuntime` (task-orchestrator.ts). Without this, the new `runtimeUserId` parameter would 403. Tests updated: - `agentcore-strategy.test.ts`: pin that `runtimeUserId` flows from input into the SDK command; pass `userId: 'cognito-user-1'` in 4 call sites. - `ecs-strategy.test.ts`: pass `userId` (unused by ECS) on 3 call sites. - `start-session-composition.test.ts`: pass `userId: 'cognito-test'` on 3 call sites. - `task-orchestrator.test.ts`: assert the IAM action list includes `InvokeAgentRuntimeForUser` (2 assertions). 542 agent / 1273 cdk / 196 cli — all green. Lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(linear): two undocumented gotchas to make AgentCore Identity actually work End-to-end smoke on backgroundagent-dev surfaced two more silent failure modes after the runtimeUserId fix landed: `BedrockAgentCoreContext.get_workload_access_token()` returned None inside the pipeline thread even though the platform delivered the token on the inbound request. Cause: Python ContextVar storage is per-thread, not shared across `threading.Thread` boundaries. Our `_run_task_background` spawns a new thread for the pipeline, so any context-var the SDK's middleware sets in the request handler thread doesn't reach it. Compounding factor: the SDK's `_build_request_context` middleware only runs when using `BedrockAgentCoreApp` from `bedrock_agentcore.runtime`. Plain FastAPI apps like ours never get that bridge at all. Fix: read the workload token off the request in `_extract_invocation_params` (handling both observed header spellings — `WorkloadAccessToken` and `x-amzn-bedrock-agentcore-runtime-workload-accesstoken`), thread it through the kwargs of `_run_task_background`, and have the pipeline thread call `BedrockAgentCoreContext.set_workload_access_token` on entry. (cdk/src/stacks/agent.ts) After (1) was applied, `IdentityClient.get_api_key()` actually fired and got `AccessDeniedException: ... not authorized to perform: secretsmanager:GetSecretValue`. Cause: AgentCore Identity stores api-key credentials in Secrets Manager under reserved prefix `bedrock-agentcore-identity!*` (the actual ARN shape: `arn:aws:secretsmanager:REGION:ACCOUNT:secret:bedrock-agentcore- identity!default/apikey/<provider-name>-<hash>`). The `GetResourceApiKey` control-plane API surfaces the underlying secret to the caller, and AWS verifies the *caller* role (our runtime role) has `GetSecretValue` on the actual secret resource — not the SLR. Fix: grant the runtime role `secretsmanager:GetSecretValue` scoped to the `bedrock-agentcore-identity!*` prefix in the current account/region. Tightly scoped to Identity-managed secrets; doesn't leak read access to other Secrets Manager resources. - Runtime container reads workload token from request, propagates across thread boundary, calls IdentityClient successfully - 👀 reaction posts at +525ms after task pickup, no warnings - Linear MCP loads cleanly with the resolved token - No more `workload access token not in context` WARN - No more `AccessDeniedException` from `GetResourceApiKey` Three undocumented requirements total for Phase 2.0a (combining with the runtimeUserId fix from the prior commit): 1. Caller (orchestrator) sends `runtimeUserId` and has `InvokeAgentRuntimeForUser` IAM 2. Runtime container bridges the workload-token header into the ContextVar, with per-thread propagation if the pipeline runs in a spawned thread 3. Runtime role has `secretsmanager:GetSecretValue` on `bedrock-agentcore-identity!*` All three are silent failures on their own; missing any one returns None or AccessDenied without obvious "you forgot X" diagnostics. Will file an upstream issue against `aws/bedrock-agentcore-sdk-python` summarising all three so others don't burn the same cycles. Tests: 542 agent / 1273 cdk / 196 cli — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b): foundation — workspace registry, admin invite-user, Linear app template Wave 1 of Phase 2.0b: prereq pieces for the Linear OAuth migration. - LinearWorkspaceRegistryTable: maps Linear org-id → AgentCore credential provider name, so webhook + orchestrator Lambdas can resolve the workspace's OAuth token without knowing about provider naming. - bgagent admin invite-user: wraps Cognito admin-create-user with the right defaults and prints a base64 bundle that --from-bundle decodes into ~/.bgagent/config.json. Replaces a four-flag dance with a single paste for joining teammates. - bgagent linear app-template: prints the Linear OAuth app form values captured from the 2.0b spike — GitHub username with [bot] suffix and Webhooks ON gate the actor=app flow; misleading "Invalid redirect_uri" error is the symptom when either is missing. - USER_GUIDE roles section + joining-an-existing-deployment flow: makes the four-role lifecycle explicit (stack admin / workspace admin / repo onboarder / teammate) so a teammate landing on the docs has a clear non-admin path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(linear): rewrite setup guide for OAuth (2.0b) Replace the personal-API-key flow with the Linear OAuth `actor=app` install path verified by the 2.0b spike. Major changes: - Step 1: AgentCore credential provider via `bgagent linear oauth-register-workspace`, capturing the AWS-hosted callback URL that Linear will actually see. - Step 2: Linear OAuth app creation via `bgagent linear app-template`, documenting the GitHub-username-with-[bot]-suffix and Webhooks-ON gates that produce Linear's misleading "Invalid redirect_uri" error when missing. - Step 4: OAuth dance via the rewritten `bgagent linear setup` — ephemeral localhost HTTPS callback; no own ALB/Lambda needed since AWS proxies the OAuth flow. - Step 7: clarify that the PAK-owner auto-link becomes the setup-runner auto-link; the manual DDB mapping path stays for now until self-service `@bgagent link` ships. - New "Adding additional Linear workspaces" section for multi-workspace deployments. - New "Migration from 2.0a (PAK) to 2.0b (OAuth)" runbook. - Troubleshooting expanded to cover the Invalid-redirect_uri and 401-from-Linear scenarios surfaced in the spike. Notes the docs reference commands shipping in Wave 2 (aws-samples#63 oauth-register-workspace, aws-samples#65 setup wizard, aws-samples#67 add-workspace) — the 2.0b branch is a coherent unit and aws-samples#62 must land before those flows are wired so the docs aren't a moving target during implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): workload-token retrieval helper for AgentCore Identity (2.0b C1) The CLI runs OUTSIDE AgentCore Runtime, so the in-container ContextVar trick from 2.0a does not apply. This module gives every 2.0b OAuth-flow command a single way to obtain a workload access token: - getWorkloadAccessToken({region, workloadName, userId}) calls the data-plane GetWorkloadAccessTokenForUserId, scoping the resulting token to (workload, cognito_sub) so OAuth-token retrieval is per platform user. - decodeCognitoSub(idToken) extracts the sub claim from the cached id_token, parsing only — token validation is API Gateway's job. - DEFAULT_CLI_WORKLOAD_NAME is the deployment-time convention; the workload identity itself will be created by a follow-up CDK custom resource (aws-samples#61). Stack output 'CliWorkloadIdentityName' wires the CLI to whatever the deployed name actually is. Two SDK errors get translated into actionable remediation hints: - ValidationException: WorkloadIdentity is linked to a service — documented footgun from the spike, surfaces when the CLI is pointed at a runtime workload. - AccessDeniedException / ResourceNotFoundException — same surface treatment, with the bgagent-side checklist embedded in the message. Adds @aws-sdk/client-bedrock-agentcore + bedrock-agentcore-control as CLI deps. Pins all CLI AWS SDK clients to 3.1024.0 (matching) to keep the @smithy/core dependency graph deduplicated; mixed-version pins caused interface-collision typecheck errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): bgagent linear oauth-register-workspace (2.0b B2) Registers a Linear workspace as an AgentCore OAuth2 credential provider. The command: - Validates the workspace slug shape ([a-zA-Z0-9_-]{4,50}) so the resulting provider name fits AgentCore's 64-char limit. - Prompts for clientId + clientSecret (interactive, not echoed). - Calls CreateOauth2CredentialProvider with credentialProviderVendor= 'CustomOauth2' and explicit authorizationServerMetadata for Linear's fixed endpoints (Linear has no .well-known/openid-configuration, so vendor-discovery cannot auto-resolve). - Prints the AWS-hosted callback URL the operator pastes into Linear's app form — the AWS-side proxy that Linear actually redirects to. - Idempotent: re-running with an existing provider name fetches the callbackUrl and reports "already exists — re-using it". Smoke test against dev account (2026-05-19) revealed AWS surfaces the duplicate-name case as ValidationException (NOT ConflictException as CFN/REST conventions would suggest). Detection is by message-substring match; tests cover both the duplicate path and the "ValidationException for a non-duplicate reason" path so we don't accidentally swallow input validation errors. AccessDeniedException gets a remediation hint pointing at the 'bedrock-agentcore:CreateOauth2CredentialProvider' permission, since the most common misconfiguration is running the command as a Cognito-authenticated CLI user (no permissions) rather than as an admin/stack-deploy IAM principal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b): CLI workload identity + localhost OAuth callback server (A3) Two pieces that together let the CLI run the OAuth dance without any externally-facing infrastructure: CDK side (CliWorkloadIdentity construct, wired into the agent stack): - Creates a dedicated AgentCore Identity workload identity named `bgagent-cli`, distinct from the runtime workload (which is service- linked and cannot mint user-scoped tokens). - Allowlists `https://localhost:8443/oauth/callback` as a permitted resourceOauth2ReturnUrl. AgentCore validates browser-redirect URLs against this list, so the CLI cannot finish the OAuth dance without it. - Implementation: AwsCustomResource (no L2/L1 for AgentCore Identity in CDK as of May 2026). Idempotent — Create/Update/Delete lifecycle wired so re-deploys reconcile the allowlist and stack-deletes don't leak workload identities (50/account-region quota). - Stack outputs `CliWorkloadIdentityName` and `LinearWorkspaceRegistryTableName` so the CLI can discover them at runtime. CLI side (oauth-callback-server module): - Generates a fresh self-signed cert in /tmp via openssl on each invocation; cert is cleaned up when the server shuts down. - Starts an HTTPS listener on localhost:8443/oauth/callback, captures the first request's `session_id` query param, renders a success page, shuts down. Uses res.once('finish') to ensure the response body flushes before the listener closes — otherwise the browser hangs waiting for bytes that never arrive (caught by integration test). - Translates EADDRINUSE and timeout into actionable CliErrors. The CLI URL constant and the CDK default allowlist must agree on the exact URL string — drift would silently break the OAuth dance with "redirect_uri not allowlisted". A regression-locking test on the URL constant + matching CDK default flags the issue at unit-test time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): bgagent linear setup OAuth dance orchestration (2.0b C2/C3) Replaces the personal-API-key wizard with a 7-step OAuth flow that authorizes a Linear workspace via AgentCore Identity: 1. Resolve stack outputs (CliWorkloadIdentityName, registry table, user mapping table, webhook secret ARN). Errors loudly if any are missing — typically means the stack predates 2.0b. 2. Read Cognito sub from cached id_token. 3. Mint workload access token via getWorkloadAccessTokenForUserId. 4. Initiate OAuth dance: getResourceOauth2Token returns an authorize URL + sessionUri. customParameters: {actor: 'app'} propagates so Linear surfaces the Agent install variant of the consent screen (verified via 2.0b spike). 5. Start localhost HTTPS callback server, open browser to the auth URL, await session_id from the callback. 6. Poll getResourceOauth2Token (5s/600s) until accessToken arrives; translate sessionStatus=FAILED into a Linear-app-config remediation hint. 7. Query Linear viewer + organization with the OAuth token, persist the workspace registry row + admin user-mapping row, then prompt for the webhook signing secret if not already configured. Hard cutover from PAK: the new wizard is OAuth-only — there is no --use-pak flag. The webhook signing secret prompt remains because HMAC verification of inbound Linear webhooks is independent of how the agent calls Linear outbound. Webhook prompt is skipped on subsequent add-workspace runs by detecting the lin_wh_ prefix on the stored secret; --rotate-webhook-secret forces a re-prompt. Splits queryLinearIdentity out so both the legacy PAK auto-link helper (authorization=`lin_api_…`) and the OAuth path (authorization= `Bearer <token>`) reuse the same GraphQL query. The PAK helper stays exported to support the legacy linkage path until LinearApiTokenSecret is retired in aws-samples#70. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cdk): use full SDK v3 package name for AgentCore Identity custom resource CDK's AwsCustomResource auto-derives the SDK package name from `service` by lowercasing and dropping hyphens — `'BedrockAgentCoreControl'` becomes `@aws-sdk/client-bedrockagentcorecontrol`, which doesn't exist. The actual package is `@aws-sdk/client-bedrock-agentcore-control` (hyphens). Verified by deploy: with the lowercased mapping the Lambda backing the CR fails with "Package @aws-sdk/client-bedrockagentcorecontrol does not exist" and the stack rolls back. Switching to the full v3 package name (supported per the AwsSdkCall.service jsdoc) routes the import correctly. Verified end-to-end: `bgagent-cli` workload identity created with `https://localhost:8443/oauth/callback` on the resourceOauth2ReturnUrls allowlist, stack outputs populated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(2.0b): park diagnostic flags + tokenEndpointAuthMethods + force-reauth Smoke test against backgroundagent-dev (2026-05-19) hit a service-side bug in AgentCore Identity: USER_FEDERATION token-exchange against Linear's /oauth/token never completes. sessionStatus stays IN_PROGRESS indefinitely, no FAILED transition, no diagnostics on the wire. Verified via manual curl that Linear's token endpoint works perfectly with the same clientId/secret/scopes/code/actor=app — bug is on AWS's side. AgentCore Identity has zero token-injection APIs, so Option 3 (do OAuth ourselves + inject) is architecturally impossible. AWS support case + PAR-compatibility upstream issue aws/bedrock-agentcore-sdk-python#111 are the official fix paths. Parking the wizard work but committing the diagnostic flags we added during triage so they're available when this is unparked: - `tokenEndpointAuthMethods: ['client_secret_post']` on the provider metadata. Linear expects POST-body credentials; AgentCore defaults to Basic. Field name verified against the SDK type (`tokenEndpointAuthMethods`, not the `Supported` suffix the boto3 reference suggested). - `--verbose-poll` flag on `bgagent linear setup` — prints per-poll sessionStatus + response keys so the stuck state is visible. - `--force-reauth` flag — sets `forceAuthentication: true` on GetResourceOauth2Token to bypass cached tokens after a Linear-side revoke. - `CompleteResourceTokenAuth` call between callback capture and poll loop. Per AWS sample 09-Outbound_Auth_Self_Hosted, this is required to bind the captured session to a userId. Confirmed it's NOT what unblocks our specific bug, but is correct per spec for any USER_FEDERATION flow. Status of resume paths in memory/project_oauth_2_0b.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b-O2): direct Linear OAuth + per-workspace Secrets Manager Replaces the parked AgentCore Identity OAuth flow with a CLI-side direct OAuth dance against Linear's /oauth/token endpoint. The flow verified by the manual curl smoke test on 2026-05-19 returned a valid access_token in <100ms, so we know Linear's side works. AWS's USER_FEDERATION wrapper is broken specifically for Linear (or actor=app); see memory/project_oauth_2_0b.md for the parked-bug details and resume prompt. Architecture: - New module cli/src/linear-oauth.ts owns the OAuth helpers: generatePkce (S256), buildAuthorizationUrl (with actor=app), exchangeAuthorizationCode, refreshAccessToken, StoredLinearOauthToken JSON shape, computeExpiresAt, isAccessTokenExpiring (60s threshold), linearOauthSecretName. 19 hermetic tests (no network). - Per-workspace Secrets Manager secret bgagent-linear-oauth-<slug> holds the token JSON. CLI creates+updates at runtime via upsertOauthSecret (CreateSecret + ResourceExistsException → PutSecretValue fallback). - LinearWorkspaceRegistryTable row gains oauth_secret_arn. Lambdas resolve workspace → secret_arn → token JSON, with refresh-if-expiring. (Lambda migration is Wave C.) - bgagent linear setup is rewritten end-to-end: prompt-for-credentials → PKCE → open browser → callback captures ?code+?state → state verify (CSRF) → exchangeAuthorizationCode → query Linear viewer+org → write secret + registry row + user mapping → webhook secret prompt (unchanged from prior wizard). No AgentCore calls. No polling. No CompleteResourceTokenAuth. - Localhost callback server now exposes both AgentCore-style (session_id) and direct-Linear-style (code+state+error) shapes via a CallbackResult with nullable fields. Backward-compat with the parked AgentCore path's tests. Removals: - cli/src/agentcore-identity.ts + test (workload-token helper) - cdk/src/constructs/cli-workload-identity.ts + test (workload identity) - providerNameForWorkspace, buildLinearProviderInput, registerLinearWorkspace, initiateOauthDance, completeResourceTokenAuth, pollForOauthAccessToken, AgentCore SDK imports — all gone from linear.ts - bgagent linear oauth-register-workspace command (no AWS-side provider to register; folded into setup) - CliWorkloadIdentityName CfnOutput from agent.ts - 6 describe blocks of AgentCore-flavored tests in linear.test.ts Net change: -1100 lines, +700 lines of new direct-OAuth wiring. 286/286 CLI tests pass. 9/9 linear-integration CDK tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(2.0b-O2): space-separated OAuth scopes + --no-actor-app diagnostic End-to-end smoke test against backgroundagent-dev (2026-05-20): - The OAuth dance was failing with Linear's "Invalid redirect_uri" error even though the redirect_uri was correct. Root cause: scopes were comma-separated (`read,write,...`) instead of space-separated. RFC 6749 §3.3 mandates space; Linear surfaces the violation as the misleading "Invalid redirect_uri" error, the same misdirection we hit during the 2.0b spike. Fix: `.join(' ')` in buildAuthorizationUrl. - Adds `--no-actor-app` diagnostic flag on `bgagent linear setup`. Drops the `actor=app` query param so a stuck flow can be isolated to agent-install vs vanilla-OAuth without changing the Linear app config. Off by default; surfaces a warning when invoked. After the fix, full smoke test passed: - Browser opens to Linear consent - User authorizes, redirects to https://localhost:8443/oauth/callback - CLI captures code+state, exchanges for access_token + refresh_token - Token JSON persisted to bgagent-linear-oauth-maguireb in Secrets Manager - LinearWorkspaceRegistryTable row written with oauth_secret_arn - LinearUserMappingTable row written for the admin - Token verified against Linear's GraphQL viewer query (works) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(2.0b-O2): Wave C — migrate Lambdas + agent runtime to per-workspace OAuth Replaces every consumer of the legacy LinearApiTokenSecret PAK with the per-workspace Secrets Manager OAuth-token pattern from Waves A/B. Deploy of this commit will fully cut over the integration; the LinearApiTokenSecret construct is gone. CDK side: - New `cdk/src/handlers/shared/linear-oauth-resolver.ts` resolves workspace_id → registry row → oauth_secret_arn → token JSON → refresh-if-expiring → access_token. In-memory caches (1m TTL) on both registry rows and token JSON. Lazy refresh with PutSecretValue write-back so concurrent Lambdas see the rotated token. 11 unit tests. - linear-feedback.ts: postIssueComment / addIssueReaction / reportIssueFailure now take a {linearWorkspaceId, registryTableName} context instead of an apiTokenSecretArn. Auth header switches from bare PAK value to `Bearer ${accessToken}`. - linear-webhook-processor.ts: env vars LINEAR_WORKSPACE_REGISTRY_TABLE_NAME replace LINEAR_API_TOKEN_SECRET_ARN. safeReportIssueFailure threads the webhook payload's organizationId through to the resolver. Webhook processor now stamps `linear_oauth_secret_arn` + `linear_workspace_slug` into channel_metadata at task-creation time so the agent runtime can fetch the secret directly without a registry round-trip. - orchestrate-task.ts: notifyLinearOnConcurrencyCap reads LINEAR_WORKSPACE_REGISTRY_TABLE_NAME and the task's channel_metadata.linear_workspace_id. - LinearIntegration construct: drops apiTokenSecret + ApiTokenSecret Secrets Manager resource entirely. Webhook processor IAM now grants Get+Put on `bgagent-linear-oauth-*` Secrets Manager prefix. - Agent stack: orchestrator IAM mirrors the new prefix grant. Runtime IAM drops AgentCore Identity grants and gains Get+Put on `bgagent-linear-oauth-*`. LINEAR_API_KEY_PROVIDER_NAME env var, LINEAR_API_TOKEN_SECRET_ARN env var, and LinearApiTokenSecretArn CfnOutput all removed. Agent side (Python): - config.py::resolve_linear_api_token: rewritten to read the per-task channel_metadata.linear_oauth_secret_arn (or LINEAR_OAUTH_SECRET_ARN env fallback) via boto3.secretsmanager. Lazy refresh: if expires_at is within 60s, POST refresh_token grant to Linear /oauth/token using client_id/client_secret co-located in the secret JSON, write the rotated token back via put_secret_value, return the new access_token. - pipeline.py: passes config.channel_metadata into resolve_linear_api_token. - linear-oauth.ts (CLI): StoredLinearOauthToken schema gains client_id + client_secret fields so Lambda + agent refresh can run without per-Lambda OAuth env vars. Setup wizard writes them. Tests pruned of AgentCore Identity mocks; new tests cover the Secrets-Manager-direct path (CDK 11 + agent 6 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(orchestrator): bundle import.meta.url shim for durable-execution SDK `@aws/durable-execution-sdk-js@1.1.3`'s ESM build calls `fileURLToPath(import.meta.url)` at module load. esbuild's ESM→CJS bundling leaves `import.meta.url` undefined, crashing every invocation with `TypeError: path must be a string`. Define an identifier substitution + banner that materializes a valid file:// URL from `__filename` at runtime. Discovered while smoke-testing Wave C end-to-end on backgroundagent-dev. Refs: aws/aws-durable-execution-sdk-js#543 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): switch OAuth callback to plain HTTP localhost Per RFC 8252 §7.3, OAuth providers (including Linear) treat http://localhost as a special case that doesn't need TLS — the connection never leaves the host. The previous self-signed-cert HTTPS approach forced testers through a "connection not private" warning that scared them off mid-setup. Drops the openssl shell-out + temp-cert plumbing (~60 lines) along with the user-facing warning copy in `bgagent linear setup`. Updates the callback constants to http://localhost:8080/oauth/callback and the test suite to plain http.GET. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(agent): ruff lint + format for OAuth refresh path Five lint errors surfaced when CI ran `ruff check --fix` against the Wave C agent changes: - F401 unused `timezone` import in `config.py` (replaced with `timedelta`, which is what's actually needed) - RUF034 useless if-else in the `expires_at` ternary — both branches returned identical strings before the recompute below; flatten into a single straightforward `if expires_in: ... else: ...` block - E501 three line-length violations in `config.py` and `test_config.py` — break the long expressions onto helper-named intermediates Confirmed locally: `ruff check .` clean, `ruff format --check` clean, `pytest tests/test_config.py` 15/15 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(linear-feedback): rewrite tests against OAuth context signature Wave C migrated postIssueComment / addIssueReaction / reportIssueFailure from a (secretArn: string, ...) signature to a (ctx: LinearFeedbackContext, ...) signature, but the test file still passed bare strings — TypeScript caught it at compile time only when CI ran a full build. Three test suites failed to compile (the typecheck error blocked the whole suite, not just this file). - Mock `resolveLinearOauthToken` (the new resolver) instead of `getLinearSecret` (the old PAK fetcher). - Build a `LinearFeedbackContext` fixture with linearWorkspaceId + registryTableName, pass it everywhere SECRET_ARN was used. - Update the Authorization-header assertion to match the new `Bearer <token>` form (PAK was bare-token; OAuth is Bearer-prefixed). All 41 tests across linear-feedback, linear-webhook-processor, and orchestrate-task-feedback pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cdk): align yarn.lock with upstream main + bump table count for OAuth registry Two CI failures came together because they share a root cause: this branch's yarn.lock had drifted from upstream main during interim re-resolves, leaving an inconsistent dep tree that broke ts-jest's module resolution for @aws-cdk/mixins-preview/aws-bedrockagentcore. Restoring upstream main's yarn.lock fixes the resolution; the agent.test.ts table-count assertion then needs to bump from 12 to 13 to account for the LinearWorkspaceRegistryTable added in Phase 2.0b Wave A4. Verified locally: agent.test.ts (44/44) and github-tags.test.ts (5/5) both pass after the changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style: apply eslint --fix from CI's self-mutation guard CI runs `mise run build`, which invokes `eslint --fix` and then fails if the working tree changed (self-mutation guard). Three cosmetic lints needed applying: - Import-order: DynamoDBClient and CliError moved earlier in their files to satisfy alphabetic-by-package ordering - formatJson import added in alphabetic position in linear.ts - Three template literals with no interpolation converted to single-quoted strings in oauth-callback-server.ts and linear.ts (eslint quotes rule prefers single-quotes when no template variables are used) Pure mechanical fixes; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(oauth): refresh-token race recovery + log gaps from review Addresses the blocker + critical items from PR review: - Refresh-token race (review blocker). Linear rotates refresh_tokens on every use; concurrent Lambdas/agents racing the same secret will all read the same expiring token and one's refresh will succeed while the others get `invalid_grant`. On `invalid_grant`, re-read the secret from Secrets Manager (bypassing cache). If the refresh_token has changed, another caller already rotated; use the freshly-read token (or retry refresh once if it's also expiring). If unchanged, the refresh_token is permanently rejected and the workspace needs re-onboarding. Implemented in both the TS resolver (linear-oauth-resolver.ts) and Python resolver (config.py). - Unguarded bedrock_agentcore import in agent/src/server.py (review critical). The bare `from bedrock_agentcore.runtime.context import BedrockAgentCoreContext` inside `_run_task_background` killed the entire pipeline thread with no diagnostic if the SDK was missing or its module structure changed. Wrap in try/except (ImportError, AttributeError) and log via _warn_cw — the Linear token resolver has its own SM fallback, so the agent can proceed without the workload-token bridge. - Cache invalidation on fetch-level refresh failure (review high). The TS resolver's `invalidateLinearOauthCache()` only ran in the `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the catch returned null without invalidating, leaving the stale expiring token cached for 60s and hammering Linear's token endpoint. Move invalidate into the fetch-level catch too. - Malformed expires_at log (review medium). The Python `_is_expiring` caught `ValueError` and silently returned True, masking consistently-bad writes. Add a WARN log so operators see the bad data instead of just an unexplained refresh on every task. - Positive-path refresh log (review non-blocking aws-samples#5). Added INFO-level breadcrumb on successful refresh in both resolvers so operators diagnosing intermittent 401s have a trace of which workspace refreshed and to what expiry. 11/11 existing resolver unit tests still pass; will add tests for the new race-recovery branch in a followup commit. * fix(2.0b-O2): review-2 batch — error specificity, half-creates, runbook Addresses four PR review items focused on operator UX when things go sideways: - isWebhookSecretConfigured (review high). The bare `catch { return false }` swallowed AccessDeniedException and DecryptionFailureException, making setup re-prompt for a webhook secret when the real problem was IAM. Now: only ResourceNotFoundException returns false; everything else throws a CliError pointing the operator at the IAM gap. Test updated to assert both paths. - admin invite-user half-create (review medium). If AdminCreateUser succeeds but AdminSetUserPassword fails (stricter password policy than generator, partial IAM grant on the Set verb), the user was left in FORCE_CHANGE_PASSWORD with no diagnostic. Wrap the second call in try/catch and throw a CliError that names the user, explains the broken state, and gives both a delete-user CLI and a manual-fix path. - PAK migration runbook (review non-blocking #1). Expanded the "Migration from 2.0a (PAK) to 2.0b (OAuth)" section in LINEAR_SETUP_GUIDE.md with: a pre-deploy checklist, what survives the migration vs what doesn't, an explicit rollback note (fix forward; the original PAK secret is gone with the CFN resource), and the per-step difference between 2.0a-with-Identity (skipped) vs 2.0a-with-PAK (migrate) deploys. - Vestigial AgentCore Identity dep (review non-blocking #2). bedrock-agentcore==1.9.1 is kept in agent/pyproject.toml because the workload-token bridge in server.py still calls it (now wrapped in try/except per review batch 1). Add an inline comment explaining why it's pinned even though Phase 2.0b-O2 reads Secrets Manager directly — it's the seam for resuming the AgentCore Identity path in 2.0c. CLI tests: 13/13 pass. * fix(2.0b-O2): batch 3 — schema validation, CallbackResult union, parity contract Three remaining substantive review items from PR aws-samples#160: - Validate all 11 fields in getOauthSecret (review non-blocking aws-samples#7). Was checking only access_token / refresh_token / expires_at; missing client_id or client_secret only surfaced 24h later when the refresh call needed them and found undefined. Extracted the required-field list into a const next to the StoredOauthToken interface and check the full set at deserialization. Bad secrets fail fast at fetch time with a structured log line naming the missing fields. - CallbackResult discriminated union (review non-blocking aws-samples#6). Was `{ sessionId: string|null, code: string|null, state: string|null }` which let callers construct unreachable shapes. Split into `{ kind: 'agentcore', sessionId } | { kind: 'direct-oauth', code, state }`. Updated the resolver site (`oauth-callback-server.ts`), the consumer (`bgagent linear setup`), and the test file to use exhaustive type-narrowing. The setup wizard now errors clearly if it gets the agentcore shape (parked path) instead of silently passing nulls down. - Cross-language schema-parity contract test (review non-blocking #3). CLI's StoredLinearOauthToken and Lambda's StoredOauthToken define the same JSON-in-Secrets-Manager schema independently; drift between the two would be a silent bug (CLI writes one field name, Lambda reads another, refresh works, every Lambda invocation logs a missing-field error). New test in `cdk/test/contracts/stored-oauth-token-parity.test.ts` regex-parses both interface definitions out of source and asserts the field set is equal. Also asserts the new `STORED_OAUTH_TOKEN_REQUIRED_FIELDS` const matches the interface, so future field additions can't drift between the validator and the type. CLI tests 286/286 pass. CDK resolver + contract 13/13 pass. * style: apply eslint --fix indentation on CallbackResult union * fix(2.0b-O2): review-3 batch — defensive error handling, security hardening, refresh test coverage Bugs (B1-B3): - linear-oauth-resolver: try/catch around ddb.send() in getRegistryRow so transient DDB errors fail the resolver cleanly instead of crashing the Lambda thread - orchestrate-task: try/catch around reportIssueFailure in notifyLinearOnConcurrencyCap; Linear feedback failures must never block the rejection path - Python _fetch_token: guard json.loads + KeyError so a corrupted SM payload returns None (logged ERROR) rather than raising Tests (T1-T3): - Python: TestResolveLinearApiTokenRefreshPaths covering happy refresh, invalid_grant + concurrent rotation, invalid_grant + no rotation, malformed expires_at, network failure during refresh, and corrupted secret JSON - TS: concurrent-refresh recovery via re-read; permanent-rejection on same refresh_token; cache invalidation after network failure Security (S1-S3): - agent runtime IAM: drop secretsmanager:PutSecretValue on the Linear OAuth secret prefix. Untrusted repo code in the agent must not be able to overwrite tokens; Lambdas (trusted) handle persistence. The refreshed in-memory token still works for the current task; rotated refresh_token is lost on agent exit but Linear's grace window absorbs the rare race where the agent refreshes strictly before any Lambda - Python _try_refresh_once: narrow except Exception to (urllib.error.URLError, OSError) — programmer errors propagate with clean stack traces instead of being swallowed - linear-oauth-resolver: RegistryRowStatus is now a discriminated 'active' | 'revoked' literal; missing or unknown values fail closed to revoked rather than defaulting active * fix(2.0b-O2): use email.message.Message for HTTPError hdrs in tests ty's stricter typeshed flagged dict[Unknown, Unknown] passed where Message[str, str] is expected. Drop-in replacement that satisfies both ty and runtime. * fix(2.0b-O2): update feedback test for B2 — helper now swallows internally Round-3 review B2 moved the try/catch into notifyLinearOnConcurrencyCap. The pre-existing test asserted the old contract (rejection propagates, caller must catch). Flip the contract assertion to reflect the new behavior: the helper is now best-effort end-to-end and returns undefined on internal failure. --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
Summary
channel_source='linear'. Agent updates the originating Linear issue inline (comments, state transition, reactions) via Linear's hosted MCP server + a direct GraphQL client for reactions. Live demo: https://isadeks.github.io/abca-linear-demo/ (7 agent-authored PRs merged, site deployed per merge).checkChannelAccess. Originally on a separate local branch; folded into this PR per your preference for a single combined review surface.aws-samples/mainto absorb Sam's PR feat(interactive-agents): async-only background task UX + Cedar HITL design #52 (FanOutConsumer) and adjacent changes. Resolved 13 conflicts across schema additions, construct wiring, agent pipeline ordering, and a flaky-test fix. Full test suite green post-merge (CDK 1121, agent 526, CLI 76).Response to your #42 review concerns
Being honest about what Linear addresses vs. sidesteps:
SlackNotifyFn+FanOutConsumer= exactly 2/2 at the AWS limit. Structural fix still needs to happen, not in this PR — migratingSlackNotifyFn(from #42) intoFanOutConsumeras a dispatcher subscriber would drop the count back to 1 and restore headroom. We deliberately held the line and didn't add a third.channelSource === 'linear'appears only inlinear-webhook-processor.ts(one call site, one docstring). If a fourth channel ships as a stream consumer in the future, the duplication concern returns — but the #52 dispatcher pattern is the intended long-term shape for that; future channels should subscribe to it rather than add another filter-in-each-consumer.linear-webhook.tsonly, with key${issueId}#${action}#${webhookTimestamp}(8h TTL, exceeds Linear's 7h retry horizon). No TaskRecord-level dedup extensions.channel_metadataaccumulates lifecycle state (e.g.slack_session_msg_ts)linear_*_msg_tsfields onTaskRecord. All per-issue state (comments, state id, reaction id) lives in Linear itself, accessed at runtime via MCP + GraphQL. The container is the only place that touches Linear state; nothing is stashed on the record.Architectural notes
Why MCP-based outbound instead of a FanOutConsumer subscriber. Linear ships a first-class MCP server (
https://mcp.linear.app/mcp), so the agent can update the issue from inside the container using tools — same place the work is happening, no extra Lambda hop, no duplicated state. The fanout pattern is a better fit for ephemeral notifications (Slack thread updates, email) than stateful back-and-forth with a ticket (comment threads, state transitions, reactions that need the originating issue id).Operationally:
linear-integration.tshas zero stream consumers and zero outbound Lambdas. The only Lambdas are the webhook receiver, the async processor, andlinear-link(CLI account linking). Everything user-visible happens in the agent container.WAF scope-down on
/v1/linear/webhook. Linear Issue webhook payloads routinely exceed the 8 KB body-size rule inAWSManagedRulesCommonRuleSet(observed 8614 bytes in smoke). Scoped the CRS out for this one path viascopeDownStatement. The route retains HMAC verification,webhookTimestampreplay guard (60s), JSON schema validation, webhook dedup, and the priority-3 rate-limit rule.AWSManagedRulesKnownBadInputsRuleSetremains unchanged.Auto-link on
bgagent linear setup. Linear identifies the API-key owner viaviewer { id }, so we look up the caller's Cognitosubfrom the cached id_token and write theLinearUserMappingrow directly. Fail-open on auth / network errors (secrets still get stored; user is told to re-runbgagent login). This closes the "now go pastebgagent linkin a comment" dance for the common single-admin case.v1 multi-user scope. Only the API-token owner can submit tasks from Linear in v1. Tasks triggered by other Linear users are dropped by the processor with
"Linear actor has no linked platform user — skipping task creation".bgagent linear link <code>is registered as a CLI command but the Linear-side code generator isn't in v1 — OAuth bot install is the proper multi-user UX and will subsume any link-code bridge, so shipping the bridge didn't seem worth it. Constraint is documented in the setup guide.In scope / out of scope
In this PR (v1 / shipped):
LinearProjectMappingTable,LinearUserMappingTable) + dedup tableLinearIntegrationCDK construct (zero stream consumers, zero outbound Lambdas)configure_channel_mcp, gated onchannel_source='linear')create_reaction)bgagent linear setup(+ auto-link),bgagent linear link,bgagent linear list-projects,bgagent linear onboard-project(+ UUID validation)save_comment,save_issuewith graceful "no matching state" fallback)test_serverrace fixOut of scope (planned follow-ups):
/bgagent retryin a Linear comment) for partial-PR recovery.Commentwebhook subscription + slash-command parser in processor.bgagent cancelSIGKILLs the container, so Linear-side ❌ never fires from the agent'sexcept.Validation
webhookTimestampkeys → no delivery collapseonboard-projectrejects the Linear-URL-slug formbgagent cancelmid-run (documents the v3 gap)aws dynamodb describe-table --query 'Table.StreamSpecification'onTaskEventsTable— 2 consumers (SlackNotifyFn+FanOutConsumer); Linear adds 0.grep -rn "channelSource === 'linear'" cdk/src/handlers/— only inlinear-webhook-processor.ts.TaskRecordhas nolinear_*_msg_tsfields.Test plan for reviewer
mise //cdk:deploy— CFN diff should add onlyLinearIntegration/*resources on top of currentaws-samples/main. Slack beyond Lambda bundle refreshes should not change.bgagent linear setup. It prints the webhook URL, pauses at the signing-secret prompt. Create a webhook in Linear at that URL (subscribe toIssue), copy the signing secret, return to the terminal and paste secret + API key. Auto-link fires for the token owner; success message:✓ Linked Linear user … → platform user ….bgagent linear list-projects→ copy the target project UUID.bgagent linear onboard-project <uuid> --repo <your-fork>/<demo-repo> --label abca.abcalabel applied at creation time.🤖 Starting…comment, PR opens, state →In Review, 👀 swaps for ✅ when the PR URL comment is posted and the task finishes.Closes #42.