Skip to content

feat(linear): v1.1 polish — pre-container feedback, state-on-start, sweep, nits#87

Merged
krokoko merged 15 commits into
aws-samples:mainfrom
isadeks:feat/linear-processor-feedback
May 18, 2026
Merged

feat(linear): v1.1 polish — pre-container feedback, state-on-start, sweep, nits#87
krokoko merged 15 commits into
aws-samples:mainfrom
isadeks:feat/linear-processor-feedback

Conversation

@isadeks
Copy link
Copy Markdown
Contributor

@isadeks isadeks commented May 13, 2026

Summary

Wraps the v1.1 polish theme for the Linear integration, addressing the silent-drop gaps and PR #63 review recommendations in a single PR. Six rejection paths now produce user-visible feedback; the issue state transitions on agent-start (not just at PR-open); a stale-reaction sweep keeps re-runs clean; and four small review nits are addressed.

What changed

1. Pre-container failure feedback (5 paths in processor + 1 in orchestrator)

Linear-triggered tasks that fail before the agent container starts now produce a Linear comment + ❌ reaction within ~20s of label-apply. Previously, six distinct rejection points all silently dropped:

# Trigger Layer Message
1 Issue has no projectId processor "isn't in a project — ABCA needs a Linear project to route tasks to a repo"
2 Project not onboarded / removed processor "isn't onboarded; admin can run bgagent linear onboard-project"
3 Webhook missing organization or actor processor diagnostic ("report to your ABCA admin")
4 Linear actor has no linked platform user processor "v1 only the API-token owner can submit; multi-user OAuth on v3 roadmap"
5 createTaskCore returns non-201 processor branched on status: guardrail/validation surfaces error string; 503 prompts retry; other 4xx/5xx generic
6 User concurrency cap rejection orchestrator "concurrency limit; wait for one to finish, then re-apply the label"

All six call sites use a shared helper cdk/src/handlers/shared/linear-feedback.tsreportIssueFailure(secretArn, issueId, message) posts the comment and reaction in parallel via Promise.allSettled. Reuses the existing 5-minute getLinearSecret cache.

2. State transition at agent-start

Mirrors the existing In Review transition that fires at PR-open. The prompt addendum's step 1 now instructs the agent to transition Backlog/TodoIn Progress before doing repo work, so Linear's state column reflects "being worked" during the minutes between label-apply and PR-open. Falls back to Todo if In Progress doesn't exist; skips if the issue is already in In Progress or any later state. Doesn't loop on list_issue_statuses.

3. Stale-reaction sweep

Re-runs (label removed and re-applied; or pre-container ❌ followed by a successful retry) used to leave prior 👀/✅/❌ accumulated next to the new run's reaction. Added _sweep_stale_reactions(issue_id) to agent/src/linear_reactions.py:

  • Architecture: 👀-first, sweep-async — the user-visible 👀 fires first (~150ms typical). The sweep runs on a daemon thread after the 👀 lands, so cold-network latency (Linear's first call from a fresh container can be ~5s) doesn't gate the user-visible signal or block the agent pipeline.
  • Filters by viewer + emoji — only deletes the bot's own 👀/✅/❌, never touches reactions a human added even if they happen to use the same emoji.
  • Excludes the just-posted 👀 from its own sweep (passes exclude_id=rid).
  • Best-effort — failure leaves the visual-duplication bug we set out to fix, but the pipeline proceeds unaffected.

Smoke-tested on backgroundagent-dev: react_task_started total = 156ms; sweep done in background at +303ms; agent started at +3ms after react_task_started exit.

4. PR #63 review nits (4)

  • Auth circuit breaker in linear_reactions.py — track consecutive 401/403s; after 3 strikes, ERROR once and short-circuit until container restart. Replaces the prior behaviour where revoked tokens flooded CloudWatch with WARNs forever.
  • Explicit linear_eyes_reaction_id: str | None = None init in pipeline.py instead of locals().get(...) in the crash handler.
  • Narrow except Exception in resolve_linear_api_token to (BotoCoreError, ClientError). Switch print()shell.log("WARN", ...).
  • Admin-assisted fallback docs for bgagent linear setup auto-link failures. Replaces misleading "run bgagent linear link <code>" suggestion (the command is non-functional in v1) with explicit DynamoDB put-item steps and a clear v3-OAuth note.

Plumbing

  • linear-integration.ts — wires LINEAR_API_TOKEN_SECRET_ARN into the webhook processor + grants read on LinearIntegration.apiTokenSecret.
  • agent.ts — same wiring for orchestrator.fn (declared after LinearIntegration so done in the stack rather than as a prop).
  • Both grants verified at synth: LinearIntegrationWebhookProcessorFn and TaskOrchestratorOrchestratorFn both carry the env var and IAM policy.

Test plan

Unit tests — 1240 baseline → 1268 CDK, 526 → 532 agent, 196 CLI, all green:

  • cdk/test/handlers/shared/linear-feedback.test.ts (new, 13 tests) — mutation shape, auth header, error swallowing in 4 distinct failure modes, Promise.allSettled partial-success semantics.
  • cdk/test/handlers/linear-webhook-processor.test.ts — extended with a user-visible feedback describe block, 10 new tests: one assertion per rejection path + happy-path-doesn't-fire + filter-rejection-doesn't-fire (latter is intentional UX).
  • cdk/test/handlers/orchestrate-task-feedback.test.ts (new, 5 tests) — covers notifyLinearOnConcurrencyCap with withDurableExecution mocked.
  • agent/tests/test_linear_reactions.py — extended with 5 sweep tests covering scoping rules (viewer-owned bgagent emojis only; preserves human reactions even with same emoji; preserves bot reactions of other emojis; sweep failure doesn't block 👀; viewer-id cached across calls).

Lint + synth + agent quality: clean.

Smoke tested on backgroundagent-dev:

  • Path 2 (project not onboarded) ✅
  • Path 5a (guardrail block) ✅ — ❌ + comment within ~20s of label
  • 👀-fast architecture ✅ — eyes-visible at +156ms confirmed via runtime container logs

Smoke tests not run for paths 3 (defensive, no natural reproducer) and 5b (503, no natural reproducer) — covered by unit tests. Path 6 (concurrency cap) requires 4 simultaneous tasks to reproduce; covered by unit tests.

Reviewer notes

  • No linear_*_msg_ts fields on TaskRecord — all per-issue state stays in Linear, accessed at runtime.
  • No new stream consumers — reactions/comments use direct GraphQL from agent + Lambda; doesn't touch the FanOutConsumer / TaskEventsTable 2-reader-per-shard limit (closed by feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (#64) #79 anyway).
  • Sweep is daemon-threaded — never blocks the agent pipeline. If the container terminates before the sweep finishes, the worst case is the visual-duplication bug we set out to fix (status quo from before this PR).
  • Auth circuit breaker is per-container — resets on container restart, doesn't persist to DynamoDB. Right tradeoff for a 5-15 minute task; revisit if container lifetimes grow much longer.

Closes the silent-drop UX gap that appeared whenever a Linear-triggered task
was rejected before the agent container started — the user would apply the
trigger label, see nothing happen, and have no way to know why. Reactions
and progress comments are emitted by the agent container; nothing fired
until that point, so all upstream rejections were invisible on the Linear
side.

This commit wires a best-effort GraphQL feedback path covering all six
distinct rejection points:

In `linear-webhook-processor.ts` (pre-`createTaskCore`):
  1. Issue has no projectId → "isn't in a project" comment
  2. Project not onboarded / removed → "isn't onboarded; admin can run
     `bgagent linear onboard-project`" comment
  3. Webhook missing organization or actor → diagnostic comment
  4. Linear actor has no linked platform user → "v1 only the API-token
     owner can submit; multi-user OAuth is on the v3 roadmap" comment
  5. `createTaskCore` returns non-201 → message branched on status:
     guardrail/validation block surfaces the user-facing error string;
     503 prompts the user to re-apply the label; other 4xx/5xx falls
     through to a generic message.

In `orchestrate-task.ts` (post-201, in admission control):
  6. User concurrency cap rejection → "concurrency limit; wait for one
     to finish, then re-apply the label" comment.

All five processor paths and the orchestrator path call a shared helper,
`reportIssueFailure(secretArn, issueId, message)`, that runs the comment
and ❌ reaction in parallel via `Promise.allSettled`. The helper:

  - Reuses the existing 5-minute `getLinearSecret` cache from
    `linear-verify.ts` (no extra Secrets Manager hits on warm Lambdas).
  - Swallows network, auth, and GraphQL errors with WARN logs — Linear
    feedback is advisory and must never gate the rejection path.
  - Posts to Linear's hosted GraphQL endpoint; mutation shapes match
    `agent/src/linear_reactions.py` (`commentCreate`, `reactionCreate`).

CDK plumbing:

  - `linear-integration.ts` — wires `LINEAR_API_TOKEN_SECRET_ARN` into
    the webhook processor and grants read on the existing
    `LinearIntegration.apiTokenSecret`.
  - `agent.ts` — grants the same secret to `orchestrator.fn` and
    populates the env var. The grant is unconditional; the orchestrator
    only invokes the helper when `task.channel_source === 'linear'`.

The non-Linear case is a hard no-op at the call site — `notifyLinear-
OnConcurrencyCap` early-returns on `channel_source !== 'linear'`, and the
processor only handles Linear payloads. Slack/API/webhook tasks are
unaffected.

Tests (28 new; 1240 → 1268, all green):

  - `cdk/test/handlers/shared/linear-feedback.test.ts` (13 tests):
    mutation shape, auth header, error swallowing in 4 distinct failure
    modes (secret-resolution null, non-2xx, GraphQL `errors`, network
    throw), `Promise.allSettled` partial-success semantics.
  - `cdk/test/handlers/linear-webhook-processor.test.ts` (10 new tests
    in a `user-visible feedback` describe block): one assertion per
    rejection path + happy-path-doesn't-fire + filter-rejection-doesn't-
    fire (the latter is intentional UX — the processor sees many events
    that aren't tasks, and dropping a comment on each would be noisy).
  - `cdk/test/handlers/orchestrate-task-feedback.test.ts` (5 tests):
    new file; covers `notifyLinearOnConcurrencyCap` directly with
    `withDurableExecution` mocked. Asserts the linear path fires; the
    api/webhook/slack paths no-op; missing metadata, missing env, and
    undefined `channel_metadata` all no-op cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isadeks isadeks requested a review from a team as a code owner May 13, 2026 22:35
@isadeks isadeks marked this pull request as draft May 13, 2026 22:47
 nits

Wraps the v1.1 polish theme from PR aws-samples#87. Five small additions, all
agent-side or docs:

State-on-start (the user-visible one):
  - prompt_builder._channel_prompt_addendum now instructs the agent to
    transition the originating Linear issue to `In Progress` (or `Todo`
    fallback) at agent-start, mirroring the existing `In Review` chain
    fired at PR-open. Closes the gap where the issue stayed at `Backlog`
    during real agent work — only the 👀 reaction and "🤖 Starting"
    comment signaled progress, while humans-using-Linear expect the state
    column to reflect "being worked." Skips if the issue is already in
    `In Progress` or any later state; doesn't loop on list_issue_statuses.

Alain aws-samples#63 review nits (4 small surgical changes):
  - linear_reactions.py: auth-failure circuit breaker. Track consecutive
    401/403s; after 3 strikes, log ERROR once and short-circuit all later
    _graphql calls (return None) until the container restarts. Resets on
    any 2xx response. Replaces the prior behaviour where revoked tokens
    flooded CloudWatch with WARNs and wasted Linear API quota indefinitely.
  - pipeline.py: declare `linear_eyes_reaction_id: str | None = None`
    explicitly before the try block instead of relying on
    `locals().get("linear_eyes_reaction_id")` in the crash handler.
    Functionally identical; survives refactors and reads cleanly.
  - config.py::resolve_linear_api_token: narrow `except Exception` to
    `(BotoCoreError, ClientError)` from botocore.exceptions. Switch
    `print()` to `shell.log("WARN", ...)` so warnings join the structured
    log stream the rest of the agent uses.
  - LINEAR_SETUP_GUIDE.md + cli/src/commands/linear.ts: stop telling
    users to run `bgagent linear link <code>` when auto-link fails — the
    code generator is a v3 feature that doesn't ship in v1, so the
    suggestion was misleading. Replaced with explicit admin-assisted
    fallback (DynamoDB put-item with steps to find workspaceId, viewerId,
    Cognito sub) and a clear "this command exists but is non-functional
    in v1" note.

Tests: 532 agent + 1268 cdk + 196 cli, all green. Deployed to
backgroundagent-dev. Smoke-tested 👀-on-start (156ms, agent unblocked)
in the prior commit; state-on-start smoke is the next manual step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isadeks isadeks changed the title feat(linear): comment + react on pre-container task-creation failures feat(linear): v1.1 polish — pre-container feedback, state-on-start, sweep, nits May 14, 2026
Whitespace-only changes flagged by CI's self-mutation guard. No behaviour
change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isadeks isadeks marked this pull request as ready for review May 14, 2026 21:08
@krokoko
Copy link
Copy Markdown
Contributor

krokoko commented May 15, 2026

Code Review Summary

Great PR — the architecture is sound, test coverage is thorough (28 new tests), and the "best-effort, never gate the pipeline" philosophy is consistently applied. A few items to address before merge:


Must fix

1. Thread-safety of circuit breaker globals (agent/src/linear_reactions.py)

_consecutive_auth_failures and _auth_circuit_open are read/written from both the main thread and the daemon sweep thread. The +=1 increment and the check-then-set pattern are not atomic. Worst case: duplicate "circuit OPEN" ERROR logs or lost counter resets.

→ Add a threading.Lock around the three module-level globals.

2. notifyLinearOnConcurrencyCap inside a retriable durable execution step (cdk/src/handlers/orchestrate-task.ts)

If notifyLinearOnConcurrencyCap throws unexpectedly, the durable execution framework retries the entire admission-control step — re-running failTask and emitTaskEvent (duplicate events).

→ Wrap in a defensive try-catch:

try {
  await notifyLinearOnConcurrencyCap(task);
} catch (feedbackErr) {
  logger.warn('Linear concurrency-cap feedback failed (non-fatal)', { error: ... });
}

Should fix

3. ImportError regression in resolve_linear_api_token (agent/src/config.py)

import boto3 is inside the try block but the narrowed except (BotoCoreError, ClientError) no longer catches ImportError. If boto3 is missing from the container image, the agent hard-crashes instead of gracefully degrading.

→ Add ImportError to the except tuple, or move the imports before the try.

4. Daemon thread crash silence (agent/src/linear_reactions.py)

If _sweep_stale_reactions hits an unexpected TypeError/AttributeError, the thread silently dies with no application-level log (stderr may not reach CloudWatch in containerized environments).

→ Wrap the thread target in a top-level try-except that logs via log("ERROR", ...).

5. Sweep delete counter inaccuracy (agent/src/linear_reactions.py)

_graphql(_DELETE_MUTATION, {"id": rid}) return value is discarded but deletes += 1 always fires. The summary log gives a false impression of success when deletes actually failed.

if _graphql(...) is not None: deletes += 1


Nice to have (non-blocking)

  • Tighten buildCreateTaskFailureMessage params — the sole callsite passes APIGatewayProxyResult fields which are number and string, never undefined. The | undefined arms are dead code.
  • Add a log when circuit breaker short-circuits — once the circuit opens, all subsequent calls return None with zero trace. A single "circuit still open" DEBUG log aids debugging.
  • Log AccessDeniedException at ERROR in resolveToken — IAM misconfig is persistent, not transient. WARN severity means no alert fires for a broken deployment.
  • Person name in comment (config.py line 70: "per Alain's feat(linear): add Linear integration, with Slack review fixes #63 review") — prefer referencing only the issue number.

Overall: solid work. The two "must fix" items are small changes that prevent real (if unlikely) failure modes. Happy to re-review once addressed.

bgagent and others added 2 commits May 15, 2026 09:01
- linear_reactions: guard auth-circuit globals with `_auth_state_lock`
  so the daemon sweep thread and the main thread can't race the
  read-modify-write on `_consecutive_auth_failures` /
  `_auth_circuit_open`.
- linear_reactions: wrap the daemon sweep target in
  `_sweep_stale_reactions_safe` so an unexpected exception logs at
  ERROR instead of dying silently (stderr from a daemon thread doesn't
  reliably reach CloudWatch).
- linear_reactions: only increment the sweep delete counter when
  `_graphql(_DELETE_MUTATION, ...)` actually returns a non-None
  response — previously the summary log overstated success.
- config: hoist `import boto3` out of the catch-narrowed try/except
  so an `ImportError` (boto3 missing from the image) degrades to a
  WARN log instead of crashing the agent.
- orchestrate-task: wrap `notifyLinearOnConcurrencyCap` in a
  defensive try/catch — durable-execution retries the entire
  admission-control step on throw, which would re-fire `failTask` +
  `emitTaskEvent` and produce duplicate events.
- tests: 1 new throw-propagation test for `notifyLinearOnConcurrencyCap`,
  3 new tests for `resolve_linear_api_token` (cached env, no-arn,
  ImportError fallback). Auto-reset fixture in
  `test_linear_reactions.py` now also resets the circuit-breaker
  globals between tests so future cases don't leak state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- linear_reactions: log a single DEBUG line when the auth circuit
  breaker short-circuits a call, so the path isn't zero-trace once
  open.
- config: split the `(BotoCoreError, ClientError)` catch so
  `AccessDeniedException` logs at ERROR instead of WARN — IAM
  misconfig is persistent and should page someone, not blend into
  transient warnings. Also drop the personal name from the inline
  reference to the aws-samples#63 review.
- linear-webhook-processor: tighten `buildCreateTaskFailureMessage`
  param types to `number` / `string` (no `| undefined`) — the only
  caller passes `APIGatewayProxyResult` fields which are always
  defined. Removes dead fallback-to-`'unknown'` branches.
- test_config: 2 new tests covering the split exception path
  (AccessDenied → ERROR; ResourceNotFound → WARN).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isadeks
Copy link
Copy Markdown
Contributor Author

isadeks commented May 15, 2026

Thanks for the thorough review @krokoko. All 9 items addressed across two commits — split must/should-fix from nice-to-have so the load-bearing changes are isolated.

Must-fix (dd5474eb)

  1. Thread-safety of circuit breaker globals — added _auth_state_lock around the read-modify-write on _consecutive_auth_failures / _auth_circuit_open. The 401/403 increment-and-check is now atomic; the open-circuit check reads the flag under the lock then drops the lock before logging.
  2. notifyLinearOnConcurrencyCap defensive try/catch — wrapped in orchestrate-task.ts. New unit test (reportIssueFailure rejection propagates (caller must catch)) asserts the helper does throw on its caller, so the surrounding catch in admission-control is load-bearing rather than redundant.
  3. ImportError regression — split import boto3 into its own try/except ImportError block ahead of the narrowed (BotoCoreError, ClientError) catch. New test_import_error_degrades_gracefully covers it.
  4. Daemon thread crash silence — added _sweep_stale_reactions_safe wrapper that catches Exception at the thread top-level and logs at ERROR. Thread name unchanged so existing _join_sweep_thread() test helper still finds it.
  5. Sweep delete counter inaccuracydeletes += 1 now gated on _graphql(_DELETE_MUTATION, ...) is not None.

Also expanded the autouse fixture in test_linear_reactions.py to reset the circuit-breaker globals between tests so future cases don't accumulate state.

Nice-to-have (420fc113)

  • DEBUG log on circuit short-circuitlinear_reactions: auth circuit still open; short-circuiting call.
  • AccessDeniedException → ERROR — split the catch into ClientError (inspect Error.Code, escalate AccessDeniedException to ERROR) and BotoCoreError (WARN). Two new tests cover both branches.
  • Tightened buildCreateTaskFailureMessage paramsnumber, string (not | undefined) since the only caller passes APIGatewayProxyResult fields. Removed dead ?? 'unknown' fallbacks.
  • Person name scrubAlain's #63 review#63 review in config.py.

Verification

  • Agent: 537 tests pass (was 532 — 5 new in test_config.py)
  • CDK: focused suites (linear-webhook-processor, orchestrate-task-feedback, linear-feedback, orchestrate-task, create-task-core) all green; 1 new test in orchestrate-task-feedback.test.ts
  • Compile + ESLint + ruff format/check clean
  • Smoke-tested on backgroundagent-dev

Ready for re-review.

Copy link
Copy Markdown
Contributor

@krokoko krokoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review of PR #87

First review items: all addressed ✓

All 9 items from the initial review (5 must/should-fix + 4 nice-to-have) are correctly addressed in dd5474e and 420fc11:

  • Thread-safety lock on circuit breaker globals — correctly implemented with _auth_state_lock
  • Defensive try/catch around notifyLinearOnConcurrencyCap — in place, with a test proving the catch is load-bearing
  • ImportError regression — hoisted into its own try/except before the narrowed (BotoCoreError, ClientError) catch
  • Daemon thread crash wrapper — _sweep_stale_reactions_safe catches Exception and logs at ERROR
  • Sweep delete counter — gated on _graphql(...) is not None
  • All 4 nice-to-haves (DEBUG log, AccessDenied → ERROR, param tightening, name scrub) — done

New findings

Should fix

1. Missing defensive try/catch around reportIssueFailure in the webhook processor (linear-webhook-processor.ts)

The orchestrator correctly wraps notifyLinearOnConcurrencyCap in try/catch as belt-and-suspenders. The 5 bare await reportIssueFailure(...) calls in the webhook processor have no such defense. While Promise.allSettled structurally guarantees the function cannot reject today, a synchronous TypeError before the Promise.allSettled is constructed (e.g. if the function signature is refactored) would propagate, causing Lambda failure and SQS retries.

Suggestion: extract a local wrapper at the top of the file and call it at all 5 sites:

async function safeReportIssueFailure(issueId: string, message: string): Promise<void> {
  try {
    await reportIssueFailure(API_TOKEN_SECRET_ARN, issueId, message);
  } catch (err) {
    logger.warn('Linear feedback failed (non-fatal)', {
      issue_id: issueId,
      error: err instanceof Error ? err.message : String(err),
    });
  }
}

2. process.env.LINEAR_API_TOKEN_SECRET_ARN! non-null assertion without runtime guard (linear-webhook-processor.ts:30)

The orchestrator path correctly checks if (!secretArn) and logs a warning before returning. The processor uses a ! assertion at module scope — if the env var is missing due to a deploy misconfiguration, every feedback call silently fails with a cryptic WARN from resolveToken ("could not resolve API token") instead of a clear one-time diagnostic.

Suggestion: match the orchestrator pattern — drop the !, add a guard at usage or a startup-time log.

3. Auth circuit breaker has zero tests (test_linear_reactions.py)

The circuit breaker is ~40 lines of thread-safe logic (_AUTH_FAILURE_THRESHOLD, _consecutive_auth_failures, _auth_circuit_open, _auth_state_lock) with no test coverage. Missing scenarios:

  • 3 consecutive 401/403s open the circuit
  • Once open, calls short-circuit without hitting the network
  • A 2xx between failures resets the counter
  • Non-auth errors (500) don't reset the counter

The _reset_module_state fixture already handles cleanup, so adding a TestAuthCircuitBreaker class should be straightforward.

Nice to have

4. BotoCoreError handler path untested (test_config.py) — The PR narrowed except Exception to two separate handlers but only ClientError paths have tests. One test for BotoCoreError (e.g. EndpointConnectionError) would close this.

5. exclude_id filter in sweep is never exercised (test_linear_reactions.py) — In test_sweep_deletes_only_viewer_owned_bgagent_emojis, prior reaction IDs never collide with the new reaction ID, so the exclude_id branch is dead code in tests. Add a prior reaction whose ID matches the newly posted one and verify it's preserved.

Out of scope (pre-existing, noting for tracking)

6. resp.json() in linear_reactions.py::_graphql can throw JSONDecodeError — The body = resp.json() if resp.content else {} line is outside any try/except. If Linear returns a 200 with non-JSON body (HTML maintenance page), it raises ValueError. The new sweep code increases the probability of hitting this (3+ _graphql calls per react_task_started). The _sweep_stale_reactions_safe wrapper catches it for the daemon thread, but the main-thread path would propagate. Worth a separate fix.


Summary table

# Finding Severity
1 Bare await reportIssueFailure(...) — 5 sites without try/catch Should fix
2 ! non-null assertion without runtime guard Should fix
3 Auth circuit breaker has zero tests Should fix
4 BotoCoreError path untested Nice to have
5 exclude_id filter untested Nice to have
6 Pre-existing resp.json() crash risk Out of scope

Items 1–3 are worth addressing before merge. 4–5 would improve confidence but aren't blocking. 6 should be tracked separately.

Positive notes

  • Promise.allSettled in reportIssueFailure is the right construct — structurally guarantees the never-throw contract
  • Thread safety is correctly implemented — single lock, no nested acquisitions, no deadlock risk
  • Test quality is strong: _join_sweep_thread() for daemon thread sync, comprehensive positive and negative assertions on feedback paths, and the test proving reportIssueFailure rejection propagates (validating the orchestrator's catch is load-bearing) is particularly thoughtful
  • The "best-effort, never gate the pipeline" philosophy is consistently applied across all layers
  • CDK wiring (IAM grants + env vars) is correct in both linear-integration.ts and agent.ts
  • Docs sync is correct

krokoko and others added 5 commits May 17, 2026 19:07
- linear-webhook-processor: extracted `safeReportIssueFailure` helper
  and routed all 5 bare `await reportIssueFailure(...)` call sites
  through it. The helper is uniformly non-throwing — wraps the call
  in try/catch to defend against a future signature refactor that
  could break the helper's `Promise.allSettled` never-throw contract.
  A synchronous throw would otherwise propagate, fail the Lambda,
  and trigger SQS retries on a poison message.
- linear-webhook-processor: dropped the `!` non-null assertion on
  `process.env.LINEAR_API_TOKEN_SECRET_ARN` at module scope. The
  helper now guards on `!API_TOKEN_SECRET_ARN` and logs a single
  clear `Skipping Linear feedback: LINEAR_API_TOKEN_SECRET_ARN not
  set` diagnostic per skip — matches the orchestrator pattern in
  `notifyLinearOnConcurrencyCap`.
- test_linear_reactions: new `TestAuthCircuitBreaker` class with 5
  tests covering the previously-untested circuit:
    * 3 consecutive 401/403s open the circuit
    * Once open, calls short-circuit without hitting the network
    * A 2xx between failures resets the counter
    * Non-auth status (500) doesn't increment the counter
    * 401 and 403 are both treated as auth failures
- test_linear-webhook-processor: 2 new tests assert
  `safeReportIssueFailure` swallows both synchronous throws and
  async rejections from the underlying helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- test_config: cover the BotoCoreError branch of
  `resolve_linear_api_token` with an `EndpointConnectionError`
  case. The PR-aws-samples#87 split into ClientError + BotoCoreError branches
  previously had no test on the BotoCoreError path.
- test_linear_reactions: new
  `test_sweep_preserves_just_posted_eyes_via_exclude_id` exercises
  the `exclude_id` filter — the existing sweep test never collided
  prior reaction ids with the newly posted one, so the branch was
  effectively dead code in tests. The new test plants the just-
  posted 👀 in the prior reactions list and asserts it survives the
  sweep while an older ❌ is deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ty check` infers `_consecutive_auth_failures = 0` as `Literal[0]` and
`_auth_circuit_open = False` as `Literal[False]`, which then rejects
the legitimate runtime flips (and the test fixture that resets them
between cases). Adding explicit `int` / `bool` annotations widens the
inferred type and fixes the CI typecheck failure introduced in
`f4633be`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isadeks
Copy link
Copy Markdown
Contributor Author

isadeks commented May 18, 2026

Thanks for the second pass @krokoko. All 5 actionable findings addressed across two commits, with the out-of-scope item parked for a follow-up. Ty narrowing pre-existing in f4633be caught by CI and fixed in 0a3711c.

Should-fix (f4633be)

1. Bare await reportIssueFailure(...) at 5 call sites in linear-webhook-processor.ts

Extracted a local safeReportIssueFailure(issueId, message) helper at the top of the file and routed all 5 sites through it. The helper wraps the call in try/catch so a synchronous throw (e.g. from a future signature refactor that breaks the helper's Promise.allSettled never-throw contract) can't propagate, fail the Lambda, and trigger SQS retries on a poison message. Two new tests assert the helper swallows both synchronous throws and async rejections.

2. process.env.LINEAR_API_TOKEN_SECRET_ARN! non-null assertion

Dropped the !. The new safeReportIssueFailure guards on !API_TOKEN_SECRET_ARN and logs a single clear Skipping Linear feedback: LINEAR_API_TOKEN_SECRET_ARN not set diagnostic per skip — matches the orchestrator pattern in notifyLinearOnConcurrencyCap. No more cryptic per-call WARN from resolveToken on deploy misconfig.

3. Auth circuit breaker has zero tests

New TestAuthCircuitBreaker class in test_linear_reactions.py with 5 tests covering all the scenarios you flagged:

  • test_three_consecutive_401s_open_the_circuit — threshold semantics
  • test_open_circuit_short_circuits_subsequent_calls — once open, no network
  • test_2xx_response_resets_failure_counter — 200 between 401s clears the counter
  • test_non_auth_errors_do_not_increment_failure_counter — 500 doesn't count
  • test_403_treated_same_as_401 — both auth statuses

Nice-to-have (a713c62)

4. BotoCoreError handler path untested

New test_botocore_error_logged_at_warn in test_config.py exercising the BotoCoreError branch with EndpointConnectionError. Asserts WARN severity and exception type in the log message.

5. exclude_id filter never exercised

New test_sweep_preserves_just_posted_eyes_via_exclude_id plants the just-posted reaction id in the prior reactions list and asserts the sweep skips it while still deleting an older . Previously dead branch in tests; now load-bearing.

Out of scope (0a3711c follow-up)

6. Pre-existing resp.json() crash risk on non-JSON 200

Tracked in things-to-do.md for a separate fix — the daemon-thread sweep is already wrapped by _sweep_stale_reactions_safe (added in dd5474e), but the main-thread react_task_started / react_task_finished paths would still propagate a JSONDecodeError. Worth fixing on its own so the diff is reviewable.

CI fix (0a3711c)

The first push of the must-fix commit had a ty check failure I missed locally — the module-level globals _consecutive_auth_failures = 0 / _auth_circuit_open = False were narrowed by ty to Literal[0] / Literal[False], which then rejected the legitimate runtime flips and the test fixture's resets. Fixed with explicit int / bool annotations.

Verification

  • Agent: mise //agent:quality clean — lint, format, ty check, 544 tests pass (was 537 — 7 new across test_linear_reactions.py and test_config.py)
  • CDK: mise //cdk:test clean — 1273 tests pass (was 1240 baseline — 2 new in linear-webhook-processor.test.ts for safeReportIssueFailure's sync + async failure handling)
  • mise //cdk:synth:quiet clean (only pre-existing CdkNag warnings on VPC endpoint security groups)

Ready for re-review.

isadeks pushed a commit to isadeks/sample-autonomous-cloud-coding-agents that referenced this pull request May 18, 2026
Migrates the agent runtime's Linear personal API token resolution from
AWS Secrets Manager to AWS Bedrock AgentCore Identity. This is the
"validate Identity SDK" step of the v2 plan; Phase 2.0b will swap the
API key for OAuth and converge Linear MCP onto AgentCore Gateway in
one cutover.

Per Alain's guidance: "start by using api key, if it works, switch to
oauth. you will setup an outbound auth for your server using agentcore
identity. that identity can be (AC identity is like a wrapper around
secrets manager) api key or oauth."

Lambdas (orchestrator + processor) intentionally keep using Secrets
Manager via the existing `LinearApiTokenSecret` for now. The Python
`bedrock_agentcore` SDK has no Node.js equivalent — Lambda migration
requires `@aws-sdk/client-bedrock-agentcore` raw API calls and folds
into 2.0b's bigger refactor. End-state of 2.0a: agent reads from
Identity, Lambdas read from Secrets Manager, both pointing at the same
underlying token value (admin populates both).

`agent/src/config.py::resolve_linear_api_token`:

  - Drops boto3 SecretsManager fetch + `LINEAR_API_TOKEN_SECRET_ARN` env.
  - Reads new env `LINEAR_API_KEY_PROVIDER_NAME` (provider name in
    Identity vault).
  - Calls `IdentityClient.get_api_key()` with the workload access token
    auto-injected into `BedrockAgentCoreContext` by AgentCore Runtime
    (verified by reading the SDK's `auth.py` decorator implementation —
    no manual workload-identity mint needed inside the runtime).
  - Caches the resolved token in `LINEAR_API_TOKEN` so downstream
    consumers stay unchanged: `channel_mcp.py`'s `${LINEAR_API_TOKEN}`
    placeholder in `.mcp.json` and `linear_reactions.py`'s GraphQL
    Authorization header.

Preserves PR aws-samples#87's nice-to-have improvements:

  - `ImportError` graceful fallback (now for `bedrock_agentcore` instead
    of `boto3`) — degrade with WARN, don't crash the agent.
  - `AccessDeniedException` and `ResourceNotFoundException` logged at
    ERROR severity (persistent IAM/config bugs that should page).
    Other ClientErrors stay at WARN (transient throttle/network).

`agent/pyproject.toml`: adds `bedrock-agentcore==1.9.1` dep.

`cdk/src/stacks/agent.ts`:

  - On the AgentCore runtime: drops `linearIntegration.apiTokenSecret.
    grantRead(runtime)` and the `LINEAR_API_TOKEN_SECRET_ARN` env-var
    override. Adds `LINEAR_API_KEY_PROVIDER_NAME` env (hardcoded
    `'linear-api-key'` for now; can parametrize later via context if
    multi-environment naming is needed) and IAM permissions for
    `bedrock-agentcore:GetResourceApiKey` and
    `bedrock-agentcore:GetWorkloadAccessToken`.
  - Lambdas (orchestrator + processor) untouched — they still grant on
    the Linear secret and read from Secrets Manager.
  - Resource scope on the new IAM is `*` for now; AgentCore Identity ARN
    format isn't fully standardized in public docs as of 2026-05-15.
    Tighten in 2.0b when OAuth migration documents the canonical
    resource shape.

`docs/guides/LINEAR_SETUP_GUIDE.md`: adds Step 4.5 documenting the
one-time `agentcore add credential --type api-key --name linear-api-key`
admin command users must run alongside the existing `bgagent linear
setup` wizard. Notes that Lambdas keep Secrets Manager temporarily and
2.0b will retire the dual-store setup. Starlight mirror synced.

`agent/tests/test_config.py::TestResolveLinearApiToken` — 10 tests
covering: cached env var fast-path; missing provider name; missing
region; workload token absent (outside runtime); happy path with
env-var side-effect; botocore error swallowed with WARN; SDK returns
None defensively; ImportError fallback; AccessDeniedException → ERROR
severity; ResourceNotFoundException → ERROR severity.

542 agent / 1271 cdk / 196 cli, all green. Lint + typecheck clean.
CDK synth clean.

`bedrock_agentcore` SDK confirmed working in our runtime image (verified
in `node_modules` post-install). The `BedrockAgentCoreContext` workload
token auto-injection is documented behaviour for code running inside
AgentCore Runtime — verified by reading the SDK's `@requires_api_key`
decorator implementation, which uses the same context lookup we use
here.

Stacked on PR aws-samples#87 (`feat/linear-processor-feedback`). Will conflict on
`config.py` and `test_config.py` if aws-samples#87 needs further rework before
merge — happy to rebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@krokoko krokoko self-requested a review May 18, 2026 20:19
@krokoko krokoko added this pull request to the merge queue May 18, 2026
Merged via the queue into aws-samples:main with commit f38baad May 18, 2026
2 checks passed
isadeks added a commit to isadeks/sample-autonomous-cloud-coding-agents that referenced this pull request May 26, 2026
… 2.0b) (aws-samples#160)

* feat(linear): resolve API token via AgentCore Identity (Phase 2.0a)

Migrates the agent runtime's Linear personal API token resolution from
AWS Secrets Manager to AWS Bedrock AgentCore Identity. This is the
"validate Identity SDK" step of the v2 plan; Phase 2.0b will swap the
API key for OAuth and converge Linear MCP onto AgentCore Gateway in
one cutover.

Per Alain's guidance: "start by using api key, if it works, switch to
oauth. you will setup an outbound auth for your server using agentcore
identity. that identity can be (AC identity is like a wrapper around
secrets manager) api key or oauth."

Lambdas (orchestrator + processor) intentionally keep using Secrets
Manager via the existing `LinearApiTokenSecret` for now. The Python
`bedrock_agentcore` SDK has no Node.js equivalent — Lambda migration
requires `@aws-sdk/client-bedrock-agentcore` raw API calls and folds
into 2.0b's bigger refactor. End-state of 2.0a: agent reads from
Identity, Lambdas read from Secrets Manager, both pointing at the same
underlying token value (admin populates both).

`agent/src/config.py::resolve_linear_api_token`:

  - Drops boto3 SecretsManager fetch + `LINEAR_API_TOKEN_SECRET_ARN` env.
  - Reads new env `LINEAR_API_KEY_PROVIDER_NAME` (provider name in
    Identity vault).
  - Calls `IdentityClient.get_api_key()` with the workload access token
    auto-injected into `BedrockAgentCoreContext` by AgentCore Runtime
    (verified by reading the SDK's `auth.py` decorator implementation —
    no manual workload-identity mint needed inside the runtime).
  - Caches the resolved token in `LINEAR_API_TOKEN` so downstream
    consumers stay unchanged: `channel_mcp.py`'s `${LINEAR_API_TOKEN}`
    placeholder in `.mcp.json` and `linear_reactions.py`'s GraphQL
    Authorization header.

Preserves PR aws-samples#87's nice-to-have improvements:

  - `ImportError` graceful fallback (now for `bedrock_agentcore` instead
    of `boto3`) — degrade with WARN, don't crash the agent.
  - `AccessDeniedException` and `ResourceNotFoundException` logged at
    ERROR severity (persistent IAM/config bugs that should page).
    Other ClientErrors stay at WARN (transient throttle/network).

`agent/pyproject.toml`: adds `bedrock-agentcore==1.9.1` dep.

`cdk/src/stacks/agent.ts`:

  - On the AgentCore runtime: drops `linearIntegration.apiTokenSecret.
    grantRead(runtime)` and the `LINEAR_API_TOKEN_SECRET_ARN` env-var
    override. Adds `LINEAR_API_KEY_PROVIDER_NAME` env (hardcoded
    `'linear-api-key'` for now; can parametrize later via context if
    multi-environment naming is needed) and IAM permissions for
    `bedrock-agentcore:GetResourceApiKey` and
    `bedrock-agentcore:GetWorkloadAccessToken`.
  - Lambdas (orchestrator + processor) untouched — they still grant on
    the Linear secret and read from Secrets Manager.
  - Resource scope on the new IAM is `*` for now; AgentCore Identity ARN
    format isn't fully standardized in public docs as of 2026-05-15.
    Tighten in 2.0b when OAuth migration documents the canonical
    resource shape.

`docs/guides/LINEAR_SETUP_GUIDE.md`: adds Step 4.5 documenting the
one-time `agentcore add credential --type api-key --name linear-api-key`
admin command users must run alongside the existing `bgagent linear
setup` wizard. Notes that Lambdas keep Secrets Manager temporarily and
2.0b will retire the dual-store setup. Starlight mirror synced.

`agent/tests/test_config.py::TestResolveLinearApiToken` — 10 tests
covering: cached env var fast-path; missing provider name; missing
region; workload token absent (outside runtime); happy path with
env-var side-effect; botocore error swallowed with WARN; SDK returns
None defensively; ImportError fallback; AccessDeniedException → ERROR
severity; ResourceNotFoundException → ERROR severity.

542 agent / 1271 cdk / 196 cli, all green. Lint + typecheck clean.
CDK synth clean.

`bedrock_agentcore` SDK confirmed working in our runtime image (verified
in `node_modules` post-install). The `BedrockAgentCoreContext` workload
token auto-injection is documented behaviour for code running inside
AgentCore Runtime — verified by reading the SDK's `@requires_api_key`
decorator implementation, which uses the same context lookup we use
here.

Stacked on PR aws-samples#87 (`feat/linear-processor-feedback`). Will conflict on
`config.py` and `test_config.py` if aws-samples#87 needs further rework before
merge — happy to rebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(linear): use aws CLI for credential provider, not the agentcore command

The setup guide referenced `agentcore add credential` which doesn't actually
work end-to-end:

  - The Python `bedrock-agentcore-starter-toolkit` CLI (`agentcore`) only
    exposes agent-lifecycle commands; there is no `credential-provider`
    subcommand. Confirmed by reading the toolkit's CLI reference and by
    user trying `agentcore configure credential-provider --type api-key
    --name ...` and receiving `No such command 'credential-provider'`.
  - The new npm `@aws/agentcore` CLI does have `agentcore add credential`
    but uses a declarative project model — the credential lands in
    `agentcore.json` + `.env.local`, not the actual AgentCore Identity
    vault, until `agentcore deploy` runs against a project structured for
    that CLI. ABCA isn't structured that way.

Switch the docs to the plain AWS CLI which works directly against the
AgentCore Identity API:

    aws bedrock-agentcore-control create-api-key-credential-provider \
      --name linear-api-key \
      --api-key "<paste lin_api_… token here>" \
      --region us-east-1

Plus the matching `list-api-key-credential-providers` for verification.
Add a "Tooling note" at the bottom of the section explaining why the
plain AWS CLI is the right path here vs. the two `agentcore` CLIs.

Starlight mirror synced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(linear): pass runtimeUserId so AgentCore injects WorkloadAccessToken

Smoke on backgroundagent-dev caught a real bug in the Phase 2.0a
migration: the agent's `resolve_linear_api_token()` was correctly
calling `IdentityClient.get_api_key()` but failing earlier at
`BedrockAgentCoreContext.get_workload_access_token()` returning None.
The Linear MCP then loaded with an unresolved `${LINEAR_API_TOKEN}`
placeholder and 👀 didn't post.

Root cause (from reading bedrock-agentcore-sdk-python source):

The `WorkloadAccessToken` request header (which the runtime container
reads to populate `BedrockAgentCoreContext`) is only injected by
AgentCore Identity when `InvokeAgentRuntimeCommand` is called with
`runtimeUserId`. Per AWS docs at
https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-oauth.html:

  "Agent Runtime exchanges this token for a Workload Access Token via
   bedrock-agentcore:GetWorkloadAccessTokenForJWT API and delivers it
   to your agent code via the payload header `WorkloadAccessToken`."

Without `runtimeUserId`, AgentCore never derives a workload token and
the header is absent. `app.py::_build_request_context` reads the
header off the inbound request; the agent sees None.

Fix:

1. Thread `userId` through the `ComputeStrategy.startSession` interface
   (compute-strategy.ts).
2. Pass `task.user_id` (the task's Cognito sub) at the call site in
   orchestrate-task.ts.
3. Set `runtimeUserId: input.userId` on `InvokeAgentRuntimeCommand` in
   agentcore-strategy.ts. Log it alongside session_id for traceability.
4. ECS strategy accepts the new parameter to satisfy the interface;
   doesn't use it (ECS doesn't go through AgentCore Identity).
5. Grant the orchestrator role `bedrock-agentcore:InvokeAgentRuntimeForUser`
   alongside `InvokeAgentRuntime` (task-orchestrator.ts). Without this,
   the new `runtimeUserId` parameter would 403.

Tests updated:
- `agentcore-strategy.test.ts`: pin that `runtimeUserId` flows from
  input into the SDK command; pass `userId: 'cognito-user-1'` in 4 call
  sites.
- `ecs-strategy.test.ts`: pass `userId` (unused by ECS) on 3 call sites.
- `start-session-composition.test.ts`: pass `userId: 'cognito-test'` on
  3 call sites.
- `task-orchestrator.test.ts`: assert the IAM action list includes
  `InvokeAgentRuntimeForUser` (2 assertions).

542 agent / 1273 cdk / 196 cli — all green. Lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(linear): two undocumented gotchas to make AgentCore Identity actually work

End-to-end smoke on backgroundagent-dev surfaced two more silent failure
modes after the runtimeUserId fix landed:

`BedrockAgentCoreContext.get_workload_access_token()` returned None inside
the pipeline thread even though the platform delivered the token on the
inbound request. Cause: Python ContextVar storage is per-thread, not
shared across `threading.Thread` boundaries. Our `_run_task_background`
spawns a new thread for the pipeline, so any context-var the SDK's
middleware sets in the request handler thread doesn't reach it.

Compounding factor: the SDK's `_build_request_context` middleware only
runs when using `BedrockAgentCoreApp` from `bedrock_agentcore.runtime`.
Plain FastAPI apps like ours never get that bridge at all.

Fix: read the workload token off the request in `_extract_invocation_params`
(handling both observed header spellings — `WorkloadAccessToken` and
`x-amzn-bedrock-agentcore-runtime-workload-accesstoken`), thread it through
the kwargs of `_run_task_background`, and have the pipeline thread call
`BedrockAgentCoreContext.set_workload_access_token` on entry.

   (cdk/src/stacks/agent.ts)

After (1) was applied, `IdentityClient.get_api_key()` actually fired and
got `AccessDeniedException: ... not authorized to perform:
secretsmanager:GetSecretValue`.

Cause: AgentCore Identity stores api-key credentials in Secrets Manager
under reserved prefix `bedrock-agentcore-identity!*` (the actual ARN
shape: `arn:aws:secretsmanager:REGION:ACCOUNT:secret:bedrock-agentcore-
identity!default/apikey/<provider-name>-<hash>`). The `GetResourceApiKey`
control-plane API surfaces the underlying secret to the caller, and AWS
verifies the *caller* role (our runtime role) has `GetSecretValue` on
the actual secret resource — not the SLR.

Fix: grant the runtime role `secretsmanager:GetSecretValue` scoped to
the `bedrock-agentcore-identity!*` prefix in the current
account/region. Tightly scoped to Identity-managed secrets; doesn't
leak read access to other Secrets Manager resources.

- Runtime container reads workload token from request, propagates across
  thread boundary, calls IdentityClient successfully
- 👀 reaction posts at +525ms after task pickup, no warnings
- Linear MCP loads cleanly with the resolved token
- No more `workload access token not in context` WARN
- No more `AccessDeniedException` from `GetResourceApiKey`

Three undocumented requirements total for Phase 2.0a (combining with
the runtimeUserId fix from the prior commit):

  1. Caller (orchestrator) sends `runtimeUserId` and has
     `InvokeAgentRuntimeForUser` IAM
  2. Runtime container bridges the workload-token header into the
     ContextVar, with per-thread propagation if the pipeline runs in a
     spawned thread
  3. Runtime role has `secretsmanager:GetSecretValue` on
     `bedrock-agentcore-identity!*`

All three are silent failures on their own; missing any one returns None
or AccessDenied without obvious "you forgot X" diagnostics. Will file an
upstream issue against `aws/bedrock-agentcore-sdk-python` summarising
all three so others don't burn the same cycles.

Tests: 542 agent / 1273 cdk / 196 cli — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(2.0b): foundation — workspace registry, admin invite-user, Linear app template

Wave 1 of Phase 2.0b: prereq pieces for the Linear OAuth migration.

- LinearWorkspaceRegistryTable: maps Linear org-id → AgentCore credential
  provider name, so webhook + orchestrator Lambdas can resolve the
  workspace's OAuth token without knowing about provider naming.
- bgagent admin invite-user: wraps Cognito admin-create-user with the
  right defaults and prints a base64 bundle that --from-bundle decodes
  into ~/.bgagent/config.json. Replaces a four-flag dance with a single
  paste for joining teammates.
- bgagent linear app-template: prints the Linear OAuth app form values
  captured from the 2.0b spike — GitHub username with [bot] suffix and
  Webhooks ON gate the actor=app flow; misleading "Invalid redirect_uri"
  error is the symptom when either is missing.
- USER_GUIDE roles section + joining-an-existing-deployment flow: makes
  the four-role lifecycle explicit (stack admin / workspace admin /
  repo onboarder / teammate) so a teammate landing on the docs has a
  clear non-admin path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(linear): rewrite setup guide for OAuth (2.0b)

Replace the personal-API-key flow with the Linear OAuth `actor=app`
install path verified by the 2.0b spike. Major changes:

- Step 1: AgentCore credential provider via `bgagent linear
  oauth-register-workspace`, capturing the AWS-hosted callback URL
  that Linear will actually see.
- Step 2: Linear OAuth app creation via `bgagent linear app-template`,
  documenting the GitHub-username-with-[bot]-suffix and Webhooks-ON
  gates that produce Linear's misleading "Invalid redirect_uri" error
  when missing.
- Step 4: OAuth dance via the rewritten `bgagent linear setup` —
  ephemeral localhost HTTPS callback; no own ALB/Lambda needed since
  AWS proxies the OAuth flow.
- Step 7: clarify that the PAK-owner auto-link becomes the
  setup-runner auto-link; the manual DDB mapping path stays for now
  until self-service `@bgagent link` ships.
- New "Adding additional Linear workspaces" section for
  multi-workspace deployments.
- New "Migration from 2.0a (PAK) to 2.0b (OAuth)" runbook.
- Troubleshooting expanded to cover the Invalid-redirect_uri and
  401-from-Linear scenarios surfaced in the spike.

Notes the docs reference commands shipping in Wave 2 (aws-samples#63
oauth-register-workspace, aws-samples#65 setup wizard, aws-samples#67 add-workspace) — the
2.0b branch is a coherent unit and aws-samples#62 must land before those flows
are wired so the docs aren't a moving target during implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): workload-token retrieval helper for AgentCore Identity (2.0b C1)

The CLI runs OUTSIDE AgentCore Runtime, so the in-container ContextVar
trick from 2.0a does not apply. This module gives every 2.0b OAuth-flow
command a single way to obtain a workload access token:

- getWorkloadAccessToken({region, workloadName, userId}) calls the
  data-plane GetWorkloadAccessTokenForUserId, scoping the resulting
  token to (workload, cognito_sub) so OAuth-token retrieval is per
  platform user.
- decodeCognitoSub(idToken) extracts the sub claim from the cached
  id_token, parsing only — token validation is API Gateway's job.
- DEFAULT_CLI_WORKLOAD_NAME is the deployment-time convention; the
  workload identity itself will be created by a follow-up CDK custom
  resource (aws-samples#61). Stack output 'CliWorkloadIdentityName' wires the
  CLI to whatever the deployed name actually is.

Two SDK errors get translated into actionable remediation hints:
- ValidationException: WorkloadIdentity is linked to a service —
  documented footgun from the spike, surfaces when the CLI is
  pointed at a runtime workload.
- AccessDeniedException / ResourceNotFoundException — same surface
  treatment, with the bgagent-side checklist embedded in the message.

Adds @aws-sdk/client-bedrock-agentcore + bedrock-agentcore-control as
CLI deps. Pins all CLI AWS SDK clients to 3.1024.0 (matching) to keep
the @smithy/core dependency graph deduplicated; mixed-version pins
caused interface-collision typecheck errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): bgagent linear oauth-register-workspace (2.0b B2)

Registers a Linear workspace as an AgentCore OAuth2 credential provider.
The command:

- Validates the workspace slug shape ([a-zA-Z0-9_-]{4,50}) so the
  resulting provider name fits AgentCore's 64-char limit.
- Prompts for clientId + clientSecret (interactive, not echoed).
- Calls CreateOauth2CredentialProvider with credentialProviderVendor=
  'CustomOauth2' and explicit authorizationServerMetadata for Linear's
  fixed endpoints (Linear has no .well-known/openid-configuration, so
  vendor-discovery cannot auto-resolve).
- Prints the AWS-hosted callback URL the operator pastes into Linear's
  app form — the AWS-side proxy that Linear actually redirects to.
- Idempotent: re-running with an existing provider name fetches the
  callbackUrl and reports "already exists — re-using it".

Smoke test against dev account (2026-05-19) revealed AWS surfaces the
duplicate-name case as ValidationException (NOT ConflictException as
CFN/REST conventions would suggest). Detection is by message-substring
match; tests cover both the duplicate path and the "ValidationException
for a non-duplicate reason" path so we don't accidentally swallow input
validation errors.

AccessDeniedException gets a remediation hint pointing at the
'bedrock-agentcore:CreateOauth2CredentialProvider' permission, since
the most common misconfiguration is running the command as a
Cognito-authenticated CLI user (no permissions) rather than as an
admin/stack-deploy IAM principal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(2.0b): CLI workload identity + localhost OAuth callback server (A3)

Two pieces that together let the CLI run the OAuth dance without any
externally-facing infrastructure:

CDK side (CliWorkloadIdentity construct, wired into the agent stack):
- Creates a dedicated AgentCore Identity workload identity named
  `bgagent-cli`, distinct from the runtime workload (which is service-
  linked and cannot mint user-scoped tokens).
- Allowlists `https://localhost:8443/oauth/callback` as a permitted
  resourceOauth2ReturnUrl. AgentCore validates browser-redirect URLs
  against this list, so the CLI cannot finish the OAuth dance without it.
- Implementation: AwsCustomResource (no L2/L1 for AgentCore Identity in
  CDK as of May 2026). Idempotent — Create/Update/Delete lifecycle wired
  so re-deploys reconcile the allowlist and stack-deletes don't leak
  workload identities (50/account-region quota).
- Stack outputs `CliWorkloadIdentityName` and `LinearWorkspaceRegistryTableName`
  so the CLI can discover them at runtime.

CLI side (oauth-callback-server module):
- Generates a fresh self-signed cert in /tmp via openssl on each
  invocation; cert is cleaned up when the server shuts down.
- Starts an HTTPS listener on localhost:8443/oauth/callback, captures the
  first request's `session_id` query param, renders a success page,
  shuts down. Uses res.once('finish') to ensure the response body
  flushes before the listener closes — otherwise the browser hangs
  waiting for bytes that never arrive (caught by integration test).
- Translates EADDRINUSE and timeout into actionable CliErrors.

The CLI URL constant and the CDK default allowlist must agree on the
exact URL string — drift would silently break the OAuth dance with
"redirect_uri not allowlisted". A regression-locking test on the URL
constant + matching CDK default flags the issue at unit-test time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): bgagent linear setup OAuth dance orchestration (2.0b C2/C3)

Replaces the personal-API-key wizard with a 7-step OAuth flow that
authorizes a Linear workspace via AgentCore Identity:

  1. Resolve stack outputs (CliWorkloadIdentityName, registry table,
     user mapping table, webhook secret ARN). Errors loudly if any are
     missing — typically means the stack predates 2.0b.
  2. Read Cognito sub from cached id_token.
  3. Mint workload access token via getWorkloadAccessTokenForUserId.
  4. Initiate OAuth dance: getResourceOauth2Token returns an authorize
     URL + sessionUri. customParameters: {actor: 'app'} propagates so
     Linear surfaces the Agent install variant of the consent screen
     (verified via 2.0b spike).
  5. Start localhost HTTPS callback server, open browser to the auth
     URL, await session_id from the callback.
  6. Poll getResourceOauth2Token (5s/600s) until accessToken arrives;
     translate sessionStatus=FAILED into a Linear-app-config remediation
     hint.
  7. Query Linear viewer + organization with the OAuth token, persist
     the workspace registry row + admin user-mapping row, then prompt
     for the webhook signing secret if not already configured.

Hard cutover from PAK: the new wizard is OAuth-only — there is no
--use-pak flag. The webhook signing secret prompt remains because
HMAC verification of inbound Linear webhooks is independent of how the
agent calls Linear outbound. Webhook prompt is skipped on subsequent
add-workspace runs by detecting the lin_wh_ prefix on the stored
secret; --rotate-webhook-secret forces a re-prompt.

Splits queryLinearIdentity out so both the legacy PAK auto-link helper
(authorization=`lin_api_…`) and the OAuth path (authorization=
`Bearer <token>`) reuse the same GraphQL query. The PAK helper stays
exported to support the legacy linkage path until LinearApiTokenSecret
is retired in aws-samples#70.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cdk): use full SDK v3 package name for AgentCore Identity custom resource

CDK's AwsCustomResource auto-derives the SDK package name from `service`
by lowercasing and dropping hyphens — `'BedrockAgentCoreControl'` becomes
`@aws-sdk/client-bedrockagentcorecontrol`, which doesn't exist. The
actual package is `@aws-sdk/client-bedrock-agentcore-control` (hyphens).

Verified by deploy: with the lowercased mapping the Lambda backing the
CR fails with "Package @aws-sdk/client-bedrockagentcorecontrol does not
exist" and the stack rolls back. Switching to the full v3 package name
(supported per the AwsSdkCall.service jsdoc) routes the import correctly.

Verified end-to-end: `bgagent-cli` workload identity created with
`https://localhost:8443/oauth/callback` on the resourceOauth2ReturnUrls
allowlist, stack outputs populated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(2.0b): park diagnostic flags + tokenEndpointAuthMethods + force-reauth

Smoke test against backgroundagent-dev (2026-05-19) hit a service-side
bug in AgentCore Identity: USER_FEDERATION token-exchange against
Linear's /oauth/token never completes. sessionStatus stays IN_PROGRESS
indefinitely, no FAILED transition, no diagnostics on the wire.

Verified via manual curl that Linear's token endpoint works perfectly
with the same clientId/secret/scopes/code/actor=app — bug is on AWS's
side. AgentCore Identity has zero token-injection APIs, so Option 3
(do OAuth ourselves + inject) is architecturally impossible. AWS
support case + PAR-compatibility upstream issue
aws/bedrock-agentcore-sdk-python#111 are the official fix paths.

Parking the wizard work but committing the diagnostic flags we added
during triage so they're available when this is unparked:

- `tokenEndpointAuthMethods: ['client_secret_post']` on the provider
  metadata. Linear expects POST-body credentials; AgentCore defaults to
  Basic. Field name verified against the SDK type (`tokenEndpointAuthMethods`,
  not the `Supported` suffix the boto3 reference suggested).
- `--verbose-poll` flag on `bgagent linear setup` — prints per-poll
  sessionStatus + response keys so the stuck state is visible.
- `--force-reauth` flag — sets `forceAuthentication: true` on
  GetResourceOauth2Token to bypass cached tokens after a Linear-side
  revoke.
- `CompleteResourceTokenAuth` call between callback capture and poll
  loop. Per AWS sample 09-Outbound_Auth_Self_Hosted, this is required
  to bind the captured session to a userId. Confirmed it's NOT what
  unblocks our specific bug, but is correct per spec for any
  USER_FEDERATION flow.

Status of resume paths in memory/project_oauth_2_0b.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(2.0b-O2): direct Linear OAuth + per-workspace Secrets Manager

Replaces the parked AgentCore Identity OAuth flow with a CLI-side
direct OAuth dance against Linear's /oauth/token endpoint. The flow
verified by the manual curl smoke test on 2026-05-19 returned a valid
access_token in <100ms, so we know Linear's side works. AWS's
USER_FEDERATION wrapper is broken specifically for Linear (or
actor=app); see memory/project_oauth_2_0b.md for the parked-bug
details and resume prompt.

Architecture:
- New module cli/src/linear-oauth.ts owns the OAuth helpers:
  generatePkce (S256), buildAuthorizationUrl (with actor=app),
  exchangeAuthorizationCode, refreshAccessToken,
  StoredLinearOauthToken JSON shape, computeExpiresAt,
  isAccessTokenExpiring (60s threshold), linearOauthSecretName.
  19 hermetic tests (no network).
- Per-workspace Secrets Manager secret bgagent-linear-oauth-<slug>
  holds the token JSON. CLI creates+updates at runtime via upsertOauthSecret
  (CreateSecret + ResourceExistsException → PutSecretValue fallback).
- LinearWorkspaceRegistryTable row gains oauth_secret_arn. Lambdas
  resolve workspace → secret_arn → token JSON, with refresh-if-expiring.
  (Lambda migration is Wave C.)
- bgagent linear setup is rewritten end-to-end:
  prompt-for-credentials → PKCE → open browser → callback captures
  ?code+?state → state verify (CSRF) → exchangeAuthorizationCode →
  query Linear viewer+org → write secret + registry row + user mapping
  → webhook secret prompt (unchanged from prior wizard).
  No AgentCore calls. No polling. No CompleteResourceTokenAuth.
- Localhost callback server now exposes both AgentCore-style
  (session_id) and direct-Linear-style (code+state+error) shapes
  via a CallbackResult with nullable fields. Backward-compat with
  the parked AgentCore path's tests.

Removals:
- cli/src/agentcore-identity.ts + test (workload-token helper)
- cdk/src/constructs/cli-workload-identity.ts + test (workload identity)
- providerNameForWorkspace, buildLinearProviderInput,
  registerLinearWorkspace, initiateOauthDance, completeResourceTokenAuth,
  pollForOauthAccessToken, AgentCore SDK imports — all gone from linear.ts
- bgagent linear oauth-register-workspace command (no AWS-side provider
  to register; folded into setup)
- CliWorkloadIdentityName CfnOutput from agent.ts
- 6 describe blocks of AgentCore-flavored tests in linear.test.ts

Net change: -1100 lines, +700 lines of new direct-OAuth wiring.
286/286 CLI tests pass. 9/9 linear-integration CDK tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(2.0b-O2): space-separated OAuth scopes + --no-actor-app diagnostic

End-to-end smoke test against backgroundagent-dev (2026-05-20):

- The OAuth dance was failing with Linear's "Invalid redirect_uri" error
  even though the redirect_uri was correct. Root cause: scopes were
  comma-separated (`read,write,...`) instead of space-separated. RFC
  6749 §3.3 mandates space; Linear surfaces the violation as the
  misleading "Invalid redirect_uri" error, the same misdirection we
  hit during the 2.0b spike. Fix: `.join(' ')` in buildAuthorizationUrl.
- Adds `--no-actor-app` diagnostic flag on `bgagent linear setup`. Drops
  the `actor=app` query param so a stuck flow can be isolated to
  agent-install vs vanilla-OAuth without changing the Linear app config.
  Off by default; surfaces a warning when invoked.

After the fix, full smoke test passed:
- Browser opens to Linear consent
- User authorizes, redirects to https://localhost:8443/oauth/callback
- CLI captures code+state, exchanges for access_token + refresh_token
- Token JSON persisted to bgagent-linear-oauth-maguireb in Secrets Manager
- LinearWorkspaceRegistryTable row written with oauth_secret_arn
- LinearUserMappingTable row written for the admin
- Token verified against Linear's GraphQL viewer query (works)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(2.0b-O2): Wave C — migrate Lambdas + agent runtime to per-workspace OAuth

Replaces every consumer of the legacy LinearApiTokenSecret PAK with the
per-workspace Secrets Manager OAuth-token pattern from Waves A/B. Deploy
of this commit will fully cut over the integration; the LinearApiTokenSecret
construct is gone.

CDK side:
- New `cdk/src/handlers/shared/linear-oauth-resolver.ts` resolves
  workspace_id → registry row → oauth_secret_arn → token JSON →
  refresh-if-expiring → access_token. In-memory caches (1m TTL) on
  both registry rows and token JSON. Lazy refresh with PutSecretValue
  write-back so concurrent Lambdas see the rotated token. 11 unit tests.
- linear-feedback.ts: postIssueComment / addIssueReaction /
  reportIssueFailure now take a {linearWorkspaceId, registryTableName}
  context instead of an apiTokenSecretArn. Auth header switches from
  bare PAK value to `Bearer ${accessToken}`.
- linear-webhook-processor.ts: env vars LINEAR_WORKSPACE_REGISTRY_TABLE_NAME
  replace LINEAR_API_TOKEN_SECRET_ARN. safeReportIssueFailure threads
  the webhook payload's organizationId through to the resolver. Webhook
  processor now stamps `linear_oauth_secret_arn` + `linear_workspace_slug`
  into channel_metadata at task-creation time so the agent runtime can
  fetch the secret directly without a registry round-trip.
- orchestrate-task.ts: notifyLinearOnConcurrencyCap reads
  LINEAR_WORKSPACE_REGISTRY_TABLE_NAME and the task's
  channel_metadata.linear_workspace_id.
- LinearIntegration construct: drops apiTokenSecret + ApiTokenSecret
  Secrets Manager resource entirely. Webhook processor IAM now grants
  Get+Put on `bgagent-linear-oauth-*` Secrets Manager prefix.
- Agent stack: orchestrator IAM mirrors the new prefix grant.
  Runtime IAM drops AgentCore Identity grants and gains Get+Put on
  `bgagent-linear-oauth-*`. LINEAR_API_KEY_PROVIDER_NAME env var,
  LINEAR_API_TOKEN_SECRET_ARN env var, and LinearApiTokenSecretArn
  CfnOutput all removed.

Agent side (Python):
- config.py::resolve_linear_api_token: rewritten to read the per-task
  channel_metadata.linear_oauth_secret_arn (or LINEAR_OAUTH_SECRET_ARN
  env fallback) via boto3.secretsmanager. Lazy refresh: if expires_at
  is within 60s, POST refresh_token grant to Linear /oauth/token using
  client_id/client_secret co-located in the secret JSON, write the
  rotated token back via put_secret_value, return the new access_token.
- pipeline.py: passes config.channel_metadata into resolve_linear_api_token.
- linear-oauth.ts (CLI): StoredLinearOauthToken schema gains client_id +
  client_secret fields so Lambda + agent refresh can run without
  per-Lambda OAuth env vars. Setup wizard writes them.

Tests pruned of AgentCore Identity mocks; new tests cover the
Secrets-Manager-direct path (CDK 11 + agent 6 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(orchestrator): bundle import.meta.url shim for durable-execution SDK

`@aws/durable-execution-sdk-js@1.1.3`'s ESM build calls
`fileURLToPath(import.meta.url)` at module load. esbuild's ESM→CJS
bundling leaves `import.meta.url` undefined, crashing every invocation
with `TypeError: path must be a string`.

Define an identifier substitution + banner that materializes a valid
file:// URL from `__filename` at runtime. Discovered while smoke-testing
Wave C end-to-end on backgroundagent-dev.

Refs: aws/aws-durable-execution-sdk-js#543

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): switch OAuth callback to plain HTTP localhost

Per RFC 8252 §7.3, OAuth providers (including Linear) treat
http://localhost as a special case that doesn't need TLS — the
connection never leaves the host. The previous self-signed-cert HTTPS
approach forced testers through a "connection not private" warning that
scared them off mid-setup.

Drops the openssl shell-out + temp-cert plumbing (~60 lines) along with
the user-facing warning copy in `bgagent linear setup`. Updates the
callback constants to http://localhost:8080/oauth/callback and the test
suite to plain http.GET.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(agent): ruff lint + format for OAuth refresh path

Five lint errors surfaced when CI ran `ruff check --fix` against the
Wave C agent changes:

- F401 unused `timezone` import in `config.py` (replaced with
  `timedelta`, which is what's actually needed)
- RUF034 useless if-else in the `expires_at` ternary — both branches
  returned identical strings before the recompute below; flatten into
  a single straightforward `if expires_in: ... else: ...` block
- E501 three line-length violations in `config.py` and
  `test_config.py` — break the long expressions onto helper-named
  intermediates

Confirmed locally: `ruff check .` clean, `ruff format --check` clean,
`pytest tests/test_config.py` 15/15 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(linear-feedback): rewrite tests against OAuth context signature

Wave C migrated postIssueComment / addIssueReaction / reportIssueFailure
from a (secretArn: string, ...) signature to a (ctx: LinearFeedbackContext,
...) signature, but the test file still passed bare strings — TypeScript
caught it at compile time only when CI ran a full build. Three test
suites failed to compile (the typecheck error blocked the whole suite,
not just this file).

- Mock `resolveLinearOauthToken` (the new resolver) instead of
  `getLinearSecret` (the old PAK fetcher).
- Build a `LinearFeedbackContext` fixture with linearWorkspaceId +
  registryTableName, pass it everywhere SECRET_ARN was used.
- Update the Authorization-header assertion to match the new
  `Bearer <token>` form (PAK was bare-token; OAuth is Bearer-prefixed).

All 41 tests across linear-feedback, linear-webhook-processor, and
orchestrate-task-feedback pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cdk): align yarn.lock with upstream main + bump table count for OAuth registry

Two CI failures came together because they share a root cause: this
branch's yarn.lock had drifted from upstream main during interim
re-resolves, leaving an inconsistent dep tree that broke ts-jest's
module resolution for @aws-cdk/mixins-preview/aws-bedrockagentcore.
Restoring upstream main's yarn.lock fixes the resolution; the
agent.test.ts table-count assertion then needs to bump from 12 to
13 to account for the LinearWorkspaceRegistryTable added in
Phase 2.0b Wave A4.

Verified locally: agent.test.ts (44/44) and github-tags.test.ts
(5/5) both pass after the changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style: apply eslint --fix from CI's self-mutation guard

CI runs `mise run build`, which invokes `eslint --fix` and then fails if
the working tree changed (self-mutation guard). Three cosmetic lints
needed applying:

- Import-order: DynamoDBClient and CliError moved earlier in their
  files to satisfy alphabetic-by-package ordering
- formatJson import added in alphabetic position in linear.ts
- Three template literals with no interpolation converted to
  single-quoted strings in oauth-callback-server.ts and linear.ts
  (eslint quotes rule prefers single-quotes when no template
  variables are used)

Pure mechanical fixes; no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(oauth): refresh-token race recovery + log gaps from review

Addresses the blocker + critical items from PR review:

- Refresh-token race (review blocker). Linear rotates refresh_tokens
  on every use; concurrent Lambdas/agents racing the same secret will
  all read the same expiring token and one's refresh will succeed
  while the others get `invalid_grant`. On `invalid_grant`, re-read
  the secret from Secrets Manager (bypassing cache). If the
  refresh_token has changed, another caller already rotated; use the
  freshly-read token (or retry refresh once if it's also expiring).
  If unchanged, the refresh_token is permanently rejected and the
  workspace needs re-onboarding. Implemented in both the TS resolver
  (linear-oauth-resolver.ts) and Python resolver (config.py).

- Unguarded bedrock_agentcore import in agent/src/server.py
  (review critical). The bare `from bedrock_agentcore.runtime.context
  import BedrockAgentCoreContext` inside `_run_task_background` killed
  the entire pipeline thread with no diagnostic if the SDK was
  missing or its module structure changed. Wrap in
  try/except (ImportError, AttributeError) and log via _warn_cw —
  the Linear token resolver has its own SM fallback, so the agent
  can proceed without the workload-token bridge.

- Cache invalidation on fetch-level refresh failure (review high).
  The TS resolver's `invalidateLinearOauthCache()` only ran in the
  `!resp.ok` branch; if `fetch()` itself threw (timeout, DNS), the
  catch returned null without invalidating, leaving the stale
  expiring token cached for 60s and hammering Linear's token
  endpoint. Move invalidate into the fetch-level catch too.

- Malformed expires_at log (review medium). The Python `_is_expiring`
  caught `ValueError` and silently returned True, masking
  consistently-bad writes. Add a WARN log so operators see the bad
  data instead of just an unexplained refresh on every task.

- Positive-path refresh log (review non-blocking aws-samples#5). Added
  INFO-level breadcrumb on successful refresh in both resolvers
  so operators diagnosing intermittent 401s have a trace of which
  workspace refreshed and to what expiry.

11/11 existing resolver unit tests still pass; will add tests for
the new race-recovery branch in a followup commit.

* fix(2.0b-O2): review-2 batch — error specificity, half-creates, runbook

Addresses four PR review items focused on operator UX when things go
sideways:

- isWebhookSecretConfigured (review high). The bare
  `catch { return false }` swallowed AccessDeniedException and
  DecryptionFailureException, making setup re-prompt for a webhook
  secret when the real problem was IAM. Now: only
  ResourceNotFoundException returns false; everything else throws a
  CliError pointing the operator at the IAM gap. Test updated to
  assert both paths.

- admin invite-user half-create (review medium). If
  AdminCreateUser succeeds but AdminSetUserPassword fails (stricter
  password policy than generator, partial IAM grant on the Set verb),
  the user was left in FORCE_CHANGE_PASSWORD with no diagnostic. Wrap
  the second call in try/catch and throw a CliError that names the
  user, explains the broken state, and gives both a delete-user CLI
  and a manual-fix path.

- PAK migration runbook (review non-blocking #1). Expanded the
  "Migration from 2.0a (PAK) to 2.0b (OAuth)" section in
  LINEAR_SETUP_GUIDE.md with: a pre-deploy checklist, what survives
  the migration vs what doesn't, an explicit rollback note (fix
  forward; the original PAK secret is gone with the CFN resource),
  and the per-step difference between 2.0a-with-Identity (skipped) vs
  2.0a-with-PAK (migrate) deploys.

- Vestigial AgentCore Identity dep (review non-blocking #2).
  bedrock-agentcore==1.9.1 is kept in agent/pyproject.toml because
  the workload-token bridge in server.py still calls it (now wrapped
  in try/except per review batch 1). Add an inline comment
  explaining why it's pinned even though Phase 2.0b-O2 reads
  Secrets Manager directly — it's the seam for resuming the
  AgentCore Identity path in 2.0c.

CLI tests: 13/13 pass.

* fix(2.0b-O2): batch 3 — schema validation, CallbackResult union, parity contract

Three remaining substantive review items from PR aws-samples#160:

- Validate all 11 fields in getOauthSecret (review non-blocking aws-samples#7).
  Was checking only access_token / refresh_token / expires_at; missing
  client_id or client_secret only surfaced 24h later when the refresh
  call needed them and found undefined. Extracted the required-field
  list into a const next to the StoredOauthToken interface and check
  the full set at deserialization. Bad secrets fail fast at fetch
  time with a structured log line naming the missing fields.

- CallbackResult discriminated union (review non-blocking aws-samples#6). Was
  `{ sessionId: string|null, code: string|null, state: string|null }`
  which let callers construct unreachable shapes. Split into
  `{ kind: 'agentcore', sessionId } | { kind: 'direct-oauth', code, state }`.
  Updated the resolver site (`oauth-callback-server.ts`), the
  consumer (`bgagent linear setup`), and the test file to use
  exhaustive type-narrowing. The setup wizard now errors clearly if
  it gets the agentcore shape (parked path) instead of silently
  passing nulls down.

- Cross-language schema-parity contract test (review non-blocking #3).
  CLI's StoredLinearOauthToken and Lambda's StoredOauthToken define
  the same JSON-in-Secrets-Manager schema independently; drift
  between the two would be a silent bug (CLI writes one field name,
  Lambda reads another, refresh works, every Lambda invocation logs
  a missing-field error). New test in
  `cdk/test/contracts/stored-oauth-token-parity.test.ts` regex-parses
  both interface definitions out of source and asserts the field set
  is equal. Also asserts the new
  `STORED_OAUTH_TOKEN_REQUIRED_FIELDS` const matches the interface,
  so future field additions can't drift between the validator and
  the type.

CLI tests 286/286 pass. CDK resolver + contract 13/13 pass.

* style: apply eslint --fix indentation on CallbackResult union

* fix(2.0b-O2): review-3 batch — defensive error handling, security hardening, refresh test coverage

Bugs (B1-B3):
- linear-oauth-resolver: try/catch around ddb.send() in getRegistryRow so
  transient DDB errors fail the resolver cleanly instead of crashing the
  Lambda thread
- orchestrate-task: try/catch around reportIssueFailure in
  notifyLinearOnConcurrencyCap; Linear feedback failures must never block
  the rejection path
- Python _fetch_token: guard json.loads + KeyError so a corrupted SM
  payload returns None (logged ERROR) rather than raising

Tests (T1-T3):
- Python: TestResolveLinearApiTokenRefreshPaths covering happy refresh,
  invalid_grant + concurrent rotation, invalid_grant + no rotation,
  malformed expires_at, network failure during refresh, and corrupted
  secret JSON
- TS: concurrent-refresh recovery via re-read; permanent-rejection on
  same refresh_token; cache invalidation after network failure

Security (S1-S3):
- agent runtime IAM: drop secretsmanager:PutSecretValue on the Linear
  OAuth secret prefix. Untrusted repo code in the agent must not be
  able to overwrite tokens; Lambdas (trusted) handle persistence. The
  refreshed in-memory token still works for the current task; rotated
  refresh_token is lost on agent exit but Linear's grace window
  absorbs the rare race where the agent refreshes strictly before any
  Lambda
- Python _try_refresh_once: narrow except Exception to
  (urllib.error.URLError, OSError) — programmer errors propagate with
  clean stack traces instead of being swallowed
- linear-oauth-resolver: RegistryRowStatus is now a discriminated
  'active' | 'revoked' literal; missing or unknown values fail closed
  to revoked rather than defaulting active

* fix(2.0b-O2): use email.message.Message for HTTPError hdrs in tests

ty's stricter typeshed flagged dict[Unknown, Unknown] passed where
Message[str, str] is expected. Drop-in replacement that satisfies
both ty and runtime.

* fix(2.0b-O2): update feedback test for B2 — helper now swallows internally

Round-3 review B2 moved the try/catch into notifyLinearOnConcurrencyCap.
The pre-existing test asserted the old contract (rejection propagates,
caller must catch). Flip the contract assertion to reflect the new
behavior: the helper is now best-effort end-to-end and returns undefined
on internal failure.

---------

Co-authored-by: bgagent <bgagent@noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alain Krok <alkrok@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants