Skip to content

fix(prWorkItems): preserve workItemId across racing pipeline writers#1130

Merged
zbigniewsobiecki merged 1 commit intodevfrom
fix/auto-merge-link-preservation
Apr 16, 2026
Merged

fix(prWorkItems): preserve workItemId across racing pipeline writers#1130
zbigniewsobiecki merged 1 commit intodevfrom
fix/auto-merge-link-preservation

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Summary

PRs flagged with auto on Linear weren't auto-merging. Trace: every recent llmist PR has pr_work_items.work_item_id IS NULL, so pr-ready-to-merge bails with "No work item linked to PR" even though the implementation correctly captured MNG-93 at run time.

Root cause is a three-way race against linkPRToWorkItem between the implementation pipeline and the review pipeline that's queued as soon as the PR opens. The review pipeline captures workItemId synchronously at webhook arrival time (DB-only resolveWorkItemId), gets null because the implementation hasn't yet linked the PR, then writes that stale null back to pr_work_items post-execution — clobbering the correct work_item_id the implementation wrote in between.

Fix — three intersecting changes

  1. linkPRToWorkItem Step 2 ON CONFLICT: work_item_id is now COALESCE(EXCLUDED.work_item_id, pr_work_items.work_item_id) — one-way set. A later writer with null cannot erase a known link.
  2. linkPRToWorkItem Step 1 prologue: deletes any racing orphan (projectId, NULL workItemId, prNumber) row before promoting the work-item-only row, so the partial-unique-index uq_pr_work_items_project_pr can't fire (the implementation otherwise hits this and silently logs a warn).
  3. runAgentExecutionPipeline: re-resolves workItemId via the existing resolveWorkItemId helper at run time and patches agentInput so the corrected value flows into runAgent (and into agent_runs.work_item_id), not just into the post-execution link.

Tests

  • tests/integration/db/prWorkItemsRepository.test.ts (+2): preserves an existing non-null workItemId when called with workItemId=null, removes a stale orphan ... row when promoting a work-item-only row. Both fail before the fix.
  • tests/unit/triggers/shared/agent-execution.test.ts (+3): re-resolves workItemId from DB when result.workItemId is undefined, preserves trigger-supplied workItemId when DB lookup is unnecessary, leaves workItemId undefined when neither trigger nor DB has one.
  • tests/unit/db/repositories/prWorkItemsRepository.test.ts (+2): deletes any racing orphan (NULL workItemId, prNumber) row before Step 1 UPDATE, does NOT call delete when workItemId is null. Updated existing onConflictDoUpdate assertion since set.workItemId is now an SQL expression (COALESCE).

Counts: unit 7814/7814 pass, integration 524/524 pass, lint clean, typecheck clean.

Drive-by cleanups

To leave lint at zero warnings:

  • Drop unused renderManifestStep import in web/src/components/projects/pm-wizard.tsx (left over from PR feat(006/5): cleanup legacy — spec 006 complete #1129's manifest migration).
  • Rename a pm-conformance describe string '/${id}/webhook convention''/<id>/webhook convention' so biome's noTemplateCurlyInString stops flagging it.

Out of scope

  • Backfilling the 15 broken historical llmist rows — separate manual SQL after this lands.
  • Defensive PR-body-parsing fallback in resolveWorkItemId (Linear regex) — explicitly excluded; the real fixes above are sufficient.
  • Refactoring how PROpenedTrigger captures workItemId — Part C handles the consequence at the right layer; rewriting capture would balloon scope.

Test plan

  • npm test (7814/7814 pass)
  • npm run test:integration (524/524 pass, ~5.5 min)
  • npm run typecheck clean
  • npm run lint clean (was 2 pre-existing warnings; now 0)
  • After merge: kick a Linear-triggered llmist PR, watch pr_work_items.work_item_id stay set through implementation + review, confirm pr-ready-to-merge fires when the reviewer approves.

🤖 Generated with Claude Code

When the implementation pipeline (PM-triggered) opens a PR, the GitHub
pr-opened webhook fires immediately and queues a review pipeline. The
review pipeline captures workItemId synchronously from a DB lookup that
returns null (the implementation hasn't yet linked the PR), then later
writes that stale null back to pr_work_items, clobbering the correct
work_item_id the implementation wrote in between.

Effect: lookupWorkItemForPR returns null for the PR, the
pr-ready-to-merge trigger bails with "No work item linked to PR", and
PRs flagged with `auto` on Linear never auto-merge. Affected every
recent llmist PR (15 in a row); Trello/JIRA-backed projects appear
unaffected so far but share the racing pipeline code path.

Three intersecting fixes, all narrow:

- linkPRToWorkItem Step 2 ON CONFLICT now uses
  COALESCE(EXCLUDED.work_item_id, pr_work_items.work_item_id) so
  work_item_id is one-way set — a later writer with null can't erase a
  known link.
- linkPRToWorkItem Step 1 deletes any racing orphan
  (projectId, NULL workItemId, prNumber) row before promoting the
  work-item-only row, preventing the partial-unique-index violation
  the implementation otherwise hits silently.
- runAgentExecutionPipeline re-resolves workItemId via the existing
  resolveWorkItemId helper at run time and patches agentInput so the
  corrected value flows into runAgent (and into agent_runs.work_item_id),
  not just into the post-execution link.

Tests: +2 integration (preservation, orphan cleanup), +5 unit
(re-resolution paths + delete/no-delete coverage). All 7814 unit and
524 integration tests pass.

Drive-by cleanups so lint+typecheck are clean: drop unused
`renderManifestStep` import in pm-wizard.tsx, rename a pm-conformance
describe string that triggered noTemplateCurlyInString.

Out of scope: backfilling the 15 broken historical llmist rows
(separate manual SQL), no PR-body-parsing fallback in
resolveWorkItemId, no rewrite of how PROpenedTrigger captures
workItemId.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@zbigniewsobiecki zbigniewsobiecki merged commit 906f8fc into dev Apr 16, 2026
8 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the fix/auto-merge-link-preservation branch April 16, 2026 15:52
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

zbigniewsobiecki added a commit that referenced this pull request Apr 16, 2026
…ine lists

After spec 006/5 ("delete legacy bootstrap; pmRegistry becomes a delegate")
landed, two paths that the worker container relies on broke for Linear:

1. `cascade-tools` CLI ran with an empty PM registry. Spec 006/5 removed
   the only path that previously self-bootstrapped on registry access.
   Router (src/router/index.ts:8) and worker (src/worker-entry.ts:19) were
   updated to import the integrations barrel explicitly, but the CLI's
   lazy oclif command loader was missed. Result: every `cascade-tools pm
   <cmd>` call from inside an agent run threw `Unknown PM integration
   type: 'linear'. Registered:` (empty), failing every backlog-manager
   chain that tries to enumerate work items.

2. `buildPipelineLists` in src/agents/definitions/contextSteps.ts only
   read Trello (`lists`) and JIRA (`statuses`) configs. Linear projects
   exposed `statuses` in their config (identical shape to JIRA's), but
   the function never called `getLinearConfig`. Result: every Linear
   project's `fetchPipelineSnapshotStep` logged `No pipeline lists
   configured, skipping` and returned `[]`, so the backlog-manager
   prompt never received the pipeline snapshot it expected.

Combined effect: after PR #1130 landed (auto-merge link preservation), the
chained backlog-manager fired correctly on llmist (Linear-backed) but
bailed without picking up a next card.

Fix 1 — `src/cli/bootstrap.ts` (NEW): three side-effect imports that
register every provider manifest. Loaded from `bin/cascade-tools.js` (the
CLI entry script) before `Config.load`, mirroring the router/worker
bootstrap pattern.

Routing through `bin/cascade-tools.js` rather than `src/cli/base.ts`
intentionally — `cli/base.ts` is transitively imported by gadgets that
some integration tests pull in, and prepending the bootstrap there made
`tests/integration/trigger-registry.test.ts` fail with a circular-import
symptom (`new ReadyToProcessLabelTrigger()` from `trello/manifest.ts`
threw "is not a constructor"). Loading from the binary keeps the
side-effect off any test path.

Fix 2 — extend `buildPipelineLists` to also call `getLinearConfig` and
chain `?? linearConfig?.statuses?.X` onto each `addList` call. JIRA and
Linear share the `Record<string, string>` `statuses` shape, so the
existing nullish-coalescing pattern extends naturally. No behavior
change for Trello/JIRA.

Tests: tests/unit/cli/bootstrap.test.ts (NEW) asserts
`listPMProviders()` returns `['linear', 'jira', 'trello']` after
importing the bootstrap module. Extended pipelineSnapshot.test.ts with a
Linear fixture asserting all 6 list IDs are passed to
`provider.listWorkItems`. All 7816 unit + 524 integration tests pass.

Out of scope: re-running the failed MNG-94 backlog-manager (it reported
success — just found nothing actionable). Worker container image will
pick up the fix on the next deploy via dist/cli/bootstrap.js (verified
present after `npm run build`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
zbigniewsobiecki added a commit that referenced this pull request Apr 16, 2026
…ine lists (#1131)

After spec 006/5 ("delete legacy bootstrap; pmRegistry becomes a delegate")
landed, two paths that the worker container relies on broke for Linear:

1. `cascade-tools` CLI ran with an empty PM registry. Spec 006/5 removed
   the only path that previously self-bootstrapped on registry access.
   Router (src/router/index.ts:8) and worker (src/worker-entry.ts:19) were
   updated to import the integrations barrel explicitly, but the CLI's
   lazy oclif command loader was missed. Result: every `cascade-tools pm
   <cmd>` call from inside an agent run threw `Unknown PM integration
   type: 'linear'. Registered:` (empty), failing every backlog-manager
   chain that tries to enumerate work items.

2. `buildPipelineLists` in src/agents/definitions/contextSteps.ts only
   read Trello (`lists`) and JIRA (`statuses`) configs. Linear projects
   exposed `statuses` in their config (identical shape to JIRA's), but
   the function never called `getLinearConfig`. Result: every Linear
   project's `fetchPipelineSnapshotStep` logged `No pipeline lists
   configured, skipping` and returned `[]`, so the backlog-manager
   prompt never received the pipeline snapshot it expected.

Combined effect: after PR #1130 landed (auto-merge link preservation), the
chained backlog-manager fired correctly on llmist (Linear-backed) but
bailed without picking up a next card.

Fix 1 — `src/cli/bootstrap.ts` (NEW): three side-effect imports that
register every provider manifest. Loaded from `bin/cascade-tools.js` (the
CLI entry script) before `Config.load`, mirroring the router/worker
bootstrap pattern.

Routing through `bin/cascade-tools.js` rather than `src/cli/base.ts`
intentionally — `cli/base.ts` is transitively imported by gadgets that
some integration tests pull in, and prepending the bootstrap there made
`tests/integration/trigger-registry.test.ts` fail with a circular-import
symptom (`new ReadyToProcessLabelTrigger()` from `trello/manifest.ts`
threw "is not a constructor"). Loading from the binary keeps the
side-effect off any test path.

Fix 2 — extend `buildPipelineLists` to also call `getLinearConfig` and
chain `?? linearConfig?.statuses?.X` onto each `addList` call. JIRA and
Linear share the `Record<string, string>` `statuses` shape, so the
existing nullish-coalescing pattern extends naturally. No behavior
change for Trello/JIRA.

Tests: tests/unit/cli/bootstrap.test.ts (NEW) asserts
`listPMProviders()` returns `['linear', 'jira', 'trello']` after
importing the bootstrap module. Extended pipelineSnapshot.test.ts with a
Linear fixture asserting all 6 list IDs are passed to
`provider.listWorkItems`. All 7816 unit + 524 integration tests pass.

Out of scope: re-running the failed MNG-94 backlog-manager (it reported
success — just found nothing actionable). Worker container image will
pick up the fix on the next deploy via dist/cli/bootstrap.js (verified
present after `npm run build`).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant