fix(prWorkItems): preserve workItemId across racing pipeline writers #1130
Merged
zbigniewsobiecki merged 1 commit into dev, Apr 16, 2026
When the implementation pipeline (PM-triggered) opens a PR, the GitHub pr-opened webhook fires immediately and queues a review pipeline. The review pipeline captures workItemId synchronously from a DB lookup that returns null (the implementation hasn't yet linked the PR), then later writes that stale null back to pr_work_items, clobbering the correct work_item_id the implementation wrote in between.

Effect: lookupWorkItemForPR returns null for the PR, the pr-ready-to-merge trigger bails with "No work item linked to PR", and PRs flagged with `auto` on Linear never auto-merge. Affected every recent llmist PR (15 in a row); Trello/JIRA-backed projects appear unaffected so far but share the racing pipeline code path.

Three intersecting fixes, all narrow:
- linkPRToWorkItem Step 2 ON CONFLICT now uses COALESCE(EXCLUDED.work_item_id, pr_work_items.work_item_id) so work_item_id is one-way set: a later writer with null can't erase a known link.
- linkPRToWorkItem Step 1 deletes any racing orphan (projectId, NULL workItemId, prNumber) row before promoting the work-item-only row, preventing the partial-unique-index violation the implementation otherwise hits silently.
- runAgentExecutionPipeline re-resolves workItemId via the existing resolveWorkItemId helper at run time and patches agentInput so the corrected value flows into runAgent (and into agent_runs.work_item_id), not just into the post-execution link.

Tests: +2 integration (preservation, orphan cleanup), +5 unit (re-resolution paths + delete/no-delete coverage). All 7814 unit and 524 integration tests pass.

Drive-by cleanups so lint+typecheck are clean: drop unused `renderManifestStep` import in pm-wizard.tsx, rename a pm-conformance describe string that triggered noTemplateCurlyInString.

Out of scope: backfilling the 15 broken historical llmist rows (separate manual SQL), no PR-body-parsing fallback in resolveWorkItemId, no rewrite of how PROpenedTrigger captures workItemId.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
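The two `linkPRToWorkItem` changes described in the commit message can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the real code runs SQL against `pr_work_items`, and the `Row` shape, the `PrWorkItems` class, the `coalesce` helper, and the project name `demo` are hypothetical names introduced here for illustration. It also assumes the orphan delete runs whenever `workItemId` is non-null, which is how the unit-test names in this PR read.

```typescript
// Hypothetical in-memory stand-in for the pr_work_items table.
type Row = { projectId: string; prNumber: number | null; workItemId: string | null };

// Mirrors SQL COALESCE(a, b): return the first non-null argument.
const coalesce = <T>(a: T | null, b: T | null): T | null => (a !== null ? a : b);

class PrWorkItems {
  rows: Row[] = [];

  linkPRToWorkItem(projectId: string, prNumber: number, workItemId: string | null): void {
    if (workItemId !== null) {
      // Step 1 prologue: drop any racing orphan (projectId, NULL workItemId, prNumber)
      // so that promoting the work-item-only row cannot collide on (projectId, prNumber).
      this.rows = this.rows.filter(
        (r) => !(r.projectId === projectId && r.prNumber === prNumber && r.workItemId === null),
      );
      // Step 1: promote an existing work-item-only row (no PR linked yet).
      const workItemOnly = this.rows.find(
        (r) => r.projectId === projectId && r.workItemId === workItemId && r.prNumber === null,
      );
      if (workItemOnly) {
        workItemOnly.prNumber = prNumber;
        return;
      }
    }
    // Step 2: upsert keyed on (projectId, prNumber).
    const existing = this.rows.find((r) => r.projectId === projectId && r.prNumber === prNumber);
    if (existing) {
      // One-way set: an incoming null keeps the stored value.
      existing.workItemId = coalesce(workItemId, existing.workItemId);
    } else {
      this.rows.push({ projectId, prNumber, workItemId });
    }
  }
}

// The race from the commit message: the implementation links first,
// then the review pipeline writes its stale null.
const table = new PrWorkItems();
table.linkPRToWorkItem("demo", 42, "MNG-93");
table.linkPRToWorkItem("demo", 42, null);
console.log(table.rows[0].workItemId); // "MNG-93"
```

The reverse ordering (review pipeline writes the null row first, implementation links afterwards) is the case the orphan delete covers: without it, promoting the work-item-only row would produce a second row for the same `(projectId, prNumber)` pair.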
Codecov Report: ✅ All modified and coverable lines are covered by tests.
zbigniewsobiecki added a commit that referenced this pull request Apr 16, 2026
…ine lists
After spec 006/5 ("delete legacy bootstrap; pmRegistry becomes a delegate")
landed, two paths that the worker container relies on broke for Linear:
1. `cascade-tools` CLI ran with an empty PM registry. Spec 006/5 removed
the only path that previously self-bootstrapped on registry access.
Router (src/router/index.ts:8) and worker (src/worker-entry.ts:19) were
updated to import the integrations barrel explicitly, but the CLI's
lazy oclif command loader was missed. Result: every `cascade-tools pm
<cmd>` call from inside an agent run threw `Unknown PM integration
type: 'linear'. Registered:` (empty), failing every backlog-manager
chain that tries to enumerate work items.
2. `buildPipelineLists` in src/agents/definitions/contextSteps.ts only
read Trello (`lists`) and JIRA (`statuses`) configs. Linear projects
exposed `statuses` in their config (identical shape to JIRA's), but
the function never called `getLinearConfig`. Result: every Linear
project's `fetchPipelineSnapshotStep` logged `No pipeline lists
configured, skipping` and returned `[]`, so the backlog-manager
prompt never received the pipeline snapshot it expected.
Combined effect: after PR #1130 landed (auto-merge link preservation), the
chained backlog-manager fired correctly on llmist (Linear-backed) but
bailed without picking up a next card.
Fix 1 — `src/cli/bootstrap.ts` (NEW): three side-effect imports that
register every provider manifest. Loaded from `bin/cascade-tools.js` (the
CLI entry script) before `Config.load`, mirroring the router/worker
bootstrap pattern.
Routing through `bin/cascade-tools.js` rather than `src/cli/base.ts`
intentionally — `cli/base.ts` is transitively imported by gadgets that
some integration tests pull in, and prepending the bootstrap there made
`tests/integration/trigger-registry.test.ts` fail with a circular-import
symptom (`new ReadyToProcessLabelTrigger()` from `trello/manifest.ts`
threw "is not a constructor"). Loading from the binary keeps the
side-effect off any test path.
Fix 2 — extend `buildPipelineLists` to also call `getLinearConfig` and
chain `?? linearConfig?.statuses?.X` onto each `addList` call. JIRA and
Linear share the `Record<string, string>` `statuses` shape, so the
existing nullish-coalescing pattern extends naturally. No behavior
change for Trello/JIRA.
Tests: tests/unit/cli/bootstrap.test.ts (NEW) asserts
`listPMProviders()` returns `['linear', 'jira', 'trello']` after
importing the bootstrap module. Extended pipelineSnapshot.test.ts with a
Linear fixture asserting all 6 list IDs are passed to
`provider.listWorkItems`. All 7816 unit + 524 integration tests pass.
Out of scope: re-running the failed MNG-94 backlog-manager (it reported
success — just found nothing actionable). Worker container image will
pick up the fix on the next deploy via dist/cli/bootstrap.js (verified
present after `npm run build`).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
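Fix 2 in the commit message above extends a nullish-coalescing chain so Linear's `statuses` are consulted after Trello's `lists` and JIRA's `statuses`. A minimal sketch of that pattern follows. The config types, the `STAGES` keys, and the loop are assumptions made for illustration: the commit message describes one `addList` call per stage, and the real stage names are not given in this thread.

```typescript
// Hypothetical config shapes; JIRA and Linear share Record<string, string> statuses.
type TrelloConfig = { lists?: Record<string, string> };
type StatusConfig = { statuses?: Record<string, string> };

interface PipelineList {
  name: string;
  id: string;
}

// Hypothetical stage keys, for illustration only.
const STAGES = ["backlog", "todo", "inProgress"] as const;

function buildPipelineLists(
  trello?: TrelloConfig,
  jira?: StatusConfig,
  linear?: StatusConfig,
): PipelineList[] {
  const out: PipelineList[] = [];
  // Skip stages that no provider has configured.
  const addList = (name: string, id: string | undefined): void => {
    if (id) out.push({ name, id });
  };
  for (const key of STAGES) {
    // First configured provider wins: Trello, then JIRA, then Linear.
    addList(key, trello?.lists?.[key] ?? jira?.statuses?.[key] ?? linear?.statuses?.[key]);
  }
  return out;
}

// A Linear-only project now yields lists instead of an empty array.
console.log(buildPipelineLists(undefined, undefined, { statuses: { backlog: "L1", todo: "L2" } }));
```

Because the Linear lookup is appended at the end of the chain, projects that already resolve an ID from Trello or JIRA see no change, which matches the "no behavior change for Trello/JIRA" note.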
zbigniewsobiecki added a commit that referenced this pull request Apr 16, 2026
…ine lists (#1131)
Summary
PRs flagged with `auto` on Linear weren't auto-merging. Trace: every recent llmist PR has `pr_work_items.work_item_id IS NULL`, so `pr-ready-to-merge` bails with "No work item linked to PR" even though the implementation correctly captured `MNG-93` at run time.

Root cause is a three-way race against `linkPRToWorkItem` between the implementation pipeline and the review pipeline that's queued as soon as the PR opens. The review pipeline captures `workItemId` synchronously at webhook arrival time (DB-only `resolveWorkItemId`), gets `null` because the implementation hasn't yet linked the PR, then writes that stale `null` back to `pr_work_items` post-execution, clobbering the correct `work_item_id` the implementation wrote in between.

Fix: three intersecting changes
- `linkPRToWorkItem` Step 2 ON CONFLICT: `work_item_id` is now `COALESCE(EXCLUDED.work_item_id, pr_work_items.work_item_id)`, a one-way set. A later writer with `null` cannot erase a known link.
- `linkPRToWorkItem` Step 1 prologue: deletes any racing orphan `(projectId, NULL workItemId, prNumber)` row before promoting the work-item-only row, so the partial-unique-index `uq_pr_work_items_project_pr` can't fire (the implementation otherwise hits this and silently logs a `warn`).
- `runAgentExecutionPipeline`: re-resolves `workItemId` via the existing `resolveWorkItemId` helper at run time and patches `agentInput` so the corrected value flows into `runAgent` (and into `agent_runs.work_item_id`), not just into the post-execution link.

Tests
- `tests/integration/db/prWorkItemsRepository.test.ts` (+2): `preserves an existing non-null workItemId when called with workItemId=null`, `removes a stale orphan ... row when promoting a work-item-only row`. Both fail before the fix.
- `tests/unit/triggers/shared/agent-execution.test.ts` (+3): `re-resolves workItemId from DB when result.workItemId is undefined`, `preserves trigger-supplied workItemId when DB lookup is unnecessary`, `leaves workItemId undefined when neither trigger nor DB has one`.
- `tests/unit/db/repositories/prWorkItemsRepository.test.ts` (+2): `deletes any racing orphan (NULL workItemId, prNumber) row before Step 1 UPDATE`, `does NOT call delete when workItemId is null`. Updated the existing `onConflictDoUpdate` assertion since `set.workItemId` is now an SQL expression (`COALESCE`).

Counts: unit 7814/7814 pass, integration 524/524 pass, lint clean, typecheck clean.

Drive-by cleanups
To leave lint at zero warnings:
- Dropped the unused `renderManifestStep` import in `web/src/components/projects/pm-wizard.tsx` (left over from the manifest migration in PR #1129, "feat(006/5): cleanup legacy — spec 006 complete").
- Renamed the `pm-conformance` describe string `'/${id}/webhook convention'` → `'/<id>/webhook convention'` so biome's `noTemplateCurlyInString` stops flagging it.

Out of scope
- PR-body-parsing fallback in `resolveWorkItemId` (Linear regex): explicitly excluded; the real fixes above are sufficient.
- Rewriting how `PROpenedTrigger` captures `workItemId`: Part C (the run-time re-resolution in `runAgentExecutionPipeline`) handles the consequence at the right layer; rewriting capture would balloon scope.

Test plan
- `npm test` (7814/7814 pass)
- `npm run test:integration` (524/524 pass, ~5.5 min)
- `npm run typecheck` clean
- `npm run lint` clean (was 2 pre-existing warnings; now 0)
- … `pr_work_items.work_item_id` stay set through implementation + review, confirm `pr-ready-to-merge` fires when the reviewer approves.

🤖 Generated with Claude Code
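
The run-time re-resolution in `runAgentExecutionPipeline` can be sketched as below, following the three unit-test cases named in this description. Everything here is an assumption for illustration: `AgentInput`, `Resolver`, and `patchWorkItemId` are hypothetical names, the resolver signature is guessed, and the lookup is synchronous only to keep the sketch short (the real helper reads from the DB).

```typescript
// Hypothetical shapes; the real agentInput carries more fields.
type AgentInput = { prNumber?: number; workItemId?: string };
type Resolver = (projectId: string, prNumber: number) => string | null;

function patchWorkItemId(
  projectId: string,
  agentInput: AgentInput,
  triggerWorkItemId: string | undefined,
  resolveWorkItemId: Resolver,
): AgentInput {
  // The trigger already captured a value: keep it and skip the lookup.
  if (triggerWorkItemId) return { ...agentInput, workItemId: triggerWorkItemId };
  // Nothing to look up without a PR number.
  if (agentInput.prNumber === undefined) return agentInput;
  // Re-resolve at run time: by now the implementation may have linked the PR.
  const resolved = resolveWorkItemId(projectId, agentInput.prNumber);
  return resolved ? { ...agentInput, workItemId: resolved } : agentInput;
}

// The webhook-time capture was empty, but the DB has the link by run time.
const lateLinked: Resolver = () => "MNG-93";
console.log(patchWorkItemId("demo", { prNumber: 42 }, undefined, lateLinked).workItemId); // "MNG-93"
```

Patching the input before the agent runs, not only the post-execution link, is what lets the corrected value reach `agent_runs.work_item_id` as well.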