fix(router): coalesce PM create→update webhooks to stop JIRA double-firing agents #1179

Merged
zbigniewsobiecki merged 1 commit into dev from fix/jira-create-coalesce
Apr 24, 2026

@zbigniewsobiecki
Member

Summary

  • Bug: Creating a JIRA issue directly in a non-default workflow column fires two agents (e.g. implementation and planning) on the same work item.
  • Cause: JIRA emits two webhooks a few hundred ms apart (272 ms in the capture below): issue_created at the workflow's initial status, then issue_updated transitioning to the target column. Each resolves a different STATUS_TO_AGENT entry and enqueues its own job.
  • Fix: Router-level coalesce window keyed by `${projectId}:${workItemId}`. A `pm:status-changed` create trigger waits `PM_CREATE_COALESCE_WINDOW_MS` (default 2000, 0 disables) before enqueue; an update for the same key within the window supersedes the create — no ack comment, no job queued, `onBlocked` called to clear any dedup markers.
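
A minimal sketch of how such a window could behave, assuming a promise-based API. The names `registerPendingCreate`, `clearPendingCreate`, `getCoalesceWindowMs`, and `PM_CREATE_COALESCE_WINDOW_MS` come from this PR; the function bodies, the `env` parameter, the boolean return value of `clearPendingCreate`, and the handling of malformed values and repeated creates are assumptions made for illustration, not the merged implementation.

```typescript
type CoalesceOutcome = 'proceed' | 'superseded';

interface PendingCreate {
  timer: ReturnType<typeof setTimeout>;
  settle: (outcome: CoalesceOutcome) => void;
}

const DEFAULT_WINDOW_MS = 2000;
const pendingCreates = new Map<string, PendingCreate>();

/** Read the window length; `env` is passed in so the sketch stays self-contained. */
function getCoalesceWindowMs(env: Record<string, string | undefined>): number {
  const raw = env['PM_CREATE_COALESCE_WINDOW_MS'];
  if (raw === undefined || raw.trim() === '') return DEFAULT_WINDOW_MS;
  const parsed = Number(raw);
  // Assumption: malformed or negative values fall back to the default.
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_WINDOW_MS;
}

/** Wait up to ttlMs; resolves 'superseded' if the key is cleared first. */
function registerPendingCreate(key: string, ttlMs: number): Promise<CoalesceOutcome> {
  if (ttlMs <= 0) return Promise.resolve<CoalesceOutcome>('proceed'); // window disabled
  // Assumption: a second create for the same key supersedes the first.
  clearPendingCreate(key);
  return new Promise<CoalesceOutcome>((resolve) => {
    const timer = setTimeout(() => {
      pendingCreates.delete(key);
      resolve('proceed');
    }, ttlMs);
    pendingCreates.set(key, { timer, settle: resolve });
  });
}

/** Called when an update arrives; returns true if a create was pending. */
function clearPendingCreate(key: string): boolean {
  const pending = pendingCreates.get(key);
  if (!pending) return false;
  clearTimeout(pending.timer);
  pendingCreates.delete(key);
  pending.settle('superseded');
  return true;
}
```

Keeping the state in a module-level `Map` only works because the router is a single long-lived process, which is the reason given under Scope for not placing the window in the workers.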

Evidence

Loki capture of UA-11 (ua-store) reproduces the bug exactly:

09:53:46.527  Received JIRA webhook  event: jira:issue_created
09:53:46.799  Received JIRA webhook  event: jira:issue_updated
09:53:46.954  JIRA issue entered agent-triggering status  eventKind:create  toStatus:'To Do'    agentType:implementation
09:53:47.110  JIRA issue entered agent-triggering status  eventKind:move    toStatus:'PLANNING'  agentType:planning
09:53:48.934  jira job queued  jobId:…e8tt64  eventType:jira:issue_created    (implementation)
09:53:48.978  jira job queued  jobId:…tsgkgd  eventType:jira:issue_updated    (planning)
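
To make the timing concrete, the capture can be replayed through a small pure model. This is an illustrative sketch, not code from the PR: `replay`, `ReplayEvent`, and the `ua-store:UA-11` key are hypothetical, and the offsets 0 ms and 272 ms are the two webhook receipt times above, measured from the first event. It assumes the window is counted from webhook arrival.

```typescript
interface ReplayEvent {
  atMs: number; // arrival time relative to the first webhook
  kind: 'create' | 'update';
  key: string; // hypothetical `${projectId}:${workItemId}` value
  agent: string;
}

/** Return the agents that would fire for a given coalesce window. */
function replay(events: ReplayEvent[], windowMs: number): string[] {
  const fired: string[] = [];
  const sorted = [...events].sort((a, b) => a.atMs - b.atMs);
  sorted.forEach((ev, i) => {
    if (ev.kind === 'create' && windowMs > 0) {
      // A create is dropped if an update for the same key lands inside the window.
      const superseded = sorted
        .slice(i + 1)
        .some((later) => later.kind === 'update' && later.key === ev.key && later.atMs - ev.atMs < windowMs);
      if (superseded) return;
    }
    fired.push(ev.agent);
  });
  return fired;
}

const capture: ReplayEvent[] = [
  { atMs: 0, kind: 'create', key: 'ua-store:UA-11', agent: 'implementation' },
  { atMs: 272, kind: 'update', key: 'ua-store:UA-11', agent: 'planning' },
];

console.log(replay(capture, 0)); // [ 'implementation', 'planning' ] (window disabled: both fire)
console.log(replay(capture, 2000)); // [ 'planning' ] (create superseded by the update)
```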

Scope

  • JIRA: bug fixed.
  • Linear: same `isCreate`/`onCreate`/`onMove` code shape, safe today (Linear bundles creation into one webhook) — now belt-and-braces protected.
  • Trello: single webhook on create, unaffected.
  • The coalesce lives in the long-lived router process (`src/router/webhook-processor.ts`). Workers are ephemeral per-job containers — in-memory coalescing there wouldn't work across containers.

Files

  • New: `src/pm/create-coalesce-window.ts` — `registerPendingCreate(key, ttlMs) → 'proceed' | 'superseded'` + `clearPendingCreate(key)` + `getCoalesceWindowMs()`.
  • `src/types/index.ts` — optional `coalesceKey` + `coalesceRole` on `TriggerResult`.
  • `src/triggers/jira/status-changed.ts`, `src/triggers/linear/status-changed.ts` — emit coalesce metadata on every result.
  • `src/router/webhook-processor.ts` — new Step 7b: drain or defer based on coalesce role.
  • `CLAUDE.md` — document `PM_CREATE_COALESCE_WINDOW_MS`.
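
The pieces listed above could fit together roughly as follows. This is a sketch under stated assumptions: `coalesceKey`, `coalesceRole`, `onBlocked`, and the window functions are named in this PR, while `TriggerResultSketch`, `Step7bDeps`, `applyCoalesceStep`, and the injected `enqueue` callback are hypothetical stand-ins for the real router types.

```typescript
type CoalesceRole = 'create' | 'update';

// Hypothetical, trimmed-down view of TriggerResult with the two new optional fields.
interface TriggerResultSketch {
  agentType: string;
  coalesceKey?: string;
  coalesceRole?: CoalesceRole;
}

// Dependencies are injected so the sketch runs without the real router.
interface Step7bDeps {
  windowMs: number;
  registerPendingCreate(key: string, ttlMs: number): Promise<'proceed' | 'superseded'>;
  clearPendingCreate(key: string): void;
  onBlocked(): void;
  enqueue(agentType: string): void;
}

/** Step 7b sketch: an update drains a pending create, a create defers until the window closes. */
async function applyCoalesceStep(
  result: TriggerResultSketch,
  deps: Step7bDeps,
): Promise<'queued' | 'superseded'> {
  const { coalesceKey, coalesceRole } = result;
  if (coalesceKey && coalesceRole === 'update') {
    deps.clearPendingCreate(coalesceKey);
  } else if (coalesceKey && coalesceRole === 'create') {
    const outcome = await deps.registerPendingCreate(coalesceKey, deps.windowMs);
    if (outcome === 'superseded') {
      deps.onBlocked(); // clear dedup markers; no ack comment, no job
      return 'superseded';
    }
  }
  deps.enqueue(result.agentType);
  return 'queued';
}
```

Results without coalesce metadata (for example from other triggers) fall straight through to `enqueue`, so the step is a no-op for them.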

Short-term mitigation (before this ships)

`cascade projects trigger-set --agent implementation --event pm:status-changed --enable --params '{"onCreate":false,"onMove":true}'` — stops implementation firing on the ephemeral initial status; planning still fires on the transition.
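
Why this mitigation works can be sketched with the `onCreate`/`onMove` gate. `shouldFireOnEvent` is named in the review below, but its body here is an assumption (in particular that both flags default to enabled when omitted), and `StatusChangedParams` is a hypothetical type.

```typescript
interface StatusChangedParams {
  onCreate?: boolean;
  onMove?: boolean;
}

/** Decide whether a status-changed trigger fires for a create or a move event. */
function shouldFireOnEvent(isCreate: boolean, params: StatusChangedParams): boolean {
  // Assumption: a flag that is omitted counts as enabled.
  return isCreate ? params.onCreate !== false : params.onMove !== false;
}

// Parameters from the mitigation command above.
const mitigationParams: StatusChangedParams = { onCreate: false, onMove: true };

console.log(shouldFireOnEvent(true, mitigationParams)); // false: issue_created at the initial status is ignored
console.log(shouldFireOnEvent(false, mitigationParams)); // true: a later move into the column still fires
```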

Test plan

  • `npm run lint` clean
  • `npm run typecheck` clean
  • `npm test` — 454 files, 8351 tests pass
  • New coverage:
    • `tests/unit/pm/create-coalesce-window.test.ts` — 7 tests (ttl, supersede, key isolation, double-register, disabled, no-op clear)
    • `tests/unit/triggers/jira-status-changed.test.ts` — coalesce-metadata tests for create + move
    • `tests/unit/triggers/linear-status-changed.test.ts` — symmetric create + move tests
    • `tests/unit/router/webhook-processor.test.ts` — router-level supersede + no-supersede tests with real 50ms window
  • E2E verification in staging after merge: create a JIRA issue in a non-default column, confirm only one agent fires.

🤖 Generated with Claude Code

… double-firing agents

When a user creates a JIRA issue directly in a non-default workflow column,
JIRA emits two webhooks a few hundred ms apart: `issue_created` at the
workflow's initial status, then `issue_updated` transitioning to the target
column. Each webhook resolves a different agent via `STATUS_TO_AGENT`, so
both fire on the same work item (observed as UA-11: implementation +
planning running concurrently).

Fix: router-level coalesce window keyed by `${projectId}:${workItemId}`.
A `pm:status-changed` create trigger waits `PM_CREATE_COALESCE_WINDOW_MS`
(default 2000, 0 disables) before enqueue; an update for the same key
within the window supersedes the create — no ack posted, no job queued,
`onBlocked` called to clear any dedup markers. Trivially also protects
Linear against the same code-shape risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 91.54930% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pm/create-coalesce-window.ts | 88.88% | 5 Missing ⚠️ |
| src/router/webhook-processor.ts | 95.45% | 1 Missing ⚠️ |


@zbigniewsobiecki zbigniewsobiecki merged commit 620408f into dev Apr 24, 2026
9 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the fix/jira-create-coalesce branch April 24, 2026 10:19
Collaborator

@nhopeatall nhopeatall left a comment

Summary

The coalescing logic is clean, robust, and correctly solves the JIRA double-enqueue problem for mapped columns. The create-coalesce-window.ts implementation handles race conditions and cleanup perfectly. However, there is a subtle correctness bug in how TriggerHandler.handle() returns null for unmapped JIRA status updates that defeats the fix when a user creates an issue directly into a non-triggering column (like "Done").

Architecture & Design

  • [BLOCKING] Coalesce bypass on unmapped updates: JIRA issues created in non-triggering columns will still erroneously fire the default column's agent because the follow-up update webhook gets dropped by the trigger handler before it can clear the coalesce window.

Code Issues

Blocking

  • src/triggers/jira/status-changed.ts:98 — The current implementation of handle() returns null when a status update does not map to an enabled agent. Because it returns null, processRouterWebhook receives no TriggerResult, exits early, and never executes Step 7b (clearPendingCreate).

    The Bug Scenario: If a user creates a JIRA issue directly in a non-triggering column (e.g. "Done"), JIRA emits issue_created for the default initial workflow column (e.g. "To Do", which maps to implementation), immediately followed by issue_updated for the target "Done" column.

    1. issue_created ("To Do") matches the implementation agent, returns a TriggerResult with coalesceRole: 'create', and enters the 2-second wait window.
    2. issue_updated ("Done") is received. Since "Done" doesn't map to an enabled agent, handle() returns null. processRouterWebhook exits without calling clearPendingCreate().
    3. After 2 seconds, the implementation agent fires for the issue that was created in "Done"!

    The Fix: When handle() processes a status update that doesn't trigger an agent, it should return a "coalesce-only" TriggerResult with agentType: null instead of null. This ensures processRouterWebhook still clears the pending create, and then safely exits at the if (!result.agentType) check without queueing a job.

    // Example for JiraStatusChangedTrigger.handle()
    // Compute isCreate once. Create events still return null on a miss;
    // update events return a coalesce-only result so the router can clear the window.
    const isCreate = isCreateEvent(payload);
    const coalesceUpdateResult = isCreate ? null : {
      agentType: null,
      agentInput: { workItemId: issueKey }, // minimal valid AgentInput
      coalesceKey: `${ctx.project.id}:${issueKey}`,
      coalesceRole: 'update' as const,
    };

    const agentType = resolveAgentType(newStatus, jiraConfig.statuses);
    if (!agentType) return coalesceUpdateResult; // instead of return null

    const { enabled, parameters } = await checkTriggerEnabledWithParams(...);
    if (!enabled) return coalesceUpdateResult; // instead of return null

    if (!shouldFireOnEvent(isCreate, parameters)) return coalesceUpdateResult; // instead of return null
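
The router side of the scenario can be illustrated with a reduced model in which a `Set` of pending keys stands in for the timers. This is a hypothetical sketch: `CoalesceResult`, `processUpdate`, and the `demo-project:UA-11` key are invented for illustration and do not mirror the actual `processRouterWebhook` code.

```typescript
interface CoalesceResult {
  agentType: string | null;
  coalesceKey: string;
  coalesceRole: 'create' | 'update';
}

/** Returns the agent to enqueue, or null when nothing should be queued. */
function processUpdate(result: CoalesceResult | null, pending: Set<string>): string | null {
  if (result === null) return null; // early exit: any pending create stays armed
  if (result.coalesceRole === 'update') pending.delete(result.coalesceKey);
  if (!result.agentType) return null; // coalesce-only result: window cleared, no job
  return result.agentType;
}

const demoKey = 'demo-project:UA-11'; // hypothetical key

// Current behaviour: handle() returns null for the unmapped "Done" update.
const pendingBefore = new Set([demoKey]);
processUpdate(null, pendingBefore);
console.log(pendingBefore.has(demoKey)); // true: the create will still fire when the window ends

// Proposed behaviour: handle() returns a coalesce-only result.
const pendingAfter = new Set([demoKey]);
processUpdate({ agentType: null, coalesceKey: demoKey, coalesceRole: 'update' }, pendingAfter);
console.log(pendingAfter.has(demoKey)); // false: the pending create has been cleared
```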

Should Fix

  • src/triggers/linear/status-changed.ts:88 — The exact same issue exists for Linear. Even though Linear bundles creation today, since we are adding belt-and-braces protection here, LinearStatusChangedTrigger.handle() should also return an agentType: null coalesce result instead of null when a state transition doesn't match an agent, to maintain symmetry and robustness.

Nitpicks

  • src/pm/create-coalesce-window.ts — Excellent test isolation mechanism with __resetCoalesceWindowForTests(). Timer state management is very clean.

🕵️ opencode · google/gemini-3.1-pro-preview · run details
