
Deploy: include Trello webhook error body#929

Merged
zbigniewsobiecki merged 117 commits into main from dev
Mar 17, 2026

Conversation

@zbigniewsobiecki
Member

Summary

  • Include Trello API response body in webhook creation error messages for diagnosis

Test plan

  • Unit tests pass
  • Retry Trello webhook creation to see actual error from Trello API

🤖 Generated with Claude Code

Cascade Bot and others added 30 commits March 14, 2026 12:34
…oads

Read raw text first via c.req.text() then parse URLSearchParams, so the
HMAC signature can be computed over the exact bytes GitHub sent.
Also update PR description to accurately reflect this PR adds the
plumbing infrastructure (rawBody threading, verifySignature callback)
rather than the actual signature verification wiring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… selection guide

Expand the LLM API keys section to cover both authentication paths for each
supported engine (API key vs subscription auth), and add a new "Choose Agent
Engine" section so users know how to switch between claude-code, codex,
opencode, and llmist backends.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dex-auth

docs(getting-started): add Codex engine auth and agent engine selection
…ners (#828)

Worker containers run setup.sh which already installs local PostgreSQL
and creates cascade_test, but never wrote TEST_DATABASE_URL to .cascade/env.
resolveTestDbUrl() already reads that file as its second fallback, so
integration tests would fall through to the Docker Compose path (which
fails with 'docker: not found' in containers).

Also fixes sed -i portability: macOS requires 'sed -i ''' while Linux
uses 'sed -i'. Replaces the file-existence guard with a 'touch' to ensure
the file exists before the sed call.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
#825)

* feat(tests): add shared mock factories and migrate heaviest test files

* fix(tests): align shared github mock contract

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
)

Co-authored-by: Cascade Bot <bot@cascade.dev>
Three changes to prevent agent confusion when running tests inside worker containers:

1. **CLAUDE.md**: Document correct test commands (npm test, test:unit, test:integration,
   test:all) and add a warning against `npm test -- --project integration`, which adds
   rather than replaces the unit project flags.

2. **beforeAll(truncateAll)**: Add file-level truncation to all 6 top-level integration
   test files so each file starts from a known-clean state regardless of what previous
   test files left in the DB.

3. **withTestTransaction helper**: Implement a correct transaction-rollback pattern for
   integration tests. Adds `_setTestDb` hook to getDb() so the active transaction can
   be injected, and exports `withTestTransaction` from tests/integration/helpers/db.ts
   for future use without re-inventing it.

New unit tests cover _setTestDb/getDb override and withTestTransaction lifecycle
(rollback-on-success, error propagation, _setTestDb cleanup in finally).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep the PR's more complete JSDoc comment that documents rawBody
preservation for both JSON and form-urlencoded content types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-env

fix(tests): harden agent worker test environment
When GitHub delivers a webhook with application/x-www-form-urlencoded,
rawBody is `payload=<url-encoded JSON>`. The previous implementation
called JSON.parse(rawBody) directly, which threw, leaving repoFullName
undefined and causing the verifier to return null — skipping signature
verification entirely and accepting any missing/invalid signature with
HTTP 200.

Fix: after a JSON.parse failure, fall back to URLSearchParams to extract
the payload field and parse its JSON to resolve repoFullName. The HMAC
is still computed over the full raw form-encoded body (as GitHub does),
so verification is correct for both delivery modes.

Three new unit tests cover the form-urlencoded path (valid sig, wrong
sig, missing header).
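
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `extractRepoFullName` and `verifyGitHubSignature` are hypothetical, and the sketch assumes Node's built-in `crypto` module and GitHub's `X-Hub-Signature-256` header format (`sha256=<hex digest>`).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: resolve repository.full_name from either delivery mode.
function extractRepoFullName(rawBody: string): string | undefined {
  let payload: unknown;
  try {
    payload = JSON.parse(rawBody); // application/json delivery
  } catch {
    // application/x-www-form-urlencoded delivery: payload=<url-encoded JSON>
    const encoded = new URLSearchParams(rawBody).get("payload");
    if (encoded === null) return undefined;
    try {
      payload = JSON.parse(encoded);
    } catch {
      return undefined;
    }
  }
  const fullName = (payload as { repository?: { full_name?: unknown } } | null)
    ?.repository?.full_name;
  return typeof fullName === "string" ? fullName : undefined;
}

// Hypothetical helper: the HMAC covers the full raw body in both modes.
function verifyGitHubSignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false; // a missing header fails, it is never skipped
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Note that the repository lookup and the signature check use different inputs on purpose: the lookup needs the decoded JSON, while the HMAC must be computed over the bytes exactly as received.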

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…re-verification

feat(router): wire up HMAC signature verification for GitHub and Trello webhooks
…/envFilter.ts (#832)

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Cascade Bot <bot@cascade.dev>
… module (#834)

Co-authored-by: Cascade Bot <bot@cascade.dev>
* feat(backends): add resolveModel() to AgentEngine interface

* docs(backends): comment double model resolution for backward compat

Add explanatory comments to the resolve*Model() calls in each engine's
execute() method clarifying that the redundancy is intentional. These
calls remain for backward compatibility when execute() is invoked
directly without going through the adapter's pre-resolution step.
All three resolve functions are idempotent so the double call is safe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ntEngine interface (#836)

Co-authored-by: Cascade Bot <bot@cascade.dev>
…837)

* test(backends): add engine contract test for all registered engines

* fix(tests): make "registers exactly the expected engines" test meaningful

Add toHaveLength assertion so the test verifies no unexpected engines
are registered, not just that the expected ones are present.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cascade Bot <bot@cascade.dev>
…l/iterations controls (#839)

Co-authored-by: Cascade Bot <bot@cascade.dev>
…ations > Source Control (#840)

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Cascade Bot <bot@cascade.dev>
…pandable sidebar tree (#842)

* feat(dashboard): refactor project tabs into URL-backed routes with expandable sidebar tree

* fix(sidebar): address review feedback on navigation helpers and back-links

- Fix prefix-matching bug in ProjectNavItem by importing and using
  isProjectActive/isSectionActive helpers from project-sections.ts
  instead of inline startsWith checks (prevents false matches like
  /projects/proj matching /projects/project-2/general)
- Update back-links in prs/$projectId.$prNumber.tsx and
  work-items/$projectId.$workItemId.tsx to point directly to
  /projects/$projectId/general, avoiding an unnecessary client-side
  redirect on every click
- Remove dead re-exports from $projectId.tsx (ProjectSection,
  DEFAULT_PROJECT_SECTION, PROJECT_SECTIONS) which had zero consumers
- Restore overflow-y-auto max-h-48 on the projects container to
  prevent expandable tree items from pushing settings/global nav
  off-screen
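
To illustrate the prefix-matching bug from the first bullet, here is a minimal sketch. The signature of `isProjectActive` is an assumption (the real helper lives in project-sections.ts and may differ); `naiveIsActive` is a hypothetical stand-in for the inline `startsWith` check that was replaced.

```typescript
// Stand-in for the old inline check: a bare prefix match.
function naiveIsActive(pathname: string, projectId: string): boolean {
  return pathname.startsWith(`/projects/${projectId}`);
}

// Assumed shape of the fixed helper: match the project root exactly,
// or require a "/" segment boundary right after the project id.
function isProjectActive(pathname: string, projectId: string): boolean {
  const base = `/projects/${projectId}`;
  return pathname === base || pathname.startsWith(`${base}/`);
}
```

With these definitions, `naiveIsActive("/projects/project-2/general", "proj")` is `true` (the false match), while `isProjectActive` returns `false` for the same arguments.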

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(build): use typed route paths for project section links in sidebar

Add `route` property to PROJECT_SECTIONS with typed TanStack Router paths
(`/projects/$projectId/<section>`) and update the sidebar Link component to
use `to={section.route}` with `params={{ projectId: project.id }}` instead
of a template literal that violated TanStack Router's type-safe route
requirements, fixing the frontend build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(sidebar): sync expansion state with URL navigation and fix back-link destinations

- Use useEffect to keep sidebar project expansion in sync when activeProject
  changes due to URL navigation (not just initial mount)
- Update back-navigation links in PR runs and work item runs pages to point
  to /work section instead of /general, since users reach these pages from Work

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ssion

fix(tests): mock runLink to prevent env var leakage in unit tests
* feat(db): add project_credentials table and migration 0040

* fix(db): align migration timestamp type with Drizzle schema

Change TIMESTAMPTZ to TIMESTAMP in 0040 migration so the SQL column
type matches the Drizzle schema's timestamp() (without timezone),
consistent with the credentials and integrations table patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…to Agents (#847)

Co-authored-by: Cascade Bot <bot@cascade.dev>
zbigniewsobiecki and others added 29 commits March 16, 2026 14:51
…#900)

Same bug as router (#899) and dashboard (#896): worker-entry.ts calls
loadConfig() at line 253 which runs EngineSettingsSchema validation.
The dynamic ENGINE_SETTINGS_SCHEMAS registry was empty because
registerBuiltInEngines() had not yet been called — it only runs later
when createAgentRegistry() initializes.

This caused every worker container to crash immediately with:

  ZodError: Unsupported engine settings for "claude-code"

...for any project with claude-code/codex/opencode engineSettings,
making all Trello/JIRA/GitHub agent runs fail at job start.

Fix: call registerBuiltInEngines() once before loadConfig().
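
The ordering problem can be reproduced with a small sketch. The names `ENGINE_SETTINGS_SCHEMAS` and `registerBuiltInEngines` come from the commit message above, but their bodies here are assumptions (the real code uses Zod schemas), and `validateEngineSettings` is a hypothetical stand-in for the validation that `loadConfig()` performs.

```typescript
type SettingsValidator = (settings: unknown) => boolean;

// Dynamic registry: empty until engines are registered.
const ENGINE_SETTINGS_SCHEMAS = new Map<string, SettingsValidator>();

function registerBuiltInEngines(): void {
  for (const id of ["claude-code", "codex", "opencode"]) {
    // Placeholder validator; a real schema would check individual fields.
    ENGINE_SETTINGS_SCHEMAS.set(id, (s) => typeof s === "object" && s !== null);
  }
}

// Hypothetical stand-in for the engine-settings check inside loadConfig().
function validateEngineSettings(engine: string, settings: unknown): void {
  const validate = ENGINE_SETTINGS_SCHEMAS.get(engine);
  if (!validate || !validate(settings)) {
    throw new Error(`Unsupported engine settings for "${engine}"`);
  }
}
```

Calling `validateEngineSettings("claude-code", {})` before `registerBuiltInEngines()` throws because the registry is still empty; calling it afterwards succeeds, which is why the registration has to come first in the entry point.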

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ct (#897)

* feat(agents): require explicit agent config to enable agent per project

* fix(tests): update integration tests for agent opt-in enforcement

The new `isAgentEnabledForProject()` guard requires an explicit
`agent_configs` row before any trigger can fire. Integration tests
that called `handle()` and expected a non-null result were failing
because no agent config was seeded.

- Seed `agent_configs` + `agent_trigger_configs` rows in tests that
  expect triggers to fire (trigger-registry, pm-provider-switching,
  github-personas)
- Export `clearAgentEnabledCache()` from agentConfigsRepository for
  test isolation (the 5s TTL cache was causing false negatives when
  tests ran within the TTL window of a prior test that had no config)
- Call `clearAgentEnabledCache()` in `truncateAll()` so every
  `beforeEach` starts with a clean cache alongside a clean DB
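
The cache-isolation issue in the bullets above can be sketched as follows. Only `clearAgentEnabledCache` and the 5s TTL come from the commit message; `getCachedAgentEnabled`, its synchronous `lookup` callback, and the explicit `now` parameter are simplifications assumed for illustration (the real lookup is presumably an async DB query).

```typescript
const AGENT_ENABLED_TTL_MS = 5000;

const agentEnabledCache = new Map<string, { value: boolean; expiresAt: number }>();

// Hypothetical cached lookup: returns the cached answer until the TTL expires.
function getCachedAgentEnabled(
  key: string,
  lookup: () => boolean,
  now: number = Date.now(),
): boolean {
  const hit = agentEnabledCache.get(key);
  if (hit && hit.expiresAt > now) return hit.value;
  const value = lookup();
  agentEnabledCache.set(key, { value, expiresAt: now + AGENT_ENABLED_TTL_MS });
  return value;
}

// Exported for tests so each test starts with an empty cache.
function clearAgentEnabledCache(): void {
  agentEnabledCache.clear();
}
```

Without the clear call, a test that seeds an agent config within 5 seconds of an earlier test that had none would still read the stale `false` from the cache.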

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zbigniew Sobiecki <zbigniew@sobiecki.name>
…cked chart, cost formatting (#901)

Co-authored-by: Cascade Bot <bot@cascade.dev>
…s all 3 locations (#903)

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Cascade Bot <bot@cascade.dev>
…curl webhooks (#908)

Co-authored-by: Cascade Bot <bot@cascade.dev>
* feat(agents): expose YAML default task prompt in getPrompts API

- Add `getDefaultTaskPrompt(agentType)` to `src/agents/prompts/index.ts`
  Reads the factory-default task prompt directly from the YAML definition
  without requiring `initPrompts()`. Returns null for unknown agent types.

- Wire it into the `agentConfigs.getPrompts` tRPC endpoint as a fourth
  prompt layer (`defaultTaskPrompt`), completing the inheritance chain:
  project override → global override → default system (disk template) →
  default task (YAML definition).

- Update `agent-prompt-overrides.tsx` to use `defaultTaskPrompt` as the
  final fallback when initialising the task prompt editor and as the
  target of the "Load default" button.

- Add `startWatchdog(project.watchdogTimeoutMs)` to `triggerManualRun`
  so manual runs respect the per-project timeout the same way webhook-
  triggered runs do.

- Fix unit tests: add `getDefaultTaskPrompt` to the `prompts/index.js`
  mock in agentConfigs.test.ts; mock `lifecycle.js` in
  manual-runner.test.ts to prevent the watchdog timer from calling
  process.exit during tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(router): unified agent run timeout system

Replaces two independent, uncoordinated timeout mechanisms with a single
coherent flow where `watchdogTimeoutMs` is the source of truth.

## Problem

Two timeouts existed with no knowledge of each other:
1. In-container watchdog (`startWatchdog(project.watchdogTimeoutMs)`) —
   per-project, updates DB to `timed_out` then exits.
2. Router-level kill (`setTimeout → killWorker`) — global env var,
   killed the Docker container with no DB update.

This caused three bugs:
- If `WORKER_TIMEOUT_MS` < `watchdogTimeoutMs` the router killed the
  container before the watchdog could set the correct DB status.
- GitHub-triggered runs (no `workItemId`) were never marked in the DB
  after a router kill — they stayed `running` forever.
- Orphaned containers (after router restart) were stopped but their DB
  runs were never updated.

## Solution

**Per-project timeout in `spawnWorker`**: router now reads
`watchdogTimeoutMs` from project config and uses it + 2-minute buffer
(`ROUTER_KILL_BUFFER_MS`) for the container kill timer, so the watchdog
always fires first and the router is purely a backstop.
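
As a rough sketch of the arithmetic, assuming the buffer is a plain constant added to the per-project value (the helper name `routerKillTimeoutMs` is hypothetical):

```typescript
// Two-minute buffer, as described in the commit message.
const ROUTER_KILL_BUFFER_MS = 2 * 60 * 1000;

// Hypothetical helper: the router's kill timer always outlasts the watchdog.
function routerKillTimeoutMs(watchdogTimeoutMs: number): number {
  return watchdogTimeoutMs + ROUTER_KILL_BUFFER_MS;
}
```

For a project with a 30-minute watchdog (1,800,000 ms), the router would schedule its kill at 1,920,000 ms, i.e. 32 minutes.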

**DB update on router kill (`killWorker`)**: after stopping the
container, marks the run `timed_out` via `failOrphanedRun` (workItemId
path) or `failOrphanedRunFallback` (GitHub PR runs without workItemId).
The call to `cleanupWorker` no longer passes an exit code so it skips
its own DB write, eliminating the race that could set the wrong status
(`failed` instead of `timed_out`).

**Fallback for GitHub PR runs (`failOrphanedRunFallback`)**: new
repository function that finds the most recent running run by
`projectId + agentType + startedAt ≥ containerStart` and marks it,
guarded by an optimistic `WHERE status='running'` check so it is
always safe to call even if the watchdog already acted.
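
The selection and guard logic can be modelled in memory as below. This is only a sketch: the real `failOrphanedRunFallback` is a repository function issuing SQL, and the `Run` shape, parameter list, and array-based storage here are assumptions made for illustration.

```typescript
type RunStatus = "running" | "timed_out" | "failed" | "completed";

interface Run {
  id: number;
  projectId: string;
  agentType: string;
  startedAt: number; // epoch ms
  status: RunStatus;
  durationMs: number | null;
}

// In-memory model of the fallback: pick the most recent matching run that is
// still 'running' and mark it timed out. The status filter plays the role of
// the optimistic WHERE status='running' guard.
function failOrphanedRunFallback(
  runs: Run[],
  projectId: string,
  agentType: string,
  containerStart: number,
  now: number,
): Run | undefined {
  const candidate = runs
    .filter(
      (r) =>
        r.projectId === projectId &&
        r.agentType === agentType &&
        r.startedAt >= containerStart &&
        r.status === "running",
    )
    .sort((a, b) => b.startedAt - a.startedAt)[0];
  if (!candidate) return undefined; // nothing to do, e.g. the watchdog already acted
  candidate.status = "timed_out";
  candidate.durationMs = now - candidate.startedAt;
  return candidate;
}
```

Because the guard excludes rows that are no longer `running`, calling the function a second time, or after the watchdog has already written `timed_out`, leaves the existing status and duration untouched.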

**DB update in `cleanupWorker`**: extended to also handle the
workItemId-absent case via `failOrphanedRunFallback`, covering crashes
of GitHub PR runs that the watchdog didn't catch.

**`cascade.agent.type` container label**: added at spawn time so orphan
cleanup can pass `agentType` to `failOrphanedRunFallback`, avoiding
matching the wrong run when multiple agent types run concurrently.

**`durationMs` on orphaned runs**: all three fail paths now compute and
persist the elapsed duration so dashboard users see actual run time
instead of null.

**Fixed BullMQ `lockDuration`**: replaced `workerTimeoutMs + 60s` with
a fixed 8-hour constant (`BULLMQ_LOCK_DURATION_MS`) — `guardedSpawn`
resolves immediately after container start so the lock is held for
seconds, and tying it to `workerTimeoutMs` risked lock expiry for
long-running project configs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ent cache stampede (#913)

Add an in-flight promise deduplicator (_pendingConfigFetch) to loadProjectConfig() so
that concurrent webhook requests arriving while the cache is cold all share a single
DB fetch instead of each firing their own loadConfig() call. Fixes N+1 query / cache
stampede reported in Sentry issues 98589064, 98586219, 103501033, 98901392.
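
A minimal sketch of the in-flight deduplication pattern follows. The names `_pendingConfigFetch` and `loadProjectConfig` come from the commit message; the `ProjectConfig` type, the module-level cache variable, and passing `loadConfig` as a parameter are assumptions made to keep the example self-contained.

```typescript
type ProjectConfig = { projects: string[] };

let cachedConfig: ProjectConfig | null = null;
let _pendingConfigFetch: Promise<ProjectConfig> | null = null;

function loadProjectConfig(
  loadConfig: () => Promise<ProjectConfig>,
): Promise<ProjectConfig> {
  if (cachedConfig) return Promise.resolve(cachedConfig); // warm cache
  if (_pendingConfigFetch) return _pendingConfigFetch; // join the in-flight fetch

  const pending = loadConfig().then(
    (config) => {
      cachedConfig = config;
      _pendingConfigFetch = null;
      return config;
    },
    (err) => {
      _pendingConfigFetch = null; // let the next caller retry after a failure
      throw err;
    },
  );
  _pendingConfigFetch = pending;
  return pending;
}
```

Concurrent callers that arrive while the cache is cold all receive the same promise, so `loadConfig()` runs once regardless of how many webhook requests are waiting.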

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ionsRepository (#912)

* test(integrations): add dedicated integration test suite for integrationsRepository

* fix(tests): remove unused deleteProjectCredential import

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…nding (#914)

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Cascade Bot <bot@cascade.dev>
…916)

* docs: audit and fix README, getting-started, and CLAUDE.md accuracy

* fix(docs): correct runs/ command count from 8 to 9 in CLI directory tree

The parenthetical already listed 9 commands (list, show, logs, llm-calls,
llm-call, debug, trigger, retry, cancel) but the count said 8. Updated
to match the actual count and the 9 .ts files in src/cli/dashboard/runs/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…#917)

* feat(dashboard): add OpenRouter model autocomplete to model selection

* refactor(openrouter): address review feedback on model autocomplete

- Key cache by API key (or '__public__') to prevent cross-project cache
  pollution if OpenRouter ever returns key-specific model lists
- Extract prefix/formatting/grouping utilities to web/src/lib/openrouter-utils.ts
  so tests import production implementations directly instead of local copies
- Remove unused stripPrefix from openrouter-model-combobox.tsx
- Simplify dead handleChange conditional to a direct onChange pass-through
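
The cache-keying change in the first bullet might look roughly like this. Only the `'__public__'` fallback key is taken from the commit message; `cacheKeyFor`, `getModels`, and the synchronous `fetchModels` callback are hypothetical names and simplifications.

```typescript
const PUBLIC_CACHE_KEY = "__public__";

const modelCache = new Map<string, string[]>();

// One cache slot per API key, plus a shared slot for unauthenticated requests.
function cacheKeyFor(apiKey?: string): string {
  return apiKey && apiKey.length > 0 ? apiKey : PUBLIC_CACHE_KEY;
}

function getModels(
  apiKey: string | undefined,
  fetchModels: (apiKey?: string) => string[],
): string[] {
  const key = cacheKeyFor(apiKey);
  const hit = modelCache.get(key);
  if (hit) return hit;
  const models = fetchModels(apiKey);
  modelCache.set(key, models);
  return models;
}
```

Keying by API key means a model list fetched for one project's key is never served to a project using a different key.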

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ommand (#918)

Co-authored-by: Cascade Bot <bot@cascade.dev>
)

* test(config): add integration tests for config provider round-trip

* refactor(tests): fix config provider tests per review feedback

- Rewrite config-provider.test.ts to focus on the actual provider layer
  (cached findProjectByBoardId, findProjectByRepo, findProjectByJiraProjectKey,
  loadConfig from src/config/provider.ts) rather than duplicating direct DB
  repository tests
- Fix cache invalidation test to use cached provider functions, mutate the DB
  between calls, and verify invalidateConfigCache() causes a fresh read
- Move genuinely new test coverage into configRepository.test.ts: WithConfig
  variants for repo/id/jira, CascadeConfigSchema.safeParse validation,
  cross-provider isolation, Trello+JIRA coexistence, and JIRA config loading

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cascade Bot <bot@cascade.dev>
…d, and re-encryption (#922)

Co-authored-by: Cascade Bot <bot@cascade.dev>
…utils, and media (#923)

Co-authored-by: Cascade Bot <bot@cascade.dev>
…essages (#925)

Co-authored-by: Cascade Bot <bot@cascade.dev>
* feat(cli): add actionable error messages with --verbose flag

* fix(cli): address review feedback on actionable error messages

- Use formatted actionable message in non-TRPC re-throw path so raw
  ECONNREFUSED errors show "Cannot reach server" instead of the raw
  Node.js error text
- Remove unused _serverUrl parameter from mapTRPCError() since mapError()
  handles network errors before delegating to mapTRPCError()
- Add message content assertions to NOT_FOUND, FORBIDDEN, and BAD_REQUEST
  test cases so they actually verify the actionable messages
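
As an illustration of the first bullet, a network-error mapper could be sketched like this. The exact wording of the project's messages, the `serverUrl` parameter on `mapError`, and the `errorCode` helper are assumptions; the sketch also assumes that the connection error code may sit either on the error itself or on its `cause`.

```typescript
// Hypothetical helper: find a Node-style error code on the error or its cause.
function errorCode(err: unknown): string | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as { code?: unknown; cause?: unknown };
  if (typeof e.code === "string") return e.code;
  if (typeof e.cause === "object" && e.cause !== null) {
    const causeCode = (e.cause as { code?: unknown }).code;
    if (typeof causeCode === "string") return causeCode;
  }
  return undefined;
}

// Sketch of mapError(): network failures become an actionable message,
// everything else falls back to the original error text.
function mapError(err: unknown, serverUrl: string): string {
  if (errorCode(err) === "ECONNREFUSED") {
    return `Cannot reach server at ${serverUrl}. Check that it is running and the URL is correct.`;
  }
  return err instanceof Error ? err.message : String(err);
}
```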

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…d withSpinner() (#927)

Co-authored-by: Cascade Bot <bot@cascade.dev>
…rrors (#928)

The 400 error from Trello was opaque — now includes the response body for diagnosis.
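
A minimal sketch of the idea, assuming the error message is assembled from the HTTP status and the response text. The function name `formatTrelloWebhookError` and the message wording are hypothetical, not the project's actual implementation.

```typescript
// Hypothetical formatter: append the API response body to the error message
// so an otherwise opaque 400 carries the provider's own explanation.
function formatTrelloWebhookError(
  status: number,
  statusText: string,
  body: string,
): string {
  const base = `Failed to create Trello webhook: ${status} ${statusText}`;
  const detail = body.trim();
  return detail ? `${base}: ${detail}` : base;
}
```

In a caller using `fetch`, this would typically be fed from `response.status`, `response.statusText`, and `await response.text()` when `response.ok` is false.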

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@zbigniewsobiecki merged commit 94609a2 into main on Mar 17, 2026
12 checks passed

2 participants