feat: add manual agent triggering and run retry#312
Conversation
Merge dev to main
Merge dev to main
Merge dev to main
Merge dev to main
Adds the ability to manually trigger agent runs and retry failed runs via both CLI and web UI (future PR). Core changes: - New manual-runner.ts with triggerManualRun() and triggerRetryRun() - In-memory duplicate prevention via trigger tracking - Two new tRPC mutations: runs.trigger and runs.retry - CLI commands: cascade runs trigger and cascade runs retry - Added 'manual' to triggerType union in AgentInput - Comprehensive unit tests (9 tests, all passing) Implementation follows the debug-runner.ts pattern: - Fire-and-forget execution (no await on runAgent) - Org access validation in tRPC layer - Error handling with logger - Status tracking to prevent concurrent duplicates Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
🔍 Reviewing PR... |
aaight
left a comment
There was a problem hiding this comment.
Summary
Clean implementation that follows existing patterns well. No blocking issues. Two minor items worth considering.
Code Issues
Should Fix
- src/api/routers/runs.ts:181 —
agentType: z.string()has no.min(1)constraint. Other routers in this codebase (agentConfigs.ts:57,prompts.ts:30) consistently usez.string().min(1)foragentType. An empty string would pass validation and silently fail in the fire-and-forgetrunAgentcall, making it hard to debug.
Minor
- src/api/routers/runs.ts + src/triggers/shared/manual-runner.ts — The
retrymutation fetches the run viagetRunByIdfor auth validation (line 248), thentriggerRetryRunfetches the same run again from DB (manual-runner.ts line 128). The existingtriggerDebugAnalysispattern avoids this by extracting run data at the router level and passing it down. Not a bug, but an unnecessary extra DB call per retry. Consider passing the already-fetched run data totriggerRetryRuninstead.
| .input( | ||
| z.object({ | ||
| projectId: z.string(), | ||
| agentType: z.string(), |
There was a problem hiding this comment.
Minor: agentType: z.string() — other routers use z.string().min(1) for agent type validation. Without it, an empty string silently fails in the fire-and-forget path.
| */ | ||
| export async function triggerRetryRun( | ||
| runId: string, | ||
| project: ProjectConfig, |
There was a problem hiding this comment.
This getRunById call duplicates the fetch already done in the retry tRPC mutation. Consider accepting the run data as a parameter (similar to how triggerDebugAnalysis receives its inputs from the router) to avoid the extra DB round-trip.
|
👀 Checking this out... |
|
🔍 Reviewing PR... |
zbigniewsobiecki
left a comment
There was a problem hiding this comment.
Code Review Summary
I've reviewed the manual agent triggering and run retry implementation. The code is well-structured and follows the existing patterns (particularly debug-runner.ts). Here are my findings:
✅ Strengths
- Excellent test coverage - 9 comprehensive unit tests covering duplicate prevention, parameter extraction, error handling, and fire-and-forget behavior
- Follows established patterns - Mirrors
debug-runner.tsdesign with in-memory tracking and fire-and-forget execution - Clean separation of concerns - tRPC mutations handle auth/validation, runner handles business logic
- Type safety - Proper TypeScript types and Zod validation
- Good error handling - Catches and logs errors in fire-and-forget promises
💡 Observations & Questions
1. Fire-and-forget Return Timing
The triggerManualRun and triggerRetryRun functions return immediately (fire-and-forget), but the tRPC mutations call them with await:
// In runs.ts line 215+
await triggerManualRun(...).catch((err) => { ... });This seems inconsistent. If these are truly fire-and-forget, should we remove the await? Or is the await intentional to catch the duplicate detection error before returning { triggered: true } to the client?
Recommendation: Consider clarifying the intent - either:
- Remove
awaitif truly fire-and-forget (liketriggerDebugAnalysison line 166) - Keep
awaitbut document that we want to catch duplicate errors synchronously
2. Duplicate Prevention Scope
The duplicate prevention only checks project+agent+card+pr. This means:
- ✅ Can't trigger the same card twice concurrently
- ❓ Can trigger with different
modelOverridevalues while another is running
Is this intentional? Should modelOverride be part of the deduplication key?
3. Missing PR Context on Retry
From manual-runner.ts:156:
// For PR-based agents, we don't store branch/SHA in DB, so we can't restore them.
// The retry will fetch fresh data from GitHub if needed.This is a known limitation and well-documented. However, retrying a PR-based agent might fail if:
- The original PR branch was deleted
- The PR was closed/merged
Question: Should we validate that PR-based runs have an accessible PR before allowing retry? Or is failing gracefully acceptable?
4. Schema Migration Not Included
The PR adds 'manual' to the triggerType union in src/types/index.ts, but I don't see a database migration to add this value to any enum constraint (if one exists).
Question: Does the trigger_type column have an enum constraint in PostgreSQL? If so, we need a migration to add 'manual' to the allowed values.
🔍 Minor Observations
-
CLI UX: The CLI commands use
--agent-type(kebab-case) but the API usesagentType(camelCase). This is consistent with the existing codebase pattern, so no change needed - just noting for future consideration. -
Error messages: Very clear and actionable (e.g., "Manual trigger already running for project=X, agent=Y"). Great UX!
-
Test quality: The tests properly use
setImmediateto handle async fire-and-forget behavior. Nice attention to detail!
🎯 Verdict
This is solid work that follows established patterns and includes excellent test coverage. The questions above are mostly about clarifying intent rather than bugs. The implementation is production-ready pending answers to the questions above.
Primary concerns:
- Clarify the
awaitintent in tRPC mutations (observation #1) - Verify database schema handles
'manual'trigger type (observation #4)
Nice to address (optional):
Summary
Adds the ability to manually trigger agent runs and retry failed runs via CLI (web UI in future PR).
runs.triggerandruns.retrywith org access validationcascade runs triggerandcascade runs retrymanual-runner.tsfollowing thedebug-runner.tspattern (fire-and-forget, duplicate prevention)'manual'totriggerTypeunionImplementation Details
manual-runner.ts
triggerManualRun()— accepts project/agent/card params, builds AgentInput, calls runAgent fire-and-forgettriggerRetryRun()— reads original run from DB, extracts params, delegates to triggerManualRuntRPC Mutations
runs.trigger— validates org ownership, loads project config, fires triggerruns.retry— verifies run exists + org access, loads config, fires retryCLI Commands
cascade runs trigger --project ID --agent-type TYPE [--card-id ID] [--model MODEL]cascade runs retry RUN_ID [--model MODEL]Test Coverage
Related
Trello card: https://trello.com/c/6993656f949847855086dc50
Implements steps 1-5 and 8-9 from the card (CLI layer). Web UI (steps 6-7) deferred to future PR.
🤖 Generated with Claude Code