Skip to content

feat: add manual agent triggering and run retry#312

Merged
zbigniewsobiecki merged 5 commits intodevfrom
feature/manual-agent-triggering
Feb 16, 2026
Merged

feat: add manual agent triggering and run retry#312
zbigniewsobiecki merged 5 commits intodevfrom
feature/manual-agent-triggering

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Summary

Adds the ability to manually trigger agent runs and retry failed runs via CLI (web UI in future PR).

  • New tRPC mutations: runs.trigger and runs.retry with org access validation
  • CLI commands: cascade runs trigger and cascade runs retry
  • Core runner: manual-runner.ts following the debug-runner.ts pattern (fire-and-forget, duplicate prevention)
  • Type update: Added 'manual' to triggerType union

Implementation Details

manual-runner.ts

  • triggerManualRun() — accepts project/agent/card params, builds AgentInput, calls runAgent fire-and-forget
  • triggerRetryRun() — reads original run from DB, extracts params, delegates to triggerManualRun
  • In-memory tracking prevents duplicate concurrent triggers for same project+agent+card+PR combo

tRPC Mutations

  • runs.trigger — validates org ownership, loads project config, fires trigger
  • runs.retry — verifies run exists + org access, loads config, fires retry
  • Both use fire-and-forget pattern (matches debug analysis)

CLI Commands

  • cascade runs trigger --project ID --agent-type TYPE [--card-id ID] [--model MODEL]
  • cascade runs retry RUN_ID [--model MODEL]

Test Coverage

  • 9 unit tests covering trigger/retry flows, duplicate prevention, error handling
  • All existing tests pass (1310 tests)
  • Type checking: ✓
  • Linting: ✓

Related

Trello card: https://trello.com/c/6993656f949847855086dc50

Implements steps 1-5 and 8-9 from the card (CLI layer). Web UI (steps 6-7) deferred to future PR.

🤖 Generated with Claude Code

zbigniewsobiecki and others added 5 commits February 16, 2026 18:16
Adds the ability to manually trigger agent runs and retry failed runs
via both CLI and web UI (future PR).

Core changes:
- New manual-runner.ts with triggerManualRun() and triggerRetryRun()
- In-memory duplicate prevention via trigger tracking
- Two new tRPC mutations: runs.trigger and runs.retry
- CLI commands: cascade runs trigger and cascade runs retry
- Added 'manual' to triggerType union in AgentInput
- Comprehensive unit tests (9 tests, all passing)

Implementation follows the debug-runner.ts pattern:
- Fire-and-forget execution (no await on runAgent)
- Org access validation in tRPC layer
- Error handling with logger
- Status tracking to prevent concurrent duplicates

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@aaight
Copy link
Copy Markdown
Collaborator

aaight commented Feb 16, 2026

🔍 Reviewing PR...

Copy link
Copy Markdown
Collaborator

@aaight aaight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Clean implementation that follows existing patterns well. No blocking issues. Two minor items worth considering.

Code Issues

Should Fix

  • src/api/routers/runs.ts:181agentType: z.string() has no .min(1) constraint. Other routers in this codebase (agentConfigs.ts:57, prompts.ts:30) consistently use z.string().min(1) for agentType. An empty string would pass validation and silently fail in the fire-and-forget runAgent call, making it hard to debug.

Minor

  • src/api/routers/runs.ts + src/triggers/shared/manual-runner.ts — The retry mutation fetches the run via getRunById for auth validation (line 248), then triggerRetryRun fetches the same run again from DB (manual-runner.ts line 128). The existing triggerDebugAnalysis pattern avoids this by extracting run data at the router level and passing it down. Not a bug, but an unnecessary extra DB call per retry. Consider passing the already-fetched run data to triggerRetryRun instead.

Comment thread src/api/routers/runs.ts
.input(
z.object({
projectId: z.string(),
agentType: z.string(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: agentType: z.string() — other routers use z.string().min(1) for agent type validation. Without it, an empty string silently fails in the fire-and-forget path.

*/
export async function triggerRetryRun(
runId: string,
project: ProjectConfig,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This getRunById call duplicates the fetch already done in the retry tRPC mutation. Consider accepting the run data as a parameter (similar to how triggerDebugAnalysis receives its inputs from the router) to avoid the extra DB round-trip.

@zbigniewsobiecki
Copy link
Copy Markdown
Member Author

👀 Checking this out...

@zbigniewsobiecki
Copy link
Copy Markdown
Member Author

🔍 Reviewing PR...

Copy link
Copy Markdown
Member Author

@zbigniewsobiecki zbigniewsobiecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

I've reviewed the manual agent triggering and run retry implementation. The code is well-structured and follows the existing patterns (particularly debug-runner.ts). Here are my findings:

✅ Strengths

  1. Excellent test coverage - 9 comprehensive unit tests covering duplicate prevention, parameter extraction, error handling, and fire-and-forget behavior
  2. Follows established patterns - Mirrors debug-runner.ts design with in-memory tracking and fire-and-forget execution
  3. Clean separation of concerns - tRPC mutations handle auth/validation, runner handles business logic
  4. Type safety - Proper TypeScript types and Zod validation
  5. Good error handling - Catches and logs errors in fire-and-forget promises

💡 Observations & Questions

1. Fire-and-forget Return Timing

The triggerManualRun and triggerRetryRun functions return immediately (fire-and-forget), but the tRPC mutations call them with await:

// In runs.ts line 215+
await triggerManualRun(...).catch((err) => { ... });

This seems inconsistent. If these are truly fire-and-forget, should we remove the await? Or is the await intentional to catch the duplicate detection error before returning { triggered: true } to the client?

Recommendation: Consider clarifying the intent - either:

  • Remove await if truly fire-and-forget (like triggerDebugAnalysis on line 166)
  • Keep await but document that we want to catch duplicate errors synchronously

2. Duplicate Prevention Scope

The duplicate prevention only checks project+agent+card+pr. This means:

  • ✅ Can't trigger the same card twice concurrently
  • ❓ Can trigger with different modelOverride values while another is running

Is this intentional? Should modelOverride be part of the deduplication key?

3. Missing PR Context on Retry

From manual-runner.ts:156:

// For PR-based agents, we don't store branch/SHA in DB, so we can't restore them.
// The retry will fetch fresh data from GitHub if needed.

This is a known limitation and well-documented. However, retrying a PR-based agent might fail if:

  • The original PR branch was deleted
  • The PR was closed/merged

Question: Should we validate that PR-based runs have an accessible PR before allowing retry? Or is failing gracefully acceptable?

4. Schema Migration Not Included

The PR adds 'manual' to the triggerType union in src/types/index.ts, but I don't see a database migration to add this value to any enum constraint (if one exists).

Question: Does the trigger_type column have an enum constraint in PostgreSQL? If so, we need a migration to add 'manual' to the allowed values.

🔍 Minor Observations

  1. CLI UX: The CLI commands use --agent-type (kebab-case) but the API uses agentType (camelCase). This is consistent with the existing codebase pattern, so no change needed - just noting for future consideration.

  2. Error messages: Very clear and actionable (e.g., "Manual trigger already running for project=X, agent=Y"). Great UX!

  3. Test quality: The tests properly use setImmediate to handle async fire-and-forget behavior. Nice attention to detail!

🎯 Verdict

This is solid work that follows established patterns and includes excellent test coverage. The questions above are mostly about clarifying intent rather than bugs. The implementation is production-ready pending answers to the questions above.

Primary concerns:

  • Clarify the await intent in tRPC mutations (observation #1)
  • Verify database schema handles 'manual' trigger type (observation #4)

Nice to address (optional):

  • Document the duplicate prevention scope (observation #2)
  • Consider PR validation on retry (observation #3)

@zbigniewsobiecki zbigniewsobiecki merged commit 700544b into dev Feb 16, 2026
4 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the feature/manual-agent-triggering branch March 16, 2026 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants