fix(backlog-manager): prevent hallucinated tool use (narrated commands)#799

Merged
zbigniewsobiecki merged 1 commit into dev from fix/backlog-manager-hallucinated-tool-use
Mar 14, 2026
@zbigniewsobiecki
Member

Summary

  • Adds an explicit engine-agnostic EXECUTE COMMANDS — DO NOT JUST DESCRIBE THEM rule to the backlog-manager prompt's Rules section
  • Adds a regression test asserting the rule renders correctly in the compiled prompt

Problem

Run d521aad7 (backlog-manager, openrouter/google/gemini-3-flash-preview) completed silently without ever moving the Trello card. The model wrote bash commands inside markdown code blocks as text output instead of invoking them as tool calls, then stopped with reason: stop. Text output has no effect on the system.

Engine log excerpt:

I am posting a selection comment to the card and moving it to the TODO list.

```bash
cascade-tools pm post-comment ... && cascade-tools pm move-work-item ...
```
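The pattern in the excerpt can be recognized mechanically: the turn ends with reason stop, contains a fenced shell block, and carries no tool calls. The following is a minimal sketch of such a check, for illustration only. `ModelTurn`, `looksNarrated`, and every field name are hypothetical and not taken from the codebase; this PR changes the prompt and does not add a detector.

```typescript
// Hypothetical shape of one model turn; not the project's real types.
interface ModelTurn {
  text: string;
  toolCalls: { name: string; args: unknown }[];
  stopReason: string;
}

// Build the fence marker from single backticks to keep this example readable.
const FENCE = "`" + "`" + "`";

// Matches a fenced block tagged bash, sh, or shell anywhere in the text.
const FENCED_SHELL = new RegExp(
  FENCE + "(?:bash|sh|shell)\\s*\\n[\\s\\S]*?" + FENCE
);

// True when the turn stops normally, invokes no tools,
// and still contains a shell command written out as text.
function looksNarrated(turn: ModelTurn): boolean {
  return (
    turn.stopReason === "stop" &&
    turn.toolCalls.length === 0 &&
    FENCED_SHELL.test(turn.text)
  );
}
```

A check like this could only flag the symptom after the fact; the prompt rule described under Solution aims to prevent it.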

Solution

Added one rule to the Rules section of backlog-manager.eta:

EXECUTE COMMANDS — DO NOT JUST DESCRIBE THEM: When you decide to post a comment or move a card, you MUST actually invoke the command as a tool call. Writing a command inside a code block without invoking it does NOT execute it — text output has no effect on the system. If you find yourself writing out a command without calling it, stop and call it instead.

Why engine-agnostic? The rule does not mention a "bash tool" or any specific backend, so it applies equally to OpenCode, Claude Code, and any future backends.

Why not a completion requirement? Valid no-op runs exist (pipeline non-empty, all cards blocked). Adding requiresWorkItemMoved would incorrectly fail those runs.
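To make the trade-off concrete, here is a small sketch of what a strict move requirement would do to the valid no-op cases named above. The `RunSummary` type, the `outcome` values, and `passesIfMoveRequired` are assumptions made for illustration; the actual semantics of `requiresWorkItemMoved` are not shown in this PR.

```typescript
// Hypothetical outcomes for a backlog-manager run; names are illustrative.
type Outcome = "moved" | "pipeline-non-empty" | "all-cards-blocked";

interface RunSummary {
  id: string;
  workItemMoved: boolean;
  outcome: Outcome;
}

// A strict check in the spirit of requiresWorkItemMoved (assumed behavior):
// the run passes only if a card was actually moved.
function passesIfMoveRequired(run: RunSummary): boolean {
  return run.workItemMoved;
}

const exampleRuns: RunSummary[] = [
  { id: "a", workItemMoved: true, outcome: "moved" },
  { id: "b", workItemMoved: false, outcome: "pipeline-non-empty" },
  { id: "c", workItemMoved: false, outcome: "all-cards-blocked" },
];

// Runs the strict check would fail: here, the two legitimate no-ops (b and c).
const wronglyFailed = exampleRuns
  .filter((run) => !passesIfMoveRequired(run))
  .map((run) => run.id);

console.log(wronglyFailed);
```

Under these assumptions, both no-op runs fail even though nothing went wrong, which is why the fix targets the prompt instead.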

Test plan

  • New unit test: backlog-manager prompt warns against describing commands instead of invoking them
  • All 4678 existing tests pass locally
  • Lint and typecheck clean
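The shape of the new unit test can be sketched roughly as follows. This is not the test from the PR: `renderedPrompt` is a hard-coded stand-in for the output of rendering backlog-manager.eta, and `missingPhrases` is a hypothetical helper. Only the quoted phrases come from the rule text itself.

```typescript
// Stand-in for the rendered system prompt. The real test would render
// backlog-manager.eta; this string is hard-coded for illustration.
const renderedPrompt = [
  "Rules",
  "- EXECUTE COMMANDS \u2014 DO NOT JUST DESCRIBE THEM: When you decide to post a comment or move a card, you MUST actually invoke the command as a tool call.",
].join("\n");

// Phrases quoted from the rule added in this PR.
const requiredPhrases = [
  "EXECUTE COMMANDS",
  "DO NOT JUST DESCRIBE THEM",
  "invoke the command as a tool call",
];

// Returns the phrases that do not appear in the prompt (hypothetical helper).
function missingPhrases(prompt: string, phrases: string[]): string[] {
  return phrases.filter((phrase) => prompt.indexOf(phrase) === -1);
}

const missing = missingPhrases(renderedPrompt, requiredPhrases);
if (missing.length > 0) {
  throw new Error("Rule missing from prompt: " + missing.join(", "));
}
```

Checking several short phrases rather than the full sentence keeps a test like this from breaking on minor rewording while still failing if the rule is dropped.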

🤖 Generated with Claude Code

…lls, not describe them

Gemini Flash (and similar models) sometimes output bash commands inside
markdown code blocks instead of actually invoking them as tool calls,
then stop with reason:stop — causing silent no-ops (card never moved).

Adds an engine-agnostic CRITICAL rule to the Rules section of the
backlog-manager prompt instructing the model to always invoke tool calls
rather than narrate them as text.

Also adds a regression test asserting the rule appears in the rendered
system prompt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zbigniewsobiecki zbigniewsobiecki merged commit ea5927f into dev Mar 14, 2026
6 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the fix/backlog-manager-hallucinated-tool-use branch March 14, 2026 08:20