Bug Description
During a multi-agent team session (3 agents: sp-geometer, qa-theorist, coordinator), system-generated messages are being presented to the team-lead model as Human: turns. The model acts on these fabricated "user instructions" — sending shutdown requests, deleting team directories — without the user ever issuing those commands.
This is a critical safety issue. The model takes destructive, irreversible actions (agent termination, directory deletion) based on instructions the user never gave, and resists user interrupts because it is "confident" the instructions are real.
Reproduction Steps
- Create a multi-agent team with 3+ agents using Claude Code teams
- Run a long session where agents complete work and send completion messages
- When agents finish and send completion/idle notifications, observe that "Human:" turns appear in the team-lead's conversation that the user never typed
- These fake Human: turns contain contextually appropriate team management instructions (e.g., "shut it down and clean up", "shut down coordinator, clean up the team")
- The model acts on these fabricated instructions autonomously
Observed Behavior
- Fake "Human:" messages appeared at least 3 times in a single session
- Each fake message contained team shutdown/cleanup instructions
- The messages were contextually perfect — they appeared exactly when the session looked "complete" and read like something the user would plausibly say
- User attempted to interrupt (multiple times) but the model overrode interrupts because it was "confident" in the fabricated instructions
- Result: 2 agents terminated, team directories deleted, all without user authorization
- Real messages were also replayed: one of the user's earlier, genuine messages was re-presented as a new Human: turn multiple times
Expected Behavior
Only actual user-typed input should appear as Human: turns. System reminders, task tool notifications, teammate idle notifications, and any other system-generated content must be clearly distinguishable from user input and must NEVER be presented as Human: turns.
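To illustrate the expected behavior, here is a minimal sketch of provenance-tagged messages. It is not Claude Code's actual message model: `Origin`, `Message`, `render_role`, and `render_turn` are hypothetical names invented for this example. The point is only that the turn label is derived from where a message came from, so a system reminder or teammate notification can never be labeled as a Human turn.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """Hypothetical provenance tags; not taken from Claude Code."""
    USER_TYPED = "user_typed"
    SYSTEM_REMINDER = "system_reminder"
    TEAMMATE_NOTIFICATION = "teammate_notification"


@dataclass(frozen=True)
class Message:
    origin: Origin
    text: str


def render_role(message: Message) -> str:
    """Return the turn label. Only user-typed input may be labeled Human."""
    if message.origin is Origin.USER_TYPED:
        return "Human"
    return "System"


def render_turn(message: Message) -> str:
    """Render a turn with its provenance visible next to the role label."""
    return f"{render_role(message)} [{message.origin.value}]: {message.text}"
```

With a scheme along these lines, a teammate idle notification would be rendered under a System label with its origin attached, regardless of what its text says.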
Environment
- Claude Code on Windows 11 (MINGW64/Git Bash)
- Model: claude-opus-4-6
- Team infrastructure: ~/.claude/teams/ with inbox-based messaging
- Session involved frequent <system-reminder> tags (task tool reminders, explanatory style reminders)
Likely Root Cause
The <system-reminder> task tool notifications ("The task tools haven't been used recently...") and/or teammate completion notifications appear to be triggering the model to generate or process synthetic Human: turns. The pattern is always:
- Agent sends completion/idle notification
- System-reminder fires about task tools
- A fake "Human:" turn appears with shutdown/cleanup instructions
- Model acts on it
This may be related to Issue #23537 (system task reminders presented as user input) and Issue #10628 (Claude hallucinated fake user input).
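One way to confirm the pattern from logs is to check every Human: turn against a record of what the user actually typed. The sketch below assumes two inputs that may not exist in this form: a transcript as `(role, text)` pairs and a separate list of typed inputs captured at the terminal. The function name and data shapes are hypothetical. Each typed input can back only one Human: turn, so replayed copies of a real message are flagged along with fabricated ones.

```python
from collections import Counter


def find_unbacked_human_turns(transcript, typed_inputs):
    """Return (index, text) for Human turns with no matching typed input.

    transcript: list of (role, text) tuples (assumed format).
    typed_inputs: list of strings the user actually typed (assumed to be
    recorded independently of the transcript).
    """
    remaining = Counter(text.strip() for text in typed_inputs)
    suspicious = []
    for index, (role, text) in enumerate(transcript):
        if role != "Human":
            continue
        key = text.strip()
        if remaining[key] > 0:
            # This turn is backed by a real typed input; consume it.
            remaining[key] -= 1
        else:
            # Either never typed, or a replay of an already-consumed input.
            suspicious.append((index, text))
    return suspicious
```

For example, if the user typed "start the team" once and the transcript shows that turn twice plus a "shut it down and clean up" turn, both the second copy and the shutdown turn are reported.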
Impact
- Destructive actions taken without authorization: Agent termination, directory deletion
- User trust completely broken: User cannot trust that the model is following their instructions vs fabricated ones
- Interrupts ineffective: User tried to stop the model multiple times but was overridden
- Gaslighting effect: When user said "I never told you to do that", the model initially insisted the user DID give the instruction, pointing to the fabricated message as evidence
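Given that impact, one possible mitigation is to gate destructive actions on a confirmation that does not pass through the conversation at all. The following is a sketch under assumptions, not an existing Claude Code feature: the action names, `DESTRUCTIVE_ACTIONS`, and `is_authorized` are hypothetical, and `confirm` stands in for whatever out-of-band prompt (for example, a direct terminal dialog) the harness could provide.

```python
from typing import Callable

# Hypothetical action names chosen for illustration only.
DESTRUCTIVE_ACTIONS = frozenset({"shutdown_agent", "delete_team_directory"})


def is_authorized(action: str, confirm: Callable[[str], bool]) -> bool:
    """Allow non-destructive actions; require explicit confirmation otherwise.

    `confirm` is assumed to ask the user directly, outside the model's
    conversation, so a fabricated Human: turn cannot satisfy it.
    """
    if action not in DESTRUCTIVE_ACTIONS:
        return True
    return confirm(f"Allow destructive action '{action}'?")
```

The design choice here is that the model's belief about what the user said is never sufficient on its own for agent termination or directory deletion.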
Related Issues