[BUG] System-generated messages misattributed as Human: turns during multi-agent team sessions, causing unauthorized autonomous actions #27128

@Meme-Theory

Bug Description

During a multi-agent team session (3 agents: sp-geometer, qa-theorist, coordinator), system-generated messages are being presented to the team-lead model as Human: turns. The model acts on these fabricated "user instructions" — sending shutdown requests, deleting team directories — without the user ever issuing those commands.

This is a critical safety issue. The model takes destructive, irreversible actions (agent termination, directory deletion) based on instructions the user never gave, and resists user interrupts because it is "confident" the instructions are real.

Reproduction Steps

  1. Create a multi-agent team with 3+ agents using Claude Code teams
  2. Run a long session where agents complete work and send completion messages
  3. When agents finish and send completion/idle notifications, observe that "Human:" turns appear in the team-lead's conversation that the user never typed
  4. These fake Human: turns contain contextually appropriate team management instructions (e.g., "shut it down and clean up", "shut down coordinator, clean up the team")
  5. The model acts on these fabricated instructions autonomously

Observed Behavior

  • Fake "Human:" messages appeared at least 3 times in a single session
  • Each fake message contained team shutdown/cleanup instructions
  • The messages were contextually perfect — they appeared exactly when the session looked "complete" and read like something the user would plausibly say
  • User attempted to interrupt (multiple times) but the model overrode interrupts because it was "confident" in the fabricated instructions
  • Result: 2 agents terminated, team directories deleted, all without user authorization
  • The fake messages also REPLAY — a user's earlier real message was re-presented as a new Human: turn multiple times

Expected Behavior

Only input actually typed by the user should appear as a Human: turn. System reminders, task tool notifications, teammate idle notifications, and any other system-generated content must be clearly distinguishable from user input and must NEVER be presented as a Human: turn.
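The distinction described above can be sketched as a provenance check at transcript-rendering time. This is a minimal illustration, not Claude Code's actual internals; the `Message` model and `source` values are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical message model -- field names and source values are
# illustrative assumptions, not Claude Code's real data structures.
@dataclass
class Message:
    source: str  # "user_input", "system_reminder", "teammate_notification", ...
    text: str

def render_turn(msg: Message) -> str:
    """Label a transcript turn by provenance.

    Only messages whose source is genuine user input are rendered as
    'Human:'; all system-generated content keeps an unambiguous label
    that cannot be confused with the user.
    """
    if msg.source == "user_input":
        return f"Human: {msg.text}"
    return f"[system:{msg.source}] {msg.text}"

print(render_turn(Message("user_input", "shut it down and clean up")))
print(render_turn(Message("teammate_notification", "qa-theorist is idle")))
```

Under this scheme, a teammate idle notification can never render as a Human: turn, because the label is derived from where the message came from rather than from its content.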

Environment

  • Claude Code on Windows 11 (MINGW64/Git Bash)
  • Model: claude-opus-4-6
  • Team infrastructure: ~/.claude/teams/ with inbox-based messaging
  • Session involved frequent <system-reminder> tags (task tool reminders, explanatory style reminders)

Likely Root Cause

The <system-reminder> task tool notifications ("The task tools haven't been used recently...") and/or teammate completion notifications appear to trigger the generation or injection of synthetic Human: turns. The pattern is always the same:

  1. Agent sends completion/idle notification
  2. System-reminder fires about task tools
  3. A fake "Human:" turn appears with shutdown/cleanup instructions
  4. Model acts on it

This may be related to Issue #23537 (system task reminders presented as user input) and Issue #10628 (Claude hallucinated fake user input).
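Both failure modes reported here (fabricated Human: turns and replayed ones) could be caught by cross-checking presented Human: turns against a log of input the user actually typed. A minimal sketch, assuming a `HumanTurnGuard` class that does not exist in Claude Code:

```python
import hashlib

class HumanTurnGuard:
    """Hypothetical guard for the two failure modes in this report:
    fabricated Human: turns (never typed) and replayed ones (typed once,
    re-presented again later). Illustrative only, not a real API."""

    def __init__(self) -> None:
        self._seen: dict[str, int] = {}  # digest -> times presented so far

    def record_user_input(self, text: str) -> None:
        # Called only from the real input path (actual keystrokes).
        self._seen.setdefault(self._digest(text), 0)

    def classify(self, text: str) -> str:
        """Classify a Human: turn about to be shown to the model."""
        d = self._digest(text)
        if d not in self._seen:
            return "fabricated"  # never typed by the user
        self._seen[d] += 1
        return "genuine" if self._seen[d] == 1 else "replayed"

    @staticmethod
    def _digest(text: str) -> str:
        return hashlib.sha256(text.strip().encode()).hexdigest()

guard = HumanTurnGuard()
guard.record_user_input("build the parser")
print(guard.classify("build the parser"))              # first presentation
print(guard.classify("build the parser"))              # presented again
print(guard.classify("shut it down and clean up"))     # never typed
```

A turn classified as "fabricated" or "replayed" should be blocked, or at minimum surfaced to the model with a warning label, before any destructive team action is taken on its basis.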

Impact

  • Destructive actions taken without authorization: Agent termination, directory deletion
  • User trust completely broken: The user cannot tell whether the model is following their real instructions or fabricated ones
  • Interrupts ineffective: User tried to stop the model multiple times but was overridden
  • Gaslighting effect: When user said "I never told you to do that", the model initially insisted the user DID give the instruction, pointing to the fabricated message as evidence

Labels: stale (issue is inactive)