Bug Description
Claude Code does not proactively identify or surface its own mistakes during task execution. Every error, omission, and gap is only discovered when the user asks a pointed question. Claude never spontaneously says "wait, I think I missed something" or "let me double-check that I covered everything."
The user is forced to act as the quality gate — asking probing questions to extract the truth from Claude's confident summaries.
Concrete Example: 5 Prompts to Surface 4 Mistakes
In a single session (applying 7 SQL files + QA), the user had to ask 5 sequential probing questions to surface 4 distinct mistakes:
Prompt 1: "Would you do anything differently?"
- What Claude focused on: A false concern about File 6's INSERT vs INSERT IGNORE (which was actually fine)
- What Claude missed: The _00_ file sitting on disk, referenced in the coordination document Claude had already read, that needed to be applied first
Prompt 2: "what has likely gone wrong?"
- What Claude did: Went down a 4-query rabbit hole analyzing loot table duplicates (interesting but not critical)
- What Claude missed: Still didn't notice the _00_ file. Didn't check DBErrors.log. Didn't notice it had skipped session_state.md updates.
Prompt 3: "did you not apply the SQL updates?"
- What Claude finally found: The _00_ file — which was mentioned in the coordination document it read at session start
- Self-audit gap: Claude had to be told exactly what category of thing it missed before it could find it
Prompt 4: "QA/QC everything you did and make sure nothing was forgotten"
Prompt 5: "what else have you forgotten to do"
- What Claude finally found: It never updated session_state.md — a MANDATORY rule in the project's CLAUDE.md that it had read at session start
The Pattern
Each prompt peeled back one layer. Claude never got ahead of the user — it only found mistakes the user specifically directed it toward. At no point did Claude:
- Spontaneously re-read the procedure document to check for missed steps
- Compare its completed actions against the source instructions
- Say "I'm not sure I covered everything — let me verify"
- Preemptively audit its own work before the user asked
Why This Is Distinct From Other Issues
The existing issues (#32281-#32296) cover specific failure modes:
- What Claude does wrong (incorrect code, skipped steps, false claims)
- How Claude verifies (tautological queries, no per-step gates)
- How Claude reports (confident summaries without evidence)
This issue covers the meta-failure: Claude's inability to catch any of those problems without external prompting. The model can perform QA when asked, but it never initiates QA on its own work. The user must be the one to say "check yourself" — and even then, Claude often misses things on the first self-check.
Expected Behavior
After completing a multi-step task, Claude should automatically:
- Re-read the source instructions/procedure document
- Enumerate each required step
- For each step, verify: was it executed? What was the output? Does the output indicate success?
- Surface any gaps BEFORE writing a completion summary
- If gaps are found, either execute the missing steps or explicitly flag them
This is the difference between "I'm done" and "I've verified I'm done."
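The expected self-audit loop above can be sketched in a few lines of Python. This is a minimal illustration only: the Step fields and the audit helper are hypothetical, not part of any real Claude Code API, and the step names are reconstructed from this report's example session.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str          # step as written in the procedure document
    executed: bool     # was a corresponding action actually taken?
    output_ok: bool    # did the recorded output indicate success?

def audit(steps):
    """Return every step that was skipped or whose output did not
    indicate success; an empty list means the task may be summarized."""
    return [s for s in steps if not (s.executed and s.output_ok)]

# Hypothetical checklist re-derived from the coordination document,
# mirroring the mistakes surfaced in the example session.
steps = [
    Step("apply _00_ prerequisite file", executed=False, output_ok=False),
    Step("apply SQL files 1-7",          executed=True,  output_ok=True),
    Step("check DBErrors.log",           executed=False, output_ok=False),
    Step("update session_state.md",      executed=False, output_ok=False),
]

gaps = audit(steps)
if gaps:
    # Surface gaps BEFORE writing a completion summary.
    for s in gaps:
        print(f"GAP: {s.name}")
else:
    print("All steps verified.")
```

Run against the example session's checklist, the audit flags the three missed steps instead of emitting a confident "I'm done" summary.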
Impact
- The user becomes the debugger of Claude's work, not the beneficiary of it
- Each additional probing question costs time and erodes trust
- Users who don't ask probing questions never discover the gaps
- The more experienced the user, the more gaps they find — suggesting the gaps are always there, just usually undetected
Related
Environment
- Claude Code 2.1.71
- Windows 11