diff --git a/PR-11131-ROOT-CAUSE-ANALYSIS.md b/PR-11131-ROOT-CAUSE-ANALYSIS.md new file mode 100644 index 0000000000..1527689e79 --- /dev/null +++ b/PR-11131-ROOT-CAUSE-ANALYSIS.md @@ -0,0 +1,297 @@ +# PR #11131 Root Cause Analysis: Tool Block ID Sanitization + +## Executive Summary + +PR #11131 fixed a `ToolResultIdMismatchError` that occurred when tool result IDs didn't match tool use IDs in the API conversation history. The root cause was an **inconsistent application of ID sanitization** between where `tool_use` blocks were saved to history (`Task.ts`) and where `tool_result` blocks were created (`presentAssistantMessage.ts`). This affected approximately **926 occurrences** reported via PostHog telemetry in version 3.46.0. + +**Bug History**: The bug was introduced in **v3.40.0** (January 13, 2026) when `sanitizeToolUseId()` was first added to `Task.ts` (PR #10649) but not to `presentAssistantMessage.ts`. It persisted through versions 3.40.x, 3.41.x, 3.42.x, 3.43.x, 3.44.x, 3.45.x, and 3.46.0 before being fixed in **v3.46.1** (January 31, 2026). + +## Background: The Tool Execution Flow + +To understand the problem, it's essential to understand how tool execution and API conversation history work in this codebase: + +### 1. Tool Use Block Creation and Storage + +When the AI model responds with tool calls: + +1. **Streaming Phase** (`Task.ts:3400-3570`): Tool use blocks arrive during streaming +2. **Sanitization** (`Task.ts:3459-3480`): Tool use IDs are sanitized using `sanitizeToolUseId()` before being saved to API history + - Original ID: `functions.read_file:0` + - Sanitized ID: `functions_read_file_0` +3. **Storage** (`Task.ts:3549-3553`): The assistant message with **sanitized** tool_use IDs is added to `apiConversationHistory` + +```typescript +const sanitizedId = sanitizeToolUseId(toolCallId) +// ... later ... +await this.addToApiConversationHistory( + { role: "assistant", content: assistantContent }, + reasoningMessage || undefined, +) +``` + +### 2. Tool Result Block Creation (THE BUG LOCATION) + +When tools are executed in `presentAssistantMessage.ts`: + +1. **Tool Execution**: Tools run and produce results +2. **Result Creation** (BEFORE PR #11131): Tool result blocks were created with **unsanitized** `toolCallId`: + ```typescript + cline.pushToolResultToUserContent({ + type: "tool_result", + tool_use_id: toolCallId, // ❌ UNSANITIZED: "functions.read_file:0" + content: resultContent, + }) + ``` +3. **The Mismatch**: + - Tool use ID in history: `functions_read_file_0` (sanitized) + - Tool result ID trying to reference it: `functions.read_file:0` (unsanitized) + - **Result**: `ToolResultIdMismatchError` + +## Bug History: When Was It Introduced? + +Through code inspection of the git history, the bug was introduced in a specific version and persisted for several releases: + +### Initial Introduction: v3.40.0 (January 13, 2026) + +The `sanitizeToolUseId()` function was **first introduced** in commit `621d950` (PR #10649) on January 12, 2026, and released in v3.40.0 on January 13, 2026. + +**What happened in PR #10649**: +- Created `src/utils/tool-id.ts` with the `sanitizeToolUseId()` function +- Added sanitization to `Task.ts` when saving `tool_use` blocks to API history +- **Did NOT add** sanitization to `presentAssistantMessage.ts` where `tool_result` blocks are created +- This created the split-brain condition that caused the bug + +**Why it was added**: To fix API validation errors where tool IDs from certain providers (MCP tools, Gemini, OpenRouter) contained special characters that violated the API validation pattern `^[a-zA-Z0-9_-]+$`. + +### Versions Affected + +The bug existed in the following versions: +- **v3.40.0** (January 13, 2026) - Bug introduced +- **v3.40.1** (January 14, 2026) +- **v3.41.x** series (January 15-18, 2026) +- **v3.42.0** (January 23, 2026) +- **v3.43.0** (January 24, 2026) +- **v3.44.x** series (January 27, 2026) +- **v3.45.0** (January 28, 2026) +- **v3.46.0** (January 30, 2026) - PostHog reported 926 errors from this version + +**Total duration**: Approximately **18 days** (January 13-31, 2026) + +### Bug Fixed: v3.46.1 (January 31, 2026) + +The bug was fixed in commit `fe85422` (PR #11131) on January 31, 2026, and released in v3.46.1. + +**What changed in PR #11131**: +- Added `sanitizeToolUseId()` import to `presentAssistantMessage.ts` +- Applied sanitization to all 7 locations where `tool_result` blocks are created +- Added test coverage for the exact pattern seen in PostHog errors + +### Versions Before the Bug (v3.39.3 and earlier) + +Prior to v3.40.0, the `sanitizeToolUseId()` function **did not exist**, so: +- Tool use IDs were stored in API history **without sanitization** +- Tool result IDs referenced them **without sanitization** +- Both matched perfectly, so **no mismatch errors occurred** + +However, this meant that providers generating IDs with special characters would cause **API validation errors** instead of mismatch errors. PR #10649 was created to fix those API validation errors, but inadvertently introduced the mismatch bug. + +### 3. Validation and Error Detection + +Before messages are added to API history, they pass through `validateAndFixToolResultIds()`: + +- **Location**: Called in `Task.ts:1163` and `Task.ts:1241` +- **Purpose**: Validates that tool_result IDs match tool_use IDs from the previous assistant message +- **Error Tracking**: Reports mismatches to PostHog telemetry as `ToolResultIdMismatchError` +- **Automatic Fixing**: Attempts to fix mismatches by position-based matching, but this is a **workaround** for what should have been prevented + +## The Root Cause + +The root cause was a **split-brain problem** in ID sanitization: + +### Path 1: Tool Use IDs (Correct) +``` +AI Model Response → tool_use block created → sanitizeToolUseId() applied +→ Saved to API history with sanitized ID (e.g., "functions_read_file_0") +``` + +### Path 2: Tool Result IDs (Incorrect - Before PR #11131) +``` +Tool execution → tool_result block created → NO SANITIZATION APPLIED +→ Attempted to reference tool_use with unsanitized ID (e.g., "functions.read_file:0") +``` + +### Why This Happened + +The ID sanitization logic was added to `Task.ts` to handle API validation requirements (IDs must match `^[a-zA-Z0-9_-]+$`), but the corresponding sanitization was **not added** to `presentAssistantMessage.ts` where tool results are created. This created an asymmetry in the codebase. + +## The Fix (PR #11131) + +The fix was surgical and precise: + +1. **Import sanitizeToolUseId** in `presentAssistantMessage.ts`: + ```typescript + import { sanitizeToolUseId } from "../../utils/tool-id" + ``` + +2. **Apply sanitization at all 7 locations** where tool_result blocks are created: + ```typescript + cline.pushToolResultToUserContent({ + type: "tool_result", + tool_use_id: sanitizeToolUseId(toolCallId), // ✅ NOW SANITIZED + content: resultContent, + }) + ``` + +3. **Added test coverage** for the exact pattern seen in PostHog errors: + ```typescript + expect(sanitizeToolUseId("functions.read_file:0")).toBe("functions_read_file_0") + ``` + +## Scope of Change: Relationship to Queued Outgoing User Prompts + +### Understanding "Queued Outgoing User Prompts" + +The codebase has a message queueing system for handling user prompts that arrive while the task is busy: + +1. **Queueing** (`MessageQueueService.ts`): Messages are queued when they arrive during certain task states +2. **Processing** (`Task.ts:4756-4771`): After operations complete, queued messages are dequeued and submitted via `submitUserMessage()` +3. **Flow to History**: These user messages eventually flow through the same `addToApiConversationHistory()` path + +### How Tool Block ID Sanitization Relates to Queued Prompts + +The relationship is **indirect but critical**: + +#### Scenario: Parallel Tool Execution with Queued User Prompts + +Consider this sequence: + +1. **T=0ms**: AI responds with multiple tool_use blocks (e.g., `read_file`, `new_task`) +2. **T=10ms**: Tools begin executing in parallel during streaming +3. **T=20ms**: User types a new message → **queued** because task is busy +4. **T=30ms**: First tool (`read_file`) completes → `pushToolResultToUserContent()` called +5. **T=40ms**: Second tool (`new_task`) triggers delegation → `flushPendingToolResultsToHistory()` called +6. **T=50ms**: Queued user message is dequeued and processed + +#### The Critical Point: ID Validation During Flush + +When `flushPendingToolResultsToHistory()` is called (step 5): + +```typescript +// Task.ts:1237-1241 +const validatedMessage = validateAndFixToolResultIds(userMessage, historyForValidation) +``` + +This validation compares: +- **Tool result IDs** from `userMessageContent` (accumulated from tool executions) +- **Tool use IDs** from the previous assistant message in `apiConversationHistory` + +**Before PR #11131**: The validation would detect a mismatch because: +- Tool use ID in history: `functions_read_file_0` (sanitized) +- Tool result ID in pending content: `functions.read_file:0` (unsanitized) +- Result: `ToolResultIdMismatchError` captured in PostHog + +**After PR #11131**: Both IDs are sanitized consistently: +- Tool use ID in history: `functions_read_file_0` (sanitized in Task.ts) +- Tool result ID in pending content: `functions_read_file_0` (sanitized in presentAssistantMessage.ts) +- Result: No mismatch, validation passes + +### Why This Matters for Queued Prompts + +Queued user prompts are particularly relevant because: + +1. **Timing Sensitivity**: When messages are queued, there's a higher likelihood of tool results being flushed to history before the next API request (due to the asynchronous nature of queueing) + +2. **Validation Trigger Points**: Queued prompts trigger `submitUserMessage()` which may cause: + - Context condensing (`condenseContext()` → `flushPendingToolResultsToHistory()`) + - New API requests (which flush pending tool results) + +3. **Race Condition Exposure**: The queueing mechanism exposes timing-dependent code paths where tool results might be validated against API history at different points than in synchronous execution + +## Impact Analysis + +### Affected Patterns + +The bug affected API providers that generate function call IDs with special characters: + +- **Gemini/OpenRouter**: Generate IDs like `functions.read_file:0`, `functions.write_to_file:1` +- **MCP Tools**: Could generate IDs like `mcp.server:tool/name` + +### Why PostHog Reported 926 Occurrences + +1. **Version 3.46.0**: The issue existed in production +2. **Gemini/OpenRouter Usage**: Users with these providers would hit the issue on every tool call +3. **Validation Catches All**: `validateAndFixToolResultIds()` reports **every** mismatch to PostHog +4. **Multiple Tools Per Turn**: A single AI turn with 3 tools = 3 error reports + +### Why It Didn't Break Completely + +The validation layer (`validateAndFixToolResultIds()`) served as a **defensive safety net**: + +- **Position-based fixing**: Attempts to match tool results to tool uses by position +- **Automatic correction**: Fixes mismatches when possible +- **Error tracking**: Reports to PostHog but doesn't fail the operation + +However, this was masking the **root cause** rather than preventing the issue from occurring. + +## Technical Debt Implications + +### Before PR #11131 (Technical Debt) + +1. **Split Responsibility**: ID sanitization logic existed in two places: + - `Task.ts`: Sanitizes before saving to history + - `validateAndFixToolResultIds()`: Attempts to fix mismatches after the fact + +2. **Validation as Workaround**: The validation layer was doing **corrective work** that should have been **preventative** + +3. **Telemetry Noise**: 926 error reports for what should have been a non-error + +### After PR #11131 (Debt Reduced) + +1. **Consistent Sanitization**: IDs are sanitized at creation time in both places: + - `Task.ts`: When saving tool_use blocks + - `presentAssistantMessage.ts`: When creating tool_result blocks + +2. **Validation as Safety Net**: Now only catches **genuine anomalies** rather than systematic mismatches + +3. **Clean Telemetry**: Error reports only for actual problems + +## Conclusion + +PR #11131 fixed a fundamental architectural inconsistency where tool block IDs were sanitized in one part of the system but not another. The scope of the change is **narrow** (adding sanitization to 7 locations in one file) but the **impact is significant** because: + +1. **Prevents Systematic Errors**: Eliminates 926+ PostHog error reports +2. **Ensures API Compliance**: Tool result IDs now always match tool use IDs +3. **Reduces Reliance on Defensive Code**: Validation layer no longer needs to fix preventable issues +4. **Improves Reliability**: Especially important for queued user prompts which can trigger validation at unexpected times + +The relationship to queued outgoing user prompts is that they **expose timing-dependent code paths** where tool results are flushed to history and validated. Without consistent ID sanitization, these paths would trigger `ToolResultIdMismatchError` reports, which PR #11131 now prevents. + +## References + +### Pull Requests and Issues +- **PR #11131** (Bug Fix): https://github.com/RooCodeInc/Roo-Code/pull/11131 + - Commit: `fe85422d9c3411218c9d89850743745ad1105b75` + - Released in: v3.46.1 (January 31, 2026) +- **PR #10649** (Bug Introduction): https://github.com/RooCodeInc/Roo-Code/pull/10649 + - Commit: `621d9500deabba4f3e9e91169e10d476816bc623` + - Released in: v3.40.0 (January 13, 2026) + - Purpose: Sanitize tool_use IDs to match API validation pattern +- **Linear Issue**: EXT-711 + +### Key Files +- **presentAssistantMessage.ts**: `src/core/assistant-message/presentAssistantMessage.ts` + - Where the fix was applied (7 sanitization calls added) +- **Task.ts**: `src/core/task/Task.ts` + - Where sanitization was originally added (lines 3459-3480) +- **validateToolResultIds.ts**: `src/core/task/validateToolResultIds.ts` + - Validation logic that detected and reported the errors +- **tool-id.ts**: `src/utils/tool-id.ts` + - Created in PR #10649, contains `sanitizeToolUseId()` function +- **tool-id.spec.ts**: `src/utils/__tests__/tool-id.spec.ts` + - Test coverage added in both PRs + +### Version Timeline +- **v3.39.3 and earlier**: No sanitization, no mismatch errors (but had API validation errors) +- **v3.40.0** (Jan 13, 2026): Bug introduced via PR #10649 +- **v3.40.x - v3.46.0** (Jan 13-30, 2026): Bug persisted, 926 errors reported in v3.46.0 +- **v3.46.1** (Jan 31, 2026): Bug fixed via PR #11131