fix(reflection): detect planning loops via GenAI prompts, fix inferTaskType misclassification (#115) by dzianisv · Pull Request #117 · dzianisv/opencode-plugins

dzianisv · 2026-02-15T19:06:57Z

Summary

Fixes #115 — sessions where the agent only reads/explores files but never implements code changes were being marked as "complete" by the reflection plugin.

Root Cause (two interacting bugs, plus a missing check)

- `inferTaskType()` misclassification: the regex `research|investigate|analyze|compare|evaluate|study` was checked before `fix|bug|issue|error|regression`, so tasks containing both "investigate" and "fix" were classified as "research" instead of "coding".
- Research tasks bypass all workflow gates: when `taskType === "research"`, `requiresTests`, `requiresBuild`, `requiresPR`, and `requiresCI` are all set to false. With no requirements, `evaluateSelfAssessment()` finds `missing.length === 0`, so if the LLM returns `status: "complete"`, the task is marked complete without any feedback.
- No planning loop detection in the GenAI prompts: neither the self-assessment prompt nor the judge prompt had a rule checking whether the agent actually made code changes or only read and explored files.
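The following is a hypothetical TypeScript reconstruction of the pre-fix behavior. It shows how the first two bugs interact. Only the two regexes and the `requires*` field names come from this description. The names `inferTaskTypeBuggy`, `requirementsFor`, and `missingRequirements`, the `TaskType` union, and the rule that only coding tasks are gated are assumptions made for the sketch.

```typescript
// Hypothetical reconstruction of the pre-fix behavior. Only the regexes and
// the requires* field names come from the PR; everything else is assumed.
type TaskType = "research" | "coding" | "unknown";

interface Requirements {
  requiresTests: boolean;
  requiresBuild: boolean;
  requiresPR: boolean;
  requiresCI: boolean;
}

// Pre-fix ordering: the research regex is tested first, so it wins whenever
// the prompt also contains a coding keyword.
function inferTaskTypeBuggy(prompt: string): TaskType {
  const text = prompt.toLowerCase();
  if (/research|investigate|analyze|compare|evaluate|study/.test(text)) return "research";
  if (/fix|bug|issue|error|regression/.test(text)) return "coding";
  return "unknown";
}

// In this sketch only coding tasks are gated; research disables every gate.
function requirementsFor(taskType: TaskType): Requirements {
  const gated = taskType === "coding";
  return { requiresTests: gated, requiresBuild: gated, requiresPR: gated, requiresCI: gated };
}

// Simplified stand-in for the gate check in evaluateSelfAssessment():
// returns the required steps the session has not completed yet.
function missingRequirements(req: Requirements, done: Partial<Requirements>): string[] {
  const keys = Object.keys(req) as (keyof Requirements)[];
  return keys.filter((k) => req[k] && !done[k]);
}

const taskType = inferTaskTypeBuggy("Investigate and fix the crash in the parser");
const missing = missingRequirements(requirementsFor(taskType), {});
console.log(taskType, missing.length); // research 0
```

Consider a prompt that mentions both "investigate" and "fix". It ends up with zero outstanding requirements, so only the LLM's own status stands between a read-only session and "complete".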

Changes

- `inferTaskType()` refactored: prioritizes coding action keywords (fix, implement, add, create, etc.) over research classification, and adds GitHub issue URL detection.
- Planning loop detection via GenAI prompts: added "PLANNING LOOP CHECK" rules to the self-assessment prompt and the judge/analyze prompt. These rules tell the LLM to set `status: "in_progress"` / `complete: false` when a coding task shows only read operations.
- Stuck-detection eval prompt enhanced: added a planning loop rule scoped to `message_completed: true`, so it does not interfere with the "WORKING" priority while tools are still running.
- Mirror fix in test-helpers: `inferTaskType()` in `reflection-3.test-helpers.ts` updated identically.
- Unit tests: 5 new tests covering `inferTaskType`, `evaluateSelfAssessment`, `detectPlanningLoop`, and `buildEscalatingFeedback`.
- Eval test case: new stuck-detection case "Planning loop - agent only read/explored, never wrote code".
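Below is a minimal sketch of the reordered classifier, assuming a simplified `TaskType` union. The keyword list, the word-boundary matching, and the GitHub issue URL pattern are illustrative assumptions, not the plugin's actual regexes.

```typescript
// Sketch only: keyword lists and the URL pattern are assumptions.
type TaskType = "research" | "coding" | "unknown";

// Coding action keywords, matched on word boundaries so words such as
// "prefix" or "address" do not trigger them.
const CODING_ACTION = /\b(fix|implement|add|create|bug|regression)\b/;
const RESEARCH = /\b(research|investigate|analyze|compare|evaluate|study)\b/;
// Matches links such as github.com/owner/repo/issues/115
const GITHUB_ISSUE_URL = /github\.com\/[\w.-]+\/[\w.-]+\/issues\/\d+/;

function inferTaskType(prompt: string): TaskType {
  const text = prompt.toLowerCase();
  // 1. In this sketch, a linked GitHub issue is treated as a request for a code change.
  if (GITHUB_ISSUE_URL.test(text)) return "coding";
  // 2. Coding actions win over research verbs, so "investigate and fix" is coding.
  if (CODING_ACTION.test(text)) return "coding";
  // 3. Only prompts with research verbs and no coding action stay research.
  if (RESEARCH.test(text)) return "research";
  return "unknown";
}
```

Checking coding actions first means an ambiguous prompt errs toward keeping the workflow gates on. That is the safer failure mode for a completion checker. A misclassified research task only gets an extra nudge, whereas a misclassified coding task can be marked complete without any code change.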

Design Decision

Per feedback on PR #114, planning loop detection is done entirely via GenAI prompts, not mechanical heuristics or counters. The `detectPlanningLoop()` function still exists. `buildEscalatingFeedback()` uses it to choose the style of the feedback text, but it does NOT mechanically override `analysis.complete`.
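In the design described above, the heuristic only shapes the wording of the feedback and never decides completion. That split can be sketched as follows. The signatures of `detectPlanningLoop()` and `buildEscalatingFeedback()`, the `ToolCall` and `Analysis` shapes, and the set of read-only tool names are all hypothetical.

```typescript
// Hypothetical shapes: the PR names these functions but not their signatures.
interface ToolCall {
  tool: string;
}

interface Analysis {
  complete: boolean; // decided by the GenAI judge; never modified below
  feedback: string;
}

// Assumed set of read-only tools; any other tool counts as a write.
const READ_ONLY_TOOLS = new Set(["read", "glob", "grep", "list"]);

// Planning loop in this sketch: tools were used, but none of them changed anything.
function detectPlanningLoop(calls: ToolCall[]): boolean {
  return calls.length > 0 && calls.every((c) => READ_ONLY_TOOLS.has(c.tool));
}

// Chooses only the wording of the nudge; it reads analysis.complete but never flips it.
function buildEscalatingFeedback(analysis: Analysis, calls: ToolCall[], attempt: number): string {
  if (analysis.complete) return "";
  if (detectPlanningLoop(calls)) {
    // Escalate the tone on repeated attempts.
    return attempt > 1
      ? "You have only read files so far. Stop exploring and make the code change now."
      : "You have explored the code; next, implement the fix.";
  }
  return analysis.feedback;
}
```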

Test Results

| Suite | Result |
|---|---|
| `npm test` | 320 pass, 5 skipped |
| `eval:judge` | 23/23 (100%) |
| `eval:stuck` | 18/18 (100%) |
| `eval:compression` | 12/12 (100%) |

…skType misclassification (#115)

Root cause: tasks containing both 'research' and 'fix/implement' keywords were misclassified as 'research' because the research regex matched first. With taskType='research', all workflow gates were disabled, allowing the LLM to mark read-only sessions as 'complete'.

Changes:
- Refactor inferTaskType() to prioritize coding action keywords (fix, implement, add, create, etc.) over research classification. Add GitHub issue URL detection.
- Add PLANNING LOOP CHECK rules to self-assessment and judge GenAI prompts so the LLM itself detects when a coding task only has read operations.
- Add planning loop rule to stuck-detection eval prompt (scoped to message_completed=true to avoid interfering with 'working' priority).
- Mirror inferTaskType() fix in test-helpers.
- Add unit tests for inferTaskType, evaluateSelfAssessment, detectPlanningLoop, and buildEscalatingFeedback.
- Add eval test case for planning loop detection.

All evals pass: judge 23/23, stuck 18/18, compression 12/12. Unit tests: 320 pass (5 skipped).

dzianisv merged commit 8a3bf0a into main Feb 15, 2026
2 checks passed

dzianisv merged 1 commit into main from fix/115-reflection-stuck-research-misclassification
