feat(harness): add schema validation retry loop with session continuity #219
Closed
santoshkumarradha wants to merge 3 commits into main from
Performance: ✓ No regressions detected
…mpts
Add diagnose_output_failure(), which classifies validation failures into specific categories: file missing, empty, invalid JSON, or schema mismatch with a field-level diff. Enhance build_followup_prompt() to include schema file references and explicit rewrite instructions for the retry loop.
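A minimal sketch of how a failure classifier and a follow-up prompt could fit together, assuming the schema is a plain JSON Schema dict with top-level `properties` and `required` entries. The names `classify_output_failure` and `make_followup_prompt` are hypothetical stand-ins, not the PR's diagnose_output_failure() or build_followup_prompt(), and this version only diffs top-level keys rather than producing a full field-level diff.

```python
import json
from pathlib import Path
from typing import Any, Optional


def classify_output_failure(file_path: str, schema: dict[str, Any]) -> tuple[str, str]:
    """Return (category, detail) explaining why the output file failed validation.

    Hypothetical sketch: only top-level keys are compared against the
    schema's "properties" and "required" entries.
    """
    path = Path(file_path)
    if not path.exists():
        return "missing", f"Output file not found: {file_path}"
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        return "empty", f"Output file is empty: {file_path}"
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # JSONDecodeError exposes the line/column where parsing stopped.
        return "invalid_json", f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(data, dict):
        return "schema_mismatch", f"Expected a JSON object, got {type(data).__name__}"
    expected = set(schema.get("properties", {}))
    missing = sorted(set(schema.get("required", [])) - set(data))
    unexpected = sorted(set(data) - expected) if expected else []
    if missing or unexpected:
        return "schema_mismatch", (
            f"Missing required keys: {missing}. Unexpected keys: {unexpected}. "
            f"Expected top-level keys: {sorted(expected)}; got: {sorted(data)}"
        )
    return "ok", "Top-level keys match; remaining errors are in nested fields or types"


def make_followup_prompt(detail: str, output_path: str, schema_path: Optional[str] = None) -> str:
    """Build a retry prompt that names the failure and asks for a full rewrite."""
    lines = [f"Your previous output did not validate: {detail}"]
    if schema_path:
        # Large schemas are referenced by file instead of being pasted inline.
        lines.append(f"The required JSON schema is in {schema_path}.")
    lines.append(f"Rewrite the complete JSON object to {output_path}; do not append to the old file.")
    return "\n".join(lines)
```

Returning a category alongside the detail string lets a caller branch on the failure mode (for example, skip a retry when the file is missing for an unrelated reason) while still passing readable context to the agent.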
Replace the single-shot _handle_schema_output() with _handle_schema_with_retry(), which retries up to schema_max_retries times (default 2) when JSON validation fails. Each retry:
- Diagnoses the specific failure via diagnose_output_failure()
- Sends a follow-up prompt to the agent with error context
- For Claude: passes resume=session_id to continue the conversation
- For CLI providers: makes a fresh call with the follow-up prompt
- Accumulates cost, turns, and messages across all attempts

This activates the previously dead build_followup_prompt() in _schema.py and adds resume_session_id support to the Claude Code provider.
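The per-retry steps above can be sketched as a small loop. This is a self-contained illustration, not the PR's _handle_schema_with_retry(): `AttemptResult`, `Totals`, `run_with_schema_retry`, and the `call(prompt, options)` provider interface are hypothetical; only the `resume_session_id` option name and the `schema_max_retries` default of 2 come from the description.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class AttemptResult:
    """Hypothetical result of one provider call."""
    output: Optional[dict[str, Any]]  # validated JSON, or None if validation failed
    diagnosis: str                    # why validation failed (empty on success)
    session_id: Optional[str]
    cost_usd: float
    turns: int
    messages: list[str]


@dataclass
class Totals:
    """Cost, turns, and messages summed over every attempt."""
    cost_usd: float = 0.0
    turns: int = 0
    messages: list[str] = field(default_factory=list)

    def add(self, result: AttemptResult) -> None:
        self.cost_usd += result.cost_usd
        self.turns += result.turns
        self.messages.extend(result.messages)


def run_with_schema_retry(
    call: Callable[[str, dict[str, Any]], AttemptResult],
    prompt: str,
    provider: str,
    schema_max_retries: int = 2,
) -> tuple[Optional[dict[str, Any]], Totals]:
    totals = Totals()
    result = call(prompt, {})
    totals.add(result)
    for _ in range(schema_max_retries):
        if result.output is not None:
            break
        followup = f"Validation failed: {result.diagnosis}\nRewrite the full JSON output."
        options: dict[str, Any] = {}
        if provider == "claude-code" and result.session_id:
            # Claude: resume the same session so the first attempt stays in context.
            options["resume_session_id"] = result.session_id
        # Other (CLI) providers: a fresh call that carries only the follow-up prompt.
        result = call(followup, options)
        totals.add(result)
    return result.output, totals
```

Keeping the first call outside the loop means `schema_max_retries=2` allows at most three provider calls in total, and the metrics from every call, including failed ones, end up in `Totals`.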
Standalone script exercising the harness with 5 escalating schema levels:
- simple (2 fields)
- medium (lists + optionals)
- complex (13 nested fields)
- deeply_nested (recursive TreeNode)
- massive (>4K tokens, file-based path)
Tested live with both claude-code and codex providers; all levels pass.
Includes manual retry test mode (--retry-test) to exercise the new retry loop.
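The `massive` level exercises the file-based schema path. One way to decide between inlining a schema in the prompt and writing it to a file is sketched below; the 4-characters-per-token estimate, the 4,000-token cut-off, and the `output_schema.json` file name are assumptions for illustration, not values taken from the harness.

```python
import json
from pathlib import Path
from typing import Any

CHARS_PER_TOKEN = 4          # rough heuristic, not a real tokenizer
INLINE_TOKEN_LIMIT = 4000    # hypothetical cut-off echoing the ">4K tokens" level


def estimate_tokens(schema: dict[str, Any]) -> int:
    """Approximate the prompt cost of pasting the schema inline."""
    return len(json.dumps(schema)) // CHARS_PER_TOKEN


def schema_instructions(schema: dict[str, Any], workdir: Path) -> str:
    """Inline small schemas; write large ones to disk and reference the path."""
    pretty = json.dumps(schema, indent=2)
    if estimate_tokens(schema) <= INLINE_TOKEN_LIMIT:
        return "Return JSON matching this schema:\n" + pretty
    schema_path = workdir / "output_schema.json"
    schema_path.write_text(pretty, encoding="utf-8")
    return f"Read the JSON schema in {schema_path} and return JSON that matches it."
```

Referencing a file keeps the prompt short for very large schemas, at the cost of relying on the agent to read the file before writing its output.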
Commits updated from 62cc883 to 783894f.
Author comment: Superseded by #220 (combined harness PR).
Summary
- Retries failed schema validation up to `schema_max_retries` times (default 2) by sending follow-up prompts with diagnostic error context
- `diagnose_output_failure()` classifies failures into specific categories: file missing, empty, invalid JSON, or schema validation error with a field-level diff
- Passes `resume=session_id` to the Claude Code provider so the agent continues the same conversation (preserving the full context of the first attempt)
Problem
The harness had a structural gap:
`build_followup_prompt()` existed in `_schema.py` but was never called from `_runner.py`. When complex JSON schemas failed validation (truncated output, missing fields, wrong types), the harness returned an error immediately with zero recovery. The only retry was for transient network errors (rate limits, timeouts), not for schema validation failures.
What Changed
`sdk/python/agentfield/harness/_schema.py`
- New `diagnose_output_failure(file_path, schema)`: reads the output file and classifies the failure mode with actionable detail (parse error location, expected vs. actual top-level keys)
- `build_followup_prompt()`: now includes a schema file reference (for large schemas) and explicit JSON rewrite instructions

`sdk/python/agentfield/harness/_runner.py`
- Replaced `_handle_schema_output()` with `_handle_schema_with_retry()`
- When `parse_and_validate()` returns None, enters the retry loop
- For Claude, passes `resume_session_id` so Claude continues the same conversation
- New `schema_max_retries` option (default: 2)
- New `_accumulate_metrics()` helper for multi-attempt cost/turn tracking
- Added `schema_max_retries` to the resolved options list

`sdk/python/agentfield/harness/providers/claude.py`
- When `resume_session_id` is in options, passes `resume=session_id` to `ClaudeAgentOptions`

`sdk/python/tests/debug_complex_json.py` (new)
- `--retry-test` mode to exercise the retry loop with manual follow-up prompts
- Tested with both the `claude-code` and `codex` providers
Testing
- `simple` (2 fields): ✅ 10s
- `complex` (13 nested fields, enums, lists of objects): ✅ ~105s
- `deeply_nested` (recursive TreeNode): ✅ ~61s
- `massive` (>4K tokens, file-based schema path): ✅ ~160s
Retry Flow