Skip to content

feat(harness): add schema validation retry loop with session continuity#219

Closed
santoshkumarradha wants to merge 3 commits intomainfrom
fix/harness-json-retry
Closed

feat(harness): add schema validation retry loop with session continuity#219
santoshkumarradha wants to merge 3 commits intomainfrom
fix/harness-json-retry

Conversation

@santoshkumarradha
Copy link
Copy Markdown
Member

Summary

  • Add schema validation retry loop — when JSON output fails validation, the harness now retries up to schema_max_retries times (default 2) by sending follow-up prompts with diagnostic error context
  • Add output failure diagnosisdiagnose_output_failure() classifies failures into specific categories: file missing, empty, invalid JSON, or schema validation error with field-level diff
  • Add Claude session continuity — on retry, passes resume=session_id to the Claude Code provider so the agent continues the same conversation (preserving full context of the first attempt)

Problem

The harness had a structural gap: build_followup_prompt() existed in _schema.py but was never called from _runner.py. When complex JSON schemas failed validation (truncated output, missing fields, wrong types), the harness returned an error immediately with zero recovery. The only retry was for transient network errors (rate limits, timeouts), not for schema validation failures.

What Changed

sdk/python/agentfield/harness/_schema.py

  • Added diagnose_output_failure(file_path, schema) — reads the output file and classifies the failure mode with actionable detail (parse error location, expected vs actual top-level keys)
  • Enhanced build_followup_prompt() — now includes schema file reference (for large schemas) and explicit JSON rewrite instructions

sdk/python/agentfield/harness/_runner.py

  • Replaced _handle_schema_output() with _handle_schema_with_retry():
    • After initial execution, if parse_and_validate() returns None, enters retry loop
    • Each retry: diagnoses failure → builds follow-up prompt → calls provider again → re-validates
    • Passes resume_session_id so Claude continues the same conversation
    • Accumulates costs, turns, and messages across all attempts
    • Configurable via schema_max_retries option (default: 2)
  • Added _accumulate_metrics() helper for multi-attempt cost/turn tracking
  • Added schema_max_retries to the resolved options list

sdk/python/agentfield/harness/providers/claude.py

  • When resume_session_id is in options, passes resume=session_id to ClaudeAgentOptions

sdk/python/tests/debug_complex_json.py (new)

  • Standalone debug script with 5 escalating schema complexity levels (simple → massive >4K tokens)
  • Tests both inline and file-based schema paths
  • Includes --retry-test mode to exercise the retry loop with manual follow-up prompts
  • Tested live with both claude-code and codex providers

Testing

  • 651 unit tests pass (0 failures, 0 regressions)
  • Live tests with real agents: all 5 schema levels pass with both Claude Code and Codex
    • simple (2 fields): ✅ 10s
    • complex (13 nested fields, enums, lists of objects): ✅ ~105s
    • deeply_nested (recursive TreeNode): ✅ ~61s
    • massive (>4K tokens, file-based schema path): ✅ ~160s
# Run the debug script
cd sdk/python
.venv/bin/python tests/debug_complex_json.py --provider claude-code --test complex
.venv/bin/python tests/debug_complex_json.py --provider codex --test massive
.venv/bin/python tests/debug_complex_json.py --provider claude-code --test all --retry-test

Retry Flow

1. Agent executes prompt → writes JSON to .agentfield_output.json
2. parse_and_validate() → Layer 1 (direct parse) → Layer 2 (cosmetic repair)
3. If validation fails:
   a. diagnose_output_failure() → "JSON parses but fails schema validation: missing field 'score'"
   b. build_followup_prompt() → "Error: ... Rewrite the COMPLETE, corrected JSON to: {path}"
   c. provider.execute(followup, {resume_session_id: session_id}) → agent fixes the file
   d. parse_and_validate() again → success or retry
4. Repeat up to schema_max_retries times (default 2)
5. On final failure: return HarnessResult with accumulated diagnostics

@santoshkumarradha santoshkumarradha requested review from a team and AbirAbbas as code owners March 4, 2026 12:48
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 4, 2026

Performance

SDK Memory Δ Latency Δ Tests Status
Python 9.3 KB +4% 0.34 µs -3%

✓ No regressions detected

…mpts

Add diagnose_output_failure() that classifies validation failures into
specific categories: file missing, empty, invalid JSON, or schema mismatch
with field-level diff. Enhance build_followup_prompt() to include schema
file references and explicit rewrite instructions for the retry loop.
Replace single-shot _handle_schema_output() with _handle_schema_with_retry()
that retries up to schema_max_retries times (default 2) when JSON validation
fails. Each retry:
  - Diagnoses the specific failure via diagnose_output_failure()
  - Sends a follow-up prompt to the agent with error context
  - For Claude: passes resume=session_id to continue the conversation
  - For CLI providers: fresh call with the follow-up prompt
  - Accumulates cost, turns, and messages across all attempts

This activates the previously dead-code build_followup_prompt() from _schema.py
and adds resume_session_id support to the Claude Code provider.
Standalone script exercising the harness with 5 escalating schema levels:
  - simple (2 fields), medium (lists + optionals), complex (13 nested fields),
    deeply_nested (recursive TreeNode), massive (>4K tokens, file-based path)
Tested live with both claude-code and codex providers — all levels pass.
Includes manual retry test mode (--retry-test) to exercise the new retry loop.
@santoshkumarradha
Copy link
Copy Markdown
Member Author

Superseded by #220 (combined harness PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant