feat(harness): add schema validation retry loop with session continuity by santoshkumarradha · Pull Request #219 · Agent-Field/agentfield

santoshkumarradha · 2026-03-04T12:48:17Z

Summary

- Add a schema validation retry loop: when JSON output fails validation, the harness now retries up to schema_max_retries times (default 2) by sending follow-up prompts with diagnostic error context.
- Add output failure diagnosis: diagnose_output_failure() classifies each failure as file missing, empty, invalid JSON, or schema validation error with a field-level diff.
- Add Claude session continuity: on retry, the harness passes resume=session_id to the Claude Code provider so the agent continues the same conversation, preserving the full context of the first attempt.

Problem

The harness had a structural gap: build_followup_prompt() existed in _schema.py but was never called from _runner.py. When complex JSON schemas failed validation (truncated output, missing fields, wrong types), the harness returned an error immediately with zero recovery. The only retry was for transient network errors (rate limits, timeouts), not for schema validation failures.

What Changed

`sdk/python/agentfield/harness/_schema.py`

- Added diagnose_output_failure(file_path, schema), which reads the output file and classifies the failure mode with actionable detail (parse error location, expected vs. actual top-level keys)
- Enhanced build_followup_prompt(), which now includes a schema file reference (for large schemas) and explicit JSON rewrite instructions
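
The classification described above can be illustrated with a minimal sketch. This is not the PR's implementation: it assumes a simplified signature that takes a list of required top-level keys (`required_keys`, a hypothetical parameter) instead of a full schema, and the message wording is illustrative only.

```python
import json
import os
import tempfile


def diagnose_output_failure(file_path, required_keys):
    """Return a short reason why the output file is unusable.

    Assumption: `required_keys` stands in for the real schema argument.
    """
    if not os.path.exists(file_path):
        return f"Output file missing: {file_path}"
    with open(file_path, encoding="utf-8") as fh:
        raw = fh.read()
    if not raw.strip():
        return "Output file is empty"
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Report where parsing stopped so the agent can locate truncation.
        return f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(data, dict):
        return f"Expected a JSON object at the top level, got {type(data).__name__}"
    missing = sorted(set(required_keys) - set(data))
    if missing:
        return f"JSON parses but fails schema validation: missing fields {missing}"
    return "No failure detected"


# Usage: exercise two of the categories against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".agentfield_output.json")
    print(diagnose_output_failure(path, ["score"]))
    with open(path, "w", encoding="utf-8") as fh:
        fh.write('{"name": "x"}')
    print(diagnose_output_failure(path, ["score"]))
```

Checking the cheapest conditions first (missing, then empty, then unparseable) keeps each message specific to one cause, which is what makes it usable as context in a follow-up prompt.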

`sdk/python/agentfield/harness/_runner.py`

- Replaced _handle_schema_output() with _handle_schema_with_retry():
  - After initial execution, if parse_and_validate() returns None, enters the retry loop
  - Each retry: diagnoses failure → builds follow-up prompt → calls provider again → re-validates
  - Passes resume_session_id so Claude continues the same conversation
  - Accumulates costs, turns, and messages across all attempts
  - Configurable via schema_max_retries option (default: 2)
- Added _accumulate_metrics() helper for multi-attempt cost/turn tracking
- Added schema_max_retries to the resolved options list
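
As a rough illustration of multi-attempt tracking, the sketch below folds one attempt's metrics into a running total. The `AttemptResult` class and its field names (`cost_usd`, `num_turns`, `messages`) are assumptions made for this example and may not match the harness's actual result type or the real _accumulate_metrics() signature.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class AttemptResult:
    """Hypothetical stand-in for the result of one provider call."""
    cost_usd: Optional[float] = None
    num_turns: int = 0
    messages: List[Any] = field(default_factory=list)


def accumulate_metrics(total, attempt):
    """Fold one attempt's metrics into the running total, in place."""
    # A provider may not report cost; only add when a value is present.
    if attempt.cost_usd is not None:
        total.cost_usd = (total.cost_usd or 0.0) + attempt.cost_usd
    total.num_turns += attempt.num_turns
    total.messages.extend(attempt.messages)
    return total


# Usage: combine the initial attempt with one retry.
total = AttemptResult()
accumulate_metrics(total, AttemptResult(cost_usd=0.25, num_turns=3, messages=["first"]))
accumulate_metrics(total, AttemptResult(cost_usd=0.5, num_turns=2, messages=["retry"]))
print(total.cost_usd, total.num_turns, total.messages)
```

Treating a missing cost as "unknown" rather than zero keeps the total at None when no attempt reported one.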

`sdk/python/agentfield/harness/providers/claude.py`

When resume_session_id is in options, passes resume=session_id to ClaudeAgentOptions
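
A minimal sketch of that mapping, under stated assumptions: the helper name `build_claude_options_kwargs` is hypothetical, and it returns a plain dict of keyword arguments rather than constructing ClaudeAgentOptions, so the example runs without the Claude SDK installed.

```python
def build_claude_options_kwargs(options):
    """Translate harness options into keyword arguments for ClaudeAgentOptions.

    Only the resume mapping described in this PR is shown; other options
    are deliberately left out of this sketch.
    """
    kwargs = {}
    session_id = options.get("resume_session_id")
    if session_id:
        # Continue the earlier conversation instead of starting a new one.
        kwargs["resume"] = session_id
    return kwargs


# Usage: the first attempt has no session yet, the retry does.
print(build_claude_options_kwargs({}))
print(build_claude_options_kwargs({"resume_session_id": "abc123"}))
```

Omitting the key entirely on the first attempt, rather than passing resume=None, leaves the provider's default behavior untouched.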

`sdk/python/tests/debug_complex_json.py` (new)

- Standalone debug script with 5 escalating schema complexity levels (simple → massive >4K tokens)
- Tests both inline and file-based schema paths
- Includes --retry-test mode to exercise the retry loop with manual follow-up prompts
- Tested live with both claude-code and codex providers

Testing

- 651 unit tests pass (0 failures, 0 regressions)
- Live tests with real agents: all 5 schema levels pass with both Claude Code and Codex. Reported timings:
  - simple (2 fields): ✅ 10s
  - complex (13 nested fields, enums, lists of objects): ✅ ~105s
  - deeply_nested (recursive TreeNode): ✅ ~61s
  - massive (>4K tokens, file-based schema path): ✅ ~160s

# Run the debug script
cd sdk/python
.venv/bin/python tests/debug_complex_json.py --provider claude-code --test complex
.venv/bin/python tests/debug_complex_json.py --provider codex --test massive
.venv/bin/python tests/debug_complex_json.py --provider claude-code --test all --retry-test

Retry Flow

1. Agent executes prompt → writes JSON to .agentfield_output.json
2. parse_and_validate() → Layer 1 (direct parse) → Layer 2 (cosmetic repair)
3. If validation fails:
   a. diagnose_output_failure() → "JSON parses but fails schema validation: missing field 'score'"
   b. build_followup_prompt() → "Error: ... Rewrite the COMPLETE, corrected JSON to: {path}"
   c. provider.execute(followup, {resume_session_id: session_id}) → agent fixes the file
   d. parse_and_validate() again → success or retry
4. Repeat up to schema_max_retries times (default 2)
5. On final failure: return HarnessResult with accumulated diagnostics
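
The flow above can be sketched end to end with a fake provider. Everything here is a simplified assumption rather than the harness code: `FakeProvider`, `run_with_schema_retry`, and the in-memory `output` string are hypothetical (the real harness reads .agentfield_output.json), and validation is reduced to a required-keys check.

```python
import json


def parse_and_validate(text, required_keys):
    """Return the parsed dict if valid, else None (simplified validator)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and all(k in data for k in required_keys):
        return data
    return None


class FakeProvider:
    """Hypothetical provider: returns canned outputs and records each call."""

    def __init__(self, outputs):
        self.outputs = list(outputs)
        self.calls = []

    def execute(self, prompt, options):
        self.calls.append((prompt, dict(options)))
        return {"session_id": "sess-1", "output": self.outputs.pop(0)}


def run_with_schema_retry(provider, prompt, required_keys, schema_max_retries=2):
    """Run once, then retry up to schema_max_retries times while validation fails."""
    result = provider.execute(prompt, {})
    data = parse_and_validate(result["output"], required_keys)
    attempts = 1
    while data is None and attempts <= schema_max_retries:
        followup = "Error: output failed validation. Rewrite the COMPLETE, corrected JSON."
        # Reuse the session so the agent keeps the context of earlier attempts.
        result = provider.execute(followup, {"resume_session_id": result["session_id"]})
        data = parse_and_validate(result["output"], required_keys)
        attempts += 1
    return data, attempts


# Usage: the first output is missing 'score'; the retry supplies it.
provider = FakeProvider(['{"name": "x"}', '{"name": "x", "score": 1}'])
print(run_with_schema_retry(provider, "Produce the report", ["name", "score"]))
```

With the default of 2 retries the provider is called at most three times: the initial attempt plus two follow-ups.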

github-actions · 2026-03-04T12:49:41Z

Performance

| SDK | Memory | Memory Δ | Latency | Latency Δ | Tests | Status |
|---|---|---|---|---|---|---|
| Python | 9.3 KB | +4% | 0.34 µs | -3% | ✓ | ✓ |

✓ No regressions detected

…mpts Add diagnose_output_failure() that classifies validation failures into specific categories: file missing, empty, invalid JSON, or schema mismatch with field-level diff. Enhance build_followup_prompt() to include schema file references and explicit rewrite instructions for the retry loop.

Replace single-shot _handle_schema_output() with _handle_schema_with_retry() that retries up to schema_max_retries times (default 2) when JSON validation fails. Each retry:
- Diagnoses the specific failure via diagnose_output_failure()
- Sends a follow-up prompt to the agent with error context
- For Claude: passes resume=session_id to continue the conversation
- For CLI providers: fresh call with the follow-up prompt
- Accumulates cost, turns, and messages across all attempts

This activates the previously dead-code build_followup_prompt() from _schema.py and adds resume_session_id support to the Claude Code provider.

Standalone script exercising the harness with 5 escalating schema levels: simple (2 fields), medium (lists + optionals), complex (13 nested fields), deeply_nested (recursive TreeNode), massive (>4K tokens, file-based path). Tested live with both claude-code and codex providers; all levels pass. Includes a manual retry test mode (--retry-test) to exercise the new retry loop.

santoshkumarradha · 2026-03-04T15:36:55Z

Superseded by #220 (combined harness PR).

santoshkumarradha requested review from a team and AbirAbbas as code owners March 4, 2026 12:48

santoshkumarradha added 3 commits March 4, 2026 18:24

santoshkumarradha force-pushed the fix/harness-json-retry branch from 62cc883 to 783894f Compare March 4, 2026 12:55

santoshkumarradha mentioned this pull request Mar 4, 2026

feat(harness): OpenCode support with schema retry, error preservation, and project_dir routing #220

Merged

santoshkumarradha closed this Mar 4, 2026

santoshkumarradha wants to merge 3 commits into main from fix/harness-json-retry
