Claude provider: agentic loop max_iterations=10 too low and not configurable #40

@franklixuefei

Problem

The Claude provider's _execute_agentic_loop() has a hardcoded default of max_iterations=10. This caps the number of tool-use roundtrips a single agent can make before the provider raises a ProviderError. For any agent that needs to read multiple files, write code, and verify results, 10 iterations is far too few.

The Copilot provider has no equivalent iteration limit. It uses the SDK's built-in agentic loop with a 30-minute wall-clock timeout (max_session_seconds=1800) and a 5-minute idle timeout (idle_timeout_seconds=300), and places no cap on the number of tool calls.

Impact

Complex agents (e.g., a "coder" agent that reads a plan, explores the codebase, writes implementation code, and verifies changes) routinely need 20-40 or more tool-use iterations. With the current limit of 10, these runs fail with:

Agentic loop exceeded maximum iterations (10)
💡 Suggestion: The agent may be stuck in a tool-use loop. Check your MCP tools.

This makes the Claude provider unusable for any non-trivial coding workflow.

Root Cause

In conductor/providers/claude.py, _execute_agentic_loop():

async def _execute_agentic_loop(
    self,
    messages: list[dict[str, Any]],
    model: str,
    temperature: float | None,
    max_tokens: int,
    tools: list[dict[str, Any]] | None,
    output_schema: dict[str, OutputField] | None,
    has_output_schema: bool,
    max_iterations: int = 10,  # <-- hardcoded, not configurable
    ...
) -> tuple[ClaudeResponse, int | None, bool]:

The value is never overridden: the caller in _execute_with_retry() doesn't pass it, and there's no way to set it from the workflow YAML, either in the runtime config or per agent.
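
To make the failure mode concrete, here is a simplified model of an iteration-capped tool-use loop. It is not the code from conductor/providers/claude.py; run_loop, step, and the stand-in ProviderError class are hypothetical names used only for illustration. It shows that an agent needing 25 roundtrips cannot finish under a cap of 10, no matter how well behaved its tools are.

```python
from typing import Callable


class ProviderError(Exception):
    """Stand-in for the provider's error type (hypothetical definition)."""


def run_loop(step: Callable[[int], bool], max_iterations: int = 10) -> int:
    """Call step(i) until it reports completion or the cap is reached.

    Returns the number of iterations used.
    """
    for i in range(1, max_iterations + 1):
        if step(i):
            return i
    raise ProviderError(
        f"Agentic loop exceeded maximum iterations ({max_iterations})"
    )


def needs_25(i: int) -> bool:
    """Model an agent that is only done after 25 tool-use roundtrips."""
    return i >= 25


try:
    run_loop(needs_25)  # default cap of 10: fails
except ProviderError as exc:
    print(exc)

print(run_loop(needs_25, max_iterations=50))  # succeeds after 25 iterations
```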

Suggested Fix

Option A: Make it configurable (preferred)

  1. Add a max_agent_iterations field to the Claude provider's runtime config (similar to max_tokens, temperature, etc.)
  2. Allow per-agent override via the agent config in the workflow YAML
  3. Set a reasonable default (e.g., 50), high enough that typical runs behave as they do under the Copilot provider, which has no iteration cap

Example workflow YAML:

runtime:
  provider: claude
  config:
    max_agent_iterations: 50  # or per-agent override

Option B: Match Copilot provider behavior

Replace the iteration count limit with a wall-clock timeout, matching the Copilot provider's IdleRecoveryConfig pattern:

  • max_session_seconds: 1800 (30 minutes per agent)
  • idle_timeout_seconds: 300 (5 minutes without API activity)

This would provide true feature parity.

Option C: Quick fix (minimum viable)

Just increase the default from 10 to 50. This covers most practical use cases without architectural changes.

Workaround

Patch the default locally:

# GNU sed syntax; BSD/macOS sed needs -i '' in place of -i
sed -i 's/max_iterations: int = 10,/max_iterations: int = 50,/' \
  "$(python -c 'import conductor.providers.claude; print(conductor.providers.claude.__file__)')"

Environment

  • conductor-cli: installed from git+https://github.com/microsoft/conductor.git
  • Provider: claude
  • Workflow: implement.yaml (coder agent with filesystem MCP server)
  • The coder agent typically makes 20-40 tool calls per epic implementation
