Expose max_session_seconds in workflow YAML and add permission-denied fast-fail

## Summary

When a Copilot session gets stuck (e.g., due to the race condition in #27, or any other permission/connectivity issue), agents spin uselessly for the full **1800s (30 min) hardcoded timeout** before failing. In a `for_each` group with 10 items, a single stuck agent can turn a 13-minute workflow into a 60-minute timeout.

Two mitigations would sharply reduce the blast radius of a stuck session:

## 1. Expose `max_session_seconds` in workflow YAML

`IdleRecoveryConfig.max_session_seconds` currently defaults to a hardcoded 1800s and can only be changed through the Python constructor. Workflow authors should be able to tune it per workflow or per agent.

**Proposed YAML schema:**

```yaml
workflow:
  runtime:
    provider: copilot
    max_session_seconds: 300  # workflow-level default
```

Or per-agent:

```yaml
agents:
  - name: source_gatherer
    max_session_seconds: 120  # this agent should finish in ~60s
```

For `for_each` groups, the per-item agent timeout is especially important — a source-gathering agent that takes 30 minutes is certainly stuck, not working.
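
A minimal sketch of how the two levels could combine, assuming the per-agent value overrides the workflow-level one and that the current 1800s remains the fallback. The precedence rule and the helper name `resolve_max_session_seconds` are assumptions for illustration, not existing Conductor behavior:

```python
DEFAULT_MAX_SESSION_SECONDS = 1800.0  # the hardcoded value described in this issue


def resolve_max_session_seconds(
    workflow_runtime: dict,
    agent: dict,
    default: float = DEFAULT_MAX_SESSION_SECONDS,
) -> float:
    """Return the agent-level value if set, else the workflow-level value, else the default."""
    # Assumed precedence: the most specific setting wins.
    for source in (agent, workflow_runtime):
        value = source.get("max_session_seconds")
        if value is None:
            continue
        # Reject booleans, non-numbers, and non-positive timeouts early.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(f"max_session_seconds must be a positive number, got {value!r}")
        return float(value)
    return default
```

With the YAML above, `source_gatherer` would get 120s, agents without an override would get 300s, and workflows that set neither would keep 1800s.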

**Implementation:** Plumb the value through `create_provider()` in `factory.py` → `CopilotProvider.__init__()` → `IdleRecoveryConfig(max_session_seconds=...)`.
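
The plumbing could look roughly like the sketch below. The classes here are simplified stand-ins: the real `create_provider()`, `CopilotProvider.__init__()`, and `IdleRecoveryConfig` have other parameters, and the signatures shown (including passing the runtime section as a plain dict) are assumptions made only to show the value flowing through the three layers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleRecoveryConfig:
    # Stand-in: only the field named in this issue is modeled.
    max_session_seconds: float = 1800.0


class CopilotProvider:
    # Stand-in: the real constructor takes additional arguments not shown here.
    def __init__(self, max_session_seconds: Optional[float] = None) -> None:
        if max_session_seconds is None:
            # No override from YAML: keep the existing default.
            self.idle_recovery = IdleRecoveryConfig()
        else:
            self.idle_recovery = IdleRecoveryConfig(
                max_session_seconds=float(max_session_seconds)
            )


def create_provider(runtime: dict) -> CopilotProvider:
    # Hypothetical signature: `runtime` is the parsed `workflow.runtime` mapping.
    if runtime.get("provider") != "copilot":
        raise ValueError(f"unsupported provider: {runtime.get('provider')!r}")
    return CopilotProvider(max_session_seconds=runtime.get("max_session_seconds"))
```

Leaving the argument optional means existing workflows that never set `max_session_seconds` would behave as they do today.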

## 2. Detect permission-denied loops and fail fast

When every tool call returns "Permission denied", the agent is in an unrecoverable state — no amount of retrying will fix a missing session registration or a policy denial. Currently the agent keeps trying different tools, spawning sub-agents, and rephrasing requests for the full session timeout.

**Proposed behavior:** If an agent receives "Permission denied" (or the full string "Permission denied and could not request permission from user") on N consecutive tool results (e.g., N=5), Conductor should kill the session immediately with a clear `ProviderError` rather than waiting for `max_session_seconds`.

**Implementation options:**
- In `_send_and_wait()` or the event callback in `copilot.py`, track consecutive tool results containing the permission-denied string
- After N consecutive denials, raise `ProviderError("All tool calls denied — possible permission configuration issue")` with `retryable=False`
- This could be an `IdleRecoveryConfig` option: `max_consecutive_denials: int = 5`
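
The counting logic in the bullets above can be sketched as a small tracker. This is an illustration under assumptions: `DenialTracker` and its `observe()` method are hypothetical names, `ProviderError` is a stand-in for Conductor's own exception, and how tool results actually reach the event callback in `copilot.py` is not modeled.

```python
PERMISSION_DENIED_MARKER = "Permission denied"  # substring quoted in this issue


class ProviderError(Exception):
    # Stand-in for Conductor's ProviderError; the `retryable` flag follows the issue text.
    def __init__(self, message: str, retryable: bool = True) -> None:
        super().__init__(message)
        self.retryable = retryable


class DenialTracker:
    """Counts consecutive permission-denied tool results (hypothetical helper)."""

    def __init__(self, max_consecutive_denials: int = 5) -> None:
        self.max_consecutive_denials = max_consecutive_denials
        self.consecutive = 0

    def observe(self, tool_result_text: str) -> None:
        # The short marker also matches the longer
        # "Permission denied and could not request permission from user" string.
        if PERMISSION_DENIED_MARKER in tool_result_text:
            self.consecutive += 1
        else:
            # Any other tool result breaks the streak.
            self.consecutive = 0
        if self.consecutive >= self.max_consecutive_denials:
            raise ProviderError(
                "All tool calls denied: possible permission configuration issue",
                retryable=False,
            )
```

Resetting the counter on any non-denied result keeps an agent that hits an occasional denial from being killed; only an unbroken run of N denials ends the session.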

## Impact

With both changes, a stuck agent in a `for_each` group would fail in ~30s instead of ~1800s, keeping total workflow runtime close to the healthy baseline even when the race condition in #27 is hit.

## Related

- #27 — Root cause race condition in SDK causing the permission denials
