feat: usability features — script steps, file tags, workflow resume, interrupts, and web dashboard#7

Merged: jrob5756 merged 31 commits into main from feat/usability-features on Feb 26, 2026

@jrob5756
Collaborator

Summary

Large feature branch adding five major usability features to Conductor, each implemented as a series of epics with full test coverage.

1. Script Execution Steps (type: script)

  • Run shell commands as workflow steps with command + args
  • Captures stdout, stderr, exit_code into workflow context
  • Jinja2-rendered working_dir, configurable timeout_seconds
  • Route on exit_code for conditional branching
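As a rough illustration of the script-step behavior listed above, the sketch below runs a command with arguments, captures stdout, stderr, and the exit code, and applies a timeout. It is not Conductor's ScriptExecutor: `ScriptResult`, `run_script`, `next_step`, and the step names are hypothetical, and it assumes the command is executed directly rather than through a shell.

```python
import asyncio
import sys
from dataclasses import dataclass


@dataclass
class ScriptResult:
    """Hypothetical container for the three captured fields."""
    stdout: str
    stderr: str
    exit_code: int


async def run_script(command: str, args: list[str], timeout_seconds: float = 30.0) -> ScriptResult:
    """Run a command (no shell) and capture its output and exit code."""
    proc = await asyncio.create_subprocess_exec(
        command,
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # On timeout, kill the child and reap it before re-raising.
        proc.kill()
        await proc.wait()
        raise
    return ScriptResult(out.decode(), err.decode(), proc.returncode)


def next_step(result: ScriptResult) -> str:
    """Toy routing rule that branches on the exit code (step names are made up)."""
    return "summarize" if result.exit_code == 0 else "diagnose"
```

For example, `asyncio.run(run_script(sys.executable, ["-c", "print('hi')"]))` returns a result whose `exit_code` is 0, so `next_step` would pick the success branch.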

2. External File References (!file tag)

  • YAML !file tag resolves external files inline during loading
  • Supports .md, .yaml, .json, .txt with cycle detection
  • Environment variable interpolation in referenced files

3. Workflow Resume (Checkpoint & Restore)

  • Automatic checkpointing of workflow state after each agent
  • conductor resume CLI command to pick up from last checkpoint
  • Full serialization of WorkflowContext and LimitEnforcer
  • Session-based checkpoint management with cleanup

4. Mid-Agent Interrupt System

  • Background keyboard listener (Ctrl+G) for interrupt injection
  • Between-agent gates and mid-agent guidance injection
  • Copilot and Claude provider support for interrupt handling
  • Rich terminal UI for interrupt interaction
  • Activity-aware idle detection with automatic recovery

5. Real-Time Web Dashboard

  • --web flag launches FastAPI + WebSocket dashboard alongside workflow
  • --web-bg forks to background, prints URL, and exits
  • React + Vite + React Flow frontend with DAG visualization
  • Live node status updates, edge animations, activity streaming
  • Detail panel with prompt/output inspection and cost tracking
  • Node tooltips, conditional edge labels, fit-view controls
  • Unread badges on log/activity tabs, activity stream filtering
  • Non-blocking human gate prompts (asyncio.to_thread)

6. Supporting Improvements

  • Pub/sub event system (WorkflowEventEmitter) decoupling engine from renderers
  • Enhanced exception hierarchy
  • Provider registry for multi-provider workflows
  • Comprehensive test suite (20+ new test files, ~8000 lines of tests)
  • Documentation updates and reorganization

Stats

  • 30 commits across 127 files
  • +25,048 / -1,729 lines
  • Full test coverage for all new features

Testing

make test          # all tests pass
make lint          # clean
make typecheck     # clean

Jason Robert and others added 30 commits February 24, 2026 18:16
…or, thread-safety docs

- Fix _file_stack from set to list to preserve insertion order in cycle detection
- Use append/pop instead of add/discard for proper LIFO tracking
- Use list initializations in load() and load_string() instead of sets
- Remove sorted() from cycle chain display (list preserves order naturally)
- Add UnicodeDecodeError handling with clear 'not valid UTF-8' message
- Add TestFileTagNonUtf8 test class with test_non_utf8_file_raises_configuration_error
- Clarify _create_file_tag_constructor_class docstring: cross-instance isolation only

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
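The set-to-list change in the commit above can be pictured with a small sketch. It assumes a simplified model in which `files` maps each file name to the names it references; `resolve` and that mapping are hypothetical and stand in for real `!file` loading. A list used as a stack keeps the inclusion order, so the cycle chain can be reported without sorting.

```python
from typing import Optional


def resolve(name: str, files: dict[str, list[str]], stack: Optional[list[str]] = None) -> list[str]:
    """Return file names in load order, raising on a circular reference.

    `files` is a stand-in for file contents: name -> referenced names.
    """
    stack = [] if stack is None else stack
    if name in stack:
        # The list preserves insertion order, so the chain reads naturally.
        chain = " -> ".join(stack[stack.index(name):] + [name])
        raise ValueError(f"Circular file reference: {chain}")
    stack.append(name)  # push: we are now inside `name`
    try:
        loaded: list[str] = []
        for ref in files.get(name, []):
            loaded.extend(resolve(ref, files, stack))
        loaded.append(name)
        return loaded
    finally:
        stack.pop()  # pop: LIFO, mirroring the nesting of includes
```

Because entries are popped on the way out, a file referenced twice from different branches (a diamond) is not mistaken for a cycle; only a file that is still on the stack triggers the error.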
…low-syntax.md

- Add 'External File References' entry to Table of Contents
- Add comprehensive section after 'Tools' with syntax, content-type
  detection, path resolution (with directory tree diagram), and
  load_string() behavior notes
- Add four usage examples: prompt from .md, structured output schema
  from .yaml, tool list from external file, nested inclusion pattern
- Add Environment Variables subsection explaining ${VAR} resolution
- Add Error Handling subsection with example ConfigurationError messages
- Add Limitations subsection (UTF-8 only, no globs, no URLs, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Epic 1: Extended AgentDef with type='script', command/args/env/working_dir/timeout
fields, field/model validators for mutual exclusivity, and cross-reference validators
rejecting scripts in parallel/for_each groups.

Epic 2: Created ScriptExecutor with asyncio subprocess execution, Jinja2 template
rendering, per-script timeout handling, env merging, and FileNotFoundError/OSError
handling.

Epic 3: Wired ScriptExecutor into WorkflowEngine's main run loop with context
storage, iteration tracking, route evaluation, and workflow-level timeout enforcement
via _execute_script() helper.

Epic 4: Created examples/script-step.yaml demonstrating script → route → agent
pattern using simpleeval exit_code routing.

All 1192 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- docs/workflow-syntax.md: update type field to include 'script', add Script Steps
  section documenting command/args/env/working_dir/timeout fields, output structure
  (stdout/stderr/exit_code), routing patterns, and restrictions (no parallel/for_each)
- AGENTS.md: add ScriptExecutor to executor package description and workflow
  execution flow
- README.md: add script steps to feature list and script-step.yaml to examples table
- executor/script.py: clarify _verbose_log deferred-import comment
- tests/test_executor/test_script.py: add test for working_dir with Jinja2 template
  and test documenting that env values are not rendered through Jinja2
- tests/test_engine/test_script_workflow.py: add negative integration test confirming
  script steps in parallel groups raise ConfigurationError at validation time

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…AGENTS.md numbering

- Render working_dir through template engine in ScriptExecutor for consistency
  with command/args rendering (instead of passing raw string to subprocess)
- Update test_working_dir_with_jinja2_template to test actual rendering behavior
  (remove manual pre-rendering workaround)
- Fix duplicate step number '6.' in AGENTS.md Workflow Execution Flow section
  (renumber to 7. and 8.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move plan docs (file-tag, script-execution, logging-redesign) into
  docs/projects/usability-features/
- Add usability-features.brainstorm.md consolidating feature plans
- Remove outdated architecture-decisions.md, planned-features.md,
  and brainstorming docs (caching, checkpointing, cost-tracking)
…t and LimitEnforcer

- Add WorkflowContext.to_dict() serializing workflow_inputs, agent_outputs,
  current_iteration, and execution_history with deep copies
- Add WorkflowContext.from_dict(data) classmethod reconstructing context
  from serialized dict with deep copy isolation
- Add LimitEnforcer.to_dict() serializing current_iteration, max_iterations,
  and execution_history (excludes transient start_time, current_agent)
- Add LimitEnforcer.from_dict(data, timeout_seconds) classmethod with
  restored iteration state, fresh start_time, and config-supplied timeout
- Add 34 tests in tests/test_engine/test_context_serialization.py covering
  round-trips, edge cases, JSON serializability, and behavioral integration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
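A minimal sketch of the round-trip pattern described in the commit above, assuming a plain dataclass. `ContextSketch` is a hypothetical stand-in, not the real WorkflowContext; only the four field names are taken from the commit message, and their types are guesses.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ContextSketch:
    """Hypothetical stand-in; field names follow the commit message."""
    workflow_inputs: dict[str, Any] = field(default_factory=dict)
    agent_outputs: dict[str, Any] = field(default_factory=dict)
    current_iteration: int = 0
    execution_history: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        # Deep copies keep later mutations of the dict from leaking back in.
        return {
            "workflow_inputs": copy.deepcopy(self.workflow_inputs),
            "agent_outputs": copy.deepcopy(self.agent_outputs),
            "current_iteration": self.current_iteration,
            "execution_history": copy.deepcopy(self.execution_history),
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ContextSketch":
        # Deep copies again, so the restored object does not alias `data`.
        return cls(
            workflow_inputs=copy.deepcopy(data.get("workflow_inputs", {})),
            agent_outputs=copy.deepcopy(data.get("agent_outputs", {})),
            current_iteration=int(data.get("current_iteration", 0)),
            execution_history=copy.deepcopy(data.get("execution_history", [])),
        )
```

Copying on both sides is what gives the isolation the commit mentions: neither the serialized dict nor the restored object shares nested containers with the original.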
- Extract main execution loop into _execute_loop() shared by run() and resume()
- Save checkpoints on ConductorError, KeyboardInterrupt, and Exception
- Add _save_checkpoint_on_failure() helper that never raises
- Add resume() method that resets timeout and enters loop at specified agent
- Add set_context() and set_limits() for external state restoration
- Add workflow_path parameter to WorkflowEngine.__init__()
- Track _current_agent_name and _last_checkpoint_path as instance variables
- Change CheckpointManager.save_checkpoint() error param to BaseException
- Add 14-test suite covering all acceptance criteria in test_resume.py
- Update workflow-resume.plan.md: Epic 3 marked DONE

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
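The "helper that never raises" idea from the commit above might look roughly like the following. This is a sketch under assumptions: the function names, the `checkpoint.json` file name, and the payload layout are hypothetical and are not Conductor's checkpoint format.

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional


def save_checkpoint_on_failure(state: dict[str, Any], error: BaseException, directory: Path) -> Optional[Path]:
    """Best-effort checkpoint write: returns the path, or None if saving failed."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
        payload = {
            "state": state,
            "error": {"type": type(error).__name__, "message": str(error)},
        }
        text = json.dumps(payload)  # may raise for non-serializable state
        path = directory / "checkpoint.json"
        path.write_text(text, encoding="utf-8")
        return path
    except Exception:
        # Swallowed on purpose: a failed save must not mask the original error.
        return None


def run_with_checkpoint(step: Callable[[dict[str, Any]], Any], state: dict[str, Any], directory: Path) -> Any:
    """Run one step; on any failure, try to checkpoint and then re-raise."""
    try:
        return step(state)
    except BaseException as exc:  # BaseException also covers KeyboardInterrupt
        save_checkpoint_on_failure(state, exc, directory)
        raise
```

Accepting `BaseException` for the error parameter matters because `KeyboardInterrupt` does not derive from `Exception`, which matches the parameter change listed in the commit.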
… and remove weak hash-mismatch test

- Replace 30-line inline MCP server building block in run_workflow_async() with
  single call to await _build_mcp_servers(config), completing the helper extraction
  and eliminating code duplication between run_workflow_async() and resume_workflow_async()
- Remove test_hash_mismatch_warning which only asserted mock_resume.called without
  exercising the actual warning logic; test_hash_mismatch_warning_in_resume_async
  already provides genuine coverage by mocking at the engine level

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add _session_ids tracking and get_session_ids() to CopilotProvider
- Add set_resume_session_ids() and resume_session() attempt with fallback
- Wire session ID collection into WorkflowEngine checkpoint save
- Wire session ID restoration in resume_workflow_async() via ProviderRegistry
- Add unit tests for session ID tracking and resume fallback (11 tests)
- Fix redundant except clause: (RuntimeError, Exception) -> Exception
- Add checkpoint CLI commands and CheckpointError exception
- Add checkpoint unit tests
- Update usability brainstorm with workflow resume design

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Check last_activity_ref timestamp before declaring session stuck;
  if events arrived within the idle timeout window, reset and keep
  waiting instead of sending disruptive recovery prompts
- Reset recovery_attempts counter when new activity is detected,
  giving each task (tool call, reasoning step) its own budget of
  max_recovery_attempts rather than sharing across the full session
- Fix Rich Text.append() crashes when SDK returns None for event
  name attributes (use `or "unknown"` + str() wrapping)
- Add tests for active-session bypass and counter reset behavior
- Redesign KeyboardListener to use dedicated daemon reader thread
  delivering bytes via asyncio.Queue, eliminating run_in_executor
  thread leaks; wait_for timeouts on queue.get() are cleanly cancellable
- Fix Esc hint visibility: use _verbose_console.print() instead of
  verbose_log() so hint always displays regardless of --verbose flag
- Replace tautological TestListenerCreation tests with integration-style
  tests that call real run_workflow_async() with mocked dependencies
- Add TestReaderThread test class for the new reader thread architecture
- Remove TestQueueGetBlocking (no longer applicable with asyncio.Queue)
- Update all listener detection tests to feed bytes via
  asyncio.Queue.put_nowait() instead of mocking _read_byte_blocking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
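The activity-aware idle check and the per-task recovery budget from the first two bullets of the commit above can be sketched as a small state machine. `IdleMonitor` and its return values are hypothetical; timestamps are passed in explicitly so the logic can be exercised without real clocks.

```python
from dataclasses import dataclass


@dataclass
class IdleMonitor:
    """Hypothetical helper deciding what to do when a wait times out."""
    idle_timeout: float
    max_recovery_attempts: int
    last_activity: float = 0.0
    recovery_attempts: int = 0

    def record_activity(self, now: float) -> None:
        # New activity refreshes the timestamp and restores the full budget.
        self.last_activity = now
        self.recovery_attempts = 0

    def on_wait_timeout(self, now: float) -> str:
        """Return "wait", "recover", or "give_up"."""
        if now - self.last_activity < self.idle_timeout:
            # Events arrived recently, so the session is busy, not stuck.
            return "wait"
        if self.recovery_attempts >= self.max_recovery_attempts:
            return "give_up"
        self.recovery_attempts += 1
        return "recover"
```

With `idle_timeout=10` and activity recorded at t=100, a timeout at t=105 yields "wait", while a timeout at t=111 yields "recover" and uses one attempt from the budget.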
- Escape untrusted content in Rich panel using rich.markup.escape() to
  prevent markup injection in output previews and accumulated guidance
- Strip guidance text whitespace before storing in InterruptResult to
  prevent whitespace injection into subsequent agent prompts
- Add tests: test_panel_escapes_rich_markup_in_output_preview,
  test_panel_escapes_rich_markup_in_guidance,
  test_continue_guidance_is_stripped (35 tests total)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add user_guidance field, add_guidance(), get_guidance_prompt_section() to WorkflowContext
- Update to_dict()/from_dict() with backward-compatible user_guidance serialization
- Add optional guidance_section parameter to AgentExecutor.execute()
- Wire guidance retrieval in WorkflowEngine._execute_loop() for regular agents
- Add 8 tests for WorkflowContext guidance methods in test_context.py
- Add 6 tests for executor guidance injection in test_agent_guidance.py
- Update test_context_serialization.py to include user_guidance in empty context dict

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wire interrupt event check into _execute_loop() and handle all InterruptResult actions.

- Add InterruptHandler instance creation in WorkflowEngine.__init__() with skip_gates
- Add _get_top_level_agent_names() helper method
- Add _check_interrupt() async method: checks event, clears it, builds output preview, delegates to InterruptHandler
- Add _handle_interrupt_result() async method: match/case for CONTINUE, SKIP, STOP, CANCEL
- Insert interrupt check at end of while loop body (after route evaluation) covering regular agents, parallel groups, and for-each groups
- Insert interrupt check before continue in script step path
- Backward compatible: interrupt_event=None short-circuits immediately
- 25 integration tests covering all actions, guidance accumulation, skip routing, checkpoint save, parallel group deferral

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix 4 review issues from previous implementation:

- FakeSession done_event: add optional done_event param; _deliver_post_abort
  sets it on completion so post-abort wait resolves immediately in tests
- RuntimeWarning: use close_coro_and_raise side_effect to properly close the
  coroutine before raising TimeoutError in test_post_abort_timeout
- send_followup model field: add model=self._default_model to AgentOutput
  constructor, consistent with all other execution paths
- _abort_supported guard: early-return in _abort_session when flag is False
  (strict identity check), making the flag functional not merely diagnostic

Add test_abort_skipped_when_previously_unsupported and assertion for model
field in test_send_followup_sends_guidance. All 24 tests pass in 0.42s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add interrupt_signal parameter to _execute_agentic_loop() with check
  at the top of each while loop iteration; if set, clear event and call
  _request_partial_output() to send a user message requesting emit_output
- Change _execute_agentic_loop return type to 3-tuple (response, tokens, is_partial)
- Update _execute_with_retry() to forward interrupt_signal and handle
  partial output (skip schema validation, return AgentOutput(partial=True))
- Update execute() to forward interrupt_signal through the call chain
- Add _request_partial_output() helper that appends a user message asking
  Claude to call emit_output with best partial results
- Add 17 tests in test_claude_interrupt.py covering interrupt detection,
  user message format, partial output parsing, schema validation skip,
  signal clearing, token accounting, guidance injection, and fresh
  conversation semantics

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add WorkflowEvent frozen dataclass with type, timestamp, data fields and to_dict() method
- Add WorkflowEventEmitter with subscribe(), unsubscribe(), emit() using threading.Lock
- Callback exception isolation: failing callbacks are logged but don't block others
- emit() snapshots subscriber list under lock before iterating
- Comprehensive unit tests covering all acceptance criteria

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
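The emitter design in the commit above (lock-protected subscriber list, snapshot before iterating, isolation of failing callbacks) can be sketched with the standard library alone. `Event` and `Emitter` are simplified, hypothetical stand-ins for WorkflowEvent and WorkflowEventEmitter, not copies of them.

```python
import logging
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Callable

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Event:
    type: str
    data: dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

    def to_dict(self) -> dict[str, Any]:
        return {"type": self.type, "timestamp": self.timestamp, "data": self.data}


class Emitter:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: list[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        with self._lock:
            self._subscribers.append(callback)

    def unsubscribe(self, callback: Callable[[Event], None]) -> None:
        with self._lock:
            if callback in self._subscribers:
                self._subscribers.remove(callback)

    def emit(self, event: Event) -> None:
        # Copy under the lock, then call outside it, so a callback may
        # subscribe or unsubscribe without deadlocking.
        with self._lock:
            snapshot = list(self._subscribers)
        for callback in snapshot:
            try:
                callback(event)
            except Exception:
                # Log and move on: one bad subscriber must not block the rest.
                logger.exception("Event subscriber failed")
```

Calling subscribers outside the lock is the point of the snapshot: the lock is held only long enough to copy the list.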
- Narrow script try/except in workflow.py to only wrap _execute_script()
  so post-processing (record_execution, check_timeout, evaluate_routes,
  check_interrupt) propagates naturally without emitting spurious script_failed

- Add TestGateEvents class to test_event_emission.py with 3 tests:
  test_gate_presented_and_resolved, test_gate_resolved_to_end,
  test_gate_event_ordering — covering gate event emission, $end routing
  through gates, and event ordering

- Test count increased from 19 to 22; all 761 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add cancelled() guard in start() polling loop before calling .exception()
  to prevent CancelledError propagation when serve task is cancelled during startup
- Raise RuntimeError('Server task was cancelled before starting') in that path
- Rewrite test_start_raises_on_server_failure to call await dashboard.start()
  with unittest.mock.patch.object mocking uvicorn.Server.serve to raise OSError,
  validating the actual production code path instead of inlining polling logic
- Add test_start_raises_on_cancelled_task covering the new cancelled() guard path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
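The cancelled() guard from the commit above exists because `asyncio.Task.exception()` raises `CancelledError` when called on a cancelled task. The sketch below shows the ordering of the checks in a generic polling loop. `wait_until_started`, its parameters, and the "exited before starting" and timeout messages are hypothetical; only the cancellation message is taken from the commit.

```python
import asyncio
from typing import Callable


async def wait_until_started(
    serve_task: asyncio.Task,
    is_started: Callable[[], bool],
    poll_interval: float = 0.01,
    timeout: float = 1.0,
) -> None:
    """Poll until is_started() is true, surfacing early failure of serve_task."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not is_started():
        if serve_task.done():
            # Check cancelled() first: exception() would raise CancelledError.
            if serve_task.cancelled():
                raise RuntimeError("Server task was cancelled before starting")
            exc = serve_task.exception()
            if exc is not None:
                raise exc
            raise RuntimeError("Server task exited before starting")
        if loop.time() >= deadline:
            raise TimeoutError("Server did not start in time")
        await asyncio.sleep(poll_interval)
```

Without the guard, a cancellation during startup would surface as a bare `CancelledError` from inside the polling loop instead of a descriptive error.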
Fix 6 review bugs in index.html:
- BUG: Parallel groups now counted as single units in agentsTotal (removed
  child agent names from agentNames, added group name instead; groupAgents
  set prevents duplicate node creation in agent loop)
- BUG: For-each group names added to agentNames so they are counted in
  agentsTotal, fixing '1/0 agents' display for for-each-only workflows
- BUG: parallel_completed now uses server-authoritative data.failure_count
  instead of local groupProgress counter, consistent with for_each_completed
  and replay-safe
- CODE QUALITY: workflowFailure variable moved to state declarations block,
  eliminating reliance on var hoisting
- UX: workflow_failed handler calls setNodeState(data.agent_name, 'failed')
  to visually mark the running agent as failed on workflow failure
- RELIABILITY: CDN scripts pinned to specific patch versions
  (cytoscape@3.30.4, dagre@0.8.5, cytoscape-dagre@2.5.0)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add --web, --web-port, --web-bg CLI flags to the run command, wire up
emitter and dashboard lifecycle in run_workflow_async(), and add the web
optional dependency extra to pyproject.toml.

- Add [project.optional-dependencies] with web extra (fastapi, uvicorn,
  websockets) using PEP 621 syntax for pip extras compatibility
- Add --web, --web-port, --web-bg Typer options to run command in app.py
- Update run_workflow_async() with keyword-only web/web_port/web_bg params
- Create WorkflowEventEmitter and pass to WorkflowEngine(event_emitter=)
- Lazy-import WebDashboard with try/except ImportError giving actionable
  error message (pip install conductor-cli[web]) and exit code 1
- Non-fatal dashboard startup: wrap start() in try/except, warn and continue
- Print dashboard URL to stderr regardless of --silent/--quiet
- Post-execution lifecycle: --web-bg calls wait_for_clients_disconnect(),
  default --web blocks on asyncio.Event().wait() with Ctrl+C messaging
- Always call dashboard.stop() in finally block for cleanup
- Use contextlib.suppress(asyncio.CancelledError) per ruff SIM105
- Add 9 tests in test_web_flags.py covering flag acceptance, dependency
  check, and startup failure handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace blocking sys.stdin.buffer.read(1) in the keyboard listener's
reader thread with select()-based polling (100ms timeout), allowing the
thread to check _stop_flag and exit cleanly. Join the reader thread in
stop() to ensure it releases the stdin lock before interpreter
finalization, preventing the "could not acquire lock" fatal error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
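A minimal sketch of the select()-based polling described in the commit above. `PollingReader` is hypothetical: it takes a raw file descriptor and a callback instead of reading stdin and feeding an asyncio.Queue, and it models the stop flag as a `threading.Event`. Note that `select.select` accepts pipes and terminals only on Unix-like systems.

```python
import os
import select
import threading
from typing import Callable


class PollingReader:
    """Reads single bytes from a file descriptor without blocking forever."""

    def __init__(self, fd: int, on_byte: Callable[[bytes], None], poll_seconds: float = 0.1) -> None:
        self._fd = fd
        self._on_byte = on_byte
        self._poll_seconds = poll_seconds
        self._stop_flag = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop_flag.set()
        # Joining is safe because the loop wakes at least every poll interval.
        self._thread.join()

    def _run(self) -> None:
        while not self._stop_flag.is_set():
            # Wait for readability with a timeout instead of a blocking read.
            ready, _, _ = select.select([self._fd], [], [], self._poll_seconds)
            if not ready:
                continue  # timed out: loop around and re-check the stop flag
            data = os.read(self._fd, 1)
            if not data:  # EOF
                break
            self._on_byte(data)
```

The timeout bounds how long `stop()` can wait: the thread notices the flag within one poll interval, so the join returns promptly and the descriptor is released before shutdown.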
…ps required

- Stream SDK events (reasoning, tool calls, messages) from providers through
  the event callback chain to the web dashboard in real-time
- Add agent detail panel to dashboard with rendered prompt, activity stream
  (reasoning/tool calls/turn markers), and enriched output (input/output tokens)
- Add --web-bg flag as standalone background mode that forks a detached child
  process, prints the dashboard URL, and exits the CLI immediately
- Make web dependencies (fastapi, uvicorn, websockets) required instead of
  optional — remove [web] extras group and import guard
- Disable interactive interrupt (Esc guidance) when running in --web mode
- Update all documentation: README, CLI reference, AGENTS.md, examples,
  skill references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Allow users to manually reposition nodes when the automatic dagre
layout doesn't produce an ideal arrangement.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Multi-feature workflow for visually testing the web dashboard.
Exercises script steps, parallel groups, conditional routing,
loop-back patterns, human gates, and for-each dynamic parallel.
- Format workflow.py to pass ruff formatter check
- Wrap blocking Rich Prompt.ask/IntPrompt.ask in typed helper
  functions to satisfy ty's strict DefaultType checking when used
  with asyncio.to_thread()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
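The typed-helper pattern from the commit above can be illustrated without Rich. In this sketch `ask_choice` and `ask_choice_async` are hypothetical names, and the blocking reader is injected as a parameter (the built-in `input` would work); the real code wraps `Prompt.ask` and `IntPrompt.ask` instead.

```python
import asyncio
from typing import Callable


def ask_choice(read_line: Callable[[str], str], prompt: str, default: int) -> int:
    """Blocking helper with a concrete int return type.

    Falls back to `default` on empty or non-numeric input.
    """
    raw = read_line(prompt).strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


async def ask_choice_async(read_line: Callable[[str], str], prompt: str, default: int) -> int:
    # asyncio.to_thread runs the blocking call in a worker thread, so the
    # event loop keeps servicing other tasks while waiting for the answer.
    return await asyncio.to_thread(ask_choice, read_line, prompt, default)
```

Giving the helper an explicit return annotation means the awaited result of `asyncio.to_thread` has a precise type, which is the kind of thing a strict type checker can verify.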
@jrob5756 jrob5756 merged commit f2791ff into main Feb 26, 2026
7 checks passed
@jrob5756 jrob5756 deleted the feat/usability-features branch February 26, 2026 14:59