feat: usability features — script steps, file tags, workflow resume, interrupts, and web dashboard by jrob5756 · Pull Request #7 · microsoft/conductor

jrob5756 · 2026-02-26T14:32:58Z

Summary

Large feature branch adding five major usability features to Conductor, each implemented as a series of epics with full test coverage.

1. Script Execution Steps (`type: script`)

Run shell commands as workflow steps with command + args
Captures stdout, stderr, exit_code into workflow context
Jinja2-rendered working_dir, configurable timeout_seconds
Route on exit_code for conditional branching
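The list above names the step's fields and outputs but not its runtime behavior. Below is a minimal sketch of what a script step might do, assuming `command`, `args`, and `working_dir` have already been rendered through Jinja2. `run_script`, `ScriptOutput`, and `next_step` (including its step names) are hypothetical; this is not Conductor's `ScriptExecutor`.

```python
from __future__ import annotations

import asyncio
import os
from dataclasses import dataclass


@dataclass
class ScriptOutput:
    """What a script step contributes to workflow context."""

    stdout: str
    stderr: str
    exit_code: int


async def run_script(
    command: str,
    args: list[str],
    env: dict[str, str] | None = None,
    working_dir: str | None = None,
    timeout_seconds: float | None = None,
) -> ScriptOutput:
    # Step-level env values override the inherited parent environment.
    merged_env = {**os.environ, **(env or {})}
    # exec form (not a shell): command and args go straight to argv, no shell parsing.
    proc = await asyncio.create_subprocess_exec(
        command,
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        env=merged_env,
        cwd=working_dir,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Don't leave an orphaned child process behind when the timeout fires.
        proc.kill()
        await proc.wait()
        raise
    return ScriptOutput(
        stdout=out.decode(errors="replace"),
        stderr=err.decode(errors="replace"),
        exit_code=proc.returncode if proc.returncode is not None else -1,
    )


def next_step(result: ScriptOutput) -> str:
    # Hypothetical route table: branch on exit_code the way a route condition would.
    return "summarize" if result.exit_code == 0 else "fix_failures"
```

If the command does not exist, `create_subprocess_exec` raises `FileNotFoundError`. The PR says `ScriptExecutor` handles this case explicitly.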

2. External File References (`!file` tag)

YAML !file tag resolves external files inline during loading
Supports .md, .yaml, .json, .txt with cycle detection
Environment variable interpolation in referenced files
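The summary does not specify the interpolation syntax. The docs commit further down refers to `${VAR}`. Here is a hedged sketch of that substitution, assuming only the plain `${NAME}` form is supported and that an unset variable is an error. Conductor may instead leave unset references untouched or support defaults; `interpolate_env` is a hypothetical name.

```python
from __future__ import annotations

import os
import re
from collections.abc import Mapping

# Matches ${NAME} where NAME is a conventional environment-variable identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_env(text: str, env: Mapping[str, str] | None = None) -> str:
    """Replace every ${NAME} in text with its value from env (default: os.environ)."""
    source = os.environ if env is None else env

    def _substitute(match: re.Match[str]) -> str:
        name = match.group(1)
        if name not in source:
            # Assumption: fail loudly rather than silently inserting an empty string.
            raise KeyError(f"environment variable {name!r} is not set")
        return source[name]

    return _VAR.sub(_substitute, text)
```

A bare `$5` is left alone because only the braced form matches.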

3. Workflow Resume (Checkpoint & Restore)

Automatic checkpointing of workflow state after each agent
conductor resume CLI command to pick up from last checkpoint
Full serialization of WorkflowContext and LimitEnforcer
Session-based checkpoint management with cleanup
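To make "checkpoint after each agent, resume from the last one" concrete, here is a sketch of a JSON checkpoint store. The directory layout, timestamped file names, atomic rename, and the function names `save_checkpoint`, `load_latest_checkpoint`, and `cleanup_checkpoints` are all assumptions, not Conductor's `CheckpointManager` API.

```python
from __future__ import annotations

import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def save_checkpoint(
    directory: Path,
    workflow_name: str,
    current_agent: str,
    context: dict[str, Any],
    limits: dict[str, Any],
) -> Path:
    """Write one JSON checkpoint; the fixed-width timestamp makes names sort chronologically."""
    directory.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = directory / f"{workflow_name}-{stamp}.json"
    payload = {"current_agent": current_agent, "context": context, "limits": limits}
    # Write to a temp file in the same directory, then rename it into place,
    # so an interrupted save never leaves a half-written checkpoint.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    os.replace(tmp_name, target)
    return target


def load_latest_checkpoint(directory: Path, workflow_name: str) -> dict[str, Any] | None:
    """Return the newest checkpoint for workflow_name, or None if there is none."""
    candidates = sorted(directory.glob(f"{workflow_name}-*.json"))
    return json.loads(candidates[-1].read_text(encoding="utf-8")) if candidates else None


def cleanup_checkpoints(directory: Path, workflow_name: str) -> int:
    """Delete all checkpoints for a workflow (e.g. after it completes); return the count."""
    removed = 0
    for path in directory.glob(f"{workflow_name}-*.json"):
        path.unlink()
        removed += 1
    return removed
```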

4. Mid-Agent Interrupt System

Background keyboard listener (Ctrl+G) for interrupt injection
Between-agent gates and mid-agent guidance injection
Copilot and Claude provider support for interrupt handling
Rich terminal UI for interrupt interaction
Activity-aware idle detection with automatic recovery

5. Real-Time Web Dashboard

--web flag launches FastAPI + WebSocket dashboard alongside workflow
--web-bg forks to background, prints URL, and exits
React + Vite + React Flow frontend with DAG visualization
Live node status updates, edge animations, activity streaming
Detail panel with prompt/output inspection and cost tracking
Node tooltips, conditional edge labels, fit-view controls
Unread badges on log/activity tabs, activity stream filtering
Non-blocking human gate prompts (asyncio.to_thread)
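The last item works by moving a blocking prompt onto a worker thread. The PR wraps Rich's `Prompt.ask`. This dependency-free sketch uses an injectable `read_line` (defaulting to built-in `input`) so it can run unattended. `ask_choice`, `human_gate`, and `gate_with_heartbeat` are hypothetical names.

```python
import asyncio
import contextlib
from collections.abc import Callable


def ask_choice(options: list[str], read_line: Callable[[str], str] = input) -> str:
    """Blocking helper: re-prompt until the answer is one of the options."""
    prompt = f"Choose one of [{'/'.join(options)}]: "
    while True:
        answer = read_line(prompt).strip()
        if answer in options:
            return answer


async def human_gate(options: list[str], read_line: Callable[[str], str] = input) -> str:
    # Run the blocking prompt on a worker thread so the event loop keeps serving
    # other tasks (e.g. dashboard WebSocket broadcasts) while the user decides.
    return await asyncio.to_thread(ask_choice, options, read_line)


async def gate_with_heartbeat(read_line: Callable[[str], str]) -> tuple[str, int]:
    """Demo: count event-loop ticks that happen while the gate waits for input."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    choice = await human_gate(["approve", "reject"], read_line)
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return choice, ticks
```

If `ask_choice` were called directly inside the coroutine, the ticker would be frozen for the whole time the user spends deciding.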

6. Supporting Improvements

Pub/sub event system (WorkflowEventEmitter) decoupling engine from renderers
Enhanced exception hierarchy
Provider registry for multi-provider workflows
Comprehensive test suite (20+ new test files, ~8000 lines of tests)
Documentation updates and reorganization

Stats

30 commits across 127 files
+25,048 / -1,729 lines
Full test coverage for all new features

Testing

make test          # all tests pass
make lint          # clean
make typecheck     # clean

…or, thread-safety docs
- Fix _file_stack from set to list to preserve insertion order in cycle detection
- Use append/pop instead of add/discard for proper LIFO tracking
- Use list initializations in load() and load_string() instead of sets
- Remove sorted() from cycle chain display (list preserves order naturally)
- Add UnicodeDecodeError handling with clear 'not valid UTF-8' message
- Add TestFileTagNonUtf8 test class with test_non_utf8_file_raises_configuration_error
- Clarify _create_file_tag_constructor_class docstring: cross-instance isolation only

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
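The set-to-list change matters because a list keeps the order in which files were entered, so a cycle can be reported as a readable chain. Here is a toy sketch of that push/pop tracking. It assumes a line-based `!file <path>` directive over an in-memory dict instead of a YAML constructor reading from disk. `resolve` and `FileCycleError` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Mapping


class FileCycleError(Exception):
    """Raised when a chain of !file references loops back on itself."""


def resolve(path: str, files: Mapping[str, str], stack: list[str] | None = None) -> str:
    """Inline every '!file <path>' line of files[path], recursively."""
    stack = [] if stack is None else stack
    if path in stack:
        # The list keeps inclusion order, so the chain reads in the order it was followed.
        chain = " -> ".join([*stack, path])
        raise FileCycleError(f"circular !file reference: {chain}")
    stack.append(path)  # push: we are now inside this file
    try:
        lines = []
        for line in files[path].splitlines():
            directive = line.strip()
            if directive.startswith("!file "):
                lines.append(resolve(directive[len("!file "):].strip(), files, stack))
            else:
                lines.append(line)
        return "\n".join(lines)
    finally:
        stack.pop()  # pop: leaving this file, so it may be included again elsewhere
```

Popping on exit is what allows the same file to be included twice in sibling positions (a diamond) without a false cycle report.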

…low-syntax.md
- Add 'External File References' entry to Table of Contents
- Add comprehensive section after 'Tools' with syntax, content-type detection, path resolution (with directory tree diagram), and load_string() behavior notes
- Add four usage examples: prompt from .md, structured output schema from .yaml, tool list from external file, nested inclusion pattern
- Add Environment Variables subsection explaining ${VAR} resolution
- Add Error Handling subsection with example ConfigurationError messages
- Add Limitations subsection (UTF-8 only, no globs, no URLs, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Epic 1: Extended AgentDef with type='script', command/args/env/working_dir/timeout fields, field/model validators for mutual exclusivity, and cross-reference validators rejecting scripts in parallel/for_each groups.
Epic 2: Created ScriptExecutor with asyncio subprocess execution, Jinja2 template rendering, per-script timeout handling, env merging, and FileNotFoundError/OSError handling.
Epic 3: Wired ScriptExecutor into WorkflowEngine's main run loop with context storage, iteration tracking, route evaluation, and workflow-level timeout enforcement via _execute_script() helper.
Epic 4: Created examples/script-step.yaml demonstrating script → route → agent pattern using simpleeval exit_code routing.

All 1192 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- docs/workflow-syntax.md: update type field to include 'script', add Script Steps section documenting command/args/env/working_dir/timeout fields, output structure (stdout/stderr/exit_code), routing patterns, and restrictions (no parallel/for_each)
- AGENTS.md: add ScriptExecutor to executor package description and workflow execution flow
- README.md: add script steps to feature list and script-step.yaml to examples table
- executor/script.py: clarify _verbose_log deferred-import comment
- tests/test_executor/test_script.py: add test for working_dir with Jinja2 template and test documenting that env values are not rendered through Jinja2
- tests/test_engine/test_script_workflow.py: add negative integration test confirming script steps in parallel groups raise ConfigurationError at validation time

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…AGENTS.md numbering
- Render working_dir through template engine in ScriptExecutor for consistency with command/args rendering (instead of passing raw string to subprocess)
- Update test_working_dir_with_jinja2_template to test actual rendering behavior (remove manual pre-rendering workaround)
- Fix duplicate step number '6.' in AGENTS.md Workflow Execution Flow section (renumber to 7. and 8.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Move plan docs (file-tag, script-execution, logging-redesign) into docs/projects/usability-features/
- Add usability-features.brainstorm.md consolidating feature plans
- Remove outdated architecture-decisions.md, planned-features.md, and brainstorming docs (caching, checkpointing, cost-tracking)

…t and LimitEnforcer
- Add WorkflowContext.to_dict() serializing workflow_inputs, agent_outputs, current_iteration, and execution_history with deep copies
- Add WorkflowContext.from_dict(data) classmethod reconstructing context from serialized dict with deep copy isolation
- Add LimitEnforcer.to_dict() serializing current_iteration, max_iterations, and execution_history (excludes transient start_time, current_agent)
- Add LimitEnforcer.from_dict(data, timeout_seconds) classmethod with restored iteration state, fresh start_time, and config-supplied timeout
- Add 34 tests in tests/test_engine/test_context_serialization.py covering round-trips, edge cases, JSON serializability, and behavioral integration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
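The serialization rules in this commit can be sketched with two simplified dataclasses. `ContextState` and `LimitState` are hypothetical stand-ins for `WorkflowContext` and `LimitEnforcer`. The sketch also includes the `user_guidance` field, which a later commit adds with backward-compatible loading. Deep copies isolate snapshots from the live objects. Transient timing state is not persisted, and the timeout comes from current config.

```python
from __future__ import annotations

import copy
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ContextState:
    workflow_inputs: dict[str, Any] = field(default_factory=dict)
    agent_outputs: dict[str, Any] = field(default_factory=dict)
    current_iteration: int = 0
    execution_history: list[str] = field(default_factory=list)
    user_guidance: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        # Deep copy: later mutation of the live context cannot alter the snapshot.
        return copy.deepcopy({
            "workflow_inputs": self.workflow_inputs,
            "agent_outputs": self.agent_outputs,
            "current_iteration": self.current_iteration,
            "execution_history": self.execution_history,
            "user_guidance": self.user_guidance,
        })

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> ContextState:
        data = copy.deepcopy(data)  # mutating the restored object cannot alter `data`
        return cls(
            workflow_inputs=data["workflow_inputs"],
            agent_outputs=data["agent_outputs"],
            current_iteration=data["current_iteration"],
            execution_history=data["execution_history"],
            # .get(): checkpoints written before this field existed still load.
            user_guidance=data.get("user_guidance", []),
        )


@dataclass
class LimitState:
    max_iterations: int
    timeout_seconds: float | None = None
    current_iteration: int = 0
    execution_history: list[str] = field(default_factory=list)
    start_time: float = field(default_factory=time.monotonic)
    current_agent: str | None = None

    def to_dict(self) -> dict[str, Any]:
        # start_time and current_agent are transient and intentionally not persisted.
        return {
            "current_iteration": self.current_iteration,
            "max_iterations": self.max_iterations,
            "execution_history": list(self.execution_history),
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any], timeout_seconds: float | None) -> LimitState:
        # Iteration state is restored; start_time defaults to "now", so the clock restarts.
        return cls(
            max_iterations=data["max_iterations"],
            timeout_seconds=timeout_seconds,
            current_iteration=data["current_iteration"],
            execution_history=list(data["execution_history"]),
        )
```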

- Extract main execution loop into _execute_loop() shared by run() and resume()
- Save checkpoints on ConductorError, KeyboardInterrupt, and Exception
- Add _save_checkpoint_on_failure() helper that never raises
- Add resume() method that resets timeout and enters loop at specified agent
- Add set_context() and set_limits() for external state restoration
- Add workflow_path parameter to WorkflowEngine.__init__()
- Track _current_agent_name and _last_checkpoint_path as instance variables
- Change CheckpointManager.save_checkpoint() error param to BaseException
- Add 14-test suite covering all acceptance criteria in test_resume.py
- Update workflow-resume.plan.md: Epic 3 marked DONE

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… and remove weak hash-mismatch test
- Replace 30-line inline MCP server building block in run_workflow_async() with single call to await _build_mcp_servers(config), completing the helper extraction and eliminating code duplication between run_workflow_async() and resume_workflow_async()
- Remove test_hash_mismatch_warning which only asserted mock_resume.called without exercising the actual warning logic; test_hash_mismatch_warning_in_resume_async already provides genuine coverage by mocking at the engine level

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add _session_ids tracking and get_session_ids() to CopilotProvider
- Add set_resume_session_ids() and resume_session() attempt with fallback
- Wire session ID collection into WorkflowEngine checkpoint save
- Wire session ID restoration in resume_workflow_async() via ProviderRegistry
- Add unit tests for session ID tracking and resume fallback (11 tests)
- Fix redundant except clause: (RuntimeError, Exception) -> Exception
- Add checkpoint CLI commands and CheckpointError exception
- Add checkpoint unit tests
- Update usability brainstorm with workflow resume design

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Check last_activity_ref timestamp before declaring session stuck; if events arrived within the idle timeout window, reset and keep waiting instead of sending disruptive recovery prompts
- Reset recovery_attempts counter when new activity is detected, giving each task (tool call, reasoning step) its own budget of max_recovery_attempts rather than sharing across the full session
- Fix Rich Text.append() crashes when SDK returns None for event name attributes (use `or "unknown"` + str() wrapping)
- Add tests for active-session bypass and counter reset behavior
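The two idle-detection rules in this commit can be captured in a small state machine. `IdleMonitor` and its method names are hypothetical, and timestamps are passed in explicitly so the logic is testable without a clock. Sending a recovery prompt opens a new idle window but does not reset the attempt counter, because a nudge is not real activity. That last detail is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class IdleMonitor:
    idle_timeout: float
    max_recovery_attempts: int
    last_activity: float
    recovery_attempts: int = 0

    def record_activity(self, now: float) -> None:
        """Call on every SDK event (tool call, reasoning, message)."""
        self.last_activity = now
        # New activity means a new task, which gets its own recovery budget.
        self.recovery_attempts = 0

    def check(self, now: float) -> str:
        """Return 'wait', 'recover' (send a recovery prompt), or 'give_up'."""
        if now - self.last_activity < self.idle_timeout:
            return "wait"  # events arrived inside the window: not stuck
        if self.recovery_attempts >= self.max_recovery_attempts:
            return "give_up"
        self.recovery_attempts += 1
        # Start a fresh idle window after nudging, leaving the counter as is.
        self.last_activity = now
        return "recover"
```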

- Redesign KeyboardListener to use dedicated daemon reader thread delivering bytes via asyncio.Queue, eliminating run_in_executor thread leaks; wait_for timeouts on queue.get() are cleanly cancellable
- Fix Esc hint visibility: use _verbose_console.print() instead of verbose_log() so hint always displays regardless of --verbose flag
- Replace tautological TestListenerCreation tests with integration-style tests that call real run_workflow_async() with mocked dependencies
- Add TestReaderThread test class for the new reader thread architecture
- Remove TestQueueGetBlocking (no longer applicable with asyncio.Queue)
- Update all listener detection tests to feed bytes via asyncio.Queue.put_nowait() instead of mocking _read_byte_blocking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Escape untrusted content in Rich panel using rich.markup.escape() to prevent markup injection in output previews and accumulated guidance
- Strip guidance text whitespace before storing in InterruptResult to prevent whitespace injection into subsequent agent prompts
- Add tests: test_panel_escapes_rich_markup_in_output_preview, test_panel_escapes_rich_markup_in_guidance, test_continue_guidance_is_stripped (35 tests total)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add user_guidance field, add_guidance(), get_guidance_prompt_section() to WorkflowContext
- Update to_dict()/from_dict() with backward-compatible user_guidance serialization
- Add optional guidance_section parameter to AgentExecutor.execute()
- Wire guidance retrieval in WorkflowEngine._execute_loop() for regular agents
- Add 8 tests for WorkflowContext guidance methods in test_context.py
- Add 6 tests for executor guidance injection in test_agent_guidance.py
- Update test_context_serialization.py to include user_guidance in empty context dict

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Wire interrupt event check into _execute_loop() and handle all InterruptResult actions.
- Add InterruptHandler instance creation in WorkflowEngine.__init__() with skip_gates
- Add _get_top_level_agent_names() helper method
- Add _check_interrupt() async method: checks event, clears it, builds output preview, delegates to InterruptHandler
- Add _handle_interrupt_result() async method: match/case for CONTINUE, SKIP, STOP, CANCEL
- Insert interrupt check at end of while loop body (after route evaluation) covering regular agents, parallel groups, and for-each groups
- Insert interrupt check before the continue in the script step path
- Backward compatible: interrupt_event=None short-circuits immediately
- 25 integration tests covering all actions, guidance accumulation, skip routing, checkpoint save, parallel group deferral

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix 4 review issues from previous implementation:
- FakeSession done_event: add optional done_event param; _deliver_post_abort sets it on completion so post-abort wait resolves immediately in tests
- RuntimeWarning: use close_coro_and_raise side_effect to properly close the coroutine before raising TimeoutError in test_post_abort_timeout
- send_followup model field: add model=self._default_model to AgentOutput constructor, consistent with all other execution paths
- _abort_supported guard: early-return in _abort_session when flag is False (strict identity check), making the flag functional not merely diagnostic

Add test_abort_skipped_when_previously_unsupported and assertion for model field in test_send_followup_sends_guidance. All 24 tests pass in 0.42s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add interrupt_signal parameter to _execute_agentic_loop() with check at the top of each while loop iteration; if set, clear event and call _request_partial_output() to send a user message requesting emit_output
- Change _execute_agentic_loop return type to 3-tuple (response, tokens, is_partial)
- Update _execute_with_retry() to forward interrupt_signal and handle partial output (skip schema validation, return AgentOutput(partial=True))
- Update execute() to forward interrupt_signal through the call chain
- Add _request_partial_output() helper that appends a user message asking Claude to call emit_output with best partial results
- Add 17 tests in test_claude_interrupt.py covering interrupt detection, user message format, partial output parsing, schema validation skip, signal clearing, token accounting, guidance injection, and fresh conversation semantics

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add WorkflowEvent frozen dataclass with type, timestamp, data fields and to_dict() method
- Add WorkflowEventEmitter with subscribe(), unsubscribe(), emit() using threading.Lock
- Callback exception isolation: failing callbacks are logged but don't block others
- emit() snapshots subscriber list under lock before iterating
- Comprehensive unit tests covering all acceptance criteria

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
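Here is a minimal sketch of the emitter pattern this commit describes, using the hypothetical names `Event` and `EventEmitter` rather than Conductor's `WorkflowEvent`/`WorkflowEventEmitter`. The subscriber list is copied under the lock and callbacks run outside it. Because `threading.Lock` is not re-entrant, this means a callback can unsubscribe itself without deadlocking or disturbing the current iteration. A failing callback is logged and the rest still run.

```python
from __future__ import annotations

import logging
import threading
import time
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Event:
    type: str
    timestamp: float = field(default_factory=time.time)
    data: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        return {"type": self.type, "timestamp": self.timestamp, "data": dict(self.data)}


Callback = Callable[[Event], None]


class EventEmitter:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: list[Callback] = []

    def subscribe(self, callback: Callback) -> None:
        with self._lock:
            self._subscribers.append(callback)

    def unsubscribe(self, callback: Callback) -> None:
        with self._lock:
            if callback in self._subscribers:
                self._subscribers.remove(callback)

    def emit(self, event: Event) -> None:
        # Snapshot under the lock, then call back outside it.
        with self._lock:
            subscribers = list(self._subscribers)
        for callback in subscribers:
            try:
                callback(event)
            except Exception:
                # Isolation: one broken renderer must not starve the others.
                logger.exception("event subscriber failed for %s", event.type)
```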

- Narrow script try/except in workflow.py to only wrap _execute_script() so post-processing (record_execution, check_timeout, evaluate_routes, check_interrupt) propagates naturally without emitting spurious script_failed
- Add TestGateEvents class to test_event_emission.py with 3 tests: test_gate_presented_and_resolved, test_gate_resolved_to_end, test_gate_event_ordering, covering gate event emission, $end routing through gates, and event ordering
- Test count increased from 19 to 22; all 761 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add cancelled() guard in start() polling loop before calling .exception() to prevent CancelledError propagation when serve task is cancelled during startup
- Raise RuntimeError('Server task was cancelled before starting') in that path
- Rewrite test_start_raises_on_server_failure to call await dashboard.start() with unittest.mock.patch.object mocking uvicorn.Server.serve to raise OSError, validating the actual production code path instead of inlining polling logic
- Add test_start_raises_on_cancelled_task covering the new cancelled() guard path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
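Here is an asyncio sketch of the startup poll with the `cancelled()` guard. Several parts are assumptions: `serve` stands in for something like uvicorn's `Server.serve()`, `is_started` stands in for a readiness flag, and the failure and timeout messages other than the cancelled-task one are made up. The ordering is the point: `Task.exception()` raises `CancelledError` on a cancelled task rather than returning it, so `cancelled()` must be checked first. `start_server` is a hypothetical name.

```python
from __future__ import annotations

import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


async def start_server(
    serve: Callable[[], Coroutine[Any, Any, None]],
    is_started: Callable[[], bool],
    poll_interval: float = 0.01,
    startup_timeout: float = 5.0,
) -> asyncio.Task[None]:
    """Launch serve() as a task and poll until it reports ready or dies."""
    task = asyncio.create_task(serve())
    loop = asyncio.get_running_loop()
    deadline = loop.time() + startup_timeout
    while not is_started():
        if task.done():
            # Must come first: .exception() on a cancelled task raises CancelledError.
            if task.cancelled():
                raise RuntimeError("Server task was cancelled before starting")
            exc = task.exception()
            if exc is not None:
                raise RuntimeError(f"Server failed to start: {exc}") from exc
            raise RuntimeError("Server exited before reporting startup")
        if loop.time() >= deadline:
            task.cancel()
            raise TimeoutError("Server did not start within the timeout")
        await asyncio.sleep(poll_interval)
    return task
```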

Fix 6 review bugs in index.html:
- BUG: Parallel groups now counted as single units in agentsTotal (removed child agent names from agentNames, added group name instead; groupAgents set prevents duplicate node creation in agent loop)
- BUG: For-each group names added to agentNames so they are counted in agentsTotal, fixing '1/0 agents' display for for-each-only workflows
- BUG: parallel_completed now uses server-authoritative data.failure_count instead of local groupProgress counter, consistent with for_each_completed and replay-safe
- CODE QUALITY: workflowFailure variable moved to state declarations block, eliminating reliance on var hoisting
- UX: workflow_failed handler calls setNodeState(data.agent_name, 'failed') to visually mark the running agent as failed on workflow failure
- RELIABILITY: CDN scripts pinned to specific patch versions (cytoscape@3.30.4, dagre@0.8.5, cytoscape-dagre@2.5.0)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add --web, --web-port, --web-bg CLI flags to the run command, wire up emitter and dashboard lifecycle in run_workflow_async(), and add the web optional dependency extra to pyproject.toml.
- Add [project.optional-dependencies] with web extra (fastapi, uvicorn, websockets) using PEP 621 syntax for pip extras compatibility
- Add --web, --web-port, --web-bg Typer options to run command in app.py
- Update run_workflow_async() with keyword-only web/web_port/web_bg params
- Create WorkflowEventEmitter and pass to WorkflowEngine(event_emitter=)
- Lazy-import WebDashboard with try/except ImportError giving actionable error message (pip install conductor-cli[web]) and exit code 1
- Non-fatal dashboard startup: wrap start() in try/except, warn and continue
- Print dashboard URL to stderr regardless of --silent/--quiet
- Post-execution lifecycle: --web-bg calls wait_for_clients_disconnect(), default --web blocks on asyncio.Event().wait() with Ctrl+C messaging
- Always call dashboard.stop() in finally block for cleanup
- Use contextlib.suppress(asyncio.CancelledError) per ruff SIM105
- Add 9 tests in test_web_flags.py covering flag acceptance, dependency check, and startup failure handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace blocking sys.stdin.buffer.read(1) in the keyboard listener's reader thread with select()-based polling (100ms timeout), allowing the thread to check _stop_flag and exit cleanly. Join the reader thread in stop() to ensure it releases the stdin lock before interpreter finalization, preventing the "could not acquire lock" fatal error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
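Here is a POSIX-only sketch of the reader-thread design described in this commit and the earlier KeyboardListener redesign. A daemon thread polls a file descriptor with `select()` and a short timeout. It hands each byte to the event loop via `call_soon_threadsafe` and exits once a stop flag is set, so `stop()` can join it. `ByteReader` is a hypothetical name, a pipe stands in for stdin, and raw-mode terminal setup is omitted. On Windows, `select()` only accepts sockets, so this approach would not apply there.

```python
from __future__ import annotations

import asyncio
import os
import select
import threading


class ByteReader:
    """Reads single bytes from a file descriptor on a daemon thread."""

    def __init__(self, fd: int, loop: asyncio.AbstractEventLoop, poll_timeout: float = 0.1):
        self._fd = fd
        self._loop = loop
        self._poll_timeout = poll_timeout
        self.queue: asyncio.Queue[bytes] = asyncio.Queue()
        self._stop_flag = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        while not self._stop_flag.is_set():
            ready, _, _ = select.select([self._fd], [], [], self._poll_timeout)
            if not ready:
                continue  # timed out: loop back and re-check the stop flag
            data = os.read(self._fd, 1)
            if not data:
                break  # EOF
            # asyncio.Queue is not thread-safe; hand the byte to the loop thread.
            self._loop.call_soon_threadsafe(self.queue.put_nowait, data)

    def stop(self) -> None:
        self._stop_flag.set()
        # Joining ensures the thread is gone before interpreter shutdown.
        self._thread.join()
```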

…ps required
- Stream SDK events (reasoning, tool calls, messages) from providers through the event callback chain to the web dashboard in real-time
- Add agent detail panel to dashboard with rendered prompt, activity stream (reasoning/tool calls/turn markers), and enriched output (input/output tokens)
- Add --web-bg flag as standalone background mode that forks a detached child process, prints the dashboard URL, and exits the CLI immediately
- Make web dependencies (fastapi, uvicorn, websockets) required instead of optional; remove [web] extras group and import guard
- Disable interactive interrupt (Esc guidance) when running in --web mode
- Update all documentation: README, CLI reference, AGENTS.md, examples, skill references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Allow users to manually reposition nodes when the automatic dagre layout doesn't produce an ideal arrangement. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Multi-feature workflow for visually testing the web dashboard. Exercises script steps, parallel groups, conditional routing, loop-back patterns, human gates, and for-each dynamic parallel.

- Format workflow.py to pass ruff formatter check
- Wrap blocking Rich Prompt.ask/IntPrompt.ask in typed helper functions to satisfy ty's strict DefaultType checking when used with asyncio.to_thread()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Jason Robert and others added 30 commits February 24, 2026 18:16

fix: break long line in script.py to satisfy ruff E501 lint rule

df5fa81

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: web ui docs

d065b91

feat(web): migrate dashboard to React + Vite, fix MCP configs

c37028c

feat(web): enable draggable nodes in workflow graph

c6762f1


Add web dashboard test workflow example

2b27355


feat(web): enhance dashboard with tooltips, cost tracking, and UX polish

14616ce

jrob5756 merged commit f2791ff into main Feb 26, 2026
7 checks passed

jrob5756 deleted the feat/usability-features branch February 26, 2026 14:59

jrob5756 merged 31 commits into main from feat/usability-features

jrob5756 commented Feb 26, 2026
