fix: cross-platform compatibility for ScriptExecutor and tests by lucioctinoco · Pull Request #10 · microsoft/conductor

lucioctinoco · 2026-03-01T21:27:19Z

Summary

Fixes 48 test failures when running the test suite on Windows by making the production code and tests fully cross-platform.

Problem

The existing code had several Unix-specific assumptions that caused failures on Windows:

ScriptExecutor used create_subprocess_exec which cannot find shell builtins (echo, true, false) on Windows these aren't standalone executables, they're built into the shell.
Tests used Unix-only commands (sh, printf, pwd, sleep, false, true) that don't exist on Windows.
Path handling: Path("/a") produces \a on Windows, breaking checkpoint serialization tests that expected forward slashes.
File encoding: open(path, "w") defaults to cp1252 on Windows, which can't encode the Unicode box-drawing characters used in Rich console output.
POSIX permissions: os.chmod(path, 0o444) doesn't prevent writes on NTFS, so permission-denied tests always passed.

Approach

Production: exec-first with shell fallback (`script.py`)

Rather than always using create_subprocess_shell (which introduces shell injection risk) or always create_subprocess_exec (which can't find shell builtins), the new _create_process() method uses a two-stage strategy:

Try create_subprocess_exec first this is the secure path with no shell injection surface. It works for standalone executables (python, node, curl, etc.).
On FileNotFoundError, fall back to create_subprocess_shell this handles shell builtins and commands that require shell features (pipes, redirection, globbing). Platform-aware quoting is used: shlex.join() on Unix, subprocess.list2cmdline() on Windows.

This preserves security for the common case (standalone executables) while maintaining compatibility with shell builtins and complex shell expressions.

Command-not-found detection

After shell fallback execution, the executor detects "command not found" errors by requiring both:

An expected exit code: 127 on Unix, {1, 9009} on Windows
A stderr pattern match: "not found" or "not recognized"

The dual requirement prevents false positives a script that legitimately exits with code 1 won't be misidentified as "command not found" unless its stderr also contains the diagnostic pattern.

Production: UTF-8 encoding (`run.py`)

Log files are now opened with encoding="utf-8" instead of the system default, preventing UnicodeEncodeError on Windows where cp1252 can't encode Rich's Unicode box-drawing characters.

Tests: true cross-platform independence

All Unix-only commands in tests have been replaced with sys.executable -c Python one-liners:

Before (Unix-only)	After (cross-platform)
`echo hello`	`sys.executable -c "print('hello')"`
`sh -c 'echo ...'`	`sys.executable -c "..."`
`printf 'text'`	`sys.executable -c "print('text', end='')"`
`pwd`	`sys.executable -c "import os; print(os.getcwd())"`
`sleep 10`	`sys.executable -c "import time; time.sleep(10)"`
`false`	`sys.executable -c "import sys; sys.exit(1)"`
`true`	`sys.executable -c "import sys; sys.exit(0)"`

Tests that only inspect config/plans without executing (e.g. TestScriptDryRun) still use echo since no subprocess is created.

Tests: path and encoding fixes

Checkpoint tests: Path("/a") PurePosixPath("/a") to guarantee forward-slash serialization on all platforms.
Logging tests: read_text() read_text(encoding="utf-8") to match the production write encoding.
Permission tests: Two chmod-based tests skipped on Windows with @pytest.mark.skipif(sys.platform == "win32") since NTFS ACLs don't honor POSIX mode bits.

Files Changed

File	Change
`src/conductor/executor/script.py`	Exec-first/shell-fallback via `_create_process()`, command-not-found detection
`src/conductor/cli/run.py`	`encoding="utf-8"` on log file open
`tests/test_executor/test_script.py`	All Unix commands `sys.executable`
`tests/test_engine/test_script_workflow.py`	`echo`/`true`/`false` `sys.executable`
`tests/test_engine/test_event_emission.py`	`echo` `sys.executable`
`tests/test_engine/test_checkpoint.py`	`Path` `PurePosixPath` for test data
`tests/test_cli/test_logging.py`	UTF-8 encoding on `read_text()`, skip chmod tests on Windows

Testing

Full test suite passes on Windows: 1,539 passed, 12 skipped, 0 failures.

lucioctinoco · 2026-03-01T21:35:47Z

@microsoft-github-policy-service agree company="Microsoft"

lucioctinoco · 2026-03-01T21:46:03Z

@jrob5756 could you please review this PR when you get a chance? Thanks!

jrob5756

Review Summary

Thanks for the thorough PR description and the real problem analysis, Lucio. The Windows compatibility issue is legitimate and the test fixes are well-crafted. However, I'd like to suggest a different approach for the production code.

What's great

Test conversions (echo → sys.executable -c) are clean and correct — these are the real fix for the 48 Windows test failures
UTF-8 encoding fix in run.py — genuine bug fix for Windows
read_text(encoding="utf-8") in test assertions — properly paired with the production fix
PurePosixPath in checkpoint tests — correct for cross-platform path serialization
skipif(win32) on chmod tests — appropriate since NTFS doesn't honor POSIX permissions
PR description is excellent — thorough analysis, clear before/after table

Concern with the production approach

The exec-first/shell-fallback strategy in _create_process() adds significant complexity and introduces some issues:

False positives on Windows: Exit code 1 is in _NOT_FOUND_CODES for Windows. A legitimate script that exits 1 and writes "not found" to stderr (e.g., "Resource not found") would be misidentified as "command not found."
FileNotFoundError catch is too broad: It can also mean the cwd doesn't exist, not just "command not found." Silently falling through to shell in that case produces confusing errors.
The heuristic runs on all executions, not just after the shell fallback — so even the exec-happy-path is subject to false positives.
The core problem is a testing concern, not a production one. Real workflows use python, node, curl, docker — all standalone executables. Shell builtins like echo/true/false appeared only in the tests, which this PR already fixes.

Recommended approach

Keep the test changes, the UTF-8 fix, the PurePosixPath fix, and the skipif changes. Drop the _create_process() production changes entirely.

The current create_subprocess_exec on main is simple, secure, and gives clear error messages. If a Windows user tries command: echo, the error tells them to use a real executable — which naturally leads them to either:

# Windows-specific with shell builtins
command: cmd
args: ["/c", "echo", "hello"]

# Cross-platform
command: python
args: ["-c", "print('hello')"]

This is the same pattern used by GitHub Actions, Docker, and most CI systems — the workflow author is explicit about what they want. No heuristics, no fallback chains, no shell injection surface.

jrob5756 · 2026-03-02T18:07:07Z

        env = {**os.environ, **agent.env} if agent.env else None

        _verbose_log(f"  Script: {rendered_command} {' '.join(rendered_args)}")



These three imports (shlex, subprocess, sys) are only needed for the shell fallback logic. If we go with the simpler approach of keeping create_subprocess_exec only and letting the tests handle cross-platform concerns via sys.executable -c, these imports can be dropped entirely.

jrob5756 · 2026-03-02T18:07:07Z

-        except FileNotFoundError as exc:
-            raise ExecutionError(
-                f"Script '{agent.name}': command not found: '{rendered_command}'",
-                agent_name=agent.name,


The original inline create_subprocess_exec with the clear FileNotFoundError → ExecutionError was simpler and gave better error messages. The _create_process() extraction adds complexity to solve a problem that's primarily a testing concern.

Suggestion: Keep the original execute() method from main as-is. The test changes in this PR (converting echo/true/false to sys.executable -c) already fix the Windows test failures without needing this production fallback.

jrob5756 · 2026-03-02T18:07:07Z

+        # but fall back to create_subprocess_shell for shell builtins (echo, true,
+        # etc.) that don't exist as standalone executables on some platforms.
+        process = await self._create_process(
+            rendered_command, rendered_args, rendered_working_dir, env, agent.name


⚠️ False positive risk on Windows: _NOT_FOUND_CODES includes exit code 1 for Windows. A legitimate script that exits 1 and writes something like "Resource not found" or "File not found" to stderr would be misidentified as "command not found."

Also, this check runs unconditionally after every execution — not just when the shell fallback was used. So even commands that succeeded via create_subprocess_exec are subject to this heuristic.

Additionally, this constant is re-created on every execute() call. The _UPPER_SNAKE naming convention suggests a module-level constant.

jrob5756 · 2026-03-02T18:07:07Z

+        _NOT_FOUND_CODES = {127} if sys.platform != "win32" else {1, 9009}
+        if process.returncode in _NOT_FOUND_CODES:
+            stderr_lower = stderr_text.lower()
+            if "not found" in stderr_lower or "not recognized" in stderr_lower:


⚠️ The FileNotFoundError catch is too broad. This exception can also mean the working directory (cwd) doesn't exist — not just that the command wasn't found. Silently falling through to shell execution in that case produces a confusing shell-level error instead of the clear "working directory not found" diagnosis the user needs.

The original code raised a clear ExecutionError immediately with the agent name and a helpful suggestion. The pass here replaces a precise error with a speculative fallthrough.

jrob5756 · 2026-03-02T18:07:07Z

            exit_code=process.returncode,
        )
+
+    async def _create_process(


The platform-aware quoting (shlex.join on Unix, list2cmdline on Windows) is correct. However, note that when the shell fallback is taken, user-controlled template values (from Jinja2 rendering) flow into the shell command string. While shlex.join quotes arguments, it's designed for correctness rather than adversarial inputs — so the fallback to shell introduces a shell injection surface that doesn't exist in the exec-only path.

If we keep create_subprocess_exec only, this concern goes away entirely.

Production changes: - ScriptExecutor: use exec-first strategy with shell fallback. Tries create_subprocess_exec first (secure, no shell injection risk), falls back to create_subprocess_shell only when the command is not found as a standalone executable (shell builtins like echo, true, false). - ScriptExecutor: add command-not-found detection after shell fallback via exit codes (127 on Unix, 1/9009 on Windows) paired with stderr pattern matching to avoid false positives. - cli/run.py: open log files with encoding='utf-8' instead of system default (cp1252 on Windows) to handle Unicode log content. Test changes: - test_script.py: replace all Unix-only commands (sh, echo, printf, pwd, sleep, false) with sys.executable -c Python invocations for true cross-platform independence. - test_script_workflow.py: replace echo/true/false with sys.executable. - test_event_emission.py: replace echo with sys.executable. - test_checkpoint.py: use PurePosixPath for test data expecting forward-slash paths (Path('/a') produces '\a' on Windows). - test_logging.py: add encoding='utf-8' to read_text() calls to match production write encoding. Skip two chmod-based permission tests on Windows where NTFS ACLs don't honor POSIX mode bits.

lucioctinoco · 2026-03-02T21:03:01Z

Thanks for the thorough review @jrob5756! Great points on the false-positive risks with the exec-first/shell-fallback approach.

I've updated the PR to:

Dropped the _create_process() production changes entirely restored script.py to exactly what's on main
Kept all the test-level fixes (echo/sh/pwd/sleep/true/false sys.executable -c)
Kept the UTF-8 encoding fix in run.py
Kept the PurePosixPath fix in checkpoint tests
Kept the skipif(win32) on chmod tests
Kept the read_text(encoding='utf-8') in test assertions

All 1539 tests pass (12 skipped on Windows as expected).

lucioctinoco force-pushed the fix/windows-cross-platform-compat branch from 9594888 to 55de448 Compare March 1, 2026 21:37

jrob5756 reviewed Mar 2, 2026

View reviewed changes

lucioctinoco force-pushed the fix/windows-cross-platform-compat branch from 55de448 to 275ba20 Compare March 2, 2026 21:02

jrob5756 approved these changes Mar 2, 2026

View reviewed changes

jrob5756 merged commit ad637e6 into microsoft:main Mar 2, 2026
7 checks passed

jrob5756 mentioned this pull request Mar 17, 2026

Feature: explicit cross-platform execution modes for script steps (exec/shell/auto) #19

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: cross-platform compatibility for ScriptExecutor and tests#10

fix: cross-platform compatibility for ScriptExecutor and tests#10
jrob5756 merged 1 commit intomicrosoft:mainfrom
lucioctinoco:fix/windows-cross-platform-compat

lucioctinoco commented Mar 1, 2026

Uh oh!

lucioctinoco commented Mar 1, 2026

Uh oh!

lucioctinoco commented Mar 1, 2026

Uh oh!

jrob5756 left a comment

Uh oh!

jrob5756 Mar 2, 2026

Uh oh!

jrob5756 Mar 2, 2026

Uh oh!

jrob5756 Mar 2, 2026

Uh oh!

jrob5756 Mar 2, 2026

Uh oh!

jrob5756 Mar 2, 2026

Uh oh!

lucioctinoco commented Mar 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		env = {os.environ, agent.env} if agent.env else None

		_verbose_log(f" Script: {rendered_command} {' '.join(rendered_args)}")

Conversation

lucioctinoco commented Mar 1, 2026

Summary

Problem

Approach

Production: exec-first with shell fallback (script.py)

Command-not-found detection

Production: UTF-8 encoding (run.py)

Tests: true cross-platform independence

Tests: path and encoding fixes

Files Changed

Testing

Uh oh!

lucioctinoco commented Mar 1, 2026

Uh oh!

lucioctinoco commented Mar 1, 2026

Uh oh!

jrob5756 left a comment

Choose a reason for hiding this comment

Review Summary

What's great

Concern with the production approach

Recommended approach

Uh oh!

jrob5756 Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

jrob5756 Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

jrob5756 Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

jrob5756 Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

jrob5756 Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

lucioctinoco commented Mar 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Production: exec-first with shell fallback (`script.py`)

Production: UTF-8 encoding (`run.py`)