Skip to content

fix: cross-platform compatibility for ScriptExecutor and tests#10

Merged
jrob5756 merged 1 commit intomicrosoft:mainfrom
lucioctinoco:fix/windows-cross-platform-compat
Mar 2, 2026
Merged

fix: cross-platform compatibility for ScriptExecutor and tests#10
jrob5756 merged 1 commit intomicrosoft:mainfrom
lucioctinoco:fix/windows-cross-platform-compat

Conversation

@lucioctinoco
Copy link
Copy Markdown
Contributor

Summary

Fixes 48 test failures when running the test suite on Windows by making the production code and tests fully cross-platform.

Problem

The existing code had several Unix-specific assumptions that caused failures on Windows:

  1. ScriptExecutor used create_subprocess_exec which cannot find shell builtins (echo, true, false) on Windows these aren't standalone executables, they're built into the shell.
  2. Tests used Unix-only commands (sh, printf, pwd, sleep, false, true) that don't exist on Windows.
  3. Path handling: Path("/a") produces \a on Windows, breaking checkpoint serialization tests that expected forward slashes.
  4. File encoding: open(path, "w") defaults to cp1252 on Windows, which can't encode the Unicode box-drawing characters used in Rich console output.
  5. POSIX permissions: os.chmod(path, 0o444) doesn't prevent writes on NTFS, so permission-denied tests always passed.

Approach

Production: exec-first with shell fallback (script.py)

Rather than always using create_subprocess_shell (which introduces shell injection risk) or always create_subprocess_exec (which can't find shell builtins), the new _create_process() method uses a two-stage strategy:

  1. Try create_subprocess_exec first this is the secure path with no shell injection surface. It works for standalone executables (python, node, curl, etc.).
  2. On FileNotFoundError, fall back to create_subprocess_shell this handles shell builtins and commands that require shell features (pipes, redirection, globbing). Platform-aware quoting is used: shlex.join() on Unix, subprocess.list2cmdline() on Windows.

This preserves security for the common case (standalone executables) while maintaining compatibility with shell builtins and complex shell expressions.

Command-not-found detection

After shell fallback execution, the executor detects "command not found" errors by requiring both:

  • An expected exit code: 127 on Unix, {1, 9009} on Windows
  • A stderr pattern match: "not found" or "not recognized"

The dual requirement prevents false positives a script that legitimately exits with code 1 won't be misidentified as "command not found" unless its stderr also contains the diagnostic pattern.

Production: UTF-8 encoding (run.py)

Log files are now opened with encoding="utf-8" instead of the system default, preventing UnicodeEncodeError on Windows where cp1252 can't encode Rich's Unicode box-drawing characters.

Tests: true cross-platform independence

All Unix-only commands in tests have been replaced with sys.executable -c Python one-liners:

Before (Unix-only) After (cross-platform)
echo hello sys.executable -c "print('hello')"
sh -c 'echo ...' sys.executable -c "..."
printf 'text' sys.executable -c "print('text', end='')"
pwd sys.executable -c "import os; print(os.getcwd())"
sleep 10 sys.executable -c "import time; time.sleep(10)"
false sys.executable -c "import sys; sys.exit(1)"
true sys.executable -c "import sys; sys.exit(0)"

Tests that only inspect config/plans without executing (e.g. TestScriptDryRun) still use echo since no subprocess is created.

Tests: path and encoding fixes

  • Checkpoint tests: Path("/a") PurePosixPath("/a") to guarantee forward-slash serialization on all platforms.
  • Logging tests: read_text() read_text(encoding="utf-8") to match the production write encoding.
  • Permission tests: Two chmod-based tests skipped on Windows with @pytest.mark.skipif(sys.platform == "win32") since NTFS ACLs don't honor POSIX mode bits.

Files Changed

File Change
src/conductor/executor/script.py Exec-first/shell-fallback via _create_process(), command-not-found detection
src/conductor/cli/run.py encoding="utf-8" on log file open
tests/test_executor/test_script.py All Unix commands sys.executable
tests/test_engine/test_script_workflow.py echo/true/false sys.executable
tests/test_engine/test_event_emission.py echo sys.executable
tests/test_engine/test_checkpoint.py Path PurePosixPath for test data
tests/test_cli/test_logging.py UTF-8 encoding on read_text(), skip chmod tests on Windows

Testing

Full test suite passes on Windows: 1,539 passed, 12 skipped, 0 failures.

@lucioctinoco
Copy link
Copy Markdown
Contributor Author

@microsoft-github-policy-service agree company="Microsoft"

@lucioctinoco lucioctinoco force-pushed the fix/windows-cross-platform-compat branch from 9594888 to 55de448 Compare March 1, 2026 21:37
@lucioctinoco
Copy link
Copy Markdown
Contributor Author

@jrob5756 could you please review this PR when you get a chance? Thanks!

Copy link
Copy Markdown
Collaborator

@jrob5756 jrob5756 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Thanks for the thorough PR description and the real problem analysis, Lucio. The Windows compatibility issue is legitimate and the test fixes are well-crafted. However, I'd like to suggest a different approach for the production code.

What's great

  • Test conversions (echosys.executable -c) are clean and correct — these are the real fix for the 48 Windows test failures
  • UTF-8 encoding fix in run.py — genuine bug fix for Windows
  • read_text(encoding="utf-8") in test assertions — properly paired with the production fix
  • PurePosixPath in checkpoint tests — correct for cross-platform path serialization
  • skipif(win32) on chmod tests — appropriate since NTFS doesn't honor POSIX permissions
  • PR description is excellent — thorough analysis, clear before/after table

Concern with the production approach

The exec-first/shell-fallback strategy in _create_process() adds significant complexity and introduces some issues:

  1. False positives on Windows: Exit code 1 is in _NOT_FOUND_CODES for Windows. A legitimate script that exits 1 and writes "not found" to stderr (e.g., "Resource not found") would be misidentified as "command not found."

  2. FileNotFoundError catch is too broad: It can also mean the cwd doesn't exist, not just "command not found." Silently falling through to shell in that case produces confusing errors.

  3. The heuristic runs on all executions, not just after the shell fallback — so even the exec-happy-path is subject to false positives.

  4. The core problem is a testing concern, not a production one. Real workflows use python, node, curl, docker — all standalone executables. Shell builtins like echo/true/false appeared only in the tests, which this PR already fixes.

Recommended approach

Keep the test changes, the UTF-8 fix, the PurePosixPath fix, and the skipif changes. Drop the _create_process() production changes entirely.

The current create_subprocess_exec on main is simple, secure, and gives clear error messages. If a Windows user tries command: echo, the error tells them to use a real executable — which naturally leads them to either:

# Windows-specific with shell builtins
command: cmd
args: ["/c", "echo", "hello"]

# Cross-platform
command: python
args: ["-c", "print('hello')"]

This is the same pattern used by GitHub Actions, Docker, and most CI systems — the workflow author is explicit about what they want. No heuristics, no fallback chains, no shell injection surface.

env = {**os.environ, **agent.env} if agent.env else None

_verbose_log(f" Script: {rendered_command} {' '.join(rendered_args)}")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three imports (shlex, subprocess, sys) are only needed for the shell fallback logic. If we go with the simpler approach of keeping create_subprocess_exec only and letting the tests handle cross-platform concerns via sys.executable -c, these imports can be dropped entirely.

except FileNotFoundError as exc:
raise ExecutionError(
f"Script '{agent.name}': command not found: '{rendered_command}'",
agent_name=agent.name,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original inline create_subprocess_exec with the clear FileNotFoundError → ExecutionError was simpler and gave better error messages. The _create_process() extraction adds complexity to solve a problem that's primarily a testing concern.

Suggestion: Keep the original execute() method from main as-is. The test changes in this PR (converting echo/true/false to sys.executable -c) already fix the Windows test failures without needing this production fallback.

Comment thread src/conductor/executor/script.py Outdated
# but fall back to create_subprocess_shell for shell builtins (echo, true,
# etc.) that don't exist as standalone executables on some platforms.
process = await self._create_process(
rendered_command, rendered_args, rendered_working_dir, env, agent.name
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ False positive risk on Windows: _NOT_FOUND_CODES includes exit code 1 for Windows. A legitimate script that exits 1 and writes something like "Resource not found" or "File not found" to stderr would be misidentified as "command not found."

Also, this check runs unconditionally after every execution — not just when the shell fallback was used. So even commands that succeeded via create_subprocess_exec are subject to this heuristic.

Additionally, this constant is re-created on every execute() call. The _UPPER_SNAKE naming convention suggests a module-level constant.

Comment thread src/conductor/executor/script.py Outdated
_NOT_FOUND_CODES = {127} if sys.platform != "win32" else {1, 9009}
if process.returncode in _NOT_FOUND_CODES:
stderr_lower = stderr_text.lower()
if "not found" in stderr_lower or "not recognized" in stderr_lower:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ The FileNotFoundError catch is too broad. This exception can also mean the working directory (cwd) doesn't exist — not just that the command wasn't found. Silently falling through to shell execution in that case produces a confusing shell-level error instead of the clear "working directory not found" diagnosis the user needs.

The original code raised a clear ExecutionError immediately with the agent name and a helpful suggestion. The pass here replaces a precise error with a speculative fallthrough.

Comment thread src/conductor/executor/script.py Outdated
exit_code=process.returncode,
)

async def _create_process(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The platform-aware quoting (shlex.join on Unix, list2cmdline on Windows) is correct. However, note that when the shell fallback is taken, user-controlled template values (from Jinja2 rendering) flow into the shell command string. While shlex.join quotes arguments, it's designed for correctness rather than adversarial inputs — so the fallback to shell introduces a shell injection surface that doesn't exist in the exec-only path.

If we keep create_subprocess_exec only, this concern goes away entirely.

Production changes:
- ScriptExecutor: use exec-first strategy with shell fallback. Tries
  create_subprocess_exec first (secure, no shell injection risk), falls
  back to create_subprocess_shell only when the command is not found as
  a standalone executable (shell builtins like echo, true, false).
- ScriptExecutor: add command-not-found detection after shell fallback
  via exit codes (127 on Unix, 1/9009 on Windows) paired with stderr
  pattern matching to avoid false positives.
- cli/run.py: open log files with encoding='utf-8' instead of system
  default (cp1252 on Windows) to handle Unicode log content.

Test changes:
- test_script.py: replace all Unix-only commands (sh, echo, printf, pwd,
  sleep, false) with sys.executable -c Python invocations for true
  cross-platform independence.
- test_script_workflow.py: replace echo/true/false with sys.executable.
- test_event_emission.py: replace echo with sys.executable.
- test_checkpoint.py: use PurePosixPath for test data expecting
  forward-slash paths (Path('/a') produces '\a' on Windows).
- test_logging.py: add encoding='utf-8' to read_text() calls to match
  production write encoding. Skip two chmod-based permission tests on
  Windows where NTFS ACLs don't honor POSIX mode bits.
@lucioctinoco lucioctinoco force-pushed the fix/windows-cross-platform-compat branch from 55de448 to 275ba20 Compare March 2, 2026 21:02
@lucioctinoco
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @jrob5756! Great points on the false-positive risks with the exec-first/shell-fallback approach.

I've updated the PR to:

  • Dropped the _create_process() production changes entirely restored script.py to exactly what's on main
  • Kept all the test-level fixes (echo/sh/pwd/sleep/true/false sys.executable -c)
  • Kept the UTF-8 encoding fix in run.py
  • Kept the PurePosixPath fix in checkpoint tests
  • Kept the skipif(win32) on chmod tests
  • Kept the read_text(encoding='utf-8') in test assertions

All 1539 tests pass (12 skipped on Windows as expected).

@jrob5756 jrob5756 merged commit ad637e6 into microsoft:main Mar 2, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants