fix: cross-platform compatibility for ScriptExecutor and tests#10
Conversation
|
@microsoft-github-policy-service agree company="Microsoft" |
9594888 to
55de448
Compare
|
@jrob5756 could you please review this PR when you get a chance? Thanks! |
jrob5756
left a comment
There was a problem hiding this comment.
Review Summary
Thanks for the thorough PR description and the real problem analysis, Lucio. The Windows compatibility issue is legitimate and the test fixes are well-crafted. However, I'd like to suggest a different approach for the production code.
What's great
- Test conversions (
echo→sys.executable -c) are clean and correct — these are the real fix for the 48 Windows test failures - UTF-8 encoding fix in
run.py— genuine bug fix for Windows read_text(encoding="utf-8")in test assertions — properly paired with the production fixPurePosixPathin checkpoint tests — correct for cross-platform path serializationskipif(win32)on chmod tests — appropriate since NTFS doesn't honor POSIX permissions- PR description is excellent — thorough analysis, clear before/after table
Concern with the production approach
The exec-first/shell-fallback strategy in _create_process() adds significant complexity and introduces some issues:
-
False positives on Windows: Exit code
1is in_NOT_FOUND_CODESfor Windows. A legitimate script that exits1and writes "not found" to stderr (e.g., "Resource not found") would be misidentified as "command not found." -
FileNotFoundErrorcatch is too broad: It can also mean thecwddoesn't exist, not just "command not found." Silently falling through to shell in that case produces confusing errors. -
The heuristic runs on all executions, not just after the shell fallback — so even the exec-happy-path is subject to false positives.
-
The core problem is a testing concern, not a production one. Real workflows use
python,node,curl,docker— all standalone executables. Shell builtins likeecho/true/falseappeared only in the tests, which this PR already fixes.
Recommended approach
Keep the test changes, the UTF-8 fix, the PurePosixPath fix, and the skipif changes. Drop the _create_process() production changes entirely.
The current create_subprocess_exec on main is simple, secure, and gives clear error messages. If a Windows user tries command: echo, the error tells them to use a real executable — which naturally leads them to either:
# Windows-specific with shell builtins
command: cmd
args: ["/c", "echo", "hello"]
# Cross-platform
command: python
args: ["-c", "print('hello')"]This is the same pattern used by GitHub Actions, Docker, and most CI systems — the workflow author is explicit about what they want. No heuristics, no fallback chains, no shell injection surface.
| env = {**os.environ, **agent.env} if agent.env else None | ||
|
|
||
| _verbose_log(f" Script: {rendered_command} {' '.join(rendered_args)}") | ||
|
|
There was a problem hiding this comment.
These three imports (shlex, subprocess, sys) are only needed for the shell fallback logic. If we go with the simpler approach of keeping create_subprocess_exec only and letting the tests handle cross-platform concerns via sys.executable -c, these imports can be dropped entirely.
| except FileNotFoundError as exc: | ||
| raise ExecutionError( | ||
| f"Script '{agent.name}': command not found: '{rendered_command}'", | ||
| agent_name=agent.name, |
There was a problem hiding this comment.
The original inline create_subprocess_exec with the clear FileNotFoundError → ExecutionError was simpler and gave better error messages. The _create_process() extraction adds complexity to solve a problem that's primarily a testing concern.
Suggestion: Keep the original execute() method from main as-is. The test changes in this PR (converting echo/true/false to sys.executable -c) already fix the Windows test failures without needing this production fallback.
| # but fall back to create_subprocess_shell for shell builtins (echo, true, | ||
| # etc.) that don't exist as standalone executables on some platforms. | ||
| process = await self._create_process( | ||
| rendered_command, rendered_args, rendered_working_dir, env, agent.name |
There was a problem hiding this comment.
_NOT_FOUND_CODES includes exit code 1 for Windows. A legitimate script that exits 1 and writes something like "Resource not found" or "File not found" to stderr would be misidentified as "command not found."
Also, this check runs unconditionally after every execution — not just when the shell fallback was used. So even commands that succeeded via create_subprocess_exec are subject to this heuristic.
Additionally, this constant is re-created on every execute() call. The _UPPER_SNAKE naming convention suggests a module-level constant.
| _NOT_FOUND_CODES = {127} if sys.platform != "win32" else {1, 9009} | ||
| if process.returncode in _NOT_FOUND_CODES: | ||
| stderr_lower = stderr_text.lower() | ||
| if "not found" in stderr_lower or "not recognized" in stderr_lower: |
There was a problem hiding this comment.
FileNotFoundError catch is too broad. This exception can also mean the working directory (cwd) doesn't exist — not just that the command wasn't found. Silently falling through to shell execution in that case produces a confusing shell-level error instead of the clear "working directory not found" diagnosis the user needs.
The original code raised a clear ExecutionError immediately with the agent name and a helpful suggestion. The pass here replaces a precise error with a speculative fallthrough.
| exit_code=process.returncode, | ||
| ) | ||
|
|
||
| async def _create_process( |
There was a problem hiding this comment.
The platform-aware quoting (shlex.join on Unix, list2cmdline on Windows) is correct. However, note that when the shell fallback is taken, user-controlled template values (from Jinja2 rendering) flow into the shell command string. While shlex.join quotes arguments, it's designed for correctness rather than adversarial inputs — so the fallback to shell introduces a shell injection surface that doesn't exist in the exec-only path.
If we keep create_subprocess_exec only, this concern goes away entirely.
Production changes:
- ScriptExecutor: use exec-first strategy with shell fallback. Tries
create_subprocess_exec first (secure, no shell injection risk), falls
back to create_subprocess_shell only when the command is not found as
a standalone executable (shell builtins like echo, true, false).
- ScriptExecutor: add command-not-found detection after shell fallback
via exit codes (127 on Unix, 1/9009 on Windows) paired with stderr
pattern matching to avoid false positives.
- cli/run.py: open log files with encoding='utf-8' instead of system
default (cp1252 on Windows) to handle Unicode log content.
Test changes:
- test_script.py: replace all Unix-only commands (sh, echo, printf, pwd,
sleep, false) with sys.executable -c Python invocations for true
cross-platform independence.
- test_script_workflow.py: replace echo/true/false with sys.executable.
- test_event_emission.py: replace echo with sys.executable.
- test_checkpoint.py: use PurePosixPath for test data expecting
forward-slash paths (Path('/a') produces '\a' on Windows).
- test_logging.py: add encoding='utf-8' to read_text() calls to match
production write encoding. Skip two chmod-based permission tests on
Windows where NTFS ACLs don't honor POSIX mode bits.
55de448 to
275ba20
Compare
|
Thanks for the thorough review @jrob5756! Great points on the false-positive risks with the exec-first/shell-fallback approach. I've updated the PR to:
All 1539 tests pass (12 skipped on Windows as expected). |
Summary
Fixes 48 test failures when running the test suite on Windows by making the production code and tests fully cross-platform.
Problem
The existing code had several Unix-specific assumptions that caused failures on Windows:
create_subprocess_execwhich cannot find shell builtins (echo,true,false) on Windows these aren't standalone executables, they're built into the shell.sh,printf,pwd,sleep,false,true) that don't exist on Windows.Path("/a")produces\aon Windows, breaking checkpoint serialization tests that expected forward slashes.open(path, "w")defaults tocp1252on Windows, which can't encode the Unicode box-drawing characters used in Rich console output.os.chmod(path, 0o444)doesn't prevent writes on NTFS, so permission-denied tests always passed.Approach
Production: exec-first with shell fallback (
script.py)Rather than always using
create_subprocess_shell(which introduces shell injection risk) or alwayscreate_subprocess_exec(which can't find shell builtins), the new_create_process()method uses a two-stage strategy:create_subprocess_execfirst this is the secure path with no shell injection surface. It works for standalone executables (python,node,curl, etc.).FileNotFoundError, fall back tocreate_subprocess_shellthis handles shell builtins and commands that require shell features (pipes, redirection, globbing). Platform-aware quoting is used:shlex.join()on Unix,subprocess.list2cmdline()on Windows.This preserves security for the common case (standalone executables) while maintaining compatibility with shell builtins and complex shell expressions.
Command-not-found detection
After shell fallback execution, the executor detects "command not found" errors by requiring both:
127on Unix,{1, 9009}on Windows"not found"or"not recognized"The dual requirement prevents false positives a script that legitimately exits with code 1 won't be misidentified as "command not found" unless its stderr also contains the diagnostic pattern.
Production: UTF-8 encoding (
run.py)Log files are now opened with
encoding="utf-8"instead of the system default, preventingUnicodeEncodeErroron Windows wherecp1252can't encode Rich's Unicode box-drawing characters.Tests: true cross-platform independence
All Unix-only commands in tests have been replaced with
sys.executable -cPython one-liners:echo hellosys.executable -c "print('hello')"sh -c 'echo ...'sys.executable -c "..."printf 'text'sys.executable -c "print('text', end='')"pwdsys.executable -c "import os; print(os.getcwd())"sleep 10sys.executable -c "import time; time.sleep(10)"falsesys.executable -c "import sys; sys.exit(1)"truesys.executable -c "import sys; sys.exit(0)"Tests that only inspect config/plans without executing (e.g.
TestScriptDryRun) still useechosince no subprocess is created.Tests: path and encoding fixes
Path("/a")PurePosixPath("/a")to guarantee forward-slash serialization on all platforms.read_text()read_text(encoding="utf-8")to match the production write encoding.chmod-based tests skipped on Windows with@pytest.mark.skipif(sys.platform == "win32")since NTFS ACLs don't honor POSIX mode bits.Files Changed
src/conductor/executor/script.py_create_process(), command-not-found detectionsrc/conductor/cli/run.pyencoding="utf-8"on log file opentests/test_executor/test_script.pysys.executabletests/test_engine/test_script_workflow.pyecho/true/falsesys.executabletests/test_engine/test_event_emission.pyechosys.executabletests/test_engine/test_checkpoint.pyPathPurePosixPathfor test datatests/test_cli/test_logging.pyread_text(), skip chmod tests on WindowsTesting
Full test suite passes on Windows: 1,539 passed, 12 skipped, 0 failures.