fix(update): sanitise env before spawning installer #899
danielmeppiel merged 3 commits into microsoft:main from …
apm update inherits the PyInstaller bootloader's LD_LIBRARY_PATH when spawning the platform installer. The shell -- and the curl / tar / sudo calls install.sh makes -- then dlopens libssl.so.3 / libcrypto.so.3 from the bundle's _internal/ directory instead of the system ones. When the bundled libs are ABI-incompatible with what the system libcurl needs, curl aborts with "OPENSSL_3.2.0 not found" on the very first release fetch, blocking the upgrade path for every user on an affected distro (Debian trixie arm64 dev-containers, Fedora 43, and similar).

Centralise PyInstaller env sanitisation in a new helper, apm_cli.utils.subprocess_env.external_process_env, which restores LD_LIBRARY_PATH / DYLD_LIBRARY_PATH / DYLD_FRAMEWORK_PATH from the <NAME>_ORIG snapshots that PyInstaller's bootloader saves at launch, or drops them entirely when no snapshot exists. The _ORIG keys are stripped from the returned env so PyInstaller internals do not leak to the child. Outside a frozen build and on Windows the helper is a no-op.

apm update now calls subprocess.run with env=external_process_env() so the installer runs against the user's pre-launch environment. Restoring from _ORIG rather than blindly popping preserves legitimate user exports (CUDA, Nix, custom toolchains).

Complements microsoft#466's build-side exclude: that fix stopped new binaries from shipping the offending libs; this fix stops them from being inherited by spawned children even when they are present in older binaries or in any future bundle that re-introduces a similar dependency.

Closes microsoft#894
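As a rough illustration of the contract described above, here is a minimal Python sketch of such a helper; it is not the code merged in this PR. The tuple name `_PYINSTALLER_MANAGED_LIBRARY_VARS` and the `base` parameter are taken from the review's class diagram, while treating `base` as a full replacement for `os.environ` and detecting Windows via `sys.platform` are assumptions.

```python
import os
import sys
from typing import Dict, Mapping, Optional

# Library-path variables the PyInstaller bootloader may override at launch.
_PYINSTALLER_MANAGED_LIBRARY_VARS = (
    "LD_LIBRARY_PATH",
    "DYLD_LIBRARY_PATH",
    "DYLD_FRAMEWORK_PATH",
)


def external_process_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a fresh environment dict for spawning external programs (sketch)."""
    # Copy first: neither `base` nor os.environ is ever mutated.
    env = dict(os.environ if base is None else base)

    # Source installs and Windows need no sanitisation: return the plain copy.
    if not getattr(sys, "frozen", False) or sys.platform.startswith("win"):
        return env

    for key in _PYINSTALLER_MANAGED_LIBRARY_VARS:
        orig_key = f"{key}_ORIG"
        if orig_key in env:
            # Restore the pre-launch value and drop the internal snapshot key.
            env[key] = env.pop(orig_key)
        else:
            # No snapshot: any current value was injected by the bootloader.
            env.pop(key, None)
    return env
```

A caller then passes the result straight through, e.g. `subprocess.run(cmd, env=external_process_env(), check=False)`, so the child never sees the bundle's `_internal/` path unless the user had exported it before launch.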
APM Review Panel Verdict
Disposition: APPROVE (with one recommended follow-up tracked below)
Per-persona findings
Python Architect: This is a routine PR -- a new utility module in …
1. OO / class diagram
```mermaid
classDiagram
direction LR
class subprocess_env {
    <<Pure>>
    +_PYINSTALLER_MANAGED_LIBRARY_VARS tuple
    +external_process_env(base) dict
}
class update {
    <<IOBoundary>>
    +update() command
    +_get_installer_run_command() list
}
class os_environ {
    <<External>>
}
class sys_frozen {
    <<External>>
}
subprocess_env ..> os_environ : reads copy
subprocess_env ..> sys_frozen : checks getattr
update ..> subprocess_env : uses external_process_env()
class subprocess_env:::touched
class update:::touched
classDef touched fill:#fff3b0,stroke:#d47600
```
2. Execution flow diagram
```mermaid
flowchart TD
    A["user: apm update"] --> B["update.py: update()"]
    B --> C["[NET] download installer via requests.get"]
    C --> D["[FS] write to NamedTemporaryFile"]
    D --> E["external_process_env() -- subprocess_env.py"]
    E --> F{"getattr(sys, 'frozen', False)"}
    F -- "False (dev / source install)" --> G["return dict(os.environ) -- no-op"]
    F -- "True (PyInstaller binary)" --> H["iterate _PYINSTALLER_MANAGED_LIBRARY_VARS"]
    H --> I{"key_ORIG in env?"}
    I -- "Yes -- user had prior export" --> J["env[key] = env[key_ORIG]; pop _ORIG"]
    I -- "No -- PyInstaller injected it" --> K["env.pop(key, None)"]
    J --> L["[EXEC] subprocess.run(cmd, env=sanitised_env, check=False)"]
    K --> L
    G --> L
    L --> M["[EXEC] install.sh spawns system curl / tar / sudo"]
    M --> N["system binaries resolve libs from system paths"]
```
Design patterns
Quality observations:
CLI Logging Expert: No output path changes anywhere in the diff. The fix is entirely below the logging layer -- …
DevX UX Expert: No CLI surface changes.
Supply Chain Security Expert: The fix is security-positive: it narrows the child process's effective environment by removing PyInstaller's bundled library-path overrides. No auth variables (…)
Flag (non-blocking, follow-up): The module docstring correctly states this helper is "the single source of truth for child-process environment sanitisation", but the PR applies it to exactly one call site (…)
The PR's docstring explicitly references issue #462 (the …)
Auth Expert: Not activated -- the PR touches only …
OSS Growth Hacker: This fix removes a trust-destroying failure mode on two fast-growing developer segments: arm64 dev-containers (Debian trixie, GitHub Codespaces arm64) and Fedora 43. Developers setting up AI-native tooling in containers are exactly the users APM needs to convert to repeat users -- and a broken …
Side-channel to CEO: The CHANGELOG entry is already well-written for a release-note beat. Worth surfacing in the next release post as a reliability signal: "APM update now works on modern Linux distros including arm64 dev-containers." That is a concrete, repostable claim with a platform callout. No growth-strategy.md update required for a point fix of this scope.
CEO arbitration
All five mandatory specialists are in alignment: this is a correct, well-scoped, well-tested fix that should ship. The implementation follows PyInstaller's official …
Required actions before merge
Optional follow-ups
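The non-blocking follow-up flagged above (the helper is applied to only one call site) could be backed by a cheap static guard. The sketch below is hypothetical and not part of this PR: it uses the standard `ast` module to list `subprocess.<name>(...)` calls that omit an `env=` keyword. It only recognises that literal spelling, so aliased imports slip through, and a call that forwards `**kwargs` is still flagged because the guard cannot see inside it.

```python
import ast
from typing import List, Tuple

# Hypothetical: subprocess entry points that spawn a child process.
_SPAWN_FUNCS = frozenset({"run", "Popen", "call", "check_call", "check_output"})


def spawns_missing_env(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line, name) pairs for subprocess.<name>(...) calls without env=."""
    findings: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_subprocess_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
            and func.attr in _SPAWN_FUNCS
        )
        # kw.arg is None for **kwargs, so only an explicit env= counts.
        if is_subprocess_call and not any(kw.arg == "env" for kw in node.keywords):
            findings.append((node.lineno, func.attr))
    return sorted(findings)


SAMPLE = (
    "import subprocess\n"
    "subprocess.run(['sh', 'install.sh'], env=external_process_env(), check=False)\n"
    "subprocess.check_output(['git', 'rev-parse', 'HEAD'])\n"
)
print(spawns_missing_env(SAMPLE))  # [(3, 'check_output')]
```

Run over the package sources in a unit test, a non-empty result would point at spawn sites that still inherit the raw parent environment.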
…#921)

* fix(ci): add merge_group trigger to Merge Gate so it reports in queue

Branch protection / merge-queue ruleset requires the 'gate' check on both PR-time and merge-queue contexts, but the gate workflow only fired on 'pull_request'. In the merge queue, GitHub fires 'merge_group' events against a temp merge commit -- the gate check was never created on that SHA, so PRs sat in the queue with 'gate' stuck in 'Expected -- Waiting for status to be reported' indefinitely (observed on PR #899).

Changes
-------
.github/workflows/merge-gate.yml
- Add 'merge_group' (types: checks_requested) and keep existing 'pull_request' + 'workflow_dispatch' triggers.
- Resolve head SHA per event:
    workflow_dispatch -> gh api .../pulls/N --jq .head.sha
    merge_group       -> github.event.merge_group.head_sha
    pull_request      -> github.event.pull_request.head.sha
- Branch EXPECTED_CHECKS by event:
    pull_request / workflow_dispatch: 'Build & Test (Linux),APM Self-Check'
    merge_group: + 'Build (Linux),Smoke Test (Linux), Integration Tests (Linux),Release Validation (Linux)'
    (the merge_group-only checks emitted by ci-integration.yml plus the ci.yml checks that also run on merge_group)
- Bump TIMEOUT_MIN 30 -> 55 and job timeout-minutes 35 -> 60 to absorb ci-integration.yml's theoretical worst-case critical path (Build -> Smoke -> Integration[20m] -> Release Validation[20m]).
- Update header comment + recovery instructions to cover both contexts.

.github/scripts/ci/merge_gate_wait.sh
- Accept new optional EVENT_NAME env var; emit event-aware recovery instructions on exit code 2 (in merge_group context, pushing a commit does NOT retrigger the merge_group event -- the user must re-queue).
- Add '&filter=latest' to the Checks API query so GitHub returns only the latest run per name, removing reliance on client-side sort and pagination order.

Concurrency
-----------
The existing key 'merge-gate-${{ pull_request.number || inputs.pr_number || github.ref }}' falls through to github.ref in merge_group context. github.ref there is 'refs/heads/gh-readonly-queue/main/pr-N-<sha>', unique per queue entry, so cancel-in-progress dedupes correctly within a single temp branch and never collides across PR/merge_group channels.

Self-deadlock
-------------
'gate' is intentionally absent from EXPECTED_CHECKS in both contexts.

Audit
-----
Design audited against live GitHub docs:
- docs.github.com/.../webhook-events-and-payloads#merge_group
- docs.github.com/.../managing-a-merge-queue
- docs.github.com/en/rest/checks/runs

Verdict: ship with the event-aware recovery message included here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(ci): drop paths-ignore from gate + ci so docs-only PRs satisfy gate

Both .github/workflows/merge-gate.yml and .github/workflows/ci.yml carried identical paths-ignore (docs/**, .gitignore, LICENSE). For a docs-only PR neither workflow fires, so the 'gate' check-run is never created -- if the PR ruleset requires 'gate', branch protection displays it as 'Expected -- Waiting' forever and the PR cannot merge.

Removing paths-ignore from BOTH (not just one) is required: dropping it only from merge-gate.yml would leave the gate polling for ci.yml checks that never appear, timing out at TIMEOUT_MIN with exit 2 (false failure). Removing from both means ci.yml runs on docs-only PRs (~5 min of free GitHub-hosted runner time) and the gate aggregates as normal -- coherent regardless of which ruleset tier requires gate.

Caught in code review on PR #921. Same observation was flagged but left out-of-scope in the original PR description; folding in now.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
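The per-event branching in this commit is implemented in workflow YAML and bash. Purely as an illustration, the same decision table can be restated in Python. The function names are hypothetical, the payload keys mirror the `github.event.*` expressions quoted in the commit, and the `workflow_dispatch` branch takes a caller-supplied lookup in place of the real `gh api .../pulls/N --jq .head.sha` call.

```python
from typing import Any, Callable, Dict, List

# Check names quoted in the commit. 'gate' is deliberately absent so the
# gate never waits on its own check-run (the self-deadlock note).
PR_CHECKS = ["Build & Test (Linux)", "APM Self-Check"]
MERGE_GROUP_EXTRA = [
    "Build (Linux)",
    "Smoke Test (Linux)",
    "Integration Tests (Linux)",
    "Release Validation (Linux)",
]


def resolve_head_sha(
    event_name: str,
    payload: Dict[str, Any],
    lookup_pr_head: Callable[[int], str],
) -> str:
    """Pick the commit SHA whose check-runs the gate should poll."""
    if event_name == "workflow_dispatch":
        # Real workflow: gh api .../pulls/N --jq .head.sha
        return lookup_pr_head(int(payload["inputs"]["pr_number"]))
    if event_name == "merge_group":
        return payload["merge_group"]["head_sha"]
    if event_name == "pull_request":
        return payload["pull_request"]["head"]["sha"]
    raise ValueError(f"unsupported event: {event_name}")


def expected_checks(event_name: str) -> List[str]:
    """merge_group adds the queue-only checks on top of the PR-time set."""
    if event_name == "merge_group":
        return PR_CHECKS + MERGE_GROUP_EXTRA
    return list(PR_CHECKS)


print(resolve_head_sha("merge_group", {"merge_group": {"head_sha": "abc123"}}, lambda n: ""))  # abc123
print(len(expected_checks("merge_group")))  # 6
```

Keeping the SHA choice and the expected-check list keyed on the same event name is what lets one gate job serve both the PR-time and the merge-queue contexts.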
Description
`apm update` inherits the PyInstaller bootloader's `LD_LIBRARY_PATH` when spawning the platform installer. The shell -- and the `curl` / `tar` / `sudo` calls `install.sh` makes -- then dlopens `libssl.so.3` / `libcrypto.so.3` from the bundle's `_internal/` directory instead of the system ones. When the bundled libs are ABI-incompatible with what the system `libcurl` needs, `curl` aborts with `OPENSSL_3.2.0 not found` on the very first release fetch, blocking the upgrade path for every user on an affected distro (Debian trixie arm64 dev-containers, Fedora 43, and similar).

Fixes #894
Fix
New helper `apm_cli.utils.subprocess_env.external_process_env` centralises PyInstaller env sanitisation:
- Restores `LD_LIBRARY_PATH` / `DYLD_LIBRARY_PATH` / `DYLD_FRAMEWORK_PATH` from the `<NAME>_ORIG` snapshots that PyInstaller's bootloader saves at launch.
- Drops them entirely when no `_ORIG` snapshot exists (no pre-launch value to restore).
- Strips the `_ORIG` keys from the returned env so PyInstaller internals do not leak to the child.

`apm update` now calls `subprocess.run` with `env=external_process_env()` so the installer runs against the user's pre-launch environment. Restoring from `_ORIG` rather than blindly popping preserves legitimate user exports (CUDA, Nix, custom toolchains).

Complements #466's build-side exclude: that fix stopped new binaries from shipping the offending libs; this fix stops them from being inherited by spawned children even when they are present in older binaries or in any future bundle that re-introduces a similar dependency.
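To make the effect of an explicit `env=` concrete, here is a small, self-contained sketch of the spawn step. `run_installer_script` is a hypothetical name, the script body is inlined rather than downloaded, and the two dicts are hand-built stand-ins for the polluted parent environment and for what `external_process_env()` would return. It assumes a POSIX `sh` on `PATH`.

```python
import os
import subprocess
import tempfile
from typing import Mapping


def run_installer_script(script: str, env: Mapping[str, str]) -> subprocess.CompletedProcess:
    """Write `script` to a temp file and run it with `sh` under an explicit env.

    Hypothetical stand-in for the update command's spawn step.
    """
    # delete=False keeps the file on disk after the with-block closes it,
    # so `sh` can still read it.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as fh:
        fh.write(script)
        path = fh.name
    try:
        # env= replaces the inherited environment wholesale: the child sees
        # exactly this mapping, not the parent's os.environ.
        return subprocess.run(
            ["sh", path], env=dict(env), capture_output=True, text=True, check=False
        )
    finally:
        os.unlink(path)


PROBE = 'echo "${LD_LIBRARY_PATH-unset}"\n'

# Stand-in for the bootloader-polluted parent environment.
polluted = dict(os.environ, LD_LIBRARY_PATH="/bundle/_internal")
# Stand-in for what external_process_env() would hand to the child.
clean = {k: v for k, v in polluted.items() if k != "LD_LIBRARY_PATH"}

print(run_installer_script(PROBE, polluted).stdout.strip())  # /bundle/_internal
print(run_installer_script(PROBE, clean).stdout.strip())     # unset
```

Because the child receives only the mapping passed as `env=`, the parent's own `os.environ` is never modified, which matches the end-to-end observation that the main process still saw the polluted value.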
Type of change
Testing
Unit coverage
- `tests/unit/test_subprocess_env.py` -- 11 tests locking in the helper contract: no-op when not frozen, `_ORIG` restoration, drop when no `_ORIG`, DYLD variants, immutability of the input mapping and `os.environ`, base-mapping precedence.
- `tests/unit/test_update_command.py` -- two regression guards asserting the installer is always spawned with an explicit `env=` kwarg (Unix and Windows paths).

Full unit suite: 5308 passed.
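The regression guards on the spawn call follow a common `unittest.mock` pattern: patch `subprocess.run`, invoke the code under test, then inspect the recorded keyword arguments. A minimal sketch, assuming a hypothetical `spawn_installer` function in place of the real update command, with `dict(os.environ)` standing in for `external_process_env()`:

```python
import os
import subprocess
from typing import List
from unittest import mock


def spawn_installer(cmd: List[str]) -> int:
    """Hypothetical spawn step that always passes an explicit env."""
    env = dict(os.environ)  # stand-in for external_process_env()
    return subprocess.run(cmd, env=env, check=False).returncode


def test_installer_spawned_with_explicit_env() -> None:
    with mock.patch("subprocess.run") as fake_run:
        fake_run.return_value = subprocess.CompletedProcess(args=["sh"], returncode=0)
        spawn_installer(["sh", "install.sh"])
    # Without env=, the child would inherit the parent's (possibly
    # PyInstaller-polluted) environment, so the guard insists on it.
    _, kwargs = fake_run.call_args
    assert kwargs.get("env") is not None


test_installer_spawned_with_explicit_env()
```

The guard checks only that `env=` is present, not its contents; the helper's own tests cover what the sanitised mapping contains.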
End-to-end reproduction on WSL Ubuntu 22.04
- … `libssl.so.3` / `libcrypto.so.3` in a fake `_internal/` dir.
- … `LD_LIBRARY_PATH` pointed at that dir, `curl https://api.github.com/...` failed with `error while loading shared libraries: libssl.so.3: file too short` -- the same dlopen-failure class as [BUG] apm update fails #894.
- … `sys.frozen=True` plus the real `external_process_env()` applied to the subprocess env, the same `curl` call returned `http_code=200`, while the main process still saw the polluted `LD_LIBRARY_PATH` -- proving the helper does not mutate the live environment.
- … `curl` still fails -- confirming dev-environment behaviour is untouched.

Note on upgrade path from pre-0.9.3 binaries
This fix takes effect from the next release onwards. Users already on an affected binary (0.8.5 or earlier, in an environment whose system `libcurl` needs `OPENSSL_3.2.0`+) cannot `apm update` their way out, because their running binary lacks the fix. `install.sh` already points such users at the pip fallback (`pip install --user apm-cli`), which is a one-time escape. From 0.9.3 onwards, `apm update` is immune to this bug class.