perf: optimise sandbox boot time with parallel startup and symlinked deps#506
perf: optimise sandbox boot time with parallel startup and symlinked deps#506ColeMurray wants to merge 2 commits intomainfrom
Conversation
…deps - Pre-stage .opencode/ layout in Docker image (tools, skills, deps, node_modules) so boot avoids per-file copy from scattered sources - Symlink node_modules instead of shutil.copytree (~2.3s → 27ms) - Start OpenCode and sidecars concurrently with git sync for repo_image boots, overlapping ~2.8s of Node.js cold start - Reduce health check poll interval from 500ms to 100ms - Extract _start_sidecars() for parallel execution - Add .claude/worktrees/ to .gitignore and ESLint ignores - Add sandbox-runtime to lint-staged ruff checks
📝 WalkthroughWalkthroughInfrastructure updates to optimize sandbox image builds and runtime startup. Pre-stages OpenCode dependencies into the Docker image layer, introduces a fast-path installation flow in the runtime with legacy fallback, implements concurrent sidecar startup, and adds Python linting for the sandbox-runtime package. Changes
Sequence DiagramsequenceDiagram
participant DockerBuild as Docker Build
participant ImageLayer as Image Layer
participant RuntimeStart as Runtime Startup
participant FastPath as Fast Install (Stage)
participant LegacyPath as Legacy Install
participant Sidecars as Sidecars<br/>(code_server, ttyd)
DockerBuild->>ImageLayer: Pre-stage /app/opencode-stage/<br/>(tools, skills, node_modules)
ImageLayer->>ImageLayer: Cache v47-prestage-opencode
RuntimeStart->>RuntimeStart: Check for /app/opencode-stage/
alt Stage Directory Exists
RuntimeStart->>FastPath: Call _install_from_stage()
FastPath->>FastPath: Copy manifests + symlink node_modules
FastPath->>RuntimeStart: ✓ Fast path complete
else Stage Directory Missing
RuntimeStart->>LegacyPath: Call _install_tools_legacy()
LegacyPath->>LegacyPath: Copy tools + deps per-file
LegacyPath->>RuntimeStart: ✓ Legacy path complete
end
par Concurrent Startup
RuntimeStart->>Sidecars: _start_sidecars() (concurrent)
RuntimeStart->>RuntimeStart: start_opencode() (concurrent)
end
Sidecars->>Sidecars: code_server + ttyd running
RuntimeStart->>RuntimeStart: OpenCode initialized
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
| opencode_task: asyncio.Task | None = None | ||
| sidecar_task: asyncio.Task | None = None | ||
| if from_repo_image: | ||
| opencode_task = asyncio.create_task(self.start_opencode()) |
There was a problem hiding this comment.
Starting start_opencode() before _update_existing_repo() now creates .opencode/ inside the repo before git checkout -B runs. That can make checkout fail with "untracked working tree files would be overwritten" for any repository that tracks .opencode/* (or adds it on the target branch), which is a regression from the previous ordering. Could we avoid writing into the repo until after checkout succeeds, or move the staged OpenCode files outside the repo root?
There was a problem hiding this comment.
Good catch. In repo_image mode _install_from_stage() writes into workdir/.opencode/ which could race with git checkout -B. Moved opencode_task to after git sync + hooks complete — we now only overlap OpenCode startup with the sidecar wait. Fixed in 9a2a313.
| opencode_task: asyncio.Task | None = None | ||
| sidecar_task: asyncio.Task | None = None | ||
| if from_repo_image: | ||
| opencode_task = asyncio.create_task(self.start_opencode()) |
There was a problem hiding this comment.
This reorders the runtime contract: run_start_script() used to complete before OpenCode booted, but in the new repo-image path OpenCode now starts concurrently and may read the repo or .opencode state before the start hook has finished preparing it. That means hook side effects like generating config, writing files, or starting local services are no longer guaranteed to be visible during OpenCode startup. Can we keep the overlap to git/sidecars while preserving the start.sh -> OpenCode ordering?
There was a problem hiding this comment.
Agreed — the start hook contract should be preserved. Same fix: OpenCode now starts after hooks complete, overlapping only with sidecars. Fixed in 9a2a313.
There was a problem hiding this comment.
Summary
This PR improves sandbox boot performance by pre-staging .opencode/, reducing health-check latency, and overlapping startup work. The image-stage changes look reasonable, but the new repo-image parallel startup path introduces two blocking correctness regressions around git checkout safety and the established start-hook ordering.
- PR Title and number:
perf: optimise sandbox boot time with parallel startup and symlinked deps(#506) - Author:
@ColeMurray - Files changed count, additions/deletions:
6 files,+131/-36
Critical Issues
- [Functionality & Correctness]
packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py:1246-start_opencode()now writes.opencode/into the repo before_update_existing_repo()runsgit checkout -B. That can cause checkout failures when the target branch contains.opencode/*, because git will refuse to overwrite the newly-created untracked files. - [Functionality & Correctness]
packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py:1246- The repo-image path now starts OpenCode beforerun_start_script()completes. That changes the previous contract where the runtime hook finished preparing the repo before OpenCode booted, so hook-created files, generated config, or local services are no longer guaranteed to be available during OpenCode startup.
Suggestions
- None beyond the blocking issues above.
Nitpicks
- None.
Positive Feedback
- The staged
.opencodelayout is a nice minimal optimization, especially keeping the legacy fallback for pre-v47 images. - Extracting
_start_sidecars()makes the lifecycle easier to follow and reduces duplication. - The lint-staged and ignore-list updates are small but helpful hygiene improvements alongside the runtime work.
Questions
- None.
Verdict
Request Changes: Critical correctness issues in the new parallel repo-image startup path should be addressed before merge.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py (1)
1240-1299:⚠️ Potential issue | 🔴 CriticalCancel and drain background tasks on error to prevent event loop leaks and unhandled exceptions.
When
run_start_script()fails or any earlier exception is raised, control jumps to theexceptblock and thenfinally: await self.shutdown(). The eagerly createdopencode_taskandsidecar_taskare never cancelled or awaited, causing:
- Python to emit
Task exception was never retrievedfor any exception insidestart_opencode()(e.g.,_wait_for_healthtiming out).- If
start_opencode()is still running,shutdown()terminatesself.opencode_processwhile the task concurrently awaits the health endpoint — the task continues until timeout/connection-refused, leaving the event loop alive afterrun()returns.opencode.readymay never get set but the error report path doesn't include state about whether OpenCode came up.Implement the suggested cleanup in the finally block to cancel and drain both tasks before shutdown.
Additionally, in the
from_repo_image=Trueboot path,start_opencode()calls_install_tools()which writes to.opencode/tool/,.opencode/skills/, etc., while_update_existing_repo()concurrently executesgit checkout -Bon the same repository. If any tracked file under.opencode/exists on the target branch (unlikely but possible), this creates a race condition. Verify that.opencode/is guaranteed gitignored in all repositories using this sandbox.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py` around lines 1240 - 1299, The created background tasks opencode_task and sidecar_task must be cancelled and drained in the run() finally path to avoid leaked tasks/unhandled exceptions: in the finally block, if opencode_task or sidecar_task is not None and not done(), call cancel(), then await them (e.g., with asyncio.gather(..., return_exceptions=True)) to retrieve and swallow expected CancelledError/other exceptions before calling await self.shutdown(); also ensure any exceptions from those tasks are logged or attached to the error report (include opencode.ready state). For the from_repo_image path, either ensure .opencode/ is guaranteed gitignored in all repos or change the startup ordering (do not call start_opencode() until after _update_existing_repo()/git sync completes) to avoid concurrent writes vs git checkout races.
🧹 Nitpick comments (1)
packages/sandbox-runtime/tests/test_entrypoint_build_mode.py (1)
255-277: Optional: pin the new eager-start contract with a positive assertion.The updated docstring states
start_opencodeis now fired eagerly, but the test no longer asserts this. Addingsupervisor.start_opencode.assert_called_once()(or.assert_called()) would lock in the concurrent-start invariant so a future refactor that accidentally reverts to sequential startup is caught here rather than only in production. Not blocking.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sandbox-runtime/tests/test_entrypoint_build_mode.py` around lines 255 - 277, Update the test_start_script_failure_is_fatal to assert that start_opencode was invoked eagerly: after awaiting supervisor.run() add supervisor.start_opencode.assert_called_once() (or .assert_called()) alongside the existing fatal-error and start_bridge assertions so the test verifies the concurrent-start contract for the supervisor object and its start_opencode method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/modal-infra/src/images/base.py`:
- Around line 183-195: The shell fallbacks "|| true" in the .run_commands(...)
block mask missing glob matches for "/app/sandbox_runtime/tools/*.js" and
"/app/sandbox_runtime/skills/*" so staging can appear present while empty;
remove the "|| true" from those cp commands so the build fails loudly when
expected inputs are missing, or alternatively add explicit existence checks
before copying (checking the glob expands or directory is non-empty) so missing
tools/skills cause an error; also consider adding a runtime check in
_install_from_stage() to verify stage_dir/tool and stage_dir/skills are
non-empty and call _install_tools_legacy() if they are empty.
In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py`:
- Around line 311-341: The fast-path installer _install_from_stage currently
performs symlink_to, shutil.copy2 and shutil.copytree calls without error
handling and unconditionally logs skills_path; wrap the sequence of filesystem
ops in a try/except around the sections that create
local_modules/local_tools/local_skills (references: _install_from_stage,
staged_modules/local_modules, shutil.copy2, shutil.copytree, symlink_to) so that
on any exception you (a) log the exception with details, (b) clean up partial
state in opencode_dir (remove the symlink and any partially created files/dirs
under opencode_dir) or call the legacy fallback _install_tools_legacy() to
ensure a consistent state, and (c) only emit
self.log.info("opencode.stage_installed", skills_path=...) when
local_skills.exists() (or otherwise include a boolean flag indicating absence)
so telemetry doesn’t claim a skills path that wasn’t created.
---
Outside diff comments:
In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py`:
- Around line 1240-1299: The created background tasks opencode_task and
sidecar_task must be cancelled and drained in the run() finally path to avoid
leaked tasks/unhandled exceptions: in the finally block, if opencode_task or
sidecar_task is not None and not done(), call cancel(), then await them (e.g.,
with asyncio.gather(..., return_exceptions=True)) to retrieve and swallow
expected CancelledError/other exceptions before calling await self.shutdown();
also ensure any exceptions from those tasks are logged or attached to the error
report (include opencode.ready state). For the from_repo_image path, either
ensure .opencode/ is guaranteed gitignored in all repos or change the startup
ordering (do not call start_opencode() until after _update_existing_repo()/git
sync completes) to avoid concurrent writes vs git checkout races.
---
Nitpick comments:
In `@packages/sandbox-runtime/tests/test_entrypoint_build_mode.py`:
- Around line 255-277: Update the test_start_script_failure_is_fatal to assert
that start_opencode was invoked eagerly: after awaiting supervisor.run() add
supervisor.start_opencode.assert_called_once() (or .assert_called()) alongside
the existing fatal-error and start_bridge assertions so the test verifies the
concurrent-start contract for the supervisor object and its start_opencode
method.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: daa54a62-283e-4f56-9cc0-dfe37319fad5
📒 Files selected for processing (6)
.gitignoreeslint.config.jspackage.jsonpackages/modal-infra/src/images/base.pypackages/sandbox-runtime/src/sandbox_runtime/entrypoint.pypackages/sandbox-runtime/tests/test_entrypoint_build_mode.py
| .run_commands( | ||
| "mkdir -p /app/opencode-stage/tool /app/opencode-stage/skills", | ||
| # Tools: legacy plugin + new tools | ||
| "cp /app/sandbox_runtime/plugins/inspect-plugin.js" | ||
| " /app/opencode-stage/tool/create-pull-request.js || true", | ||
| "cp /app/sandbox_runtime/tools/*.js /app/opencode-stage/tool/ || true", | ||
| # Skills: copy each skill subdirectory | ||
| "cp -r /app/sandbox_runtime/skills/* /app/opencode-stage/skills/ || true", | ||
| # Deps: package.json, lockfile, and node_modules from pre-built cache | ||
| "cp /app/opencode-deps/package.json /app/opencode-deps/package-lock.json" | ||
| " /app/opencode-stage/", | ||
| "cp -a /app/opencode-deps/node_modules /app/opencode-stage/node_modules", | ||
| ) |
There was a problem hiding this comment.
Silent || true fallbacks mask incomplete staging and defeat the legacy fallback.
If /app/sandbox_runtime/tools/*.js or /app/sandbox_runtime/skills/* produces no glob matches (shell passes the literal pattern to cp), the copy fails but || true swallows the error. The stage directory still gets created, so at boot _install_tools() sees stage_dir.is_dir() and takes the fast path — _install_tools_legacy() will never trigger even when tools/skills are missing. Users will silently boot with an empty .opencode/tool/ or .opencode/skills/.
Two options:
- Remove
|| trueand let the build fail loudly if required inputs are missing (preferred if these globs are expected to be non-empty in every image). - Guard the globs explicitly so “no match” is distinguishable from “cp error”, e.g.:
🛡️ Proposed fix — explicit existence checks + loud failure on real errors
.run_commands(
"mkdir -p /app/opencode-stage/tool /app/opencode-stage/skills",
- # Tools: legacy plugin + new tools
- "cp /app/sandbox_runtime/plugins/inspect-plugin.js"
- " /app/opencode-stage/tool/create-pull-request.js || true",
- "cp /app/sandbox_runtime/tools/*.js /app/opencode-stage/tool/ || true",
- # Skills: copy each skill subdirectory
- "cp -r /app/sandbox_runtime/skills/* /app/opencode-stage/skills/ || true",
+ # Tools: legacy plugin + new tools (only copy if present; fail loudly on real errors)
+ "if [ -f /app/sandbox_runtime/plugins/inspect-plugin.js ]; then"
+ " cp /app/sandbox_runtime/plugins/inspect-plugin.js"
+ " /app/opencode-stage/tool/create-pull-request.js;"
+ " fi",
+ "shopt -s nullglob;"
+ " files=(/app/sandbox_runtime/tools/*.js);"
+ ' if [ ${`#files`[@]} -gt 0 ]; then cp "${files[@]}" /app/opencode-stage/tool/; fi',
+ # Skills: copy each skill subdirectory
+ "if [ -d /app/sandbox_runtime/skills ]; then"
+ " cp -r /app/sandbox_runtime/skills/. /app/opencode-stage/skills/;"
+ " fi",
# Deps: package.json, lockfile, and node_modules from pre-built cache
"cp /app/opencode-deps/package.json /app/opencode-deps/package-lock.json"
" /app/opencode-stage/",
"cp -a /app/opencode-deps/node_modules /app/opencode-stage/node_modules",
)Alternatively, have _install_from_stage() validate that tool/ and skills/ are non-empty and fall back to _install_tools_legacy() otherwise.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/modal-infra/src/images/base.py` around lines 183 - 195, The shell
fallbacks "|| true" in the .run_commands(...) block mask missing glob matches
for "/app/sandbox_runtime/tools/*.js" and "/app/sandbox_runtime/skills/*" so
staging can appear present while empty; remove the "|| true" from those cp
commands so the build fails loudly when expected inputs are missing, or
alternatively add explicit existence checks before copying (checking the glob
expands or directory is non-empty) so missing tools/skills cause an error; also
consider adding a runtime check in _install_from_stage() to verify
stage_dir/tool and stage_dir/skills are non-empty and call
_install_tools_legacy() if they are empty.
| def _install_from_stage(self, stage_dir: Path, opencode_dir: Path) -> None: | ||
| """Fast path: populate .opencode/ from the pre-staged image directory.""" | ||
| opencode_dir.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| # Symlink node_modules — OpenCode's Npm.install() sees the lockfile | ||
| # in sync and skips arborist reify(), so the tree is never mutated. | ||
| staged_modules = stage_dir / "node_modules" | ||
| local_modules = opencode_dir / "node_modules" | ||
| if staged_modules.is_dir() and not local_modules.exists(): | ||
| local_modules.symlink_to(staged_modules) | ||
|
|
||
| # Copy package.json + package-lock.json (small files, needed alongside symlink) | ||
| for name in ("package.json", "package-lock.json"): | ||
| src = stage_dir / name | ||
| dest = opencode_dir / name | ||
| if src.exists() and not dest.exists(): | ||
| shutil.copy2(src, dest) | ||
|
|
||
| # Copy tool/ directory | ||
| staged_tools = stage_dir / "tool" | ||
| local_tools = opencode_dir / "tool" | ||
| if staged_tools.is_dir() and not local_tools.exists(): | ||
| shutil.copytree(staged_tools, local_tools) | ||
|
|
||
| # Copy skills/ directory | ||
| staged_skills = stage_dir / "skills" | ||
| local_skills = opencode_dir / "skills" | ||
| if staged_skills.is_dir() and not local_skills.exists(): | ||
| shutil.copytree(staged_skills, local_skills) | ||
|
|
||
| self.log.info("opencode.stage_installed", skills_path=str(local_skills)) |
There was a problem hiding this comment.
Partial staging leaves .opencode/ in an inconsistent state with no recovery, and the success log is misleading.
Two issues:
-
No error handling on copy/symlink operations. If
shutil.copytree(staged_tools, local_tools)fails (e.g. permission issue, disk full, partial stage dir),.opencode/is left with a symlinkednode_modulesplus possiblytool/but noskills/. The next boot won't re-stage (all thenot X.exists()guards short-circuit), and the legacy fallback is unreachable becausestage_dir.is_dir()is still true. Consider wrapping in try/except and either cleaning up partial state or falling back to_install_tools_legacy(). -
Line 341 always logs
skills_path=str(local_skills)regardless of whetherstaged_skillsactually existed orlocal_skillswas populated. If the stage lacksskills/(see concern inbase.py), operators will seeopencode.stage_installedpointing at a path that was never created — misleading telemetry that hides the very failure mode this fast path can produce.
🛡️ Suggested hardening
def _install_from_stage(self, stage_dir: Path, opencode_dir: Path) -> None:
"""Fast path: populate .opencode/ from the pre-staged image directory."""
opencode_dir.mkdir(parents=True, exist_ok=True)
- # Symlink node_modules — OpenCode's Npm.install() sees the lockfile
- # in sync and skips arborist reify(), so the tree is never mutated.
- staged_modules = stage_dir / "node_modules"
- local_modules = opencode_dir / "node_modules"
- if staged_modules.is_dir() and not local_modules.exists():
- local_modules.symlink_to(staged_modules)
-
- # Copy package.json + package-lock.json (small files, needed alongside symlink)
- for name in ("package.json", "package-lock.json"):
- src = stage_dir / name
- dest = opencode_dir / name
- if src.exists() and not dest.exists():
- shutil.copy2(src, dest)
-
- # Copy tool/ directory
- staged_tools = stage_dir / "tool"
- local_tools = opencode_dir / "tool"
- if staged_tools.is_dir() and not local_tools.exists():
- shutil.copytree(staged_tools, local_tools)
-
- # Copy skills/ directory
- staged_skills = stage_dir / "skills"
- local_skills = opencode_dir / "skills"
- if staged_skills.is_dir() and not local_skills.exists():
- shutil.copytree(staged_skills, local_skills)
-
- self.log.info("opencode.stage_installed", skills_path=str(local_skills))
+ installed = {"node_modules": False, "tool": False, "skills": False}
+ try:
+ staged_modules = stage_dir / "node_modules"
+ local_modules = opencode_dir / "node_modules"
+ if staged_modules.is_dir() and not local_modules.exists():
+ local_modules.symlink_to(staged_modules)
+ installed["node_modules"] = True
+
+ for name in ("package.json", "package-lock.json"):
+ src = stage_dir / name
+ dest = opencode_dir / name
+ if src.exists() and not dest.exists():
+ shutil.copy2(src, dest)
+
+ for subdir in ("tool", "skills"):
+ staged = stage_dir / subdir
+ local = opencode_dir / subdir
+ if staged.is_dir() and not local.exists():
+ shutil.copytree(staged, local)
+ installed[subdir] = True
+ except Exception as e:
+ self.log.warn("opencode.stage_install_failed", exc=e, installed=installed)
+ self._install_tools_legacy(opencode_dir.parent)
+ return
+
+ self.log.info("opencode.stage_installed", **{f"{k}_installed": v for k, v in installed.items()})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py` around lines 311
- 341, The fast-path installer _install_from_stage currently performs
symlink_to, shutil.copy2 and shutil.copytree calls without error handling and
unconditionally logs skills_path; wrap the sequence of filesystem ops in a
try/except around the sections that create
local_modules/local_tools/local_skills (references: _install_from_stage,
staged_modules/local_modules, shutil.copy2, shutil.copytree, symlink_to) so that
on any exception you (a) log the exception with details, (b) clean up partial
state in opencode_dir (remove the symlink and any partially created files/dirs
under opencode_dir) or call the legacy fallback _install_tools_legacy() to
ensure a consistent state, and (c) only emit
self.log.info("opencode.stage_installed", skills_path=...) when
local_skills.exists() (or otherwise include a boolean flag indicating absence)
so telemetry doesn’t claim a skills path that wasn’t created.
Move OpenCode startup to after git sync + hooks complete to avoid racing with git checkout (.opencode/ writes) and to preserve the contract that start.sh side effects are available during OpenCode boot. Sidecars still start concurrently with git sync since they don't write to .opencode/ or depend on hooks.
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py (1)
1239-1302:⚠️ Potential issue | 🟡 Minor
sidecar_taskcan be orphaned on the failure path.
sidecar_taskis created at line 1245 but only awaited inside the happy path at line 1292. If_update_existing_repo()raises,run_start_script()returns False (→raise RuntimeErrorat 1273), orstart_opencode()raises before line 1299, control jumps to theexceptat 1327 without ever awaiting or cancellingsidecar_task. The task will be garbage-collected while still pending, producingTask was destroyed but it is pending!warnings, and any exception from it is silently dropped.shutdown()also doesn't wait on it before terminating the sidecar processes it spawned, which can race process startup with SIGTERM.Consider cancelling/awaiting it in
finally, e.g.:finally: + if sidecar_task and not sidecar_task.done(): + sidecar_task.cancel() + try: + await sidecar_task + except (asyncio.CancelledError, Exception): + pass await self.shutdown()Also nit:
if from_repo_image and sidecar_task:at line 1290 is redundant —sidecar_taskis non-None ifffrom_repo_image—if sidecar_task:is sufficient and keeps the two branches symmetric.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py` around lines 1239 - 1302, The sidecar_task created by calling self._start_sidecars() can be left running if later operations raise, so ensure it is always cleaned up: wrap the try block so that in the finally you check if sidecar_task is not None and if still pending cancel it and await it (catch asyncio.CancelledError and log any exceptions), and also ensure shutdown() waits for any outstanding sidecar_task; replace the redundant "if from_repo_image and sidecar_task:" check with "if sidecar_task:" to keep branches symmetric and to reliably await the already-created task when present (references: sidecar_task, self._start_sidecars, self._update_existing_repo, self.perform_git_sync, self.run_start_script, self.start_opencode, shutdown).
♻️ Duplicate comments (1)
packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py (1)
311-341:⚠️ Potential issue | 🟠 MajorFast-path install still lacks error handling and emits misleading telemetry.
Per the earlier review thread, the author agreed to add try/except + fallback to
_install_tools_legacy()and to makeopencode.stage_installedreflect what was actually installed, but the current code on 311–341 still performs unguardedsymlink_to/copy2/copytreeand unconditionally logsskills_path=str(local_skills)even whenstaged_skillswas absent orcopytreefailed mid-way. A partial failure leaves.opencode/inconsistent, subsequent boots short-circuit on thenot X.exists()guards, and the legacy fallback is unreachable becausestage_dir.is_dir()is still true.Please land the agreed hardening (wrap the ops in try/except, fall back to legacy or clean up partial state on failure, and log per-subdir installed booleans instead of a potentially-nonexistent
local_skills).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py` around lines 311 - 341, The _install_from_stage fast-path performs unguarded filesystem ops (symlink_to, shutil.copy2, shutil.copytree) and always logs skills_path even on failure; wrap the sequence in a try/except inside _install_from_stage so any exception cleans up partially-created artifacts (remove local_modules symlink, remove partially-copied local_tools/local_skills, remove copied package files) and then call the fallback method _install_tools_legacy(stage_dir, opencode_dir) before re-raising or returning; change the telemetry call (self.log.info("opencode.stage_installed", ...)) to report booleans/flags (e.g., modules_installed, package_json_copied, package_lock_copied, tools_installed, skills_installed) based on existence of local_modules/local_tools/local_skills and copied files so it reflects what actually succeeded rather than unconditionally emitting skills_path.
🧹 Nitpick comments (1)
packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py (1)
815-815: Extract the health-poll interval into a named constant.The new
0.1poll literal is a magic number on a timing-sensitive hot path and sits alongside existing named constants (HEALTH_CHECK_TIMEOUT,SIDECAR_TIMEOUT_SECONDS, etc.). Per the repo's timing conventions, define it once with a unit-suffixed name and reuse it (the same value is also hard-coded at line 704 in_wait_for_port).HEALTH_CHECK_TIMEOUT = 30.0 + HEALTH_CHECK_POLL_INTERVAL_SECONDS = 0.1 ... - await asyncio.sleep(0.1) + await asyncio.sleep(self.HEALTH_CHECK_POLL_INTERVAL_SECONDS)As per coding guidelines: "Define each default timeout value exactly once as a named constant and import it everywhere rather than restating literal values" and "always encode the unit (seconds for Python…) in the variable name".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py` at line 815, Introduce a named constant for the 0.1 second poll interval (e.g., HEALTH_POLL_INTERVAL_SECONDS = 0.1) in the module and replace the literal 0.1 in the await asyncio.sleep call inside the health loop and the duplicate literal in _wait_for_port with that constant; update any imports/usages in this file so timing defaults are defined once, use the unit-suffixed name (SECONDS) to match existing conventions, and ensure both the health check loop and _wait_for_port reference the new HEALTH_POLL_INTERVAL_SECONDS constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py`:
- Around line 1239-1302: The sidecar_task created by calling
self._start_sidecars() can be left running if later operations raise, so ensure
it is always cleaned up: wrap the try block so that in the finally you check if
sidecar_task is not None and if still pending cancel it and await it (catch
asyncio.CancelledError and log any exceptions), and also ensure shutdown() waits
for any outstanding sidecar_task; replace the redundant "if from_repo_image and
sidecar_task:" check with "if sidecar_task:" to keep branches symmetric and to
reliably await the already-created task when present (references: sidecar_task,
self._start_sidecars, self._update_existing_repo, self.perform_git_sync,
self.run_start_script, self.start_opencode, shutdown).
---
Duplicate comments:
In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py`:
- Around line 311-341: The _install_from_stage fast-path performs unguarded
filesystem ops (symlink_to, shutil.copy2, shutil.copytree) and always logs
skills_path even on failure; wrap the sequence in a try/except inside
_install_from_stage so any exception cleans up partially-created artifacts
(remove local_modules symlink, remove partially-copied local_tools/local_skills,
remove copied package files) and then call the fallback method
_install_tools_legacy(stage_dir, opencode_dir) before re-raising or returning;
change the telemetry call (self.log.info("opencode.stage_installed", ...)) to
report booleans/flags (e.g., modules_installed, package_json_copied,
package_lock_copied, tools_installed, skills_installed) based on existence of
local_modules/local_tools/local_skills and copied files so it reflects what
actually succeeded rather than unconditionally emitting skills_path.
---
Nitpick comments:
In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py`:
- Line 815: Introduce a named constant for the 0.1 second poll interval (e.g.,
HEALTH_POLL_INTERVAL_SECONDS = 0.1) in the module and replace the literal 0.1 in
the await asyncio.sleep call inside the health loop and the duplicate literal in
_wait_for_port with that constant; update any imports/usages in this file so
timing defaults are defined once, use the unit-suffixed name (SECONDS) to match
existing conventions, and ensure both the health check loop and _wait_for_port
reference the new HEALTH_POLL_INTERVAL_SECONDS constant.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3c35231a-05ab-44f7-b041-6d63a60f3342
📒 Files selected for processing (1)
packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py
Summary
Optimises container boot time for sandbox environments, validated in production with measured results.
Pre-stage
.opencode/in Docker image — assembles tools, skills, deps, andnode_modulesinto/app/opencode-stage/at image build time. At boot,node_modulesis symlinked (instant) instead ofshutil.copytree(~2.3s), and the remaining small files are shallow-copied.Parallel startup for
repo_imageboots — OpenCode and sidecars (code-server, ttyd) are started concurrently with git sync and hooks viaasyncio.create_task, overlapping ~2.8s of Node.js cold start with work that's already happening. The bridge still starts last, so prompts can't arrive before git sync and hooks complete.Health check poll interval reduced from 500ms to 100ms, saving ~200ms on average.
Measured results (production
repo_imageboot)opencode.start→ file install completeopencode.start→opencode.readyThe parallel startup change (OpenCode + sidecars concurrent with git sync) will further reduce end-to-end boot time by hiding OpenCode's ~2.8s Node.js cold start behind the git fetch.
Also included
_start_sidecars()method for parallel execution_install_tools_legacy) for pre-v47 images.claude/worktrees/to.gitignoreand ESLint ignorespackages/sandbox-runtime/**/*.pyto lint-staged ruff checksTest plan
test_start_script_failure_is_fatal)opencode.stage_installedevent confirms fast path activeSummary by CodeRabbit
Release Notes