
fix: pre-build OpenCode plugin deps in image to avoid first-prompt timeout #491

Merged
ColeMurray merged 2 commits into main from worktree-fix+sandbox-boot-npm-timeout
Apr 17, 2026

Conversation

@ColeMurray (Owner) commented Apr 17, 2026

Summary

  • Pre-build @opencode-ai/plugin deps (package.json, package-lock.json, node_modules) into the sandbox image at /app/opencode-deps/
  • At boot, copy them into .opencode/ so OpenCode's Npm.install() finds the lockfile in sync and skips arb.reify() entirely
  • Bump bridge OPENCODE_REQUEST_TIMEOUT from 10s to 30s as a safety net
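
The boot-time copy in the second bullet can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual entrypoint code: the helper name `copy_deps_cache` and its return value are hypothetical, and copying `node_modules` with `shutil.copytree(..., symlinks=True)` is an assumption. Only the `/app/opencode-deps` path and the copy-only-if-missing rule come from the PR.

```python
import shutil
from pathlib import Path

DEPS_CACHE = Path("/app/opencode-deps")  # staging directory named in the PR


def copy_deps_cache(opencode_dir: Path, deps_cache: Path = DEPS_CACHE) -> list:
    """Copy pre-built deps into .opencode/, skipping anything already present.

    Hypothetical helper; returns the names that were actually copied.
    """
    copied = []
    if not deps_cache.is_dir():
        return copied
    opencode_dir.mkdir(parents=True, exist_ok=True)

    # Manifest and lockfile: never overwrite state restored from a snapshot.
    for name in ("package.json", "package-lock.json"):
        src, dest = deps_cache / name, opencode_dir / name
        if src.exists() and not dest.exists():
            shutil.copy2(src, dest)
            copied.append(name)

    src_modules = deps_cache / "node_modules"
    dest_modules = opencode_dir / "node_modules"
    # Treat a leftover (possibly dangling) symlink as "already present" too.
    already_there = dest_modules.exists() or dest_modules.is_symlink()
    if src_modules.is_dir() and not already_there:
        # Assumption: keep symlinks (e.g. node_modules/.bin entries) as symlinks.
        shutil.copytree(src_modules, dest_modules, symlinks=True)
        copied.append("node_modules")
    return copied
```

Calling the helper a second time copies nothing, which is the property that keeps snapshot restores intact.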

Problem

When a sandbox boots, OpenCode checks package.json deps against package-lock.json using @npmcli/arborist. The old entrypoint wrote a minimal package.json with no dependencies and symlinked node_modules to the global install. OpenCode then added @opencode-ai/plugin to package.json, found no lockfile, and triggered arb.reify() (2-22s), which could exceed the bridge's 10s HTTP timeout and fail the first prompt.

How it works

OpenCode's Npm.install() does a name-only check — it compares declared dependency names in package.json against names in package-lock.json. Since the pre-built lockfile already contains @opencode-ai/plugin, the check passes and arb.reify() is never called.
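
To make the name-only check concrete, here is a small Python sketch of the idea. OpenCode's real check lives in its own codebase and uses @npmcli/arborist, so everything below is illustrative: the function names are hypothetical, and which lockfile fields count as "names" (the root entry's `dependencies`, the `node_modules/...` keys under `packages`, and the older top-level `dependencies` map) is an assumption. The version `0.0.0` is a placeholder.

```python
from typing import Optional


def lockfile_names(lock: dict) -> set:
    """Collect package names recorded in a package-lock.json document."""
    packages = lock.get("packages", {})
    names = set(packages.get("", {}).get("dependencies", {}))  # root entry
    for key in packages:
        if key.startswith("node_modules/"):
            # "node_modules/@scope/name" -> "@scope/name"
            names.add(key.rsplit("node_modules/", 1)[1])
    names.update(lock.get("dependencies", {}))  # older lockfile layout
    return names


def needs_reify(pkg: dict, lock: Optional[dict]) -> bool:
    """True when a declared dependency name is missing from the lockfile."""
    if lock is None:
        return True  # no lockfile at all: the slow path described above
    declared = set(pkg.get("dependencies", {}))
    return not declared.issubset(lockfile_names(lock))


pkg = {"dependencies": {"@opencode-ai/plugin": "*"}}
prebuilt_lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"dependencies": {"@opencode-ai/plugin": "*"}},
        "node_modules/@opencode-ai/plugin": {"version": "0.0.0"},  # placeholder
    },
}
print(needs_reify(pkg, None))           # old behavior: no lockfile present
print(needs_reify(pkg, prebuilt_lock))  # pre-built lockfile: names match
```

Because only names are compared, the version range in package.json does not affect the outcome in this sketch.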

Test plan

  • All 235 sandbox-runtime tests pass
  • New tests: cache copy, no-overwrite of existing files (snapshot restores)
  • Ruff lint + format clean
  • Deploy image rebuild and verify first prompt completes without reify delay

Summary by CodeRabbit

  • New Features

    • Added a pre-built plugin dependencies cache to speed initial setup and reduce first-run latency.
  • Bug Fixes

    • Increased HTTP request timeout for external service calls to improve reliability on slower/complex operations.
  • Chores

    • Switched dependency installation to use cached artifacts instead of previous fallback mechanisms.
  • Tests

    • Updated and added tests to validate cache usage, ensure existing local manifests aren’t overwritten, and verify dependency presence.

…meout

When a sandbox boots, OpenCode checks package.json deps against
package-lock.json. If any dep is missing from the lockfile, it calls
arb.reify() (npm install) which takes 2-22s and can exceed the
bridge's 10s HTTP timeout, failing the first prompt.

Fix: bake the plugin deps (package.json, package-lock.json,
node_modules) into the sandbox image at build time, then copy them
into .opencode/ at boot. OpenCode's Npm.install() finds the lockfile
in sync and skips reify() entirely.

Also bump OPENCODE_REQUEST_TIMEOUT from 10s to 30s as a safety net.

coderabbitai Bot commented Apr 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5b11e5ec-39ea-4e35-b42e-9c01a8a8f042

📥 Commits

Reviewing files that changed from the base of the PR and between 977e59a and 29c7e7a.

📒 Files selected for processing (1)
  • packages/modal-infra/src/images/base.py
✅ Files skipped from review due to trivial changes (1)
  • packages/modal-infra/src/images/base.py

📝 Walkthrough

Walkthrough

Adds a prebuilt npm dependency layer at /app/opencode-deps in the container image and updates the runtime entrypoint to copy those artifacts into workdir/.opencode when absent. Also increases OpenCode HTTP request timeout from 10s to 30s and updates tests accordingly.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Container image layer: packages/modal-infra/src/images/base.py | Replaced image cache-buster tag and added a build step that creates /app/opencode-deps, writes a minimal package.json listing @opencode-ai/plugin, and runs npm install to pre-populate plugin dependencies. |
| Entrypoint / runtime copy logic: packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py | Removed symlink/fallback creation logic; added guarded copy of /app/opencode-deps/package.json, package-lock.json, and node_modules into workdir/.opencode/ when those files/dirs do not already exist. |
| Request timeout: packages/sandbox-runtime/src/sandbox_runtime/bridge.py | Increased AgentBridge.OPENCODE_REQUEST_TIMEOUT from 10.0 to 30.0 seconds for OpenCode-related HTTP calls. |
| Tests: packages/sandbox-runtime/tests/test_tool_installation.py | Updated test helper to point to /app/opencode-deps, replaced symlink assertions with cache-copy tests, added preservation test for existing workdir/.opencode/package.json, and added import json for test artifact creation. |

Sequence Diagram(s)

sequenceDiagram
  participant Builder as Builder\n(image build)
  participant ImageFS as Image FS\n(`/app/opencode-deps`)
  participant Entrypoint as Runtime Entrypoint
  participant Workdir as Workdir\n(`workdir/.opencode`)
  participant OpenCode as OpenCode Service

  Builder->>ImageFS: create /app/opencode-deps\nwrite package.json\nnpm install (prebuilt node_modules)
  Entrypoint->>ImageFS: check for /app/opencode-deps artifacts
  ImageFS-->>Entrypoint: artifacts exist
  Entrypoint->>Workdir: copy package.json, package-lock.json, node_modules\n(if missing)
  Entrypoint-->>Workdir: workdir now has prebuilt deps
  Entrypoint->>OpenCode: start session / send requests\n(timeout = 30s)
  OpenCode-->>Entrypoint: respond

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through layers, packed deps with care,

/app/opencode-deps tucked neatly there.
Copy, don't symlink — a cleaner trail,
Thirty seconds now if responses sail.
🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: pre-building OpenCode plugin dependencies into the image to prevent first-prompt timeout issues. It accurately reflects the core problem being addressed and the solution implemented across all modified files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@github-actions

Terraform Validation Results

Step Status
Format
Init
Validate

Note: Terraform plan was skipped because secrets are not configured. This is expected for external contributors. See docs/GETTING_STARTED.md for setup instructions.

Pushed by: @ColeMurray, Action: pull_request

Contributor

@open-inspect open-inspect Bot left a comment

Summary

PR Title: fix: pre-build OpenCode plugin deps in image to avoid first-prompt timeout (#491)
Author: @ColeMurray
Files changed: 4 files, +83/-29

This change moves the expensive OpenCode plugin dependency install work out of the first prompt path by baking a lockfile-backed dependency set into the image and copying it into .opencode/ at boot, with a longer bridge timeout as a safety net. I reviewed the diff and surrounding runtime/test context and did not find any blocking correctness, security, or maintainability issues.

Critical Issues

None.

Suggestions

  • [Testing] packages/sandbox-runtime/tests/test_tool_installation.py - Consider adding one more restored-snapshot case that preserves existing package-lock.json and/or node_modules, not just package.json, since this code intentionally keeps prior .opencode state and that is the main place stale combinations could resurface.

Nitpicks

None.

Positive Feedback

  • Keeping the slow npm/arborist work in the image build rather than the request path is a clean fix for the reported timeout.
  • The boot-time copy logic is minimal and deliberately avoids overwriting restored .opencode state.
  • The new tests cover the main happy path and the important no-overwrite behavior.

Questions

None.

Verdict

Approve


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/modal-infra/src/images/base.py (1)

35-37: ⚠️ Potential issue | 🟠 Major

Bump CACHE_BUSTER to force image rebuild.

This PR adds a new run_commands layer at lines 116-127 that pre-builds /app/opencode-deps, but CACHE_BUSTER still reads "v45-ttyd". Per the repo's coding guideline for this file, this constant must be updated whenever image content changes so Modal invalidates the cached layers and the new deps cache actually makes it into the deployed image. Without this bump, the "Pending: rebuild of the deployed image to verify first prompt" step in the PR plan won't produce a meaningfully new image on Modal hosts that already have v45-ttyd cached.

Proposed bump
-# Cache buster - change this to force Modal image rebuild
-# v45: add ttyd web terminal
-CACHE_BUSTER = "v45-ttyd"
+# Cache buster - change this to force Modal image rebuild
+# v46: pre-build `@opencode-ai/plugin` deps cache at /app/opencode-deps
+CACHE_BUSTER = "v46-opencode-deps"

As per coding guidelines: "Update CACHE_BUSTER constant in packages/modal-infra/src/images/base.py to force a Modal image rebuild".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/modal-infra/src/images/base.py` around lines 35 - 37, CACHE_BUSTER
is not updated so Modal may reuse a cached image and the new run_commands layer
(pre-building /app/opencode-deps) won't be included; update the CACHE_BUSTER
constant value (the string assigned to CACHE_BUSTER) in the same file so Modal
invalidates cached layers and rebuilds the image, ensuring the new run_commands
layer at function/section handling run_commands (lines where /app/opencode-deps
is prebuilt) is applied in the deployed image.
packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py (1)

284-312: ⚠️ Potential issue | 🟡 Minor

Deps cache copy is gated behind has_tools; decouple it.

Lines 284-286 return early when neither the legacy tool nor tools_dir exists, which means the new pre-built deps cache at lines 299-312 is never copied in that scenario. Today the image always ships /app/sandbox_runtime/tools/, so it works in practice, but the coupling is fragile — the whole point of this PR is to guarantee the pre-built lockfile is in place before OpenCode starts, and that guarantee shouldn't depend on an unrelated tools directory existing.

Additionally, opencode_dir currently only gets created via tool_dest.mkdir(parents=True, exist_ok=True) on line 288, so if you hoist the deps copy above the early return you'll need to ensure the directory exists first.

Suggested refactor
     def _install_tools(self, workdir: Path) -> None:
         """Copy custom tools into the .opencode/tool directory for OpenCode to discover."""
         opencode_dir = workdir / ".opencode"
         tool_dest = opencode_dir / "tool"
 
-        # Legacy tool (inspect-plugin.js → create-pull-request.js)
         legacy_tool = Path("/app/sandbox_runtime/plugins/inspect-plugin.js")
-        # New tools directory
         tools_dir = Path("/app/sandbox_runtime/tools")
+        deps_cache = Path("/app/opencode-deps")
 
-        has_tools = legacy_tool.exists() or tools_dir.exists()
-        if not has_tools:
+        has_tools = legacy_tool.exists() or tools_dir.exists()
+        has_deps = deps_cache.exists()
+        if not has_tools and not has_deps:
             return
 
-        tool_dest.mkdir(parents=True, exist_ok=True)
+        opencode_dir.mkdir(parents=True, exist_ok=True)
+        if has_tools:
+            tool_dest.mkdir(parents=True, exist_ok=True)
 
         if legacy_tool.exists():
             shutil.copy(legacy_tool, tool_dest / "create-pull-request.js")
 
         # Copy all .js files from tools/ — these must export tool() for OpenCode
         if tools_dir.exists():
             for tool_file in tools_dir.iterdir():
                 if tool_file.is_file() and tool_file.suffix == ".js":
                     shutil.copy(tool_file, tool_dest / tool_file.name)
 
         # Copy pre-built deps (package.json, package-lock.json, node_modules)
-        # from the image staging directory.  ...
-        deps_cache = Path("/app/opencode-deps")
         for name in ("package.json", "package-lock.json"):
             src = deps_cache / name
             dest = opencode_dir / name
             if src.exists() and not dest.exists():
                 shutil.copy2(src, dest)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py` around lines 284
- 312, The deps copy is incorrectly gated by the has_tools early return; move
the pre-built deps copy block (the code that uses deps_cache, cached_modules,
and local_modules) out from under the has_tools check so it runs regardless of
legacy_tool or tools_dir, and ensure opencode_dir exists before copying by
calling opencode_dir.mkdir(parents=True, exist_ok=True) (or similar) prior to
copying; keep the existing logic that still creates tool_dest
(tool_dest.mkdir(...)) and copies legacy_tool/tools files only when has_tools is
true.
🧹 Nitpick comments (3)
packages/sandbox-runtime/src/sandbox_runtime/bridge.py (1)

137-137: Timeout name lacks the _SECONDS suffix required by the style guide.

Per the repo's timeout-naming rule, Python timeout constants must encode the unit in the identifier (timeout_seconds / TIMEOUT_SECONDS). OPENCODE_REQUEST_TIMEOUT is ambiguous and inconsistent with its siblings in the class (e.g. SSE_INACTIVITY_TIMEOUT, HTTP_DEFAULT_TIMEOUT have the same issue). Since you're touching this line, a drive-by rename to OPENCODE_REQUEST_TIMEOUT_SECONDS (and updating the five self.OPENCODE_REQUEST_TIMEOUT references) would bring it into compliance; the rest of the constants can follow in a separate sweep.

Separately: bumping the per-request ceiling to 30s is appropriate as a safety net, but it also means /session creation and /abort can now block a caller for up to 30s. Confirm the control plane's end-to-end timeouts on stop/prompt command handling still leave headroom above this.

As per coding guidelines: "Use seconds for Python timeouts and milliseconds for TypeScript timeouts, encoding the unit in variable names (Python: timeout_seconds, TypeScript: timeoutMs, TIMEOUT_MS)".
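
A rough sketch of the two points in this comment, the unit-encoded name and the headroom question. The class below is a trimmed stand-in rather than the real `AgentBridge`, and `has_headroom` is a hypothetical helper; the 30.0 value and the 2-22s reify range come from the PR, while any outer (control-plane) timeout passed in is the caller's to supply.

```python
class AgentBridge:
    """Trimmed stand-in for the real class; only the timeout constant is shown."""

    # Unit encoded in the identifier, per the quoted guideline; value from this PR.
    OPENCODE_REQUEST_TIMEOUT_SECONDS = 30.0


# Upper end of the 2-22s reify range reported in the PR description.
WORST_CASE_REIFY_SECONDS = 22.0


def has_headroom(outer_timeout_seconds: float) -> bool:
    """Hypothetical check: an outer timeout should exceed the bridge's ceiling."""
    return outer_timeout_seconds > AgentBridge.OPENCODE_REQUEST_TIMEOUT_SECONDS


# The old 10s ceiling sat below the worst-case reify; the new ceiling sits above it.
assert 10.0 < WORST_CASE_REIFY_SECONDS < AgentBridge.OPENCODE_REQUEST_TIMEOUT_SECONDS
```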

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/sandbox-runtime/src/sandbox_runtime/bridge.py` at line 137, Rename
the constant OPENCODE_REQUEST_TIMEOUT to OPENCODE_REQUEST_TIMEOUT_SECONDS and
update all uses of self.OPENCODE_REQUEST_TIMEOUT to
self.OPENCODE_REQUEST_TIMEOUT_SECONDS (there are five references) so the timeout
unit is encoded in the identifier; ensure the constant value remains 30.0 and
run tests/linters to catch any missed references.
packages/modal-infra/src/images/base.py (1)

116-127: Pin @opencode-ai/plugin to match what OpenCode will request at runtime.

Using "@opencode-ai/plugin":"*" during image build means the lockfile snapshots whatever version was latest at build time. If the globally-installed opencode-ai at line 110 (also @latest) later resolves a plugin entry whose name-only comparison logic considers the lockfile stale (e.g. if OpenCode introduces a second transitive requirement or verifies semver), you'll silently fall back to the slow arb.reify() path this PR is trying to eliminate. Consider pinning both opencode-ai and @opencode-ai/plugin to explicit versions (or deriving the plugin version from the installed opencode CLI) so the cache and runtime stay in lockstep.

Also worth considering: run npm cache clean --force after the install (similar to line 182) to avoid baking ~/.npm into this layer.
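
One way to picture the pinning suggestion is a small helper that generates the shell commands for the staging step. This is a sketch under assumptions, not the PR's build code: `opencode_deps_commands` is a hypothetical name, the version string is whatever the caller decides to pin, and how the commands are fed into the image build is left out.

```python
import json
import shlex


def opencode_deps_commands(plugin_version: str, deps_dir: str = "/app/opencode-deps") -> list:
    """Build shell commands that stage a pinned plugin install (hypothetical helper)."""
    # Exact version instead of "*", so the lockfile does not drift between builds.
    manifest = json.dumps({"dependencies": {"@opencode-ai/plugin": plugin_version}})
    quoted_dir = shlex.quote(deps_dir)
    return [
        f"mkdir -p {quoted_dir}",
        f"printf '%s' {shlex.quote(manifest)} > {quoted_dir}/package.json",
        f"cd {quoted_dir} && npm install",
        # Avoid baking the npm download cache into the image layer.
        "npm cache clean --force",
    ]


# "1.2.3" is a placeholder version, not a real release recommendation.
for command in opencode_deps_commands("1.2.3"):
    print(command)
```

Keeping the version as a single argument makes it easier to pass the same value to whatever installs the global CLI, which is the lockstep this comment asks for.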

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/modal-infra/src/images/base.py` around lines 116 - 127, Pin the
build-time dependency versions so the prebuilt lockfile matches runtime OpenCode
resolution: replace the wildcard dependency in the staging package.json creation
(the run_commands block that writes /app/opencode-deps/package.json and
installs) to use the same explicit version you install for the global opencode
CLI (the opencode-ai version installed earlier), i.e., set "@opencode-ai/plugin"
to that exact version (or derive the plugin version from the installed opencode
CLI) and likewise pin "opencode-tools" if needed; after the npm install command
in the same run_commands sequence, add an "npm cache clean --force" invocation
to avoid baking ~/.npm into the image layer.
packages/sandbox-runtime/tests/test_tool_installation.py (1)

177-204: Widen no-overwrite assertions to cover package-lock.json and node_modules.

This test only proves package.json is preserved. The production code also guards package-lock.json and node_modules with not dest.exists() checks — regressions there (e.g. someone removing the guards to "refresh" the lockfile) would silently clobber a snapshot's working state without failing a test. A couple extra asserts would lock in the contract:

Suggested additions
         existing_pkg = workdir / ".opencode" / "package.json"
         existing_pkg.write_text('{"name": "existing"}')
+        existing_lock = workdir / ".opencode" / "package-lock.json"
+        existing_lock.write_text('{"lockfileVersion": 2, "existing": true}')
+        existing_nm = workdir / ".opencode" / "node_modules"
+        existing_nm.mkdir()
+        (existing_nm / "marker").write_text("existing")

         with _patch_paths(legacy=legacy_tool, tools=tmp_path / "no-tools", deps_cache=deps_cache):
             sup._install_tools(workdir)

-        # Existing package.json should be preserved, not overwritten by cache
         assert existing_pkg.read_text() == '{"name": "existing"}'
+        assert '"existing": true' in existing_lock.read_text()
+        assert (existing_nm / "marker").read_text() == "existing"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/sandbox-runtime/tests/test_tool_installation.py` around lines 177 -
204, The test test_does_not_overwrite_existing_files only asserts package.json
is preserved; expand it to also assert package-lock.json and node_modules in the
.opencode directory are not clobbered by the cache. After creating existing_pkg,
create references like existing_lock = workdir / ".opencode" /
"package-lock.json" and existing_node_modules = workdir / ".opencode" /
"node_modules" (or use the existing variables if present), write a sentinel to
existing_lock and create a directory or marker file under existing_node_modules,
then after calling sup._install_tools(workdir) assert existing_lock.read_text()
still equals the sentinel and existing_node_modules.exists() (and/or contains
the marker) so package-lock.json and node_modules are preserved just like
package.json.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99c1ff21-81d7-4f15-9cac-7ff5465471b0

📥 Commits

Reviewing files that changed from the base of the PR and between d663fda and 977e59a.

📒 Files selected for processing (4)
  • packages/modal-infra/src/images/base.py
  • packages/sandbox-runtime/src/sandbox_runtime/bridge.py
  • packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py
  • packages/sandbox-runtime/tests/test_tool_installation.py

@github-actions

Terraform Validation Results

Step Status
Format
Init
Validate

Note: Terraform plan was skipped because secrets are not configured. This is expected for external contributors. See docs/GETTING_STARTED.md for setup instructions.

Pushed by: @ColeMurray, Action: pull_request

@ColeMurray ColeMurray merged commit dc5130b into main Apr 17, 2026
18 checks passed
@ColeMurray ColeMurray deleted the worktree-fix+sandbox-boot-npm-timeout branch April 17, 2026 05:20