fix: pre-build OpenCode plugin deps in image to avoid first-prompt timeout#491
Conversation
…meout When a sandbox boots, OpenCode checks package.json deps against package-lock.json. If any dep is missing from the lockfile, it calls arb.reify() (npm install) which takes 2-22s and can exceed the bridge's 10s HTTP timeout, failing the first prompt. Fix: bake the plugin deps (package.json, package-lock.json, node_modules) into the sandbox image at build time, then copy them into .opencode/ at boot. OpenCode's Npm.install() finds the lockfile in sync and skips reify() entirely. Also bump OPENCODE_REQUEST_TIMEOUT from 10s to 30s as a safety net.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds a prebuilt npm dependency layer at Changes
Sequence Diagram(s)sequenceDiagram
participant Builder as Builder\n(image build)
participant ImageFS as Image FS\n(`/app/opencode-deps`)
participant Entrypoint as Runtime Entrypoint
participant Workdir as Workdir\n(`workdir/.opencode`)
participant OpenCode as OpenCode Service
Builder->>ImageFS: create /app/opencode-deps\nwrite package.json\nnpm install (prebuilt node_modules)
Entrypoint->>ImageFS: check for /app/opencode-deps artifacts
ImageFS-->>Entrypoint: artifacts exist
Entrypoint->>Workdir: copy package.json, package-lock.json, node_modules\n(if missing)
Entrypoint-->>Workdir: workdir now has prebuilt deps
Entrypoint->>OpenCode: start session / send requests\n(timeout = 30s)
OpenCode-->>Entrypoint: respond
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
There was a problem hiding this comment.
Summary
PR Title: fix: pre-build OpenCode plugin deps in image to avoid first-prompt timeout (#491)
Author: @ColeMurray
Files changed: 4 files, +83/-29
This change moves the expensive OpenCode plugin dependency install work out of the first prompt path by baking a lockfile-backed dependency set into the image and copying it into .opencode/ at boot, with a longer bridge timeout as a safety net. I reviewed the diff and surrounding runtime/test context and did not find any blocking correctness, security, or maintainability issues.
Critical Issues
None.
Suggestions
- [Testing]
packages/sandbox-runtime/tests/test_tool_installation.py- Consider adding one more restored-snapshot case that preserves existingpackage-lock.jsonand/ornode_modules, not justpackage.json, since this code intentionally keeps prior.opencodestate and that is the main place stale combinations could resurface.
Nitpicks
None.
Positive Feedback
- Keeping the slow npm/arborist work in the image build rather than the request path is a clean fix for the reported timeout.
- The boot-time copy logic is minimal and deliberately avoids overwriting restored
.opencodestate. - The new tests cover the main happy path and the important no-overwrite behavior.
Questions
None.
Verdict
Approve
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/modal-infra/src/images/base.py (1)
35-37:⚠️ Potential issue | 🟠 MajorBump
CACHE_BUSTERto force image rebuild.This PR adds a new
run_commandslayer at lines 116-127 that pre-builds/app/opencode-deps, butCACHE_BUSTERstill reads"v45-ttyd". Per the repo's coding guideline for this file, this constant must be updated whenever image content changes so Modal invalidates the cached layers and the new deps cache actually makes it into the deployed image. Without this bump, the "Pending: rebuild of the deployed image to verify first prompt" step in the PR plan won't produce a meaningfully new image on Modal hosts that already havev45-ttydcached.Proposed bump
-# Cache buster - change this to force Modal image rebuild -# v45: add ttyd web terminal -CACHE_BUSTER = "v45-ttyd" +# Cache buster - change this to force Modal image rebuild +# v46: pre-build `@opencode-ai/plugin` deps cache at /app/opencode-deps +CACHE_BUSTER = "v46-opencode-deps"As per coding guidelines: "Update
CACHE_BUSTERconstant inpackages/modal-infra/src/images/base.pyto force a Modal image rebuild".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/modal-infra/src/images/base.py` around lines 35 - 37, CACHE_BUSTER is not updated so Modal may reuse a cached image and the new run_commands layer (pre-building /app/opencode-deps) won't be included; update the CACHE_BUSTER constant value (the string assigned to CACHE_BUSTER) in the same file so Modal invalidates cached layers and rebuilds the image, ensuring the new run_commands layer at function/section handling run_commands (lines where /app/opencode-deps is prebuilt) is applied in the deployed image.packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py (1)
284-312:⚠️ Potential issue | 🟡 MinorDeps cache copy is gated behind
has_tools; decouple it.Lines 284-286 return early when neither the legacy tool nor
tools_direxists, which means the new pre-built deps cache at lines 299-312 is never copied in that scenario. Today the image always ships/app/sandbox_runtime/tools/, so it works in practice, but the coupling is fragile — the whole point of this PR is to guarantee the pre-built lockfile is in place before OpenCode starts, and that guarantee shouldn't depend on an unrelated tools directory existing.Additionally,
opencode_dircurrently only gets created viatool_dest.mkdir(parents=True, exist_ok=True)on line 288, so if you hoist the deps copy above the early return you'll need to ensure the directory exists first.Suggested refactor
def _install_tools(self, workdir: Path) -> None: """Copy custom tools into the .opencode/tool directory for OpenCode to discover.""" opencode_dir = workdir / ".opencode" tool_dest = opencode_dir / "tool" - # Legacy tool (inspect-plugin.js → create-pull-request.js) legacy_tool = Path("/app/sandbox_runtime/plugins/inspect-plugin.js") - # New tools directory tools_dir = Path("/app/sandbox_runtime/tools") + deps_cache = Path("/app/opencode-deps") - has_tools = legacy_tool.exists() or tools_dir.exists() - if not has_tools: + has_tools = legacy_tool.exists() or tools_dir.exists() + has_deps = deps_cache.exists() + if not has_tools and not has_deps: return - tool_dest.mkdir(parents=True, exist_ok=True) + opencode_dir.mkdir(parents=True, exist_ok=True) + if has_tools: + tool_dest.mkdir(parents=True, exist_ok=True) if legacy_tool.exists(): shutil.copy(legacy_tool, tool_dest / "create-pull-request.js") # Copy all .js files from tools/ — these must export tool() for OpenCode if tools_dir.exists(): for tool_file in tools_dir.iterdir(): if tool_file.is_file() and tool_file.suffix == ".js": shutil.copy(tool_file, tool_dest / tool_file.name) # Copy pre-built deps (package.json, package-lock.json, node_modules) - # from the image staging directory. ... - deps_cache = Path("/app/opencode-deps") for name in ("package.json", "package-lock.json"): src = deps_cache / name dest = opencode_dir / name if src.exists() and not dest.exists(): shutil.copy2(src, dest)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py` around lines 284 - 312, The deps copy is incorrectly gated by the has_tools early return; move the pre-built deps copy block (the code that uses deps_cache, cached_modules, and local_modules) out from under the has_tools check so it runs regardless of legacy_tool or tools_dir, and ensure opencode_dir exists before copying by calling opencode_dir.mkdir(parents=True, exist_ok=True) (or similar) prior to copying; keep the existing logic that still creates tool_dest (tool_dest.mkdir(...)) and copies legacy_tool/tools files only when has_tools is true.
🧹 Nitpick comments (3)
packages/sandbox-runtime/src/sandbox_runtime/bridge.py (1)
137-137: Timeout name lacks the_SECONDSsuffix required by the style guide.Per the repo's timeout-naming rule, Python timeout constants must encode the unit in the identifier (
timeout_seconds/TIMEOUT_SECONDS).OPENCODE_REQUEST_TIMEOUTis ambiguous and inconsistent with its siblings in the class (e.g.SSE_INACTIVITY_TIMEOUT,HTTP_DEFAULT_TIMEOUThave the same issue). Since you're touching this line, a drive-by rename toOPENCODE_REQUEST_TIMEOUT_SECONDS(and updating the fiveself.OPENCODE_REQUEST_TIMEOUTreferences) would bring it into compliance; the rest of the constants can follow in a separate sweep.Separately: bumping the per-request ceiling to 30s is appropriate as a safety net, but it also means
/sessioncreation and/abortcan now block a caller for up to 30s. Confirm the control plane's end-to-end timeouts onstop/promptcommand handling still leave headroom above this.As per coding guidelines: "Use seconds for Python timeouts and milliseconds for TypeScript timeouts, encoding the unit in variable names (Python:
timeout_seconds, TypeScript:timeoutMs,TIMEOUT_MS)".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sandbox-runtime/src/sandbox_runtime/bridge.py` at line 137, Rename the constant OPENCODE_REQUEST_TIMEOUT to OPENCODE_REQUEST_TIMEOUT_SECONDS and update all uses of self.OPENCODE_REQUEST_TIMEOUT to self.OPENCODE_REQUEST_TIMEOUT_SECONDS (there are five references) so the timeout unit is encoded in the identifier; ensure the constant value remains 30.0 and run tests/linters to catch any missed references.packages/modal-infra/src/images/base.py (1)
116-127: Pin@opencode-ai/pluginto match what OpenCode will request at runtime.Using
"@opencode-ai/plugin":"*"during image build means the lockfile snapshots whatever version was latest at build time. If the globally-installedopencode-aiat line 110 (also@latest) later resolves a plugin entry whose name-only comparison logic considers the lockfile stale (e.g. if OpenCode introduces a second transitive requirement or verifies semver), you'll silently fall back to the slowarb.reify()path this PR is trying to eliminate. Consider pinning bothopencode-aiand@opencode-ai/pluginto explicit versions (or deriving the plugin version from the installed opencode CLI) so the cache and runtime stay in lockstep.Also worth considering: run
npm cache clean --forceafter the install (similar to line 182) to avoid baking~/.npminto this layer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/modal-infra/src/images/base.py` around lines 116 - 127, Pin the build-time dependency versions so the prebuilt lockfile matches runtime OpenCode resolution: replace the wildcard dependency in the staging package.json creation (the run_commands block that writes /app/opencode-deps/package.json and installs) to use the same explicit version you install for the global opencode CLI (the opencode-ai version installed earlier), i.e., set "@opencode-ai/plugin" to that exact version (or derive the plugin version from the installed opencode CLI) and likewise pin "opencode-tools" if needed; after the npm install command in the same run_commands sequence, add an "npm cache clean --force" invocation to avoid baking ~/.npm into the image layer.packages/sandbox-runtime/tests/test_tool_installation.py (1)
177-204: Widen no-overwrite assertions to coverpackage-lock.jsonandnode_modules.This test only proves
package.jsonis preserved. The production code also guardspackage-lock.jsonandnode_moduleswithnot dest.exists()checks — regressions there (e.g. someone removing the guards to "refresh" the lockfile) would silently clobber a snapshot's working state without failing a test. A couple extra asserts would lock in the contract:Suggested additions
existing_pkg = workdir / ".opencode" / "package.json" existing_pkg.write_text('{"name": "existing"}') + existing_lock = workdir / ".opencode" / "package-lock.json" + existing_lock.write_text('{"lockfileVersion": 2, "existing": true}') + existing_nm = workdir / ".opencode" / "node_modules" + existing_nm.mkdir() + (existing_nm / "marker").write_text("existing") with _patch_paths(legacy=legacy_tool, tools=tmp_path / "no-tools", deps_cache=deps_cache): sup._install_tools(workdir) - # Existing package.json should be preserved, not overwritten by cache assert existing_pkg.read_text() == '{"name": "existing"}' + assert '"existing": true' in existing_lock.read_text() + assert (existing_nm / "marker").read_text() == "existing"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sandbox-runtime/tests/test_tool_installation.py` around lines 177 - 204, The test test_does_not_overwrite_existing_files only asserts package.json is preserved; expand it to also assert package-lock.json and node_modules in the .opencode directory are not clobbered by the cache. After creating existing_pkg, create references like existing_lock = workdir / ".opencode" / "package-lock.json" and existing_node_modules = workdir / ".opencode" / "node_modules" (or use the existing variables if present), write a sentinel to existing_lock and create a directory or marker file under existing_node_modules, then after calling sup._install_tools(workdir) assert existing_lock.read_text() still equals the sentinel and existing_node_modules.exists() (and/or contains the marker) so package-lock.json and node_modules are preserved just like package.json.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/modal-infra/src/images/base.py`:
- Around line 35-37: CACHE_BUSTER is not updated so Modal may reuse a cached
image and the new run_commands layer (pre-building /app/opencode-deps) won't be
included; update the CACHE_BUSTER constant value (the string assigned to
CACHE_BUSTER) in the same file so Modal invalidates cached layers and rebuilds
the image, ensuring the new run_commands layer at function/section handling
run_commands (lines where /app/opencode-deps is prebuilt) is applied in the
deployed image.
In `@packages/sandbox-runtime/src/sandbox_runtime/entrypoint.py`:
- Around line 284-312: The deps copy is incorrectly gated by the has_tools early
return; move the pre-built deps copy block (the code that uses deps_cache,
cached_modules, and local_modules) out from under the has_tools check so it runs
regardless of legacy_tool or tools_dir, and ensure opencode_dir exists before
copying by calling opencode_dir.mkdir(parents=True, exist_ok=True) (or similar)
prior to copying; keep the existing logic that still creates tool_dest
(tool_dest.mkdir(...)) and copies legacy_tool/tools files only when has_tools is
true.
---
Nitpick comments:
In `@packages/modal-infra/src/images/base.py`:
- Around line 116-127: Pin the build-time dependency versions so the prebuilt
lockfile matches runtime OpenCode resolution: replace the wildcard dependency in
the staging package.json creation (the run_commands block that writes
/app/opencode-deps/package.json and installs) to use the same explicit version
you install for the global opencode CLI (the opencode-ai version installed
earlier), i.e., set "@opencode-ai/plugin" to that exact version (or derive the
plugin version from the installed opencode CLI) and likewise pin
"opencode-tools" if needed; after the npm install command in the same
run_commands sequence, add an "npm cache clean --force" invocation to avoid
baking ~/.npm into the image layer.
In `@packages/sandbox-runtime/src/sandbox_runtime/bridge.py`:
- Line 137: Rename the constant OPENCODE_REQUEST_TIMEOUT to
OPENCODE_REQUEST_TIMEOUT_SECONDS and update all uses of
self.OPENCODE_REQUEST_TIMEOUT to self.OPENCODE_REQUEST_TIMEOUT_SECONDS (there
are five references) so the timeout unit is encoded in the identifier; ensure
the constant value remains 30.0 and run tests/linters to catch any missed
references.
In `@packages/sandbox-runtime/tests/test_tool_installation.py`:
- Around line 177-204: The test test_does_not_overwrite_existing_files only
asserts package.json is preserved; expand it to also assert package-lock.json
and node_modules in the .opencode directory are not clobbered by the cache.
After creating existing_pkg, create references like existing_lock = workdir /
".opencode" / "package-lock.json" and existing_node_modules = workdir /
".opencode" / "node_modules" (or use the existing variables if present), write a
sentinel to existing_lock and create a directory or marker file under
existing_node_modules, then after calling sup._install_tools(workdir) assert
existing_lock.read_text() still equals the sentinel and
existing_node_modules.exists() (and/or contains the marker) so package-lock.json
and node_modules are preserved just like package.json.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 99c1ff21-81d7-4f15-9cac-7ff5465471b0
📒 Files selected for processing (4)
packages/modal-infra/src/images/base.pypackages/sandbox-runtime/src/sandbox_runtime/bridge.pypackages/sandbox-runtime/src/sandbox_runtime/entrypoint.pypackages/sandbox-runtime/tests/test_tool_installation.py
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
Summary
@opencode-ai/plugindeps (package.json,package-lock.json,node_modules) into the sandbox image at/app/opencode-deps/.opencode/so OpenCode'sNpm.install()finds the lockfile in sync and skipsarb.reify()entirelyOPENCODE_REQUEST_TIMEOUTfrom 10s to 30s as a safety netProblem
When a sandbox boots, OpenCode checks
package.jsondeps againstpackage-lock.jsonusing@npmcli/arborist. The old entrypoint wrote a minimalpackage.jsonwith no dependencies and symlinkednode_modulesto the global install. OpenCode then added@opencode-ai/plugintopackage.json, found no lockfile, and triggeredarb.reify()(2-22s) — blocking the first prompt inside the bridge's 10s HTTP timeout.How it works
OpenCode's
Npm.install()does a name-only check — it compares declared dependency names inpackage.jsonagainst names inpackage-lock.json. Since the pre-built lockfile already contains@opencode-ai/plugin, the check passes andarb.reify()is never called.Test plan
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests