Skip to content

feat: auto-download Chrome via fetcher, unify Docker image, fix singleton lock#268

Merged
jamiepine merged 6 commits intomainfrom
feat/browser-chrome-fetcher
Mar 1, 2026
Merged

feat: auto-download Chrome via fetcher, unify Docker image, fix singleton lock#268
jamiepine merged 6 commits intomainfrom
feat/browser-chrome-fetcher

Conversation

@jamiepine
Copy link
Member

Summary

  • Chrome auto-download: Enable chromiumoxide's _fetcher-rustls-tokio feature so Chrome is downloaded on first browser tool use and cached in {data_dir}/chrome_cache/ (same pattern as fastembed's embedding_cache/). Works on bare binary (macOS/Linux), Docker, and hosted instances.
  • Unified Docker image: Collapse slim/full into a single stage. Drop apt-get install chromium (~200MB), keep only the runtime shared libraries (~30MB). Image shrinks from 369MB to ~167MB compressed.
  • Fix singleton lock crashes: Each browser launch gets a unique temp dir instead of sharing /tmp/chromiumoxide-runner/. Adds Drop impl on BrowserState to clean up on worker exit. Fixes the "Browser process exited with status ExitStatus(unix_wait_status(5376))" error users were hitting.

Chrome Detection Chain

  1. Explicit executable_path from config (user override)
  2. CHROME / CHROME_PATH environment variables
  3. chromiumoxide default detection (system PATH + well-known paths)
  4. Auto-download via BrowserFetcher (cached, ~150MB first download)

Changes

File Change
Cargo.toml Add _fetcher-rustls-tokio feature
src/tools/browser.rs Rewrite launch with detection chain, unique user data dir, Drop impl
src/config.rs Add chrome_cache_dir to BrowserConfig
src/update.rs Remove ImageVariant, simplify resolve_target_image
Dockerfile Single stage, drop chromium pkg, keep runtime libs
.github/workflows/release.yml Single image build (no slim/full)
fly.toml, fly.staging.toml Remove target = "full"

Migration

  • Hosted instances: Chrome downloads on first browser tool use (~10-30s). Cached on persistent volume.
  • Self-hosted Docker: Users on :full or :slim tags should switch to :latest or versioned tags. Legacy variant suffixes are stripped by resolve_target_image.
  • Bare binary: Browser tool now works out of the box on systems with Chrome runtime libs.

Verification

  • cargo check — clean
  • cargo fmt --check — clean
  • cargo clippy --all-targets — clean
  • cargo test --lib — 274 passed

…eton lock

Replace the slim/full Docker image split with a single unified image.
Chrome is now auto-downloaded on first browser tool use via
chromiumoxide's fetcher and cached in {data_dir}/chrome_cache/,
matching the pattern used by fastembed for embedding models.

- Enable _fetcher-rustls-tokio feature on chromiumoxide
- Add 4-step Chrome detection: config > env vars > system PATH > fetcher
- Use unique temp dir per browser launch (prevents singleton lock crashes)
- Add Drop impl on BrowserState for cleanup on worker exit
- Collapse Dockerfile slim/full into single stage with runtime libs only
- Simplify release.yml to build one image per arch (no slim/full matrix)
- Remove ImageVariant enum and all variant branching from update.rs
- Read CHROME/CHROME_PATH env vars as fallback detection
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Consolidates the Dockerfile into a single runtime stage; adds chrome_cache_dir to BrowserConfig; implements layered Chrome executable resolution with optional auto-download and per-launch user-data-dir lifecycle management; and removes image-variant tagging to produce plain vX.Y.Z image tags.

Changes

Cohort / File(s) Summary
Dockerfile Runtime Consolidation
Dockerfile
Collapse multi-stage build into one runtime stage, broaden runtime packages (OpenSSH, fonts, browser libs), move binary/entrypoint/healthcheck/ports/envs into final image, and remove CHROME_PATH/CHROME_FLAGS and slim/full stage split.
Browser Configuration
src/config.rs
Add public chrome_cache_dir: PathBuf to BrowserConfig (default chrome_cache); compute per-instance chrome_cache_dir via instance_dir.join("chrome_cache") and propagate through TOML/default/per-agent resolution.
Browser Tool Enhancement
src/tools/browser.rs
Introduce layered Chrome executable resolution (config → env vars → detection → optional fetch) with helpers (resolve_chrome_executable, detect_chrome_from_env, fetch_chrome); add per-launch user_data_dir to BrowserState, create/remove temp user-data dirs, implement Drop cleanup, and adjust launch/close flows and logging.
Image Variant Removal
src/update.rs
Remove ImageVariant enum and variant-specific logic; simplify resolve_target_image(current_image, new_version) to always emit plain vX.Y.Z tags; update callers and tests accordingly.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • spacedriveapp/spacebot PR 207: Modifies image-tagging and variant-aware logic in src/update.rs, directly related to the removal/simplification of variant handling here.
  • spacedriveapp/spacebot PR 143: Also changes resolve_target_image behavior and tag parsing in src/update.rs, overlapping with the tag normalization in this PR.
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the three main changes: auto-download Chrome via fetcher, unify Docker image, and fix singleton lock.
Description check ✅ Passed The description comprehensively covers all aspects of the changeset, including Chrome auto-download, unified Docker image, singleton lock fixes, and migration guidance.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/browser-chrome-fetcher

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
Dockerfile (1)

57-57: Remove unused openssh-server package from runtime stage.

The runtime exposes only ports 19898 (API/UI) and 18789 (webhook). No SSH service is started or referenced anywhere in the codebase, making this package unnecessary attack surface and patch overhead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` at line 57, Remove the unused openssh-server package from the
runtime image's package list: locate the installs in the Dockerfile runtime
stage where "openssh-server" appears and delete that token from the apt install
(or equivalent) line(s), then rebuild the image to verify no runtime references
to openssh-server remain (the package name "openssh-server" identifies the exact
item to remove); ensure exposed ports 19898 and 18789 and any startup scripts
remain unchanged.
src/config.rs (1)

4334-4358: Extract browser resolution into a dedicated BrowserConfig::resolve(...) path.

Lines 4334-4358 inline browser resolution logic inside Config::from_toml. Moving this into a per-subsystem resolver would reduce drift with other load paths (like Config::load_from_env) and keep precedence behavior centralized.

As per coding guidelines, src/config.rs: "Use config resolution order: env > DB > default with per-subsystem resolve() methods".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` around lines 4334 - 4358, Extract the inline browser-merging
logic into a new BrowserConfig::resolve(&self, base: &BrowserConfig,
instance_dir: &Path) -> BrowserConfig method and replace the block in
Config::from_toml with a call to that resolver; the resolve method should
encapsulate precedence (env/DB/default), build chrome_cache_dir from
instance_dir, merge fields using unwrap_or/or_else like the current logic (use
base_defaults.browser as the base), and ensure other callers (e.g.,
Config::load_from_env or any DB-load path) also call BrowserConfig::resolve to
centralize resolution behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config.rs`:
- Line 740: BrowserConfig::default() sets chrome_cache_dir to a relative
PathBuf::from("chrome_cache"), causing env-only loading paths to differ from
Config::from_toml which resolves to instance_dir/chrome_cache; update
BrowserConfig::default() to construct chrome_cache_dir relative to
DefaultsConfig::instance_dir (or return a path that will be resolved by
Config::load_from_env) so both Config::from_toml and Config::load_from_env
produce the same instance_dir/chrome_cache location; specifically change the
default for chrome_cache_dir in BrowserConfig::default() (the
PathBuf::from("chrome_cache") entry) so it is either a joined path with the
instance dir or a sentinel relative path that Config::load_from_env resolves the
same way Config::from_toml does.

---

Nitpick comments:
In `@Dockerfile`:
- Line 57: Remove the unused openssh-server package from the runtime image's
package list: locate the installs in the Dockerfile runtime stage where
"openssh-server" appears and delete that token from the apt install (or
equivalent) line(s), then rebuild the image to verify no runtime references to
openssh-server remain (the package name "openssh-server" identifies the exact
item to remove); ensure exposed ports 19898 and 18789 and any startup scripts
remain unchanged.

In `@src/config.rs`:
- Around line 4334-4358: Extract the inline browser-merging logic into a new
BrowserConfig::resolve(&self, base: &BrowserConfig, instance_dir: &Path) ->
BrowserConfig method and replace the block in Config::from_toml with a call to
that resolver; the resolve method should encapsulate precedence
(env/DB/default), build chrome_cache_dir from instance_dir, merge fields using
unwrap_or/or_else like the current logic (use base_defaults.browser as the
base), and ensure other callers (e.g., Config::load_from_env or any DB-load
path) also call BrowserConfig::resolve to centralize resolution behavior.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f8af5c and 68ff582.

⛔ Files ignored due to path filters (5)
  • .github/workflows/release.yml is excluded by !**/*.yml
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
  • Cargo.toml is excluded by !**/*.toml
  • fly.staging.toml is excluded by !**/*.toml
  • fly.toml is excluded by !**/*.toml
📒 Files selected for processing (4)
  • Dockerfile
  • src/config.rs
  • src/tools/browser.rs
  • src/update.rs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/config.rs (1)

740-740: ⚠️ Potential issue | 🟠 Major

chrome_cache_dir remains relative in env-only loads.

Line 740 still defaults to a relative path, and Config::load_from_env keeps DefaultsConfig::default(), so env-only runs can cache under the process CWD instead of {instance_dir}/chrome_cache. This diverges from TOML resolution and can cause cache misses/permission surprises.

🔧 Proposed fix
@@ pub fn load_from_env(instance_dir: &Path) -> Result<Self> {
-        Ok(Self {
+        let mut defaults = DefaultsConfig::default();
+        defaults.browser.chrome_cache_dir = instance_dir.join("chrome_cache");
+
+        Ok(Self {
             instance_dir: instance_dir.to_path_buf(),
             llm,
-            defaults: DefaultsConfig::default(),
+            defaults,
             agents,
             links: Vec::new(),
             groups: Vec::new(),
As per coding guidelines, "Use config resolution order: env > DB > default with per-subsystem `resolve()` methods".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` at line 740, The chrome_cache_dir default is a relative
PathBuf ("chrome_cache") which causes env-only loads (Config::load_from_env) to
cache under the process CWD; update resolution so defaults are anchored to the
instance directory by calling the defaults' resolve method when building
env-only configs — e.g. ensure DefaultsConfig::default() is passed through
DefaultsConfig::resolve(instance_dir) (or call DefaultsConfig::resolve in
Config::load_from_env) so chrome_cache_dir is converted to
{instance_dir}/chrome_cache; touch the chrome_cache_dir field and any related
resolution logic in DefaultsConfig/Config to perform this conversion.
🧹 Nitpick comments (1)
src/config.rs (1)

4397-4409: Use a descriptive closure variable instead of b.

Rename b to something explicit like browser_config for readability and consistency.

♻️ Suggested rename
-                    .map(|b| {
+                    .map(|browser_config| {
                         let base = &base_defaults.browser;
                         BrowserConfig {
-                            enabled: b.enabled.unwrap_or(base.enabled),
-                            headless: b.headless.unwrap_or(base.headless),
-                            evaluate_enabled: b.evaluate_enabled.unwrap_or(base.evaluate_enabled),
-                            executable_path: b
+                            enabled: browser_config.enabled.unwrap_or(base.enabled),
+                            headless: browser_config.headless.unwrap_or(base.headless),
+                            evaluate_enabled: browser_config
+                                .evaluate_enabled
+                                .unwrap_or(base.evaluate_enabled),
+                            executable_path: browser_config
                                 .executable_path
                                 .or_else(|| base.executable_path.clone()),
-                            screenshot_dir: b
+                            screenshot_dir: browser_config
                                 .screenshot_dir
                                 .map(PathBuf::from)
                                 .or_else(|| base.screenshot_dir.clone()),
As per coding guidelines, "Don't abbreviate variable names. Use `queue` not `q`, `message` not `msg`, `channel` not `ch`. Common abbreviations like `config` are fine."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` around lines 4397 - 4409, The closure variable `b` in the map
that builds BrowserConfig is non-descriptive; rename it to `browser_config` (or
similar) in the closure argument and update all references inside the closure
(e.g., uses in enabled: b.enabled, headless: b.headless, evaluate_enabled:
b.evaluate_enabled, executable_path: b.executable_path, screenshot_dir:
b.screenshot_dir) so the code reads enabled: browser_config.enabled, etc.,
leaving the rest of the mapping and references to base_defaults.browser and the
BrowserConfig struct unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/config.rs`:
- Line 740: The chrome_cache_dir default is a relative PathBuf ("chrome_cache")
which causes env-only loads (Config::load_from_env) to cache under the process
CWD; update resolution so defaults are anchored to the instance directory by
calling the defaults' resolve method when building env-only configs — e.g.
ensure DefaultsConfig::default() is passed through
DefaultsConfig::resolve(instance_dir) (or call DefaultsConfig::resolve in
Config::load_from_env) so chrome_cache_dir is converted to
{instance_dir}/chrome_cache; touch the chrome_cache_dir field and any related
resolution logic in DefaultsConfig/Config to perform this conversion.

---

Nitpick comments:
In `@src/config.rs`:
- Around line 4397-4409: The closure variable `b` in the map that builds
BrowserConfig is non-descriptive; rename it to `browser_config` (or similar) in
the closure argument and update all references inside the closure (e.g., uses in
enabled: b.enabled, headless: b.headless, evaluate_enabled: b.evaluate_enabled,
executable_path: b.executable_path, screenshot_dir: b.screenshot_dir) so the
code reads enabled: browser_config.enabled, etc., leaving the rest of the
mapping and references to base_defaults.browser and the BrowserConfig struct
unchanged.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68ff582 and 9de6da7.

📒 Files selected for processing (1)
  • src/config.rs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/tools/browser.rs (1)

1221-1232: Consider logging when env var is set but path doesn't exist.

When CHROME or CHROME_PATH is set but points to a non-existent path, the function silently falls through. A debug log would help troubleshoot misconfigured environments:

Suggested improvement
 fn detect_chrome_from_env() -> Option<PathBuf> {
     for variable in ["CHROME", "CHROME_PATH"] {
         if let Ok(value) = std::env::var(variable) {
             let path = PathBuf::from(&value);
             if path.exists() {
                 return Some(path);
+            } else {
+                tracing::debug!(
+                    variable,
+                    path = %path.display(),
+                    "environment variable set but path does not exist"
+                );
             }
         }
     }
     None
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 1221 - 1232, The detect_chrome_from_env
function currently ignores cases where CHROME or CHROME_PATH is set but points
to a non-existent path; update detect_chrome_from_env to emit a debug log when
an environment variable is present but the PathBuf does not exist (include the
env variable name and its value in the message) so misconfigurations are
visible; use the project’s logging facility (e.g., debug! or tracing::debug!)
consistent with other modules, and keep the existing return behavior (return
Some(path) only when path.exists()) while logging the failed cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tools/browser.rs`:
- Around line 550-556: The cleanup currently silently discards errors when
removing the user_data_dir; replace the blocking std::fs::remove_dir_all call
with the async tokio::fs::remove_dir_all and handle errors instead of using `let
_ =`. In the branch that checks state.browser.is_some() (after drop(browser) and
referencing user_data_dir), call tokio::fs::remove_dir_all(&user_data_dir).await
and log any Err(e) (e.g., with tracing::warn! or the project's logger) including
the path and error details so failures are not silently ignored.
- Around line 154-176: In the BrowserState Drop implementation (fn drop), the
fallback branch that runs outside a tokio runtime currently discards the Result
with `let _ = std::fs::remove_dir_all(&dir);`; change this to handle and report
errors instead: call std::fs::remove_dir_all(&dir) and match on the Result,
printing any Err via a fallback logger such as eprintln! (including the
directory path and error) or using tracing::error if available, so the failure
isn't silently ignored when cleaning up the user_data_dir.

---

Nitpick comments:
In `@src/tools/browser.rs`:
- Around line 1221-1232: The detect_chrome_from_env function currently ignores
cases where CHROME or CHROME_PATH is set but points to a non-existent path;
update detect_chrome_from_env to emit a debug log when an environment variable
is present but the PathBuf does not exist (include the env variable name and its
value in the message) so misconfigurations are visible; use the project’s
logging facility (e.g., debug! or tracing::debug!) consistent with other
modules, and keep the existing return behavior (return Some(path) only when
path.exists()) while logging the failed cases.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9de6da7 and c1294b9.

📒 Files selected for processing (1)
  • src/tools/browser.rs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/config.rs (1)

4396-4420: Rename abbreviated closure variable for guideline compliance.

Line 4400 uses b; please use a descriptive name (for example, browser_override) in this block.

♻️ Proposed refactor
-                toml.defaults
-                    .browser
-                    .map(|b| {
+                toml.defaults
+                    .browser
+                    .map(|browser_override| {
                         let base = &base_defaults.browser;
                         BrowserConfig {
-                            enabled: b.enabled.unwrap_or(base.enabled),
-                            headless: b.headless.unwrap_or(base.headless),
-                            evaluate_enabled: b.evaluate_enabled.unwrap_or(base.evaluate_enabled),
-                            executable_path: b
+                            enabled: browser_override.enabled.unwrap_or(base.enabled),
+                            headless: browser_override.headless.unwrap_or(base.headless),
+                            evaluate_enabled: browser_override
+                                .evaluate_enabled
+                                .unwrap_or(base.evaluate_enabled),
+                            executable_path: browser_override
                                 .executable_path
                                 .or_else(|| base.executable_path.clone()),
-                            screenshot_dir: b
+                            screenshot_dir: browser_override
                                 .screenshot_dir
                                 .map(PathBuf::from)
                                 .or_else(|| base.screenshot_dir.clone()),
                             chrome_cache_dir: chrome_cache_dir.clone(),
                         }
                     })

As per coding guidelines, "Don't abbreviate variable names. Use queue not q, message not msg, channel not ch. Common abbreviations like config are fine."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` around lines 4396 - 4420, Rename the abbreviated closure
variable `b` to a descriptive name such as `browser_override` in the browser
mapping closure so it complies with naming guidelines; update the mapping
expression that constructs `BrowserConfig` (the closure passed to
toml.defaults.browser.map) to use `browser_override` wherever `b` was used
(e.g., `browser_override.enabled`, `browser_override.headless`,
`browser_override.executable_path`, `browser_override.screenshot_dir`) while
leaving surrounding symbols (`BrowserConfig`, `base_defaults.browser`,
`chrome_cache_dir`, and the unwrap_or_else branch) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/config.rs`:
- Around line 4396-4420: Rename the abbreviated closure variable `b` to a
descriptive name such as `browser_override` in the browser mapping closure so it
complies with naming guidelines; update the mapping expression that constructs
`BrowserConfig` (the closure passed to toml.defaults.browser.map) to use
`browser_override` wherever `b` was used (e.g., `browser_override.enabled`,
`browser_override.headless`, `browser_override.executable_path`,
`browser_override.screenshot_dir`) while leaving surrounding symbols
(`BrowserConfig`, `base_defaults.browser`, `chrome_cache_dir`, and the
unwrap_or_else branch) unchanged.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7071293 and 4e5ec12.

📒 Files selected for processing (1)
  • src/config.rs

@jamiepine jamiepine merged commit d0664bf into main Mar 1, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant