Skip to content

fix: browser session persistence across workers and restarts#348

Merged
jamiepine merged 6 commits intomainfrom
fix/browser-session-persistence
Mar 7, 2026
Merged

fix: browser session persistence across workers and restarts#348
jamiepine merged 6 commits intomainfrom
fix/browser-session-persistence

Conversation

@jamiepine
Copy link
Member

@jamiepine jamiepine commented Mar 7, 2026

Summary

  • Tabs always destroyed when workers finish: close_policy defaulted to CloseBrowser even with persist_session=true, so the LLM's close call (prompted by the worker system prompt) killed Chrome. Fixed by auto-defaulting close_policy to Detach when persist_session is enabled, and conditionalizing the worker prompt to say "detach" instead of "shut down".

  • Cookies lost across agent restarts: user_data_dir was always a random UUID temp path (/tmp/spacebot-browser-<uuid4>). On restart, BrowserState::Drop deleted it (Chrome's cookie DB), and the new launch created a fresh empty profile. Fixed by using a stable path ({chrome_cache_dir}/profile) for persistent sessions and skipping cleanup in Drop, CloseBrowser, and the concurrent-launch race path.

Changes

  • src/config/load.rs — Default close_policy to Detach when persist_session=true and no explicit policy set (both defaults + per-agent sites)
  • src/tools/browser.rs — Stable profile dir for persistent sessions; persistent_profile flag on BrowserState; skip cleanup in Drop and CloseBrowser for persistent profiles
  • src/prompts/engine.rs — Pass browser_persist_session to worker prompt template
  • prompts/en/worker.md.j2 — Conditionalize close step: "Detach" vs "Shut down" based on persist setting
  • src/agent/channel_dispatch.rs, src/agent/cortex.rs — Thread persist_session to prompt renderer

Note

Fixes browser session persistence by ensuring browser profiles are reused across agent restarts instead of being destroyed. This PR auto-defaults the close policy to Detach when persist_session is enabled, uses a stable profile directory for persistent sessions, and conditionalizes the worker prompt instructions accordingly. Browser tabs and cookies now survive across worker cycles when persistence is configured.

Written by Tembo for commit 1caa405. This will update automatically on new commits.

When persist_session is true, close_policy still defaulted to
CloseBrowser, so the LLM's close call (prompted by the worker
system prompt) killed the entire Chrome process. The shared
BrowserState handle survived the worker drop correctly but was
left holding wiped state.

Two fixes:
- Default close_policy to Detach when persist_session is enabled
  and no explicit close_policy is configured (config/load.rs)
- Conditionalize the worker prompt so persistent sessions say
  'Detach from the browser' instead of 'Shut down the browser'
The user_data_dir was always a random UUID temp path, even with
persist_session=true. On agent restart the old BrowserState dropped,
deleting the temp dir (and Chrome's cookie database) via Drop, then
the new launch created a fresh UUID dir with no cookies.

Fix: when persist_session is true, use a stable path under
chrome_cache_dir/profile instead of a random temp dir. Mark it as
a persistent profile so Drop, CloseBrowser, and the concurrent-
launch cleanup path all skip deleting it.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 7, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Threads a browser session persistence flag through config, prompt rendering, agent dispatch, and browser tooling; templates now conditionally describe close behavior; defaults choose Detach when persistence enabled; adds persistent vs ephemeral profile lifecycle and associated cleanup logic.

Changes

Cohort / File(s) Summary
Template and Prompt Rendering
prompts/en/worker.md.j2, src/prompts/engine.rs
Worker template now conditions the 6th browser action text on browser_persist_session. render_worker_prompt gained a browser_persist_session: bool parameter and the prompt engine injects it into the template context.
Configuration and Close Policy Defaults
src/config/load.rs
Adds resolve_close_policy helper and uses it to default close_policy to Detach when persist_session is true and explicit policy is absent; applies to global and per-agent BrowserConfig construction.
Agent Dispatch and Orchestration
src/agent/channel_dispatch.rs, src/agent/cortex.rs, tests/context_dump.rs
Load/clone browser_config once and pass browser_config.persist_session into render_worker_prompt, Worker creation/resume, and dump helpers; removes redundant config reloads and keeps persist_session threaded through interactive, builtin, resume, and pickup flows.
Browser State and Profile Persistence
src/tools/browser.rs
Adds public persistent_profile: bool to BrowserState. Launch chooses a stable shared profile dir for persistent sessions and per-launch temp dirs for ephemeral sessions; cleanup/Drop and Close handling avoid deleting persistent profiles; Debug output updated.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main change: fixing browser session persistence across workers and restarts through stable profiles and proper policy defaults.
Description check ✅ Passed The description clearly details the two main issues fixed and comprehensively explains the changes across relevant files, making it directly related to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/browser-session-persistence

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

// Persistent sessions use a stable profile dir under chrome_cache_dir so
// cookies, localStorage, and login sessions survive across agent restarts.
// Ephemeral sessions use a random temp dir to avoid singleton lock collisions.
let (user_data_dir, persistent_profile) = if self.config.persist_session {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Persistent sessions now share a stable user-data-dir; if two workers call launch concurrently, Chrome can fail to start due to the profile lock even though another launch is in-flight/just succeeded. Consider serializing persistent launches (e.g., a launch_in_progress flag) or, on launch failure, re-checking state.browser and reconnecting before returning an error.

@@ -1424,7 +1424,15 @@ impl Config {
.or_else(|| base.screenshot_dir.clone()),
persist_session: b.persist_session.unwrap_or(base.persist_session),
close_policy: parse_close_policy(b.close_policy.as_deref())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This defaulting logic (detach when persist_session is true and no explicit close_policy) is duplicated here and again in the per-agent override path below. Consider extracting a small helper so the behavior can’t drift between the two paths.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/tools/browser.rs (2)

571-636: ⚠️ Potential issue | 🟡 Minor

Log the cleanup error in the concurrent launch race path.

Line 623 silently discards the remove_dir_all result. The cleanup at lines 1254-1259 correctly logs errors; this path should be consistent.

Proposed fix
             if !persistent_profile {
                 let dir = user_data_dir;
                 tokio::spawn(async move {
-                    let _ = tokio::fs::remove_dir_all(&dir).await;
+                    if let Err(error) = tokio::fs::remove_dir_all(&dir).await {
+                        tracing::debug!(
+                            path = %dir.display(),
+                            %error,
+                            "failed to clean up browser user data dir (concurrent launch race)"
+                        );
+                    }
                 });
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 571 - 636, In the concurrent-launch
cleanup path after spawning the tokio task for the ephemeral user_data_dir,
capture and log any error from tokio::fs::remove_dir_all instead of discarding
it: change the spawned async block that currently does "let _ =
tokio::fs::remove_dir_all(&dir).await;" to await the result and call
tracing::warn! or tracing::error! with context (e.g., "failed to remove temp
user_data_dir") when Err, referencing the user_data_dir variable; this keeps
behavior consistent with the other cleanup path and ensures failures in the
cleanup are logged.

189-218: ⚠️ Potential issue | 🟡 Minor

Log the cleanup error when dropped outside a tokio runtime.

Line 215 silently discards the remove_dir_all result. Per coding guidelines, errors should be logged rather than silently discarded.

Proposed fix
             } else {
                 // Dropped outside a tokio runtime (unlikely) — clean up inline.
-                let _ = std::fs::remove_dir_all(&dir);
+                if let Err(error) = std::fs::remove_dir_all(&dir) {
+                    // Can't use tracing here (might be during shutdown), just ignore.
+                    // This path is unlikely to be hit in practice.
+                    eprintln!(
+                        "failed to clean up browser user data dir {}: {error}",
+                        dir.display()
+                    );
+                }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 189 - 218, In BrowserState's Drop impl (fn
drop) the branch that runs when dropping outside a tokio runtime currently
discards the result of std::fs::remove_dir_all(&dir); change this to log any
error instead: capture the Result from remove_dir_all(&dir) and, on Err(e), call
tracing::debug or tracing::warn with context (e.g. path = %dir.display(), %e,
"failed to clean up browser user data dir"); keep the existing behavior when Ok.
This updates the cleanup in the non-tokio branch to mirror the error logging
used in the spawn_blocking branch.
🧹 Nitpick comments (1)
src/agent/cortex.rs (1)

2401-2411: Load browser_config once for both prompt rendering and worker construction.

These values come from two separate runtime_config.browser_config.load() snapshots. If config reloads between them, the worker can receive prompt instructions for one close behavior and a BrowserConfig with another. Derive the prompt input from the same cloned browser_config you pass to Worker::new.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/cortex.rs` around lines 2401 - 2411, Load browser_config once into
a local variable (e.g., let browser_config =
deps.runtime_config.browser_config.load().clone()) and use that same cloned
instance for both prompt rendering (pass browser_config.persist_session and
other fields into prompt_engine.render_worker_prompt) and when constructing the
Worker (pass browser_config into Worker::new) so both prompt inputs and the
Worker share the identical BrowserConfig instead of calling
deps.runtime_config.browser_config.load() twice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@prompts/en/worker.md.j2`:
- Around line 103-107: Update the templated sentence to render from the resolved
ClosePolicy rather than the browser_persist_session flag: use the effective
close policy variable (e.g., ClosePolicy or close_policy) that the
runtime/template resolves and add explicit branches for close_browser,
close_tabs and close_session (or the equivalent values) so the text reads "close
— Shut down the browser when done" for close_browser, "close — Detach from tabs
but end the session when done" for close_tabs, and "close — Detach from the
browser when done (tabs and session are preserved for the next worker)" for
persistent sessions; replace the browser_persist_session conditional with these
close_policy branches in prompts/en/worker.md.j2 and ensure the template uses
the resolved policy variable name used elsewhere in the codebase.

In `@src/config/load.rs`:
- Around line 1427-1435: The agent-level close_policy fallback is using
base.close_policy which may itself be an implied value; instead, when resolving
ClosePolicy for an agent (the unwrap_or_else block that references
b.persist_session, base.persist_session, ClosePolicy::Detach and
base.close_policy), recompute the default from the raw persist_session inputs
rather than reusing a potentially-derived base.close_policy: if the agent did
not explicitly set close_policy, decide Detach based on the agent-level
persist_session if present, otherwise based on the raw defaults/base
persist_session, and only use base.close_policy if that base value was
explicitly provided; apply the same change to the other similar block around
lines 1611-1623.

---

Outside diff comments:
In `@src/tools/browser.rs`:
- Around line 571-636: In the concurrent-launch cleanup path after spawning the
tokio task for the ephemeral user_data_dir, capture and log any error from
tokio::fs::remove_dir_all instead of discarding it: change the spawned async
block that currently does "let _ = tokio::fs::remove_dir_all(&dir).await;" to
await the result and call tracing::warn! or tracing::error! with context (e.g.,
"failed to remove temp user_data_dir") when Err, referencing the user_data_dir
variable; this keeps behavior consistent with the other cleanup path and ensures
failures in the cleanup are logged.
- Around line 189-218: In BrowserState's Drop impl (fn drop) the branch that
runs when dropping outside a tokio runtime currently discards the result of
std::fs::remove_dir_all(&dir); change this to log any error instead: capture the
Result from remove_dir_all(&dir) and, on Err(e), call tracing::debug or
tracing::warn with context (e.g. path = %dir.display(), %e, "failed to clean up
browser user data dir"); keep the existing behavior when Ok. This updates the
cleanup in the non-tokio branch to mirror the error logging used in the
spawn_blocking branch.

---

Nitpick comments:
In `@src/agent/cortex.rs`:
- Around line 2401-2411: Load browser_config once into a local variable (e.g.,
let browser_config = deps.runtime_config.browser_config.load().clone()) and use
that same cloned instance for both prompt rendering (pass
browser_config.persist_session and other fields into
prompt_engine.render_worker_prompt) and when constructing the Worker (pass
browser_config into Worker::new) so both prompt inputs and the Worker share the
identical BrowserConfig instead of calling
deps.runtime_config.browser_config.load() twice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a083d2e0-24b5-4cc7-83ce-f7f86a46eabf

📥 Commits

Reviewing files that changed from the base of the PR and between 9d39ae8 and 1caa405.

📒 Files selected for processing (6)
  • prompts/en/worker.md.j2
  • src/agent/channel_dispatch.rs
  • src/agent/cortex.rs
  • src/config/load.rs
  • src/prompts/engine.rs
  • src/tools/browser.rs

Comment on lines +103 to +107
{%- if browser_persist_session %}
6. `close` — Detach from the browser when done (tabs and session are preserved for the next worker)
{%- else %}
6. `close` — Shut down the browser when done
{%- endif %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Drive this instruction from the effective close policy, not browser_persist_session.

persist_session=true only changes the default. If config explicitly sets close_policy = "close_browser" or close_policy = "close_tabs", this text is now wrong and will tell the worker to preserve tabs/session when it will not. Please render against the resolved ClosePolicy and add a close_tabs branch as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prompts/en/worker.md.j2` around lines 103 - 107, Update the templated
sentence to render from the resolved ClosePolicy rather than the
browser_persist_session flag: use the effective close policy variable (e.g.,
ClosePolicy or close_policy) that the runtime/template resolves and add explicit
branches for close_browser, close_tabs and close_session (or the equivalent
values) so the text reads "close — Shut down the browser when done" for
close_browser, "close — Detach from tabs but end the session when done" for
close_tabs, and "close — Detach from the browser when done (tabs and session are
preserved for the next worker)" for persistent sessions; replace the
browser_persist_session conditional with these close_policy branches in
prompts/en/worker.md.j2 and ensure the template uses the resolved policy
variable name used elsewhere in the codebase.

Comment on lines +1427 to +1435
.unwrap_or_else(|| {
// When persist_session is enabled and no explicit close_policy
// is set, default to Detach so tabs survive across workers.
if b.persist_session.unwrap_or(base.persist_session) {
ClosePolicy::Detach
} else {
base.close_policy
}
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Per-agent persist_session = false can still inherit Detach.

This eagerly materializes defaults.browser.close_policy from defaults.browser.persist_session, then later reuses that derived value for agents. If defaults enable persistence and an agent only overrides persist_session = false, the agent still falls back to defaults.browser.close_policy == Detach, so the override is ineffective unless close_policy is also set. Please preserve whether close_policy was explicit vs implied, or recompute the default from the raw layer inputs when resolving each agent.

Also applies to: 1611-1623

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/load.rs` around lines 1427 - 1435, The agent-level close_policy
fallback is using base.close_policy which may itself be an implied value;
instead, when resolving ClosePolicy for an agent (the unwrap_or_else block that
references b.persist_session, base.persist_session, ClosePolicy::Detach and
base.close_policy), recompute the default from the raw persist_session inputs
rather than reusing a potentially-derived base.close_policy: if the agent did
not explicitly set close_policy, decide Detach based on the agent-level
persist_session if present, otherwise based on the raw defaults/base
persist_session, and only use base.close_policy if that base value was
explicitly provided; apply the same change to the other similar block around
lines 1611-1623.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/tools/browser.rs (1)

617-625: ⚠️ Potential issue | 🟠 Major

Persistent launches can still fail before this fallback runs.

This branch only handles the caller that successfully got through Browser::launch and then lost the in-memory race. With a shared persistent profile, a concurrent caller can fail earlier on the Chrome profile lock and return an error even though another worker has already launched the browser. Serialize persistent launches, or on launch failure re-lock state and reconnect if another launch won.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 617 - 625, The persistent-profile cleanup
branch only handles callers that completed Browser::launch and lost the
in-memory race, but callers that fail earlier (e.g., due to Chrome profile lock)
can return an error even though another worker actually launched the browser;
update Browser::launch to serialize persistent launches or, on any launch
failure when persistent_profile is true, re-acquire the shared state lock (the
same `state` used to detect the winner) and check if a browser instance became
available, then reconnect to that instance instead of returning an error;
identify and update the launch path that uses `persistent_profile`,
`user_data_dir`, and the `tokio::spawn` cleanup block so that failures trigger
the re-lock-and-reconnect logic (or implement a per-profile mutex to serialize
launches) before propagating the error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tools/browser.rs`:
- Around line 571-576: The code currently uses a hard-coded profile path by
joining self.config.chrome_cache_dir with "profile", which can cause different
agents to share the same on-disk Chrome profile; instead, add a per-agent/site
stable profile path into the BrowserConfig during config loading (e.g.,
config.persistent_profile_dir or config.agent_profile_dir) and replace the
join("profile") usage in the browser construction (the
user_data_dir/persistent_profile branch where self.config.persist_session is
checked) to use that precomputed per-agent path; ensure the code still sets
persistent_profile = true when persist_session is enabled and falls back to
ephemeral temp dirs when not.
- Around line 622-624: The tokio::spawn block that currently does let _ =
tokio::fs::remove_dir_all(&dir).await must not silently discard failures; change
the closure to await the Result and log or handle errors the same way as the
CloseBrowser cleanup path (i.e., inspect the Result from
tokio::fs::remove_dir_all(&dir).await and call the same logging/error-handling
routine used by CloseBrowser). Locate the tokio::spawn(...) with
remove_dir_all(&dir) and replace the discard with explicit match/if let Err(e)
=> { /* call the CloseBrowser cleanup logger or process the error */ } so
cleanup failures are recorded. Ensure you still run the removal asynchronously
but propagate/log any error instead of using let _ =.

---

Duplicate comments:
In `@src/tools/browser.rs`:
- Around line 617-625: The persistent-profile cleanup branch only handles
callers that completed Browser::launch and lost the in-memory race, but callers
that fail earlier (e.g., due to Chrome profile lock) can return an error even
though another worker actually launched the browser; update Browser::launch to
serialize persistent launches or, on any launch failure when persistent_profile
is true, re-acquire the shared state lock (the same `state` used to detect the
winner) and check if a browser instance became available, then reconnect to that
instance instead of returning an error; identify and update the launch path that
uses `persistent_profile`, `user_data_dir`, and the `tokio::spawn` cleanup block
so that failures trigger the re-lock-and-reconnect logic (or implement a
per-profile mutex to serialize launches) before propagating the error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f9ffcd1f-f320-4dbe-ac62-855fdc84a84c

📥 Commits

Reviewing files that changed from the base of the PR and between 9128ee2 and 5e47b3c.

📒 Files selected for processing (1)
  • src/tools/browser.rs

Comment on lines +571 to +576
// Persistent sessions use a stable profile dir under chrome_cache_dir so
// cookies, localStorage, and login sessions survive across agent restarts.
// Ephemeral sessions use a random temp dir to avoid singleton lock collisions.
let (user_data_dir, persistent_profile) = if self.config.persist_session {
(self.config.chrome_cache_dir.join("profile"), true)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Scope the persistent profile path to the agent/site, not the whole instance.

Line 575 hard-codes {chrome_cache_dir}/profile. chrome_cache_dir is populated from {instance_dir}/chrome_cache during config loading, while the shared browser handle is per-agent. By default, that means two agents with persist_session = true can reuse the same on-disk profile, leaking cookies/login state across agents and contending on Chrome's profile lock. Please derive a stable path that is unique per agent/site, even if that means precomputing it during config load and passing it through BrowserConfig.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 571 - 576, The code currently uses a
hard-coded profile path by joining self.config.chrome_cache_dir with "profile",
which can cause different agents to share the same on-disk Chrome profile;
instead, add a per-agent/site stable profile path into the BrowserConfig during
config loading (e.g., config.persistent_profile_dir or config.agent_profile_dir)
and replace the join("profile") usage in the browser construction (the
user_data_dir/persistent_profile branch where self.config.persist_session is
checked) to use that precomputed per-agent path; ensure the code still sets
persistent_profile = true when persist_session is enabled and falls back to
ephemeral temp dirs when not.

jamiepine and others added 2 commits March 6, 2026 20:23
- Log errors instead of `let _ =` on race-path and Drop cleanup
- Extract resolve_close_policy helper to deduplicate defaulting logic
- Single browser_config load in cortex task pickup (no TOCTOU)
@jamiepine jamiepine merged commit bba4b59 into main Mar 7, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant