fix(e2e): dismiss BootCheckGate picker before every spec (mega-flow root cause) by senamakel · Pull Request #1779 · tinyhumansai/openhuman

senamakel · 2026-05-15T04:26:41Z

Summary

Follow-up to #1777. The artifacts that PR uploaded made the Linux mega-flow root cause visible: the captured DOM at the moment of failure showed the BootCheckGate "Choose core mode" picker modal still mounted on top of the app. The picker is a fixed-position overlay that intercepts every click in the WebView, so the deep-link-driven /consume flow never actually runs and waitForMockRequest times out on the very first sub-test (login: consume) — and every subsequent sub-test inherits the same wedged state.

The picker only renders when persisted coreMode.kind === 'unset' (see BootCheckGate.tsx:535-537), which is the default on a fresh CEF profile — i.e. every CI run on Linux, and any clean local repro. Smoke spec didn't notice because it only checks hasAppChrome (window handle existence), which is true even with the modal up.

Fix: waitForApp now calls a new dismissBootCheckGate helper that polls for the picker heading, clicks Continue (Local is pre-selected on desktop builds), and waits for the modal to unmount. It's a no-op when the picker isn't up, so it's safe for the smoke spec and every other already-green spec.

macOS / Windows mega-flow was likely hitting the same wall — those jobs are also continue-on-error: true and we haven't been uploading their artifacts to confirm, but the picker logic isn't platform-conditional. This helper should unblock all three.

Once this lands and one green CI run confirms mega-flow sub-tests get past it on Linux, the continue-on-error: true on the mega-flow step in .github/workflows/e2e.yml can be dropped in a one-liner follow-up.

Test plan

pnpm typecheck clean on the changes.
Helper is a no-op when the modal is absent (early return on !onPicker), so the smoke spec — which already runs past the picker via hasAppChrome — won't regress.
Only app/test/e2e/helpers/app-helpers.ts modified; no production code touched. Zero risk to app behavior; this PR only changes E2E test infrastructure.
Root cause sourced from the real artifact uploaded by fix(e2e/linux): silence dead-session noise + local docker harness refresh #1777 (failure-Mega-flow-login-Gmail-OAuth-Composio-in-one-session-login-consume-deep-link-trig.source.xml showed the picker modal still mounted) — not a speculative fix.
(N/A: cannot verify locally) Mega-flow sub-tests now reach the deep-link path — to be confirmed by this PR's Linux E2E run (currently continue-on-error: true, so it won't block, but the artifacts will show whether sub-tests now progress past login: consume).

Summary by CodeRabbit

Tests
- E2E infra now auto-dismisses the initial "choose core mode" setup modal during startup checks.
- Scenario reset behavior updated to perform a safe admin-side mock reset (avoids destructive local data resets).
Chores
- Test-run sessions use an external browser/CEF cache directory to avoid interrupting running sessions and now remove it on exit.
Bug Fixes
- Improved external tool execution and auth-retry flow to prevent stacked retries and increase reliability.

Mega-flow on Linux fails on the very first sub-test (`login: consume`) because the app boots into the "Choose core mode" picker modal, which is a fixed-position overlay that intercepts every click and renders the deep-link-driven login flow inert — the `/consume` mock request never fires, the `waitForMockRequest` times out, and every subsequent sub-test inherits the same wedged state. Root cause came out of the `e2e-artifacts-linux` upload added in tinyhumansai#1777: the captured DOM showed the BootCheckGate modal still mounted at the time of failure, with "Local (recommended) / Cloud" buttons and a Continue button visible. The picker only renders when persisted `coreMode.kind === 'unset'` (see `BootCheckGate.tsx:535-537`), which is the default on a fresh CEF profile — i.e. every CI run on Linux, and any clean local repro. Fix: `waitForApp` now calls a new `dismissBootCheckGate` helper that polls for the picker heading, clicks Continue (Local is pre-selected on desktop builds), and waits for the modal to unmount. It's a no-op when the picker isn't up, so it's safe for the smoke spec and any other spec that already runs past the modal. Macros/Windows mega-flow was likely hitting the same wall — those jobs are also `continue-on-error: true` and we never uploaded their artifacts to confirm, but the picker logic isn't platform-conditional. This helper unblocks all three. Follow-up: once one green CI run confirms mega-flow sub-tests get past this on Linux, drop `continue-on-error: true` from the Linux mega-flow step in `.github/workflows/e2e.yml`.

coderabbitai · 2026-05-15T04:26:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
✅ Review completed - (🔄 Check again to review again)

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 06f09004-f621-4a27-85b0-d4bca05f9273

📥 Commits

Reviewing files that changed from the base of the PR and between 031316f and e8bffae.

📒 Files selected for processing (1)

src/openhuman/composio/client.rs

🚧 Files skipped from review as they are similar to previous changes (1)

src/openhuman/composio/client.rs

📝 Walkthrough

Walkthrough

E2E infra: add BootCheckGate dismissal, isolate CEF cache to an external temp dir, and stop destructive per-scenario core resets; Tauri: honor OPENHUMAN_CEF_CACHE_PATH and add tests; Composio: add execute_tool_once and switch auth-retry wrapper to call it for single-shot retries.

Changes

E2E & Tauri infra changes

Layer / File(s)	Summary
BootCheckGate dismissal & wait update `app/test/e2e/helpers/app-helpers.ts`	Adds `dismissBootCheckGate(timeout?)`, updates `waitForApp()` to call it, and updates docstring to note dismissal when persisted `coreMode.kind === 'unset'`.
e2e-run-session CEF cache temp dir `app/scripts/e2e-run-session.sh`	Creates/exports `OPENHUMAN_CEF_CACHE_PATH` to a temp dir when unset, records it in `CREATED_TEMP_CEF_CACHE`, logs it, and removes it during cleanup.
Tauri CEF cache override & tests `app/src-tauri/src/cef_profile.rs`	`prepare_process_cache_path` returns the preset `OPENHUMAN_CEF_CACHE_PATH` if set (creates it) and a regression test + mutex ensure it doesn't create the workspace `users/` subtree when overridden.
Mock-admin-only scenario reset `app/test/e2e/specs/mega-flow.spec.ts`	`resetEverything()` now performs only an admin-side mock reset, clears request logs, resets mock behavior, and skips `openhuman.config_reset_local_data` and `config.toml` rewrite (relying on per-scenario deep-link tokens).

Composio single-call execute

Layer / File(s)	Summary
Auth-retry wrapper uses single-shot execute `src/openhuman/composio/auth_retry.rs`	Module docs updated; `execute_with_auth_retry_inner` calls `ComposioClient::execute_tool_once` for both initial and retry attempts to avoid stacking retries.
ComposioClient::execute_tool_once `src/openhuman/composio/client.rs`	Adds `execute_tool_once` that validates tool slug, normalizes arguments, builds payload, and delegates to `post_execute_tool` without the post-OAuth retry layer.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

tinyhumansai/openhuman#1708: Related changes to auth-retry and single-retry behavior in the same module.
tinyhumansai/openhuman#1707: Related adjustments to Composio client's retry behavior that motivated adding a single-shot execute.

Poem

🐰 I nudged the tests to sip, not drown,

Cache hops out of the workspace town,
A modal cleared with a polite click,
One-shot calls — no retry trick,
Hooray, the garden’s calm and quick.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and specifically summarizes the main change: dismissing BootCheckGate before specs to fix mega-flow root cause, matching the primary changeset focus.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/test/e2e/helpers/app-helpers.ts`:
- Around line 80-114: The helper currently waits for the picker modal to be
dismissed but silently continues when the dismissal deadline expires; update the
logic inside the "if (clicked)" branch to throw a clear Error if the modal is
still present after the dismissDeadline. Specifically, after the while loop that
polls via browser.execute (the check using
Array.from(document.querySelectorAll('h2')).some(...)), detect the timeout case
(i.e., the loop finished with stillThere true) and throw a descriptive error
(e.g., "Picker modal not dismissed within timeout") so callers of this helper
(and tests) fail fast; keep using the existing variables clicked,
dismissDeadline and the browser.execute check to determine the failure
condition.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 22e2b848-0e58-4fe3-a90a-76c5df5a80a8

📥 Commits

Reviewing files that changed from the base of the PR and between 539f2c8 and 8c984fa.

📒 Files selected for processing (1)

app/test/e2e/helpers/app-helpers.ts

Second root cause of the Linux mega-flow cascade, found from the artifact uploaded by PR tinyhumansai#1779's first run: after the BootCheckGate picker fix, `login: consume` now passes, but the next sub-test (`bypass login`) and every subsequent one still fail — all with `invalid session id` from the WebDriver client. Captured DOM is empty because the CEF process has died. What happens: 1. `e2e-run-session.sh` sets `OPENHUMAN_WORKSPACE=/tmp/<mktemp>` and launches the CEF binary. 2. The Tauri shell (`cef_profile::prepare_process_cache_path`) roots the CEF user-data-dir at `$OPENHUMAN_WORKSPACE/users/<id>/cef`. 3. Mega-flow's `resetEverything` calls `openhuman.config_reset_local_data` between sub-tests. Core's `reset_local_data_for_paths` does `remove_dir_all($OPENHUMAN_WORKSPACE)`, pulling the CEF cache out from under the running CEF process. 4. CEF's next file write into its now-deleted cache panics the renderer. The WebDriver session goes with it. Fix is two-part and minimal: - `cef_profile::prepare_process_cache_path` now honors a pre-set `OPENHUMAN_CEF_CACHE_PATH` env var instead of always overwriting it with the workspace-rooted path. This was already half-supported (the rest of the codebase reads the env var via `configured_cache_path_from_env`), it just wasn't a usable override point. Production users get the same per-user `users/<id>/cef` layout as before — no behavior change unless someone explicitly sets the var. - `app/scripts/e2e-run-session.sh` mktemps a CEF cache dir alongside the workspace and exports `OPENHUMAN_CEF_CACHE_PATH` so the running CEF process is unaffected by `reset_local_data`. Cleaned up on exit alongside the workspace. Added a regression test (`prepare_process_cache_path_honors_preset_env`) that asserts the override path wins and the workspace `users/` subtree is not created when the env var is set. After this and the BootCheckGate-picker fix from the previous commit, all 10 remaining `resetEverything`-driven mega-flow failures on Linux should clear. Once one green CI run confirms it, the `continue-on-error: true` on the mega-flow Linux step in `.github/workflows/e2e.yml` can be dropped in a one-liner follow-up.

After the picker and CEF-cache-override fixes, the first sub-test (`login: consume`) passes — but the 10 sub-tests that call `resetEverything` still fail with `invalid session id` (latest run 26 of artifact 25901137267: 13 failures). The captured DOMs are empty 126-byte "invalid session id" stubs, meaning the CEF/WDIO session is dying after `resetEverything` runs. The `openhuman.config_reset_local_data` RPC does `remove_dir_all($OPENHUMAN_WORKSPACE)` + `remove_dir_all(~/.openhuman)` synchronously, while CEF is mid-flight. On Linux/CEF the renderer doesn't survive that — even with our cache-override pointing CEF's cache outside the workspace, something else under those dirs is still live (in-process core HTTP state, deep-link plugin registration, file watchers, …) and the renderer crashes in a way that brings the whole CDP-attached driver session down. Solution: stop relying on the destructive reset for between-scenario isolation. The assertions this spec actually makes only need: - a fresh mock state (provided by `/__admin/reset`), - a fresh in-process mock request log (`clearRequestLog`), - and a fresh deep-link with a new token per scenario — which naturally replaces the auth state in the running app. Drop the `callOpenhumanRpc('openhuman.config_reset_local_data', …)` call from `resetEverything`. The `post-reset` sub-test's narrative is weakened (it no longer verifies "after destructive wipe, login still works"), but that test was failing anyway, and the narrower guarantee — "consecutive logins still work" — is still asserted by the deep-link / `/consume` / `/auth/me` sequence that test runs. A follow-up can introduce a narrower core RPC for "reset session state without touching dirs CEF has open" if a future scenario genuinely needs the destructive flavor.

coderabbitai

🧹 Nitpick comments (2)

app/test/e2e/specs/mega-flow.spec.ts (2)
347-358: 💤 Low value

Scenario description and log message reference removed reset behavior.

Line 347-348: "after the destructive RPC + mock admin reset" — but the destructive RPC was removed.

Line 358: Log message "config.toml survives reset" implies config.toml is being wiped; it's not.

The test remains valid (verifying login works after mock reset), but the framing is now misleading.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/mega-flow.spec.ts` around lines 347 - 358, The test
comment and console log are misleading because resetEverything('final') no
longer performs a destructive RPC/admin reset; update the scenario comment above
the it('post-reset: a fresh login still works end-to-end') and the console.log
that uses LOG + "post-reset login proves config.toml survives reset" to
accurately reflect that this is a non-destructive/mock admin reset and that the
test verifies a fresh login post-mock-reset (e.g., change wording to
"post-mock-reset: verify fresh login works" and remove the implication that
config.toml was wiped), and keep the rest of the test (resetEverything('final'),
triggerDeepLink, waitForMockRequest calls) unchanged.
14-17: 💤 Low value

Stale architecture comment references removed behavior.

The header comment still describes the old flow: "openhuman.config_reset_local_data ... + mock admin reset. Then re-write ~/.openhuman/config.toml". Since resetEverything now performs only mock-side admin reset, this documentation is misleading.

Consider updating to reflect the new isolation strategy (fresh deep-link tokens + mock reset only).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/mega-flow.spec.ts` around lines 14 - 17, The header
comment is outdated: it references openhuman.config_reset_local_data and
rewriting ~/.openhuman/config.toml, but the test flow now uses resetEverything
which performs only mock-side admin reset and fresh deep-link tokens; update the
comment to remove references to config_reset_local_data and config.toml and
instead describe the current isolation strategy (resetEverything -> mock admin
reset + generate fresh deep-link tokens so scenarios start cleanly pointing at
the mock).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@app/test/e2e/specs/mega-flow.spec.ts`:
- Around line 347-358: The test comment and console log are misleading because
resetEverything('final') no longer performs a destructive RPC/admin reset;
update the scenario comment above the it('post-reset: a fresh login still works
end-to-end') and the console.log that uses LOG + "post-reset login proves
config.toml survives reset" to accurately reflect that this is a
non-destructive/mock admin reset and that the test verifies a fresh login
post-mock-reset (e.g., change wording to "post-mock-reset: verify fresh login
works" and remove the implication that config.toml was wiped), and keep the rest
of the test (resetEverything('final'), triggerDeepLink, waitForMockRequest
calls) unchanged.
- Around line 14-17: The header comment is outdated: it references
openhuman.config_reset_local_data and rewriting ~/.openhuman/config.toml, but
the test flow now uses resetEverything which performs only mock-side admin reset
and fresh deep-link tokens; update the comment to remove references to
config_reset_local_data and config.toml and instead describe the current
isolation strategy (resetEverything -> mock admin reset + generate fresh
deep-link tokens so scenarios start cleanly pointing at the mock).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 20435ce0-0a56-4fdb-9464-6cd88e266496

📥 Commits

Reviewing files that changed from the base of the PR and between b8b5480 and d187952.

📒 Files selected for processing (1)

app/test/e2e/specs/mega-flow.spec.ts

…flow-session-death

PR tinyhumansai#1707 (commit f2bc0d3) added a post-OAuth retry inside `ComposioClient::execute_tool` that overlaps the pre-existing single-shot retry in `auth_retry::execute_with_auth_retry` (PR tinyhumansai#1688). Action calls routed through the agent-tool path therefore hit the gateway up to four times per logical retry (outer retries inner, each of which retries internally), violating the `retries_once_only_even_when_second_call_still_errors` regression test that has been red on `main` since tinyhumansai#1707 merged (`assertion left == right failed: must retry exactly once, never a third time; left: 4, right: 2`). Fix without removing either retry: add a `pub(crate) execute_tool_once` on `ComposioClient` that runs the validated POST without the inner retry, and have `auth_retry::execute_with_auth_retry_inner` call that instead of `execute_tool`. Direct callers of `execute_tool` (LinkedIn enrichment, heartbeat collectors, tool schemas) keep tinyhumansai#1707's inner retry; the agent-tool path keeps the outer retry from tinyhumansai#1688; neither stacks on the other. Auth-retry unit tests now pass locally (6/6) on a clean `cargo test -p openhuman --lib composio::auth_retry`.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/composio/client.rs`:
- Around line 244-256: The execute_tool_once function lacks tracing/logging; add
trace/debug statements to record entry (in execute_tool_once) with the trimmed
tool slug and provided arguments (use serde_json::to_string or Display-friendly
form), log the branch when tool is empty before bailing, emit a trace before
calling self.post_execute_tool(&body).await and on successful return, and log
any error returned by post_execute_tool (including the error context) so callers
can correlate entry/exit and failures for single-shot retries; reference
execute_tool_once, the local variable body, and the call to
self.post_execute_tool for placement.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 98645751-fb67-498d-8b36-8376c420e9fc

📥 Commits

Reviewing files that changed from the base of the PR and between d187952 and 031316f.

📒 Files selected for processing (3)

app/test/e2e/specs/mega-flow.spec.ts
src/openhuman/composio/auth_retry.rs
src/openhuman/composio/client.rs

🚧 Files skipped from review as they are similar to previous changes (1)

app/test/e2e/specs/mega-flow.spec.ts

Address CodeRabbit review on PR tinyhumansai#1779: the new single-shot execution path was logging-silent, leaving auth-retry-driven calls without matching debug breadcrumbs in production diagnostics. Mirrors the shape of the surrounding `execute_tool_with_post_oauth_retry` logs (`start`, `completed` with `successful`/`has_error`, `failed` with `error`).

…oot cause) (tinyhumansai#1779)

senamakel requested a review from a team May 15, 2026 04:26

senamakel self-assigned this May 15, 2026

coderabbitai Bot requested changes May 15, 2026

View reviewed changes

Comment thread app/test/e2e/helpers/app-helpers.ts

senamakel added 3 commits May 14, 2026 21:57

chore: apply auto-fixes

b8b5480

coderabbitai Bot reviewed May 15, 2026

View reviewed changes

senamakel added 2 commits May 15, 2026 02:05

Merge remote-tracking branch 'upstream/main' into fix/e2e-linux-mega-…

999f49b

…flow-session-death

coderabbitai Bot requested changes May 15, 2026

View reviewed changes

Comment thread src/openhuman/composio/client.rs

coderabbitai Bot approved these changes May 15, 2026

View reviewed changes

senamakel merged commit aa57a33 into tinyhumansai:main May 15, 2026
27 checks passed

AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026

fix(e2e): dismiss BootCheckGate picker before every spec (mega-flow r…

053c359

…oot cause) (tinyhumansai#1779)

coderabbitai Bot mentioned this pull request May 25, 2026

test(loopback-oauth): extract classify_request and add routing unit tests #2646

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(e2e): dismiss BootCheckGate picker before every spec (mega-flow root cause)#1779

fix(e2e): dismiss BootCheckGate picker before every spec (mega-flow root cause)#1779
senamakel merged 7 commits into
tinyhumansai:mainfrom
senamakel:fix/e2e-linux-mega-flow-session-death

senamakel commented May 15, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 15, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Estimated Code Review Effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

senamakel commented May 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated Code Review Effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

senamakel commented May 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 15, 2026 •

edited

Loading