fix(e2e): dismiss BootCheckGate picker before every spec (mega-flow root cause)#1779
Conversation
Mega-flow on Linux fails on the very first sub-test (`login: consume`) because the app boots into the "Choose core mode" picker modal, which is a fixed-position overlay that intercepts every click and renders the deep-link-driven login flow inert — the `/consume` mock request never fires, the `waitForMockRequest` times out, and every subsequent sub-test inherits the same wedged state. Root cause came out of the `e2e-artifacts-linux` upload added in tinyhumansai#1777: the captured DOM showed the BootCheckGate modal still mounted at the time of failure, with "Local (recommended) / Cloud" buttons and a Continue button visible. The picker only renders when persisted `coreMode.kind === 'unset'` (see `BootCheckGate.tsx:535-537`), which is the default on a fresh CEF profile — i.e. every CI run on Linux, and any clean local repro. Fix: `waitForApp` now calls a new `dismissBootCheckGate` helper that polls for the picker heading, clicks Continue (Local is pre-selected on desktop builds), and waits for the modal to unmount. It's a no-op when the picker isn't up, so it's safe for the smoke spec and any other spec that already runs past the modal. Macros/Windows mega-flow was likely hitting the same wall — those jobs are also `continue-on-error: true` and we never uploaded their artifacts to confirm, but the picker logic isn't platform-conditional. This helper unblocks all three. Follow-up: once one green CI run confirms mega-flow sub-tests get past this on Linux, drop `continue-on-error: true` from the Linux mega-flow step in `.github/workflows/e2e.yml`.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughE2E infra: add BootCheckGate dismissal, isolate CEF cache to an external temp dir, and stop destructive per-scenario core resets; Tauri: honor OPENHUMAN_CEF_CACHE_PATH and add tests; Composio: add execute_tool_once and switch auth-retry wrapper to call it for single-shot retries. ChangesE2E & Tauri infra changes
Composio single-call execute
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/test/e2e/helpers/app-helpers.ts`:
- Around line 80-114: The helper currently waits for the picker modal to be
dismissed but silently continues when the dismissal deadline expires; update the
logic inside the "if (clicked)" branch to throw a clear Error if the modal is
still present after the dismissDeadline. Specifically, after the while loop that
polls via browser.execute (the check using
Array.from(document.querySelectorAll('h2')).some(...)), detect the timeout case
(i.e., the loop finished with stillThere true) and throw a descriptive error
(e.g., "Picker modal not dismissed within timeout") so callers of this helper
(and tests) fail fast; keep using the existing variables clicked,
dismissDeadline and the browser.execute check to determine the failure
condition.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 22e2b848-0e58-4fe3-a90a-76c5df5a80a8
📒 Files selected for processing (1)
app/test/e2e/helpers/app-helpers.ts
Second root cause of the Linux mega-flow cascade, found from the artifact uploaded by PR tinyhumansai#1779's first run: after the BootCheckGate picker fix, `login: consume` now passes, but the next sub-test (`bypass login`) and every subsequent one still fail — all with `invalid session id` from the WebDriver client. Captured DOM is empty because the CEF process has died. What happens: 1. `e2e-run-session.sh` sets `OPENHUMAN_WORKSPACE=/tmp/<mktemp>` and launches the CEF binary. 2. The Tauri shell (`cef_profile::prepare_process_cache_path`) roots the CEF user-data-dir at `$OPENHUMAN_WORKSPACE/users/<id>/cef`. 3. Mega-flow's `resetEverything` calls `openhuman.config_reset_local_data` between sub-tests. Core's `reset_local_data_for_paths` does `remove_dir_all($OPENHUMAN_WORKSPACE)`, pulling the CEF cache out from under the running CEF process. 4. CEF's next file write into its now-deleted cache panics the renderer. The WebDriver session goes with it. Fix is two-part and minimal: - `cef_profile::prepare_process_cache_path` now honors a pre-set `OPENHUMAN_CEF_CACHE_PATH` env var instead of always overwriting it with the workspace-rooted path. This was already half-supported (the rest of the codebase reads the env var via `configured_cache_path_from_env`), it just wasn't a usable override point. Production users get the same per-user `users/<id>/cef` layout as before — no behavior change unless someone explicitly sets the var. - `app/scripts/e2e-run-session.sh` mktemps a CEF cache dir alongside the workspace and exports `OPENHUMAN_CEF_CACHE_PATH` so the running CEF process is unaffected by `reset_local_data`. Cleaned up on exit alongside the workspace. Added a regression test (`prepare_process_cache_path_honors_preset_env`) that asserts the override path wins and the workspace `users/` subtree is not created when the env var is set. After this and the BootCheckGate-picker fix from the previous commit, all 10 remaining `resetEverything`-driven mega-flow failures on Linux should clear. Once one green CI run confirms it, the `continue-on-error: true` on the mega-flow Linux step in `.github/workflows/e2e.yml` can be dropped in a one-liner follow-up.
After the picker and CEF-cache-override fixes, the first sub-test
(`login: consume`) passes — but the 10 sub-tests that call
`resetEverything` still fail with `invalid session id` (latest run
26 of artifact 25901137267: 13 failures). The captured DOMs are
empty 126-byte "invalid session id" stubs, meaning the CEF/WDIO
session is dying after `resetEverything` runs.
The `openhuman.config_reset_local_data` RPC does
`remove_dir_all($OPENHUMAN_WORKSPACE)` + `remove_dir_all(~/.openhuman)`
synchronously, while CEF is mid-flight. On Linux/CEF the renderer
doesn't survive that — even with our cache-override pointing CEF's
cache outside the workspace, something else under those dirs is still
live (in-process core HTTP state, deep-link plugin registration,
file watchers, …) and the renderer crashes in a way that brings the
whole CDP-attached driver session down.
Solution: stop relying on the destructive reset for between-scenario
isolation. The assertions this spec actually makes only need:
- a fresh mock state (provided by `/__admin/reset`),
- a fresh in-process mock request log (`clearRequestLog`),
- and a fresh deep-link with a new token per scenario — which
naturally replaces the auth state in the running app.
Drop the `callOpenhumanRpc('openhuman.config_reset_local_data', …)`
call from `resetEverything`. The `post-reset` sub-test's narrative
is weakened (it no longer verifies "after destructive wipe, login
still works"), but that test was failing anyway, and the narrower
guarantee — "consecutive logins still work" — is still asserted by
the deep-link / `/consume` / `/auth/me` sequence that test runs.
A follow-up can introduce a narrower core RPC for "reset session
state without touching dirs CEF has open" if a future scenario
genuinely needs the destructive flavor.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
app/test/e2e/specs/mega-flow.spec.ts (2)
347-358: 💤 Low valueScenario description and log message reference removed reset behavior.
Line 347-348: "after the destructive RPC + mock admin reset" — but the destructive RPC was removed.
Line 358: Log message "config.toml survives reset" implies config.toml is being wiped; it's not.
The test remains valid (verifying login works after mock reset), but the framing is now misleading.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/test/e2e/specs/mega-flow.spec.ts` around lines 347 - 358, The test comment and console log are misleading because resetEverything('final') no longer performs a destructive RPC/admin reset; update the scenario comment above the it('post-reset: a fresh login still works end-to-end') and the console.log that uses LOG + "post-reset login proves config.toml survives reset" to accurately reflect that this is a non-destructive/mock admin reset and that the test verifies a fresh login post-mock-reset (e.g., change wording to "post-mock-reset: verify fresh login works" and remove the implication that config.toml was wiped), and keep the rest of the test (resetEverything('final'), triggerDeepLink, waitForMockRequest calls) unchanged.
14-17: 💤 Low valueStale architecture comment references removed behavior.
The header comment still describes the old flow: "
openhuman.config_reset_local_data... + mock admin reset. Then re-write~/.openhuman/config.toml". SinceresetEverythingnow performs only mock-side admin reset, this documentation is misleading.Consider updating to reflect the new isolation strategy (fresh deep-link tokens + mock reset only).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/test/e2e/specs/mega-flow.spec.ts` around lines 14 - 17, The header comment is outdated: it references openhuman.config_reset_local_data and rewriting ~/.openhuman/config.toml, but the test flow now uses resetEverything which performs only mock-side admin reset and fresh deep-link tokens; update the comment to remove references to config_reset_local_data and config.toml and instead describe the current isolation strategy (resetEverything -> mock admin reset + generate fresh deep-link tokens so scenarios start cleanly pointing at the mock).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@app/test/e2e/specs/mega-flow.spec.ts`:
- Around line 347-358: The test comment and console log are misleading because
resetEverything('final') no longer performs a destructive RPC/admin reset;
update the scenario comment above the it('post-reset: a fresh login still works
end-to-end') and the console.log that uses LOG + "post-reset login proves
config.toml survives reset" to accurately reflect that this is a
non-destructive/mock admin reset and that the test verifies a fresh login
post-mock-reset (e.g., change wording to "post-mock-reset: verify fresh login
works" and remove the implication that config.toml was wiped), and keep the rest
of the test (resetEverything('final'), triggerDeepLink, waitForMockRequest
calls) unchanged.
- Around line 14-17: The header comment is outdated: it references
openhuman.config_reset_local_data and rewriting ~/.openhuman/config.toml, but
the test flow now uses resetEverything which performs only mock-side admin reset
and fresh deep-link tokens; update the comment to remove references to
config_reset_local_data and config.toml and instead describe the current
isolation strategy (resetEverything -> mock admin reset + generate fresh
deep-link tokens so scenarios start cleanly pointing at the mock).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 20435ce0-0a56-4fdb-9464-6cd88e266496
📒 Files selected for processing (1)
app/test/e2e/specs/mega-flow.spec.ts
…flow-session-death
PR tinyhumansai#1707 (commit f2bc0d3) added a post-OAuth retry inside `ComposioClient::execute_tool` that overlaps the pre-existing single-shot retry in `auth_retry::execute_with_auth_retry` (PR tinyhumansai#1688). Action calls routed through the agent-tool path therefore hit the gateway up to four times per logical retry (outer retries inner, each of which retries internally), violating the `retries_once_only_even_when_second_call_still_errors` regression test that has been red on `main` since tinyhumansai#1707 merged (`assertion left == right failed: must retry exactly once, never a third time; left: 4, right: 2`). Fix without removing either retry: add a `pub(crate) execute_tool_once` on `ComposioClient` that runs the validated POST without the inner retry, and have `auth_retry::execute_with_auth_retry_inner` call that instead of `execute_tool`. Direct callers of `execute_tool` (LinkedIn enrichment, heartbeat collectors, tool schemas) keep tinyhumansai#1707's inner retry; the agent-tool path keeps the outer retry from tinyhumansai#1688; neither stacks on the other. Auth-retry unit tests now pass locally (6/6) on a clean `cargo test -p openhuman --lib composio::auth_retry`.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/composio/client.rs`:
- Around line 244-256: The execute_tool_once function lacks tracing/logging; add
trace/debug statements to record entry (in execute_tool_once) with the trimmed
tool slug and provided arguments (use serde_json::to_string or Display-friendly
form), log the branch when tool is empty before bailing, emit a trace before
calling self.post_execute_tool(&body).await and on successful return, and log
any error returned by post_execute_tool (including the error context) so callers
can correlate entry/exit and failures for single-shot retries; reference
execute_tool_once, the local variable body, and the call to
self.post_execute_tool for placement.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 98645751-fb67-498d-8b36-8376c420e9fc
📒 Files selected for processing (3)
app/test/e2e/specs/mega-flow.spec.tssrc/openhuman/composio/auth_retry.rssrc/openhuman/composio/client.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- app/test/e2e/specs/mega-flow.spec.ts
Address CodeRabbit review on PR tinyhumansai#1779: the new single-shot execution path was logging-silent, leaving auth-retry-driven calls without matching debug breadcrumbs in production diagnostics. Mirrors the shape of the surrounding `execute_tool_with_post_oauth_retry` logs (`start`, `completed` with `successful`/`has_error`, `failed` with `error`).
Summary
Follow-up to #1777. The artifacts that PR uploaded made the Linux mega-flow root cause visible: the captured DOM at the moment of failure showed the
BootCheckGate"Choose core mode" picker modal still mounted on top of the app. The picker is a fixed-position overlay that intercepts every click in the WebView, so the deep-link-driven/consumeflow never actually runs andwaitForMockRequesttimes out on the very first sub-test (login: consume) — and every subsequent sub-test inherits the same wedged state.The picker only renders when persisted
coreMode.kind === 'unset'(seeBootCheckGate.tsx:535-537), which is the default on a fresh CEF profile — i.e. every CI run on Linux, and any clean local repro. Smoke spec didn't notice because it only checkshasAppChrome(window handle existence), which is true even with the modal up.Fix:
waitForAppnow calls a newdismissBootCheckGatehelper that polls for the picker heading, clicks Continue (Local is pre-selected on desktop builds), and waits for the modal to unmount. It's a no-op when the picker isn't up, so it's safe for the smoke spec and every other already-green spec.macOS / Windows mega-flow was likely hitting the same wall — those jobs are also
continue-on-error: trueand we haven't been uploading their artifacts to confirm, but the picker logic isn't platform-conditional. This helper should unblock all three.Once this lands and one green CI run confirms mega-flow sub-tests get past it on Linux, the
continue-on-error: trueon the mega-flow step in.github/workflows/e2e.ymlcan be dropped in a one-liner follow-up.Test plan
pnpm typecheckclean on the changes.returnon!onPicker), so the smoke spec — which already runs past the picker viahasAppChrome— won't regress.app/test/e2e/helpers/app-helpers.tsmodified; no production code touched. Zero risk to app behavior; this PR only changes E2E test infrastructure.failure-Mega-flow-login-Gmail-OAuth-Composio-in-one-session-login-consume-deep-link-trig.source.xmlshowed the picker modal still mounted) — not a speculative fix.continue-on-error: true, so it won't block, but the artifacts will show whether sub-tests now progress pastlogin: consume).Summary by CodeRabbit
Tests
Chores
Bug Fixes