Skip to content

fix(e2e): dismiss BootCheckGate picker before every spec (mega-flow root cause)#1779

Merged
senamakel merged 7 commits into
tinyhumansai:mainfrom
senamakel:fix/e2e-linux-mega-flow-session-death
May 15, 2026
Merged

fix(e2e): dismiss BootCheckGate picker before every spec (mega-flow root cause)#1779
senamakel merged 7 commits into
tinyhumansai:mainfrom
senamakel:fix/e2e-linux-mega-flow-session-death

Conversation

@senamakel
Copy link
Copy Markdown
Member

@senamakel senamakel commented May 15, 2026

Summary

Follow-up to #1777. The artifacts that PR uploaded made the Linux mega-flow root cause visible: the captured DOM at the moment of failure showed the BootCheckGate "Choose core mode" picker modal still mounted on top of the app. The picker is a fixed-position overlay that intercepts every click in the WebView, so the deep-link-driven /consume flow never actually runs and waitForMockRequest times out on the very first sub-test (login: consume) — and every subsequent sub-test inherits the same wedged state.

The picker only renders when persisted coreMode.kind === 'unset' (see BootCheckGate.tsx:535-537), which is the default on a fresh CEF profile — i.e. every CI run on Linux, and any clean local repro. Smoke spec didn't notice because it only checks hasAppChrome (window handle existence), which is true even with the modal up.

Fix: waitForApp now calls a new dismissBootCheckGate helper that polls for the picker heading, clicks Continue (Local is pre-selected on desktop builds), and waits for the modal to unmount. It's a no-op when the picker isn't up, so it's safe for the smoke spec and every other already-green spec.

macOS / Windows mega-flow was likely hitting the same wall — those jobs are also continue-on-error: true and we haven't been uploading their artifacts to confirm, but the picker logic isn't platform-conditional. This helper should unblock all three.

Once this lands and one green CI run confirms mega-flow sub-tests get past it on Linux, the continue-on-error: true on the mega-flow step in .github/workflows/e2e.yml can be dropped in a one-liner follow-up.

Test plan

  • pnpm typecheck clean on the changes.
  • Helper is a no-op when the modal is absent (early return on !onPicker), so the smoke spec — which already runs past the picker via hasAppChrome — won't regress.
  • Only app/test/e2e/helpers/app-helpers.ts modified; no production code touched. Zero risk to app behavior; this PR only changes E2E test infrastructure.
  • Root cause sourced from the real artifact uploaded by fix(e2e/linux): silence dead-session noise + local docker harness refresh #1777 (failure-Mega-flow-login-Gmail-OAuth-Composio-in-one-session-login-consume-deep-link-trig.source.xml showed the picker modal still mounted) — not a speculative fix.
  • (N/A: cannot verify locally) Mega-flow sub-tests now reach the deep-link path — to be confirmed by this PR's Linux E2E run (currently continue-on-error: true, so it won't block, but the artifacts will show whether sub-tests now progress past login: consume).

Summary by CodeRabbit

  • Tests

    • E2E infra now auto-dismisses the initial "choose core mode" setup modal during startup checks.
    • Scenario reset behavior updated to perform a safe admin-side mock reset (avoids destructive local data resets).
  • Chores

    • Test-run sessions use an external browser/CEF cache directory to avoid interrupting running sessions and now remove it on exit.
  • Bug Fixes

    • Improved external tool execution and auth-retry flow to prevent stacked retries and increase reliability.

Review Change Stack

Mega-flow on Linux fails on the very first sub-test (`login: consume`)
because the app boots into the "Choose core mode" picker modal, which is
a fixed-position overlay that intercepts every click and renders the
deep-link-driven login flow inert — the `/consume` mock request never
fires, the `waitForMockRequest` times out, and every subsequent
sub-test inherits the same wedged state.

Root cause came out of the `e2e-artifacts-linux` upload added in tinyhumansai#1777:
the captured DOM showed the BootCheckGate modal still mounted at the
time of failure, with "Local (recommended) / Cloud" buttons and a
Continue button visible.

The picker only renders when persisted `coreMode.kind === 'unset'` (see
`BootCheckGate.tsx:535-537`), which is the default on a fresh CEF
profile — i.e. every CI run on Linux, and any clean local repro.

Fix: `waitForApp` now calls a new `dismissBootCheckGate` helper that
polls for the picker heading, clicks Continue (Local is pre-selected
on desktop builds), and waits for the modal to unmount. It's a no-op
when the picker isn't up, so it's safe for the smoke spec and any
other spec that already runs past the modal.

Macros/Windows mega-flow was likely hitting the same wall — those jobs
are also `continue-on-error: true` and we never uploaded their
artifacts to confirm, but the picker logic isn't platform-conditional.
This helper unblocks all three.

Follow-up: once one green CI run confirms mega-flow sub-tests get past
this on Linux, drop `continue-on-error: true` from the Linux mega-flow
step in `.github/workflows/e2e.yml`.
@senamakel senamakel requested a review from a team May 15, 2026 04:26
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • ✅ Review completed - (🔄 Check again to review again)

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 06f09004-f621-4a27-85b0-d4bca05f9273

📥 Commits

Reviewing files that changed from the base of the PR and between 031316f and e8bffae.

📒 Files selected for processing (1)
  • src/openhuman/composio/client.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/composio/client.rs

📝 Walkthrough

Walkthrough

E2E infra: add BootCheckGate dismissal, isolate CEF cache to an external temp dir, and stop destructive per-scenario core resets; Tauri: honor OPENHUMAN_CEF_CACHE_PATH and add tests; Composio: add execute_tool_once and switch auth-retry wrapper to call it for single-shot retries.

Changes

E2E & Tauri infra changes

Layer / File(s) Summary
BootCheckGate dismissal & wait update
app/test/e2e/helpers/app-helpers.ts
Adds dismissBootCheckGate(timeout?), updates waitForApp() to call it, and updates docstring to note dismissal when persisted coreMode.kind === 'unset'.
e2e-run-session CEF cache temp dir
app/scripts/e2e-run-session.sh
Creates/exports OPENHUMAN_CEF_CACHE_PATH to a temp dir when unset, records it in CREATED_TEMP_CEF_CACHE, logs it, and removes it during cleanup.
Tauri CEF cache override & tests
app/src-tauri/src/cef_profile.rs
prepare_process_cache_path returns the preset OPENHUMAN_CEF_CACHE_PATH if set (creates it) and a regression test + mutex ensure it doesn't create the workspace users/ subtree when overridden.
Mock-admin-only scenario reset
app/test/e2e/specs/mega-flow.spec.ts
resetEverything() now performs only an admin-side mock reset, clears request logs, resets mock behavior, and skips openhuman.config_reset_local_data and config.toml rewrite (relying on per-scenario deep-link tokens).

Composio single-call execute

Layer / File(s) Summary
Auth-retry wrapper uses single-shot execute
src/openhuman/composio/auth_retry.rs
Module docs updated; execute_with_auth_retry_inner calls ComposioClient::execute_tool_once for both initial and retry attempts to avoid stacking retries.
ComposioClient::execute_tool_once
src/openhuman/composio/client.rs
Adds execute_tool_once that validates tool slug, normalizes arguments, builds payload, and delegates to post_execute_tool without the post-OAuth retry layer.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I nudged the tests to sip, not drown,

Cache hops out of the workspace town,
A modal cleared with a polite click,
One-shot calls — no retry trick,
Hooray, the garden’s calm and quick.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically summarizes the main change: dismissing BootCheckGate before specs to fix mega-flow root cause, matching the primary changeset focus.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@senamakel senamakel self-assigned this May 15, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/test/e2e/helpers/app-helpers.ts`:
- Around line 80-114: The helper currently waits for the picker modal to be
dismissed but silently continues when the dismissal deadline expires; update the
logic inside the "if (clicked)" branch to throw a clear Error if the modal is
still present after the dismissDeadline. Specifically, after the while loop that
polls via browser.execute (the check using
Array.from(document.querySelectorAll('h2')).some(...)), detect the timeout case
(i.e., the loop finished with stillThere true) and throw a descriptive error
(e.g., "Picker modal not dismissed within timeout") so callers of this helper
(and tests) fail fast; keep using the existing variables clicked,
dismissDeadline and the browser.execute check to determine the failure
condition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 22e2b848-0e58-4fe3-a90a-76c5df5a80a8

📥 Commits

Reviewing files that changed from the base of the PR and between 539f2c8 and 8c984fa.

📒 Files selected for processing (1)
  • app/test/e2e/helpers/app-helpers.ts

Comment thread app/test/e2e/helpers/app-helpers.ts
senamakel added 3 commits May 14, 2026 21:57
Second root cause of the Linux mega-flow cascade, found from the
artifact uploaded by PR tinyhumansai#1779's first run: after the BootCheckGate
picker fix, `login: consume` now passes, but the next sub-test
(`bypass login`) and every subsequent one still fail — all with
`invalid session id` from the WebDriver client. Captured DOM is empty
because the CEF process has died.

What happens:

1. `e2e-run-session.sh` sets `OPENHUMAN_WORKSPACE=/tmp/<mktemp>` and
   launches the CEF binary.
2. The Tauri shell (`cef_profile::prepare_process_cache_path`) roots the
   CEF user-data-dir at `$OPENHUMAN_WORKSPACE/users/<id>/cef`.
3. Mega-flow's `resetEverything` calls
   `openhuman.config_reset_local_data` between sub-tests. Core's
   `reset_local_data_for_paths` does `remove_dir_all($OPENHUMAN_WORKSPACE)`,
   pulling the CEF cache out from under the running CEF process.
4. CEF's next file write into its now-deleted cache panics the renderer.
   The WebDriver session goes with it.

Fix is two-part and minimal:

- `cef_profile::prepare_process_cache_path` now honors a pre-set
  `OPENHUMAN_CEF_CACHE_PATH` env var instead of always overwriting it
  with the workspace-rooted path. This was already half-supported (the
  rest of the codebase reads the env var via
  `configured_cache_path_from_env`), it just wasn't a usable override
  point. Production users get the same per-user `users/<id>/cef` layout
  as before — no behavior change unless someone explicitly sets the var.

- `app/scripts/e2e-run-session.sh` mktemps a CEF cache dir alongside the
  workspace and exports `OPENHUMAN_CEF_CACHE_PATH` so the running CEF
  process is unaffected by `reset_local_data`. Cleaned up on exit
  alongside the workspace.

Added a regression test (`prepare_process_cache_path_honors_preset_env`)
that asserts the override path wins and the workspace `users/` subtree
is not created when the env var is set.

After this and the BootCheckGate-picker fix from the previous commit,
all 10 remaining `resetEverything`-driven mega-flow failures on Linux
should clear. Once one green CI run confirms it, the
`continue-on-error: true` on the mega-flow Linux step in
`.github/workflows/e2e.yml` can be dropped in a one-liner follow-up.
After the picker and CEF-cache-override fixes, the first sub-test
(`login: consume`) passes — but the 10 sub-tests that call
`resetEverything` still fail with `invalid session id` (latest run
26 of artifact 25901137267: 13 failures). The captured DOMs are
empty 126-byte "invalid session id" stubs, meaning the CEF/WDIO
session is dying after `resetEverything` runs.

The `openhuman.config_reset_local_data` RPC does
`remove_dir_all($OPENHUMAN_WORKSPACE)` + `remove_dir_all(~/.openhuman)`
synchronously, while CEF is mid-flight. On Linux/CEF the renderer
doesn't survive that — even with our cache-override pointing CEF's
cache outside the workspace, something else under those dirs is still
live (in-process core HTTP state, deep-link plugin registration,
file watchers, …) and the renderer crashes in a way that brings the
whole CDP-attached driver session down.

Solution: stop relying on the destructive reset for between-scenario
isolation. The assertions this spec actually makes only need:
- a fresh mock state (provided by `/__admin/reset`),
- a fresh in-process mock request log (`clearRequestLog`),
- and a fresh deep-link with a new token per scenario — which
  naturally replaces the auth state in the running app.

Drop the `callOpenhumanRpc('openhuman.config_reset_local_data', …)`
call from `resetEverything`. The `post-reset` sub-test's narrative
is weakened (it no longer verifies "after destructive wipe, login
still works"), but that test was failing anyway, and the narrower
guarantee — "consecutive logins still work" — is still asserted by
the deep-link / `/consume` / `/auth/me` sequence that test runs.

A follow-up can introduce a narrower core RPC for "reset session
state without touching dirs CEF has open" if a future scenario
genuinely needs the destructive flavor.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
app/test/e2e/specs/mega-flow.spec.ts (2)

347-358: 💤 Low value

Scenario description and log message reference removed reset behavior.

Line 347-348: "after the destructive RPC + mock admin reset" — but the destructive RPC was removed.

Line 358: Log message "config.toml survives reset" implies config.toml is being wiped; it's not.

The test remains valid (verifying login works after mock reset), but the framing is now misleading.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/mega-flow.spec.ts` around lines 347 - 358, The test
comment and console log are misleading because resetEverything('final') no
longer performs a destructive RPC/admin reset; update the scenario comment above
the it('post-reset: a fresh login still works end-to-end') and the console.log
that uses LOG + "post-reset login proves config.toml survives reset" to
accurately reflect that this is a non-destructive/mock admin reset and that the
test verifies a fresh login post-mock-reset (e.g., change wording to
"post-mock-reset: verify fresh login works" and remove the implication that
config.toml was wiped), and keep the rest of the test (resetEverything('final'),
triggerDeepLink, waitForMockRequest calls) unchanged.

14-17: 💤 Low value

Stale architecture comment references removed behavior.

The header comment still describes the old flow: "openhuman.config_reset_local_data ... + mock admin reset. Then re-write ~/.openhuman/config.toml". Since resetEverything now performs only mock-side admin reset, this documentation is misleading.

Consider updating to reflect the new isolation strategy (fresh deep-link tokens + mock reset only).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/mega-flow.spec.ts` around lines 14 - 17, The header
comment is outdated: it references openhuman.config_reset_local_data and
rewriting ~/.openhuman/config.toml, but the test flow now uses resetEverything
which performs only mock-side admin reset and fresh deep-link tokens; update the
comment to remove references to config_reset_local_data and config.toml and
instead describe the current isolation strategy (resetEverything -> mock admin
reset + generate fresh deep-link tokens so scenarios start cleanly pointing at
the mock).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@app/test/e2e/specs/mega-flow.spec.ts`:
- Around line 347-358: The test comment and console log are misleading because
resetEverything('final') no longer performs a destructive RPC/admin reset;
update the scenario comment above the it('post-reset: a fresh login still works
end-to-end') and the console.log that uses LOG + "post-reset login proves
config.toml survives reset" to accurately reflect that this is a
non-destructive/mock admin reset and that the test verifies a fresh login
post-mock-reset (e.g., change wording to "post-mock-reset: verify fresh login
works" and remove the implication that config.toml was wiped), and keep the rest
of the test (resetEverything('final'), triggerDeepLink, waitForMockRequest
calls) unchanged.
- Around line 14-17: The header comment is outdated: it references
openhuman.config_reset_local_data and rewriting ~/.openhuman/config.toml, but
the test flow now uses resetEverything which performs only mock-side admin reset
and fresh deep-link tokens; update the comment to remove references to
config_reset_local_data and config.toml and instead describe the current
isolation strategy (resetEverything -> mock admin reset + generate fresh
deep-link tokens so scenarios start cleanly pointing at the mock).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 20435ce0-0a56-4fdb-9464-6cd88e266496

📥 Commits

Reviewing files that changed from the base of the PR and between b8b5480 and d187952.

📒 Files selected for processing (1)
  • app/test/e2e/specs/mega-flow.spec.ts

senamakel added 2 commits May 15, 2026 02:05
PR tinyhumansai#1707 (commit f2bc0d3) added a post-OAuth retry inside
`ComposioClient::execute_tool` that overlaps the pre-existing
single-shot retry in `auth_retry::execute_with_auth_retry` (PR tinyhumansai#1688).
Action calls routed through the agent-tool path therefore hit the
gateway up to four times per logical retry (outer retries inner, each
of which retries internally), violating the
`retries_once_only_even_when_second_call_still_errors` regression test
that has been red on `main` since tinyhumansai#1707 merged
(`assertion left == right failed: must retry exactly once, never a third time;
left: 4, right: 2`).

Fix without removing either retry: add a `pub(crate) execute_tool_once`
on `ComposioClient` that runs the validated POST without the inner
retry, and have `auth_retry::execute_with_auth_retry_inner` call that
instead of `execute_tool`. Direct callers of `execute_tool` (LinkedIn
enrichment, heartbeat collectors, tool schemas) keep tinyhumansai#1707's inner
retry; the agent-tool path keeps the outer retry from tinyhumansai#1688; neither
stacks on the other.

Auth-retry unit tests now pass locally (6/6) on a clean
`cargo test -p openhuman --lib composio::auth_retry`.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/composio/client.rs`:
- Around line 244-256: The execute_tool_once function lacks tracing/logging; add
trace/debug statements to record entry (in execute_tool_once) with the trimmed
tool slug and provided arguments (use serde_json::to_string or Display-friendly
form), log the branch when tool is empty before bailing, emit a trace before
calling self.post_execute_tool(&body).await and on successful return, and log
any error returned by post_execute_tool (including the error context) so callers
can correlate entry/exit and failures for single-shot retries; reference
execute_tool_once, the local variable body, and the call to
self.post_execute_tool for placement.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 98645751-fb67-498d-8b36-8376c420e9fc

📥 Commits

Reviewing files that changed from the base of the PR and between d187952 and 031316f.

📒 Files selected for processing (3)
  • app/test/e2e/specs/mega-flow.spec.ts
  • src/openhuman/composio/auth_retry.rs
  • src/openhuman/composio/client.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/test/e2e/specs/mega-flow.spec.ts

Comment thread src/openhuman/composio/client.rs
Address CodeRabbit review on PR tinyhumansai#1779: the new single-shot execution
path was logging-silent, leaving auth-retry-driven calls without
matching debug breadcrumbs in production diagnostics. Mirrors the
shape of the surrounding `execute_tool_with_post_oauth_retry` logs
(`start`, `completed` with `successful`/`has_error`, `failed` with
`error`).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant