Make QuickJS runtime fully synchronous to fix async deadlock#422
Conversation
- Introduced a mechanism to track background sync status with a new `sync_in_flight` flag. - Updated the event loop to check for sync completion and persist state to memory upon completion. - Modified the `handle_sync` function to fire the `onSync` handler in the JS runtime asynchronously, allowing for immediate return while the sync process runs in the background. - Enhanced logging to provide feedback on sync status and completion.
…g on POST requests through staging proxy
- Added logging to track promise resolution progress and timeout events. - Implemented a polling mechanism to drive the QuickJS job queue until promises resolve or timeout occurs. - Introduced debug information logging for stalled promises to aid in troubleshooting. - Improved feedback during polling to indicate ongoing operations. This update aims to improve the reliability and debuggability of asynchronous JavaScript calls within the application.
- Updated the `fetch` function in the QuickJS library to operate synchronously from the JavaScript perspective, blocking the thread until the HTTP request completes. - Enhanced error handling and logging for HTTP requests, including timeouts and response details. - Removed the previous asynchronous implementation to streamline the fetch operation. - Cleaned up the code by removing unused comments and improving readability. This change aims to improve the consistency and reliability of network operations within the application.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughModified the skill sync flow to track asynchronous onSync operations using a Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant EventLoop as Event Loop
participant RPCHandler as RPC Handler
participant JSRuntime as JS Runtime
Client->>RPCHandler: skill/sync RPC call
RPCHandler->>JSRuntime: Evaluate JS onSync()
alt Synchronous Handler
JSRuntime-->>RPCHandler: "done" (synchronous result)
RPCHandler->>EventLoop: Return Ok, sync_in_flight = false
RPCHandler->>EventLoop: persist_state_to_memory()
else Asynchronous Handler (Promise)
JSRuntime-->>RPCHandler: "started" (Promise)
RPCHandler->>JSRuntime: Mark globalThis.__syncInFlight = true
RPCHandler->>EventLoop: Return Ok immediately
loop Event Loop Polling (until sync completes)
EventLoop->>JSRuntime: Poll globalThis.__syncInFlight
JSRuntime-->>EventLoop: false (still in-flight) / undefined (completed)
end
JSRuntime->>JSRuntime: Promise resolves/rejects
JSRuntime->>JSRuntime: Clear globalThis.__syncInFlight
EventLoop->>EventLoop: Detect sync completion
EventLoop->>EventLoop: persist_state_to_memory()
end
RPCHandler-->>Client: skill/sync response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (6)
src/openhuman/skills/qjs_skill_instance/event_loop/rpc_handlers.rs (1)
392-397: Cloned handles are only used in the synchronous"done"path.
ops_for_cb,mem_client_for_cb, andmem_tx_for_cbare cloned but only used whenstatus == "done"(lines 454-461). For the"started"async case, memory persistence is handled by the event loop's completion detection, not these clones.This is minor overhead, but the naming (
_for_cb) suggests these were intended for a callback pattern that isn't implemented here. Consider inlining the references in the"done"branch or renaming for clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/skills/qjs_skill_instance/event_loop/rpc_handlers.rs` around lines 392 - 397, The cloned handles ops_for_cb, mem_client_for_cb, and mem_tx_for_cb are created unconditionally but only used in the synchronous "done" branch (when status == "done") in the RPC handler; remove the unnecessary clones by either moving the clone operations into the "done" branch so they are only created when needed, or rename them to reflect their limited scope (e.g., ops_for_done/mem_client_for_done) and inline use directly from ops_state.clone(), memory_client.clone(), and memory_write_tx.clone() inside the "done" branch in the function handling the status variable to eliminate the minor overhead and clarify intent.src/openhuman/skills/qjs_skill_instance/event_loop/mod.rs (1)
492-517: Verify JS evaluation of__syncInFlight === falsewhen undefined.The polling logic evaluates
globalThis.__syncInFlight === false. In JavaScript,undefined === falsereturnsfalse, so this should be safe when__syncInFlightwas never set. However, this relies on the subtle difference between=== falsevs== falseor!__syncInFlight.The current logic is correct but consider adding a comment explaining this intentional strictness, or initializing
__syncInFlightto a known state in the JS setup.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/skills/qjs_skill_instance/event_loop/mod.rs` around lines 492 - 517, The JS check using ctx.with(...).eval::<bool, _>(b"globalThis.__syncInFlight === false") relies on strict === semantics (undefined === false -> false); add a short comment next to the sync_in_flight check explaining this intentional strictness (referencing sync_in_flight, ctx.with, and the eval call) and/or ensure __syncInFlight is initialized to a known boolean in the JS bootstrap logic so the eval always reflects a boolean, then keep the existing persist_state_to_memory(skill_id, ...) call unchanged.src/openhuman/skills/quickjs_libs/qjs_ops/ops_net.rs (1)
24-27: Connection pooling workaround may need revisiting.Disabling connection pooling entirely (
pool_idle_timeout(0ms),pool_max_idle_per_host(0)) works around the staging proxy issue but impacts performance for skills making many sequential requests. Consider adding a TODO or tracking issue to investigate the root cause with the staging proxy.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/skills/quickjs_libs/qjs_ops/ops_net.rs` around lines 24 - 27, The workaround that disables connection pooling by calling .pool_idle_timeout(std::time::Duration::from_millis(0)) and .pool_max_idle_per_host(0) should be annotated and tracked: add a TODO comment above these calls referencing a tracking issue (or create one) to investigate and fix the staging-proxy root cause so pooling can be re-enabled for performance; ensure the TODO mentions the specific builder chain calls (.pool_idle_timeout and .pool_max_idle_per_host) and the symptom (reused connections hang on consecutive POST requests through the staging proxy) so future maintainers can find and revert the workaround safely.src/openhuman/skills/quickjs_libs/bootstrap.js (3)
1144-1175:webhook.createTunnelstill usesawait net.fetchbutnet.fetchis now synchronous.The
createTunnelfunction is declaredasyncand usesawait net.fetch(...), butnet.fetchnow returns synchronously. This works but is misleading. Consider making it synchronous for consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/skills/quickjs_libs/bootstrap.js` around lines 1144 - 1175, The createTunnel function declares async and uses await with net.fetch even though net.fetch is synchronous; remove the async keyword from function createTunnel and drop the await before net.fetch so the call is synchronous, keep the rest of the flow (parse result, throw on parsed.status, extract data/tunnel, call webhook.register and build tunnel.webhookUrl) unchanged; ensure references to net.fetch, createTunnel, webhook.register, __ops.get_session_token, and __platform.env remain correct and no other await remains in that function.
267-309: WebSocket_connectand_startReceivingstill useawaiton now-synchronous ops.The WebSocket class methods use
async/awaitpatterns with__ops.ws_connectand__ops.ws_recv, but these ops are now synchronous (as seen inops_net.rslines 167-194). While this won't break (awaiting a non-Promise returns immediately), it's misleading and creates unnecessary microtask overhead.♻️ Proposed fix to remove unnecessary async
- async _connect() { + _connect() { try { - this._id = await __ops.ws_connect(this.url); + this._id = __ops.ws_connect(this.url); this.readyState = WebSocket.OPEN; // ... } } - async _startReceiving() { + _startReceiving() { while (this.readyState === WebSocket.OPEN) { try { - const message = await __ops.ws_recv(this._id); + const message = __ops.ws_recv(this._id); // ... } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/skills/quickjs_libs/bootstrap.js` around lines 267 - 309, The _connect and _startReceiving functions currently use async/await for __ops.ws_connect and __ops.ws_recv which are synchronous; change both functions to synchronous (remove the async keyword), remove the await keywords when calling __ops.ws_connect and __ops.ws_recv, and keep the same try/catch and control flow so errors are still caught and handled (preserve calls to WebSocket._instances.set(this._id, this), this.onopen, this.onmessage, and this._handleClose). Ensure _startReceiving still loops while this.readyState === WebSocket.OPEN and breaks/handles close codes the same way, and update any callers if they relied on the methods returning Promises.
134-168: Coding guidelines preferconst/letovervar.The changed lines use
varfor variable declarations. While consistent with the existing file style, coding guidelines specify "Use const by default, let if reassignment is needed, avoid var."♻️ Proposed fix to use const/let
-globalThis.fetch = function (url, options) { - options = options || {}; - var method = options.method || 'GET'; - var headers = options.headers || {}; - var body = options.body || null; +globalThis.fetch = function (url, options = {}) { + const method = options.method || 'GET'; + const headers = options.headers || {}; + const body = options.body || null; // Convert Headers object to plain object if needed - var headersObj = {}; + let headersObj = {}; if (headers instanceof Headers) { - headers.forEach(function (value, key) { + headers.forEach((value, key) => { headersObj[key] = value; }); } else { headersObj = headers; } // __ops.fetch expects a JSON string for options (not a JS object) - var resultJson = __ops.fetch( + const resultJson = __ops.fetch( url.toString(), JSON.stringify({ - method: method, + method, headers: headersObj, body: typeof body === 'string' ? body : body ? JSON.stringify(body) : null, }) ); // __ops.fetch returns a JSON string — parse it to access status/headers/body - var parsed = JSON.parse(resultJson); + const parsed = JSON.parse(resultJson);As per coding guidelines: "Use const by default, let if reassignment is needed, avoid var".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/skills/quickjs_libs/bootstrap.js` around lines 134 - 168, In globalThis.fetch replace all var declarations with const or let as appropriate: use const for variables that are never reassigned (e.g., method, body when not reassigned, headersObj when assigned once, resultJson, parsed) and let only for variables that may be reassigned; update the header conversion block to declare headers and headersObj with const/let accordingly and ensure the function still builds the JSON and returns the Response (locate the globalThis.fetch function and variables method, headers, body, headersObj, resultJson, parsed to apply changes).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/skills/qjs_skill_instance/event_loop/mod.rs`:
- Around line 773-786: The code unconditionally sets sync_in_flight = true after
calling rpc_handlers::handle_sync, causing a stale Rust-side flag for
synchronous completions; change the "skill/sync" arm so you only set
*sync_in_flight = true when the handler returns Ok with a status indicating the
sync actually "started" (not when it returned a synchronous "done" result).
Concretely, inspect the returned Result from rpc_handlers::handle_sync (the
variable result) and set sync_in_flight true only if result is Ok and the
returned status equals "started" (otherwise leave sync_in_flight false);
preserve returning result unchanged.
In `@src/openhuman/skills/qjs_skill_instance/event_loop/rpc_handlers.rs`:
- Around line 418-422: The injected JS in rpc_handlers.rs currently hardcodes
the log prefix "[notion][sync]" for both console.log and console.error; change
those to use a dynamic prefix (e.g. include the handler's skill_id or a generic
"[skill][sync]") by injecting or interpolating the skill identifier into the
string used in the success and failure callbacks (update the two occurrences of
console.log('[notion][sync]...') and console.error('[notion][sync]...') to use
something like `"[${skill_id || 'skill'}][sync]"`), ensuring the same dynamic
prefix is used when setting globalThis.__syncInFlight = false in the error path
so logs consistently identify the originating skill.
---
Nitpick comments:
In `@src/openhuman/skills/qjs_skill_instance/event_loop/mod.rs`:
- Around line 492-517: The JS check using ctx.with(...).eval::<bool,
_>(b"globalThis.__syncInFlight === false") relies on strict === semantics
(undefined === false -> false); add a short comment next to the sync_in_flight
check explaining this intentional strictness (referencing sync_in_flight,
ctx.with, and the eval call) and/or ensure __syncInFlight is initialized to a
known boolean in the JS bootstrap logic so the eval always reflects a boolean,
then keep the existing persist_state_to_memory(skill_id, ...) call unchanged.
In `@src/openhuman/skills/qjs_skill_instance/event_loop/rpc_handlers.rs`:
- Around line 392-397: The cloned handles ops_for_cb, mem_client_for_cb, and
mem_tx_for_cb are created unconditionally but only used in the synchronous
"done" branch (when status == "done") in the RPC handler; remove the unnecessary
clones by either moving the clone operations into the "done" branch so they are
only created when needed, or rename them to reflect their limited scope (e.g.,
ops_for_done/mem_client_for_done) and inline use directly from
ops_state.clone(), memory_client.clone(), and memory_write_tx.clone() inside the
"done" branch in the function handling the status variable to eliminate the
minor overhead and clarify intent.
In `@src/openhuman/skills/quickjs_libs/bootstrap.js`:
- Around line 1144-1175: The createTunnel function declares async and uses await
with net.fetch even though net.fetch is synchronous; remove the async keyword
from function createTunnel and drop the await before net.fetch so the call is
synchronous, keep the rest of the flow (parse result, throw on parsed.status,
extract data/tunnel, call webhook.register and build tunnel.webhookUrl)
unchanged; ensure references to net.fetch, createTunnel, webhook.register,
__ops.get_session_token, and __platform.env remain correct and no other await
remains in that function.
- Around line 267-309: The _connect and _startReceiving functions currently use
async/await for __ops.ws_connect and __ops.ws_recv which are synchronous; change
both functions to synchronous (remove the async keyword), remove the await
keywords when calling __ops.ws_connect and __ops.ws_recv, and keep the same
try/catch and control flow so errors are still caught and handled (preserve
calls to WebSocket._instances.set(this._id, this), this.onopen, this.onmessage,
and this._handleClose). Ensure _startReceiving still loops while this.readyState
=== WebSocket.OPEN and breaks/handles close codes the same way, and update any
callers if they relied on the methods returning Promises.
- Around line 134-168: In globalThis.fetch replace all var declarations with
const or let as appropriate: use const for variables that are never reassigned
(e.g., method, body when not reassigned, headersObj when assigned once,
resultJson, parsed) and let only for variables that may be reassigned; update
the header conversion block to declare headers and headersObj with const/let
accordingly and ensure the function still builds the JSON and returns the
Response (locate the globalThis.fetch function and variables method, headers,
body, headersObj, resultJson, parsed to apply changes).
In `@src/openhuman/skills/quickjs_libs/qjs_ops/ops_net.rs`:
- Around line 24-27: The workaround that disables connection pooling by calling
.pool_idle_timeout(std::time::Duration::from_millis(0)) and
.pool_max_idle_per_host(0) should be annotated and tracked: add a TODO comment
above these calls referencing a tracking issue (or create one) to investigate
and fix the staging-proxy root cause so pooling can be re-enabled for
performance; ensure the TODO mentions the specific builder chain calls
(.pool_idle_timeout and .pool_max_idle_per_host) and the symptom (reused
connections hang on consecutive POST requests through the staging proxy) so
future maintainers can find and revert the workaround safely.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4713d6a8-da0a-429a-8829-17a8e426d66f
📒 Files selected for processing (5)
src/openhuman/skills/qjs_skill_instance/event_loop/mod.rssrc/openhuman/skills/qjs_skill_instance/event_loop/rpc_handlers.rssrc/openhuman/skills/qjs_skill_instance/js_handlers.rssrc/openhuman/skills/quickjs_libs/bootstrap.jssrc/openhuman/skills/quickjs_libs/qjs_ops/ops_net.rs
| "skill/sync" => { | ||
| rpc_handlers::handle_sync( | ||
| let result = rpc_handlers::handle_sync( | ||
| rt, | ||
| ctx, | ||
| skill_id, | ||
| ops_state, | ||
| memory_client, | ||
| memory_write_tx, | ||
| ) | ||
| .await | ||
| .await; | ||
| if result.is_ok() { | ||
| *sync_in_flight = true; | ||
| } | ||
| result |
There was a problem hiding this comment.
Potential double memory persistence for synchronous onSync.
When handle_sync returns Ok with status "done" (synchronous completion), the RPC handler in rpc_handlers.rs already calls persist_state_to_memory. However, this code unconditionally sets sync_in_flight = true regardless of status.
Then at lines 492-517, the event loop polls globalThis.__syncInFlight === false. For the synchronous case, __syncInFlight was never set to true in JS, so this evaluates to true immediately (since undefined === false is false, but wait—let me re-check the logic).
Actually, undefined === false evaluates to false in JS, so this may not trigger. However, the Rust flag sync_in_flight is still set to true, and the loop will keep polling until it times out or the JS global is explicitly set.
Consider only setting sync_in_flight = true when status is "started":
🛠️ Proposed fix
let result = rpc_handlers::handle_sync(
rt,
ctx,
skill_id,
ops_state,
memory_client,
memory_write_tx,
)
.await;
- if result.is_ok() {
- *sync_in_flight = true;
+ // Only track in-flight state for async syncs
+ if let Ok(ref val) = result {
+ if val.get("status").and_then(|s| s.as_str()) == Some("started") {
+ *sync_in_flight = true;
+ }
}
result🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/skills/qjs_skill_instance/event_loop/mod.rs` around lines 773 -
786, The code unconditionally sets sync_in_flight = true after calling
rpc_handlers::handle_sync, causing a stale Rust-side flag for synchronous
completions; change the "skill/sync" arm so you only set *sync_in_flight = true
when the handler returns Ok with a status indicating the sync actually "started"
(not when it returned a synchronous "done" result). Concretely, inspect the
returned Result from rpc_handlers::handle_sync (the variable result) and set
sync_in_flight true only if result is Ok and the returned status equals
"started" (otherwise leave sync_in_flight false); preserve returning result
unchanged.
| console.log('[notion][sync] background sync completed successfully'); | ||
| }, | ||
| function(e) { | ||
| globalThis.__syncInFlight = false; | ||
| console.error('[notion][sync] background sync failed: ' + (e && e.message ? e.message : String(e))); |
There was a problem hiding this comment.
Hardcoded [notion][sync] log prefix should be generic.
The JS-injected console.log/error messages use [notion][sync] but this handler serves all skills. Use the skill_id or a generic prefix.
🔧 Proposed fix
result.then(
function() {
globalThis.__syncInFlight = false;
- console.log('[notion][sync] background sync completed successfully');
+ console.log('[skill][sync] background sync completed successfully');
},
function(e) {
globalThis.__syncInFlight = false;
- console.error('[notion][sync] background sync failed: ' + (e && e.message ? e.message : String(e)));
+ console.error('[skill][sync] background sync failed: ' + (e && e.message ? e.message : String(e)));
}
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| console.log('[notion][sync] background sync completed successfully'); | |
| }, | |
| function(e) { | |
| globalThis.__syncInFlight = false; | |
| console.error('[notion][sync] background sync failed: ' + (e && e.message ? e.message : String(e))); | |
| console.log('[skill][sync] background sync completed successfully'); | |
| }, | |
| function(e) { | |
| globalThis.__syncInFlight = false; | |
| console.error('[skill][sync] background sync failed: ' + (e && e.message ? e.message : String(e))); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/skills/qjs_skill_instance/event_loop/rpc_handlers.rs` around
lines 418 - 422, The injected JS in rpc_handlers.rs currently hardcodes the log
prefix "[notion][sync]" for both console.log and console.error; change those to
use a dynamic prefix (e.g. include the handler's skill_id or a generic
"[skill][sync]") by injecting or interpolating the skill identifier into the
string used in the success and failure callbacks (update the two occurrences of
console.log('[notion][sync]...') and console.error('[notion][sync]...') to use
something like `"[${skill_id || 'skill'}][sync]"`), ensuring the same dynamic
prefix is used when setting globalThis.__syncInFlight = false in the error path
so logs consistently identify the originating skill.
…ansai#422) * feat(sync): implement background sync handling in event loop - Introduced a mechanism to track background sync status with a new `sync_in_flight` flag. - Updated the event loop to check for sync completion and persist state to memory upon completion. - Modified the `handle_sync` function to fire the `onSync` handler in the JS runtime asynchronously, allowing for immediate return while the sync process runs in the background. - Enhanced logging to provide feedback on sync status and completion. * fix(net): disable connection pooling in HTTP client to prevent hanging on POST requests through staging proxy * Enhance promise handling in JS call processing - Added logging to track promise resolution progress and timeout events. - Implemented a polling mechanism to drive the QuickJS job queue until promises resolve or timeout occurs. - Introduced debug information logging for stalled promises to aid in troubleshooting. - Improved feedback during polling to indicate ongoing operations. This update aims to improve the reliability and debuggability of asynchronous JavaScript calls within the application. * Refactor JS fetch implementation for synchronous behavior - Updated the `fetch` function in the QuickJS library to operate synchronously from the JavaScript perspective, blocking the thread until the HTTP request completes. - Enhanced error handling and logging for HTTP requests, including timeouts and response details. - Removed the previous asynchronous implementation to streamline the fetch operation. - Cleaned up the code by removing unused comments and improving readability. This change aims to improve the consistency and reliability of network operations within the application. * Fix Rust formatting in ops_net.rs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
net.fetchsynchronous usingtokio::task::block_in_place+block_onto fix rquickjs async deadlockbootstrap.jsto remove all async/await from bridge APIsmodel.generate,model.summarize,backend.fetch,backend.submitDatahandle_js_callProblem
Asyncfunctions complete their tokio future, butrt.idle()can't deliver results back to JS promisesoauthCompleteto take 30s (timeout) and all tool calls to hangSolution
__ops.fetchfromAsyncto synchronous usingblock_in_place+block_onSubmission Checklist
Impact
Related
Summary by CodeRabbit
New Features
Bug Fixes
Refactor