fix(ipc): retry initialize() on transient failures #112

Merged: yishuiliunian merged 1 commit into main from fix/flaky-cluster-e2e-initialize on Apr 16, 2026
@yishuiliunian

Summary

  • AgentClient::initialize() now retries up to 5 times, with a 2s timeout per attempt and a linearly increasing backoff between attempts (100ms, 200ms, 300ms, 400ms)
  • Fixes flaky cluster_boots_two_hubs_with_agents and cluster_cross_hub_message_delivery e2e tests that fail on Ubuntu CI

Root cause

AgentProcess::spawn_with_env() returns immediately after creating the OS process, but the agent server may not be ready to handle JSON-RPC requests yet. On resource-constrained CI runners, initialize() sends the request before the server loop starts, causing the response oneshot channel to be dropped when the reader loop exits.
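The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not the project's code: it uses `std::sync::mpsc` as a stand-in for the oneshot channel, and the function name `await_response_after_reader_exit` is hypothetical. It shows that when the sending half is dropped because the reader loop exits, the waiting side gets a disconnect error immediately instead of a response.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a caller waiting on a JSON-RPC response.
/// The real client presumably uses an async oneshot channel; std's mpsc
/// channel is used here only to keep the sketch dependency-free.
fn await_response_after_reader_exit() -> Result<String, RecvTimeoutError> {
    let (tx, rx) = mpsc::channel::<String>();

    // Simulated reader loop: it exits before any response arrives,
    // which drops the sending half of the channel.
    let reader = thread::spawn(move || {
        drop(tx);
    });
    reader.join().expect("reader thread panicked");

    // With the sender gone and nothing queued, this returns
    // Err(Disconnected) right away rather than waiting out the timeout.
    rx.recv_timeout(Duration::from_secs(2))
}
```

Because the error surfaces immediately, a single call to initialize() fails fast in this situation, which is why a retry with a short delay is enough to ride out a slow server start.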

Changes

  • crates/loopal-agent-client/src/client.rs, initialize() method: add a per-attempt timeout and a retry loop with increasing backoff. All callers benefit automatically.
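As an illustration of the retry shape described in this change, here is a minimal synchronous sketch. It is not the code from client.rs: the constants, `backoff_delay`, `retry_with_backoff`, and `simulate_slow_start` are hypothetical names, the real method is presumably async, and the sketch assumes "up to 5 times" means five attempts in total, which matches the four delays listed in the summary.

```rust
use std::thread;
use std::time::Duration;

// Hypothetical constants mirroring the numbers in the PR summary.
const MAX_ATTEMPTS: u32 = 5;
const ATTEMPT_TIMEOUT: Duration = Duration::from_secs(2);
const BACKOFF_STEP: Duration = Duration::from_millis(100);

/// Delay applied after failed attempt `attempt` (1-based):
/// 100ms, 200ms, 300ms, 400ms.
fn backoff_delay(attempt: u32) -> Duration {
    BACKOFF_STEP * attempt
}

/// Calls `op` until it succeeds or MAX_ATTEMPTS is reached.
/// `op` receives the per-attempt timeout and is expected to honor it.
fn retry_with_backoff<T, E>(
    mut op: impl FnMut(Duration) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(ATTEMPT_TIMEOUT) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= MAX_ATTEMPTS => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(attempt));
                attempt += 1;
            }
        }
    }
}

/// Simulates a server that only answers from call number `ready_after` on.
/// Returns the final result and how many attempts were made.
fn simulate_slow_start(ready_after: u32) -> (Result<&'static str, &'static str>, u32) {
    let mut calls = 0;
    let result = retry_with_backoff(|_timeout| {
        calls += 1;
        if calls < ready_after {
            Err("server not ready")
        } else {
            Ok("initialized")
        }
    });
    (result, calls)
}
```

With `simulate_slow_start(3)` the third attempt succeeds after 300ms of total backoff; with a server that never becomes ready, the loop gives up after the fifth attempt and returns the last error.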

Test plan

  • CI passes (specifically loopal-meta-hub_e2e on Ubuntu)

AgentClient::initialize() now retries up to 5 times with increasing
backoff when the agent process is slow to start. Fixes flaky cluster
e2e tests on CI where the process isn't ready to handle JSON-RPC when
initialize() is called immediately after spawn.
yishuiliunian merged commit a778e3f into main on Apr 16, 2026
4 checks passed
yishuiliunian added a commit that referenced this pull request Apr 16, 2026
#112)

Two features:

1. `.mcp.json` config file support — industry-standard camelCase format
   (`mcpServers`) in `.loopal/` and plugin directories. Stdio type is
   implicit (no `type` field needed). Within a layer, `.mcp.json`
   overrides `settings.json` same-name servers.

2. MCP sub-page action menu — Enter on a selected server opens a
   Disconnect/Reconnect menu instead of directly reconnecting.
   Full disconnect pipeline: TUI → ControlCommand::McpDisconnect →
   Runtime → McpManager::disconnect_connection → ToolRegistry cleanup.
yishuiliunian added a commit that referenced this pull request Apr 16, 2026
#112) (#115)

Two features:

1. `.mcp.json` config file support — industry-standard camelCase format
   (`mcpServers`) in `.loopal/` and plugin directories. Stdio type is
   implicit (no `type` field needed). Within a layer, `.mcp.json`
   overrides `settings.json` same-name servers.

2. MCP sub-page action menu — Enter on a selected server opens a
   Disconnect/Reconnect menu instead of directly reconnecting.
   Full disconnect pipeline: TUI → ControlCommand::McpDisconnect →
   Runtime → McpManager::disconnect_connection → ToolRegistry cleanup.