Skip to content

Feature/scalable agent loop#4

Open
tuhinkanti wants to merge 11 commits intomainfrom
feature/scalable-agent-loop
Open

Feature/scalable agent loop#4
tuhinkanti wants to merge 11 commits intomainfrom
feature/scalable-agent-loop

Conversation

@tuhinkanti
Copy link
Copy Markdown
Owner

@tuhinkanti tuhinkanti commented Mar 1, 2026

Tuhin Sharma and others added 3 commits March 1, 2026 14:36
- Fix response body double-read bug in AnthropicProvider
- Add exponential backoff retry for 429/5xx errors
- Config-driven timeouts, maxTokens, and max iterations
- Parallel tool execution using virtual threads
- Error isolation: single tool failure no longer crashes the loop
- Context window truncation (180K token budget, chars/4 estimate)
- Ralph Wiggum self-correcting outer loop (executeRalph)
- LlmProviderFactory for multi-provider abstraction
- Async gateway dispatch via virtual thread executor
- Session recovery from JSONL on startup with TTL eviction
- Console channel supports /ralph prefix and global ralph mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 new potential issues.

View 15 additional findings in Devin Review.

Open in Devin Review

Comment on lines +170 to +173
if ("assistant_tool_use".equals(firstRole) || "tool_result".equals(firstRole)) {
while (!truncated.isEmpty() && !"user".equals(truncated.get(0).getRole())) {
truncated.remove(0);
}
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Context truncation inserts a 'user' notice that can create consecutive user messages, violating Anthropic API alternation requirement

When context truncation occurs, a synthetic user message is prepended to the truncated messages. If the first retained message also has role user, this creates two consecutive user messages, violating the Anthropic API requirement that messages must alternate between user and assistant roles.

Root Cause

At AgentExecutor.java:171-173:

Message notice = new Message("user",
    "[System: " + dropped + " earlier messages were truncated ...]");
truncated.add(0, notice);

The truncation works by keeping the N most recent messages that fit the token budget. In a normal conversation, the most recent messages start from the end. If the first kept message is a user message (which is common — e.g., the user just sent a prompt), prepending another user message creates two consecutive user messages.

After adding the system message at line 177, the final context looks like: [system, user(notice), user(original), ...]. The Anthropic API will reject this with a 400 error because it requires strict role alternation.

Impact: Context truncation in many common scenarios will cause the next API call to fail.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration[bot]

This comment was marked as resolved.

tuhinkanti and others added 2 commits March 2, 2026 18:09
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

View 17 additional findings in Devin Review.

Open in Devin Review

done
done
fi
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Extra fi in entrypoint.sh causes shell syntax error, preventing Docker container startup

The entrypoint.sh script has an unmatched fi on line 27, which causes a bash syntax error that prevents the Docker container from starting.

Root Cause

The shell structure is:

  • Line 7: if [ -d /certs ] && ....; then
  • Line 8: for bundle in /certs/*.crt; do
  • Line 14: for pem in /tmp/cert-*.pem; do
  • Line 24: done (closes inner for pem)
  • Line 25: done (closes outer for bundle)
  • Line 26: fi (closes the if from line 7)
  • Line 27: fiextra/unmatched, causes syntax error

Verified with bash -n entrypoint.sh:

entrypoint.sh: line 27: syntax error near unexpected token `fi'

Impact: The Docker container will fail to start entirely because set -e is enabled and the entrypoint script cannot be parsed. This blocks all Docker-based deployments.

Suggested change
fi
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

if (result == null) {
sendError(conn, request.getId(), -32601, "Method not found: " + request.getMethod());
return;
agentExecutor.submit(() -> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Concurrent session access via non-thread-safe ArrayList after gateway dispatches to virtual threads

The GatewayServer.onMessage now dispatches requests to virtual threads via agentExecutor.submit(), but Session.messages is a plain ArrayList with no synchronization, enabling concurrent modification.

Root Cause

Before this PR, GatewayServer.onMessage() processed requests synchronously. Now at src/main/java/ai/openclaw/gateway/GatewayServer.java:97, it dispatches to a virtual thread pool:

agentExecutor.submit(() -> { ... });

This means two WebSocket messages for the same session can run agentExecutor.execute(sessionId, message) concurrently. Inside, execute() calls sessionStore.appendMessage() which calls session.addMessage() at src/main/java/ai/openclaw/session/Session.java:76-78:

public void addMessage(Message message) {
    this.messages.add(message); // ArrayList - not thread-safe
    this.lastActiveAt = Instant.now();
}

Meanwhile, buildContext() at src/main/java/ai/openclaw/agent/AgentExecutor.java:154 iterates over session.getMessages() which returns the same ArrayList reference. Concurrent reads and writes to ArrayList can cause ConcurrentModificationException, ArrayIndexOutOfBoundsException, or silent data corruption.

Impact: Under concurrent requests for the same session (now possible via WebSocket virtual thread dispatch or simultaneous Slack + WebSocket requests), the session's message list can be corrupted, causing crashes or lost messages.

Prompt for agents
The Session.messages field (src/main/java/ai/openclaw/session/Session.java) is a plain ArrayList that is now accessed concurrently due to the GatewayServer dispatching to virtual threads. Either:
1. Change Session.messages to a thread-safe list (e.g., Collections.synchronizedList or CopyOnWriteArrayList) in Session.java, and synchronize iteration in AgentExecutor.buildContext()
2. Or add session-level locking in AgentExecutor.execute() to serialize access per session ID

Both Session.java (constructor and setMessages) and AgentExecutor.java (buildContext iterating session.getMessages()) need to be updated.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant