fix(sessions): preserve terminal status after agent run completes#60290
fix(sessions): preserve terminal status after agent run completes#60290jwchmodx wants to merge 5 commits intoopenclaw:mainfrom
Conversation
…ostReplyRootId Direct messages in Mattermost were creating threads even with replyToMode=off because resolveMattermostReplyRootId would fall back to payload.replyToId regardless of chat kind. When a downstream payload carried a replyToId (e.g. from block-streaming delivery), this bypassed the earlier DM threading guard and set root_id on the outbound post, making DM replies appear as threads instead of in the channel body. Fix: pass kind through all three delivery call sites and hard-return undefined inside resolveMattermostReplyRootId for kind="direct", mirroring the resolveMattermostEffectiveReplyToId guard that already existed for session-key resolution. Fixes openclaw#59981
…hreshold The generic_repeat detector only checked warningThreshold and always returned level "warning", making criticalThreshold effectively a no-op for the most common runaway loop pattern (same tool + same args). Add a critical-threshold check before the warning check, consistent with how known_poll_no_progress and ping_pong already behave. Fixes openclaw#60111 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…aining-only fields MiniMax's usage_percent / usagePercent fields report the *remaining* quota as a percentage, not the consumed quota. When count fields (prompt_limit / prompt_remain) are also present, fromCounts already computed the correct usedPercent and the inverted value was silently ignored. But when only usage_percent is returned (no count fields), the code treated it as a used-percent and passed it through unchanged, causing the menu bar to show "2% left" instead of "98% left". Move usage_percent and usagePercent from PERCENT_KEYS to a new REMAINING_PERCENT_KEYS array. deriveUsedPercent now inverts remaining-percent values to obtain usedPercent, matching the behaviour already validated by the existing "prefers count-based usage when percent looks inverted" test. Count- based fromCounts still takes priority over both key groups. Fixes openclaw#60193 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MiMo reasoning models (mimo-v2-pro, mimo-v2-omni) output their full response to reasoning_content with an empty content field. This causes OpenClaw to emit no visible text since the pi-ai stream emits reasoning_content deltas as thinking blocks, not as main content. Setting enable_thinking: false in the request payload directs the model to write its reply to the standard content field instead. Closes openclaw#60261 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
persistGatewaySessionLifecycleEvent writes the terminal status to disk as a fire-and-forget write. updateSessionStoreAfterAgentRun runs after it, but reads the in-memory sessionStore (loaded before the run) which still carries status: "running". The stale status was being spread into the merge patch, clobbering the "done"/"failed"/"killed" status on disk. Result: sessions could remain stuck as "running" even though endedAt and runtimeMs were correctly set, blocking new inbound messages and stop/abort commands until the store row was edited manually. Fix: omit the status field from the patch in updateSessionStoreAfterAgentRun so that the lifecycle-managed terminal status on disk is always preserved. Closes openclaw#60250 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Greptile SummaryThis PR bundles five independent fixes: the primary fix strips Confidence Score: 5/5Safe to merge — all five fixes are targeted, well-tested, and carry no P0/P1 issues. All remaining findings are P2 or lower. The primary session-status fix is logically correct and covered by a regression test. The four accompanying fixes (Mattermost DM threading, tool-loop critical escalation, MiniMax percent inversion, Xiaomi reasoning-model payload) are each independently sound and tested. No data integrity, security, or reliability concerns were found. No files require special attention. Reviews (1): Last reviewed commit: "fix(sessions): preserve terminal status ..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c230d1d0e8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // MiniMax's usage_percent / usagePercent fields report the remaining quota | ||
| // as a percentage, not the consumed quota. Treat them as "remaining percent" | ||
| // and invert to get usedPercent. Count-based fromCounts always takes priority. | ||
| const REMAINING_PERCENT_KEYS = ["usage_percent", "usagePercent"] as const; |
There was a problem hiding this comment.
Score remaining-percent keys when selecting MiniMax usage records
REMAINING_PERCENT_KEYS is introduced for usage_percent, but scoreUsageRecord still only awards score for PERCENT_KEYS. As a result, nested usage records that only expose usage_percent/usagePercent (and no count fields) can be filtered out of collectUsageCandidates, causing fetchMinimaxUsage to return Unsupported response shape even though valid usage data exists. This regression appears in payloads where the remaining-percent field is present only in nested objects.
Useful? React with 👍 / 👎.
Problem
After a run completes, a session can remain persisted with
status: "running"even thoughendedAtandruntimeMsare already set. This wedges the session: new inbound messages are not accepted and stop/abort commands do not work until the store row is manually edited.Root cause
persistGatewaySessionLifecycleEventis called withvoid(fire-and-forget) in the agent event listener. It enqueues a write to set the terminal status on disk. Immediately after,updateSessionStoreAfterAgentRunenqueues its own write.The lock queue runs them in order:
{ status: "done", endedAt: X, runtimeMs: Y }✅updateSessionStoreAfterAgentRun: reads disk (has"done"), but patches from the in-memorysessionStorewhich was loaded during session initialisation — before the run — and still carriesstatus: "running".mergeSessionEntry(diskState, next)spreadsnext.status = "running"over the disk's"done", leavingendedAt/runtimeMsintact (they were never innext).Final persisted state:
{ status: "running", endedAt: X, runtimeMs: Y }— the exact stuck pattern described in the issue.Fix
Omit
statusfrom the patch passed tomergeSessionEntryinsideupdateSessionStoreAfterAgentRun. This function is responsible for token/cost/model fields only; lifecycle status is exclusively owned bypersistGatewaySessionLifecycleEvent.Test
Added a regression test in
src/commands/agent/session-store.test.ts:{ status: "done", endedAt: X, runtimeMs: 1234 }(as if lifecycle end already ran){ status: "running" }(stale)updateSessionStoreAfterAgentRun, assertspersisted.status === "done"andendedAt/runtimeMspreservedCloses #60250
🤖 Generated with Claude Code