fix(channels): preserve overflow segments instead of dropping them (#1041)#1051
Conversation
…inyhumansai#1041) `segment_for_delivery` capped delivered segments at MAX_SEGMENTS=5 via `.take(MAX_SEGMENTS)`, silently discarding every paragraph past the fifth. Long agent replies (e.g. drafts with bullet lists + several trailing paragraphs) lost roughly half their content before reaching the UI — exactly the symptom reported in tinyhumansai#1041 (UI shows the first 5 bubbles, bullets and tail prose are gone, message ends mid-thought). New `cap_segments(segments, max, joiner)` helper keeps the first `max - 1` segments intact and concatenates any overflow into the final segment with the appropriate joiner (`\n\n` for paragraph strategy, `" "` for sentence strategy). Bounded number of bubbles preserved; zero content loss. Tests: `issue_1041_transcript_preserves_bullets_and_trailing_paragraphs` runs the exact agent reply text from the issue's transcript screenshot and asserts every bullet + every trailing paragraph survives delivery. The previously-passing `max_segments_respected` test was renamed to `max_segments_respected_without_dropping_content` and now also asserts that all 10 input paragraphs appear in the joined output — the assertion that would have caught this bug originally. Closes tinyhumansai#1041 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe presentation segmentation now uses a Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/openhuman/channels/providers/presentation_tests.rs (1)
75-80: ⚡ Quick winStrengthen the no-drop assertion to check full paragraph text, not just markers.
Current checks can pass even if paragraph bodies are partially truncated. Using each original
paras[i]as the assertion target gives tighter regression protection.Proposed patch
- for i in 0..10 { - assert!( - joined.contains(&format!("Paragraph number {} ", i)), - "paragraph {} missing from delivered segments", - i - ); - } + for (i, para) in paras.iter().enumerate() { + assert!(joined.contains(para), "paragraph {} missing from delivered segments", i); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/channels/providers/presentation_tests.rs` around lines 75 - 80, The test currently checks only for the paragraph marker via joined.contains(&format!("Paragraph number {} ", i)), which can miss truncated bodies; update the assertion inside the for loop (for i in 0..10) to assert that the concatenated result named joined contains the full original paragraph string paras[i] (i.e., use joined.contains(¶s[i])) and keep the failure message referencing i so missing or truncated paragraphs fail the test.src/openhuman/channels/providers/presentation.rs (1)
261-268: ⚡ Quick winAdd a debug log when overflow is merged into the tail segment.
This branch materially changes delivery shape and is worth logging for quick field debugging/regression triage.
As per coding guidelines, "Log entry/exit, branches ... with stable grep-friendly prefixes (`[domain]`, `[rpc]`, `[ui-flow]`) and correlation fields."Proposed patch
fn cap_segments(segments: Vec<String>, max: usize, joiner: &str) -> Vec<String> { if max == 0 || segments.len() <= max { return segments; } + let original = segments.len(); let mut iter = segments.into_iter(); let mut result: Vec<String> = (&mut iter).take(max - 1).collect(); let tail: Vec<String> = iter.collect(); result.push(tail.join(joiner)); + tracing::debug!( + original_segments = original, + delivered_segments = result.len(), + max_segments = max, + "[presentation:segment] merged overflow into final segment" + ); result }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/channels/providers/presentation.rs` around lines 261 - 268, When collapsing overflow segments into a tail (the else branch that runs when max != 0 && segments.len() > max), add a debug log before pushing the joined tail that uses the project's logging macro (e.g., tracing::debug! or log::debug!) with a stable grep-friendly prefix like "[presentation]" and include correlation fields: max, original_len = segments.len() before consumption, tail_len, and the joiner value; place this log immediately before the call to result.push(tail.join(joiner)) so the merge event is recorded for debugging/regression triage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/openhuman/channels/providers/presentation_tests.rs`:
- Around line 75-80: The test currently checks only for the paragraph marker via
joined.contains(&format!("Paragraph number {} ", i)), which can miss truncated
bodies; update the assertion inside the for loop (for i in 0..10) to assert that
the concatenated result named joined contains the full original paragraph string
paras[i] (i.e., use joined.contains(¶s[i])) and keep the failure message
referencing i so missing or truncated paragraphs fail the test.
In `@src/openhuman/channels/providers/presentation.rs`:
- Around line 261-268: When collapsing overflow segments into a tail (the else
branch that runs when max != 0 && segments.len() > max), add a debug log before
pushing the joined tail that uses the project's logging macro (e.g.,
tracing::debug! or log::debug!) with a stable grep-friendly prefix like
"[presentation]" and include correlation fields: max, original_len =
segments.len() before consumption, tail_len, and the joiner value; place this
log immediately before the call to result.push(tail.join(joiner)) so the merge
event is recorded for debugging/regression triage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 20cb64e8-fd38-4dfa-8b01-f296b7a05186
📒 Files selected for processing (2)
src/openhuman/channels/providers/presentation.rssrc/openhuman/channels/providers/presentation_tests.rs
…ntation fix
- presentation.rs: emit a `tracing::debug!` log when `cap_segments`
merges overflow into the tail, with `[presentation:segment]` prefix
and correlation fields (max, original_len, tail_count, tail_len,
joiner_len). Aligns with the project's "verbose diagnostics on new
flows" rule from CLAUDE.md.
- presentation_tests.rs: tighten the
`max_segments_respected_without_dropping_content` assertion from
`joined.contains(&format!("Paragraph number {} ", i))` to
`joined.contains(¶s[i])` so a hypothetical mid-paragraph
truncation cannot slip past the test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/openhuman/channels/providers/presentation_tests.rs (1)
63-75: ⚡ Quick winAlign new test local variable names with camelCase guideline.
New locals like
result,joined,tail_count-style patterns are in snake_case; this path requires camelCase naming for variables.As per coding guidelines:
src/openhuman/**/*.rs: Use camelCase for variable names.Also applies to: 88-115, 123-143
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/channels/providers/presentation_tests.rs` around lines 63 - 75, The test in presentation_tests.rs uses snake_case locals (paras, result, joined, tail_count patterns) but the project requires camelCase for locals under src/openhuman; rename these local variables to camelCase (e.g., paras -> paragraphs or paragraphList, result -> resultSegments, joined -> joinedText, tail_count -> tailCount) and update all usages in the test including assertions that reference MAX_SEGMENTS and the call to segment_for_delivery so the code compiles; apply the same camelCase renames to the other test blocks referenced (the locals in the blocks around the other assertions currently flagged).src/openhuman/channels/providers/presentation.rs (1)
264-277: ⚡ Quick winUse camelCase for newly introduced local variables in
cap_segments.
original_len,tail_count, andjoiner_lendon’t match the repository naming rule for this path.Proposed rename
- let original_len = segments.len(); + let originalLen = segments.len(); @@ - let tail_count = tail.len(); + let tailCount = tail.len(); @@ target: "presentation", max, - original_len, - tail_count, + originalLen, + tailCount, tail_len = merged.len(), - joiner_len = joiner.len(), + joinerLen = joiner.len(), "[presentation:segment] merging {} overflow segments into tail", - tail_count + tailCountAs per coding guidelines:
src/openhuman/**/*.rs: Use camelCase for variable names.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/channels/providers/presentation.rs` around lines 264 - 277, The local variables in cap_segments use snake_case; rename them to camelCase (e.g., originalLen, tailCount, tailLen or joinerLen as appropriate) and update all their usages (including the tracing::debug! invocation and any assignments like tail_len = merged.len()) to the new camelCase names; ensure the variables original_len, tail_count, tail_len, and joiner_len are replaced consistently within the cap_segments function and the debug macro.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/openhuman/channels/providers/presentation_tests.rs`:
- Around line 63-75: The test in presentation_tests.rs uses snake_case locals
(paras, result, joined, tail_count patterns) but the project requires camelCase
for locals under src/openhuman; rename these local variables to camelCase (e.g.,
paras -> paragraphs or paragraphList, result -> resultSegments, joined ->
joinedText, tail_count -> tailCount) and update all usages in the test including
assertions that reference MAX_SEGMENTS and the call to segment_for_delivery so
the code compiles; apply the same camelCase renames to the other test blocks
referenced (the locals in the blocks around the other assertions currently
flagged).
In `@src/openhuman/channels/providers/presentation.rs`:
- Around line 264-277: The local variables in cap_segments use snake_case;
rename them to camelCase (e.g., originalLen, tailCount, tailLen or joinerLen as
appropriate) and update all their usages (including the tracing::debug!
invocation and any assignments like tail_len = merged.len()) to the new
camelCase names; ensure the variables original_len, tail_count, tail_len, and
joiner_len are replaced consistently within the cap_segments function and the
debug macro.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc00f923-36b9-4e27-8428-021718af779d
📒 Files selected for processing (2)
src/openhuman/channels/providers/presentation.rssrc/openhuman/channels/providers/presentation_tests.rs
graycyrus
left a comment
There was a problem hiding this comment.
Clean fix. The cap_segments helper is the right shape — bounded bubble count, no content loss, well-documented. The regression test against the actual transcript from the issue is a nice touch; it exercises exactly the code path that was broken rather than just the helper in isolation. Left two minor inline comments on the debug logging, neither blocking.
| @@ -187,7 +187,7 @@ fn segment_for_delivery(text: &str) -> Vec<String> { | |||
| segments = merged.len(), | |||
| "[presentation:segment] split by paragraphs" | |||
There was a problem hiding this comment.
[minor] segments = merged.len() logs the pre-cap count, so when cap_segments actually fires you'd see segments=10 followed by merging 5 overflow segments into tail — the sequence tells the story, but the first log reads like 10 segments were delivered. Renaming the field to segments_before_cap would make the two lines unambiguous when tailing logs.
| original_len, | ||
| tail_count, | ||
| tail_len = merged.len(), | ||
| joiner_len = joiner.len(), |
There was a problem hiding this comment.
[minor] joiner_len = joiner.len() is always 2 ("\n\n") or 1 (" "), both knowable from the two call sites. It does not help distinguish anything at runtime — you already have target: "presentation" and the message string identifying the path. Fine to keep for completeness, just noting it does not carry diagnostic weight.
graycyrus
left a comment
There was a problem hiding this comment.
Two minor items to address before merge — nothing structural, just tightening up the debug logging:
-
segmentslog field ambiguity — thetracing::debug!on the line just beforecap_segmentslogssegments = merged.len()which is the pre-cap count. Whencap_segmentsthen logs its own merge message, a reader sees two different segment counts in the span. Rename tosegments_before_capor similar to make the sequence unambiguous. -
joiner_lenincap_segmentsdebug log — this is always 2 ("\n\n") or 1 (" "), fully deterministic from the two call sites. It doesn't carry diagnostic weight. Consider dropping it or replacing with the joiner string itself (joiner = %joiner) which is more immediately readable.
Both are quick fixes. The logic itself is solid — nice work on the regression test with the exact transcript.
…inyhumansai#1041) (tinyhumansai#1051) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
segment_for_delivery(src/openhuman/channels/providers/presentation.rs):.take(MAX_SEGMENTS)silently dropped every paragraph past the 5th cap.cap_segments(segments, max, joiner)helper preserves all content by merging overflow into the final segment instead of discarding it.MAX_SEGMENTS = 5cap retained — bubble count is still bounded; no inter-bubble delay regression.Problem
The web channel applies a "human-feeling" presentation layer that splits long replies into multiple chat bubbles. The split was implemented as
paragraphs.into_iter().take(MAX_SEGMENTS).collect()(and the same pattern on the sentence-fallback path).Once a reply produced more than
MAX_SEGMENTS = 5natural paragraphs, every paragraph past the cap was silently dropped before the segments were ever published to the frontend. The finalchat_doneevent still carriedfull_responsewith the complete text, but the frontend'sonDonehandler inChatRuntimeProviderskips the full-response branch whensegment_total > 0(it trusts segments already delivered everything) — so the truncation was unrecoverable on the client side.The reporter's screenshots show this exactly: an 11-paragraph draft email rendered as the first 5 bubbles, then
2m agoand end of message. Bullets and the entire sign-off were gone.Solution
Replace
.take(MAX_SEGMENTS)with acap_segmentshelper that:max - 1segments as-is and concatenates the remaining tail into a single trailing segment with the appropriate joiner (\n\nfor paragraph strategy," "for sentence strategy).max == 0as a no-op rather than panicking.Applied at both call sites in
segment_for_delivery. TheMAX_SEGMENTSconstant and the inter-bubble delay model (segment_delay) are unchanged — bubble count is still bounded, the user just no longer loses content past the cap.The previously-passing test
max_segments_respectedwas renamed tomax_segments_respected_without_dropping_contentand now also asserts that every input paragraph appears in the joined output — the assertion that would have caught this bug originally.Submission Checklist
docs/TESTING-STRATEGY.md— 4 new tests + 1 enhanced existing test inpresentation_tests.rsCloses #1041belowImpact
Related
ChatRuntimeProvider.onDonefallback (usingevent.full_responsewhensegment_total > 0but fewer segments arrived than claimed) was considered as defense-in-depth but isn't needed once the backend is correct.🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests