fix(streaming-markdown): list-marker awareness + table chunk-boundary (issue #69)#99
Conversation
Three production Slack streaming bugs from issue #69 comment: 1. `_remend("* item one\n")` appended a stray `*`. `_close_emphasis` counted the line-leading bullet as an italic opener. Now skips single-`*` runs preceded by line-leading whitespace and followed by a space/tab (list-marker shape), matching upstream `remend`'s awareness of list markers. 2. `StreamingMarkdownRenderer.finish()` on an odd-count bullet list produced `...item three\n*` — same root cause as #1; the fix covers both. 3. `_get_committable_prefix` released a confirmed table the moment a separator arrived, even with zero body rows. Slack `chat.appendStream` saw header+separator alone as broken syntax and then bare body rows in subsequent appends, never reconnecting them as a single table. Now the header+separator block is held until at least one body row arrives, then released atomically. The original tests pinned the buggy "commit on separator" behavior; updated to require a body row before committing the table block, and added a TestIssue69Regressions class with the exact reproducers from the issue comment plus invariants for the new contract. Closes part of #69 (the streaming bugs). The broader hand-rolled parser question (Options A/B/C in the issue) remains open.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR fixes Issue ChangesStreaming Markdown Edge-Case Fixes (Issue
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces list-marker awareness to the streaming markdown renderer to prevent bullet points from being incorrectly treated as emphasis markers. It also updates the table rendering logic to buffer headers and separators until a body row is available, ensuring valid syntax for append-only consumers. A bug was identified in the list marker detection logic where it fails to correctly handle markers followed by newlines or at the end of a string, and a fix was suggested to align with the CommonMark specification.
| if after >= len(stripped): | ||
| return False | ||
| return stripped[after] in (" ", "\t") |
There was a problem hiding this comment.
According to the CommonMark spec (Section 5.2), a list marker can be followed by either horizontal whitespace (spaces/tabs) or a line ending (newline). Additionally, a marker at the very end of the document (or string) is also considered a valid list marker if it is at the start of a line.
The current implementation returns False if the marker is at the end of the string or followed by a newline, which causes _remend to incorrectly append a closing italic marker (e.g., *\n becomes **\n). Including \n and allowing the end-of-string case improves the accuracy of the repair and prevents flickering during streaming when a bullet is first typed.
| if after >= len(stripped): | |
| return False | |
| return stripped[after] in (" ", "\t") | |
| if after == len(stripped): | |
| return True | |
| return stripped[after] in (" ", "\t", "\n") |
The previous `_is_list_marker` skipped only line-leading single `*` followed by horizontal whitespace. Tighten to match upstream remend's `shouldSkipAsterisk` (packages/remend/src/emphasis-handlers.ts): exclude any single `*` flanked by whitespace (or text boundary) on both sides, which is what CommonMark says isn't a valid emphasis delimiter anyway. The line-leading bullet case still falls out naturally. Picks up three additional cases the previous narrower check missed: - `text * more` -- whitespace-flanked mid-line - `trailing *\n` -- asterisk at end of line - `partial *` -- asterisk at end of buffer Same `_remend` over-counting failure mode as the original issue #69 bugs, just different surface forms. Renamed the helper to `_is_excluded_asterisk` to reflect the broader scope. Remaining remend divergences (word-internal asterisks, math-block contents, escaped sequences, multi-backtick spans, setext headings, indented code, raw HTML, footnotes) are tracked on issue #69 for the parser-strategy discussion -- out of scope for this PR.
The spike concluded with Option A (PRs #99 + #101) being the right call for the SDK's current chat-only scope. Documenting the concrete triggers under which that conclusion should be re-examined: - Non-chat input source (RAG, user-authored memory, markdown-file ingestion, GitHub-body parsing -- anywhere humans write the input) - Long-form artifact output surface (citations + footnotes; math rendering; one-shot parse where latency is less critical) - Web rendering surface (port of upstream `@chat-adapter/web`) - New chat platform demanding richer parsing Plus a playbook for re-running the bake-off (author fixture, run parity + benchmark harness, compare silent-drop count / parse time) and a decision threshold for promoting markdown-it-py into runtime if a trigger materialises. Spot-checked upstream's `packages/` at the time of writing: only `adapter-web` is the relevant new package (added in v4.27, Python port deferred); no artifact / RAG / ingestion packages exist upstream. All triggers are forward-looking.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7eda0400d2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for i in range(separator_idx - 1, -1, -1): | ||
| trimmed = lines[i].strip() | ||
| if trimmed == "" or TABLE_ROW_RE.match(trimmed) is None: | ||
| break | ||
| table_start = i |
There was a problem hiding this comment.
Stop backward table hold at earlier separator lines
When separator_found and held_count == 0, the backward scan uses TABLE_ROW_RE to extend table_start, but separator lines also match TABLE_ROW_RE. In a stream like |A|B|\n|---|---|\n|1|2|\n|C|D|\n|---|---|\n, this makes table_start walk into the already-committed block and _get_committable_prefix collapses from the previous committed prefix to "", violating the append-only monotonic contract of get_committable_text(). Adapters that compute deltas from prior committed length can then emit incorrect/no deltas for that turn; the scan should stop when it encounters an earlier separator (or otherwise avoid capturing a previously confirmed table block).
Useful? React with 👍 / 👎.
PR #99 review (chatgpt-codex P1): in `_get_committable_prefix`, the "hold pre-separator block" backward walk uses `TABLE_ROW_RE` to extend `table_start`, but separator lines also match that pattern. For a stream like `|A|B|\n|---|---|\n|1|2|\n|C|D|\n|---|---|\n` (two tables with no blank line between them), the walk crosses the first table's separator and collapses `_get_committable_prefix` back to "", violating the monotonic append-only contract of `get_committable_text()`. Adapters that compute deltas from prior committed length then emit incorrect/no deltas. Fix: the backward walk now stops at empty lines (existing), non-row content (existing), AND at prior separators (new). When it hits a prior separator -- meaning the candidate "new header" row above it was already committed as a body row of the prior table -- the function falls back to `return text` instead of holding. That emits one "stray separator" on the rollback delta, which is broken markup but the lesser evil compared to non-monotonic rollback. The fix preserves the well-formed multi-table case where the second table is separated from the first by a blank line; the empty-line break in the walk fires before reaching the prior separator. Tests added: - `test_back_to_back_tables_keep_committable_monotonic` (the exact reviewer repro) - `test_second_table_after_blank_line_still_holds_header` (verifies the fix doesn't regress the blank-line-separated case) The outdated comment from gemini-code-assist on `_is_excluded_asterisk` suggesting we also include `\n` and end-of-string in the list-marker exclusion is already addressed in the second commit on this branch, which broadened the helper to match remend's `shouldSkipAsterisk` exactly. 79 streaming tests pass / 3598 total / 1 pre-existing unrelated failure.
|
@codex review Substantive changes since your last review:
Generated by Claude Code |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Closes the three production Slack streaming bugs called out in issue #69 comment 4514752058. Targets the 4.27 cycle as a Python-only divergence fix (no upstream port, no version bump — follows the same convention as #84–#90 where the consolidating sync PR owns the changelog entry).
The broader Option A/B/C decision for the hand-rolled parser stays open. This PR is the surgical fix for the streaming-side regressions only.
Three reproducers, all fixed
What changed
shared/streaming_markdown.py— two narrow changes:New
_is_excluded_asterisk(stripped, run_start, run_len, ch)helper._close_emphasisskips single-*runs flanked by whitespace (or text boundary) on both sides — covering line-leading bullet markers as well as any other non-delimiter asterisks per CommonMark's flanking rules. Mirrors upstreamremend'sshouldSkipAsteriskinvercel/streamdown/packages/remend/src/emphasis-handlers.ts. Only*is ambiguous;_/**/~~are unaffected._get_committable_prefixno longer treats "separator arrived" as "commit table." If the trailing block has a separator but zero body rows beneath it, the entire pre-separator table block is held — the next push that adds a body row releases header + separator + first row atomically.finish()still flushes everything as before.tests/test_streaming_markdown.py:TestIssue69Regressionsclass (15 tests) pinning all three reproducers plus invariants for the new contract (genuine italic/bold still close, bold inside bullet items balances, indented bullets recognized, mid-line / trailing / end-of-buffer flank cases, header+separator-only flushes onfinish()).Test plan
uv run pytest tests/test_streaming_markdown.py— 79/79 passuv run pytest tests/— 3596 pass, 11 skipped, 1 pre-existing failure (test_github_webhook.py::TestGitHubAdapterConstructor::test_throws_when_no_auth, unrelated — same failure documented in sync: upstream v4.27.0 (alpha 1) #83's test plan)uv run ruff check src/ tests/ scripts/— passesuv run ruff format --check src/ tests/ scripts/— 193 files already formatteduv run python scripts/audit_test_quality.py— 0 hard failures, 39 pre-existing warningsScope notes
Summary by CodeRabbit
Bug Fixes
* item) from being misinterpreted as inline emphasis.Tests