
fix(streaming-markdown): list-marker awareness + table chunk-boundary (issue #69) #99

Merged
patrick-chinchill merged 3 commits into main from claude/wizardly-heisenberg-VLInV on May 28, 2026


patrick-chinchill (Collaborator) commented May 22, 2026

Summary

Closes the three production Slack streaming bugs called out in issue #69 comment 4514752058. Targets the 4.27 cycle as a Python-only divergence fix (no upstream port, no version bump — follows the same convention as #84#90 where the consolidating sync PR owns the changelog entry).

The broader Option A/B/C decision for the hand-rolled parser stays open. This PR is the surgical fix for the streaming-side regressions only.

Three reproducers, all fixed

from chat_sdk.shared.streaming_markdown import StreamingMarkdownRenderer, _remend

# Bug 1: _remend on single bullet -- previously: '* item one\n*'
assert _remend("* item one\n") == "* item one\n"

# Bug 2: odd-count bullets at finish() -- previously: '...\n* item three\n*'
r = StreamingMarkdownRenderer(wrap_tables_for_append=False)
r.push("* item one\n* item two\n* item three\n")
assert r.finish() == "* item one\n* item two\n* item three\n"

# Bug 3: table header+separator alone was emitted in one delta with zero body rows,
# breaking Slack chat.appendStream's incremental parser. Now held until the first
# body row arrives, then released atomically.
r = StreamingMarkdownRenderer(wrap_tables_for_append=False)
for chunk in ["Header:\n\n", "| ID", " | Status |\n", "|---|---|\n",
              "| 1 | Open |\n", "| 2 | Closed |\n"]:
    r.push(chunk)
# Deltas: ["Header:\n\n", "", "", "",
#          "| ID | Status |\n|---|---|\n| 1 | Open |\n",
#          "| 2 | Closed |\n"]

What changed

shared/streaming_markdown.py — two narrow changes:

  1. New _is_excluded_asterisk(stripped, run_start, run_len, ch) helper. _close_emphasis skips single-* runs flanked by whitespace (or text boundary) on both sides — covering line-leading bullet markers as well as any other non-delimiter asterisks per CommonMark's flanking rules. Mirrors upstream remend's shouldSkipAsterisk in vercel/streamdown/packages/remend/src/emphasis-handlers.ts. Only * is ambiguous; _/**/~~ are unaffected.

  2. _get_committable_prefix no longer treats "separator arrived" as "commit table." If the trailing table block has a separator but zero body rows beneath it, the whole block (header rows plus separator) is held; the next push that adds a body row releases header + separator + first row atomically. finish() still flushes everything as before.
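A minimal sketch of the flanking rule in change 1, assuming a simplified model that only balances single-`*` emphasis (bold, `_` and `~~` are ignored). `is_excluded_asterisk` and `close_single_asterisk_emphasis` are hypothetical names for illustration; they are not the `_is_excluded_asterisk` / `_close_emphasis` code in `streaming_markdown.py`.

```python
import re

# Any run of one or more asterisks; only single-asterisk runs are balanced here.
ASTERISK_RUN = re.compile(r"\*+")


def is_excluded_asterisk(text: str, run_start: int, run_len: int) -> bool:
    """True for a single '*' with whitespace (or a text boundary) on both sides.

    Such a run cannot open or close emphasis, so it must not be counted.
    This covers line-leading bullets ("* item") as well as "text * more".
    """
    if run_len != 1:
        return False
    # Text boundaries are treated like whitespace.
    before = text[run_start - 1] if run_start > 0 else " "
    after_idx = run_start + run_len
    after = text[after_idx] if after_idx < len(text) else " "
    return before.isspace() and after.isspace()


def close_single_asterisk_emphasis(text: str) -> str:
    """Append a closing '*' only if an odd number of real single-'*' delimiters remain."""
    count = 0
    for match in ASTERISK_RUN.finditer(text):
        run_len = match.end() - match.start()
        if run_len == 1 and not is_excluded_asterisk(text, match.start(), run_len):
            count += 1
    return text + "*" if count % 2 else text
```

With this rule, `"* item one\n"` is left unchanged, while `"some *italic"` still gets its closing `*`.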

tests/test_streaming_markdown.py:

  • New TestIssue69Regressions class (15 tests) pinning all three reproducers plus invariants for the new contract (genuine italic/bold still close, bold inside bullet items balances, indented bullets recognized, mid-line / trailing / end-of-buffer flank cases, header+separator-only flushes on finish()).
  • Updated 6 existing tests that codified the buggy "commit on separator with zero rows" behavior: they now push a body row before asserting the table is committed.

Test plan

  • uv run pytest tests/test_streaming_markdown.py — 79/79 pass
  • uv run pytest tests/ — 3596 pass, 11 skipped, 1 pre-existing failure (test_github_webhook.py::TestGitHubAdapterConstructor::test_throws_when_no_auth, unrelated — same failure documented in sync: upstream v4.27.0 (alpha 1) #83's test plan)
  • uv run ruff check src/ tests/ scripts/ — passes
  • uv run ruff format --check src/ tests/ scripts/ — 193 files already formatted
  • uv run python scripts/audit_test_quality.py — 0 hard failures, 39 pre-existing warnings

Scope notes

Summary by CodeRabbit

  • Bug Fixes

    • Prevented single-asterisk runs (including list-bullet style * item) from being misinterpreted as inline emphasis.
    • Improved append-only streaming table handling: header+separator are held until a body row arrives; separator-only pushes do not emit; first body row releases the confirmed table atomically; incomplete held tables flush on finish.
  • Tests

    • Added/expanded regression tests covering emphasis edge cases, table confirm/hold/release behavior, flush-on-finish, and multi-table scenarios.


Three production Slack streaming bugs from issue #69 comment:

1. `_remend("* item one\n")` appended a stray `*`. `_close_emphasis`
   counted the line-leading bullet as an italic opener. Now skips
   single-`*` runs preceded by line-leading whitespace and followed by
   a space/tab (list-marker shape), matching upstream `remend`'s
   awareness of list markers.

2. `StreamingMarkdownRenderer.finish()` on an odd-count bullet list
   produced `...item three\n*` — same root cause as #1; the fix covers
   both.

3. `_get_committable_prefix` released a confirmed table the moment a
   separator arrived, even with zero body rows. Slack
   `chat.appendStream` saw header+separator alone as broken syntax and
   then bare body rows in subsequent appends, never reconnecting them
   as a single table. Now the header+separator block is held until at
   least one body row arrives, then released atomically.
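The hold rule in item 3 can be sketched as a pure function over the buffered text. This is a simplified, hypothetical model: the `TABLE_ROW_RE` / `SEPARATOR_RE` patterns and the `committable_prefix` / `deltas_for` names are assumptions, and it ignores `finish()` flushing and `wrap_tables_for_append`. Only complete lines are committable, and a trailing block of table rows is held until both a separator and at least one body row beneath it have arrived.

```python
import re

# Hypothetical patterns; the real ones in streaming_markdown.py may differ.
TABLE_ROW_RE = re.compile(r"^\|.*\|$")
SEPARATOR_RE = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")


def committable_prefix(text: str) -> str:
    """Return the prefix that is safe to hand to an append-only consumer."""
    end = text.rfind("\n") + 1           # only complete lines can be committed
    lines = text[:end].split("\n")[:-1]  # drop the empty piece after the final "\n"
    start = len(lines)
    while start > 0 and TABLE_ROW_RE.match(lines[start - 1].strip()):
        start -= 1                       # walk back over the trailing table-row block
    block = [line.strip() for line in lines[start:]]
    separators = [k for k, row in enumerate(block) if SEPARATOR_RE.match(row)]
    # Confirmed only if a separator exists AND at least one row follows it.
    confirmed = bool(separators) and separators[0] < len(block) - 1
    if block and not confirmed:
        # Unconfirmed header, or header + separator with zero body rows: hold it all.
        return "".join(line + "\n" for line in lines[:start])
    return text[:end]


def deltas_for(chunks: list[str]) -> list[str]:
    """Simulate pushes: each delta is the newly committable suffix."""
    buffered, committed, deltas = "", "", []
    for chunk in chunks:
        buffered += chunk
        current = committable_prefix(buffered)
        deltas.append(current[len(committed):])
        committed = current
    return deltas
```

Feeding the Bug 3 chunks from the reproducer through `deltas_for` yields the same six deltas listed there: three empty deltas while the header and separator are held, then header + separator + first row in one delta.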

The original tests pinned the buggy "commit on separator" behavior;
updated to require a body row before committing the table block, and
added a TestIssue69Regressions class with the exact reproducers from
the issue comment plus invariants for the new contract.

Closes part of #69 (the streaming bugs). The broader hand-rolled
parser question (Options A/B/C in the issue) remains open.

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 7eda040 and 0bb0353.

📒 Files selected for processing (2)
  • src/chat_sdk/shared/streaming_markdown.py
  • tests/test_streaming_markdown.py

Walkthrough

This PR fixes Issue #69 by enhancing streaming markdown repair logic to correctly handle CommonMark emphasis edge-cases (preventing list bullets from triggering italic markers) and refining table header buffering to emit complete table blocks atomically only after both separator and body row arrive in append-only streaming.

Changes

Streaming Markdown Edge-Case Fixes (Issue #69)

  • Single-asterisk emphasis exclusion for list bullets (src/chat_sdk/shared/streaming_markdown.py): New private helper identifies whitespace-flanked single * runs (including * item patterns) as excluded from emphasis delimiter counting. _close_emphasis documentation and marker-run scanning are updated to skip those excluded runs during unbalanced emphasis balancing, preventing incorrect repairs.
  • Table separator buffering for append-only streaming (src/chat_sdk/shared/streaming_markdown.py, tests/test_streaming_markdown.py): _get_committable_prefix tracks the separator line index and withholds the entire table block when a separator exists but no body row has arrived, ensuring atomic emission only after the first body row. Tests verify that the separator-only state yields no delta, that the first body row releases header+separator atomically, and that subsequent rows commit immediately.
  • Issue #69 comprehensive regression test suite (tests/test_streaming_markdown.py): New TestIssue69Regressions class with targeted regressions for emphasis-delimiter edge cases and table buffering semantics, including multi-bullet handling, chunk-boundary behavior, and flush-on-finish for incomplete held blocks.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes


A rabbit hops through markdown trees,
Fixing bullets that shouldn't be italic breeze,
Tables now wait for their rows to arrive—
No premature headers, just whole tables thrive! ✨
Issue #69 conquered, append-only streams shine. 🐰

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 16.13%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Linked Issues check: ❓ Inconclusive. The PR addresses specific streaming regressions (single-asterisk emphasis and table header-separator hold) but does not fully address the broader issue #69 scope of closing parser gaps (setext, footnotes, HTML, indented code). Resolution: clarify whether this PR is a complete fix for issue #69 or a focused bug fix; the broader parser gap closure (Option A/B/C) may remain open.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the main changes: fixing list-marker asterisk awareness and table chunk-boundary handling, directly addressing issue #69.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to streaming_markdown.py (and its tests), directly addressing the three production regressions and held-table behavior from issue #69 without unrelated modifications.



gemini-code-assist Bot left a comment


Code Review

This pull request introduces list-marker awareness to the streaming markdown renderer to prevent bullet points from being incorrectly treated as emphasis markers. It also updates the table rendering logic to buffer headers and separators until a body row is available, ensuring valid syntax for append-only consumers. A bug was identified in the list marker detection logic where it fails to correctly handle markers followed by newlines or at the end of a string, and a fix was suggested to align with the CommonMark specification.

Comment on lines +76 to +78
if after >= len(stripped):
    return False
return stripped[after] in (" ", "\t")
Priority: medium

According to the CommonMark spec (Section 5.2), a list marker can be followed by either horizontal whitespace (spaces/tabs) or a line ending (newline). Additionally, a marker at the very end of the document (or string) is also considered a valid list marker if it is at the start of a line.

The current implementation returns False if the marker is at the end of the string or followed by a newline, which causes _remend to incorrectly append a closing italic marker (e.g., *\n becomes **\n). Including \n and allowing the end-of-string case improves the accuracy of the repair and prevents flickering during streaming when a bullet is first typed.

Suggested change
-if after >= len(stripped):
-    return False
-return stripped[after] in (" ", "\t")
+if after == len(stripped):
+    return True
+return stripped[after] in (" ", "\t", "\n")
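For reference, the suggested predicate amounts to the check below: a single `*` counts as a list marker when only spaces or tabs precede it on its line and it is followed by a space, tab, newline, or the end of the text. This is a standalone sketch with a hypothetical `is_list_marker` name; it allows any amount of indentation, which is looser than CommonMark's indentation rules. As noted later in the thread, the branch ultimately replaced this line-leading check with the broader whitespace-flanking rule.

```python
def is_list_marker(text: str, run_start: int, run_len: int) -> bool:
    """Line-leading single '*' followed by space, tab, newline, or end of text."""
    if run_len != 1:
        return False
    # rfind returns -1 when there is no earlier newline, so line_start becomes 0.
    line_start = text.rfind("\n", 0, run_start) + 1
    # Only indentation (spaces/tabs) may precede the marker on its line.
    if text[line_start:run_start].strip(" \t"):
        return False
    after = run_start + run_len
    if after == len(text):
        return True  # bare "*" at the end of the buffer, e.g. a bullet just typed
    return text[after] in (" ", "\t", "\n")
```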

The previous `_is_list_marker` skipped only line-leading single `*`
followed by horizontal whitespace. Tighten to match upstream remend's
`shouldSkipAsterisk` (packages/remend/src/emphasis-handlers.ts):
exclude any single `*` flanked by whitespace (or text boundary) on
both sides, which is what CommonMark says isn't a valid emphasis
delimiter anyway. The line-leading bullet case still falls out
naturally.

Picks up three additional cases the previous narrower check missed:
- `text * more` -- whitespace-flanked mid-line
- `trailing *\n` -- asterisk at end of line
- `partial *`   -- asterisk at end of buffer

Same `_remend` over-counting failure mode as the original issue #69
bugs, just different surface forms. Renamed the helper to
`_is_excluded_asterisk` to reflect the broader scope.

Remaining remend divergences (word-internal asterisks, math-block
contents, escaped sequences, multi-backtick spans, setext headings,
indented code, raw HTML, footnotes) are tracked on issue #69 for the
parser-strategy discussion -- out of scope for this PR.
patrick-chinchill pushed a commit that referenced this pull request May 22, 2026
The spike concluded with Option A (PRs #99 + #101) being the right
call for the SDK's current chat-only scope. Documenting the concrete
triggers under which that conclusion should be re-examined:

- Non-chat input source (RAG, user-authored memory, markdown-file
  ingestion, GitHub-body parsing -- anywhere humans write the input)
- Long-form artifact output surface (citations + footnotes; math
  rendering; one-shot parse where latency is less critical)
- Web rendering surface (port of upstream `@chat-adapter/web`)
- New chat platform demanding richer parsing

Plus a playbook for re-running the bake-off (author fixture, run
parity + benchmark harness, compare silent-drop count / parse time)
and a decision threshold for promoting markdown-it-py into runtime
if a trigger materialises.

Spot-checked upstream's `packages/` at the time of writing: only
`adapter-web` is the relevant new package (added in v4.27, Python
port deferred); no artifact / RAG / ingestion packages exist
upstream. All triggers are forward-looking.
patrick-chinchill marked this pull request as ready for review May 22, 2026 21:55

chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7eda0400d2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +291 to +295
for i in range(separator_idx - 1, -1, -1):
    trimmed = lines[i].strip()
    if trimmed == "" or TABLE_ROW_RE.match(trimmed) is None:
        break
    table_start = i

P1: Stop backward table hold at earlier separator lines

When separator_found and held_count == 0, the backward scan uses TABLE_ROW_RE to extend table_start, but separator lines also match TABLE_ROW_RE. In a stream like |A|B|\n|---|---|\n|1|2|\n|C|D|\n|---|---|\n, this makes table_start walk into the already-committed block and _get_committable_prefix collapses from the previous committed prefix to "", violating the append-only monotonic contract of get_committable_text(). Adapters that compute deltas from prior committed length can then emit incorrect/no deltas for that turn; the scan should stop when it encounters an earlier separator (or otherwise avoid capturing a previously confirmed table block).


PR #99 review (chatgpt-codex P1): in `_get_committable_prefix`, the
"hold pre-separator block" backward walk uses `TABLE_ROW_RE` to extend
`table_start`, but separator lines also match that pattern. For a
stream like `|A|B|\n|---|---|\n|1|2|\n|C|D|\n|---|---|\n` (two tables
with no blank line between them), the walk crosses the first table's
separator and collapses `_get_committable_prefix` back to "",
violating the monotonic append-only contract of
`get_committable_text()`. Adapters that compute deltas from prior
committed length then emit incorrect/no deltas.
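To make the append-only contract concrete: a consumer that slices `committable[len(previous):]` silently gets an empty delta when the committable text shrinks. A hypothetical adapter-side tracker (not part of chat_sdk; the class and function names are illustrative) can instead verify that each committable text extends the previous one:

```python
class AppendOnlyDeltaTracker:
    """Turns successive committable texts into append-only deltas (hypothetical helper)."""

    def __init__(self) -> None:
        self._committed = ""

    def next_delta(self, committable: str) -> str:
        # The contract requires every new committable text to extend the previous one.
        if not committable.startswith(self._committed):
            raise ValueError(
                "non-monotonic committable text: "
                f"{len(self._committed)} -> {len(committable)} chars"
            )
        delta = committable[len(self._committed):]
        self._committed = committable
        return delta


def naive_delta(previous: str, committable: str) -> str:
    # Length-based slicing: a shrinking prefix just yields "" instead of an error.
    return committable[len(previous):]
```

With the rollback described above, `naive_delta` returns `""` and hides the problem, while `next_delta` raises `ValueError`.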

Fix: the backward walk now stops at empty lines (existing), non-row
content (existing), AND at prior separators (new). When it hits a
prior separator -- meaning the candidate "new header" row above it was
already committed as a body row of the prior table -- the function
falls back to `return text` instead of holding. That emits one "stray
separator" on the rollback delta, which is broken markup but the
lesser evil compared to non-monotonic rollback.

The fix preserves the well-formed multi-table case where the second
table is separated from the first by a blank line; the empty-line
break in the walk fires before reaching the prior separator.
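A sketch of the corrected backward walk, assuming simplified `TABLE_ROW_RE` and `SEPARATOR_RE` patterns (the real patterns in `streaming_markdown.py` may differ) and a hypothetical `find_held_table_start` helper that returns `None` where `_get_committable_prefix` would fall back to `return text`:

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only.
TABLE_ROW_RE = re.compile(r"^\|.*\|$")
SEPARATOR_RE = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")


def find_held_table_start(lines: list[str], separator_idx: int) -> Optional[int]:
    """Walk upward from a body-less separator to find where the held block begins.

    Returns None when the walk reaches an earlier separator: the rows above the
    new separator were already committed as body rows of a prior table, so
    holding them would shrink the committed prefix.
    """
    table_start = separator_idx
    for i in range(separator_idx - 1, -1, -1):
        trimmed = lines[i].strip()
        if trimmed == "" or TABLE_ROW_RE.match(trimmed) is None:
            break  # blank line or non-row content ends the candidate header
        if SEPARATOR_RE.match(trimmed):
            return None  # prior table's separator: caller should not hold
        table_start = i
    return table_start
```

For the reviewer's repro the walk returns `None` at the first table's separator; with a blank line between the tables it stops at the blank line and the second header is held as before.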

Tests added:
- `test_back_to_back_tables_keep_committable_monotonic` (the exact
  reviewer repro)
- `test_second_table_after_blank_line_still_holds_header` (verifies
  the fix doesn't regress the blank-line-separated case)

The outdated comment from gemini-code-assist on `_is_excluded_asterisk`
suggesting we also include `\n` and end-of-string in the
list-marker exclusion is already addressed in the second commit on
this branch, which broadened the helper to match remend's
`shouldSkipAsterisk` exactly.

79 streaming tests pass / 3598 total / 1 pre-existing unrelated failure.
patrick-chinchill marked this pull request as draft May 23, 2026 00:59
patrick-chinchill marked this pull request as ready for review May 23, 2026 00:59
patrick-chinchill (Collaborator, Author) commented:

@codex review

Substantive changes since your last review:

  • 0bb0353 — fixed the P1 monotonicity violation you flagged. Backward walk in _get_committable_prefix now stops at prior separators and falls back to return text to preserve append-only monotonicity. Two new tests pin the repro + the well-formed multi-table case.

Generated by Claude Code

chatgpt-codex-connector Bot commented:

Codex Review: Didn't find any major issues. Nice work!


patrick-chinchill merged commit f75fe60 into main May 28, 2026
11 checks passed

Development

Successfully merging this pull request may close these issues.

Revisit hand-rolled markdown parser: close setext / footnotes / HTML / indented-code gaps

2 participants