Skip to content

[codex] Fix review regressions#433

Merged
bensonwong merged 13 commits intomainfrom
codex/fix-review-regressions
Apr 16, 2026
Merged

[codex] Fix review regressions#433
bensonwong merged 13 commits intomainfrom
codex/fix-review-regressions

Conversation

@bensonwong
Copy link
Copy Markdown
Collaborator

@bensonwong bensonwong commented Apr 16, 2026

Summary

  • Preserve raw HTML tags when wrapping citation markers, including attributes that contain > inside quoted values.
  • Repair malformed citation JSON without converting CRLF pairs into double newlines.
  • Include requestTimeoutMs in the verification dedupe key so callers with different timeouts do not share the same in-flight request.
  • Include the resolved endUserId in that dedupe key so multi-tenant requests do not reuse each other’s in-flight verification promise.
  • Evict failed verification requests from the cache so retries can proceed cleanly.
  • Reject deprecated --no-publish with a migration error and direct callers to --local-only.

Validation

  • bun test src/__tests__/markdownToHtml.test.ts src/__tests__/cliCommands.test.ts src/__tests__/parseCitation.test.ts

Notes

  • The new regression tests cover the review comments that were flagged.
  • The splitHtmlPreservingTags fix is quote-aware and preserves > characters inside quoted attributes.

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments
Project Deployment Actions Updated (UTC)
agui-chat-deepcitation Ignored Ignored Preview Apr 16, 2026 5:42pm
deepcitation-langchain-rag-chat Ignored Ignored Preview Apr 16, 2026 5:42pm
mastra-rag-deepcitation Ignored Ignored Preview Apr 16, 2026 5:42pm
nextjs-ai-sdk-deepcitation Ignored Ignored Preview Apr 16, 2026 5:42pm

@bensonwong bensonwong marked this pull request as draft April 16, 2026 17:12
Comment thread src/cli/commands.ts
function writePrepareFile(outPath: string, body: string): void {
const parent = dirname(outPath);
if (!existsSync(parent)) mkdirSync(parent, { recursive: true });
writeFileSync(outPath, body); // lgtm[js/http-to-file-access]
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

✅ Playwright Test Report

Status: Tests passed

📊 Download Report & Snapshots (see Artifacts section)

What's in the Visual Snapshots

The gallery includes visual snapshots for:

  • 🖥️ Desktop showcase (all variants × all states)
  • 📱 Mobile showcase (iPhone SE viewport)
  • 📟 Tablet showcase (iPad viewport)
  • 🔍 Popover states (verified, partial, not found)
  • 🔗 URL citation variants

Run ID: 24525052786

@claude
Copy link
Copy Markdown

claude bot commented Apr 16, 2026

PR Review: Fix review regressions

Good set of targeted fixes with solid regression test coverage. Here are my findings:


Bug: Double-segment in splitHtmlPreservingTags when HTML ends with a closed tag

File: src/cli/markdownToHtml.tssplitHtmlPreservingTags

When the last character of the input HTML is > (i.e., the string ends with a complete closing tag like </p>), both the "tag found" path and the "unterminated tag" path fire:

  1. The inner while finds >, increments i to html.length, pushes html.slice(start, i) as a segment, then breaks.
  2. After the while, the check if (i >= html.length) is TRUE, so buffer += html.slice(start) appends the same tag text to buffer again.
  3. After the outer for loop, if (buffer) segments.push(buffer) pushes it a second time.

Any HTML that ends with a closed tag — e.g., <p>Some text [1]</p> — will have the final </p> duplicated, producing malformed output like </p></p>. The existing test for quoted attributes passes only because it doesn't assert on the tail of the string.

Fix suggestion: Track whether the tag was closed and skip the unterminated-tag branch:

let closed = false;
while (i < html.length) {
  // ...
  if (tagChar === '>' && !inSingleQuote && !inDoubleQuote) {
    i++;
    segments.push(html.slice(start, i));
    closed = true;
    break;
  }
  i++;
}

if (!closed) {
  buffer += html.slice(start);
  break;          // unterminated tag — treat rest as text, exit outer loop
}
i--;              // compensate for outer for-loop's i++

Misleading warning message in extractMarkersFromBody

File: src/cli/cite.ts (warning console.error in the reuse block)

The warning still says (ignored) after the variant label is pushed to claimTextVariants:

Warning: cite:1 reused with different label — "Horizontal Boundaries" (used) vs "Vertical Boundaries" (ignored).

But it is no longer ignored — it's stored for fallback search. Consider updating the message to (stored as variant) or similar so authors aren't misled about whether their label will be used.


Undocumented change: endUserId added to the verification dedupe key

File: src/client/DeepCitation.ts

resolvedEndUserId is now part of rawKey, which is a behavioral change not mentioned in the PR summary. This prevents deduplication across requests from different end-users with identical citation payloads. That's probably the right call for multi-tenant correctness, but it should be called out explicitly so the impact on high-traffic scenarios is understood.


Removal of word-count guard in auto-promote logic

File: src/cli/commands.tshasMeaningfulLabelOverlap and the promote block

The old guard wordCount > 4 || visible.length > 40 prevented long labels from being auto-promoted as anchors. The new code removes this guard entirely, relying solely on the 3-char token overlap. A label like "The company's revenue grew significantly in Q4" would now be promoted if it shares a 3+ char token with currentAnchor. Consider whether a length cap (e.g. skip if visible.length > 60) should be kept alongside the overlap check.


--no-publish removed without a deprecation period

File: src/cli/commands.ts, CHANGELOG.md

--publish / --pub are retained as no-op aliases for backward compat, but --no-publish is simply dropped (replaced by --local-only). Any existing scripts or CI pipelines using --no-publish will silently pass the flag through to parseArgs and may not error out, causing unintended uploads. Either add --no-publish to a known-flags set that emits a migration error (similar to how --text/--txt are handled in prepare), or add a deprecation cycle.


Empty citation block behavior change — minor

File: src/parsing/citationParser.ts

Returning success: true for a block that has delimiters but no JSON content (whitespace-only) unifies two previously distinct cases. This is reasonable for robustness, but the test description change from "fails when citation block is present but whitespace-only" to "treats whitespace-only citation blocks as recoverable empties" undersells the intentional behavior change. Worth a brief code comment explaining why masking the error is preferable here (i.e., callers fall back to body markers).


Positives worth noting

  • The splitHtmlPreservingTags approach (splitting on quote-aware >) is the right architecture for the attribute-quoting bug — the old single-pass regex couldn't handle > in quoted values at all.
  • escapeLiteralControlCharactersInJsonStrings correctly handles CRLF pairs as a single \n (not \n\n), and the new test in parseCitation.test.ts covers this precisely.
  • Including requestTimeoutMs in the dedupe key and evicting failed requests in a catch block (rather than only on !response.ok) are solid fixes.
  • The claimTextVariants fallback search in verifyMarkdown is a clean design; trying anchorHint → claimText → variants in order preserves existing priority while enabling the new recovery path.
  • Test coverage for the removed prepare flags using it.each is clean and thorough.

The blocking item is the double-segment bug in splitHtmlPreservingTags.

@bensonwong bensonwong marked this pull request as ready for review April 16, 2026 17:27
Benson and others added 3 commits April 16, 2026 11:33
TypeScript 6 deprecated moduleResolution=node10; add ignoreDeprecations:
"6.0" to silence it, and remove the exclude array that was inadvertently
blocking test files from being type-checked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bensonwong bensonwong merged commit 2517a35 into main Apr 16, 2026
14 checks passed
@bensonwong bensonwong deleted the codex/fix-review-regressions branch April 16, 2026 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants