[Repo Assist] Fix HtmlNode.ToString: preserve whitespace in elements nested inside <pre> #1605

Merged
dsyme merged 2 commits into main from repo-assist/fix-html-pre-whitespace-1509-e19da38c877032c0 on Feb 22, 2026

Conversation

@github-actions
Contributor

🤖 Repo Assist here — I'm an automated AI assistant for this repository.

Closes #1509

Problem

When HtmlNode.ToString() serializes a <pre> block, newlines and indentation can be incorrectly inserted into elements nested multiple levels deep inside <pre>. This corrupts output from syntax highlighters (e.g. shiki) that emit HTML like:

<pre><code><span class="line"><span style="color:red">let</span> <span style="color:blue">x</span></span></code></pre>

In this case, the <code> element (which has onlyText = false) had a newline inserted after its opening tag, and the <span> elements inside <span class="line"> had newlines inserted between them, corrupting the output.

Root Cause

The serialize function in HtmlNode.fs handled the <pre> tag itself correctly (skipping formatting for its direct children via the isPreTag flag), but did not propagate any "inside pre" context to deeper descendants. When a non-pre element with multiple element children (onlyText = false) appeared inside <pre>, the canAddNewLine flag on sibling elements still triggered newLine calls, so line breaks and indentation were written into the preformatted content.

Fix

Add an insidePre parameter to serialize. When true, all newline/indentation formatting is suppressed regardless of element type. The value is set to true when entering a <pre> element and propagated to all recursive calls:

-let rec serialize (sb: StringBuilder) indentation canAddNewLine html =
+let rec serialize (sb: StringBuilder) indentation canAddNewLine insidePre html =
     ...
     let isPreTag = name = "pre"
+    let nowInsidePre = insidePre || isPreTag

-    if canAddNewLine && not (onlyText || isPreTag) then
+    if canAddNewLine && not insidePre && not (onlyText || isPreTag) then
         newLine 0
     ...
-        if not (onlyText || isPreTag) then
+        if not insidePre && not (onlyText || isPreTag) then
             newLine 2
         for element in elements do
-            serialize sb (indentation + 2) canAddNewLine element
+            serialize sb (indentation + 2) canAddNewLine nowInsidePre element

The single-level case (spans directly inside <pre>) was already handled correctly by the existing isPreTag/onlyText checks; this fix extends that handling to all depths.
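The propagation can be illustrated with a small, self-contained model of the serializer. This is only a sketch under stated assumptions: the Node type, the render helper, and the simplified canAddNewLine handling are hypothetical stand-ins and not the actual code in HtmlNode.fs (which also handles attributes, comments, and other node kinds). Only the insidePre / nowInsidePre logic mirrors the diff above.

```fsharp
open System.Text

// Hypothetical, simplified node type (assumption): the real HtmlNode
// also carries attributes, comments, CDATA, and so on.
type Node =
    | Text of string
    | Element of name: string * children: Node list

let rec serialize (sb: StringBuilder) (indentation: int) (canAddNewLine: bool) (insidePre: bool) (node: Node) : unit =
    // Writes a line break followed by the requested indentation.
    let newLine (extra: int) =
        sb.Append('\n').Append(' ', indentation + extra) |> ignore

    match node with
    | Text t -> sb.Append(t) |> ignore
    | Element(name, children) ->
        let onlyText =
            children |> List.forall (function Text _ -> true | Element _ -> false)
        let isPreTag = name = "pre"
        // Once inside <pre>, every descendant stays "inside pre".
        let nowInsidePre = insidePre || isPreTag
        // Formatting is allowed only outside <pre> and only for elements
        // that are neither text-only nor <pre> themselves.
        let format = not insidePre && not (onlyText || isPreTag)

        if canAddNewLine && format then newLine 0
        sb.Append('<').Append(name).Append('>') |> ignore
        if format then newLine 2

        // Simplification (assumption): every child after the first may
        // start on a new line; the real flag handling is more involved.
        let mutable canAdd = false
        for child in children do
            serialize sb (indentation + 2) canAdd nowInsidePre child
            canAdd <- true

        if format then newLine 0
        sb.Append("</").Append(name).Append('>') |> ignore

// Hypothetical convenience wrapper for the sketch.
let render (node: Node) : string =
    let sb = StringBuilder()
    serialize sb 0 false false node
    sb.ToString()

// Shiki-style structure from the problem description, without attributes.
let shiki =
    Element("pre",
        [ Element("code",
            [ Element("span",
                [ Element("span", [ Text "let" ])
                  Text " "
                  Element("span", [ Text "x" ]) ]) ]) ])

printfn "%s" (render shiki)
```

With insidePre threaded through the recursion, the model emits the nested <pre> content on a single line with the inner space preserved, while an element tree outside <pre> can still receive line breaks and indentation.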

Test

Added the regression test "Maintain whitespace in deeply nested elements inside pre through round-trip" in HtmlParser.fs, covering the shiki-style deeply nested structure.

Test Status

  • Build passes (dotnet build src/FSharp.Data.Html.Core/)
  • All 2227 tests in FSharp.Data.Core.Tests pass (including the new regression test and all 10 existing whitespace-related tests)
  • FSharp.Data.Tests could not be built: the WorldBank type provider requires network access to api.worldbank.org, which is blocked by the build environment's network proxy — unrelated to this change.

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@4cb6855f0b3c0a719d7d5c3af44d1646450e63e9. View source at https://github.com/githubnext/agentics/tree/4cb6855f0b3c0a719d7d5c3af44d1646450e63e9/workflows/repo-assist.md.

Warning

⚠️ Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • www.google.com

…<pre>

Elements nested multiple levels deep inside a <pre> block could have
newlines and indentation inserted between them during serialization.
This corrupted output from syntax highlighters (e.g. shiki) that emit
<pre><code><span class="line"><span>...</span></span></code></pre>.

Root cause: the serialize function did not propagate an 'insidePre'
context to descendant elements. When a non-pre element with multiple
element children (onlyText = false) appeared inside <pre>, the
canAddNewLine flag on sibling elements caused newLines to be inserted.

Fix: add an 'insidePre' parameter to serialize. When true, all
newline/indentation formatting is suppressed regardless of element type.

Closes #1509

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dsyme marked this pull request as ready for review February 22, 2026 00:48
@dsyme merged commit 2566e81 into main Feb 22, 2026
2 checks passed
@dsyme deleted the repo-assist/fix-html-pre-whitespace-1509-e19da38c877032c0 branch February 22, 2026 01:05