Content from tags inside pre is missing whitespace characters #1223

@johannesegger

When emitting a parsed document, whitespace characters inside pre tags are usually preserved. The following test checks this:

https://github.com/fsharp/FSharp.Data/blob/33e6e825bc2978eb9ce6dd880f31d9e60d452699/tests/FSharp.Data.Tests/HtmlParser.fs#L763-L772

However, that's not the case when the pre tag contains child tags. As soon as the parser encounters a different tag, whitespace characters are removed. For example, when emitting the parsed document for the following snippet, the whitespace characters after the span tag are missing, while the whitespace before the span tag is preserved.

<pre>\r\n        This <span>code</span> should be indented and\r\n        have line feeds in it</pre>

The code that's responsible for "normalizing" whitespace characters is the following:

https://github.com/fsharp/FSharp.Data/blob/33e6e825bc2978eb9ce6dd880f31d9e60d452699/src/Html/HtmlParser.fs#L373-L382

And x.InsertionMode is calculated as follows:

https://github.com/fsharp/FSharp.Data/blob/49a3bfb22a8955463d7536af1d2df86449e335c6/src/Html/HtmlParser.fs#L353-L356

x.IsFormattedTag is only true if the last parsed tag is pre or code. Shouldn't it instead check whether the parser is currently inside a formatted tag?
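
To illustrate the difference between the two checks, here is a minimal, language-agnostic sketch in Python. It is not FSharp.Data's code: the `Emitter` class, the `track_depth` flag, the `collapse` helper, and the collapse-to-one-space rule are all hypothetical stand-ins for the parser state and whitespace normalization described above. It only contrasts "was the last tag pre or code?" with "are we currently inside pre or code?".

```python
import re
from html.parser import HTMLParser

FORMATTED_TAGS = {"pre", "code"}


def collapse(text: str) -> str:
    """Hypothetical normalization: collapse each whitespace run to one space."""
    return re.sub(r"\s+", " ", text)


class Emitter(HTMLParser):
    """Re-emits HTML, preserving whitespace according to one of two checks."""

    def __init__(self, track_depth: bool) -> None:
        super().__init__(convert_charrefs=True)
        self.track_depth = track_depth
        self.last_tag = None        # most recently opened tag (simplified model)
        self.formatted_depth = 0    # number of enclosing pre/code elements
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.last_tag = tag
        if tag in FORMATTED_TAGS:
            self.formatted_depth += 1
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.last_tag = None
        if tag in FORMATTED_TAGS and self.formatted_depth > 0:
            self.formatted_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.track_depth:
            preserve = self.formatted_depth > 0          # inside pre/code?
        else:
            preserve = self.last_tag in FORMATTED_TAGS   # last tag was pre/code?
        self.out.append(data if preserve else collapse(data))


def emit(html: str, track_depth: bool) -> str:
    parser = Emitter(track_depth)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


SNIPPET = (
    "<pre>\r\n        This <span>code</span> should be indented and\r\n"
    "        have line feeds in it</pre>"
)

print(repr(emit(SNIPPET, track_depth=False)))  # last-tag check
print(repr(emit(SNIPPET, track_depth=True)))   # inside-formatted-tag check
```

In this sketch the last-tag check keeps the line break and indentation before the span but collapses the whitespace after it, which matches the behavior reported here, while the depth-based check round-trips the snippet unchanged.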
